Wednesday, May 2, 2018

MaSuRCA 3.2.6 official

I have released the official 3.2.6 version of MaSuRCA. It is available on the ftp site, and also on github.

Upgrading is easy: simply remove the 3.2.4 (or older) version and install this one. Please use this version going forward. Big thanks to all users who reported errors and bugs!

Please see this post for the list of improvements in the 3.2.6 version:

Thursday, April 19, 2018

Reporting issues with MaSuRCA on github

MaSuRCA is now on github. Github has an excellent system for reporting bugs/issues with the software. I encourage all users of MaSuRCA to utilize this resource and report issues here.

Also if you are having a problem, please check the github issues page to see if the problem has been addressed already.

Thursday, April 5, 2018

Tree tobacco plant assembled with MaSuRCA

I am glad to see published assemblies of novel genomes that used MaSuRCA. Here is a recent data note published in BMC:

This is an assembly of tree tobacco Nicotiana glauca from Illumina-only data (350bp fragment paired-end library and ~4,000bp fragment mate pair library), which yielded an N50 contig size of about 31Kbp. The assembly size (~3.2Gbp) was bigger than the estimated genome size (~2Gbp), which points to relatively high heterozygosity of the plant.
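
For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch of its usual definition: sort the contig lengths from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly size. The function name n50 and the toy lengths are illustrative assumptions; this is not code from MaSuRCA.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

In this toy example the two longest contigs already cover half of the total length, so the N50 equals the length of the second contig.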

Monday, March 26, 2018

Pre-release version MaSuRCA 3.2.6beta

Over the past several weeks I have been working on improving the stability of MaSuRCA. I thank all users who reported problems to me; I have been addressing these problems in the code. The improved pre-release version of MaSuRCA 3.2.6beta is posted here:

This is a maintenance release. There are no new features compared to version 3.2.4, but there are many stability and performance improvements based on feedback from users (AGAIN BIG THANKS EVERYONE!!!) and on my own use of MaSuRCA in the assemblies that I run.

List of major improvements:

1. workaround for occasional failures in overlap correction
2. workaround for unitig consensus failures in Illumina-only assemblies
3. improved performance and stability when running mega-reads on an SGE grid
4. cleaned up the code and improved restarting of assemblies with Illumina-only data
5. updated version of MUMmer4 included
6. improved compilation and install script on platforms where @ is present in the PWD
7. fixed bugs and improved performance of the assembly polishing code
8. speed and stability improvements to the Oxford Nanopore correction code
9. fixed a bug that caused gap filling to run in an endless loop

The complete list of bugfixes and improvements for MaSuRCA and its submodules can be found on github.

I would like this release to be a stable point before I continue adding new features.  Please let me know in the comments if you have any issues with this release.  I will remove the beta status after 2 weeks of testing and post it as an official release.

Monday, January 22, 2018

MaSuRCA is now on github

MaSuRCA has a new home on github. The MaSuRCA repository combines jellyfish, QuORUM, and other modules into one repository. The individual modules are submodules in the repository. The master branch of the masurca repository tracks the latest working commits. To check out and compile MaSuRCA, do the following:

git clone

git submodule init

git submodule update


MaSuRCA will compile under build/inst/bin/

To create a distribution, run make install. This will create the distributable tarball MaSuRCA-3.2.4.tar.gz.

EDIT: to compile MaSuRCA from the development tree, you will need the following dependencies:
swig and yaggo. Both must be available on the path.

Please post all questions and bug reports under "issues" in github:

Friday, January 12, 2018

New MaSuRCA version 3.2.4

I have just finished testing a new release of MaSuRCA, version 3.2.4. The major improvement in this version is the ability to run the hybrid assembly (Illumina+PacBio/Oxford Nanopore data) on a grid. At this point only SGE is supported; I am working on SLURM support, which will be implemented shortly. Other improvements include:

1. gzipped fasta/fastq input files of PacBio/Oxford Nanopore reads supported
2. general speed and accuracy improvements
3. minor bugfixes based on user feedback

The new version is designed to allow mammalian genome assembly on a grid of computers with 128GB of RAM.

The new release is available here

I am now updating the MaSuRCA manual to reflect the new options for grid execution, and I will upload it later today.

Thursday, November 2, 2017

News article about Bread wheat genome assembled with MaSuRCA and Falcon

According to this article in Nature news, we scooped the IWGSC (International Wheat Genome Sequencing Consortium) to publication of the Bread wheat assembly, the most complete and contiguous to date.

Wednesday, October 25, 2017

Note on ploidy estimation in MaSuRCA

Since version 3.2.2, MaSuRCA uses modified algorithms and settings for assembly of heterozygous diploid/polyploid genomes. For this reason there is a ploidy setting that is auto-computed and saved in PLOIDY.txt. Valid values for PLOIDY are 1 and 2. Editing this file forces the assembler to use the ploidy indicated.

Ploidy 1 means haploid and ploidy 2 means diploid. This is a gross over-simplification that is used in the assembler for the time being.

Ploidy for non-clonal genomes is always 2, but for the internal algorithms ploidy 1 means that the genome is relatively inbred and ploidy 2 means that it is relatively outbred. The reasoning is that in most genomes there is a proportion of the sequence that is conserved between the two haplotypes, and a proportion that is divergent. I treat ploidy as a measure of the ratio of the total amount of unique sequence in the genome to the haploid genome size. This is a number between 1 and 2: 1 means no divergence (the homologous chromosomes are identical) and 2 means the two haplotypes are 100% different. At this time I do not treat this as a floating parameter between 1 and 2; instead I set a threshold in the middle based on a heuristic computation. This is an over-simplification and I will introduce a refinement of this parameter in later versions.
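
To make the ratio-and-threshold idea concrete, here is a minimal Python sketch. It is an illustration only and not the code or the heuristic that MaSuRCA uses: the function names, the input numbers, and the threshold value of 1.5 (taken literally as "the middle" of the 1 to 2 range) are all assumptions.

```python
def unique_sequence_ratio(unique_sequence_bp, haploid_genome_bp):
    """Ratio of total unique sequence to haploid genome size (expected 1..2)."""
    if haploid_genome_bp <= 0:
        raise ValueError("haploid genome size must be positive")
    ratio = unique_sequence_bp / haploid_genome_bp
    if not 1.0 <= ratio <= 2.0:
        raise ValueError("ratio is expected to lie between 1 and 2")
    return ratio


def ploidy_setting(ratio, threshold=1.5):
    """Collapse the ratio to 1 or 2; the 1.5 threshold is an assumed value."""
    return 2 if ratio > threshold else 1


# Hypothetical numbers: 1.2 Gbp of unique sequence in a 1.0 Gbp haploid genome.
ratio = unique_sequence_ratio(1_200_000_000, 1_000_000_000)
print(ratio, ploidy_setting(ratio))
```

With these made-up inputs the ratio falls below the assumed threshold, so the genome would be treated as relatively inbred (ploidy 1).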

Triticum aestivum (Bread Wheat) assembly paper is out.

The Triticum aestivum (Bread Wheat) assembly paper has just appeared in GigaScience. 100 CPU-years to assemble the 16Gbp Bread Wheat genome with MaSuRCA and Falcon!!!

Wednesday, September 27, 2017

MaSuRCA hybrid assembly strategy and recent results on Illumina and Oxford Nanopore data video presentation

I have just uploaded to YouTube my presentation on the latest MaSuRCA mega-reads results on assembly of Illumina and Oxford Nanopore MinION human genome data. In this presentation I describe a de novo human genome assembly of the NA12878 data set with an N50 contig size of over 1Mb from $10,000 worth of sequencing data, and outline the MaSuRCA mega-reads strategy.