Thursday, March 7, 2019

MaSuRCA version 3.3.1

Today I am releasing a new version of the MaSuRCA assembler, 3.3.1.  This version has no new features, only performance improvements and bug fixes. The release is available from the usual GitHub download page:
https://github.com/alekseyzimin/masurca/releases/tag/v3.3.1

I am currently working on MaSuRCA 4.  This version will replace the CABOG assembler with Flye (https://github.com/fenderglass/Flye) for hybrid assembly of Illumina paired-end + Oxford Nanopore/PacBio long reads. This will result in significant performance improvements: Flye takes about 1 day on a 64-core server to assemble an error-corrected human 20x data set, whereas CABOG takes about a week on a 300-core cluster to do the same task.

You can use Flye now to assemble the error-corrected reads output by MaSuRCA.  To do that, you can stop the assembly after one of the following files has been generated:
mr.41.15.15.0.02.1.fa -- for nanopore assemblies
mr.41.15.17.0.029.1.fa -- for pacbio assemblies

and then run the Flye assembler as follows:

GS=`cat ESTIMATED_GENOME_SIZE.txt` && <flye_path>/bin/flye --nano-corr <mr.41.15.15.0.02.1.fa or mr.41.15.17.0.029.1.fa> -t <number_of_threads> -g $GS -m 2000 -o flye_assembly -i 0
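
If you prefer not to type the file name by hand, the steps above can be wrapped in a couple of small shell functions. This is only a sketch, not part of MaSuRCA: the function names pick_corrected_reads and build_flye_cmd are made up for illustration, and it assumes the corrected reads file and ESTIMATED_GENOME_SIZE.txt are in the assembly directory. The Flye options are exactly the ones from the command above. The second function only prints the command so that you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with MaSuRCA or Flye).

# Print the path of whichever error-corrected reads file exists and is
# non-empty in the given directory (default: current directory).
# Returns non-zero if neither file has been generated yet.
pick_corrected_reads() {
    dir=${1:-.}
    for f in mr.41.15.15.0.02.1.fa mr.41.15.17.0.029.1.fa; do
        if [ -s "$dir/$f" ]; then
            echo "$dir/$f"
            return 0
        fi
    done
    return 1
}

# Print (do not run) the Flye command line from the post.
# Arguments: <flye_path> <reads_file> <number_of_threads> <genome_size>
build_flye_cmd() {
    flye_path=$1
    reads=$2
    threads=$3
    gs=$4
    echo "$flye_path/bin/flye --nano-corr $reads -t $threads -g $gs -m 2000 -o flye_assembly -i 0"
}

# Example use from the assembly directory (paths and thread count are placeholders):
#   READS=$(pick_corrected_reads .) || { echo "corrected reads not found yet" >&2; exit 1; }
#   GS=$(cat ESTIMATED_GENOME_SIZE.txt)
#   build_flye_cmd /path/to/flye "$READS" 32 "$GS"
```

Once the printed command looks right, you can paste it into the shell to start the Flye run.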