Friday, September 15, 2017

New version MaSuRCA 3.2.3

I have just finished testing the new version of MaSuRCA, version 3.2.3. The only notable new feature is gap closing for assemblies that use PacBio/Oxford Nanopore data. The other changes are all improvements in stability, usability and speed:

1. Added scaffold gap closing for hybrid assemblies that use PacBio/Oxford Nanopore data.
2. Improved the speed and stability of the filter for Illumina mate pairs.
3. The estimated genome size and ploidy are now saved and can be read from the ESTIMATED_GENOME_SIZE.txt and PLOIDY.txt files.
4. Nucmer now runs multi-threaded when SOAPdenovo2 is used as the contigger/scaffolder, for filtering out redundant small contigs after gap closing.
5. Updated MUMmer to the latest version.
6. Many small performance improvements to avoid re-running steps that have already completed when the assembler is restarted.
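Because the genome size and ploidy estimates are now saved as plain files, downstream scripts can pick them up directly. A minimal Python sketch, assuming each file sits in the assembly directory and holds a single integer (the exact file format is not described in this post, and the helper names are hypothetical):

```python
from pathlib import Path


def read_single_int(path):
    """Return the first whitespace-separated token of a file as an int.

    Assumption: the file contains one integer (a genome size in bp or a
    ploidy level); the real file format may differ.
    """
    tokens = Path(path).read_text().split()
    if not tokens:
        raise ValueError(f"{path} is empty")
    return int(tokens[0])


def read_estimates(assembly_dir):
    """Read the saved genome-size and ploidy estimates from assembly_dir."""
    base = Path(assembly_dir)
    return {
        "genome_size": read_single_int(base / "ESTIMATED_GENOME_SIZE.txt"),
        "ploidy": read_single_int(base / "PLOIDY.txt"),
    }
```

Calling read_estimates on an assembly folder returns a dictionary with the keys genome_size and ploidy.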

The new version is available from my ftp site:


  1. If possible, could you please add documentation to the software package (e.g. a README file)? It's difficult to find the Quick Start Guide unless you get lucky and peruse the FTP index.

    Also, requesting the software from the UMD Genome Group sends a link to the previous version (3.2.2), not the current version (3.2.3).

  2. Ha! And, as Murphy's Law would have it, I see a link to the Quick Start Guide over there ---->

    However, I only ended up at this blog after struggling to get started with the software after downloading/installing and not finding any documentation within the software package itself.

  3. How would one specify multiple sets of Illumina data (PE and/or MP)?

    Do you modify the config file to have multiple lines, like this:

    PE= pe 180 20 /FULL_PATH/frag01_1.fastq /FULL_PATH/frag01_2.fastq
    PE= pe 180 20 /FULL_PATH/frag02_1.fastq /FULL_PATH/frag02_2.fastq

    Or, is it just a single line, like this:
    PE= pe 180 20 /FULL_PATH/frag01_1.fastq /FULL_PATH/frag01_2.fastq /FULL_PATH/frag02_1.fastq /FULL_PATH/frag02_2.fastq
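One way to keep several libraries apart is to write one PE= line per library, each with its own two-character prefix. The sketch below assumes that layout (separate lines, unique prefixes, as in the MaSuRCA example config); the prefixes pe and pf and the helper pe_lines are hypothetical, and this is an illustration rather than an official answer to the question:

```python
from dataclasses import dataclass


@dataclass
class Library:
    prefix: str   # two-character library prefix, unique per library (assumed rule)
    mean: int     # mean fragment size
    stdev: int    # fragment size standard deviation
    forward: str  # path to forward reads
    reverse: str  # path to reverse reads


def pe_lines(libraries):
    """Build one 'PE=' config line per library, rejecting bad or repeated prefixes."""
    seen = set()
    lines = []
    for lib in libraries:
        if len(lib.prefix) != 2:
            raise ValueError(f"prefix {lib.prefix!r} is not two characters long")
        if lib.prefix in seen:
            raise ValueError(f"prefix {lib.prefix!r} is used more than once")
        seen.add(lib.prefix)
        lines.append(f"PE= {lib.prefix} {lib.mean} {lib.stdev} {lib.forward} {lib.reverse}")
    return lines


if __name__ == "__main__":
    libs = [
        Library("pe", 180, 20, "/FULL_PATH/frag01_1.fastq", "/FULL_PATH/frag01_2.fastq"),
        Library("pf", 180, 20, "/FULL_PATH/frag02_1.fastq", "/FULL_PATH/frag02_2.fastq"),
    ]
    # Prints two PE= lines, one for each library.
    print("\n".join(pe_lines(libs)))
```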

  4. Hi! Sorry for asking questions here; I couldn't find an adequate forum for users...
    We are doing a hybrid Illumina+MinION assembly using MaSuRCA 3.2.3, and the assembly just dies in the overlapcorrection stage. Some chunks say: ERROR: Bad alignment ends a_end = 0 b_end = 0, and then all hell breaks loose: we get segfaults, insults, and then it's dead. Would you have any opinion about this? Thanks

    1. I have never seen this kind of problem. Try re-running the overlaps by deleting genome.ovlStore, 1-* and 3-*.
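The suggested clean-up can be scripted so that the same entries are removed every time. A Python sketch, assuming it is pointed at the directory that contains genome.ovlStore and the numbered stage directories (the prefix "genome" is taken from the reply; reset_overlaps is a hypothetical helper). It defaults to a dry run that only reports what would be deleted:

```python
import shutil
from pathlib import Path


def reset_overlaps(ca_dir, prefix="genome", dry_run=True):
    """Remove <prefix>.ovlStore and the 1-* and 3-* entries under ca_dir.

    Returns the names of the entries that were (or, in a dry run, would be)
    removed. Assumption: ca_dir is the assembler working directory that holds
    the overlap store; check the location in your own run first.
    """
    root = Path(ca_dir)
    targets = [root / f"{prefix}.ovlStore"]
    targets += sorted(root.glob("1-*")) + sorted(root.glob("3-*"))
    removed = []
    for path in targets:
        if not (path.exists() or path.is_symlink()):
            continue  # nothing to delete, e.g. the store was never built
        removed.append(path.name)
        if dry_run:
            continue
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)  # stage directories and the store are directories
        else:
            path.unlink()
    return removed
```

Run it once with the default dry_run=True to review the list, then again with dry_run=False to delete.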

  5. Dear Aleksey,

    I am using MaSuRCA 3.2.3. At the moment I am running an Illumina-only assembly with SOAP_ASSEMBLY=1.
    I had previously done the assembly using PE data only; then I soft-linked the previous data into a new directory and launched the assembly, adding 2 MP libraries that I named s3 and j3 and j5.
    The assembly consistently fails, and SOAPdenovo.err says at the end:

    Import reads from file:
    Cannot open ../j3.cor.clean.fa. Now exit to system...

    However, there is no such file (nor one for j5). The only file is sj.cor.fa, which was created after both library FASTQs were renamed:

    devel 102G Oct 26 16:11 sj.cor.fa
    -rw-r--r-- 1 fcruz devel 1.2G Oct 26 16:11 sj.cor.log
    -rw-r--r-- 1 fcruz devel 228M Oct 26 14:11 pe.cor.log
    -rw-r--r-- 1 fcruz devel 62G Oct 26 11:59 quorum_mer_db.jf
    -rw-r--r-- 1 fcruz devel 106G Oct 26 11:34 j5.renamed.fastq
    -rw-r--r-- 1 fcruz devel 117G Oct 26 11:20 j3.renamed.fastq

    Is this a bug or a problem particular to my run? Does this version create a generic sj file with all libraries in it?

    Thanks in advance,


    1. Soft links will break the assembly. You can simply re-run the assembly in the existing folder where the PE-only assembly was run; MaSuRCA will re-use the appropriate files.
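Since soft links are what broke this run, it may be worth confirming that an assembly folder contains none before re-running in it. A small Python sketch (find_symlinks is a hypothetical helper, not part of MaSuRCA) that lists every symbolic link under a directory. os.walk does not descend into linked directories by default, but they still appear in its directory list, so both directories and files are checked:

```python
import os
from pathlib import Path


def find_symlinks(assembly_dir):
    """Return sorted paths, relative to assembly_dir, of all symbolic links."""
    root = Path(assembly_dir)
    links = []
    for dirpath, dirnames, filenames in os.walk(root):  # followlinks=False by default
        # Symlinked directories show up in dirnames (and are not entered);
        # symlinked files, including broken links, show up in filenames.
        for name in dirnames + filenames:
            path = Path(dirpath, name)
            if path.is_symlink():
                links.append(str(path.relative_to(root)))
    return sorted(links)
```

An empty list means no soft links were found under the folder.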

    2. Hi Aleksey,

      As it's just an 848 Mb genome, I launched the assembly from scratch again and got the same error: it's looking for j3.cor.clean.fa files. Therefore I decided to split sj.cor.clean.fa and .cor.clean.rev.fa into the corresponding j3 and j5 files. Finally, I relaunched MaSuRCA and it finished with just one error at line 199 of j3 j5 < sj.cor.clean2.fa
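A split like this can be done record by record instead of line by line, which keeps multi-line sequences attached to their headers (something a plain grep on header lines does not guarantee). A streaming Python sketch, so that a file of this size is never held in memory, assuming every read header starts with its library prefix (e.g. j3 or j5) immediately after the '>'. MaSuRCA's actual read-naming scheme is not confirmed in this thread, and split_fasta_by_prefix is a hypothetical helper:

```python
def split_fasta_by_prefix(lines, outputs):
    """Write each FASTA record to outputs[prefix], where prefix starts its header.

    lines: iterable of FASTA lines (e.g. an open file), read one at a time.
    outputs: dict mapping library prefix -> writable text handle.
    Assumption: each header is '>' followed directly by the library prefix.
    """
    ordered = sorted(outputs, key=len, reverse=True)  # try longer prefixes first
    current = None
    for line in lines:
        if line.startswith(">"):
            name = line[1:]
            current = next((p for p in ordered if name.startswith(p)), None)
            if current is None:
                raise ValueError(f"header matches no prefix: {line.rstrip()}")
        elif current is None:
            raise ValueError("sequence line found before the first header")
        outputs[current].write(line)  # header and sequence lines follow their record
```

For example, open sj.cor.clean.fa for reading and j3.cor.clean.fa and j5.cor.clean.fa for writing, then pass them as the input and as {"j3": ..., "j5": ...}.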

      I don't think this was a big issue, as I split the files myself with a simple grep. The only weird thing is that the assembly length of asm2.scafSeq2 is 1.666 Gb in scaffolds and 891.5 Mb in contigs. Is this normal? Is it like having half of the sequence contained in gaps?
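If 891.5 Mb is the ungapped contig length, the gap share of the scaffolds is (1666 - 891.5) / 1666 = 774.5 / 1666, about 0.465, so a little under half of the scaffold length would be gap. The actual gap content can be measured directly from the scaffold FASTA. A Python sketch, assuming gaps are written as runs of N (or n) characters; gap_stats is a hypothetical helper:

```python
def gap_stats(fasta_lines):
    """Return (total length, N count, N fraction) over all sequences.

    Assumption: gaps are encoded as N/n characters; headers start with '>'.
    """
    total = 0
    n_count = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip header lines
        seq = line.strip()
        total += len(seq)
        n_count += seq.count("N") + seq.count("n")
    fraction = n_count / total if total else 0.0
    return total, n_count, fraction
```

Passing an open handle to asm2.scafSeq2 would give the scaffold length, the number of gap bases, and their fraction.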

      I have also noticed something in a different project that worked fine: it used CA instead of SOAPdenovo, and the PE libraries were named with numbers (p3 and p5) while the jumping libraries had two-letter codes (sA and sC).

      It seems to me that there is some issue when the jumping libraries are named with numbers.

      Let me know what you think about this.

      Thanks in advance,

  6. Hi,

    I tried running this new version and it crashed (after 11 days of running). This was on a single HPC node, with 500GB of RAM allocated for the MaSuRCA job.

    Here are some snippets of error logs. Would you happen to have any thoughts on how to avoid this?

    slurm-94620.out (tail)

    compute_psa 6601202 2632582819
    Refining alignments
    Generating assembly input files
    Coverage of the mega-reads less than 5 -- using the super reads as well
    Coverage threshold for splitting unitigs is 138 minimum ovl 63
    Running assembly
    /gscratch/srlab/programs/MaSuRCA-3.2.3/bin/ line 85: 24330 Aborted (core dumped) overlapStoreBuild -o $ASM_DIR/$ASM_PREFIX.ovlStore -M 65536 -g $ASM_DIR/$ASM_PREFIX.gkpStore $ASM_DIR/overlaps_dedup.ovb.gz > $ASM_DIR/overlapStore.rebuild.err 2>&1
    Assembly stopped or failed, see
    [Mon Oct 30 23:19:37 PDT 2017] Assembly stopped or failed, see

    --- (tail)

    number of threads = 28 (OpenMP default)

    ERROR: overlapStore '/gscratch/scrubbed/samwhite/20171019_masurca_oly_assembly/' is incomplete; previous overlapStoreBuild probably crashed.

    Failure message:

    failed to unitig


    Scanning overlap files to count the number of overlaps.
    Found 277.972 million overlaps.
    Memory limit 65536MB supplied. I'll put 3246167525 IIDs (3435.97 million overlaps) into each of 1 buckets.
    bucketizing DONE!
    overlaps skipped:
    0 OBT - low quality
    0 DUP - non-duplicate overlap
    0 DUP - different library
    0 DUP - dedup not requested
    terminate called after throwing an instance of std::bad_alloc
    what(): std::bad_alloc

    Failed with Aborted

    Backtrace (mangled):


    Backtrace (demangled):

    [0] overlapStoreBuild() [0x40523a]
    [1] /usr/lib64/ + 0xf100 [0x2af83b3c0100]
    [2] /usr/lib64/ + 0x37 [0x2af83c0395f7]
    [3] /usr/lib64/ + 0x148 [0x2af83c03ace8]
    [4] /usr/lib64/ + 0x165 [0x2af83b62d9d5]
    [5] /usr/lib64/ + 0x5e946 [0x2af83b62b946]
    [6] /usr/lib64/ + 0x5e973 [0x2af83b62b973]
    [7] /usr/lib64/ + 0x5eb93 [0x2af83b62bb93]
    [8] /usr/lib64/ new(unsigned long) + 0x7d [0x2af83b62c12d]
    [9] /usr/lib64/ new[](unsigned long) + 0x9 [0x2af83b62c1c9]
    [10] overlapStoreBuild() [0x402e10]
    [11] /usr/lib64/ + 0xf5 [0x2af83c025b15]
    [12] overlapStoreBuild() [0x403089]
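The bucket-sizing lines in this log can be checked with a little arithmetic. A Python sketch that pulls the numbers back out of the log text; the regular expressions assume exactly the wording quoted in the log, "MB" is taken to mean 2^20 bytes, and bucket_budget is a hypothetical helper:

```python
import re


def bucket_budget(log_text):
    """Derive bucket arithmetic from overlapStoreBuild log lines.

    Assumptions: the log uses the wording quoted in the comment, and 'MB'
    means 2**20 bytes. This only re-reads the log; it does not diagnose the crash.
    """
    found = re.search(r"Found ([\d.]+) million overlaps", log_text)
    limit = re.search(r"Memory limit (\d+)MB supplied", log_text)
    bucket = re.search(r"\(([\d.]+) million overlaps\) into each of (\d+) buckets", log_text)
    if not (found and limit and bucket):
        raise ValueError("expected overlapStoreBuild lines not found")
    overlaps = float(found.group(1)) * 1e6
    limit_bytes = int(limit.group(1)) * 2**20
    per_bucket = float(bucket.group(1)) * 1e6
    return {
        "overlaps": overlaps,
        "buckets": int(bucket.group(2)),
        "bytes_per_overlap": limit_bytes / per_bucket,
        "fraction_of_bucket_used": overlaps / per_bucket,
    }
```

On the lines in this log, the result works out to about 20 bytes per overlap (65536 * 2^20 bytes / 3435.97 million overlaps), with the 277.972 million overlaps filling roughly 8% of the single bucket. This restates the log; it does not show where the failing allocation came from.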