Thursday, May 4, 2017

New MaSuRCA release: version 3.2.2

Today I posted a new version that I can finally call a "release": version 3.2.2.  You can get it at ftp://ftp.genome.umd.edu/pub/MaSuRCA/latest/MaSuRCA-3.2.2.tar.gz.  This version has numerous speed and stability improvements based on user feedback and my own experience running assembly experiments.

Here are the major changes, in addition to many bugfixes based on user feedback:

1. based on user feedback, disabled the stones and Extend Clear Ranges steps in CABOG, because they were failing and/or causing long run times without significant benefit on assemblies with long-read PacBio/MinION data.

2. introduced speedups in creation of the mega-reads: a long read that yields a single mega-read in pass 1 is not re-processed in pass 2.  This reduced the mega-read correction time by about 25%.

3. introduced a contained-reads filter before CABOG.  By the nature of mega-read creation from PacBio/MinION reads, many mega-reads end up exactly contained in other mega-reads and thus contribute no new information to the assembly.  I found an efficient way to remove these exact containees, reducing the coverage and the number of reads that CABOG has to deal with.  This improved the run time of CABOG by about a factor of 4.  Now a 120Mb plant genome with 30x PacBio coverage and 100x Illumina coverage assembles in about 8 hours on a single 48-core AMD Opteron server; 12Mb yeast data with about the same level of coverage assembles in under 1 hour on the same server.

4. reworked the way assembly failures are reported, making the error messages more informative.

5. fixed bugs in the SOAPdenovo2 assembly module, thanks to input from Ruibang Luo, author of SOAPdenovo2.

6. removed the dependency on "parallel", which had been causing problems for many users.
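
To illustrate the idea behind the contained-reads filter in item 3, here is a minimal Python sketch. It is not the code MaSuRCA uses: it is a naive quadratic version for illustration only, and the function names, the toy reads, and the choice to also test the reverse complement are all assumptions made for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def drop_contained(reads: dict) -> dict:
    """Keep only reads that are not exact substrings of an already kept read.

    Hypothetical helper, naive O(n^2) illustration.
    Exact duplicates collapse to a single copy.
    """
    kept = {}
    # Longest first, so any possible container is examined before its containees.
    for name, seq in sorted(reads.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # exactly contained: adds no new sequence
        kept[name] = seq
    return kept


# Toy data, invented for this example.
mega_reads = {
    "mr1": "ACGTACGTTTGACCA",
    "mr2": "GTACGTTTG",        # exact substring of mr1
    "mr3": "TGGTCAAAC",        # reverse complement of a substring of mr1
    "mr4": "ACGTACGTTTGACCG",  # differs from mr1 in the last base
}
filtered = drop_contained(mega_reads)
print(sorted(filtered))
```

On this toy input only mr1 and mr4 survive. A real data set would need an indexed approach rather than pairwise substring checks, but the effect is the same: fewer reads and lower coverage going into CABOG.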

Thanks to all users who reported their failures and successes to me.  Your feedback is extremely valuable; it helps me make MaSuRCA more stable and easier to use!

3 comments:

  1. Hi. This looks like really promising software for hybrid assembly, so thanks for all the work you have put into it. I recently tried running the software for a fungal genome, with 8x nanopore coverage and 140X Illumina HiSeq coverage (100nt reads). I was using 8 processors and I have 30Gigs of RAM available. However, when the assembly was running (after splitting unitigs), it failed. I found the following in the logs:

    ----------------------------------------END CONCURRENT Sun May 14 06:54:29 2017 (31711 seconds)
    ----------------------------------------START Sun May 14 06:54:30 2017
    find -L /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap \( -name \*ovb.gz -or -name \*ovb \) -print > /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.list
    ----------------------------------------END Sun May 14 06:54:30 2017 (0 seconds)
    ----------------------------------------START Sun May 14 06:54:30 2017
    /home/ali/MaSuRCA-3.2.2/CA8/Linux-amd64/bin/overlapStoreBuild -obt -o /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.BUILDING -g /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/genome.gkpStore -M 65536 -L /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.list > /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.err 2>&1
    ----------------------------------------END Sun May 14 08:36:58 2017 (6148 seconds)
    ERROR: Failed with signal ABRT (6)
    ================================================================================

    runCA failed.


    ----------------------------------------
    Stack trace:

    at /home/ali/MaSuRCA-3.2.2/bin/../CA8/Linux-amd64/bin/runCA line 1592, line 2297.
    main::caFailure('failed to build the obt store', '/home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/ge...') called at /home/ali/MaSuRCA-3.2.2/bin/../CA8/Linux-amd64/bin/runCA line 4060
    main::overlapTrim() called at /home/ali/MaSuRCA-3.2.2/bin/../CA8/Linux-amd64/bin/runCA line 6501

    ----------------------------------------
    Last few lines of the relevant log file (/home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.err):

    bucketizing /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap/002/001217.ovb.gz
    bucketizing /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap/002/001802.ovb.gz
    bucketizing /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap/002/001583.ovb.gz
    bucketizing DONE!
    overlaps skipped:
    2400581126 OBT - low quality
    0 DUP - non-duplicate overlap
    0 DUP - different library
    0 DUP - dedup not requested
    terminate called after throwing an instance of 'std::bad_alloc'
    what(): std::bad_alloc

    Failed with 'Aborted'


    What would you advise me to do? Thanks!
    -Ali

  2. You will need more memory, at least 64 GB. If you cannot get that, try setting ovlStoreMemory=16384 under CA_PARAMETERS in the config file and re-running.

  3. Hi,

    I tried to assemble a fish genome with about 50x Illumina PE reads and ~10x Nanopore reads, but the run was met with an error at the overlap correction step.

    Error in CA log:

    Failure message:

    10 overlap correction jobs failed; remove /home/munhua/nemo/3.assemble_masurca/k50_ilmn-nano/CA.mr.41.15.13.0.02/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again


    I followed the suggestion to try again but it still failed. Here are some messages from the error log for one of the jobs (e.g. the 0007.err file):

    nohup: ignoring input
    Quality Threshold = 1.50%
    Allocating 15488156 words for Edit_Space.
    Starting Read_Frags ()
    Starting Correct_Frags ()
    Starting Read_Olaps ()
    Starting qsort ()
    Starting Redo_Olaps ()
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 651039 b_iid = 17619 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 638519 b_iid = 48118 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 630883 b_iid = 124903 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 626494 b_iid = 127517 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 602869 b_iid = 129399 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 650239 b_iid = 154508 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 618975 b_iid = 228567 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 629416 b_iid = 395971 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 616607 b_iid = 411975 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 619090 b_iid = 450197 errors = 0
    ERROR: Bad alignment ends a_end = 0 b_end = 0
    a_iid = 609360 b_iid = 484177 errors = 0

    Failed with 'Segmentation fault'


    Any idea what the cause and fix for this is?

    Thanks,
    Mun
