Thursday, May 4, 2017

New MaSuRCA release: version 3.2.2

Today I posted a new version that I can finally call a "release": version 3.2.2.  It is available at ftp://ftp.genome.umd.edu/pub/MaSuRCA/latest/MaSuRCA-3.2.2.tar.gz and has numerous speed and stability improvements based on feedback from users and my own experience running assembly experiments.

Here are the major changes, in addition to many bug fixes based on user feedback:

1. based on user feedback, disabled the stones and Extend Clear Ranges steps in CABOG, because they have been failing and/or causing long run times without significant benefit on assemblies with long-read PacBio/MinION data.

2. introduced speedups in the creation of mega-reads: a long read that yields a single mega-read in pass 1 is not re-processed in pass 2.  This reduced the mega-read correction time by about 25%.

3. introduced a contained-reads filter before CABOG.  By the nature of mega-read creation from PacBio/MinION reads, many mega-reads end up exactly contained in other mega-reads, and thus they contribute no new information to the assembly.  I found an efficient way to remove these exact containees, reducing the coverage and the number of reads that CABOG has to deal with.  This improved the run time of CABOG by about a factor of 4.  Now a 120Mb plant genome with 30x PacBio coverage and 100x Illumina coverage assembles in about 8 hours on a single 48-core AMD Opteron server; 12Mb of yeast data with about the same level of coverage assembles in under 1 hour on the same server.

4. reworked the way assembly failures are reported, making the error messages more informative.

5. fixed bugs in the SOAPDenovo2 assembly module, thanks to input from Ruibang Luo, author of SOAPDenovo2.

6. removed the dependency on "parallel", which has been causing problems for many users.
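
The idea behind the contained-reads filter in item 3 can be illustrated with a small Python sketch. This is not the MaSuRCA implementation, and it is not the efficient method mentioned above: it is a naive quadratic substring check, and the function names and the choice to test both strands are assumptions made for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def remove_exact_containees(reads):
    """Drop reads that occur exactly inside another read.

    reads: dict mapping read name -> sequence (hypothetical input format).
    Returns a dict with only the reads that are not an exact substring,
    on either strand, of a read that was already kept.
    """
    # Visit the longest reads first so that any container is seen before
    # the reads it contains; ties are broken by name for determinism.
    ordered = sorted(reads.items(), key=lambda item: (-len(item[1]), item[0]))
    kept = {}
    for name, seq in ordered:
        rc = reverse_complement(seq)
        # Naive O(n^2) scan; a real filter would need an index to scale.
        contained = any(seq in other or rc in other for other in kept.values())
        if not contained:
            kept[name] = seq
    return kept


if __name__ == "__main__":
    example = {
        "a": "ACGTACGTTT",  # longest read, always kept
        "b": "GTACG",       # exact substring of "a" (forward strand)
        "c": "AAACG",       # reverse complement CGTTT is a substring of "a"
        "d": "TTTTT",       # not contained in "a" on either strand
    }
    print(sorted(remove_exact_containees(example)))
```

Because containment is transitive, comparing each read only against the reads already kept is enough: anything contained in a dropped read is also contained in the read that caused it to be dropped.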

Thanks to all users who reported their failures and successes to me.  Your feedback is extremely valuable; it helps me make MaSuRCA more stable and easier to use!

2 comments:

  1. Hi. This looks like really promising software for hybrid assembly, so thanks for all the work you have put into it. I recently tried running the software for a fungal genome, with 8x nanopore coverage and 140X Illumina HiSeq coverage (100nt reads). I was using 8 processors and I have 30Gigs of RAM available. However, when the assembly was running (after splitting unitigs), it failed. I found the following in the logs:

    ----------------------------------------END CONCURRENT Sun May 14 06:54:29 2017 (31711 seconds)
    ----------------------------------------START Sun May 14 06:54:30 2017
    find -L /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap \( -name \*ovb.gz -or -name \*ovb \) -print > /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.list
    ----------------------------------------END Sun May 14 06:54:30 2017 (0 seconds)
    ----------------------------------------START Sun May 14 06:54:30 2017
    /home/ali/MaSuRCA-3.2.2/CA8/Linux-amd64/bin/overlapStoreBuild -obt -o /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.BUILDING -g /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/genome.gkpStore -M 65536 -L /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.list > /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.err 2>&1
    ----------------------------------------END Sun May 14 08:36:58 2017 (6148 seconds)
    ERROR: Failed with signal ABRT (6)
    ================================================================================

    runCA failed.


    ----------------------------------------
    Stack trace:

    at /home/ali/MaSuRCA-3.2.2/bin/../CA8/Linux-amd64/bin/runCA line 1592, line 2297.
    main::caFailure('failed to build the obt store', '/home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/ge...') called at /home/ali/MaSuRCA-3.2.2/bin/../CA8/Linux-amd64/bin/runCA line 4060
    main::overlapTrim() called at /home/ali/MaSuRCA-3.2.2/bin/../CA8/Linux-amd64/bin/runCA line 6501

    ----------------------------------------
    Last few lines of the relevant log file (/home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim/genome.obtStore.err):

    bucketizing /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap/002/001217.ovb.gz
    bucketizing /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap/002/001802.ovb.gz
    bucketizing /home/ali/MaSuRCA-3.2.2/CA.mr.41.15.17.0.029/0-overlaptrim-overlap/002/001583.ovb.gz
    bucketizing DONE!
    overlaps skipped:
    2400581126 OBT - low quality
    0 DUP - non-duplicate overlap
    0 DUP - different library
    0 DUP - dedup not requested
    terminate called after throwing an instance of 'std::bad_alloc'
    what(): std::bad_alloc

    Failed with 'Aborted'


    What would you advise me to do? Thanks!
    -Ali

  2. You will need more memory, at least 64 GB. If you cannot get that, try setting ovlStoreMemory=16384 under CA_PARAMETERS in the config file and re-running.
