Tuesday, December 13, 2016

Comments are welcome

I welcome any comments on the posts.  You can use comments to report bugs in new versions or to ask questions about new features.  For now, all comments are moderated: I have to review and approve each comment before it appears on the public blog.  This way I can make sure I have read every comment.


  1. Dear Aleksey,

    I am testing MaSuRCA. So far I have managed to run it using two single fastq files per library, but I already have many fastq files for each library. Is there any way to point to a set of files instead of having to concatenate all of them into a single one?

    Sorry if it sounds naive. I have tried to run it this way and it seems to have problems recognizing the wildcard:

    PE= p4 343 67 /scratch/devel/denovo_assemblies/tests/masurca2/reads/lib400/*.1.fastq.gz /scratch/devel/denovo_assemblies/tests/masurca2/reads/lib400/*.2.fastq.gz
    PE= p7 620 116 /scratch/devel/denovo_assemblies/tests/masurca2/reads/lib700/*.1.fastq.gz /scratch/devel/denovo_assemblies/tests/masurca2/reads/lib700/*.2.fastq.gz
    JUMP= j4 2750 836 /scratch/devel/denovo_assemblies/tests/masurca2/reads/mp4000/*.1.fastq.gz /scratch/devel/denovo_assemblies/tests/masurca2/reads/mp4000/*.2.fastq.gz
    JUMP= j8 6286 1947 /scratch/devel/denovo_assemblies/tests/masurca2/reads/mp8000/*.1.fastq.gz /scratch/devel/denovo_assemblies/tests/masurca2/reads/mp8000/*.2.fastq.gz


    When I include the lines above in config.txt, masurca fails to produce assemble.sh and the error says:

    invalid forward file for PE library 'p4': '/scratch/devel/denovo_assemblies/tests/masurca2/reads/lib400/*.1.fastq.gz' Bad file descriptor

    What do you normally do when having hundreds of files per library?

    Thanks in advance,

  2. No, unfortunately wildcards are not supported. You can either generate a separate JUMP or PE entry for each pair of files, or concatenate the files.
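The first option in the reply above can be scripted. The following is a minimal sketch, not part of MaSuRCA: the function names are hypothetical, the file suffixes, means and standard deviations are taken from the question, and it assumes that every generated entry needs its own unique two-letter prefix (check the MaSuRCA documentation before relying on that).

```python
import glob
import itertools
import os
import string


def pair_files(directory, fwd_suffix=".1.fastq.gz", rev_suffix=".2.fastq.gz"):
    """Return sorted (forward, reverse) path pairs found in `directory`."""
    pairs = []
    for fwd in sorted(glob.glob(os.path.join(directory, "*" + fwd_suffix))):
        rev = fwd[: -len(fwd_suffix)] + rev_suffix
        if not os.path.exists(rev):
            raise FileNotFoundError(f"no mate file for {fwd}: expected {rev}")
        pairs.append((fwd, rev))
    return pairs


def prefixes():
    """Yield two-letter prefixes aa, ab, ..., zz (assumed naming scheme)."""
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        yield a + b


def config_lines(kind, mean, stdev, pairs, prefix_iter):
    """Build one 'PE=' or 'JUMP=' line per file pair, each with a fresh prefix."""
    return [
        f"{kind}= {next(prefix_iter)} {mean} {stdev} {fwd} {rev}"
        for fwd, rev in pairs
    ]


if __name__ == "__main__":
    base = "/scratch/devel/denovo_assemblies/tests/masurca2/reads"
    names = prefixes()  # shared across libraries so prefixes never collide
    for kind, subdir, mean, stdev in [
        ("PE", "lib400", 343, 67),
        ("PE", "lib700", 620, 116),
        ("JUMP", "mp4000", 2750, 836),
        ("JUMP", "mp8000", 6286, 1947),
    ]:
        pairs = pair_files(os.path.join(base, subdir))
        print("\n".join(config_lines(kind, mean, stdev, pairs, names)))
```

The printed lines can be pasted into config.txt in place of the wildcard entries. Sharing one prefix iterator across all libraries is a design choice that keeps the generated prefixes distinct.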

  3. Hi,

    I was running MaSuRCA v3.2.1 using two PE and two MP libraries. The program stopped after two days while preprocessing the first PE library. I found this error in the quorum.err file:

    [2017/01/11 01:40:44] Loading mer database
    [2017/01/11 01:44:17] Loading contaminant sequences
    [2017/01/11 01:44:17] Computing Poisson cutoff
    [2017/01/11 01:47:26] distinct mers:20379890205 total mers:174169779288 estimated coverage:8.54616
    [2017/01/11 01:47:26] lambda:0.0284872 collision_prob:0.00333333 poisson_threshold:0.0001
    [2017/01/11 01:47:26] Using cutoff of 4
    [2017/01/11 01:47:26] Correcting reads
    terminate called after throwing an instance of 'std::ios_base::failure'
    what(): basic_ios::clear

    Do you know what could be the reason and how to fix it?

    In case it is relevant, I was using gcc/4.9.3, perl/5.22.1 and boost/1.60.0 together with MaSuRCA.

    Thanks in advance,

  4. This error is in the error corrector. Never seen this one before. Can you see if the corrected reads file pe.cor.fa looks reasonable up to that point?
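A quick way to inspect the partial output is sketched below. It assumes pe.cor.fa is plain FASTA (suggested by the extension, not confirmed in this thread), and the checks it performs (record count, empty records, characters outside ACGTN, last header written) are only one guess at what "reasonable" might mean. The function names are hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(path, alphabet="ACGTN"):
    """Count records, bases, empty records and records with unexpected characters."""
    allowed = set(alphabet)
    summary = {"records": 0, "bases": 0, "empty": 0,
               "unexpected": 0, "last_header": None}
    for header, seq in read_fasta(path):
        summary["records"] += 1
        summary["bases"] += len(seq)
        if not seq:
            summary["empty"] += 1
        elif set(seq.upper()) - allowed:
            summary["unexpected"] += 1
        summary["last_header"] = header  # last read written before the crash
    return summary


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(summarize_fasta(sys.argv[1]))
```

The last header shows roughly where in the input the corrector stopped, which may help locate a malformed read near that position.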

  5. Zimin,

    I see that v3.2.2 is faster with Illumina-only data. When I include PacBio data it gets very slow at the "Running assembly" and "Recomputing A-stat" steps. I understand that MaSuRCA corrects the input PacBio reads. How can it be made to run faster?

  6. v3.2.2 is faster with Illumina-only assemblies. I see that it gets very slow when PacBio data is included. It takes a long time at "Running assembly" and "recomputing A-stat for super-reads". How can this be made faster?

    1. The step "Recomputing A-stat for super-reads" only runs with low-coverage (10x or less) PacBio/Nanopore data sets. It runs if the long-read coverage after correction is less than 5x. In this case the assembler also uses the super-reads along with the corrected PacBio reads, and it has to recompute coverage statistics. In general, you should expect Illumina+PacBio assemblies to take longer because more computation is involved.
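To estimate in advance which side of the 5x threshold a data set falls on, the usual arithmetic is coverage = total bases / genome size. The sketch below shows only that arithmetic. How MaSuRCA itself measures post-correction coverage is not stated here, the function names are hypothetical, and the example numbers are made up for illustration.

```python
def coverage(total_bases, genome_size):
    """Fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def below_threshold(corrected_bases, genome_size, threshold=5.0):
    """True if corrected long-read coverage is under the threshold (5x per the reply)."""
    return coverage(corrected_bases, genome_size) < threshold


if __name__ == "__main__":
    # Hypothetical example: 12 Gb of corrected long reads on a 3 Gb genome.
    print(coverage(12e9, 3e9))         # 4.0
    print(below_threshold(12e9, 3e9))  # True
```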

  7. Hello Alexey,

    I'm trying to use MaSuRCA for a hybrid assembly of Nanopore and Illumina data. We have about 15-20x of Nanopore so far. A couple of questions:

    - what is the desired coverage of Illumina reads? I am curious about the "most bang for your buck" argument.
    - is it OK to combine Illumina reads from different instruments, i.e. of different lengths - some are 2x90 and some are 2x150 bp?

  8. Hi,
    I'm using MaSuRCA to assemble a large genome (3.0 Gb) with PE Illumina reads. The genome is also quite repetitive, about 51%.
    The assembly appears to have stalled during the step "Computing super reads from PE". It has been at this stage for more than a day, and for nearly 24 hours no files have been created or modified. All of the processes are asleep (S state).
    Do you have any explanation for this?

  9. I have never seen this before. Can you contact me by email so we can diagnose it?