Opera: Reconstructing optimal genomic scaffolds with high-throughput paired-end sequences

Song Gao, Wing Kin Sung, Niranjan Nagarajan

    Research output: Contribution to journal › Article › peer-review

    147 Citations (Scopus)

    Abstract

    Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explore the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems, and we demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds, demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes. Availability: http://sourceforge.net/projects/operasf/
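
The abstract does not give the quadratic program itself, so the following is only an illustrative sketch of how gap sizing can be posed as a quadratic objective, not Opera's actual formulation. It assumes the contigs are already ordered, that each paired-end link supplies an estimated distance (with a standard deviation) between the end of one contig and the start of a later one, and that no constraints are imposed on the gaps, which reduces the problem to weighted least squares. The function name `estimate_gaps` and the link tuple layout are hypothetical.

```python
def estimate_gaps(contig_lengths, links):
    """Estimate gap sizes between consecutive, already-ordered contigs.

    contig_lengths: lengths of the contigs in scaffold order.
    links: tuples (i, j, distance, std_dev) with i < j, where `distance`
           is the library-derived estimate of the distance from the end
           of contig i to the start of contig j (hypothetical layout).

    Minimizes sum(((implied - distance) / std_dev) ** 2), an unconstrained
    quadratic objective, by solving its normal equations.
    """
    n = len(contig_lengths) - 1  # one gap between each adjacent pair
    M = [[0.0] * n for _ in range(n)]
    v = [0.0] * n
    for i, j, dist, sd in links:
        w = 1.0 / (sd * sd)
        # Remove the contigs lying strictly between i and j, so that the
        # remainder must equal the sum of gaps i .. j-1.
        target = dist - sum(contig_lengths[i + 1:j])
        for a in range(i, j):
            v[a] += w * target
            for c in range(i, j):
                M[a][c] += w

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("gap sizes are not uniquely determined by the links")
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]

    # Back substitution.
    gaps = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = v[r] - sum(M[r][c] * gaps[c] for c in range(r + 1, n))
        gaps[r] = s / M[r][r]
    return gaps


# Three contigs; the links are mutually consistent with gaps of 100 and 200.
links = [(0, 1, 100, 10), (1, 2, 200, 10), (0, 2, 800, 10)]
print(estimate_gaps([1000, 500, 800], links))
```

When links disagree, the weights 1/std_dev² let tighter libraries pull the estimate more strongly. A fuller treatment would likely add bounds on the gaps, which turns this into a constrained quadratic program and calls for a dedicated solver.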

    Original language: English
    Pages (from-to): 1681-1691
    Number of pages: 11
    Journal: Journal of Computational Biology
    Volume: 18
    Issue number: 11
    Publication status: Published or Issued - 1 Nov 2011

    Keywords

    • genome assembly
    • parametric complexity
    • quadratic programming
    • scaffolding

    ASJC Scopus subject areas

    • Modelling and Simulation
    • Molecular Biology
    • Genetics
    • Computational Mathematics
    • Computational Theory and Mathematics
