Scaffold filling, contig fusion and comparative gene order inference

Persistent Link:
http://hdl.handle.net/10150/610198
Title:
Scaffold filling, contig fusion and comparative gene order inference
Author:
Munoz, Adriana; Zheng, Chunfang; Zhu, Qian; Albert, Victor; Rounsley, Steve; Sankoff, David
Affiliation:
School of Information Technology and Engineering, University of Ottawa, Ottawa, K1N 6N5, Canada; Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, H3C 3J7, Canada; Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA; School of Plant Sciences and BIO5 Institute, University of Arizona, Tucson, AZ 85719, USA; Department of Mathematics and Statistics, University of Ottawa, Ottawa, K1N 6N5, Canada
Issue Date:
2010
Publisher:
BioMed Central
Citation:
Muñoz et al. BMC Bioinformatics 2010, 11:304 http://www.biomedcentral.com/1471-2105/11/304
Journal:
BMC Bioinformatics
Rights:
© 2010 Muñoz et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0)
Collection Information:
This item is part of the UA Faculty Publications collection. For more information about this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu.
Abstract:
BACKGROUND: There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes?
RESULTS: Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.
CONCLUSIONS: The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.
EISSN:
1471-2105
DOI:
10.1186/1471-2105-11-304
Version:
Final published version
Additional Links:
http://www.biomedcentral.com/1471-2105/11/304

Full metadata record

DC Field | Value | Language
dc.contributor.author | Munoz, Adriana | en
dc.contributor.author | Zheng, Chunfang | en
dc.contributor.author | Zhu, Qian | en
dc.contributor.author | Albert, Victor | en
dc.contributor.author | Rounsley, Steve | en
dc.contributor.author | Sankoff, David | en
dc.date.accessioned | 2016-05-20T09:00:52Z | -
dc.date.available | 2016-05-20T09:00:52Z | -
dc.date.issued | 2010 | en
dc.identifier.citation | Muñoz et al. BMC Bioinformatics 2010, 11:304 http://www.biomedcentral.com/1471-2105/11/304 | en
dc.identifier.doi | 10.1186/1471-2105-11-304 | en
dc.identifier.uri | http://hdl.handle.net/10150/610198 | -
dc.description.abstract | BACKGROUND: There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes? RESULTS: Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera. CONCLUSIONS: The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method. | en
dc.language.iso | en | en
dc.publisher | BioMed Central | en
dc.relation.url | http://www.biomedcentral.com/1471-2105/11/304 | en
dc.rights | © 2010 Muñoz et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0) | en
dc.title | Scaffold filling, contig fusion and comparative gene order inference | en
dc.type | Article | en
dc.identifier.eissn | 1471-2105 | en
dc.contributor.department | School of Information Technology and Engineering, University of Ottawa, Ottawa, K1N 6N5, Canada | en
dc.contributor.department | Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, H3C 3J7, Canada | en
dc.contributor.department | Department of Computer Science, Princeton University, Princeton, NJ 08544, USA | en
dc.contributor.department | Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA | en
dc.contributor.department | School of Plant Sciences and BIO5 Institute, University of Arizona, Tucson, AZ 85719, USA | en
dc.contributor.department | Department of Mathematics and Statistics, University of Ottawa, Ottawa, K1N 6N5, Canada | en
dc.identifier.journal | BMC Bioinformatics | en
dc.description.collectioninformation | This item is part of the UA Faculty Publications collection. For more information about this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu. | en
dc.eprint.version | Final published version | en