Phylogenomics with incomplete taxon coverage: the limits to inference

Persistent Link:
http://hdl.handle.net/10150/610378
Title:
Phylogenomics with incomplete taxon coverage: the limits to inference
Author:
Sanderson, Michael; McMahon, Michelle; Steel, Mike
Affiliation:
Department of Ecology and Evolutionary Biology, University of Arizona, Tucson AZ 85721 USA; Department of Plant Sciences, University of Arizona, Tucson AZ 85721 USA; Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
Issue Date:
2010
Publisher:
BioMed Central
Citation:
Sanderson et al. BMC Evolutionary Biology 2010, 10:155 http://www.biomedcentral.com/1471-2148/10/155
Journal:
BMC Evolutionary Biology
Rights:
© 2010 Sanderson et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0)
Collection Information:
This item is part of the UA Faculty Publications collection. For more information about this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu.
Abstract:
BACKGROUND: Phylogenomic studies based on multi-locus sequence data sets are usually characterized by partial taxon coverage, in which sequences for some loci are missing for some taxa. The impact of missing data has been widely studied in phylogenetics, but it has proven difficult to distinguish effects due to error in tree reconstruction from effects due to missing data per se. We approach this problem using an explicitly phylogenomic criterion of success, decisiveness, which refers to whether the pattern of taxon coverage allows for uniquely defining a single tree for all taxa.
RESULTS: We establish theoretical bounds on the impact of missing data on decisiveness. Results are derived for two contexts: a fixed taxon coverage pattern, such as that observed from an already assembled data set, and a randomly generated pattern derived from a process of sampling new data, such as might be observed in an ongoing comparative genomics sequencing project. Lower bounds on how many loci are needed for decisiveness are derived for the former case, and both lower and upper bounds for the latter. When data are not decisive for all trees, we estimate the probability of decisiveness and the chances that a given edge in the tree will be distinguishable. Theoretical results are illustrated using several empirical examples constructed by mining sequence databases, genomic libraries such as ESTs and BACs, and complete genome sequences.
CONCLUSION: Partial taxon coverage among loci can limit phylogenomic inference by making it impossible to distinguish among multiple alternative trees. However, even though lack of decisiveness is typical of many sparse phylogenomic data sets, it is often still possible to distinguish a large fraction of edges in the tree.
EISSN:
1471-2148
DOI:
10.1186/1471-2148-10-155
Version:
Final published version
Additional Links:
http://www.biomedcentral.com/1471-2148/10/155

Full metadata record

DC Field | Value | Language
dc.contributor.author | Sanderson, Michael | en
dc.contributor.author | McMahon, Michelle | en
dc.contributor.author | Steel, Mike | en
dc.date.accessioned | 2016-05-20T09:05:28Z | -
dc.date.available | 2016-05-20T09:05:28Z | -
dc.date.issued | 2010 | en
dc.identifier.citation | Sanderson et al. BMC Evolutionary Biology 2010, 10:155 http://www.biomedcentral.com/1471-2148/10/155 | en
dc.identifier.doi | 10.1186/1471-2148-10-155 | en
dc.identifier.uri | http://hdl.handle.net/10150/610378 | -
dc.description.abstract | BACKGROUND: Phylogenomic studies based on multi-locus sequence data sets are usually characterized by partial taxon coverage, in which sequences for some loci are missing for some taxa. The impact of missing data has been widely studied in phylogenetics, but it has proven difficult to distinguish effects due to error in tree reconstruction from effects due to missing data per se. We approach this problem using an explicitly phylogenomic criterion of success, decisiveness, which refers to whether the pattern of taxon coverage allows for uniquely defining a single tree for all taxa. RESULTS: We establish theoretical bounds on the impact of missing data on decisiveness. Results are derived for two contexts: a fixed taxon coverage pattern, such as that observed from an already assembled data set, and a randomly generated pattern derived from a process of sampling new data, such as might be observed in an ongoing comparative genomics sequencing project. Lower bounds on how many loci are needed for decisiveness are derived for the former case, and both lower and upper bounds for the latter. When data are not decisive for all trees, we estimate the probability of decisiveness and the chances that a given edge in the tree will be distinguishable. Theoretical results are illustrated using several empirical examples constructed by mining sequence databases, genomic libraries such as ESTs and BACs, and complete genome sequences. CONCLUSION: Partial taxon coverage among loci can limit phylogenomic inference by making it impossible to distinguish among multiple alternative trees. However, even though lack of decisiveness is typical of many sparse phylogenomic data sets, it is often still possible to distinguish a large fraction of edges in the tree. | en
dc.language.iso | en | en
dc.publisher | BioMed Central | en
dc.relation.url | http://www.biomedcentral.com/1471-2148/10/155 | en
dc.rights | © 2010 Sanderson et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0) | en
dc.title | Phylogenomics with incomplete taxon coverage: the limits to inference | en
dc.type | Article | en
dc.identifier.eissn | 1471-2148 | en
dc.contributor.department | Department of Ecology and Evolutionary Biology, University of Arizona, Tucson AZ 85721 USA | en
dc.contributor.department | Department of Plant Sciences, University of Arizona, Tucson AZ 85721 USA | en
dc.contributor.department | Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand | en
dc.identifier.journal | BMC Evolutionary Biology | en
dc.description.collectioninformation | This item is part of the UA Faculty Publications collection. For more information about this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu. | en
dc.eprint.version | Final published version | en