Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony

Persistent Link:
http://hdl.handle.net/10150/301751
Title:
Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony
Author:
Shi, Tao
Issue Date:
2013
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo:
Release after 02-Feb-2014
Abstract:
Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with frequent duplication history. Their utility and performance in the context of gene families with complex histories of gene duplication and domain reshuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes encompassing a wide range of duplication rates and a broad diversity of domain architecture. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and the lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rate and tree construction accuracy, in which intermediate rates have the highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications among those gene families. We close with some discussion of potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment.
Type:
text; Electronic Thesis
Keywords:
Gene duplication; Gene tree; Species tree; Ecology & Evolutionary Biology; Domain architecture
Degree Name:
M.S.
Degree Level:
masters
Degree Program:
Graduate College; Ecology & Evolutionary Biology
Degree Grantor:
University of Arizona
Advisor:
Sanderson, Michael J.
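
Illustrative sketch (not part of the thesis record): the abstract describes gene tree parsimony as seeking the species tree that minimizes the cost of gene duplication. The Python below shows one common way that cost can be computed, by mapping each gene tree node to the smallest species tree clade containing its species (LCA mapping) and counting a duplication wherever a node maps to the same clade as one of its children. It assumes rooted trees written as nested tuples with species names at the leaves and a small, explicitly supplied list of candidate species trees. The function names are hypothetical, and the thesis may have used different software, cost models, and tree search.

```python
def species_clades(species_tree):
    """Collect every clade of the species tree as a frozenset of species names."""
    clades = []

    def walk(node):
        if isinstance(node, str):          # leaf: a single species
            clade = frozenset([node])
        else:                              # internal node: union of child clades
            clade = frozenset().union(*(walk(child) for child in node))
        clades.append(clade)
        return clade

    walk(species_tree)
    return clades


def count_duplications(gene_tree, species_tree):
    """Count duplications implied by LCA reconciliation of one gene tree."""
    clades = species_clades(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return frozenset([node])
        child_maps = [walk(child) for child in node]
        covered = frozenset().union(*child_maps)
        # LCA mapping: the smallest species clade containing all species below.
        here = min((c for c in clades if covered <= c), key=len)
        # A node mapped to the same clade as a child is inferred as a duplication.
        if any(m == here for m in child_maps):
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


def best_species_tree(gene_trees, candidates):
    """Return the candidate species tree with the lowest total duplication count."""
    return min(candidates,
               key=lambda s: sum(count_duplications(g, s) for g in gene_trees))


if __name__ == "__main__":
    species = (("A", "B"), "C")
    paralogs = ((("A", "B"), "C"), ("A", "C"))   # two gene copies of A and C
    print(count_duplications(paralogs, species))
```

Exhaustively scoring candidates as in `best_species_tree` is only feasible for very small numbers of species; practical gene tree parsimony analyses rely on heuristic search over species trees.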

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony | en_US
dc.creator | Shi, Tao | en_US
dc.contributor.author | Shi, Tao | en_US
dc.date.issued | 2013 | -
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.release | Release after 02-Feb-2014 | en_US
dc.description.abstract | Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with frequent duplication history. Their utility and performance in the context of gene families with complex histories of gene duplication and domain reshuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes encompassing a wide range of duplication rates and a broad diversity of domain architecture. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and the lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rate and tree construction accuracy, in which intermediate rates have the highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications among those gene families. We close with some discussion of potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment. | en_US
dc.type | text | en_US
dc.type | Electronic Thesis | en_US
dc.subject | Gene duplication | en_US
dc.subject | Gene tree | en_US
dc.subject | Species tree | en_US
dc.subject | Ecology & Evolutionary Biology | en_US
dc.subject | Domain architecture | en_US
thesis.degree.name | M.S. | en_US
thesis.degree.level | masters | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Ecology & Evolutionary Biology | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Sanderson, Michael J. | en_US
dc.contributor.committeemember | Sanderson, Michael J. | en_US
dc.contributor.committeemember | Tax, Frans | en_US
dc.contributor.committeemember | Worobey, Michael | en_US