Novel Computational and Statistical Approaches in Metagenomic Studies

Persistent Link:
http://hdl.handle.net/10150/556866
Title:
Novel Computational and Statistical Approaches in Metagenomic Studies
Author:
Sohn, Michael B.
Issue Date:
2015
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Metagenomics has great potential to reveal previously unattainable information about microbial communities. The simplest yet extremely powerful approach for studying the characteristics of a microbial community is the analysis of differential abundance, which tries to identify differentially abundant features (e.g., species or genes) across different communities. For instance, detecting differentially abundant microbes across healthy and diseased groups can enable us to identify potential pathogens or probiotics. However, the analysis of differential abundance can mislead us about the characteristics of microbial communities if the counts or abundances of features on different scales are not properly normalized within and between communities. An important prerequisite for the analysis of differential abundance is an accurate estimate of the composition of microbial communities, commonly known as the analysis of taxonomic composition. Most prevalent approaches rely solely on the results of an alignment tool such as BLAST, which limits their estimation accuracy to high ranks of the taxonomy tree. In this study, two novel methods are developed: one for the analysis of taxonomic composition, called Taxonomic Analysis by Elimination and Correction (TAEC), and the other for the analysis of differential abundance, called Ratio Approach for Identifying Differential Abundance (RAIDA). TAEC utilizes the alignment similarity between known genomes in addition to the similarity between query sequences and sequences of known genomes. It is comprehensively tested on various simulated datasets with bacterial structures of diverse complexity. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in estimating the abundance of bacteria in a given microbial sample.
RAIDA utilizes an invariant property of the ratio between the abundances of features: the ratio between the relative abundances of two features is the same as the ratio between their absolute abundances. In comprehensive simulation studies, RAIDA performs consistently well, and in some situations it greatly surpasses other existing methods for the analysis of differential abundance in metagenomic studies.
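The ratio invariance described in the abstract can be checked with a small worked example. The sketch below is only an illustration of that arithmetic property, not the RAIDA method itself; the taxon names, the counts, and the helper functions `relative_abundance` and `ratios_to_reference` are all hypothetical and do not come from the dissertation.

```python
from fractions import Fraction


def relative_abundance(counts):
    """Convert absolute counts to exact relative abundances (fractions of the total)."""
    total = sum(counts.values())
    return {feature: Fraction(count, total) for feature, count in counts.items()}


def ratios_to_reference(abundance, reference):
    """Ratio of every other feature's abundance to the reference feature's abundance."""
    ref_value = Fraction(abundance[reference])
    return {
        feature: Fraction(value) / ref_value
        for feature, value in abundance.items()
        if feature != reference
    }


# Hypothetical absolute abundances: only taxon_3 differs between the two samples.
sample_a = {"taxon_1": 100, "taxon_2": 200, "taxon_3": 700}
sample_b = {"taxon_1": 100, "taxon_2": 200, "taxon_3": 2800}

rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)

# Ratios computed from absolute and from relative abundances agree within each sample.
abs_ratios_a = ratios_to_reference(sample_a, "taxon_1")
rel_ratios_a = ratios_to_reference(rel_a, "taxon_1")
abs_ratios_b = ratios_to_reference(sample_b, "taxon_1")
rel_ratios_b = ratios_to_reference(rel_b, "taxon_1")

print(rel_a["taxon_1"], rel_b["taxon_1"])  # relative abundance of taxon_1 shifts
print(rel_ratios_a, rel_ratios_b)          # taxon_2 : taxon_1 ratio does not
```

In this toy data the relative abundance of `taxon_1` drops between the samples even though its absolute count is unchanged, which is the kind of misleading signal the abstract warns about. The ratio of `taxon_2` to `taxon_1` stays the same whether it is computed from absolute or relative abundances.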
Type:
text; Electronic Dissertation
Keywords:
Statistics
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Statistics
Degree Grantor:
University of Arizona
Advisor:
An, Lingling

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en
dc.title | Novel Computational and Statistical Approaches in Metagenomic Studies | en_US
dc.creator | Sohn, Michael B. | en
dc.contributor.author | Sohn, Michael B. | en
dc.date.issued | 2015 | en
dc.publisher | The University of Arizona. | en
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en
dc.type | text | en
dc.type | Electronic Dissertation | en
dc.subject | Statistics | en
thesis.degree.name | Ph.D. | en
thesis.degree.level | doctoral | en
thesis.degree.discipline | Graduate College | en
thesis.degree.discipline | Statistics | en
thesis.degree.grantor | University of Arizona | en
dc.contributor.advisor | An, Lingling | en
dc.contributor.committeemember | An, Lingling | en
dc.contributor.committeemember | Watkins, Joseph C. | en
dc.contributor.committeemember | Zhang, Hao Helen | en
dc.contributor.committeemember | Billheimer, Dean | en
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.