Persistent Link:
http://hdl.handle.net/10150/566996
Title:
Statistical Discovery of Biomarkers in Metagenomics
Author:
Abdul Wahab, Ahmad Hakeem
Issue Date:
2015
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo:
Release after 29-Jan-2016
Abstract:
Metagenomics holds immense potential for uncovering previously undiscovered relationships within microbial communities, particularly because the field circumvents the need to isolate and culture microbes outside their natural environments. A common research objective is to detect biomarkers: microbes that are associated with a change in status. For instance, finding such microbes that differ between healthy and diseased groups allows researchers to identify pathogens and probiotics. This is often achieved through differential abundance analysis. The problem is that differential abundance analysis examines each microbe individually, without considering possible associations among microbes. This is a limitation, since microbes rarely act alone; they function within intricate communities of other microbes. An alternative is to use variable selection techniques such as the Lasso or the Elastic Net, which consider all microbes simultaneously while performing selection. However, the Lasso often selects only a single representative from a cluster of correlated features, and the Elastic Net may select unimportant features too frequently and erratically because of the high sparsity and variability of the data.

In this thesis, the proposed method, AdaLassop, is an augmented variable selection technique that overcomes these shortcomings of the Lasso and the Elastic Net. It provides researchers with a holistic model that accounts for the effect of each selected biomarker in the presence of other important biomarkers. AdaLassop performs variable selection on sparse, ultra-high-dimensional data using the Adaptive Lasso, with p-values extracted from zero-inflated negative binomial (ZINB) regressions serving as augmented weights. Comprehensive simulations with varying correlation structures indicate that AdaLassop performs optimally in the presence of multicollinearity, especially as sample size grows. Applying AdaLassop to a metagenome-wide study of diabetic patients reveals both pathogens and probiotics that have previously been studied in the medical literature.
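The weighted-penalty idea behind AdaLassop can be illustrated with a small Adaptive Lasso in which each microbe's L1 penalty is scaled by a p-value. This is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a squared-error loss on a numeric response and sets the weight of microbe j to its p-value (w_j = p_j). It also takes the p-values as given, whereas AdaLassop extracts them from zero-inflated negative binomial regressions. The function names (`soft_threshold`, `centre`, `weighted_lasso`, `select_biomarkers`), the p-values and the toy data are hypothetical. The sketch minimises (1/2n)·||y - Xβ||² + λ·Σ_j w_j·|β_j| by cyclic coordinate descent.

```python
"""Sketch: Adaptive Lasso with p-values as per-feature penalty weights.

ASSUMPTIONS (illustrative, not taken from the thesis):
  * squared-error loss on a numeric response (a logistic loss would suit a
    binary healthy/diseased status better);
  * the penalty weight of microbe j is its p-value, w_j = p_j;
  * p-values are passed in directly instead of being fitted with ZINB models.
"""
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def centre(X: Sequence[Sequence[float]],
           y: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Subtract column means from X and the mean from y (no intercept needed)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc


def weighted_lasso(X, y, weights, lam, n_iter=1000, tol=1e-12) -> List[float]:
    """Cyclic coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|,
    where X and y are already centred."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: keep its coefficient at zero
            # Correlation of column j with the partial residual excluding b_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def select_biomarkers(X, y, p_values, lam) -> Tuple[List[int], List[float]]:
    """Return (indices of microbes with non-zero coefficients, coefficients)."""
    Xc, yc = centre(X, y)
    weights = list(p_values)  # ASSUMPTION: smaller p-value -> lighter penalty
    beta = weighted_lasso(Xc, yc, weights, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return selected, beta


if __name__ == "__main__":
    # Made-up toy data: 4 samples x 3 microbes; y = 2*x0 + 0.3*x1.
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [2.3, -1.7, 1.7, -2.3]
    print(select_biomarkers(X, y, [1.0, 1.0, 1.0], lam=0.5)[0])     # -> [0]
    print(select_biomarkers(X, y, [0.001, 0.02, 0.9], lam=0.5)[0])  # -> [0, 1]
```

With equal weights (an ordinary Lasso), the weakly associated second microbe is shrunk to zero. The small hypothetical p-value assigned to it lightens its penalty enough for it to survive. Here λ is fixed by hand; in practice it would be tuned, for example by cross-validation.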
Type:
text; Electronic Thesis
Keywords:
Adaptive Lasso; Biomarker; Metagenomics; Variable Selection; Statistics; Adaptive Elastic Net
Degree Name:
M.S.
Degree Level:
masters
Degree Program:
Graduate College; Statistics
Degree Grantor:
University of Arizona
Advisor:
An, Lingling

Committee Members:
Hao, Ning; Hurwitz, Bonnie
Language:
en_US