Efficient Exact Tests in Linear Mixed Models for Longitudinal Microbiome Studies

Persistent Link:
http://hdl.handle.net/10150/612412
Title:
Efficient Exact Tests in Linear Mixed Models for Longitudinal Microbiome Studies
Author:
Zhai, Jing
Issue Date:
2016
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo:
Release after 01-Jan-2017
Abstract:
The microbiome plays an important role in human health, and the analysis of association between the microbiome and clinical outcomes has become an active direction in biostatistics research. Testing the microbiome effect on clinical phenotypes directly using operational taxonomic unit (OTU) abundance data is challenging because of the high dimensionality, non-normality, and phylogenetic structure of the data. Most studies focus only on describing the changes in microbial populations that occur in patients with a specific clinical condition. As an alternative, a statistical strategy based on distance-based or similarity-based non-parametric testing, in which a distance or similarity measure is defined between any two microbiome samples, has been developed to assess the association between microbiome composition and outcomes of interest. Despite this improvement, such tests are still not easily interpretable and cannot adjust for potential covariates. A more recent approach, the kernel-based semi-parametric regression framework, evaluates the association while controlling for covariates. The framework uses a kernel function, a measure of similarity between samples' microbiome compositions, to characterize the relationship between the microbiome and the outcome of interest. This kernel-based regression model, however, cannot be applied to longitudinal studies because it does not model the correlation between repeated measurements. We propose microbiome association exact tests (MAETs) in linear mixed models, which can handle longitudinal microbiome data. By introducing additional random effects into the model, MAETs can test not only the overall microbiome effect but also the effect of a specific cluster of OTUs while controlling for the others. Current methods for testing multiple variance components rely on either asymptotic distributions or the parametric bootstrap, which require large sample sizes or incur high computational cost.
The exact (restricted) likelihood ratio test, (R)LRT, a computationally efficient and powerful testing methodology, was derived by Crainiceanu. Because the exact (R)LRT can test only one variance component, we propose an approach that combines recent developments in the exact (R)LRT with a strategy for reducing a linear mixed model with multiple variance components to the single-component case. Monte Carlo simulation studies show that the tests correctly control type I error and provide superior power for testing association between the microbiome and outcomes in longitudinal studies. Finally, MAETs were applied to longitudinal pulmonary microbiome datasets to demonstrate that microbiome composition is associated with lung function and immunological outcomes. We also identified two genera, Prevotella and Veillonella, that are associated with forced vital capacity.
Type:
text; Electronic Thesis
Keywords:
Kernel-based regression; Longitudinal study; Microbiome composition; Multiple variance components; Public Health; Exact tests
Degree Name:
M.S.
Degree Level:
masters
Degree Program:
Graduate College; Public Health
Degree Grantor:
University of Arizona
Advisor:
Zhou, Jin
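
Illustrative sketch (not taken from the thesis): the abstract describes a kernel that measures similarity between samples' microbiome compositions and a linear mixed model with several variance components. The record does not state which kernel or model form the author used, so everything below is an assumption made for illustration: a Bray-Curtis dissimilarity, a Gower-centred kernel K = -1/2 J D^2 J, a subject-level random intercept, and the hypothetical names `bray_curtis`, `distance_to_kernel`, `sigma2_b`, `sigma2_h`, and `sigma2_e`. The toy counts are simulated, not real data.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of an OTU table."""
    counts = np.asarray(counts, dtype=float)
    # Sum of absolute abundance differences over OTUs for every pair of samples
    num = np.abs(counts[:, None, :] - counts[None, :, :]).sum(axis=2)
    # Total abundance of both samples in each pair
    den = (counts[:, None, :] + counts[None, :, :]).sum(axis=2)
    return num / den

def distance_to_kernel(dist):
    """Turn a distance matrix into a similarity kernel by Gower centring.

    K = -1/2 * J D^2 J with J = I - 11'/n; negative eigenvalues are clipped
    to zero so that K is a valid (positive semi-definite) covariance kernel.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    kernel = -0.5 * centering @ (dist ** 2) @ centering
    kernel = (kernel + kernel.T) / 2           # remove floating-point asymmetry
    eigval, eigvec = np.linalg.eigh(kernel)
    eigval = np.clip(eigval, 0.0, None)
    return (eigvec * eigval) @ eigvec.T

# Toy longitudinal design (assumed): 5 subjects, 3 visits each, 8 OTUs
rng = np.random.default_rng(0)
n_subjects, n_visits, n_otus = 5, 3, 8
n_obs = n_subjects * n_visits
otu_counts = rng.poisson(20, size=(n_obs, n_otus))

K = distance_to_kernel(bray_curtis(otu_counts))

# Random-intercept design matrix: row i has a 1 in the column of its subject
subject = np.repeat(np.arange(n_subjects), n_visits)
Z = np.eye(n_subjects)[subject]

# Assumed marginal covariance with three variance components:
#   V = sigma2_b * Z Z' + sigma2_h * K + sigma2_e * I
# Testing for a microbiome effect corresponds to H0: sigma2_h = 0.
sigma2_b, sigma2_h, sigma2_e = 1.0, 0.5, 1.0
V = sigma2_b * Z @ Z.T + sigma2_h * K + sigma2_e * np.eye(n_obs)
```

In this sketch the repeated measurements are correlated through `Z @ Z.T`, which is the piece a cross-sectional kernel regression lacks; the exact test itself is not implemented here.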

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en
dc.title | Efficient Exact Tests in Linear Mixed Models for Longitudinal Microbiome Studies | en_US
dc.creator | Zhai, Jing | en
dc.contributor.author | Zhai, Jing | en
dc.date.issued | 2016 | -
dc.publisher | The University of Arizona. | en
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en
dc.description.release | Release after 01-Jan-2017 | en
dc.type | text | en
dc.type | Electronic Thesis | en
dc.subject | Kernel-based regression | en
dc.subject | Longitudinal study | en
dc.subject | Microbiome composition | en
dc.subject | Multiple variance components | en
dc.subject | Public Health | en
dc.subject | Exact tests | en
thesis.degree.name | M.S. | en
thesis.degree.level | masters | en
thesis.degree.discipline | Graduate College | en
thesis.degree.discipline | Public Health | en
thesis.degree.grantor | University of Arizona | en
dc.contributor.advisor | Zhou, Jin | en
dc.contributor.committeemember | Roe, Denise | en
dc.contributor.committeemember | Hu, Chengcheng | en
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.