Persistent Link:
http://hdl.handle.net/10150/595812
Title:
A Powerful Correlation Method for Microbial Co-Occurrence Networks
Author:
Ziebell, Sara E.
Issue Date:
2015
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Motivation: Network interpretation using correlations has several known difficulties. First, the data consist of discrete counts with an excess of zeros, making them non-normal and non-continuous. Second, correlations, often used as similarity measures in network inference, are not causal. Third, there is a masking effect of mutualism on commensalism and of competition on amensalism in ecological networks that interferes with interpretation (Faust and Raes, 2012). More explicitly, the symmetric nature of correlations (cor(X,Y) = cor(Y,X)) can mask the effect of asymmetric ecological relationships (commensalism and amensalism). We aim to solve the third issue, which may speed up targeted drug therapies or disease diagnosis based on specific relationships in gut microbiomes. Methods: We apply a non-symmetric correlation method, the Gini correlation, which should serve as a better classifier of ecological relationships and reveal a fuller picture of microbiomes. First, we create simulated correlated and independent Zero-Inflated Negative Binomial data. Second, we validate Gini correlations by comparing them with Pearson, Spearman, and Kendall correlations, calculating the false positive rate, true positive rate, accuracy, ROC, and AUC after applying the Benjamini-Hochberg (1995) multiple testing correction. Simulation Result: Gini is consistent and outperforms the other methods for small sample sizes of 10 and 25, producing consistently low false positive rates and consistently high accuracy across 64+ simulation settings. When the sample size is increased to 50, Gini performs as well as the other methods. Real Data Result: For well-defined microbial communities, Gini correlations found novel biologically and medically relevant relationships. However, Gini's ability to unmask non-symmetric ecological relationships is yet to be determined.
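Illustration (not part of the thesis record): the abstract relies on the Gini correlation being non-symmetric but does not state its formula. The sketch below assumes the standard rank-based definition, Gamma(X,Y) = cov(X, rank(Y)) / cov(X, rank(X)), with tied values given their average rank. The function names and the toy counts are hypothetical and chosen only to show that Gamma(X,Y) and Gamma(Y,X) can differ. The thesis itself may use a different estimator or tie-handling rule.

```python
import numpy as np


def average_ranks(values):
    """Return ranks 1..n, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    # Highest rank occupied by each distinct value, shifted back to the
    # midpoint of its tie group.
    last_rank = np.cumsum(counts)
    mean_rank = last_rank - (counts - 1) / 2.0
    return mean_rank[inverse]


def gini_correlation(x, y):
    """Assumed definition: cov(x, rank(y)) / cov(x, rank(x)).

    Undefined (division by zero) when x is constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    numerator = np.cov(x, average_ranks(y))[0, 1]
    denominator = np.cov(x, average_ranks(x))[0, 1]
    return numerator / denominator


# Hypothetical zero-heavy count vectors for two taxa across five samples.
x_counts = [0, 0, 1, 2, 10]
y_counts = [0, 1, 1, 3, 4]

gamma_xy = gini_correlation(x_counts, y_counts)
gamma_yx = gini_correlation(y_counts, x_counts)

# Unlike Pearson, Spearman, or Kendall, the two directions need not agree.
print("Gamma(X,Y):", gamma_xy)
print("Gamma(Y,X):", gamma_yx)
```

Because the numerator pairs the raw values of one variable with the ranks of the other, swapping the arguments changes both the numerator and the denominator, which is what lets the measure distinguish direction.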
Type:
text; Electronic Thesis
Keywords:
Correlation; Diabetes; Gini Correlation; Metagenomics; Statistics; Colon Cancer
Degree Name:
M.S.
Degree Level:
masters
Degree Program:
Graduate College; Statistics
Degree Grantor:
University of Arizona
Advisor:
An, Lingling

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en
dc.title | A Powerful Correlation Method for Microbial Co-Occurrence Networks | en_US
dc.creator | Ziebell, Sara E. | en
dc.contributor.author | Ziebell, Sara E. | en
dc.date.issued | 2015 | en
dc.publisher | The University of Arizona. | en
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en
dc.description.abstract | Motivation: Network interpretation using correlations has several known difficulties. First, the data consist of discrete counts with an excess of zeros, making them non-normal and non-continuous. Second, correlations, often used as similarity measures in network inference, are not causal. Third, there is a masking effect of mutualism on commensalism and of competition on amensalism in ecological networks that interferes with interpretation (Faust and Raes, 2012). More explicitly, the symmetric nature of correlations (cor(X,Y) = cor(Y,X)) can mask the effect of asymmetric ecological relationships (commensalism and amensalism). We aim to solve the third issue, which may speed up targeted drug therapies or disease diagnosis based on specific relationships in gut microbiomes. Methods: We apply a non-symmetric correlation method, the Gini correlation, which should serve as a better classifier of ecological relationships and reveal a fuller picture of microbiomes. First, we create simulated correlated and independent Zero-Inflated Negative Binomial data. Second, we validate Gini correlations by comparing them with Pearson, Spearman, and Kendall correlations, calculating the false positive rate, true positive rate, accuracy, ROC, and AUC after applying the Benjamini-Hochberg (1995) multiple testing correction. Simulation Result: Gini is consistent and outperforms the other methods for small sample sizes of 10 and 25, producing consistently low false positive rates and consistently high accuracy across 64+ simulation settings. When the sample size is increased to 50, Gini performs as well as the other methods. Real Data Result: For well-defined microbial communities, Gini correlations found novel biologically and medically relevant relationships. However, Gini's ability to unmask non-symmetric ecological relationships is yet to be determined. | en
dc.type | text | en
dc.type | Electronic Thesis | en
dc.subject | Correlation | en
dc.subject | Diabetes | en
dc.subject | Gini Correlation | en
dc.subject | Metagenomics | en
dc.subject | Statistics | en
dc.subject | Colon Cancer | en
thesis.degree.name | M.S. | en
thesis.degree.level | masters | en
thesis.degree.discipline | Graduate College | en
thesis.degree.discipline | Statistics | en
thesis.degree.grantor | University of Arizona | en
dc.contributor.advisor | An, Lingling | en
dc.contributor.committeemember | An, Lingling | en
dc.contributor.committeemember | Niu, Yue | en
dc.contributor.committeemember | Watkins, Joseph | en