Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level

Persistent Link:
http://hdl.handle.net/10150/618985
Title:
Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level
Author:
Jeng, Xinge Jessie; Daye, Zhongyin John; Lu, Wenbin; Tzeng, Jung-Ying
Affiliation:
Univ Arizona, Epidemiol & Biostat
Issue Date:
2016-06-29
Publisher:
Public Library of Science
Citation:
Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level 2016, 12 (6):e1004993 PLOS Computational Biology
Journal:
PLOS Computational Biology
Rights:
© 2016 Jeng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Collection Information:
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract:
Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene level instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of the molecular mechanisms and functions of genetic factors in disease. Due to the extreme rarity of mutations and the high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as Bonferroni and false discovery rate (FDR) control, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure, which can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dismissed as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and the quality of significance rankings. Extensive simulation studies across a wide range of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas Bonferroni and FDR control are exceedingly over-conservative for rare-variant association studies. In the analyses of the CoLaus dataset, the AFNC identified the individual variants most responsible for gene-level significance. Moreover, single-variant results from the AFNC have been successfully applied to infer related genes using annotation information.
Note:
Open Access Journal
ISSN:
1553-7358
PubMed ID:
27355347
DOI:
10.1371/journal.pcbi.1004993
Version:
Final published version
Sponsors:
National Institutes of Health [P01 CA142538]
Additional Links:
http://dx.plos.org/10.1371/journal.pcbi.1004993

Full metadata record

DC Field | Value | Language
dc.contributor.author | Jeng, Xinge Jessie | en
dc.contributor.author | Daye, Zhongyin John | en
dc.contributor.author | Lu, Wenbin | en
dc.contributor.author | Tzeng, Jung-Ying | en
dc.date.accessioned | 2016-08-27T01:08:29Z | -
dc.date.available | 2016-08-27T01:08:29Z | -
dc.date.issued | 2016-06-29 | -
dc.identifier.citation | Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level 2016, 12 (6):e1004993 PLOS Computational Biology | en
dc.identifier.issn | 1553-7358 | -
dc.identifier.pmid | 27355347 | -
dc.identifier.doi | 10.1371/journal.pcbi.1004993 | -
dc.identifier.uri | http://hdl.handle.net/10150/618985 | -
dc.description.sponsorship | National Institutes of Health [P01 CA142538] | en
dc.language.iso | en | en
dc.publisher | Public Library of Science | en
dc.relation.url | http://dx.plos.org/10.1371/journal.pcbi.1004993 | en
dc.rights | © 2016 Jeng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. | en
dc.title | Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level | en
dc.type | Article | en
dc.contributor.department | Univ Arizona, Epidemiol & Biostat | en
dc.identifier.journal | PLOS Computational Biology | en
dc.description.note | Open Access Journal | en
dc.description.collectioninformation | This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. | en
dc.eprint.version | Final published version | en
