Multi-Allele Population Genomics for Inference of Demography and Natural Selection

Persistent Link:
http://hdl.handle.net/10150/622993
Title:
Multi-Allele Population Genomics for Inference of Demography and Natural Selection
Author:
Ragsdale, Aaron
Issue Date:
2016
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
The demographic and evolutionary history of a population leaves an identifiable signature on patterns of genetic variation, so we can learn about demography and natural selection through inference on contemporary polymorphism data. The distribution of sample allele frequencies, known as the allele frequency spectrum (AFS), is an informative statistic that has been used to infer single- and multi-population demographic histories and distributions of fitness effects of new mutations. AFS-based methods typically rely on the infinite sites model, in which loci are assumed to evolve independently and mutations always arise at a previously unmutated site. However, many loci are seen to violate these assumptions. Most obviously, loci occupy a physical space on the genome, and neighboring mutations will have correlated allele frequencies. Additionally, some SNPs are found to be multi-allelic, with more than two alleles simultaneously segregating. The assumptions of the infinite sites model force one to ignore or exclude such loci, but these loci are rich in information not captured by standard AFS approaches. With this in mind, I developed a numerical approach for solving a class of multi-allelic diffusion equations that allow for novel inferences on genomic sequence data. First, I considered selection at triallelic nonsynonymous data to infer the correlation of fitness effects for same-site mutations. I then explored the increase in power afforded to demographic inferences by two-locus allele frequency statistics, in which two biallelic loci are separated by a known recombination distance so the joint distribution of allele frequencies and linkage disequilibrium may be modeled by a diffusion approximation. Finally, I considered the same two-locus diffusion model but with selection placed on one of the two loci. This allows for the direct modeling of the effects of linked selection on neutral variants, and for potential inference applications such as the parameters of a selective sweep or the distribution of fitness effects.
Type:
text; Electronic Dissertation
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Applied Mathematics
Degree Grantor:
University of Arizona
Advisor:
Gutenkunst, Ryan

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en
dc.title | Multi-Allele Population Genomics for Inference of Demography and Natural Selection | en_US
dc.creator | Ragsdale, Aaron | en
dc.contributor.author | Ragsdale, Aaron | en
dc.date.issued | 2016 | -
dc.publisher | The University of Arizona. | en
dc.type | text | en
dc.type | Electronic Dissertation | en
thesis.degree.name | Ph.D. | en
thesis.degree.level | doctoral | en
thesis.degree.discipline | Graduate College | en
thesis.degree.discipline | Applied Mathematics | en
thesis.degree.grantor | University of Arizona | en
dc.contributor.advisor | Gutenkunst, Ryan | en
dc.contributor.committeemember | Gutenkunst, Ryan | en
dc.contributor.committeemember | Brio, Moysey | en
dc.contributor.committeemember | Masel, Joanna | en
dc.contributor.committeemember | Watkins, Joe | en
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.