Persistent Link:
http://hdl.handle.net/10150/289091
Title:
Algorithms for whole genome shotgun sequencing
Author:
Anson, Eric Lance
Issue Date:
2000
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
A monumental achievement in the history of science, the sequencing of the entire human genome, will soon be reached. The Human Genome Project (HGP) has been working toward this goal since 1990 using a two-tiered strategy. Recently it was proposed that using a whole-genome shotgun approach to sequence the genome would be faster and less costly. This thesis expands on that proposal by presenting two algorithms that can be used in whole-genome shotgun sequencing. These algorithms were implemented and tested on simulated data. Essential to this approach is the availability of pairs of short, unique sequence markers at a roughly estimated distance from each other. Determining the sequence of the genome can then be broken into a series of inter-marker assembly problems that determine the sequence between a pair of markers. Unfortunately, marker pairs are not always correct and repeats can greatly confound the assembly. This motivates the first problem of rapidly finding a set of linked contigs, called a scaffold, between a pair of markers that confirms the marker pair and the ability to traverse the region between them. Then an inter-marker assembly algorithm that determines the unique sequence segments between a marker pair is presented. Both algorithms are evaluated with respect to a simulation that can model various types of repeats and for which our only information about the presence of repeats is excessive coverage and the ability to detect their boundaries. Simulation results show that at 10x coverage one can find and assemble the unique sequence between markers more than 99.9% of the time for many of the repeat models. Events in this field have been moving rapidly. Recently a new company called Celera Genomics announced its intention to sequence the human genome before the HGP by using the whole-genome shotgun approach. We end this thesis by briefly discussing Celera's approach, and relating it to the algorithms presented here.
Type:
text; Dissertation-Reproduction (electronic)
Keywords:
Biology, Biostatistics.; Computer Science.
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Computer Science
Degree Grantor:
University of Arizona
Advisor:
Myers, Eugene

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en_US
dc.title | Algorithms for whole genome shotgun sequencing | en_US
dc.creator | Anson, Eric Lance | en_US
dc.contributor.author | Anson, Eric Lance | en_US
dc.date.issued | 2000 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | A monumental achievement in the history of science, the sequencing of the entire human genome, will soon be reached. The Human Genome Project (HGP) has been working toward this goal since 1990 using a two-tiered strategy. Recently it was proposed that using a whole-genome shotgun approach to sequence the genome would be faster and less costly. This thesis expands on that proposal by presenting two algorithms that can be used in whole-genome shotgun sequencing. These algorithms were implemented and tested on simulated data. Essential to this approach is the availability of pairs of short, unique sequence markers at a roughly estimated distance from each other. Determining the sequence of the genome can then be broken into a series of inter-marker assembly problems that determine the sequence between a pair of markers. Unfortunately, marker pairs are not always correct and repeats can greatly confound the assembly. This motivates the first problem of rapidly finding a set of linked contigs, called a scaffold, between a pair of markers that confirms the marker pair and the ability to traverse the region between them. Then an inter-marker assembly algorithm that determines the unique sequence segments between a marker pair is presented. Both algorithms are evaluated with respect to a simulation that can model various types of repeats and for which our only information about the presence of repeats is excessive coverage and the ability to detect their boundaries. Simulation results show that at 10x coverage one can find and assemble the unique sequence between markers more than 99.9% of the time for many of the repeat models. Events in this field have been moving rapidly. Recently a new company called Celera Genomics announced its intention to sequence the human genome before the HGP by using the whole-genome shotgun approach. We end this thesis by briefly discussing Celera's approach, and relating it to the algorithms presented here. | en_US
dc.type | text | en_US
dc.type | Dissertation-Reproduction (electronic) | en_US
dc.subject | Biology, Biostatistics. | en_US
dc.subject | Computer Science. | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Computer Science | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Myers, Eugene | en_US
dc.identifier.proquest | 9965855 | en_US
dc.identifier.bibrecord | .b40376515 | en_US