Persistent Link:
http://hdl.handle.net/10150/194840
Title:
Optimal Alignment of Multiple Sequence Alignments
Author:
Starrett, Dean
Issue Date:
2008
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
An essential tool in biology is the alignment of multiple sequences. Biologists use multiple sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, and finding motifs. Constructing high-quality multiple alignments is computationally hard, both in theory and in practice, and is typically done using heuristic methods. The majority of state-of-the-art multiple alignment programs employ a form and polish strategy, where in the construction phase, an initial multiple alignment is formed by progressively merging smaller alignments, starting with single sequences. Then in a local-search phase, the resulting alignment is polished by repeatedly splitting it into smaller alignments and re-merging. This merging of alignments, the basic computational problem in the construction and local-search phases of the best multiple alignment heuristics, is called the Aligning Alignments Problem. Under the sum-of-pairs objective for scoring multiple alignments, this problem may seem to be a simple extension of two-sequence alignment. It is proven here, however, that with affine gap costs (which are recognized as necessary to get biologically-informative alignments) the problem is NP-complete when gaps are counted exactly. Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment. Unlike general multiple alignment however, we show that Aligning Alignments with affine gap costs and exact counts is tractable in practice, by demonstrating an effective algorithm and a fast implementation. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments on biological data show instances derived from standard benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show the time and space both scale well.
Type:
text; Electronic Dissertation
Keywords:
Algorithms; Algorithm analysis; Sequence Analysis; Combinatorial Optimization; Computational Complexity; Computational Biology
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Computer Science; Graduate College
Degree Grantor:
University of Arizona
Advisor:
Kececioglu, John D.
Committee Chair:
Kececioglu, John D.

Full metadata record

DC FieldValue Language
dc.language.isoenen_US
dc.titleOptimal Alignment of Multiple Sequence Alignmentsen_US
dc.creatorStarrett, Deanen_US
dc.contributor.authorStarrett, Deanen_US
dc.date.issued2008en_US
dc.publisherThe University of Arizona.en_US
dc.rightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.en_US
dc.description.abstractAn essential tool in biology is the alignment of multiple sequences. Biologists use multiple sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, and finding motifs. Constructing high-quality multiple alignments is computationally hard, both in theory and in practice, and is typically done using heuristic methods. The majority of state-of-the-art multiple alignment programs employ a form and polish strategy, where in the construction phase, an initial multiple alignment is formed by progressively merging smaller alignments, starting with single sequences. Then in a local-search phase, the resulting alignment is polished by repeatedly splitting it into smaller alignments and re-merging. This merging of alignments, the basic computational problem in the construction and local-search phases of the best multiple alignment heuristics, is called the Aligning Alignments Problem. Under the sum-of-pairs objective for scoring multiple alignments, this problem may seem to be a simple extension of two-sequence alignment. It is proven here, however, that with affine gap costs (which are recognized as necessary to get biologically-informative alignments) the problem is NP-complete when gaps are counted exactly. Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment. Unlike general multiple alignment however, we show that Aligning Alignments with affine gap costs and exact counts is tractable in practice, by demonstrating an effective algorithm and a fast implementation. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments on biological data show instances derived from standard benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show the time and space both scale well.en_US
dc.typetexten_US
dc.typeElectronic Dissertationen_US
dc.subjectAlgorithmsen_US
dc.subjectAlgorithm analysisen_US
dc.subjectSequence Analysisen_US
dc.subjectCombinatorial Optimizationen_US
dc.subjectComputational Complexityen_US
dc.subjectComputational Biologyen_US
thesis.degree.namePh.D.en_US
thesis.degree.leveldoctoralen_US
thesis.degree.disciplineComputer Scienceen_US
thesis.degree.disciplineGraduate Collegeen_US
thesis.degree.grantorUniversity of Arizonaen_US
dc.contributor.advisorKececioglu, John D.en_US
dc.contributor.chairKececioglu, John D.en_US
dc.contributor.committeememberEfrat, Alonen_US
dc.contributor.committeememberAndrews, Gregory R.en_US
dc.contributor.committeememberDowney, Peter J.en_US
dc.identifier.proquest2905en_US
dc.identifier.oclc659749554en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.