The University of Arizona Campus Repository
>
UA Graduate and Undergraduate Research
>
UA Theses and Dissertations
>
Dissertations
>

# Exact and approximation algorithms for DNA sequence reconstruction.

- Persistent Link:
- http://hdl.handle.net/10150/185673
- Title:
- Exact and approximation algorithms for DNA sequence reconstruction.
- Author:
- Issue Date:
- 1991
- Publisher:
- Rights:
- Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
- Abstract:
- The DNA sequence in every human being is a text of three billion characters from a four letter alphabet; determining this sequence is a major project in molecular biology. The fundamental task biologists face is to reconstruct a long sequence given short fragments from unknown locations. These fragments contain errors, and may represent the sequence on one strand of the double-helix, or the reverse complement sequence on the other strand. The Sequence Reconstruction Problem is, given a collection F of fragment sequences and an error rate 0 ≤ ε < 1, find a shortest sequence S such that every fragment F ∈ F, or its reverse complement, matches a substring of S with at most ε|F| errors. Sequence Reconstruction is NP-complete. We decompose the problem into (1) constructing a graph of approximate overlaps between pairs of fragments, (2) selecting a set of overlaps of maximum total weight that induce a consistent layout of the fragments, (3) merging the overlaps into a multiple sequence alignment and voting on a consensus. A solution to (1) through (3) yields a reconstructed sequence feasible at error rate 2ε/(1-ε) and at most a factor 1/1-ε longer than the shortest reconstruction, given some assumptions on fragment error. We define a measure of the overlap in a reconstruction, show that maximizing the overlap minimizes the length, and that approximating (2) within a factor of α approximates Sequence Reconstruction within a factor of (1- ε)α under the overlap measure. We construct the overlap graph for (1) in O(εN²) time given fragments of total length N at error rate ε. We develop two exact and two approximation algorithms for (2). Our best exact algorithm computes an optimal layout for a graph of E overlaps and V fragments in O(K(E + V log V)) time, where K ≤ 2ᴱ is the size of the branch-and-bound search tree. Our best approximation algorithm computes a layout with overlap at least 1/2 the maximum in O(V(E + V log V)log V) time. This is the first treatment of Sequence Reconstruction with inexact data and unknown complementarity.
- Type:
- text; Dissertation-Reproduction (electronic)
- Keywords:
- Degree Name:
- Ph.D.
- Degree Level:
- doctoral
- Degree Program:
- Degree Grantor:
- University of Arizona
- Advisor:

# Full metadata record

DC Field | Value | Language |
---|---|---|

dc.language.iso | en | en_US |

dc.title | Exact and approximation algorithms for DNA sequence reconstruction. | en_US |

dc.creator | Kececioglu, John Dimitri. | en_US |

dc.contributor.author | Kececioglu, John Dimitri. | en_US |

dc.date.issued | 1991 | en_US |

dc.publisher | The University of Arizona. | en_US |

dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US |

dc.description.abstract | The DNA sequence in every human being is a text of three billion characters from a four letter alphabet; determining this sequence is a major project in molecular biology. The fundamental task biologists face is to reconstruct a long sequence given short fragments from unknown locations. These fragments contain errors, and may represent the sequence on one strand of the double-helix, or the reverse complement sequence on the other strand. The Sequence Reconstruction Problem is, given a collection F of fragment sequences and an error rate 0 ≤ ε < 1, find a shortest sequence S such that every fragment F ∈ F, or its reverse complement, matches a substring of S with at most ε|F| errors. Sequence Reconstruction is NP-complete. We decompose the problem into (1) constructing a graph of approximate overlaps between pairs of fragments, (2) selecting a set of overlaps of maximum total weight that induce a consistent layout of the fragments, (3) merging the overlaps into a multiple sequence alignment and voting on a consensus. A solution to (1) through (3) yields a reconstructed sequence feasible at error rate 2ε/(1-ε) and at most a factor 1/1-ε longer than the shortest reconstruction, given some assumptions on fragment error. We define a measure of the overlap in a reconstruction, show that maximizing the overlap minimizes the length, and that approximating (2) within a factor of α approximates Sequence Reconstruction within a factor of (1- ε)α under the overlap measure. We construct the overlap graph for (1) in O(εN²) time given fragments of total length N at error rate ε. We develop two exact and two approximation algorithms for (2). Our best exact algorithm computes an optimal layout for a graph of E overlaps and V fragments in O(K(E + V log V)) time, where K ≤ 2ᴱ is the size of the branch-and-bound search tree. Our best approximation algorithm computes a layout with overlap at least 1/2 the maximum in O(V(E + V log V)log V) time. This is the first treatment of Sequence Reconstruction with inexact data and unknown complementarity. | en_US |

dc.type | text | en_US |

dc.type | Dissertation-Reproduction (electronic) | en_US |

dc.subject | DNA -- Computer programs. | en_US |

thesis.degree.name | Ph.D. | en_US |

thesis.degree.level | doctoral | en_US |

thesis.degree.discipline | Computer Science | en_US |

thesis.degree.discipline | Graduate College | en_US |

thesis.degree.grantor | University of Arizona | en_US |

dc.contributor.advisor | Myers, Eugene W. | en_US |

dc.contributor.committeemember | Downey, Peter | en_US |

dc.contributor.committeemember | Manber, Udi | en_US |

dc.identifier.proquest | 9210281 | en_US |

dc.identifier.oclc | 704395410 | en_US |

All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.