Persistent Link:
http://hdl.handle.net/10150/185624
Title:
A pattern matching system for biosequences.
Author:
Mehldau, Gerhard.
Issue Date:
1991
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
String pattern matching is an extensively studied area of computer science. Over the past few decades, many important theoretical results have been discovered, and a large number of practical algorithms has been developed for efficiently matching various classes of patterns. A variety of general pattern matching tools and specialized programming languages have been implemented for applications in areas such as lexical analysis, text editing, or database searching. Most recently, the field of molecular biology has been added to the growing list of applications that make use of pattern matching technology. The requirements of biological pattern matching differ from traditional applications in several ways. First, the amount of data to be processed is very large, and hence highly efficient pattern matching tools are required. Second, the data to be searched is obtained from biological experiments, where error rates of up to 5% are not uncommon. In addition, patterns are often averaged from several, biologically similar sequences. Therefore, to be useful, pattern matching tools must be able to accommodate some notion of approximate matching. Third, formal language notations such as regular expressions, which are commonly used in traditional applications, are insufficient for describing many of the patterns that are of interest to biologists. Hence, any conventional notation must be significantly enhanced to accommodate such patterns. Taken together, these differences combine to render most existing pattern matching tools inadequate, and have created a need for specialized pattern matching systems. This dissertation presents a pattern matching system that specifically addresses the three issues outlined above. A notation for defining patterns is developed by extending the regular expression syntax in a consistent way. Using this notation, virtually any pattern of interest to biologists can be expressed in an intuitive and concise manner. The system further incorporates a very flexible notion of approximate pattern matching that unifies most of the previously developed concepts. Last, but not least, the system employs a novel, optimized backtracking algorithm, which enables it to efficiently search even very large databases.
Type:
text; Dissertation-Reproduction (electronic)
Keywords:
Molecular biology; Computer science and scientific computing.
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Computer Science; Graduate College
Degree Grantor:
University of Arizona
Advisor:
Myers, Eugene W.

Full metadata record

DC FieldValue Language
dc.language.isoenen_US
dc.titleA pattern matching system for biosequences.en_US
dc.creatorMehldau, Gerhard.en_US
dc.contributor.authorMehldau, Gerhard.en_US
dc.date.issued1991en_US
dc.publisherThe University of Arizona.en_US
dc.rightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.en_US
dc.description.abstractString pattern matching is an extensively studied area of computer science. Over the past few decades, many important theoretical results have been discovered, and a large number of practical algorithms has been developed for efficiently matching various classes of patterns. A variety of general pattern matching tools and specialized programming languages have been implemented for applications in areas such as lexical analysis, text editing, or database searching. Most recently, the field of molecular biology has been added to the growing list of applications that make use of pattern matching technology. The requirements of biological pattern matching differ from traditional applications in several ways. First, the amount of data to be processed is very large, and hence highly efficient pattern matching tools are required. Second, the data to be searched is obtained from biological experiments, where error rates of up to 5% are not uncommon. In addition, patterns are often averaged from several, biologically similar sequences. Therefore, to be useful, pattern matching tools must be able to accommodate some notion of approximate matching. Third, formal language notations such as regular expressions, which are commonly used in traditional applications, are insufficient for describing many of the patterns that are of interest to biologists. Hence, any conventional notation must be significantly enhanced to accommodate such patterns. Taken together, these differences combine to render most existing pattern matching tools inadequate, and have created a need for specialized pattern matching systems. This dissertation presents a pattern matching system that specifically addresses the three issues outlined above. A notation for defining patterns is developed by extending the regular expression syntax in a consistent way. Using this notation, virtually any pattern of interest to biologists can be expressed in an intuitive and concise manner. The system further incorporates a very flexible notion of approximate pattern matching that unifies most of the previously developed concepts. Last, but not least, the system employs a novel, optimized backtracking algorithm, which enables it to efficiently search even very large databases.en_US
dc.typetexten_US
dc.typeDissertation-Reproduction (electronic)en_US
dc.subjectMolecular biologyen_US
dc.subjectComputer science and scientific computing.en_US
thesis.degree.namePh.D.en_US
thesis.degree.leveldoctoralen_US
thesis.degree.disciplineComputer Scienceen_US
thesis.degree.disciplineGraduate Collegeen_US
thesis.degree.grantorUniversity of Arizonaen_US
dc.contributor.advisorMyers, Eugene W.en_US
dc.contributor.committeememberGriswold, Ralph E.en_US
dc.contributor.committeememberHudson, Scott E.en_US
dc.contributor.committeememberSchowengerdt, Robert A.en_US
dc.contributor.committeememberHunt, Bobby R.en_US
dc.identifier.proquest9208024en_US
dc.identifier.oclc702674768en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.