Persistent Link:
http://hdl.handle.net/10150/186432
Title:
Discrete pattern matching over sequences and interval sets.
Author:
Knight, James Robert.
Issue Date:
1993
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Finding matches, both exact and approximate, between a sequence of symbols A and a pattern P has long been an active area of research in algorithm design. Some of the more well-known byproducts from that research are the diff program and the grep family of programs. These problems form a sub-domain of a larger area of problems called discrete pattern matching which has been developed recently to characterize the wide range of pattern matching problems. This dissertation presents new algorithms for discrete pattern matching over sequences and develops a new sub-domain of problems called discrete pattern matching over interval sets. The problems and algorithms presented here are characterized by three common features: (1) a "computable scoring function" which defines the quality of matches; (2) a graph-based, dynamic programming framework which captures the structure of the algorithmic solutions; and (3) an interdisciplinary aspect to the research, particularly between computer science and molecular biology, not found in other topics in computer science. The first half of the dissertation considers discrete pattern matching over sequences. It develops the alignment-graph/dynamic-programming framework for the algorithms in the sub-domain and then presents several new algorithms for regular expression and extended regular expression pattern matching. The second half of the dissertation develops the sub-domain of discrete pattern matching over interval sets, also called super-pattern matching. In this sub-domain, the input consists of sets of typed intervals, defined over a finite range, and a pattern expression of the interval types. A match between the interval sets and the pattern consists of a sequence of consecutive intervals, taken from the interval sets, such that their corresponding sequence of types matches the pattern.
The name super-pattern matching comes from those problems where the interval sets correspond to the sets of substrings reported by various pattern matching problems over a common input sequence. The pattern for the super-pattern matching problem, then, represents a "pattern of patterns," or super-pattern, and the sequences of intervals matching the super-pattern correspond to the substrings of the original sequence that match that larger "pattern."
Type:
text; Dissertation-Reproduction (electronic)
Keywords:
Computer science.
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Computer Science; Graduate College
Degree Grantor:
University of Arizona
Committee Chair:
Myers, Eugene W.

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | Discrete pattern matching over sequences and interval sets. | en_US
dc.creator | Knight, James Robert. | en_US
dc.contributor.author | Knight, James Robert. | en_US
dc.date.issued | 1993 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | Finding matches, both exact and approximate, between a sequence of symbols A and a pattern P has long been an active area of research in algorithm design. Some of the more well-known byproducts from that research are the diff program and the grep family of programs. These problems form a sub-domain of a larger area of problems called discrete pattern matching which has been developed recently to characterize the wide range of pattern matching problems. This dissertation presents new algorithms for discrete pattern matching over sequences and develops a new sub-domain of problems called discrete pattern matching over interval sets. The problems and algorithms presented here are characterized by three common features: (1) a "computable scoring function" which defines the quality of matches; (2) a graph-based, dynamic programming framework which captures the structure of the algorithmic solutions; and (3) an interdisciplinary aspect to the research, particularly between computer science and molecular biology, not found in other topics in computer science. The first half of the dissertation considers discrete pattern matching over sequences. It develops the alignment-graph/dynamic-programming framework for the algorithms in the sub-domain and then presents several new algorithms for regular expression and extended regular expression pattern matching. The second half of the dissertation develops the sub-domain of discrete pattern matching over interval sets, also called super-pattern matching. In this sub-domain, the input consists of sets of typed intervals, defined over a finite range, and a pattern expression of the interval types. A match between the interval sets and the pattern consists of a sequence of consecutive intervals, taken from the interval sets, such that their corresponding sequence of types matches the pattern. The name super-pattern matching comes from those problems where the interval sets correspond to the sets of substrings reported by various pattern matching problems over a common input sequence. The pattern for the super-pattern matching problem, then, represents a "pattern of patterns," or super-pattern, and the sequences of intervals matching the super-pattern correspond to the substrings of the original sequence that match that larger "pattern." | en_US
dc.type | text | en_US
dc.type | Dissertation-Reproduction (electronic) | en_US
dc.subject | Computer science. | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Computer Science | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.chair | Myers, Eugene W. | en_US
dc.contributor.committeemember | Downey, Peter J. | en_US
dc.contributor.committeemember | Kannan, Sampath | en_US
dc.identifier.proquest | 9408506 | en_US
dc.identifier.oclc | 702682446 | en_US