Data allocation and query optimization in large scale distributed databases

Persistent Link:
http://hdl.handle.net/10150/282189
Title:
Data allocation and query optimization in large scale distributed databases
Author:
Zhou, Zehai, 1962-
Issue Date:
1996
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Distributed database technology is expected to have a significant impact on data processing in the upcoming years because distributed database systems have many potential advantages over centralized systems for geographically distributed organizations. Data allocation and query optimization are two of the most important aspects of distributed database design. Data allocation involves placing a database and the applications that run against it at the multiple sites of a network. It is a very complex problem consisting of two processes: data fragmentation and fragment allocation. Data fragmentation involves the partitioning of each relation into a group of fragment relations, while fragment allocation deals with the distribution of these fragment relations across the sites of the distributed system. Query optimization includes designing algorithms that analyze and convert queries into a set of data manipulation operations. Both the data allocation and query optimization problems are NP-hard and notoriously difficult to solve. We have attempted to combine the two highly interrelated and interactive decision processes in data allocation by formulating them as integer programs, taking into consideration different constraints and under various assumptions. Various solution methods are discussed and a new linearization method is investigated. We next analyze the query optimization problem and reduce it to a join ordering problem. Several heuristics and a genetic algorithm have been developed for solving the join ordering problem. Computational experiments on these algorithms were conducted and the solution qualities were compared. The computational experiments show that the suggested linearization method performs clearly and consistently better than a currently widely used method, and that heuristics and genetic algorithms are viable methods for solving the query optimization problem.
It is anticipated that the models and solution methods developed in this study for data allocation and query optimization in distributed database systems may be of practical as well as theoretical use. Nevertheless, much more needs to be done to solve the distributed database design problems in order to achieve the technology's potential benefits. Our models and solution methods can be the starting point for eventual resolution of these complex problems in large scale distributed database systems.
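Illustrative note (not part of the dissertation): the abstract reduces query optimization to a join ordering problem and reports that heuristics are viable for it. The sketch below shows what one simple heuristic of that general kind can look like. Everything in it is an assumption made for illustration: the cost model (sum of intermediate result sizes of a left-deep plan), the function names plan_cost, greedy_order and exhaustive_order, and the sample cardinalities and selectivities. It is not the author's formulation or any of the heuristics evaluated in the study.

```python
from itertools import permutations


def plan_cost(order, card, sel):
    """Sum of intermediate result sizes for a left-deep join order.

    card maps relation name -> row count (assumed known).
    sel maps frozenset({r1, r2}) -> join selectivity; missing pairs
    are treated as cross products (selectivity 1.0).
    """
    joined = [order[0]]
    size = float(card[order[0]])
    cost = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            factor *= sel.get(frozenset((prev, rel)), 1.0)
        size = size * card[rel] * factor  # estimated size after this join
        cost += size
        joined.append(rel)
    return cost


def greedy_order(card, sel):
    """Greedy heuristic: start from the smallest relation, then always
    add the relation that yields the smallest next intermediate result."""
    remaining = set(card)
    first = min(remaining, key=lambda r: (card[r], r))
    order = [first]
    remaining.remove(first)
    while remaining:
        # The cost of the prefix is the same for every candidate, so
        # minimizing the total cost minimizes the next intermediate size.
        nxt = min(remaining, key=lambda r: (plan_cost(order + [r], card, sel), r))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def exhaustive_order(card, sel):
    """Brute-force baseline; only feasible for a handful of relations."""
    return min(permutations(sorted(card)), key=lambda o: plan_cost(o, card, sel))


# Hypothetical three-relation query: A joins B, and B joins C.
card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}

greedy = greedy_order(card, sel)
best = exhaustive_order(card, sel)
print(greedy, plan_cost(greedy, card, sel))
print(list(best), plan_cost(best, card, sel))
```

The brute-force baseline enumerates every permutation, which grows factorially with the number of relations. That growth is the reason heuristic and genetic approaches are of interest for queries with many joins. A greedy choice like this one is not guaranteed to match the optimum in general.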
Type:
text; Dissertation-Reproduction (electronic)
Keywords:
Business Administration, Management.; Computer Science.
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Industrial Management
Degree Grantor:
University of Arizona
Advisor:
Sheng, Olivia R. Liu

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en_US
dc.title | Data allocation and query optimization in large scale distributed databases | en_US
dc.creator | Zhou, Zehai, 1962- | en_US
dc.contributor.author | Zhou, Zehai, 1962- | en_US
dc.date.issued | 1996 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.type | text | en_US
dc.type | Dissertation-Reproduction (electronic) | en_US
dc.subject | Business Administration, Management. | en_US
dc.subject | Computer Science. | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Industrial Management | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Sheng, Olivia R. Liu | en_US
dc.identifier.proquest | 9713439 | en_US
dc.identifier.bibrecord | .b34452898 | en_US