Combining schema and instance information for integrating heterogeneous databases: An analytical approach and empirical evaluation

Persistent Link:
http://hdl.handle.net/10150/280014
Title:
Combining schema and instance information for integrating heterogeneous databases: An analytical approach and empirical evaluation
Author:
Zhao, Huimin
Issue Date:
2002
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Critical to semantic integration of heterogeneous data sources, determining the semantic correspondences among the data sources is a very complex and resource-consuming task and demands automated support. In this dissertation, we propose a comprehensive approach to detecting both schema-level and instance-level semantic correspondences from heterogeneous data sources. Semantic correspondences on the two levels are identified alternately and incrementally in an iterative procedure. Statistical cluster analysis methods and the Self-Organizing Map (SOM) neural network method are used first to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level correspondences, classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks are then used to identify matching tuples. Multiple classifiers are combined in various ways, such as bagging, boosting, concatenating, and stacking, to improve classification accuracy. Statistical analysis techniques, such as correlation and regression, are then applied to a preliminary integrated data set to evaluate the relationships among schema elements more accurately. Improved schema-level correspondences are fed back into the identification of instance-level correspondences, resulting in a loop in the overall procedure. Empirical evaluation using real-world and simulated data that has been performed is described to demonstrate the utility of the proposed multi-level, multi-technique approach to detecting semantic correspondences from heterogeneous data sources.
Type:
text; Dissertation-Reproduction (electronic)
Keywords:
Business Administration, Management.; Information Science.
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Business Administration
Degree Grantor:
University of Arizona
Advisor:
Ram, Sudha
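
The instance-level step in the abstract (classifying tuple pairs as matching or not, with several classifiers combined by bagging) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: the string-similarity features, the one-feature threshold classifiers ("stumps"), the function names, and the toy training data.

```python
import random
from difflib import SequenceMatcher


def similarity_features(rec_a, rec_b, attr_pairs):
    """Return one similarity score in [0, 1] per pair of corresponding attributes."""
    return [
        SequenceMatcher(None, str(rec_a[x]).lower(), str(rec_b[y]).lower()).ratio()
        for x, y in attr_pairs
    ]


def train_stump(samples):
    """Fit a one-feature threshold rule: predict a match when feature >= threshold."""
    best = None
    for f in range(len(samples[0][0])):
        for t in sorted({feats[f] for feats, _ in samples}):
            errors = sum((feats[f] >= t) != label for feats, label in samples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def train_bagged(samples, n_models=15, seed=0):
    """Bagging: train each stump on a bootstrap resample of the labeled pairs."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_models)
    ]


def predict_match(models, feats):
    """Majority vote over the bagged stumps."""
    votes = sum(feats[f] >= t for f, t in models)
    return 2 * votes > len(models)


# Hypothetical labeled tuple pairs: (similarity features, is_match).
training = [
    ([0.90, 0.80], True), ([0.95, 0.90], True), ([0.85, 0.95], True),
    ([0.20, 0.30], False), ([0.10, 0.40], False), ([0.30, 0.10], False),
]
models = train_bagged(training)
```

The attribute pairs passed to `similarity_features` stand in for the schema-level correspondences found in the earlier step; in the iterative procedure the abstract describes, those pairs would be revised and the classifiers retrained on each pass.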

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en_US
dc.title | Combining schema and instance information for integrating heterogeneous databases: An analytical approach and empirical evaluation | en_US
dc.creator | Zhao, Huimin | en_US
dc.contributor.author | Zhao, Huimin | en_US
dc.date.issued | 2002 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | Critical to semantic integration of heterogeneous data sources, determining the semantic correspondences among the data sources is a very complex and resource-consuming task and demands automated support. In this dissertation, we propose a comprehensive approach to detecting both schema-level and instance-level semantic correspondences from heterogeneous data sources. Semantic correspondences on the two levels are identified alternately and incrementally in an iterative procedure. Statistical cluster analysis methods and the Self-Organizing Map (SOM) neural network method are used first to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level correspondences, classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks are then used to identify matching tuples. Multiple classifiers are combined in various ways, such as bagging, boosting, concatenating, and stacking, to improve classification accuracy. Statistical analysis techniques, such as correlation and regression, are then applied to a preliminary integrated data set to evaluate the relationships among schema elements more accurately. Improved schema-level correspondences are fed back into the identification of instance-level correspondences, resulting in a loop in the overall procedure. Empirical evaluation using real-world and simulated data that has been performed is described to demonstrate the utility of the proposed multi-level, multi-technique approach to detecting semantic correspondences from heterogeneous data sources. | en_US
dc.type | text | en_US
dc.type | Dissertation-Reproduction (electronic) | en_US
dc.subject | Business Administration, Management. | en_US
dc.subject | Information Science. | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Business Administration | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Ram, Sudha | en_US
dc.identifier.proquest | 3053879 | en_US
dc.identifier.bibrecord | .b42812471 | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.