Persistent Link:
http://hdl.handle.net/10150/145314
Title:
W7 MODEL OF PROVENANCE AND ITS USE IN THE CONTEXT OF WIKIPEDIA
Author:
Liu, Jun
Issue Date:
2011
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Data provenance refers to the lineage or pedigree of data, including information such as its origin and the key events that affect it over the course of its lifecycle. In recent years, provenance has become increasingly important as more and more people use data that they themselves did not generate. Tracking data provenance helps ensure that data supplied by many different providers and sources can be trusted and used appropriately. Data provenance also has several other critical uses, such as data quality assessment, generating data replication recipes, and data security management.

One of the major objectives of our research is to investigate the semantics, or meaning, of data provenance. We describe a generic ontology of data provenance, called the W7 model, that represents these semantics. Formalized in the conceptual graph formalism, the W7 model represents provenance as a combination of seven interconnected elements: "what," "when," "where," "how," "who," "which," and "why." The W7 model is designed to be general and comprehensive enough to cover a broad range of provenance-related vocabularies. However, no matter how comprehensive it is, the W7 model alone cannot capture all domain-specific provenance requirements. We therefore present a novel approach to developing domain ontologies of provenance. This approach relies on various conceptual graph mechanisms, including schema definitions and canonical formation rules, and enables us to easily adapt and extend the W7 model to develop domain ontologies of provenance. The W7 model has been widely adopted and adapted for use within Raytheon Missile Systems and the iPlant Collaborative, as well as in the US Army's ATRAP IV (Asymmetric Threat Response and Analysis Program) system.

We also developed a domain ontology of provenance for Wikipedia based on the W7 model. This domain ontology enables us to extract the provenance of each Wikipedia article. We present a study that uses this provenance to assess the quality of Wikipedia articles. Assessing and guaranteeing data quality has become a critical concern that, to a large extent, determines the future success and survival of Wikipedia, because its quality has been repeatedly called into question by incidents of vandalism and misinformation since its launch in 2001. Our study shows that the quality of Wikipedia articles depends not only on the different types of contributors but also on how they collaborate. Based on the provenance, we identify a number of contributor roles. Using these roles together with the provenance, our research identifies several collaboration patterns that are beneficial or detrimental to data quality, providing insights for designing tools and mechanisms to improve Wikipedia article quality.
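To make the seven W7 elements more concrete, the following is a minimal Python sketch that assumes a flat record, not the conceptual-graph formalism the dissertation actually uses. The class `W7Provenance`, the function `from_revision`, the revision dictionary keys, the one-line glosses of each element, and the mapping of a Wikipedia edit onto the elements are all hypothetical illustrations. They are not the dissertation's Wikipedia domain ontology.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class W7Provenance:
    """Hypothetical flat encoding of one W7 provenance record.

    The dissertation formalizes W7 with conceptual graphs; this record
    only mirrors the seven element names. The glosses are assumptions.
    """
    what: str   # event that affected the data (e.g. creation, modification)
    when: str   # time of the event
    where: str  # location of the event
    how: str    # action that led to the event
    who: str    # agent(s) responsible for the event
    which: str  # instrument or software used
    why: str    # reason or goal behind the event

    def missing(self) -> list[str]:
        """Return the names of elements that were left empty, in W7 order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def from_revision(rev: dict) -> W7Provenance:
    """Map one Wikipedia revision (hypothetical dict keys) onto W7."""
    return W7Provenance(
        # Assumption: a revision with a parent modifies the article,
        # and one without a parent creates it.
        what="modification" if rev.get("parent_id") else "creation",
        when=rev.get("timestamp", ""),
        where=rev.get("title", ""),    # assumption: article title as location
        how=rev.get("action", "edit"),
        who=rev.get("user", ""),
        which=rev.get("tool", ""),     # e.g. an editing bot; often unknown
        why=rev.get("comment", ""),    # assumption: edit summary as reason
    )


rev = {"parent_id": 41, "timestamp": "2011-03-01T12:00:00Z",
       "title": "Data provenance", "user": "ExampleEditor",
       "comment": "fix citation"}
record = from_revision(rev)
print(record.what, record.who)   # modification ExampleEditor
print(record.missing())          # ['which']
```

The `missing()` helper shows one way a flat encoding could flag incomplete provenance, for example an edit whose tool is unknown. Richer relationships among the elements, such as several agents with different roles, are exactly what a conceptual-graph representation would capture and a flat record cannot.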
Type:
Electronic Dissertation; text
Keywords:
Collaboration; Data provenance; Ontology; Wikipedia
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Management Information Systems
Degree Grantor:
University of Arizona
Advisor:
Ram, Sudha

Full metadata record

DC FieldValue Language
dc.language.isoenen_US
dc.titleW7 MODEL OF PROVENANCE AND ITS USE IN THE CONTEXT OF WIKIPEDIAen_US
dc.creatorLiu, Junen_US
dc.contributor.authorLiu, Junen_US
dc.date.issued2011-
dc.publisherThe University of Arizona.en_US
dc.rightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.en_US
dc.description.abstractData provenance refers to the lineage or pedigree of data, including information such as its origin and key events that affect it over the course of its lifecycle. In recent years, provenance has become increasingly important as more and more people are using data that they themselves did not generate. Tracking data provenance helps ensure that data provided by many different providers and sources can be trusted and used appropriately. Data provenance also has several other critical uses, including data quality assessment, generating data replication recipes, data security management, etc.One of the major objectives of our research is to investigate the semantics or meaning of data provenance. We describe a generic ontology of data provenance called the W7 model that represents the semantics of data provenance. Formalized in the conceptual graph formalism, the W7 model represents provenance as a combination of seven interconnected elements including "what," "when," "where," "how," "who," "which" and "why." The W7 model is designed to be general and comprehensive enough to cover a broad range of provenance-related vocabularies. However, the W7 model alone, no matter how comprehensive it is, is insufficient for capturing all domain-specific provenance requirements. We hence present a novel approach to developing domain ontologies of provenance. This approach relies on various conceptual graph mechanisms, including schema definitions and canonical formation rules, and enables us to easily adapt and extend the W7 model to develop domain ontologies of provenance. The W7 model for data provenance has been widely adopted and adapted for use within Raytheon Missile Systems and the iPlant Collaborative, as well as the US Army's ATRAP IV (Asymmetric Threat Response and Analysis Program) system.We also developed a domain ontology of provenance for Wikipedia based on the W7 model. 
This domain ontology enables us to extract provenance for each Wikipedia article. We present a study in which we use their provenance to assess the quality of Wikipedia articles. Assessing and guaranteeing data quality has become a critical concern that, to a large extent, determines the future success and survival of Wikipedia since the quality of Wikipedia has been continuously called into question due to various incidents of vandalism and misinformation since its launch in 2001. Our study shows that the quality of Wikipedia articles depends not only on the different types of contributors but also on how they collaborate. We identify a number of contributor roles based on the provenance. Based on the roles and provenance, our research identifies several collaboration patterns that are preferable or detrimental for data quality, thus providing insights for designing tools and mechanisms to improve Wikipedia article quality.en_US
dc.typeElectronic Dissertationen_US
dc.typetexten_US
dc.subjectCollaborationen_US
dc.subjectData provenanceen_US
dc.subjectOntologyen_US
dc.subjectWikipediaen_US
thesis.degree.namePh.D.en_US
thesis.degree.leveldoctoralen_US
thesis.degree.disciplineGraduate Collegeen_US
thesis.degree.disciplineManagement Information Systemsen_US
thesis.degree.grantorUniversity of Arizonaen_US
dc.contributor.advisorRam, Sudhaen_US
dc.contributor.committeememberGoes, Pauloen_US
dc.contributor.committeememberDurcikova, Alexandraen_US
dc.identifier.proquest11468-
dc.identifier.oclc752261336-
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.