Persistent Link:
http://hdl.handle.net/10150/106030
Title:
Adapting Web Archive Catalogues for Dynamic Change
Author:
Wu, Paul H-J; Ichsan, Tamsir P.; Nguyen, Ngoc Giang
Editors:
Masanes, Julien; Rauber, Andreas
Citation:
Adapting Web Archive Catalogues for Dynamic Change 2007,
Issue Date:
2007
URI:
http://hdl.handle.net/10150/106030
Submitted date:
2008-05-06
Abstract:
Web archives are an important source of information. However, before a Web archive can be properly utilized, it needs to be catalogued. This is to ensure that the accessed materials yield the historical understanding intended by the researcher. At the same time, the dynamic nature of the Web will easily render these catalogues outdated, and there is a constant need to monitor when the Web catalogues become irrelevant upon change of the Web content. This means a substantial amount of human effort is required to maintain the catalogue records for the Web archives, adding additional burden to any institutions that maintain it. In this paper, we propose an automatic mechanism to monitor changes in Web content, so that human workload can be reduced. The system combines two component technologies to make this possible: (1) a contextualized annotation module and (2) an evidence change detection module. Contextualized annotation enables the cataloguing process to link content on the Web page (the evidence), to the value assigned for an element of a metadata schema. Thus, the metadata is “supported” by certain Web content that functions as evidence for a cataloguing decision. Regardless of changes in the webpages outside of the evidence, the metadata remains valid as long as all the evidence remains the same. In order to achieve evidence-specific change detection, we need to extend the traditional Longest Common Subsequence (LCS) based Diff engine using a Page Coordinate translation algorithm, which we argue, through a survey, is the first among many other Web content monitoring approaches.
Type:
Conference Paper
Language:
en
Keywords:
World Wide Web; Information Science; Archives; Knowledge Organization
Local subject classification:
Web archives; Evidence-based cataloguing; Change detection; Web curation
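
The evidence-specific change detection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of their Diff engine or of the Page Coordinate translation algorithm. The sketch assumes a page has already been reduced to a list of tokens and that a piece of evidence is a half-open token range on the old page. It uses Python's `difflib.SequenceMatcher` as a stand-in for an LCS-based diff (its matching heuristic is similar in spirit but is not a strict LCS). The function name `locate_evidence` and the sample tokens are hypothetical.

```python
from difflib import SequenceMatcher


def locate_evidence(old_tokens, new_tokens, span):
    """Return the evidence span translated to new-page coordinates,
    or None if the evidence itself was altered.

    `span` is a hypothetical (start, end) half-open token range on the
    old page that a cataloguer marked as evidence for a metadata value.
    """
    start, end = span
    # autojunk=False disables the popularity heuristic so that every
    # token is eligible for matching.
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # The evidence is intact only if it lies entirely inside one
        # unchanged ("equal") block of the diff.
        if tag == "equal" and i1 <= start and end <= i2:
            offset = j1 - i1  # shift caused by edits before the evidence
            return (start + offset, end + offset)
    return None


# Hypothetical page content; tokens 2..5 are the evidence.
old_page = ["news", "today", "Published", "by", "Example", "Press", "footer"]
evidence = (2, 6)

# Edits outside the evidence: the metadata stays valid, coordinates shift.
edited_around = ["breaking", "news", "this", "week",
                 "Published", "by", "Example", "Press", "new", "footer"]

# Edit inside the evidence: the catalogue record needs human review.
edited_inside = ["news", "today", "Published", "by", "Other", "Press", "footer"]

moved = locate_evidence(old_page, edited_around, evidence)
broken = locate_evidence(old_page, edited_inside, evidence)
```

In this sketch a returned range means the metadata value is still supported and only its location moved, while `None` flags the record for review. Working on tokens rather than rendered page positions is a simplification chosen here for brevity.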

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wu, Paul H-J | en_US
dc.contributor.author | Ichsan, Tamsir P. | en_US
dc.contributor.author | Nguyen, Ngoc Giang | en_US
dc.contributor.editor | Masanes, Julien | en_US
dc.contributor.editor | Rauber, Andreas | en_US
dc.date.accessioned | 2008-05-06T00:00:01Z | -
dc.date.available | 2010-06-18T23:38:28Z | -
dc.date.issued | 2007 | en_US
dc.date.submitted | 2008-05-06 | en_US
dc.identifier.citation | Adapting Web Archive Catalogues for Dynamic Change 2007, | en_US
dc.identifier.uri | http://hdl.handle.net/10150/106030 | -
dc.description.abstract | Web archives are an important source of information. However, before a Web archive can be properly utilized, it needs to be catalogued. This is to ensure that the accessed materials yield the historical understanding intended by the researcher. At the same time, the dynamic nature of the Web will easily render these catalogues outdated, and there is a constant need to monitor when the Web catalogues become irrelevant upon change of the Web content. This means a substantial amount of human effort is required to maintain the catalogue records for the Web archives, adding additional burden to any institutions that maintain it. In this paper, we propose an automatic mechanism to monitor changes in Web content, so that human workload can be reduced. The system combines two component technologies to make this possible: (1) a contextualized annotation module and (2) an evidence change detection module. Contextualized annotation enables the cataloguing process to link content on the Web page (the evidence), to the value assigned for an element of a metadata schema. Thus, the metadata is “supported” by certain Web content that functions as evidence for a cataloguing decision. Regardless of changes in the webpages outside of the evidence, the metadata remains valid as long as all the evidence remains the same. In order to achieve evidence-specific change detection, we need to extend the traditional Longest Common Subsequence (LCS) based Diff engine using a Page Coordinate translation algorithm, which we argue, through a survey, is the first among many other Web content monitoring approaches. | en_US
dc.format.mimetype | application/pdf | en_US
dc.language.iso | en | en_US
dc.subject | World Wide Web | en_US
dc.subject | Information Science | en_US
dc.subject | Archives | en_US
dc.subject | Knowledge Organization | en_US
dc.subject.other | Web archives | en_US
dc.subject.other | Evidence-based cataloguing | en_US
dc.subject.other | Change detection | en_US
dc.subject.other | Web curation | en_US
dc.title | Adapting Web Archive Catalogues for Dynamic Change | en_US
dc.type | Conference Paper | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.