Persistent Link:
http://hdl.handle.net/10150/105565
Title:
Searching the long tail: Hidden structure in social tagging
Author:
Tonkin, Emma
Editors:
Furner, Jonathan; Tennis, Joseph T.
Citation:
Searching the long tail: Hidden structure in social tagging 2006, 17
Publisher:
dLIST
Issue Date:
2006
URI:
http://hdl.handle.net/10150/105565
Submitted date:
2007-04-16
Abstract:
This paper explores a method for decomposing compound tags found in social tagging systems and outlines several results, including improved search indexes, extraction of semantic information, and usability benefits. Analysis of tagging habits shows that social tagging systems such as del.icio.us and Flickr include both formal metadata, such as geotags, and informally created metadata, such as annotations and descriptions. The majority of tags represent informal metadata; that is, they are not structured according to a formal model, nor do they correspond to a formal ontology. Statistical exploration of the main tag corpus shows that searches use only a subset of the available tags; for example, many tags are composed as ad hoc compounds of terms. To improve the accuracy of searching across the data contained within these tags, compounds must be decomposed in a way that gives a high degree of confidence in the result. An approach to decomposing English-language compounds, designed for use within a small initial sample tagset, is described. Possible decompositions are identified from a generous wordlist, subject to selective lexicon snipping. To identify the most likely decomposition, a Bayesian classifier is used across term elements. To compensate for the limited sample set, a word classifier is employed and the results are classified using a similar method, resulting in a successful classification rate of 88% and a false negative rate of only 1%.
Type:
Conference Paper
Language:
en
Keywords:
Classification; World Wide Web; Web Metrics; Quantitative Research; Knowledge Structures; Knowledge Organization
Local subject classification:
Social tagging; Automatic classification; Tag analysis
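
Illustrative sketch (not part of the original record): the abstract describes identifying possible decompositions of a compound tag from a wordlist and then choosing the most likely one. The Python below is a minimal sketch of that general idea only, not the paper's method. It swaps in a simple add-one-smoothed unigram score where the paper uses a Bayesian classifier, and the names `segmentations`, `score`, `best_split`, `LEXICON`, and `COUNTS`, along with the toy word counts, are all assumptions made for illustration.

```python
from math import log

# Hypothetical toy wordlist and term frequencies (not taken from the paper).
LEXICON = {"web", "design", "de", "sign"}
COUNTS = {"web": 50, "design": 40, "de": 2, "sign": 5}


def segmentations(tag, lexicon, max_parts=4):
    """Return every split of `tag` into at most `max_parts` lexicon words."""
    results = []

    def walk(start, parts):
        if start == len(tag):
            results.append(list(parts))
            return
        if len(parts) == max_parts:
            return
        for end in range(start + 1, len(tag) + 1):
            word = tag[start:end]
            if word in lexicon:
                parts.append(word)
                walk(end, parts)
                parts.pop()

    walk(0, [])
    return results


def score(parts, counts, vocab_size):
    """Sum of log add-one-smoothed unigram probabilities of the parts."""
    total = sum(counts.values())
    return sum(log((counts.get(w, 0) + 1) / (total + vocab_size)) for w in parts)


def best_split(tag, lexicon, counts):
    """Pick the highest-scoring candidate split, or None if there is none."""
    candidates = segmentations(tag.lower(), lexicon)
    if not candidates:
        return None
    return max(candidates, key=lambda parts: score(parts, counts, len(lexicon)))


print(best_split("webdesign", LEXICON, COUNTS))
```

Because every term contributes a negative log probability, this scorer favours splits with fewer, more frequent words. A real system would need a much larger wordlist and a properly trained model.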

Full metadata record

DC Field | Value | Language
dc.contributor.author | Tonkin, Emma | en_US
dc.contributor.editor | Furner, Jonathan | en_US
dc.contributor.editor | Tennis, Joseph T. | en_US
dc.date.accessioned | 2007-04-16T00:00:01Z | -
dc.date.available | 2010-06-18T23:27:30Z | -
dc.date.issued | 2006 | en_US
dc.date.submitted | 2007-04-16 | en_US
dc.identifier.citation | Searching the long tail: Hidden structure in social tagging 2006, 17 | en_US
dc.identifier.uri | http://hdl.handle.net/10150/105565 | -
dc.description.abstract | (identical to the Abstract above) | en_US
dc.format.mimetype | application/pdf | en_US
dc.language.iso | en | en_US
dc.publisher | dLIST | en_US
dc.subject | Classification | en_US
dc.subject | World Wide Web | en_US
dc.subject | Web Metrics | en_US
dc.subject | Quantitative Research | en_US
dc.subject | Knowledge Structures | en_US
dc.subject | Knowledge Organization | en_US
dc.subject.other | Social tagging | en_US
dc.subject.other | Automatic classification | en_US
dc.subject.other | Tag analysis | en_US
dc.title | Searching the long tail: Hidden structure in social tagging | en_US
dc.type | Conference Paper | en_US