Modeling semantic coherence from corpus data: the fact and the frequency of a co-occurrence
Author
Pekar, Viktor
Affiliation
Bashkir State University
Issue Date
2001
Journal
Coyote Papers
Description
Published as Coyote Papers: Working Papers in Linguistics, Language in Cognitive Science
Additional Links
https://coyotepapers.sbs.arizona.edu/
Abstract
The paper presents a preliminary evaluation of a corpus-based representation of individual words and a method to generalize over these representations. The vector space is represented in a way that gives weight to the fact that words co-occur rather than to the frequency of their co-occurrence. This format is hypothesized to allow for reducing the vector space, minimizing the negative effects of data sparseness, and enhancing the ability of the model to generalize words to novel contexts. The model is assessed by comparing computer-calculated probabilities of different verb-argument combinations with human subjects' judgements about the appropriateness of these combinations. The results indicate that there is a correlation between the probabilities calculated by the model and the subjects' evaluations.
Type
text
Article