Automatic identification of causal relations in text and their use for improving precision in information retrieval

Persistent Link:
http://hdl.handle.net/10150/105106
Title:
Automatic identification of causal relations in text and their use for improving precision in information retrieval
Author:
Khoo, Christopher S. G.
Citation:
Automatic identification of causal relations in text and their use for improving precision in information retrieval, 1995-12
Issue Date:
Dec-1995
Description:
Parts of the thesis were published in:
1. Khoo, C., Myaeng, S.H., & Oddy, R. (2001). Using cause-effect relations in text to improve information retrieval precision. Information Processing and Management, 37(1), 119-145.
2. Khoo, C., Kornfilt, J., Oddy, R., & Myaeng, S.H. (1998). Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing. Literary & Linguistic Computing, 13(4), 177-186.
3. Khoo, C. (1997). The use of relation matching in information retrieval. LIBRES: Library and Information Science Research Electronic Journal [Online], 7(2). Available at: http://aztec.lib.utk.edu/libres/libre7n2/.
An update of the literature review on causal relations in text was published in: Khoo, C., Chan, S., & Niu, Y. (2002). The many facets of the cause-effect relation. In R. Green, C.A. Bean & S.H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective (pp. 51-70). Dordrecht: Kluwer.
URI:
http://hdl.handle.net/10150/105106
Submitted date:
2006-08-08
Abstract:
This study represents one attempt to make use of relations expressed in text to improve information retrieval effectiveness. In particular, the study investigated whether matching the causal relations expressed in documents with the causal relations expressed in users' queries could improve document retrieval results, compared with term matching alone. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. The method uses linguistic clues to identify causal relations without recourse to knowledge-based inferencing. The method identified and extracted about 68% of the causal relations that were clearly expressed within a sentence or between adjacent sentences in Wall Street Journal text. Of the instances that the computer program identified as causal relations, 72% can be considered correct. The automatic method was used in an experimental information retrieval system to identify causal relations in a database of full-text Wall Street Journal documents. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from different types of matching were customized for each query, as in an SDI or routing query situation. The best results were obtained when causal relation matching was combined with word proximity matching (matching pairs of causally related words in the query with pairs of words that co-occur within document sentences). An analysis using manually identified causal relations indicates that bigger retrieval improvements can be expected with more accurate identification of causal relations. The best kind of causal relation matching was found to be one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match any term.
The study also investigated whether using Roget's International Thesaurus (3rd ed.) to expand query terms with synonymous and related terms would improve retrieval effectiveness. Using Roget category codes in addition to keywords did give better retrieval results. However, the Roget codes were better at identifying the non-relevant documents than the relevant ones.
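The wildcard form of causal relation matching described in the abstract, combined with keyword matching through per-query weights, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `Causal`, `relation_matches`, `keyword_score`, `causal_score`, and `combined_score`, the scoring formulas, and the default weights are all hypothetical choices made for illustration only.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Causal(NamedTuple):
    """A cause-effect pair; None in a slot acts as a wildcard (illustrative only)."""
    cause: Optional[str]
    effect: Optional[str]


def relation_matches(query_rel: Causal, doc_rel: Causal) -> bool:
    """True if every non-wildcard slot of the query relation equals the document's slot."""
    return all(q is None or q == d for q, d in zip(query_rel, doc_rel))


def keyword_score(query_terms: Set[str], doc_terms: Set[str]) -> float:
    """Fraction of query terms that occur in the document (a stand-in for term matching)."""
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def causal_score(query_rels: Iterable[Causal], doc_rels: Iterable[Causal]) -> float:
    """Fraction of query relations matched by at least one document relation."""
    query_rels = list(query_rels)
    doc_rels = list(doc_rels)
    if not query_rels:
        return 0.0
    hits = sum(any(relation_matches(q, d) for d in doc_rels) for q in query_rels)
    return hits / len(query_rels)


def combined_score(query_terms, query_rels, doc_terms, doc_rels,
                   w_term: float = 1.0, w_causal: float = 0.5) -> float:
    """Weighted sum of the two match types; the weights are assumed to be tuned per query."""
    return (w_term * keyword_score(query_terms, doc_terms)
            + w_causal * causal_score(query_rels, doc_rels))
```

With a query relation such as `Causal("inflation", None)`, a document stating that inflation causes something scores higher than a document that only contains the same keywords, which mirrors the idea that relation matching supplements rather than replaces term matching.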
Type:
Thesis
Language:
en
Keywords:
Information Extraction; Information Retrieval; Natural Language Processing; Computational Linguistics; Linguistics
Local subject classification:
cause-effect relations; causal verbs; semantic relations; information retrieval; newspaper text; information extraction

Full metadata record

DC Field | Value | Language
dc.contributor.author | Khoo, Christopher S. G. | en_US
dc.date.accessioned | 2006-08-08T00:00:01Z | -
dc.date.available | 2010-06-18T23:19:34Z | -
dc.date.issued | 1995-12 | en_US
dc.date.submitted | 2006-08-08 | en_US
dc.identifier.citation | Automatic identification of causal relations in text and their use for improving precision in information retrieval, 1995-12 | en_US
dc.identifier.uri | http://hdl.handle.net/10150/105106 | -
dc.description | Same text as the Description field above | en_US
dc.description.abstract | Same text as the Abstract field above | en_US
dc.format.mimetype | application/pdf | en_US
dc.language.iso | en | en_US
dc.subject | Information Extraction | en_US
dc.subject | Information Retrieval | en_US
dc.subject | Natural Language Processing | en_US
dc.subject | Computational Linguistics | en_US
dc.subject | Linguistics | en_US
dc.subject.other | cause-effect relations | en_US
dc.subject.other | causal verbs | en_US
dc.subject.other | semantic relations | en_US
dc.subject.other | information retrieval | en_US
dc.subject.other | newspaper text | en_US
dc.subject.other | information extraction | en_US
dc.title | Automatic identification of causal relations in text and their use for improving precision in information retrieval | en_US
dc.type | Thesis | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.