Persistent Link:
http://hdl.handle.net/10150/195311
Title:
Supporting Multilingual Internet Searching and Browsing
Author:
Zhou, Yilu
Issue Date:
2006
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
The amount of non-English information has proliferated rapidly in recent years. The broad diversity of the multilingual content presents a substantial research challenge in the field of knowledge discovery and information retrieval. Therefore, there is an increased interest in the development of multilingual systems to support information sharing across languages. The goal of this dissertation is to study, through a series of case studies, how different techniques and algorithms could help in multilingual Internet searching and browsing.
A system development research process was adopted as the methodology in this dissertation. In the first part of the dissertation, I discuss the development of CMedPort, a Chinese medical portal to serve the information-seeking needs of Chinese users. A systematic evaluation was conducted to study the effectiveness and efficiency of CMedPort in assisting human analysis. My experimental results show that CMedPort achieved significant improvement in searching and browsing performance compared to three benchmark regional search engines.
The second and third case studies aim to investigate effective and efficient techniques and algorithms that facilitate multilingual Web retrieval. An English-Chinese multilingual Web retrieval system in the business IT domain was developed and evaluated. It was then extended to five languages: English, Chinese, Japanese, German and Spanish. A dictionary-based approach was adopted for query translation. Corpus-based co-occurrence analysis, relevance feedback, and phrasal translation algorithms were used for disambiguation purposes. Evaluation results showed that the system's phrasal translation and co-occurrence disambiguation led to great improvement in performance.
The last part of this dissertation studies the proper name translation problem. Proper names are often out-of-vocabulary terms and are critical to multilingual Web retrieval. This study proposes a combined Hidden Markov Model and Web mining model to automatically generate proper name translations. The approach was evaluated on two language pairs: English-Arabic and English-Chinese. My results are encouraging and show promise for using transliteration techniques to improve multilingual Web retrieval.
This dissertation has two main contributions. Firstly, it demonstrated how information retrieval, Web mining and artificial intelligence techniques can be used in a multilingual Web-based context. Secondly, it provided a set of tools that can assist users in their multilingual Web searching and browsing activities.
Type:
text; Electronic Dissertation
Degree Name:
DMgt
Degree Level:
doctoral
Degree Program:
Management Information Systems; Graduate College
Degree Grantor:
University of Arizona
Advisor:
Chen, Hsinchun
Committee Chair:
Chen, Hsinchun

Full metadata record

DC Field | Value | Language
dc.language.iso | EN | en_US
dc.title | Supporting Multilingual Internet Searching and Browsing | en_US
dc.creator | Zhou, Yilu | en_US
dc.contributor.author | Zhou, Yilu | en_US
dc.date.issued | 2006 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | The amount of non-English information has proliferated rapidly in recent years. The broad diversity of the multilingual content presents a substantial research challenge in the field of knowledge discovery and information retrieval. Therefore, there is an increased interest in the development of multilingual systems to support information sharing across languages. The goal of this dissertation is to study, through a series of case studies, how different techniques and algorithms could help in multilingual Internet searching and browsing. A system development research process was adopted as the methodology in this dissertation. In the first part of the dissertation, I discuss the development of CMedPort, a Chinese medical portal to serve the information-seeking needs of Chinese users. A systematic evaluation was conducted to study the effectiveness and efficiency of CMedPort in assisting human analysis. My experimental results show that CMedPort achieved significant improvement in searching and browsing performance compared to three benchmark regional search engines. The second and third case studies aim to investigate effective and efficient techniques and algorithms that facilitate multilingual Web retrieval. An English-Chinese multilingual Web retrieval system in the business IT domain was developed and evaluated. It was then extended to five languages: English, Chinese, Japanese, German and Spanish. A dictionary-based approach was adopted for query translation. Corpus-based co-occurrence analysis, relevance feedback, and phrasal translation algorithms were used for disambiguation purposes. Evaluation results showed that the system's phrasal translation and co-occurrence disambiguation led to great improvement in performance. The last part of this dissertation studies the proper name translation problem. Proper names are often out-of-vocabulary terms and are critical to multilingual Web retrieval. This study proposes a combined Hidden Markov Model and Web mining model to automatically generate proper name translations. The approach was evaluated on two language pairs: English-Arabic and English-Chinese. My results are encouraging and show promise for using transliteration techniques to improve multilingual Web retrieval. This dissertation has two main contributions. Firstly, it demonstrated how information retrieval, Web mining and artificial intelligence techniques can be used in a multilingual Web-based context. Secondly, it provided a set of tools that can assist users in their multilingual Web searching and browsing activities. | en_US
dc.type | text | en_US
dc.type | Electronic Dissertation | en_US
thesis.degree.name | DMgt | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Management Information Systems | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Chen, Hsinchun | en_US
dc.contributor.chair | Chen, Hsinchun | en_US
dc.contributor.committeemember | Chen, Hsinchun | en_US
dc.contributor.committeemember | Nunamaker Jr., Jay F. | en_US
dc.contributor.committeemember | Zhao, J. Leon | en_US
dc.identifier.proquest | 1763 | en_US
dc.identifier.oclc | 659747522 | en_US