Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques

Persistent Link:
http://hdl.handle.net/10150/105711
Title:
Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques
Author:
Chen, Hsinchun; Schatz, Bruce R.; Lin, Chienting
Citation:
Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques 1995, :58-59
Issue Date:
1995
Description:
Artificial Intelligence Lab, Department of MIS, University of Arizona
URI:
http://hdl.handle.net/10150/105711
Submitted date:
2004-09-20
Abstract:
The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search or hypertext browsing. Keyword search often results in low precision, poor recall, and slow response time due to the limitations of indexing and communication methods, controlled-language-based interfaces, and the inability of searchers themselves to articulate their needs fully. Hypertext browsing, on the other hand, allows users to explore only a very small portion of a large Internet information space. A large information space can also confuse and disorient its users, causing them to spend a great deal of time while learning nothing specific. This research aims to provide concept-based categorization and search capabilities for Internet WWW servers based on selected machine learning and parallel computing techniques. Our proposed approach, which is grounded in automatic textual analysis of Internet documents, attempts to address the Internet search problem by first categorizing the content of Internet documents and subsequently providing semantic search capabilities based on a concept space approach. As a first step, we propose a multi-layered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize Internet homepages according to their content. The resulting category hierarchies could serve to partition the vast Internet services into subject-specific categories and databases. After individual subject categories have been created, we propose to generate domain-specific concept spaces for each subject category. The concept spaces can then be used to support concept-based information retrieval, a significant improvement over the existing keyword searching and hypertext browsing options for Internet resource discovery. As the Internet information space continues to grow at its present pace, we believe this research will shed light on potentially robust and scalable solutions to the increasingly complex and urgent information access and sharing problems that are certain to emerge in the future Internet society.
Type:
Conference Poster
Language:
en
Keywords:
Internet; Information Seeking Behaviors; Classification
Local subject classification:
National Science Digital Library; NSDL; Artificial intelligence lab; AI lab; Information retrieval
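
The first step named in the abstract, categorizing homepages with a Kohonen self-organizing feature map, can be illustrated with a minimal sketch. This is not the authors' multi-layered algorithm, which the record does not describe in detail. It is a single-layer map in plain Python, and the toy term counts, the six-word vocabulary, the function names (`normalize`, `best_matching_unit`, `train_som`), and the linear decay schedule are all assumptions made for illustration.

```python
import math
import random


def normalize(vec):
    """Scale a term-count vector to unit length so page length does not dominate."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def best_matching_unit(weights, vec):
    """Return the grid coordinate whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vec)),
    )


def train_som(vectors, rows=1, cols=2, epochs=200, seed=0):
    """Train a small Kohonen map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map node, initialized with random values in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)  # learning rate decays linearly (assumed schedule)
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.01  # neighborhood shrinks
        for vec in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            winner = best_matching_unit(weights, vec)
            for node, w in weights.items():
                grid_dist2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                # Pull the node toward the input, scaled by its grid distance to the winner.
                for i in range(dim):
                    w[i] += rate * influence * (vec[i] - w[i])
    return weights


# Hypothetical toy data: term counts over an assumed six-word vocabulary
# (neural, network, learning, football, league, score).
pages = {
    "ml_page_1": [3, 2, 2, 0, 0, 0],
    "ml_page_2": [2, 3, 1, 0, 0, 0],
    "sports_page_1": [0, 0, 0, 3, 2, 2],
    "sports_page_2": [0, 0, 0, 2, 3, 1],
}
vectors = {name: normalize(counts) for name, counts in pages.items()}
som = train_som(list(vectors.values()))
for name, vec in vectors.items():
    print(name, "->", best_matching_unit(som, vec))
```

The printed mapping shows which map node each page lands on; pages that share a node would be treated as one category. A hierarchy of the kind the abstract describes could plausibly be built by training a further map on the pages assigned to each node, but that extension is not shown here.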

Full metadata record

DC Field | Value | Language
dc.contributor.author | Chen, Hsinchun | en_US
dc.contributor.author | Schatz, Bruce R. | en_US
dc.contributor.author | Lin, Chienting | en_US
dc.date.accessioned | 2004-09-20T00:00:01Z | -
dc.date.available | 2010-06-18T23:32:54Z | -
dc.date.issued | 1995 | en_US
dc.date.submitted | 2004-09-20 | en_US
dc.identifier.citation | Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques 1995, :58-59 | en_US
dc.identifier.uri | http://hdl.handle.net/10150/105711 | -
dc.description | Artificial Intelligence Lab, Department of MIS, University of Arizona | en_US
dc.format.mimetype | application/pdf | en_US
dc.language.iso | en | en_US
dc.subject | Internet | en_US
dc.subject | Information Seeking Behaviors | en_US
dc.subject | Classification | en_US
dc.subject.other | National Science Digital Library | en_US
dc.subject.other | NSDL | en_US
dc.subject.other | Artificial intelligence lab | en_US
dc.subject.other | AI lab | en_US
dc.subject.other | Information retrieval | en_US
dc.title | Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques | en_US
dc.type | Conference Poster | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.