Persistent Link:
http://hdl.handle.net/10150/106527
Title:
Focused crawls, tunneling, and digital libraries
Author:
Bergmark, Donna; Lagoze, Carl; Sbityakov, Alex
Citation:
Focused crawls, tunneling, and digital libraries 2002,
Issue Date:
2002
URI:
http://hdl.handle.net/10150/106527
Submitted date:
2002-07-20
Abstract:
Crawling the Web to build collections of documents related to pre-specified topics became an active area of research during the late 1990s, crawler technology having been developed for use by search engines. Now, Web crawling is being seriously considered as an important strategy for building large scale digital libraries. This paper covers some of the crawl technologies that might be exploited for collection building. For example, to make such collection-building crawls more effective, focused crawling was developed, in which the goal was to make a "best-first" crawl of the Web. We are using powerful crawler software to implement a focused crawl but use tunneling to overcome some of the limitations of a pure best-first approach. Tunneling has been described by others as not only prioritizing links from pages according to the page's relevance score, but also estimating the value of each link and prioritizing them as well. We add to this mix by devising a tunneling focused crawling strategy which evaluates the current crawl direction on the fly to determine when to terminate a tunneling activity. Results indicate that a combination of focused crawling and tunneling could be an effective tool for building digital libraries.
Type:
Preprint
Language:
en
Keywords:
Digital Libraries
Local subject classification:
Web Crawling, Mercator
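
Illustration (not part of the original record):
The abstract describes a best-first focused crawl that tunnels through off-topic pages and decides when to stop tunneling. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the in-memory toy graph, the precomputed relevance scores (a stand-in for a real classifier), the function name focused_crawl, and the fixed max_tunnel cutoff, which is a simple substitute for the paper's on-the-fly evaluation of crawl direction.

```python
import heapq

def focused_crawl(graph, scores, seeds, threshold=0.5, max_tunnel=2):
    """Best-first crawl over a toy link graph with bounded tunneling.

    graph:  page -> list of linked pages (hypothetical in-memory web)
    scores: page -> relevance in [0, 1] (stand-in for a real classifier)
    Returns the on-topic pages in the order they were collected.
    """
    # Heap entries: (-priority, tie_breaker, page, off_topic_run).
    # A link inherits the relevance score of the page it was found on.
    frontier = []
    counter = 0
    for seed in seeds:
        heapq.heappush(frontier, (-1.0, counter, seed, 0))
        counter += 1
    seen = set(seeds)
    collected = []

    while frontier:
        _, _, page, run = heapq.heappop(frontier)
        score = scores.get(page, 0.0)
        if score >= threshold:
            collected.append(page)
            run = 0            # back on topic: reset the tunnel length
        else:
            run += 1           # one more consecutive off-topic page
            if run > max_tunnel:
                continue       # assumed cutoff: abandon this tunnel
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, counter, link, run))
                counter += 1
    return collected

# Toy data: E is on topic but reachable only through off-topic B and D.
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": ["F"], "F": []}
toy_scores = {"A": 0.9, "B": 0.1, "C": 0.8, "D": 0.2, "E": 0.9, "F": 0.1}
print(focused_crawl(toy_graph, toy_scores, ["A"], max_tunnel=2))
print(focused_crawl(toy_graph, toy_scores, ["A"], max_tunnel=1))
```

With a cutoff of 2 the crawl reaches page E by passing through two off-topic pages; with a cutoff of 1 it gives up before getting there. That is the trade-off tunneling is meant to manage. A crawler following the abstract more closely would replace the fixed cutoff with a test on how relevance is trending along the current path.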

Full metadata record

DC Field | Value | Language
dc.contributor.author | Bergmark, Donna | en_US
dc.contributor.author | Lagoze, Carl | en_US
dc.contributor.author | Sbityakov, Alex | en_US
dc.date.accessioned | 2002-07-20T00:00:01Z | -
dc.date.available | 2010-06-18T23:48:57Z | -
dc.date.issued | 2002 | en_US
dc.date.submitted | 2002-07-20 | en_US
dc.identifier.citation | Focused crawls, tunneling, and digital libraries 2002, | en_US
dc.identifier.uri | http://hdl.handle.net/10150/106527 | -
dc.description.abstract | Crawling the Web to build collections of documents related to pre-specified topics became an active area of research during the late 1990s, crawler technology having been developed for use by search engines. Now, Web crawling is being seriously considered as an important strategy for building large scale digital libraries. This paper covers some of the crawl technologies that might be exploited for collection building. For example, to make such collection-building crawls more effective, focused crawling was developed, in which the goal was to make a "best-first" crawl of the Web. We are using powerful crawler software to implement a focused crawl but use tunneling to overcome some of the limitations of a pure best-first approach. Tunneling has been described by others as not only prioritizing links from pages according to the page's relevance score, but also estimating the value of each link and prioritizing them as well. We add to this mix by devising a tunneling focused crawling strategy which evaluates the current crawl direction on the fly to determine when to terminate a tunneling activity. Results indicate that a combination of focused crawling and tunneling could be an effective tool for building digital libraries. | en_US
dc.format.mimetype | application/pdf | en_US
dc.language.iso | en | en_US
dc.subject | Digital Libraries | en_US
dc.subject.other | Web Crawling, Mercator | en_US
dc.title | Focused crawls, tunneling, and digital libraries | en_US
dc.type | Preprint | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.