A webcrawler for indexing a specific site

Does anyone know of a webcrawler I can use for indexing a specific site
into a local index?

Re: A webcrawler for indexing a specific site

__/ [Andreas Ringdal] on Thursday 09 February 2006 10:36 \__


Do you intend to use third-party software or a Web service run by somebody
else to generate indices and then deliver them to you, e.g. as a download?
"Webcrawler" is the name of a company; "Web crawler" is the more suitable
term. Poor descriptions tend to get poor answers, which is why it's worth
asking before detailed and elaborate answers are given.

To generate indices locally, I know of Entropy Search, phpdig and htdig.
However, the format of the indices may be obscure (e.g. binary) rather than
standardised (e.g. XML). Different search engines retain indices differently
(proprietary methods), I imagine, which makes collaboration hard.
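To make the point concrete: whatever tool you pick, the heart of a local index is usually an inverted mapping from terms to the pages that contain them. Here is a toy sketch in Python (my own illustration, not the format any of the tools above actually use):

```python
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index.

    docs: mapping of url -> page text.
    Returns: mapping of lowercased word -> set of urls containing it.
    Real engines add stemming, ranking and compact on-disk formats;
    this only shows the core data structure.
    """
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

pages = {
    "http://example.com/a": "web crawler basics",
    "http://example.com/b": "crawler index formats",
}
index = build_index(pages)
```

A lookup such as `index["crawler"]` then returns both URLs, which is exactly the operation a search engine performs at query time.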

[note: groups and followups re-written]

Best wishes,


Roy S. Schestowitz      | Vista: as the reputation of "Longhorn" was mucked
http://Schestowitz.com  |    SuSE Linux     |     PGP-Key: 0x74572E8E
 12:50pm  up 23 days  8:06,  11 users,  load average: 0.09, 0.10, 0.09
      http://iuron.com - Open Source knowledge engine project

Re: A webcrawler for indexing a specific site

We intend to retrieve the data from a specified website (the URL may vary)
and index it into our own index. We currently use dotLucene as the index,
but have support for other engines.

The desired output from the web crawler should be the reference/URL, the
text from the page and, preferably, an extracted date when possible.
We have considered some open-source projects, but none match our
requirements (I don't have the list of requirements available at this location).
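For what it's worth, the per-page output you describe (URL, page text, optional date) plus same-site link discovery can be sketched with the Python standard library alone. This is only an illustration under assumptions I've made up (an ISO `YYYY-MM-DD` date pattern, plain HTML input); a production crawler would also need fetching, politeness delays and robots.txt handling:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class PageParser(HTMLParser):
    """Collect visible text and same-site links from one HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.host = urlparse(base_url).netloc
        self.text_parts = []
        self.links = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                absolute = urljoin(self.base, href)
                # keep only links on the same host, for single-site crawling
                if urlparse(absolute).netloc == self.host:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())

# assumed date format for illustration; real pages need more heuristics
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

def extract_record(url, html):
    """Return the crawler output described above for one fetched page."""
    parser = PageParser(url)
    parser.feed(html)
    text = " ".join(parser.text_parts)
    match = DATE_RE.search(text)
    return {
        "url": url,
        "text": text,
        "date": match.group(1) if match else None,
        "links": parser.links,  # same-site URLs to enqueue next
    }
```

Each record could then be handed straight to an indexer such as dotLucene, with `links` feeding the crawl queue.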


Roy Schestowitz wrote:
