A webcrawler for indexing a specific site

Does anyone know of a webcrawler I can use for indexing a specific site
into a local index?

Re: A webcrawler for indexing a specific site

__/ [Andreas Ringdal] on Thursday 09 February 2006 10:36 \__

Do you intend to use third-party software or a Web service that is run by
somebody else to generate indices and then deliver them to you, e.g. as a
download? Note that WebCrawler is the name of a company; the more suitable
term is "Web crawler". Poor descriptions tend to attract poor answers, which
is why it's worth asking before detailed and elaborate answers are given.

To generate indices locally, I know of Entropy Search, phpdig and htdig.
However, the format of the indices may be obscure (e.g. binary files)
rather than standardised (e.g. XML). I imagine that different search engines
store their indices differently (proprietary methods), which makes
collaboration hard.

Re: A webcrawler for indexing a specific site

We intend to retrieve the data from a specified website (the URL may vary)
and index it into our own index. We currently use dotLucene as the index,
but have support for other engines.

The desired output from the web crawler is a reference/URL, the text
from the page and, preferably, an extracted date when possible.
We have considered some open-source projects, but none match our
requirements (I don't have the list of requirements available at this
location).
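
To make the desired output concrete, here is a minimal sketch in Python of a
crawler that stays on one site and emits a URL, the page text and a date per
page. It is an illustration under stated assumptions, not a recommendation of
a particular tool: the names PageRecord, crawl_site and http_fetch are
hypothetical, the HTTP Last-Modified header is assumed to be an acceptable
stand-in for the "extracted date", and robots.txt handling, rate limiting and
non-HTML content are left out.

```python
from collections import deque
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


@dataclass
class PageRecord:
    """One unit of crawler output (hypothetical shape)."""
    url: str
    text: str
    date: Optional[str]  # None when no date could be found


class _PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.chunks: List[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def http_fetch(url: str) -> Tuple[str, Optional[str]]:
    """Fetch a page; assumes Last-Modified is good enough as the date."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
        return body, resp.headers.get("Last-Modified")


def crawl_site(
    start_url: str,
    fetch: Callable[[str], Tuple[str, Optional[str]]] = http_fetch,
    max_pages: int = 100,
) -> List[PageRecord]:
    """Breadth-first crawl restricted to the host of start_url."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    records: List[PageRecord] = []
    while queue and len(records) < max_pages:
        url = queue.popleft()
        try:
            html, date = fetch(url)
        except Exception:
            continue  # skip pages that cannot be retrieved
        parser = _PageParser()
        parser.feed(html)
        records.append(PageRecord(url, " ".join(parser.chunks), date))
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return records
```

Each PageRecord could then be handed to whatever indexer is in use. The fetch
function is passed in as a parameter so that the date extraction (for example
from a meta tag instead of the header) can be swapped without touching the
crawl loop.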


