Roll your own search engine

Is anyone here familiar with good software for a niche search engine
(crawling a subset of the web - say 10 million pages)?  So far we've
found that ASPseek seems dead and apparently has issues on new Linux
distros.  We're playing with Nutch, which is active, but we're a LAMP
shop, not a Java shop, and my developers have been struggling with it
for a week or two to get it to really do much (I suspect it's just a
very powerful tool that's poorly documented).  Ideally I'd like
something GPL'ed so I can monkey with the code as needed - for example
so I can specify how I define what sites match the niche.
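
To make "what sites match the niche" concrete, here is a rough sketch of
the kind of rule I mean, written in Python just for illustration.  It is
not taken from ASPseek, Nutch or any other crawler; the host suffixes,
keywords, threshold and function names are all made-up placeholders.

```python
from urllib.parse import urlparse

# Hypothetical niche definition: every value here is a placeholder.
ALLOWED_HOST_SUFFIXES = ("example.org", "example.net")
NICHE_KEYWORDS = ("orchid", "greenhouse")
MIN_KEYWORD_HITS = 2


def host_matches(url):
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s)
               for s in ALLOWED_HOST_SUFFIXES)


def text_matches(text):
    """True if the page text mentions the niche keywords often enough."""
    lowered = text.lower()
    hits = sum(lowered.count(keyword) for keyword in NICHE_KEYWORDS)
    return hits >= MIN_KEYWORD_HITS


def in_niche(url, text=""):
    """Keep a page if either its host or its content looks on-topic."""
    return host_matches(url) or text_matches(text)
```

The crawler would call something like in_niche() on each fetched page
before following its links, so being able to edit that one rule is the
main thing I want from the source code.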

Is there any software out there that might fit the bill?

Thanks in advance.

Re: Roll your own search engine

There is a web crawler called Larbin, which is under the GPL and written
in C++.  Just do a Google search for Larbin.
