publicly accessible indexed web index?
Posted on December 11, 2006, 6:14 pm
Dear readers: I would like to do a basic web search and download a
list of all web pages in the results. I don't care very much
about ordering, because my own Perl programs will then wget the
resulting web pages and check whether they meet my other needs. (I do
need to sift through thousands of result pages, though.)
Of course, I could use one of the many publicly available spider
programs and crawl the web myself, but this seems like a waste of
bandwidth. Are there public repositories that avoid the need for me to
crawl? google.com used to have an API, but apparently just dropped it.
Moreover, I don't need much Google or PageRank sophistication; I need
the old AltaVista-like comprehensiveness more than cleverness.
Any pointers would be appreciated.
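For readers following along, the "fetch each result and test it" step described above might look roughly like the sketch below. It is written in Python rather than the poster's Perl, and it assumes the list of result URLs is already in hand from some source. The names `fetch_page`, `matching_urls`, and `meets_needs` are hypothetical, chosen only for illustration.

```python
import urllib.request
from typing import Callable, Iterable, Iterator


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def matching_urls(
    urls: Iterable[str],
    meets_needs: Callable[[str], bool],
    fetch: Callable[[str], str] = fetch_page,
) -> Iterator[str]:
    """Yield each URL whose page content passes the caller's own test."""
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank entries and duplicate results
        seen.add(url)
        try:
            body = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable host or malformed URL: skip it
        if meets_needs(body):
            yield url
```

A call such as `list(matching_urls(open("results.txt"), lambda body: "perl" in body))` would then keep only the pages mentioning "perl", where `results.txt` is an assumed file holding one URL per line. Passing `fetch` as a parameter keeps the filter testable without touching the network.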
Re: publicly accessible indexed web index?
From a search engine I guess.
Didn't know that Google dropped its API, and I doubt it. One can always spider
Google directly, but it's against Google's policy. The API is limited to 1000
queries a day, if I recall correctly.
Based on what criteria do you want to fetch pages? I doubt you want to
spider away :-)
I do this stuff for a living, 12+ years of Perl experience, see
http://castleamber.com/ for pricing info etc.
John