It appears that only specific bots are not obeying the robots.txt file

It appears that only specific bots are not obeying the robots.txt file
and are indexing pages at rates that can potentially cause server issues.

The specific IP addresses appear to be in the 74.6.x block. They do
reverse DNS to Inktomi, which is correct for Yahoo's crawler.
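
A reverse lookup on its own can be faked by whoever controls the PTR record, so a common follow-up is to resolve the returned hostname forward again and confirm it maps back to the same IP. Below is a minimal sketch of that check in Python. The function name, the `expected_suffix` parameter, and the injectable lookup arguments are illustrative assumptions, not anything from the original report, and the suffix you pass in should be whatever domain your own logs actually reverse-resolve to.

```python
import socket


def is_verified_crawler(ip, expected_suffix,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a crawler IP (sketch).

    expected_suffix is supplied by the caller, for example a crawler's
    hostname domain with a leading dot. The lookup functions are
    injectable so the logic can be exercised without network access.
    """
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record, or the lookup failed

    # The reverse name must sit under the domain we expect.
    if not hostname.lower().rstrip(".").endswith(expected_suffix.lower()):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False

    # Confirmed only if the forward lookup leads back to the same IP.
    return ip in forward_ips
```

Called as `is_verified_crawler(ip_from_log, ".example-crawler-domain.com")`, it returns `True` only when both directions agree. The domain in that call is a placeholder.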

Forum discussion at WebmasterWorld and Search Engine Watch Forums.

Posted by  rustybrick in Yahoo! Search Optimization at January 17, 2007
7:57 AM | Comments (5) | TrackBacks (0)


My blog is the Maytag man of the blogosphere. Yet Yahoo! (Inktomi)
crawls it every day faithfully.
Just me and Inktomi but I am thankful they notice, LOL.
Posted by: Mike at January 17, 2007 9:44 AM Permalink

This was discussed on the LED Digest last week. The original post is
in #2321, with
responses in the next 3-4 issues. As far as I know the OP never
resolved this, but he did offer a piece of advice:

"Feature request for SE spiders: Provide a referrer. Please. It would
make me and I expect other site owners feel grateful when odd URL
requests are noticed. If more than one referrer, then just any one --
the last one, the first one, doesn't matter which. Referrer information
could save people a lot of time, and let them keep their
hair a while longer."

Hope this info helps...

Posted by: Adam Audette at January 17, 2007 11:31 AM Permalink

Thanks Adam, sorry for missing it.
Posted by: Barry Schwartz at January 17, 2007 11:39 AM Permalink

I answered the specific question on webmasterworld. It does not seem
that there is an issue with the crawler in this instance but an
incorrect interpretation of the robots.txt syntax by the publisher.
Posted by: Tim at January 17, 2007 2:01 PM Permalink
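
The comment above does not say what the publisher got wrong, so the following is only a hypothetical illustration of one frequent robots.txt misreading: assuming that a crawler with its own `User-agent` group also inherits the rules in the `*` group. The sketch uses Python's standard `urllib.robotparser`, which is not Yahoo's parser and may differ from it (it does simple prefix matching only). The rules, paths, and the `OtherBot` name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, not the publisher's actual file.
RULES = """\
User-agent: *
Disallow: /private/

User-agent: Slurp
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# In this parser, a bot that has its own group is judged by that group
# alone, so the "*" rule for /private/ does not apply to Slurp here.
slurp_private = parser.can_fetch("Slurp", "/private/page.html")
slurp_archive = parser.can_fetch("Slurp", "/archive/page.html")

# A bot with no group of its own falls back to the "*" group.
other_private = parser.can_fetch("OtherBot", "/private/page.html")

print(slurp_private, slurp_archive, other_private)
```

If the intent were to keep Slurp out of `/private/` as well, the `Disallow: /private/` line would need to be repeated inside the Slurp group.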

Is Yahoo's bot based on Wget? I ask because the following line was in
my log file:

2007-01-17 17:44:24 W3SVC105 NT-110 XX.XX.XX.XX GET / - 80 - HTTP/1.0 Wget/1.8.2 - - 200 0 0 11155 111 578

The IP Reverse DNSes to
Posted by: Ram at January 18, 2007 12:59 AM Permalink
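
A line like the one quoted in the comment above is easier to read once it is split into named fields. The sketch below does that in Python. The field order in `ASSUMED_FIELDS` is a guess that happens to match the 20 whitespace-separated values in that line; it is not taken from the commenter's server, and a real IIS log states its own order in a `#Fields:` header, which should be used instead. The function name is illustrative.

```python
# ASSUMPTION: this field order is a guess fitted to the quoted line.
# A real log declares its own order in a "#Fields:" header line.
ASSUMED_FIELDS = (
    "date time s-sitename s-computername s-ip cs-method cs-uri-stem "
    "cs-uri-query s-port cs-username cs-version cs(User-Agent) cs(Cookie) "
    "cs(Referer) sc-status sc-substatus sc-win32-status sc-bytes cs-bytes "
    "time-taken"
).split()


def parse_w3c_line(line, fields=ASSUMED_FIELDS):
    """Split one W3C extended log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(
            "expected %d fields, got %d" % (len(fields), len(values))
        )
    return dict(zip(fields, values))


entry = parse_w3c_line(
    "2007-01-17 17:44:24 W3SVC105 NT-110 XX.XX.XX.XX GET / - 80 - "
    "HTTP/1.0 Wget/1.8.2 - - 200 0 0 11155 111 578"
)
print(entry["cs(User-Agent)"], entry["cs(Referer)"], entry["sc-status"])
```

Under that assumed layout the user agent is `Wget/1.8.2` and the referrer field is `-`, meaning the request carried no referrer.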