- Posted on
April 23, 2005, 10:57 pm
I'm seeing hundreds of requests like this in my log:
22.214.171.124 - - [23/Apr/2005:22:08:43 +0100] "GET
/uni/course/resources/joshua.smcvt.edu/book.pdf HTTP/1.1" 200 2843632
"http://www.miragorobot.com/scripts/mrinfo.asp " "HenryTheMiragoRobot
They're all for exactly the same URL, from different IPs (all in the
Mirago range). The document isn't changing, here are the HTTP headers:
HTTP/1.1 200 OK
Date: Sat, 23 Apr 2005 22:46:48 GMT
Server: Apache/2.0.53 (Debian GNU/Linux) <snip>
Last-Modified: Tue, 07 Dec 2004 20:12:01 GMT
As you can see, the Last-Modified date is months old.
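A well-behaved crawler would revalidate with a conditional request instead of re-downloading the 2.8 MB file every time. A sketch of the exchange it should be doing (the Host header here is a placeholder, not my real hostname):

```
GET /uni/course/resources/joshua.smcvt.edu/book.pdf HTTP/1.1
Host: www.example.org
If-Modified-Since: Tue, 07 Dec 2004 20:12:01 GMT

HTTP/1.1 304 Not Modified
```

With a 304 the server sends only headers, no body, so an unchanged document costs almost no bandwidth.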
I mailed the address on their page a couple of weeks ago, then mailed
again about 10 days ago; I've still had no reply and the requests keep
coming. I'm going to work out how to block their IP range now (it's eating
my bandwidth like crazy). I should add that they're *not* looking at my
robots.txt file, which specifically forbids them from fetching that file.
They haven't requested it since at least 1st March, if they ever have (I
don't have any older logs).
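Blocking the range at the server is straightforward with mod_access in Apache 2.0. A minimal sketch — the netblock below is hypothetical, so substitute the actual range you see in your logs, and adjust the directory path to match your DocumentRoot:

```apache
# Hypothetical Mirago netblock -- replace with the range from your logs
<Directory "/var/www/uni/course/resources">
    Order Allow,Deny
    Allow from all
    Deny from 22.214.171.0/24
</Directory>
```

With `Order Allow,Deny`, a client matched by both directives is denied, so the listed range gets 403 Forbidden while everyone else is served normally.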
Can someone confirm that my server's set up correctly?
Re: Mirago robot ignoring robots.txt, repeatedly requesting same URL
[ bad bot ]
Use mod_rewrite, see http://johnbokma.com/mexit/2005/01/11/
Instead of checking the Referer, check the User-Agent:
RewriteCond %{HTTP_USER_AGENT} HenryTheMiragoRobot
RewriteRule ^/uni/course/resources/joshua\.smcvt\.edu/book\.pdf$ - [F]
(not tested, I am not sure about the exact name of the USERAGENT header)
You could make the User-Agent test more specific, but I doubt that there
are that many HenryTheMiragoRobots around. (And otherwise, with a silly
name like that, it should be banned :-D )
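Putting it together, an untested minimal sketch for httpd.conf (in a .htaccess file, drop the leading slash from the pattern):

```apache
RewriteEngine On
# If the User-Agent contains the bot's name, refuse the PDF with 403 Forbidden
RewriteCond %{HTTP_USER_AGENT} HenryTheMiragoRobot
RewriteRule ^/uni/course/resources/joshua\.smcvt\.edu/book\.pdf$ - [F]
```

The `[F]` flag returns 403 immediately, so the bot burns a few hundred bytes per request instead of the full 2.8 MB.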
John