Mirago robot ignoring robots.txt, repeatedly requesting same URL



I'm seeing hundreds of requests like this in my log:
    - - [23/Apr/2005:22:08:43 +0100] "GET /uni/course/resources/joshua.smcvt.edu/book.pdf HTTP/1.1" 200 2843632 "http://www.miragorobot.com/scripts/mrinfo.asp" "HenryTheMiragoRobot (http://www.miragorobot.com/scripts/mrinfo.asp)"
They're all for exactly the same URL, from different IPs (all in the
Mirago range). The document isn't changing; here are the HTTP headers:
    HTTP/1.1 200 OK
    Date: Sat, 23 Apr 2005 22:46:48 GMT
    Server: Apache/2.0.53 (Debian GNU/Linux) <snip>
    Last-Modified: Tue, 07 Dec 2004 20:12:01 GMT
    ETag: "495-2b63f0-37c40640"
    Accept-Ranges: bytes
    Content-Length: 2843632
    Content-Type: application/pdf
As you can see, the last modified date is months ago.

I mailed the address on their page a couple of weeks ago, then I mailed
again about 10 days ago, still I've had no reply and the requests keep
coming. I'm going to work out how to block their IP range now (it's eating
my bandwidth like crazy). I should add that they're *not* looking at my
robots.txt file, which specifically forbids them from taking that file.
They haven't requested it since at least 1st March, if they ever have (I
don't have any older logs).
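
For reference, an IP-range block on Apache 2.0 might look roughly like the sketch below. It is untested, and 192.0.2.0/24 is only a placeholder documentation range, not Mirago's actual range, which would have to be read from the access log.

```apache
<Location />
    # Evaluate Allow first, then Deny; a client matching both is denied
    Order Allow,Deny
    Allow from all
    # Placeholder range only (assumption): substitute the range seen in the log
    Deny from 192.0.2.0/24
</Location>
```

Denied clients get a 403 response with a short error body, so each hit costs far less bandwidth than the 2843632-byte PDF.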

Can someone confirm that my server's set up correctly?

Thanks :-)


http://matt.blissett.me.uk/

Re: Mirago robot ignoring robots.txt, repeatedly requesting same URL

Matt wrote:

[ bad bot ]


Use mod_rewrite, see http://johnbokma.com/mexit/2005/01/11/

Instead of checking the referer, check the user agent:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} HenryTheMiragoRobot
RewriteRule ^/uni/course/resources/joshua\.smcvt\.edu/book\.pdf$ - [F]

(not tested, I am not sure about the exact name of the USERAGENT header)

You could make the header test more specific, but I doubt that there are
that many HenryTheMiragoRobots around. (And otherwise, with a silly name
like that, it should be banned :-D )
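
A more specific test could, for example, anchor the pattern to the start of the User-Agent string, since the logged value begins with the robot's name. This is an untested sketch that assumes the rules live in the main server or virtual host configuration; in a .htaccess file the leading slash in the RewriteRule pattern would have to be dropped.

```apache
RewriteEngine On
# Match only when the User-Agent begins with the robot's name, ignoring case
RewriteCond %{HTTP_USER_AGENT} ^HenryTheMiragoRobot [NC]
# Answer requests for the PDF with 403 Forbidden instead of serving the file
RewriteRule ^/uni/course/resources/joshua\.smcvt\.edu/book\.pdf$ - [F]
```

Anchoring with ^ keeps the rule from firing on a browser whose User-Agent merely contains that string somewhere, unlikely as that is.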

John                       Perl SEO tools: http://johnbokma.com/perl/
                 Experienced (web) developer: http://castleamber.com/
Get a SEO report of your site for just 100 USD:

Re: Mirago robot ignoring robots.txt, repeatedly requesting same URL

Probably referrer spam.

