More on robots.txt / spam



Hello all,

Just a question about robots and spam, especially in my guestbook.
Below is a screen shot of the statistics of a website I run:

In the list of spiders/robots, I can see the obvious robots: Google,
MSN, and some other names that I don't recognize. There are also several
"unknown robots". Is it safe to assume that these are likely the robots
that are auto-filling my guestbook with "link-bombs"?

I added a hidden field to the guest-book form called "manual_entry".
The form e-mails me to inform me that an entry has been made.  As I
suspected, legitimate guest-book entries are coming in with the hidden
field filled in as a "yes". Link bombs and porn bombs are coming in
with the hidden field ignored/blank. I'm guessing that the robots are
bypassing the actual form and submitting through the submission script.
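
A minimal sketch of doing that check inside the submission script itself, so that entries without the marker are dropped instead of just being reported by e-mail. The field name "manual_entry" and the value "yes" come from the description above; the function name and the dict-shaped form data are assumptions made for illustration.

```python
def looks_manual(form_fields):
    """Return True if the submission carries the hidden marker field.

    form_fields is assumed to be a plain dict mapping submitted field
    names to values (hypothetical; adapt to what your script provides).
    """
    value = form_fields.get("manual_entry", "")
    return value.strip().lower() == "yes"


# An entry sent through the form carries the marker; one that skipped
# the form (or ignored the field) does not.
print(looks_manual({"name": "Ann", "manual_entry": "yes"}))
print(looks_manual({"name": "bot", "comment": "buy now"}))
```

A static hidden value only filters out senders that never read the form, so treat it as a first-pass filter rather than a real barrier.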

Just wanted your take on this - if I can assume that the unknown robots
are partly responsible for the guestbook entries or if I need to look

Someone mentioned an .htaccess file?

Thanks in advance.

Viken K.

Re: More on robots.txt / spam



Serious spam is likely to come from well-camouflaged 'bots that don't
observe robots.txt anyway.

"Hackerdoodz k00l web slayer" is probably downloaded from last week's
magazine cover disks and doesn't do more than look for pr0n.

If you care, mod the posting script to record the IP and user agent when it
makes postings.
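
A rough sketch of what that logging could look like, assuming the posting script can see CGI-style request variables. REMOTE_ADDR and HTTP_USER_AGENT are the standard CGI names; the function names and the log file name are made up for the example.

```python
import datetime


def describe_poster(environ, now=None):
    """Build one log line from CGI-style request variables.

    Missing values are logged as "-" rather than raising an error.
    """
    now = now or datetime.datetime.now()
    ip = environ.get("REMOTE_ADDR", "-")
    agent = environ.get("HTTP_USER_AGENT", "-")
    return "%s\t%s\t%s" % (now.strftime("%Y-%m-%d %H:%M:%S"), ip, agent)


def record_poster(environ, path="guestbook_posts.log"):
    """Append the line to a log file (the path is a hypothetical name)."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(describe_poster(environ) + "\n")
```

In a CGI script you would pass `os.environ` as `environ`. The IP is the part worth keeping, since the user agent is trivially faked.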

Re: More on robots.txt / spam


I already do. However I'm not really sure what to do with the data.
Here's the data from one such spam entry:

Remote Name:
Remote User:
HTTP User Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;
MyIE2; Maxthon)
Date:            09 Feb 2006
Time:            11:58:53

Thanks for replying!

Viken K.

Re: More on robots.txt / spam

Viken Karaguesian wrote:

No.  There are lists of known robots - I think might be
a starting point for finding more info.

It is, however, safe to be bolshie towards any robot that disobeys
your robots.txt.  There are various ways to deal with them: I believe
there's a recipe with mod_rewrite.  I had to deal with them in a
hurry and didn't have time to figure that out, so I wrote
to serve robots.txt to bad robots (identified manually).

Nick Kew

Re: More on robots.txt / spam

On Tue, 14 Feb 2006 10:28:29 -0800, Viken Karaguesian wrote:


Possibly.  It depends on what you mean by "bypassing".  I would have
thought that they submit by issuing an HTTP POST (or GET, presumably, but
that would be unlikely) request to the URI listed as the form's "action".

A clever bot would parse the form's fields and be able to submit the same
data as a user, would it not (question to more knowledgeable readers than I)?
Of course, using a client-side script to set the hidden field would fool
(current) bots but it would also rule out people who can't or won't use
such things.
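
To illustrate how little effort that parsing takes, here is a small sketch using Python's standard html.parser module to pull the hidden fields out of a form. The sample markup, including the "/sign" URI, is invented for the example; only the "manual_entry" field name comes from this thread.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Attribute values can be None when written without "=value".
        if (attributes.get("type") or "").lower() == "hidden":
            name = attributes.get("name")
            if name:
                self.hidden[name] = attributes.get("value") or ""


# Hypothetical guestbook form with a static hidden marker.
page = (
    '<form action="/sign" method="post">'
    '<input type="hidden" name="manual_entry" value="yes">'
    '<input name="comment">'
    "</form>"
)
collector = HiddenFieldCollector()
collector.feed(page)
print(collector.hidden)
```

Anything that is present in the served HTML can be copied this way, which is why a hidden field with a fixed value only stops bots that never fetch the form.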

[I posted a possible solution using timestamps a week or so back (28 Jan
2006 13:37:10 if it helps you look) which might help.]
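
That earlier post is not reproduced here, so the following is only one common way timestamps get used against form spam, not necessarily what was proposed there. The idea in this sketch: the form carries a signed timestamp in a hidden field, and the script rejects submissions that arrive implausibly fast, are too old, or have a tampered value. The secret, the limits, and the function names are all assumptions.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"   # hypothetical server-side secret
MIN_AGE = 5             # seconds; anything faster looks automated
MAX_AGE = 3600          # seconds; older forms must be reloaded


def _sign(stamp):
    """Return the hex HMAC-SHA256 signature of a timestamp string."""
    return hmac.new(SECRET, stamp.encode("ascii"), hashlib.sha256).hexdigest()


def make_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    stamp = str(int(now if now is not None else time.time()))
    return stamp + ":" + _sign(stamp)


def token_ok(token, now=None):
    """Check the signature and that the form's age is plausible."""
    stamp, sep, sig = token.partition(":")
    if not sep or not (stamp.isascii() and stamp.isdigit()):
        return False
    # Compare as bytes so odd characters in sig cannot raise an error.
    if not hmac.compare_digest(sig.encode("utf-8"), _sign(stamp).encode("ascii")):
        return False
    age = (now if now is not None else time.time()) - int(stamp)
    return MIN_AGE <= age <= MAX_AGE
```

The script that serves the form would call `make_token()` and the submission script would call `token_ok()` on the returned field. Unlike a client-side script, this needs nothing from the visitor's browser, though a bot that fetches the form and waits a few seconds would still get through.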

