robots.txt - Question

I read about robots exclusion here

I am wondering how search engine bots are implemented. Let's assume I
have Disallow: /foobar in the robots.txt. On the main page of my site,
I link to content at, say, /foobar/pictures.html

So will the search engine bot index /foobar/pictures.html or not? If
not, does that mean that during the entire period of crawling, it
keeps the information that it has read in robots.txt?

Thank you for your time.

Re: robots.txt - Question

On Sun, 21 Oct 2007 18:42:41 -0000, khabri put finger to keyboard and

It won't, if it correctly follows the standards.

It should cache the contents of robots.txt at the start of every crawl
and obey it thereafter, until it next checks it.
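
To make that concrete, here is a rough sketch of how a bot might do it,
using Python's standard urllib.robotparser. The rules are parsed once and
the same parser object is consulted for every URL found during the crawl.
The robots.txt text, the "ExampleBot" agent name and the allowed() helper
are made up for illustration; real crawlers will differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, matching the rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foobar
"""

# Parse the rules once at the start of the crawl and keep the parser around.
rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Check a discovered link against the cached rules before fetching it."""
    return rules.can_fetch(agent, path)


# Links found on the main page are filtered before any request is made.
for link in ["/index.html", "/foobar/pictures.html"]:
    print(link, "->", "fetch" if allowed(link) else "skip")
```

Disallow matches by path prefix, so /foobar also covers everything under
/foobar/.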

-- - What does your surname say about you?
"All I want is to find an easier way to get out of our little heads"

Re: robots.txt - Question

Mark Goodge is correct; correctly programmed bots should not access that

It's up to the bot how often it re-reads your robots.txt file. Note
that you can send an 'Expires' header along with your robots.txt file
and it *should* be respected. (No guarantees, though!)
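
For what it's worth, a well-behaved bot could use that header roughly as
sketched below: take the Expires value as the time to re-read robots.txt,
and fall back to a default when the header is missing or unusable. The
refetch_time() name and the one-day default are assumptions of mine, not
anything a particular search engine documents.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def refetch_time(expires_header, fetched_at, default_ttl=timedelta(days=1)):
    """Return when the cached robots.txt should be fetched again.

    expires_header: raw value of the HTTP Expires header, or None.
    fetched_at: timezone-aware datetime of the last fetch.
    default_ttl: assumed fallback lifetime (hypothetical, one day here).
    """
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            expires = None  # malformed date: ignore the header
        if expires is not None:
            if expires.tzinfo is None:
                # HTTP dates are GMT; treat a naive result as UTC.
                expires = expires.replace(tzinfo=timezone.utc)
            if expires > fetched_at:
                return expires
    # No header, bad header, or already expired: use the fallback.
    return fetched_at + default_ttl


fetched = datetime(2007, 10, 21, 18, 42, 41, tzinfo=timezone.utc)
print(refetch_time("Sun, 28 Oct 2007 18:42:41 GMT", fetched))
print(refetch_time(None, fetched))
```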

Good luck

Philip /
Whole-site HTML validation, link checking and more
