Posted by Joe Fox on February 21, 2008, 1:35 am
>>>> I'm using a robots.txt file to control what is and is not crawled
>>>> by search engine bots, but I'd like to make sure that anything that
>>>> isn't a known search engine bot doesn't get the file I'm feeding to
>>>> Google, Yahoo and the others.
>>>
>>> Why?
>>>
>>> I can imagine that you want to block your entire site for any bot
>>> that's known to be abusive, but those probably don't check your
>>> robots.txt anyway.
>>>
>>
>> Perhaps I didn't say it right. I want to keep the robots.txt that
>> I'm feeding search engines from being given to anybody else.
>
> Why? If the reason is that you want to "protect" some folders: it's
> not secure and bound to fail sooner or later. Remember that not all
> bots honor the robots.txt, especially not the ones that you don't want
> on your site in the first place.
I want to keep certain humans from reading the robots.txt that I give to
search engines, because it's none of their bloody business what pages I
tell SEs not to index, and there are a few who might have mind enough to
look at robots.txt. They will not, however, expect to be handed a
tailored version of it.
>> I realize that they *could* spoof the SE's user agent or something,
>> but the people I'm concerned about are bright enough to look for
>> robots.txt but not bright enough to expect to be handed a phony one.
>
> You want to hide the key under the doormat which has in 5 languages
> "The key is hidden nearby" written on top...
Not really, or is it possible that they could also get my .htaccess? I
didn't think that was possible. If they ask for a robots.txt and get one
that's got nothing more than a pointer to a sitemap, that will satisfy
'em.
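
For illustration only, the idea of handing out a tailored robots.txt
could be sketched roughly like this in Python. Everything named here is
an assumption and not something from this thread: the token list, the
example.com sitemap URL, the /private/ path and the robots_for() helper
are all hypothetical. It only looks at the User-Agent string, so, as
already noted above, anyone who spoofs a search engine's user agent
still gets the real file.

```python
# Minimal sketch: pick which robots.txt body to serve, based only on
# the User-Agent header. All names and paths below are hypothetical.

# Substrings that commonly appear in the big crawlers' User-Agent strings.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")

# The real rules, meant for search engines only.
FULL_ROBOTS = (
    "User-agent: *\n"
    "Disallow: /private/\n"
    "Sitemap: http://www.example.com/sitemap.xml\n"
)

# The decoy for everyone else: nothing more than a pointer to a sitemap.
DECOY_ROBOTS = "Sitemap: http://www.example.com/sitemap.xml\n"


def robots_for(user_agent):
    """Return the robots.txt text to serve for the given User-Agent."""
    ua = (user_agent or "").lower()  # a missing header counts as unknown
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return FULL_ROBOTS
    return DECOY_ROBOTS
```

A request whose User-Agent contains "Googlebot" would get FULL_ROBOTS,
while an ordinary browser, or a request with no User-Agent at all, would
get the sitemap-only decoy. This hides the list from casual readers but
protects nothing: the pages themselves stay reachable by anyone who
knows or guesses the URLs.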