Posted by Big Bill on February 21, 2008, 6:54 am
wrote:
>
>>
>>>
>>>>
>>>>> I'm using a robots.txt file to control what is and is not crawled
>>>>> by search engine bots, but I'd like to make sure that anything that
>>>>> isn't a known search engine bot doesn't get the file I'm feeding to
>>>>> Google, Yahoo and the others.
>>>>
>>>> Why?
>>>>
>>>> I can imagine that you want to block your entire site for any bot
>>>> that's known to be abusive though, but those probably don't check
>>>> your robots.txt anyway.
>>>>
>>>
>>> Perhaps I didn't say it right. I want to stop the robots.txt that
>>> I'm feeding search engines from being given to anybody else.
>>
>> Why? If the reason is that you want to "protect" some folders: it's
>> not secure and bound to fail sooner or later. Remember that not all
>> bots honor the robots.txt, especially not the ones that you don't want
>> on your site in the first place.
>
>I want to keep certain humans from reading the robots.txt that I give to
>search engines, because it's none of their bloody business which pages I
>tell SEs not to index, and there are a few who might have the sense to
>look at robots.txt. They will not, however, expect to be handed a
>tailored version of it.
>
>>> I
>>> realize that they *could* spoof the SEs' user agent or something, but
>>> the people I'm concerned about are bright enough to look for robots.txt
>>> but not bright enough to expect to be handed a phoney.
>>
>> You want to hide the key under the doormat which has in 5 languages
>> "The key is hidden nearby" written on top...
>
>Not really. Or is it possible that they could also get my .htaccess? I
>didn't think that was possible. If they ask for a robots.txt and get one
>that contains nothing more than a pointer to a sitemap, that will satisfy
>'em.
Essentially you'd need to cloak, i.e. feed different content to different
requests. Ask Fantomaster.
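For illustration only, here is a minimal sketch of that kind of cloaking as a Python WSGI app: requests whose User-Agent looks like a known crawler get the full robots.txt, and everyone else gets a decoy that only points at a sitemap. The file contents, the disallowed path, the sitemap URL and the list of crawler User-Agent substrings are all hypothetical assumptions, and on Apache you would more likely do the same thing with rewrite rules in .htaccess. As pointed out upthread, the User-Agent header is trivially spoofed, so this only fools casual visitors.

```python
# Hypothetical robots.txt contents: the real rules for crawlers, and a decoy.
FULL_ROBOTS = (
    "User-agent: *\n"
    "Disallow: /private-area/\n"  # hypothetical path you don't want indexed
)
DECOY_ROBOTS = "Sitemap: http://www.example.com/sitemap.xml\n"  # hypothetical URL

# Assumed substrings of the User-Agent headers sent by the major crawlers.
KNOWN_BOTS = ("googlebot", "slurp", "msnbot", "bingbot")


def is_known_bot(user_agent):
    """Return True if the User-Agent string looks like a known crawler.

    The header is supplied by the client, so anyone can fake it.
    """
    ua = (user_agent or "").lower()
    return any(bot in ua for bot in KNOWN_BOTS)


def robots_app(environ, start_response):
    """Minimal WSGI app serving a different robots.txt depending on User-Agent."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]

    if is_known_bot(environ.get("HTTP_USER_AGENT")):
        body = FULL_ROBOTS
    else:
        body = DECOY_ROBOTS
    data = body.encode("ascii")

    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data))),
        # Tell caches that the response differs per User-Agent.
        ("Vary", "User-Agent"),
    ])
    return [data]
```

To try it locally you could pass robots_app to wsgiref.simple_server.make_server and call serve_forever(), then request /robots.txt with and without a Googlebot-style User-Agent.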
BB
--
http://www.kruse.co.uk/ http://www.fat-odin.com/ http://www.here-be-posters.co.uk/