
blocking robots.txt from non-robots

alt.internet.search-engines
Posted by Joe Fox on February 20, 2008, 12:14 am

I'm using a robots.txt file to control what is and is not crawled by search
engine bots, but I'd like to make sure that anything that isn't a known
search engine bot doesn't get the file I'm feeding to Google, Yahoo and the
others.

From what I've read this could be done with .htaccess, but I've not been
able to make heads or tails of that.

I'd really be grateful for some help here.

Thanks

Posted by John Bokma on February 20, 2008, 8:00 am

> I'm using a robots.txt file to control what is and is not crawled by
> search engine bots, but I'd like to make sure that anything that isn't
> a known search engine bot doesn't get the file I'm feeding to Google,
> Yahoo and the others.

Why?

I can imagine that you want to block your entire site for any bot that's
known to be abusive though, but those probably don't check your robots.txt
anyway.

--
John Bokma http://johnbokma.com/

Posted by Joe Fox on February 20, 2008, 2:16 pm

>
>> I'm using a robots.txt file to control what is and is not crawled by
>> search engine bots, but I'd like to make sure that anything that isn't
>> a known search engine bot doesn't get the file I'm feeding to Google,
>> Yahoo and the others.
>
> Why?
>
> I can imagine that you want to block your entire site for any bot
> that's known to be abusive though, but those probably don't check your
> robots.txt anyway.
>

Perhaps I didn't say it right. I'm wanting to block the robots.txt that
I'm feeding search engines from being given to anybody else. I realize
that they *could* spoof the SE's user agent or something, but the people
I'm concerned about are bright enough to look for robots.txt but not
bright enough to expect to be handed a phony one.



Posted by John Bokma on February 20, 2008, 3:17 pm

>
>>
>>> I'm using a robots.txt file to control what is and is not crawled by
>>> search engine bots, but I'd like to make sure that anything that
>>> isn't a known search engine bot doesn't get the file I'm feeding to
>>> Google, Yahoo and the others.
>>
>> Why?
>>
>> I can imagine that you want to block your entire site for any bot
>> that's known to be abusive though, but those probably don't check
>> your robots.txt anyway.
>>
>
> Perhaps I didn't say it right. I'm wanting to block the robots.txt
> that I'm feeding search engines from being given to anybody else.

Why? If the reason is that you want to "protect" some folders: it's not
secure and bound to fail sooner or later. Remember that not all bots honor
the robots.txt, especially not the ones that you don't want on your site
in the first place.
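
To make that concrete, here is a minimal Python sketch using the standard
library's urllib.robotparser. The rules, the /private/ path, the ExampleBot
name and example.com are all made up for illustration. It only shows that
the check is something a polite crawler chooses to run on its own side;
nothing in the file stops a client that never asks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """Return what a well-behaved crawler would decide for this URL.

    The decision is made entirely by the client. A bot that skips this
    call can still request the URL, and the Disallow line has told it
    where to look.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite bot stays out of /private/ but may fetch the front page.
    print(polite_can_fetch("ExampleBot", "http://example.com/private/page.html"))
    print(polite_can_fetch("ExampleBot", "http://example.com/index.html"))
```

If the folders really need protecting, that has to happen on the server
(for example with a password), not in robots.txt.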

> I realize that they *could* spoof the SE's user agent or something,
> but the people I'm concerned about are bright enough to look for
> robots.txt but not bright enough to expect to be handed a phony one.

You want to hide the key under a doormat that has "The key is hidden
nearby" written on top in 5 languages...

--
John Bokma http://johnbokma.com/

Posted by Joe Fox on February 21, 2008, 1:35 am

>
>>
>>>
>>>> I'm using a robots.txt file to control what is and is not crawled
>>>> by search engine bots, but I'd like to make sure that anything that
>>>> isn't a known search engine bot doesn't get the file I'm feeding to
>>>> Google, Yahoo and the others.
>>>
>>> Why?
>>>
>>> I can imagine that you want to block your entire site for any bot
>>> that's known to be abusive though, but those probably don't check
>>> your robots.txt anyway.
>>>
>>
>> Perhaps I didn't say it right. I'm wanting to block the robots.txt
>> that I'm feeding search engines from being given to anybody else.
>
> Why? If the reason is that you want to "protect" some folders: it's
> not secure and bound to fail sooner or later. Remember that not all
> bots honor the robots.txt, especially not the ones that you don't want
> on your site in the first place.

I want to keep certain humans from reading the robots.txt that I give to
search engines, because it's none of their bloody business what pages I
tell SEs not to index, and there are a few that might have mind enough to
look at robots.txt. They will not, however, expect to be handed a tailored
version of it.

>> I realize that they *could* spoof the SE's user agent or something,
>> but the people I'm concerned about are bright enough to look for
>> robots.txt but not bright enough to expect to be handed a phony one.
>
> You want to hide the key under a doormat that has "The key is hidden
> nearby" written on top in 5 languages...

Not really, or is it possible that they could also get my .htaccess? I
didn't think that was possible. If they ask for a robots.txt and get one
that's got nothing more than a pointer to a sitemap, that will satisfy
'em.
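
For anyone following along, the decision being described can be sketched
in a few lines of Python. This is not the .htaccess version, only the same
logic written out so it is easy to see. The bot tokens, the /hidden/ path
and the example.com sitemap URL are assumptions made up for the example,
and a real list of crawlers would need to be maintained by hand.

```python
# Substrings assumed to identify the crawlers that get the real file.
# This list is illustrative, not complete.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")

# Hypothetical file contents, invented for this example.
REAL_ROBOTS = "User-agent: *\nDisallow: /hidden/\n"
DECOY_ROBOTS = "Sitemap: http://example.com/sitemap.xml\n"


def robots_body_for(user_agent: str) -> str:
    """Pick which robots.txt body to serve based on the User-Agent header.

    The header is supplied by the client, so anyone who sends a string
    containing one of the tokens receives the real file.
    """
    ua = (user_agent or "").lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return REAL_ROBOTS
    return DECOY_ROBOTS


if __name__ == "__main__":
    # An ordinary browser gets the decoy with the sitemap pointer.
    print(robots_body_for("Mozilla/5.0 (Windows; U; Windows NT 5.1)"))
    # A client that merely claims to be Googlebot gets the real file.
    print(robots_body_for("Googlebot/2.1"))
```

The second call is the spoofing case mentioned earlier in the thread: the
check keeps out casual readers, but it trusts whatever name the client
gives.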
