
blocking robots.txt from non-robots

Subject Author Date
blocking robots.txt from non-robots Joe Fox 02-20-2008
Posted by John Bokma on February 21, 2008, 10:06 pm

>
>>
>>> Not really, or is it possible that they could also get my .htaccess?
>>> I didn't think that was possible. If they ask for a robots.txt and
>>> get one that's got nothing more than a pointer to a sitemap that will
>>> satisfy 'em.
>>
>> Let's assume for argument's sake that those people *want* to see your
>> robots.txt. If you feed Google something different than them, they
>> will notice as soon as they check Google, because if you disallow
>> Google some directories, while your robots.txt says allow, they will
>> wonder why all pages in some directory don't show up in Google, but
>> are available on your site.
>
> You give the majority of the general public too much credit ;)
> Comparing a website's robots.txt to Google results!

People who are interested in robots.txt certainly do *not* fall in what
you call the "general public". Furthermore, I have no doubt that most
people who *do* know to request robots.txt and need to be stopped from
seeing the one that the OP wants to feed to SEs have at least some basic
knowledge about robots.txt and how to use Google.
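
To make the mismatch concrete, here is a minimal sketch using Python's standard urllib.robotparser. Both robots.txt variants, the /private/ path and the example.com URLs are made up for illustration and do not come from anyone's actual site. It only shows that the same URL gets a different answer depending on which file is served, which is the discrepancy a curious visitor could spot by comparing the public file with what is actually indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file handed to search engine crawlers (assumption).
SERVED_TO_CRAWLERS = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical "tailored" file handed to everyone else (assumption).
SERVED_TO_VISITORS = """\
User-agent: *
Disallow:
Sitemap: http://example.com/sitemap.xml
"""

URL = "http://example.com/private/page.html"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("crawler copy allows it:", allowed(SERVED_TO_CRAWLERS, "Googlebot", URL))
    print("visitor copy allows it:", allowed(SERVED_TO_VISITORS, "Googlebot", URL))
```

The visitor copy claims the page may be crawled while the crawler copy forbids it, so the page's absence from the index is the giveaway.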

--
John Bokma http://johnbokma.com/

Posted by Don on February 21, 2008, 8:52 pm

>
>>
>>>
>>>>
>>>>> I'm using a robots.txt file to control what is and is not crawled
>>>>> by search engine bots, but I'd like to make sure that anything that
>>>>> isn't a known search engine bot doesn't get the file I'm feeding to
>>>>> Google, Yahoo and the others.
>>>>
>>>> Why?
>>>>
>>>> I can imagine that you want to block your entire site for any bot
>>>> that's known to be abusive though, but those probably don't check
>>>> your robots.txt anyway.
>>>>
>>>
>>> Perhaps I didn't say it right. I'm wanting to block the robots.txt
>>> that I'm feeding search engines from being given to anybody else.
>>
>> Why? If the reason is that you want to "protect" some folders: it's
>> not secure and bound to fail sooner or later. Remember that not all
>> bots honor the robots.txt, especially not the ones that you don't
>> want on your site in the first place.
>
> I want to keep certain humans from reading the robots.txt that I give
> to search engines because it's none of their bloody business what
> pages I tell SE's not to index and there are a few that might have
> mind enough to look at robots.txt. They will not, however, expect to be
> handed a tailored version of it.
>
>>> I
>>> realize that they *could* spoof the SE's user agent or something,
>>> but my concerns are bright enough to look for robots.txt but not
>>> bright enough to expect to be handed a phoney.
>>
>> You want to hide the key under the doormat which has in 5 languages
>> "The key is hidden nearby" written on top...
>
> Not really, or is it possible that they could also get my .htaccess?
> I didn't think that was possible. If they ask for a robots.txt and
> get one that's got nothing more than a pointer to a sitemap that will
> satisfy 'em.

The most effective way to do this is to deny access to robots.txt for
specific IP ranges in .htaccess.

As for denying robots.txt to the entire general public: it's bad
practice, as the majority of the GP have never even heard of
robots.txt.
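
The IP-range idea can be sketched as follows. This is not Apache configuration, just the same decision expressed in Python with the standard ipaddress module, assuming the denied ranges are known in advance. The ranges below are reserved documentation addresses chosen for illustration, and may_see_robots_txt is a hypothetical helper name.

```python
import ipaddress

# Hypothetical denied ranges (documentation-only addresses, an assumption;
# a real list would have to come from the site owner's own logs).
DENIED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def may_see_robots_txt(client_ip: str) -> bool:
    """Return False when the client address falls inside a denied range."""
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in DENIED_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "->", "serve" if may_see_robots_txt(ip) else "deny")
```

A check like this is only as good as the list of ranges, so it is of limited use against visitors whose addresses change.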



Posted by Phil Payne on February 20, 2008, 3:55 pm
> Perhaps I didn't say it right. I'm wanting to block the robots.txt that
> I'm feeding search engines from being given to anybody else.

If Google catch you they will exclude you from the index.

'Don't deceive your users or present different content to search
engines than you display to users, which is commonly referred to as
"cloaking." '

Posted by Joe Fox on February 21, 2008, 1:10 am

>> Perhaps I didn't say it right. I'm wanting to block the robots.txt
>> that I'm feeding search engines from being given to anybody else.
>
> If Google catch you they will exclude you from the index.
>
> 'Don't deceive your users or present different content to search
> engines than you display to users, which is commonly referred to as
> "cloaking." '
>


I can't believe this.

I'm not trying to cloak my content or pull anything underhanded.

I have robots.txt set to tell Google and others to disallow certain pages.

I don't want certain humans (only a few hundred in number but all on
dynamic IPs in several countries) to be able to read the robots.txt that
I'm giving search engines because I don't want them to know what pages I
am telling SE's "disallow".

What's so wrong with this?

That robots.txt is not these people's business and I don't want them to
read it. If I knew all of the IP addresses that they connected from I
would block 'em that way but as I said, they're all dynamic from a
variety of ISP's in several countries.
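
Since the visitors' addresses cannot be listed, the check can be turned around: instead of identifying the humans, confirm that a requester claiming to be a crawler really is one. A common way to do that is a forward-confirmed reverse DNS lookup, sketched below with Python's socket module. The hostname suffixes are an assumption for illustration and should be checked against each search engine's own documentation; is_verified_crawler and the two lookup helpers are hypothetical names. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Sequence

# Assumed crawler hostname suffixes (illustrative; verify per search engine).
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Sequence[str]:
    """Resolve a hostname to its list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Sequence[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must carry a known
    suffix and must resolve back to the same IP address."""
    try:
        host = reverse(ip)
        if not host.lower().endswith(CRAWLER_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

A spoofed user agent alone does not pass this check, because the requester's address has to map to a crawler hostname and back again.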

Posted by Big Bill on February 21, 2008, 6:54 am

>
>>> Perhaps I didn't say it right. I'm wanting to block the robots.txt
>>> that I'm feeding search engines from being given to anybody else.
>>
>> If Google catch you they will exclude you from the index.
>>
>> 'Don't deceive your users or present different content to search
>> engines than you display to users, which is commonly referred to as
>> "cloaking." '
>>
>
>
>I can't believe this.
>
>I'm not trying to cloak my content or pull anything underhanded.

You are, though: you're trying to cloak your robots.txt.

BB
--

http://www.kruse.co.uk/
http://www.fat-odin.com/
http://www.here-be-posters.co.uk/
