
What is going on with the Search Engines?

Newsgroup: comp.infosystems.www.authoring.html
Subject Author Date
What is going on with the Search Engines? Steve 03-15-2005
Posted by Steve on March 15, 2005, 1:54 am


I notice that search engines are now finding robots.txt files and cataloguing
their contents. Is this wise, I wonder? Is it a possible security risk?

I even found the White House robots.txt file on Google. Surely disclosing
details of the directory structure is an open invitation for hackers to 'take a
look'?

Does anyone else feel the same way? Should we be bringing this to the search
engines' attention?

Comments appreciated.




Steve


Posted by Leif K-Brooks on March 14, 2005, 9:25 pm


Steve wrote:
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise, I wonder? Is it a possible
> security risk?
>
> I even found the White House robots.txt file on Google. Surely
> disclosing details of the directory structure is an open invitation for
> hackers to 'take a look'?

What do search engines have to do with it? robots.txt files aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).

Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore
part of your public Web API.


Posted by Steve on March 15, 2005, 11:07 am


Leif K-Brooks wrote:
> Steve wrote:
>
>> I notice that search engines are now finding robots.txt files and
>> cataloguing their contents. Is this wise, I wonder? Is it a possible
>> security risk?
>>
>> I even found the White House robots.txt file on Google. Surely
>> disclosing details of the directory structure is an open invitation
>> for hackers to 'take a look'?
>
>
> What do search engines have to do with it? robots.txt files aren't
> magically private, you know; you can view them even without a search
> engine's help (e.g. http://whitehouse.gov/robots.txt).
>
> Also, why would your directory structure be a security risk? If it needs
> to be listed in a robots.txt, it presumably has a URI, and is therefore
> part of your public Web API.

Fair comment, but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.



Steve


Posted by Leif K-Brooks on March 15, 2005, 6:24 am


Steve wrote:
> Leif K-Brooks wrote:
>
>> Steve wrote:
>>
>>> I notice that search engines are now finding robots.txt files and
>>> cataloguing their contents. Is this wise, I wonder? Is it a possible
>>> security risk?
>>
>> Also, why would your directory structure be a security risk? If it
>> needs to be listed in a robots.txt, it presumably has a URI, and is
>> therefore part of your public Web API.
>
> Fair comment, but I still feel that, in this day and age, it is silly to
> advertise anything about your site / server / records other than what
> you truly want to be public information.

If the directory structure is listed in robots.txt, it could presumably
be found by crawling your site even without robots.txt being available.
Do you propose creating Web sites without any internal links as a
security precaution?
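
To illustrate the crawling point, here is a minimal Python sketch of how a directory structure leaks through ordinary page markup, with no robots.txt involved. The page content, the `LinkCollector` class and the `top_level_dirs` helper are all invented for this example; a real crawler would follow the links it finds rather than just listing them.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets that any visitor or crawler can see."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def top_level_dirs(html):
    """Return the sorted first path segments referenced by the page."""
    collector = LinkCollector()
    collector.feed(html)
    dirs = set()
    for target in collector.targets:
        parts = [p for p in urlparse(target).path.split("/") if p]
        # Only count a segment as a directory if something follows it.
        if len(parts) > 1:
            dirs.add("/" + parts[0] + "/")
    return sorted(dirs)


# Hypothetical page markup, made up for this illustration.
SAMPLE_PAGE = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="/scripts/menu.js"></script>
</head><body>
<img src="/photos/team.jpg" alt="Team">
<a href="/about.html">About</a>
</body></html>
"""

print(top_level_dirs(SAMPLE_PAGE))  # ['/css/', '/photos/', '/scripts/']
```

Anything a browser needs in order to render the page is visible in the same way, so hiding robots.txt would not hide these paths.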


Posted by Andy Dingley on March 15, 2005, 1:44 pm


It was somewhere outside Barstow when Steve wrote:

>Fair comment, but I still feel that, in this day and age, it is silly to
>advertise anything about your site / server / records other than what you truly
>want to be public information.

robots.txt _is_ public information. So is the content to which it
refers.

The function of robots.txt is to describe publicly visible resources
so as to identify those which are worth indexing as potential entry
points to the site, and those which are available to the public but
should not be treated as entry points.

/css/, /scripts/ and /photos/ should probably be in there and
forbidden, because you want these to be served to "the public", but
you don't want them treated as independent entry points to the site.
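
As a rough sketch of what that looks like in practice, the snippet below feeds a small robots.txt to Python's standard `urllib.robotparser` and asks whether a well-behaved robot may fetch two URLs. The robots.txt content, the `example.com` URLs and the `may_crawl` helper are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: resources served to the public,
# but not wanted as independent entry points.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /scripts/
Disallow: /photos/
"""


def may_crawl(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a robot honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(may_crawl("http://example.com/index.html"))    # True
print(may_crawl("http://example.com/css/site.css"))  # False
```

Note that `False` here only means a polite robot will stay away. The server still hands /css/site.css to anyone who asks for it.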

/intranet/, /extranet/ and /secret_server_config/ can either be in
there or not. If you want to keep these secure, you _must_ have some
independent means to secure them.

It's a basic principle of good security practice that it must not
matter if these "secrets" are identified by robots.txt etc. They must
have their security enforced independently. There's also a mild
argument for not listing them at all, because listing highlights
their existence and slightly increases the risk of encouraging an attack
(although the flaky _vti_cnf can be assumed to exist anyway, without
needing to be told about it).
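
To show how little effort it takes to read the "secrets" out of a robots.txt, here is a minimal Python sketch that lists every Disallow path in a file. The sample content and the `disallowed_paths` helper are invented for this example, and the parsing is deliberately simplistic (it ignores which User-agent group a rule belongs to).

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow value found in robots_txt."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt, made up for this illustration.
SAMPLE = """\
User-agent: *
Disallow: /css/
Disallow: /intranet/   # listed, so now everyone knows it exists
Disallow: /secret_server_config/
"""

print(disallowed_paths(SAMPLE))
# ['/css/', '/intranet/', '/secret_server_config/']
```

Since anyone can run the equivalent of this against a public robots.txt, the listed paths need their own access control (authentication, network restrictions and so on) regardless of whether they appear in the file.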


I think reading some of Bruce Schneier's work, or Ross Anderson's
book, would be interesting for you.

--
Smert' spamionam

