November 24, 2007, 8:02 pm
Unvalidated Robots.txt Risks Google Banishment
The web crawling Googlebot may find a forgotten line in robots.txt
that causes it to de-index a site from the search engine.
Webmasters welcome being dropped from Google about as much as
they enjoy flossing with barbed wire. Making it easier for Google
to do that would be anathema to any webmaster. Why willingly
exclude one's site from Google?
That could happen with an unvalidated robots.txt file. Robots.txt
allows webmasters to provide standing instructions to visiting
spiders, which helps get a site indexed faster. Google has been
considering new syntax to recognize within robots.txt. The
Sebastians-Pamphlets blog said Google confirmed recognizing
experimental syntax like Noindex in the robots.txt file.
This poses a danger to webmasters who have not validated their
robots.txt. A line reading Noindex: / could lead to one's site
being completely de-indexed.
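For illustration only, a hypothetical robots.txt fragment of the kind described above; the site and paths are invented. Under the experimental syntax Google reportedly confirmed, the last line could de-index the entire site:

```
User-agent: Googlebot
Disallow: /cgi-bin/
# A forgotten experimental directive -- could remove the whole site from the index
Noindex: /
```

A webmaster skimming only the Disallow lines could easily miss a stray directive like this, which is exactly why validation matters.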
The surname-less Sebastian recommended Google's robots.txt
analyzer, part of Google's Webmaster Tools, and advised using
only the Disallow, Allow, and Sitemap crawler directives in the
Googlebot section of robots.txt.
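Google's own analyzer is the recommended check, but a minimal sketch of the idea is easy to write. The script below is an assumption-laden illustration, not Google's tool: it scans a robots.txt body and flags any directive outside the small safe set the article mentions, which would catch a forgotten Noindex line.

```python
# Minimal robots.txt sanity check (a sketch, not Google's validator).
# The "safe" directive list is taken from the article's recommendation;
# anything else, such as the experimental Noindex, gets flagged.
SAFE_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}

def risky_lines(robots_txt: str) -> list:
    """Return (line_number, line) pairs whose directive is not in the safe set."""
    flagged = []
    for n, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue  # blank line or not a directive
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in SAFE_DIRECTIVES:
            flagged.append((n, line))
    return flagged

# Hypothetical robots.txt body containing a forgotten Noindex line.
sample = """User-agent: Googlebot
Disallow: /private/
Noindex: /
"""
print(risky_lines(sample))  # the experimental Noindex line is flagged
```

Running this against a live robots.txt before publishing it would surface exactly the kind of forgotten line the article warns about.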
Ed Jay