Robots.txt - syntax question.

Is this correctly formatted?

User-agent: ia_archiver
Disallow: /

User-agent: Googlebot-Image
Disallow: /*.gif$

User-agent: Googlebot-Image
Disallow: /*.jpg$

User-agent: *
Disallow: /norobots/

What I'm hoping to accomplish is:

1. to keep the ia_archiver bot (Wayback Machine) from archiving any part
of my site.
2. to keep any of my images (gif or jpg) from appearing in the Google
Image search.
3. to allow all other bots to index and follow any part of my site with
the exception of anything that is located in the /norobots/ folder.

I also have a NOARCHIVE tag inserted in the META of each page on my
site - which I guess is a little redundant in the case of the IA bot,
but oh well :)
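
One way to sanity-check goals 1 and 3 is a minimal sketch using Python's standard-library `urllib.robotparser`. The Googlebot-Image groups are left out on purpose, since this sketch does not assume the standard-library parser understands Google's wildcard extension. The bot name `SomeBot` and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two non-wildcard groups from the file above (goals 1 and 3).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /norobots/
"""


def build_parser(text):
    """Feed robots.txt text to the parser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "SomeBot" and example.com are hypothetical, used only for this check.
checks = [
    ("ia_archiver", "http://example.com/index.html"),
    ("SomeBot", "http://example.com/index.html"),
    ("SomeBot", "http://example.com/norobots/page.html"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

If the file behaves as intended, ia_archiver should be refused everywhere, while any other bot should be refused only under /norobots/.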


Re: Robots.txt - syntax question.


Your syntax is incorrect for the gif blocking section. Wildcard symbols are
not allowed in robots.txt. You can only disallow all or nothing, referring
to a filename or folder contents. In the case of the Google-image bot you
should use this to block it from indexing your images, anywhere on the site:

User-agent: Googlebot-Image
Disallow: /

Your other two directives are correct. Ia_archiver will index nothing at
all. All well-behaved robots will ignore the "norobots" directory and all of
its contents.

Posted IMHO, by 'Wiz'
Please reply to the group

Re: Robots.txt - syntax question.


If I recall correctly wildcards are not allowed by the standard.

Check at


Re: Robots.txt - syntax question.

Thanks guys...

Wildcard use is specific to Googlebot.

I copied the info from here (at the very bottom of the page):

Additionally, Google has introduced increased flexibility to the
robots.txt file standard through the use of asterisks. Disallow patterns
may include "*" to match any sequence of characters, and patterns may
end in "$" to indicate the end of a name. To remove all files of a
specific file type (for example, to include .jpg but not .gif images),
you'd use the following robots.txt entry:
User-agent: Googlebot-Image
Disallow: /*.gif$
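
The matching rule quoted above can be sketched in a few lines of Python. This is only an illustration of the description ("*" matches any sequence of characters, a trailing "$" marks the end of the name), not Google's actual implementation. The function names and the sample paths are made up.

```python
import re


def google_pattern_to_regex(pattern):
    """Translate a Disallow pattern with '*' and a trailing '$' to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without '$' the pattern stays a prefix match, as in the original rules.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(pattern, path):
    """True if the path matches the pattern from its first character."""
    return google_pattern_to_regex(pattern).match(path) is not None


# Hypothetical paths, checked against the rule from the quoted text.
for path in ("/images/logo.gif", "/images/logo.gif?size=2", "/images/photo.jpg"):
    print(path, is_disallowed("/*.gif$", path))
```

Under this reading, only paths that end in .gif are caught. A .gif URL with a query string slips through because of the "$" anchor.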

Re: Robots.txt - syntax question.

In reply to Borek:

To be pedantic about it.. It's not that wildcards aren't ALLOWED by the
standard, but that they aren't HANDLED by it. So a "*" is not illegal,
but simply means the literal character "*" rather than some other
special meaning.

However .. Google's handling of robots.txt DOES include special meanings
for wildcards, and in that sense is non-standard.


Re: Robots.txt - syntax question.

In reply to Wolfman's Brother:



Two common errors:

    * Wildcards are _not_ supported: instead of 'Disallow: /tmp/*' just say
'Disallow: /tmp/'.



Roy S. Schestowitz

Re: Robots.txt - syntax question.

In reply to Roy Schestowitz:

Quite so.

"Disallow: /tmp/*" is not wrong, but means: disallow access to files
whose names start with the six characters "/" "t" "m" "p" "/" "*"

If you mean: Disallow all files in /tmp, you say..

Disallow: /tmp/

But if you have files in /tmp/ that have a "*" as the first character of
their names (heaven alone knows why you'd want to), which you want to
disallow, then you'd say..

Disallow: /tmp/*

Only trouble is that Google will read this differently. The moral is:
don't put "*" into your file names.
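
The two readings of "Disallow: /tmp/*" can be compared side by side. This is a rough sketch: `literal_match` treats the rule as a plain prefix, as described for the original convention, and `wildcard_match` treats "*" as "any run of characters", as described for Google. Both function names and both sample paths are invented for illustration, and the "$" anchor is ignored here.

```python
import re


def literal_match(rule, path):
    # Original convention: the rule is a plain prefix, '*' is just a character.
    return path.startswith(rule)


def wildcard_match(rule, path):
    # Google-style reading: each '*' stands for any run of characters.
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(regex, path) is not None


RULE = "/tmp/*"
# Hypothetical paths: an ordinary file and one whose name starts with '*'.
for path in ("/tmp/report.html", "/tmp/*star.html"):
    print(path, literal_match(RULE, path), wildcard_match(RULE, path))
```

The ordinary file is blocked only under the wildcard reading, while the file whose name starts with "*" is blocked under both.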


Re: Robots.txt - asterisk

In reply to Wolfman's Brother:

Huh? Wha? *smile* I didn't even know it was possible to prefix a file with
an asterisk until I checked. The *NIX filesystem will turn *file into
\*file (note the escape character), but will do so quite transparently.

In any case, preceding filenames with symbols is bad practice. Likewise my
bad habit of capitalising directory names, which causes poorer browsers
like Lynx to trigger many 404s in my error logs and disappoint the visitor.

Windows, on the other hand, imposes unexplained limitations on path length
and filename length. This is why I had to steer away from it altogether and
can never come back. Good riddance, too.


Roy S. Schestowitz

Re: Robots.txt - syntax question.



I believe I have a robots.txt validator linked to on

         Elvis does my seo
