html form metacharacters?



I've written a CGI script that puts funny-looking links to other CGI
scripts on a page, like:

<a href = " does "better than a turnip" mean?">what does
"better than a turnip" mean?"</a>

Unfortunately, of course, ? and " are both metacharacters with special
meaning in HTML, and this completely screws up the execution of
my script.

I just need to write some regular expressions that replace these
metacharacters with character combinations a browser understands better,
such as the character combinations on this page.

But I'd like a list of what ALL the metacharacters used on forms are. Does
anyone know of such a list?

Re: html form metacharacters?

Dfenestr8 wrote:


You need to URL-escape them; for example, " becomes %22. See RFC 1738 for
an idea of which characters need to be escaped and which don't, and see
"man ascii" for the hex codes you need.
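A minimal sketch of the escaping described above, using Python's standard `urllib.parse.quote` (the original script's language is not stated, so Python is an assumption). Each unsafe character becomes `%` followed by its two-digit ASCII hex code, which is exactly the value "man ascii" lists.

```python
from urllib.parse import quote

question = 'what does "better than a turnip" mean?'

# safe="" asks quote() to escape everything except letters, digits and "_.-~",
# including "/" (which quote() leaves alone by default).
escaped = quote(question, safe="")
print(escaped)  # what%20does%20%22better%20than%20a%20turnip%22%20mean%3F

# The escape is just the ASCII code in hex: '"' is 0x22 and '?' is 0x3F.
print("%%%02X" % ord('"'), "%%%02X" % ord("?"))  # %22 %3F
```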

Toby A Inkster BSc (Hons) ARCS

Re: html form metacharacters?


No, the question mark has no special meaning _in HTML_. It has a special
meaning in a URL, though. The quotation mark has a special meaning in HTML,
but _only_ in an attribute value (which you have here).
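Because the link in question involves both layers, a sketch may help: first URL-encode the data (so `?` is not read as URL syntax), then HTML-escape the finished URL for the attribute value (so `"` and `&` cannot end or break the attribute). The script name `answer.cgi` and the parameter name `q` are hypothetical placeholders, and Python is assumed. Escaping `"` in the link text is not strictly required, since it is only special inside attribute values, but it is harmless.

```python
from html import escape
from urllib.parse import quote

def make_link(question: str) -> str:
    """Build an <a> element whose href carries `question` as query data.

    'answer.cgi' and the parameter name 'q' are hypothetical placeholders.
    """
    # Layer 1 (URL): encode the data so '?', '"', spaces etc. are not
    # mistaken for URL syntax.
    url = "answer.cgi?q=" + quote(question, safe="")
    # Layer 2 (HTML): escape the finished URL for use in an attribute value,
    # turning any remaining '&', '<', '>', '"' or "'" into entity references.
    href = escape(url, quote=True)
    # The visible link text only needs HTML escaping, not URL encoding.
    text = escape(question, quote=True)
    return '<a href="%s">%s</a>' % (href, text)

print(make_link('what does "better than a turnip" mean?'))
```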


Hopefully not. If you cannot find a library routine for that (often called
"urlencode" or "urlescape" or something like that), find a library that has
such a routine, or switch to a programming language that has such a
library. Remember the four virtues of a programmer: laziness, impatience,
hubris, and short memory.
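As an illustration of the "use a library routine" advice, Python's standard library happens to have one literally named `urlencode`; the sketch below assumes Python and builds the whole query string from a mapping. By default it encodes with `quote_plus`, so spaces become `+` rather than `%20`, and `parse_qs` undoes the encoding on the receiving side.

```python
from urllib.parse import urlencode, parse_qs

# urlencode() builds a complete query string from a mapping of names to values.
query = urlencode({"q": 'what does "better than a turnip" mean?'})
print(query)  # q=what+does+%22better+than+a+turnip%22+mean%3F

# The receiving CGI script can decode it again with parse_qs().
print(parse_qs(query))  # {'q': ['what does "better than a turnip" mean?']}
```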


Forms are not the issue here; URLs are. See the URL specifications if you
_really_ must know exactly which characters need to be URL encoded and
when. Normally only people who write library routines like "urlencode"
need to know such things. And even they can apply the rules somewhat
simplistically, since it's not wrong to URL encode a character that does
not need to be URL encoded in a particular context, provided that the
character is not being used in a specific role where the unencoded
character means something different from the encoded one.
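A small sketch of that caveat, assuming Python's `urllib.parse`: escaping an ordinary letter changes nothing after decoding, but escaping a reserved character that is playing a structural role does change the meaning. The path `reports/2003` is a made-up example.

```python
from urllib.parse import quote, unquote

# Over-encoding an ordinary character is harmless: "%41" decodes to "A".
assert unquote("%41BC") == "ABC"

# A reserved character can carry meaning. In a URL path a literal "/"
# separates segments, while "%2F" is just data inside a single segment.
path_two_segments = quote("reports/2003", safe="/")  # reports/2003
path_one_segment = quote("reports/2003", safe="")    # reports%2F2003
print(path_two_segments, path_one_segment)
```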

The current specification of generic URL syntax, including URL encoding
requirements, is RFC 2396, available as HTMLized by me at /
It superseded the generic part of RFC 1738 in 1998, so RFC 1738 should be
referred to _only_ in matters of specific URL schemes such as the specific
constraints on http: and ftp: URLs.
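To show how simple such a routine can be, here is a hypothetical hand-written encoder (`escape_rfc2396` is not a library function) that leaves only RFC 2396's "unreserved" characters alone (section 2.3: letters, digits and the marks `- _ . ! ~ * ' ( )`) and percent-encodes every other byte. RFC 2396 does not say how non-ASCII text maps to octets, so encoding to UTF-8 first is an assumption.

```python
import string

# RFC 2396, section 2.3: unreserved = alphanum | mark
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.!~*'()")

def escape_rfc2396(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode every octet that is not an RFC 2396 unreserved character.

    Hypothetical helper; the UTF-8 default is an assumption, not part of RFC 2396.
    """
    out = []
    for byte in text.encode(encoding):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)                # safe to leave as-is
        else:
            out.append("%%%02X" % byte)   # "%" + two uppercase hex digits
    return "".join(out)

print(escape_rfc2396('what does "better than a turnip" mean?'))
print(escape_rfc2396("it's (ok)!"))  # marks stay unescaped: it's%20(ok)!
```

Encoding more than this (for example the marks) would also be acceptable, as long as the escaped characters are not serving a structural role in the URL.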

Yucca, /
Pages about Web authoring:
