- Passing extended chars
- Sean O'Dwyer
March 10, 2005, 7:48 am
I sometimes need to find a set of records via PHP/SQL with
non-English/extended characters fed to the query via a hyperlink.
For example, I have a navigation link on a site that tries to pass the
phrase "Gesundheit und Schönheit" via http with the extended character
(ö) correctly encoded as an HTML entity (&ouml;).
However "Gesundheit und Schönheit" is not passed to my variable, only
"Gesundheit und Sch". I reckon the ampersand is causing trouble.
If correctly encoding extended chars as HTML entities isn't working for
what I want, how can I encode them for storage in XHTML or otherwise get
the result I want?
Re: Passing extended chars
Sean O'Dwyer wrote:
(Is that an example? If so, please follow RFC2606 and use
reserved domain names which won't conflict with current or
future ones; e.g., <http://host.invalid/>.)
URIs are made up of only a subset of US-ASCII, so once the
entity &ouml; is replaced by 'ö', the result is not a URI. You can
convert that IRI to a URI by converting 'ö' (U+00F6) to its
UTF-8 encoding and then percent-encoding each octet. Thus
'ö' becomes %C3%B6.
I don't know what happens in the wild, but that's the
ratified way of encoding characters that are not allowed in
URIs. See RFC3987 sec. 3.1.
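The conversion described above (UTF-8 first, then percent-encode each octet) can be sketched in a few lines. This is only an illustration in Python, not the original poster's code; the function name iri_text_to_uri_component is made up for the example. In PHP, rawurlencode() applied to a UTF-8 string should play the same role.

```python
import html
from urllib.parse import quote


def iri_text_to_uri_component(text: str) -> str:
    """Hypothetical helper: turn display text into a URI-safe component."""
    # Resolve HTML entities such as &ouml; into the real character first;
    # entities belong to HTML, not to URIs.
    decoded = html.unescape(text)
    # Encode to UTF-8 and percent-encode every octet outside the
    # unreserved set (safe="" means '/' is escaped as well).
    return quote(decoded, safe="", encoding="utf-8")


print(iri_text_to_uri_component("Gesundheit und Sch&ouml;nheit"))
# Gesundheit%20und%20Sch%C3%B6nheit
```

The two octets C3 B6 are the UTF-8 encoding of U+00F6, which is why a single 'ö' expands to %C3%B6.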
Here's how the expert Martin Dürst set up his URI:
&ouml; is simply a way to represent the character LATIN
SMALL LETTER O WITH DIAERESIS in HTML. The trouble is that
that character is not allowed unencoded in URIs.
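To see why the value arrives truncated, here is a rough reproduction in Python. It assumes the entity reached the server literally inside the query string, and the parameter name category is invented for the example. The '&' of the entity is read as a parameter separator, so everything after "Sch" is split off into a separate, empty parameter.

```python
from urllib.parse import parse_qs, urlencode


def read_category(query_string: str) -> str:
    """Return the 'category' parameter (a made-up name), or '' if absent."""
    params = parse_qs(query_string)
    return params.get("category", [""])[0]


# The entity's '&' ends the value early, as in the original question.
broken = "category=Gesundheit und Sch&ouml;nheit"
print(read_category(broken))   # Gesundheit und Sch

# Percent-encoding the UTF-8 octets keeps the value in one piece.
fixed = urlencode({"category": "Gesundheit und Schönheit"})
print(fixed)                   # category=Gesundheit+und+Sch%C3%B6nheit
print(read_category(fixed))    # Gesundheit und Schönheit
```

If the query string is later written into an HTML attribute, it still needs ordinary HTML escaping on top, but that is a separate layer from the URI encoding.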