represent any Unicode character by means of a markup string coded in us-ascii




I could change any Unicode character to its HTML notation, if only I
had a way to find out the Unicode value of the characters in the string
I'm given. But given a random set of string inputs, possibly copied and
pasted from WordPerfect or Microsoft Word or BBEdit on a Mac, I don't
know how to find the Unicode value of those characters.

Re: represent any Unicode character by means of a markup string coded in us-ascii

On Sat, 27 May 2005, wrote:


What's the context here?  In order to know what "characters" you have
been given, you need to know what encoding they are represented in. If
they're not an encoding of Unicode itself, then you can normally refer
to the appropriate cross-mapping table at the Unicode site to
determine the corresponding hexadecimal Unicode value.  That's the
value that you'd need (converted to decimal if you so choose) in the
&#...; representation in HTML.
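
To make that concrete, here is a minimal Python sketch, assuming you already know (or have guessed) the encoding the bytes arrived in. Python's codecs do the cross-mapping table lookup for you. The function name `to_ascii_markup` and its parameters are made up for illustration; `cp1252` and `mac_roman` are standard Python codec names.

```python
def to_ascii_markup(raw: bytes, encoding: str, use_hex: bool = False) -> str:
    """Decode raw bytes from a known encoding and return pure us-ascii markup.

    Every character outside us-ascii becomes a numeric character
    reference: decimal (&#233;) by default, hexadecimal (&#xE9;) on request.
    """
    text = raw.decode(encoding)  # the codec applies the cross-mapping table
    pieces = []
    for ch in text:
        code_point = ord(ch)  # the Unicode value of this character
        if code_point < 128:
            pieces.append(ch)  # already us-ascii, pass through
        elif use_hex:
            pieces.append(f"&#x{code_point:X};")
        else:
            pieces.append(f"&#{code_point};")
    return "".join(pieces)


# The same letter (e with acute accent) arrives as different bytes depending
# on the platform's encoding, but maps to the same Unicode value.
print(to_ascii_markup(b"caf\xe9", "cp1252"))     # Windows byte 0xE9
print(to_ascii_markup(b"caf\x8e", "mac_roman"))  # classic Mac byte 0x8E
```

Note that this does not escape `&` or `<` in the input; if the text is not already markup, you would want to handle those as well.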


If you're talking about forms submission, then the usual arrangement
is that the characters are submitted using the same character encoding
as the page containing the form they're submitted from.
For working with modern browsers, I'd normally recommend that you use
utf-8 for that.  (That's no good with NN4.*, though.)

(But if you've been sent utf-8 and you're willing to store files in
utf-8 then you don't really *have* to use &#...; representation
anyway.  It's your choice, really.)
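
As a rough sketch of that choice, assuming the form data really did arrive as utf-8 bytes: Python's built-in `xmlcharrefreplace` error handler will produce the &#...; form for you, or you can simply keep the utf-8. The variable names here are illustrative only.

```python
# Bytes as they might arrive from a form on a utf-8 page (assumed input).
submitted = "R = 10 \u03a9".encode("utf-8")

# Step 1: decode using the encoding the page (and hence the form) used.
text = submitted.decode("utf-8")

# Option A: store as utf-8 and serve the page as utf-8; no references needed.
stored_utf8 = text.encode("utf-8")

# Option B: store pure us-ascii, with &#...; for everything else.
stored_ascii = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(stored_ascii)
```

If the bytes turn out not to be valid utf-8, the `decode` call raises `UnicodeDecodeError`, which is a useful early warning that the client sent something else.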

You're then reliant on what the client platform actually does when
copy/pasting from another application window into the form.

That can have some unexpected glitches, since Word (especially older
versions) has a nasty habit of changing to a non-standard font, e.g.
Symbol, and inserting a Latin letter (e.g. W) to get a symbol (e.g. Omega
or Ohm sign).  This doesn't really work in HTML - MS of course will
fool its users by repeating the error in MSIE, but a properly
conforming www-compatible browser will display the W that the markup
asked for - not the symbol that was intended.
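
If you do receive markup with that glitch in it, one possible repair is sketched below. It assumes the offending text shows up literally as `<font face="Symbol">W</font>`, which is only one of the forms it might take, and the mapping table is deliberately partial. The names `SYMBOL_TO_UNICODE` and `fix_symbol_font` are hypothetical.

```python
import re

# Partial, illustrative table: Latin letter typed in the Symbol font
# mapped to the Unicode character the author actually intended.
SYMBOL_TO_UNICODE = {
    "W": "\u03a9",  # GREEK CAPITAL LETTER OMEGA
    "a": "\u03b1",  # alpha
    "b": "\u03b2",  # beta
    "m": "\u03bc",  # mu
    "p": "\u03c0",  # pi
}

# Assumed markup shape; real-world input may use other attributes or quoting.
_SYMBOL_RUN = re.compile(r'<font\s+face="Symbol">(.*?)</font>',
                         re.IGNORECASE | re.DOTALL)


def fix_symbol_font(html: str) -> str:
    """Replace Symbol-font runs with numeric character references."""
    def replace(match):
        run = match.group(1)
        # Leave the run untouched if any letter is missing from the table,
        # rather than silently producing the wrong character.
        if any(ch not in SYMBOL_TO_UNICODE for ch in run):
            return match.group(0)
        return "".join(f"&#{ord(SYMBOL_TO_UNICODE[ch])};" for ch in run)

    return _SYMBOL_RUN.sub(replace, html)


print(fix_symbol_font('R = 10 <font face="Symbol">W</font>'))
```

The result is markup that asks for the Omega itself, so a conforming browser shows what was intended without needing the Symbol font.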
