Converting accented characters to entities

I have a website with accented characters.  Do I have to convert them
into html entities in XHTML 1.0 strict and charset=iso-8859-1?

If so, could you recommend a freeware tool?

Thank you.

Re: Converting accented characters to entities

No, just make sure your pages are properly saved in ISO-8859-1 and that
the server is configured to deliver the correct charset in the
Content-Type header.

That's assuming ISO-8859-1 covers all the accented characters you need--
what language is it for? If it's French then you should be fine. If it's
Vietnamese (say) then you need a different encoding, probably UTF-8.
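
One way to check what the server actually sends is to look at the charset parameter of the Content-Type response header. A minimal Python sketch, assuming Python 3 and only its standard library; the function names and the example URL are hypothetical and do not come from this thread:

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def fetch_content_type(url):
    """Fetch a URL and return the raw Content-Type header the server sent."""
    with urlopen(url) as response:
        return response.headers.get("Content-Type")


if __name__ == "__main__":
    # Offline demonstration with a typical header value.
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    # Against a live site (hypothetical URL), one would call:
    # print(fetch_content_type("http://www.example.com/"))
```

If the header carries no charset parameter at all, the function returns None, which is a sign that the server configuration needs attention.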

Re: Converting accented characters to entities

Ben C wrote:
How can I check what the hosting server sends, please?
Yes that's French.

Re: Converting accented characters to entities

If you do things correctly, then they'll work equally well in any of
three ways (even mixed on the same page).
* Directly entered characters    "é"
* HTML entity references         &eacute;
* numeric character entities     &#233;
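
To illustrate that the three forms listed above name the same character, here is a small sketch using Python's standard `html` module as a stand-in for what a browser does when it resolves references (the variable names are only for illustration):

```python
import html

direct = "\u00e9"                      # the character é entered directly
named = html.unescape("&eacute;")      # HTML entity reference
numeric = html.unescape("&#233;")      # numeric reference (decimal code point)

# All three resolve to the same single character.
print(direct == named == numeric)
```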

Just make sure that the web server sends a _matching_ encoding for how
the document was itself encoded. It doesn't matter which encoding you
author in (of encodings that contain the characters you need), so long
as you match it with the HTTP content-type header.
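
A rough sketch of what goes wrong when the declared encoding does not match the bytes, assuming Python 3; decoding here plays the role of the browser interpreting the page, and the variable names are illustrative:

```python
text = "caf\u00e9"  # "café"

utf8_bytes = text.encode("utf-8")          # how the file was saved
latin1_bytes = text.encode("iso-8859-1")   # the same text saved as ISO-8859-1

# Matching declaration: the text round-trips cleanly.
matched = utf8_bytes.decode("utf-8")

# UTF-8 bytes read as ISO-8859-1: the two bytes of é show up as two characters.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the lone 0xE9 byte is not valid UTF-8,
# so a lenient decoder substitutes the replacement character U+FFFD.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(matched, utf8_read_as_latin1, latin1_read_as_utf8)
```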

Ignore <meta> inside the page. It's of no use on the web and is often

If you can't reliably control the HTTP content-type header, then use
either form of the entities.

If you can have the HTTP content-type header set once, but only once,
then set it to UTF-8  (this is quite common in a corporate

Some (surprisingly little-known) things that you ought to understand:

 - Unicode is a character set; UTF-8 is an encoding that represents it
as a sequence of bytes. The two are separate functions.

 - That Unicode character set is used throughout HTML, whether you
like it or not. When you use numeric character entities, even from an
ISO-8859-* page, the numbers you use refer to Unicode, not to ISO.
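
Both points can be seen in a short Python sketch (Python 3 assumed; the euro sign and ISO-8859-15 are picked here only as an illustration and are not from the original post). The code point is one number, the bytes depend on the encoding, and a numeric reference always uses the code point:

```python
import html

e_acute = "\u00e9"                               # é
e_code_point = ord(e_acute)                      # Unicode code point
e_utf8 = e_acute.encode("utf-8")                 # two bytes in UTF-8
e_latin1 = e_acute.encode("iso-8859-1")          # one byte in ISO-8859-1

euro = "\u20ac"                                  # €
euro_code_point = ord(euro)                      # Unicode code point
euro_latin9 = euro.encode("iso-8859-15")         # single byte 0xA4 in ISO-8859-15

# The numeric reference uses the Unicode number, not the ISO byte value:
from_unicode_number = html.unescape("&#8364;")   # the euro sign
from_iso_byte_value = html.unescape("&#164;")    # U+00A4, the generic currency sign
```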

I would suggest avoiding ISO-8859-* in favour of UTF-8.  Some of your
tools will no longer work, but there are plenty that will replace
them, and for free. These days a tool that isn't UTF-8 clean has
little place in a web design shop.  The great advantage of UTF-8 is
obviously when you have to support multiple languages - it's near-
essential for doing this on the same page, but it's even worth doing
if you only have to support different language clients from the same

Watch out for UTF-16 from some Windows tools!  That "Save as Unicode"
option is often the wrong thing - look further down for UTF-8.

Don't use a BOM (aka UTF-8Y) as that's incompatible with ASCII (and
most ISO-8859-*) encodings.
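
To check what a tool actually wrote, one can look at the first bytes of the file. The following is a minimal sketch, assuming Python 3; `sniff_bom` and `to_plain_utf8` are hypothetical helper names, and UTF-32 is deliberately not handled:

```python
import codecs


def sniff_bom(data):
    """Guess a decoder name from a leading byte order mark, or return None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # UTF-8 with a BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # typical result of "Save as Unicode"
    return None


def to_plain_utf8(data):
    """Re-encode BOM-prefixed UTF-8 or UTF-16 bytes as UTF-8 without a BOM."""
    encoding = sniff_bom(data) or "utf-8"
    # The "utf-8-sig" and "utf-16" decoders consume the BOM themselves.
    return data.decode(encoding).encode("utf-8")
```

For a real file, the bytes would come from something like `open(path, "rb").read()`.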

If your authoring process is only ASCII-clean and you only need
Western European characters, then the character entity references
(e.g. &eacute; rather than &#233; for "é") are simple and robust
against mistakes.
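
For the original question about a conversion tool, a few lines of Python can do this. This is a sketch rather than a finished tool: the function name is hypothetical, it relies on the standard `html.entities.codepoint2name` table (the HTML 4 entity names), and it leaves markup characters such as `&` and `<` alone:

```python
from html.entities import codepoint2name


def to_entity_references(text):
    """Replace non-ASCII characters with named references where one exists,
    falling back to numeric references otherwise. ASCII is left untouched."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            parts.append(ch)
        elif code_point in codepoint2name:
            parts.append("&%s;" % codepoint2name[code_point])
        else:
            parts.append("&#%d;" % code_point)
    return "".join(parts)


print(to_entity_references("caf\u00e9 No\u00ebl"))
```

The output is pure ASCII, so it survives any authoring process that is only ASCII-clean.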

If you need characters from outside Western Europe, then you can't
use character entity references (for any encoding). If you use
ISO-8859-1 encoding then you MUST use numeric character entities.  If
you use UTF-8 then you can use either characters entered directly, or
numeric character entities. As the numerics are hard to proof-read,
this alone is enough reason to favour UTF-8.
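
The ISO-8859-1 case can be sketched with Python's built-in `xmlcharrefreplace` error handler, which writes a numeric reference for every character the target encoding cannot hold. The function name and the sample text are only illustrative:

```python
def encode_for_latin1_page(text):
    """Encode text as ISO-8859-1 bytes, turning anything outside that
    encoding into a decimal numeric character reference."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# é fits in ISO-8859-1 and stays a single byte;
# ệ (U+1EC7) does not, so it becomes &#7879;
sample = encode_for_latin1_page("caf\u00e9, Vi\u1ec7t")
print(sample)
```

With UTF-8 as the target, no such substitution is needed, since every Unicode character can be encoded directly.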

I'd also suggest dropping XHTML in favour of HTML 4.01 Strict, but
that's for HTML reasons, not character encoding.

Re: Converting accented characters to entities

Use UTF-8 whenever you can.
UTF-8 is able to represent any character in the Unicode standard, yet
the initial encoding of byte codes and character assignments for UTF-8
is backwards compatible with ASCII.
For these reasons, it is steadily becoming the preferred encoding for
e-mail, web pages, and other places where characters are stored or


    * UTF-8 is a superset of ASCII. Since a plain ASCII string is also
a valid UTF-8 string, no conversion needs to be done for existing
ASCII text. Software designed for traditional non-extended ASCII
character sets can generally be used with UTF-8 with few or no
    * Sorting of UTF-8 strings using standard byte-oriented sorting
routines will produce the same results as sorting them based on
Unicode code points. (This has limited usefulness, though, since it is
unlikely to represent the culturally acceptable sort order of any
particular language or locale.)
    * UTF-8 and UTF-16 are the standard encodings for XML documents.
All other encodings must be specified explicitly either externally or
through a text declaration. [1]
    * Any byte oriented string search algorithm can be used with UTF-8
data (as long as one ensures that the inputs only consist of complete
UTF-8 characters). Care must be taken with regular expressions and
other constructs that count characters, however.
    * UTF-8 strings can be fairly reliably recognized as such by a
simple algorithm. That is, the probability that a string of characters
in any other encoding appears as valid UTF-8 is low, diminishing with
increasing string length. For instance, the octet values C0, C1, F5 to
FF never appear. For better reliability, regular expressions can be
used to take into account illegal overlong and surrogate values (see
the W3 FAQ: Multilingual Forms for a Perl regular expression to
validate a UTF-8 string).
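
Several of the properties in the list above can be checked in a few lines. This is a sketch assuming Python 3, whose strict UTF-8 decoder rejects overlong forms and encoded surrogates; it is not the Perl regular expression the list refers to, and the names are illustrative:

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Plain ASCII is already valid UTF-8, byte for byte.
ascii_is_utf8 = "hello".encode("ascii") == "hello".encode("utf-8")

# Sorting by UTF-8 bytes gives the same order as sorting by code point
# (Python compares str values by code point).
words = ["zebra", "\u00e9cole", "apple", "\u00c5ngstr\u00f6m"]
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code_point = sorted(words)
```

A lone ISO-8859-1 byte such as 0xE9 fails the check, which is why text in another encoding is rarely mistaken for UTF-8.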
