encoding question

Somehow I mixed the encodings on various pages. When I
validate the pages at W3, they pass but with a warning. On
the validation page I left "Encoding" and "Doctype" set to
"detect automatically".

You can see the validation warning for
http://www.hlthsys.com/aboutphil.html .

Correspondingly, http://www.hlthsys.com/ passes without the
warning.

Is there a short explanation of the mistake I've made, why
it is significant, and the effect it has on different browsers?

If that is too many questions, then, why should I care?

Thank you.

Bill B

Re: encoding question

Bill Braun wrote:

Apologies, those are direct links to the pages, not the


http://validator.w3.org/check?uri=http://www.hlthsys.com/


Re: encoding question

Bill Braun wrote:


Not really. You give conflicting information about encoding.


The warning itself is somewhat misleading:
"Character Encoding mismatch!"
It's really a conflict in information about encoding, not a mismatch of
encodings. The explanation says this rather well:
"The character encoding specified in the HTTP header (utf-8) is different
from the value in the XML declaration (iso-8859-1). I will use the value
from the HTTP header (utf-8)."

The XML declaration is
<?xml version="1.0" encoding="iso-8859-1"?>

The validator does not comment on the meta tag, which is yet another way to
specify encoding:
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
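If you want to check your own pages for this kind of conflict, here is a
rough Python sketch. It pulls the declared encoding out of the three places
mentioned above and reports whether they disagree. The function names and
the regular expressions are my own illustration, not what the W3C validator
does, and the patterns only cope with simple, well-formed declarations.

```python
import re


def declared_encodings(content_type_header, document_text):
    """Collect the encoding named in each of the three declaration sites.

    Hypothetical helper for illustration; it handles only simple cases.
    """
    found = {}

    # 1. HTTP header, e.g. "text/html; charset=utf-8"
    m = re.search(r"charset=([\w-]+)", content_type_header or "", re.I)
    if m:
        found["http"] = m.group(1).lower()

    # 2. XML declaration, which must come first in the document
    m = re.match(r"\s*<\?xml[^>]*encoding=[\"']([\w-]+)[\"']", document_text)
    if m:
        found["xml"] = m.group(1).lower()

    # 3. meta tag with a charset parameter
    m = re.search(r"<meta[^>]+charset=[\"']?([\w-]+)", document_text, re.I)
    if m:
        found["meta"] = m.group(1).lower()

    return found


def has_conflict(found):
    """True when the declarations name more than one encoding."""
    return len(set(found.values())) > 1


page = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<html><head>\n'
    '<meta http-equiv="content-type" '
    'content="text/html; charset=iso-8859-1" />\n'
    '</head><body></body></html>\n'
)
result = declared_encodings("text/html; charset=utf-8", page)
print(result, "conflict:", has_conflict(result))
```

With the header and declarations quoted above, the HTTP entry is utf-8 and
the other two are iso-8859-1, so the check reports a conflict.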

The conflict is resolved, according to HTML specifications, by giving
preference to HTTP headers, see

Browsers play by such rules, but the situation isn't really safe. In
particular, if your document is saved locally by a user, information about
HTTP headers is usually lost, i.e. it will be interpreted according to the
XML declaration. If it is saved in the cache of a search engine, who knows
what happens?

So the conflict should be avoided. You should specify the same encoding in
the XML declaration and the meta tag as in the HTTP headers.

The HTTP headers are sent by the server and might not be under your direct
control. The server software appears to be Apache, which means that you can
easily control the encoding information in the headers by using a
".htaccess" file - _if_ the server settings decided by the server
administrator allow that. If you can't control the settings, or just won't
bother, then you should of course make sure that your documents actually use
the declared settings (and, to be on the safe side, declare that same
encoding in the XML declaration).
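As a sketch of what such a ".htaccess" file might contain, assuming the
administrator allows these overrides and that your files really are saved as
ISO-8859-1 (swap in UTF-8 if that is what you actually use):

```apache
# Make Apache add "charset=ISO-8859-1" to text/html and text/plain responses
AddDefaultCharset ISO-8859-1

# Alternative, handled by mod_mime: tie the charset to a file extension
# AddCharset ISO-8859-1 .html
```

Whichever value you pick, it has to match the XML declaration and the meta
tag in the documents themselves, or the conflict simply moves elsewhere.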

In this particular case, the encoding information doesn't really matter. All
the characters in the document are in the ASCII range, so they have the same
representation in UTF-8 as in ISO-8859-1 (and in ASCII).

But the situation would be completely different if you e.g. entered the
copyright sign directly into the HTML document (instead of using the
entity reference &copy;, which of course works independently of encodings)
or replaced the ASCII apostrophe in "What's" by the typographically correct
curly apostrophe, using that character itself.
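A few lines of Python make the difference concrete. This is only an
illustration of the byte-level behavior; the sample strings are mine, not
taken from your pages.

```python
# ASCII-only text has identical bytes in ASCII, ISO-8859-1 and UTF-8.
ascii_text = "What's new"
same_bytes = (
    ascii_text.encode("ascii")
    == ascii_text.encode("iso-8859-1")
    == ascii_text.encode("utf-8")
)

# The copyright sign (U+00A9) is one byte in ISO-8859-1, two in UTF-8.
copyright_sign = "\u00a9"
latin1_bytes = copyright_sign.encode("iso-8859-1")
utf8_bytes = copyright_sign.encode("utf-8")

# UTF-8 bytes read as ISO-8859-1 turn into two characters.
misread = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8 are not valid at all.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# The curly apostrophe (U+2019) does not exist in ISO-8859-1.
curly_apostrophe = "\u2019"
try:
    curly_apostrophe.encode("iso-8859-1")
    curly_fits_latin1 = True
except UnicodeEncodeError:
    curly_fits_latin1 = False

print(same_bytes, latin1_bytes, utf8_bytes, repr(misread))
print(latin1_is_valid_utf8, curly_fits_latin1)
```

So as long as the page stays pure ASCII nothing visible goes wrong, but the
first non-ASCII character typed directly into the file will be displayed
wrongly whenever the declared encoding and the actual one disagree.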

The page http://www.hlthsys.com/ does not declare character encoding in an
XML declaration.

The page still has a similar problem; the validator just doesn't catch it,
since its heuristic checks are... er... heuristic.

By XHTML rules, the default encoding, implied in the absence of an XML
declaration, is utf-8. This could be overridden by HTTP headers but not by a
meta tag.

However, if the document is saved locally and opened in an XHTML-ignorant
browser such as Internet Explorer (any version up to and including IE 8), it
will be processed by HTML rules, without implying any XHTML defaults - and
using the meta tag information.
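To put the two sets of rules side by side, here is a small Python sketch of
the resolution order as I described it. Both function names are
hypothetical, and the sample values (utf-8 in the header, iso-8859-1 in the
meta tag) are assumed for illustration rather than read off the page.

```python
def xhtml_encoding(http_charset=None, xml_encoding=None):
    """XHTML rules as described above: the HTTP header wins, then the XML
    declaration, then the utf-8 default. A meta tag is not consulted."""
    return (http_charset or xml_encoding or "utf-8").lower()


def local_html_encoding(meta_charset=None):
    """A locally saved copy processed by HTML rules: no HTTP header and no
    XHTML default, so only the meta tag counts. Returns None when the
    browser would have to guess."""
    return meta_charset.lower() if meta_charset else None


# Served over HTTP: the header decides.
served = xhtml_encoding(http_charset="utf-8", xml_encoding=None)

# Saved locally, read by XHTML rules: header gone, default applies.
saved_as_xhtml = xhtml_encoding(http_charset=None, xml_encoding=None)

# Saved locally, read by HTML rules: the meta tag decides.
saved_as_html = local_html_encoding(meta_charset="iso-8859-1")

print(served, saved_as_xhtml, saved_as_html)
```

The first two agree on utf-8 while the last one yields iso-8859-1, which is
the same kind of disagreement the validator flagged on the other page.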

Yucca, http://www.cs.tut.fi/~jkorpela/