non-utf characters and XML


My problem:
I'm using PHP to dynamically create an XML document. However, some of
my data (from MySQL) contains non-UTF characters such as the umlaut.
Naturally, browsers like IE 7 throw an error when attempting to parse
these characters. I understand that these characters are invalid for

My question:
What is the best way to handle these characters when creating XML
documents on the fly? It seems like searching and replacing these
characters would be complicated, and there must be an easier way.


Re: non-utf characters and XML


Actually, umlauts are in UTF-8. But you should tell your browser which
character set you are using.
You could do that in the xml header, e.g.
   <?xml version="1.0" encoding="utf-8"?>

or set it in the header using php, e.g.
   header('content-type: text/html; charset=utf-8');

which is basically the same as the meta tag
   <meta http-equiv="Content-Type" content="text/html;charset=utf-8">

or let .htaccess do the job, e.g.
   AddCharset utf-8 .css .html .xhtml .xml .php

good luck

m2m server software gmbh

Re: non-utf characters and XML

On Thu, 08 Nov 2007 08:10:11 +0100, Martin Mandl - m2m tech support wrote:


Indeed. When using UTF-8, avoid a BOM btw.


Do serve XML as XML though, it isn't HTML.
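For illustration, the advice so far (declare UTF-8, no BOM, an XML content type) could be sketched roughly like this. The function names build_xml and send_xml are made up for the example, and application/xml is just one reasonable choice of content type; the input strings are assumed to be valid UTF-8 already.

```php
<?php
// Hypothetical helper: builds a tiny XML document from UTF-8 strings.
// The declaration is the very first thing in the output, so no BOM
// or stray whitespace comes before it.
function build_xml(array $names): string
{
    $xml = '<?xml version="1.0" encoding="utf-8"?>' . "\n<names>\n";
    foreach ($names as $name) {
        // ENT_XML1 escapes &, <, >, " and ' using XML's predefined entities.
        $xml .= '  <name>'
              . htmlspecialchars($name, ENT_XML1 | ENT_QUOTES, 'UTF-8')
              . "</name>\n";
    }
    return $xml . "</names>\n";
}

// Hypothetical helper: sends the document as XML rather than HTML.
// header() has to be called before any output is sent.
function send_xml(string $xml): void
{
    header('Content-Type: application/xml; charset=utf-8');
    echo $xml;
}
```

In a real script you would call something like send_xml(build_xml($rows)) once the data has been fetched.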
--

Rik Wasmus

Re: non-utf characters and XML


If you're only trying to communicate plain text, you can wrap your
text in a CDATA block. Or you can do a lot of str_replace() to change
them all to HTML entities.
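One caveat on the entity route: named HTML entities such as &uuml; are not predefined in XML (only &amp;, &lt;, &gt;, &quot; and &apos; are), so numeric character references are the safer target. A minimal sketch, assuming the mbstring extension is available and the input is already valid UTF-8; xml_text is a hypothetical name, not an existing function:

```php
<?php
// Hypothetical helper: escapes text for XML and turns every non-ASCII
// character into a numeric character reference (e.g. "ü" becomes "&#252;").
function xml_text(string $utf8): string
{
    // Escape the XML special characters first, so the "&" of the
    // references added in the next step is not escaped again.
    $escaped = htmlspecialchars($utf8, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // Map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0xFFFFFF];
    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}
```

This keeps the output pure ASCII, which sidesteps the encoding question at the cost of a slightly larger document.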

If the problem is that your XML is outputting things that your users
input, and your users are inputting a lot of junk, then all you can do
is filter out the non-UTF-8 stuff. A seems_utf8 function can be a help,
and is mentioned on this page:
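As a rough alternative to a hand-written check, the mbstring extension can test a string and convert it when it is not valid UTF-8. The sketch below is only that: to_utf8 is a hypothetical name, and the ISO-8859-1 fallback is a guess about the source data that may not hold for every database.

```php
<?php
// Hypothetical helper: returns the value unchanged if it is already
// valid UTF-8, otherwise converts it from an assumed fallback encoding.
function to_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Assumption: anything that is not valid UTF-8 is in $fallback.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}
```

Running each database value through a function like this before writing it into the XML means a Latin-1 "ü" (byte 0xFC) comes out as the two-byte UTF-8 sequence 0xC3 0xBC.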
