Debugging charset problems with XSLT and PHP4/MySQL4


We've been building a pretty big web app here for internal use. SMS
text messages come in from an aggregator and are stored in a MySQL 4
db. Our operators then deal with them using a web interface. The db is
queried using PHP4 and the results are output as XML, which is then
transformed using XSLT into XHTML.

Now, in our testing environment everything works just fine. However,
when we try to run it with actual live data, any incoming SMS message
that contains a non-ASCII character breaks the system at the Sablotron
stage (invalid token).

Now the aggregating service is sending us the incoming messages UTF-8
encoded. The XML and XSL are all set up to be UTF-8. However, somewhere
along the line something is getting screwed up so that Sablotron barfs
(typical examples are pound signs or euro signs).
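One way to narrow this down is to look at the raw bytes at each stage (straight out of MySQL, after building the XML, just before Sablotron) and check whether they are well-formed UTF-8. The following is only a minimal sketch: describe_bytes() is a made-up helper name, and mb_check_encoding() needs the mbstring extension and a PHP version that ships it, so treat its availability on an older PHP4 build as an assumption.

```php
<?php
// Hypothetical helper: say whether $s is well-formed UTF-8 and show its raw bytes in hex.
function describe_bytes($s)
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return $valid . ': ' . bin2hex($s);
}

$utf8_pound   = "\xC2\xA3"; // U+00A3 POUND SIGN encoded as UTF-8 (two bytes)
$latin1_pound = "\xA3";     // the same character as a single Latin1 byte

echo describe_bytes($utf8_pound), "\n";   // valid UTF-8: c2a3
echo describe_bytes($latin1_pound), "\n"; // NOT valid UTF-8: a3
```

A lone byte such as a3 in data that is declared as UTF-8 is the kind of input an XML parser rejects, so the first stage where the check fails is a good place to start looking.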

I'm having a hard time debugging this because as far as I can tell
everything is set to be using UTF-8 by default. Clearly something isn't
(MySQL possibly). I'd really appreciate some pointers for things to



Re: Debugging charset problems with XSLT and PHP4/MySQL4

After spending hours googling and checking mailing list archives I can
see that many have come across this problem but there is very little by
way of solutions. However, I think I have managed to isolate the
cause of the problem. It's the use of "echo" in various places.
I did not know that echo's output is always ASCII. Now I'd actually
like to re-write the various parts that use "echo" in a totally
different way, but in the meantime what is a multibyte equivalent of

Re: Debugging charset problems with XSLT and PHP4/MySQL4


My last response was a complete red herring. I got this inaccurate
information from the last comment on this bug report:

After going up that blind alley I did the sensible thing and tested for
myself whether echo would output UTF-8 by creating a UTF-8 PHP file with

echo "<long list of random non-ascii unicode characters>";

in it and of course it worked fine (with default_charset = "utf-8" in
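The same check can be done without relying on how a browser or editor renders the page, by capturing what echo writes and comparing it byte for byte with the input. This is a sketch only; the sample string and variable names are illustrative, and ob_get_clean() needs PHP 4.3 or later.

```php
<?php
// "£ €" written out as explicit UTF-8 bytes, so the file's own encoding does not matter.
$text = "\xC2\xA3 \xE2\x82\xAC";

// Capture exactly what echo sends to the output layer.
ob_start();
echo $text;
$written = ob_get_clean();

// Show the captured bytes in hex and compare them with the input.
echo bin2hex($written), "\n"; // c2a320e282ac
echo $written === $text ? "echo left the bytes untouched\n" : "echo changed the bytes\n";
```

echo just passes bytes through, so if the string is valid UTF-8 going in, it is valid UTF-8 coming out.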

I believe the problem was to do with our MySQL tables using Latin1. We
seem to have some kind of workaround in place using
mb_convert_encoding($xml, "UTF-8", "Latin1") before sending the XML
through Sablotron. Frankly I'm still confused but things are at least
working for now.
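For anyone else hitting this, here is a rough sketch of why that conversion would help, assuming the strings coming out of the tables really are Latin1 bytes. The XML fragment and variable names below are made up for illustration. One caveat worth knowing: what MySQL calls latin1 is really Windows-1252, where the euro sign is byte 0x80, and ISO-8859-1 has no euro sign at all. Whether your mbstring build accepts "Windows-1252" as an encoding name is something to verify on your own setup.

```php
<?php
// Hypothetical XML fragment as it might come back from a Latin1 table:
// the pound sign is the single byte 0xA3, which is not valid UTF-8.
$xml_latin1 = "<sms>Cost: \xA3" . "5</sms>";

// The workaround from the post, using the canonical name for Latin1.
$xml_utf8 = mb_convert_encoding($xml_latin1, 'UTF-8', 'ISO-8859-1');

var_dump(mb_check_encoding($xml_latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($xml_utf8, 'UTF-8'));   // bool(true)

// Pound sign: one Latin1 byte becomes the two-byte UTF-8 sequence.
echo bin2hex(mb_convert_encoding("\xA3", 'UTF-8', 'ISO-8859-1')), "\n";   // c2a3

// Euro sign: byte 0x80 only means "euro" in Windows-1252.
echo bin2hex(mb_convert_encoding("\x80", 'UTF-8', 'Windows-1252')), "\n"; // e282ac
echo bin2hex(mb_convert_encoding("\x80", 'UTF-8', 'ISO-8859-1')), "\n";   // c280 (a control character, not a euro)
```

Note that this conversion is only safe on data that really is Latin1. Running it over a string that is already UTF-8 double-encodes it (the pound sign c2a3 turns into c382c2a3), so it is worth checking the bytes first.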

Best, Darren
