I'm working on an application which stores web page content.  Generally
I encode the whole page as base64 for ease of storage (in a TEXT field).

But I have another field that is filled differently: the application
opens a socket to the page, pulls down the HTML source, runs strip_tags
and other PHP cleansing functions on it, and inserts the remaining words
into a MySQL TEXT column as plain text (not encoded as base64).
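
To make the two storage paths concrete, here is a rough sketch of the
idea.  It is only an illustration, not the application's actual code:
the function name page_to_words, the variable names and the sample HTML
are made up for this example, and the extra cleansing steps (dropping
script/style bodies, decoding entities, collapsing whitespace) are
assumptions about what "other cleansing functions" might involve.  It
also assumes the page text should end up as UTF-8.

```php
<?php
// Hypothetical helper: reduce an HTML page to its plain words.
function page_to_words(string $html): string
{
    // strip_tags() removes tags but keeps their contents, so drop
    // script and style bodies first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);

    // Put a space before every tag so adjacent words do not run together
    // once the tags are gone.
    $html = str_replace('<', ' <', $html);

    $text = strip_tags($html);

    // Turn entities such as &eacute; into real (non-ASCII) characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up sample page standing in for what comes back over the socket.
$html = '<html><body><h1>Caf&eacute;</h1><p>Hello   world</p></body></html>';

// Path 1: the whole page as base64, which is pure ASCII.
$stored_page = base64_encode($html);

// Path 2: the cleaned words as plain text, which may contain non-ASCII bytes.
$stored_words = page_to_words($html);
```

The base64 value is plain ASCII, while the cleaned text keeps whatever
non-ASCII characters the page contained.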

I run into a problem with foreign-language pages when I do a mysqldump.
Some of the characters fall outside standard ASCII, and I can't merely
"cat" the dump file back into a MySQL database.

How do folks who work with non-Latin alphabets deal with this?  Thanks,

Paul Bramscher

