Escape to Unicode?

I've begun dealing with PHP's XML functions (puttup!)

I should say: PHP's DEFAULT XML functions, no extensions.  Probably not
5.0.  I don't care...

The POINT is, they choke on funny characters, even encoded funny
characters.  You need to use the Unicode number instead (change
&ntilde; to &#241;).


That's the why. Now ignore that part, because it will distract and
probably cause you to misconstrue the thrust of the question to follow:

Does PHP have a function that will escape all funny characters in a
string (encoded, unencoded, both, either...) to their Unicode numeric
equivalents?

In a string, that is. Ignore the XML parts of this question.

(I'm looking at pre-processing the data coming in from forms that will
form the offending XML.)


Re: Escape to Unicode?

ReGenesis0 wrote:

[quoted text omitted]

There is no such thing as &ntilde; in generic XML. &ntilde; is a purely
HTML concept. There are only five pre-defined entities which XML parsers
are expected to know:

    &lt;  &gt;  &amp;  &apos;  &quot;

If PHP understood &ntilde; in generic XML it would be behaving
*incorrectly*. &ntilde; is undefined. (I'm assuming here that you've not
written a DTD that defines what &ntilde; means, which seems like a
reasonable assumption.)


It seems you want some function that converts:

    &ntilde;    => &#241;
    &euro;      => &#8364;

You might be able to do this using html_entity_decode() to get everything
in its raw form (e.g. will convert &euro; to €) and then use a regular
expression to convert things into numeric character references (e.g. €
to &#8364;). Such a regular expression can be found in soapergem at gmail
dot com's 10 May 2006 comment here:

That said, you're better off correcting the root problem -- that &ntilde;
is not correct XML.
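
For illustration, the two-step idea described above (decode everything to
raw characters, then turn every non-ASCII character into a numeric
character reference) might be sketched roughly as follows. This is not the
regular expression from the comment mentioned above; the function names
to_numeric_refs() and utf8_code_point() are made up for this example, and
the sketch assumes PHP 7 or later and UTF-8 input. Because decoding also
turns &lt; and &amp; back into raw characters, the sketch re-escapes the
five XML pre-defined entities before converting the rest.

```php
<?php
// Hypothetical helper: code point of ONE valid UTF-8 encoded character.
function utf8_code_point(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $lead = $bytes[0];
    if ($lead < 0x80) {
        return $lead;           // plain ASCII
    }
    if ($lead < 0xE0) {
        $cp = $lead & 0x1F;     // lead byte of a 2-byte sequence
    } elseif ($lead < 0xF0) {
        $cp = $lead & 0x0F;     // lead byte of a 3-byte sequence
    } else {
        $cp = $lead & 0x07;     // lead byte of a 4-byte sequence
    }
    // Each continuation byte contributes its low 6 bits.
    for ($i = 1; $i < count($bytes); $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical helper: named entities and raw non-ASCII characters
// both end up as numeric character references.
function to_numeric_refs(string $input): string
{
    // Step 1: decode named and numeric entities to raw UTF-8.
    $raw = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: re-escape the five characters XML treats specially.
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD.
    $escaped = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');

    // Step 3: replace every non-ASCII character with &#NNN;.
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . utf8_code_point($m[0]) . ';';
        },
        $escaped
    );

    // preg_replace_callback() returns null on failure; fall back to step 2.
    return $result ?? $escaped;
}

echo to_numeric_refs('Espa&ntilde;a costs &euro;5'), "\n";
// prints: Espa&#241;a costs &#8364;5
```

Running form input through something like this before building the XML
means the parser only ever sees the five pre-defined entities and numeric
references, both of which are valid without a DTD.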

Toby A Inkster BSc (Hons) ARCS

Re: Escape to Unicode?

Toby Inkster wrote:

[quoted text omitted]

...which is precisely what I intend to do -- I want to convert such tags
as they come in as form inputs, before they're sent to become XML files.

I'm not asking a question sideways of the problem and missing something
obvious, am I?  I hate when that happens...

