XML Parser

Hello everyone, I was wondering if someone might be able to point me in
the right direction. I have been working on a project that involves
reading XML files, taking some information from each file, and
inserting it into a MySQL db. I am fairly new to processing XML. I am
using XML::Simple to retrieve the needed information. I am getting all
the information I need, but when I pull it out of the XML the entities
come back converted to their single characters (&amp; becomes &, etc.).
Is there a way to keep the entities, or do I have to use something else
to convert the characters back into their escaped form
before I insert the string into the db?

Re: XML Parser

Simple isn't a parser. It uses a parser, though. If Simple supports
it, you have to tell it to pass on to the parser that you want raw
content (original_content()) instead of translated content.

Usually, though, you use a SAX parser (Simple API for XML) with your own
handlers to capture the raw XML (original_content()), then send the XML,
tags, attributes, whatever, to Simple to have it convert that into a structure.
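
A minimal sketch of the "own handlers" idea, assuming the XML::Parser
module from CPAN is installed. It uses the original_string method of the
Expat object, which returns the text exactly as it appeared in the
document; whether that is the same thing as the original_content()
mentioned above is an assumption. The raw_titles helper and the <title>
element are made up for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: collect the raw (unexpanded) text of every
# <title> element in the given XML string.
sub raw_titles {
    my ($xml) = @_;
    my (@titles, $in_title);

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $elem) = @_;
                if ($elem eq 'title') {
                    $in_title = 1;
                    push @titles, '';
                }
            },
            Char => sub {
                my ($expat, $text) = @_;
                # $text is the expanded character data; original_string
                # gives the same span as written in the source document.
                $titles[-1] .= $expat->original_string if $in_title;
            },
            End => sub {
                my ($expat, $elem) = @_;
                $in_title = 0 if $elem eq 'title';
            },
        },
    );

    $parser->parse($xml);
    return @titles;
}

print "$_\n" for raw_titles('<doc><title>Tom &amp; Jerry</title></doc>');
```

The Char handler can fire several times for one text node (typically
once per entity reference), which is why the text is appended rather
than assigned.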

Hopefully you aren't using Simple to process the entire XML document.
That's not such a good way to do it.

Of course, you could use RxParse (my module) version 2b, which isn't
released yet, to do all of what Simple does and a hell of a lot more.
I'm just finishing up the non-blocking work and I will post it soon.


Re: XML Parser

.. so then, if your source XML had "&amp;amp;" in it, you would
end up with "&amp;" in the result...

If your XML is truly "simple" (e.g. no CDATA sections) then you
can take the simple-minded approach of preprocessing your XML
before feeding it to XML::Simple:

    $xml =~ s/&/&amp;/g;

or, if you'd like to be a little more careful with your hack:

    $xml =~ s/&(\w+;)/&amp;$1/g;
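
To see what the more careful substitution does, here is a small sketch
that escapes the ampersand of each named entity reference so the parser
will hand the reference back as text. The helper names and the sample
string are made up for illustration. Note that \w does not match "#",
so numeric references such as &#38; are left alone by the first helper;
the second helper is one possible variation that covers them, not
something from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: escape only ampersands that begin something
# shaped like a named entity reference (&name;).
sub protect_entities {
    my ($xml) = @_;
    $xml =~ s/&(\w+;)/&amp;$1/g;
    return $xml;
}

# Hypothetical variant: also cover numeric references (&#38;, &#x26;).
sub protect_all_references {
    my ($xml) = @_;
    $xml =~ s/&(#?\w+;)/&amp;$1/g;
    return $xml;
}

my $xml = '<title>Tom &amp; Jerry &#38; friends</title>';

print protect_entities($xml), "\n";
# <title>Tom &amp;amp; Jerry &#38; friends</title>

print protect_all_references($xml), "\n";
# <title>Tom &amp;amp; Jerry &amp;#38; friends</title>
```

After parsing, each "&amp;amp;" is expanded once and comes out as the
literal text "&amp;", which is the form the original poster wanted to keep.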

Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"

Re: XML Parser

I am not sure why you would want to store strings with embedded entities
in a database, but you can simply reencode them before inserting them,
for example with escapeHTML from the CGI module or something like

    $s =~ s/([&<>'"])/sprintf("&#%d;", ord($1))/eg;
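
As a rough illustration of that substitution, the sketch below wraps it
in a helper (the name encode_markup_chars and the sample string are made
up) and applies it to a value that has already been pulled out of the
parsed XML. It emits decimal character references rather than named
entities such as &amp;.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each of & < > ' " with its decimal
# character reference (e.g. & becomes &#38;).
sub encode_markup_chars {
    my ($s) = @_;
    $s =~ s/([&<>'"])/sprintf("&#%d;", ord($1))/eg;
    return $s;
}

print encode_markup_chars(q{Tom & Jerry <"friends">}), "\n";
# Tom &#38; Jerry &#60;&#34;friends&#34;&#62;
```

Running the helper twice on the same string would encode the ampersands
of the references it produced the first time, so it should be applied
exactly once, just before the insert.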
