XML parsing and HTML comments

Hi all,

I find myself writing a template system (yeah, I know - but there is a
reason I'm not using an existing one). So I'm trying to parse XHTML using
the built-in expat parser. Mostly it works fine, but it ignores anything
that looks like an HTML comment. This is a bit of a problem, as I see a lot
of code written like:

<script type='text/javascript'>
<!--
        alert("hello world");
// -->
</script>
<style type="text/css">
<!--
.style1 {
        font-family: Verdana, Arial, Helvetica, sans-serif;
        font-size: 9px;
}
-->
</style>

Now obviously the browser is seeing the stuff inside '<!--' ...'-->' but
expat doesn't. I tried adding a non-parsed handler, but still can't see it.
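
For anyone who wants to reproduce this, here is a minimal sketch of a probe that records what each callback receives. The function name probeHandlers and the sample document are made up for illustration. What the default handler gets for the comment may depend on the PHP version and the XML library it was built against, so no output is claimed here.

```php
<?php
// Hypothetical probe: record what each registered callback receives.
function probeHandlers(string $xml): array
{
    $seen = ['cdata' => '', 'default' => ''];
    $parser = xml_parser_create();

    // Ordinary text between tags arrives here, possibly in several chunks.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$seen) {
        $seen['cdata'] .= $data;
    });

    // Input that has no dedicated handler may arrive here.
    xml_set_default_handler($parser, function ($p, $data) use (&$seen) {
        $seen['default'] .= $data;
    });

    // xml_parse() returns 1 on success.
    $seen['ok'] = xml_parse($parser, $xml, true) === 1;
    return $seen;
}

$seen = probeHandlers('<root><!-- hidden -->visible</root>');
var_dump($seen);
```

If the 'default' entry comes back without the comment text, you are seeing the same behaviour described above.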

Anybody fixed this?


Re: XML parsing and HTML comments

On Tue, 13 Sep 2005 13:27:18 +0100, Colin McKinnon

 expat (the XML parser used in these functions) has support for adding comment
handlers, but that doesn't appear to be hooked into the PHP extension, so you
can't get at that functionality without patching the source of the extension.

 If you look in the PHP source, under ext/xml/xml.c you see:

/* Short-term TODO list:
 * - Implement XML_ExternalEntityParserCreate()
 * - XML_SetCommentHandler
 * - XML_SetCdataSectionHandler
 * - XML_SetParamEntityParsing

 The second one being what you want.

Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool

Re: XML parsing and HTML comments

Andy Hassall wrote:


Thanks Andy. At least I know I'm not doing something stupid.

For software I'm planning to release, patching the source isn't an ideal
solution. I managed to implement a workaround by running this on the XML:

$xml=str_replace('<!--', '<![CDATA[<!--', $xml);
$xml=str_replace('-->', '-->]]>', $xml);
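
As a rough sketch of how that fits together end to end: the two str_replace calls are the ones above, while the helper names wrapCommentsInCdata and collectText and the sample input are made up for illustration. The idea is that once the comment sits inside a CDATA section, the parser reports it as ordinary character data.

```php
<?php
// Hypothetical helper: wrap every comment in a CDATA section.
function wrapCommentsInCdata(string $xml): string
{
    $xml = str_replace('<!--', '<![CDATA[<!--', $xml);
    return str_replace('-->', '-->]]>', $xml);
}

// Hypothetical helper: collect all character data the parser reports.
function collectText(string $xml): string
{
    $text = '';
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data; // may be called more than once per text node
    });
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}

$source  = "<script><!--\nalert('hi');\n// --></script>";
$wrapped = wrapCommentsInCdata($source);

echo $wrapped, "\n";              // the comment is now inside <![CDATA[ ... ]]>
echo collectText($wrapped), "\n"; // the comment text, delimiters included
```

Two caveats with this approach: every comment in the document becomes text, not just the ones inside script and style blocks, and a comment that already sits inside a CDATA section will most likely produce a parse error, because the inserted ]]> closes the outer section early.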

(again not ideal, but hopefully less painful than recompiling/maintaining

