HTML::Parser how to remove "//" ?

I'm using HTML::Parser to strip HTML tags from my files. I noticed
that the //<![CDATA[ ... //]]> markers, and the JavaScript between
them, are not stripped. Any idea how to do this?


Re: HTML::Parser how to remove "//" ?

A CDATA section can be looked upon as a kind of comment in HTML.

According to the HTML::Parser documentation, you have to disable the
strict_comment switch to strip such tags:

$p->strict_comment( $bool )
By default, comments are terminated by the first occurrence of "-->".
This is the behaviour of most popular browsers (like Mozilla, Opera
and MSIE), but it is not correct according to the official HTML
standard. Officially, you need an even number of "--" tokens before
the closing ">" is recognized and there may not be anything but
whitespace between an even and an odd "--".

The official behaviour is enabled by enabling this attribute.

Enabling of 'strict_comment' also disables recognizing these forms as
comments:

  </ comment>
  <! comment>

Notice how the second form is similar to the first two characters and
the last character of <!  [CDATA[...//]] >
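
For what it's worth, here is a minimal sketch of how the stripping could
look in practice. It keeps strict_comment off, as described above, and
additionally uses ignore_elements, an HTML::Parser method that is not
mentioned in this thread, to drop <script> and <style> elements together
with their content. The helper name strip_html and the sample HTML are
made up for illustration, and the sketch assumes the JavaScript sits
inside a <script> element.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of $html with the tags removed.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Append the decoded text ("dtext") of every text event.
        text_h => [ sub { $text .= $_[0] }, 'dtext' ],
    );

    # Keep strict comment parsing off (this is also the default).
    $p->strict_comment(0);

    # Assumption beyond this thread: suppress these elements and their
    # content, which also removes the //<![CDATA[ ... //]]> wrapper
    # when it appears inside a <script> element.
    $p->ignore_elements(qw(script style));

    $p->parse($html);
    $p->eof;
    return $text;
}

# Made-up sample input.
my $html = <<'HTML';
<p>Hello</p>
<script type="text/javascript">
//<![CDATA[
alert("hi");
//]]>
</script>
<p>World</p>
HTML

print strip_html($html);
```

Without the ignore_elements call, the content of the <script> element is
passed to the text handler like any other text, which matches the
behaviour described in the question.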