
HTML::Parser: how to remove "//" ?

comp.lang.perl.modules
Posted by Gerwin on January 31, 2007, 6:00 am


Hi,

I'm using HTML::Parser to strip HTML tags from my files. I noticed
that //<![CDATA[ ... //]]> and the JavaScript between them are not
stripped. Any idea how to do this?

-Gerwin


Posted by Andy on January 31, 2007, 2:15 pm


A CDATA section can be looked upon as a kind of comment in HTML.

According to the documentation at
http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm
you have to disable the strict_comment switch to strip such tags:

$p->strict_comment( $bool )
By default, comments are terminated by the first occurrence of "-->".
This is the behaviour of most popular browsers (like Mozilla, Opera
and MSIE), but it is not correct according to the official HTML
standard. Officially, you need an even number of "--" tokens before
the closing ">" is recognized and there may not be anything but
whitespace between an even and an odd "--".

The official behaviour is enabled by enabling this attribute.

Enabling of 'strict_comment' also disables recognizing these forms as
comments:

</ comment>
<! comment>

Notice how this is similar to the first two and last characters of
<![CDATA[ ... //]]>.
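
For anyone who wants to try this, here is a minimal sketch, assuming HTML::Parser 3.x used through its version 3 handler API. The helper name strip_html and the sample markup are made up for illustration. The strict_comment(0) call is the setting discussed above (it is already the default, so it is only spelled out for clarity). The ignore_elements call is not mentioned in this thread; it is added here on the assumption that the CDATA markers sit inside a <script> element, whose contents the parser would otherwise hand to the text handler as ordinary text.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of $html with tags removed.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Collect decoded text only; no other handlers are installed,
        # so tags and comments produce no output.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # Lenient comment parsing, as discussed above (this is the default).
    $p->strict_comment(0);

    # Assumption, not from the thread: skip everything between
    # <script>...</script> and <style>...</style>.
    $p->ignore_elements(qw(script style));

    $p->parse($html);
    $p->eof;
    return $text;
}

# Made-up sample input resembling the case in the question.
my $html = <<'HTML';
<p>Hello</p>
<script type="text/javascript">
//<![CDATA[
alert("hi");
//]]>
</script>
<p>World</p>
HTML

print strip_html($html);
```

If the script text still shows up after this, it is worth checking the installed HTML::Parser version, since ignore_elements may not be available in older releases.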



