HTML::FormatText problem



I have a curious problem with HTML::FormatText and I wonder if anybody
can help me.

I have a bunch of patent documents in a local directory from which I am
extracting the title, abstract, etc for each patent to insert into a
MySQL database. The core lines of the script where I am having problems
are:

use HTML::Parse;       # provides parse_htmlfile()
use HTML::FormatText;
my $plain_page =
    HTML::FormatText->new->format(parse_htmlfile($local_patent_file));
# regex stuff with $plain_page...

This works fine except, it seems, when the patent document contains
the string "##STR1##", which is used in the patent documents to
represent a complex formula. This seems to kill HTML::FormatText; in
other words, $plain_page is undefined.

Obviously '#' is used in Perl to represent a comment, but I'd be surprised
if it affected HTML::FormatText in such a simple way. Maybe ##X## does
something, I honestly don't know.
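
One thing I could try, assuming the placeholder really is the trigger
(which I have not confirmed), is to mask it in the raw HTML before it is
parsed. This is only a sketch: mask_placeholders is a name I made up and
the [STRn] replacement token is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: replace patent formula placeholders such as
# ##STR1## with a token containing no '#' characters.
sub mask_placeholders {
    my ($html) = @_;
    $html =~ s/##STR(\d+)##/[STR${1}]/g;
    return $html;
}

my $sample = '<p>A compound of formula ##STR1## and ##STR12##.</p>';
print mask_placeholders($sample), "\n";
```

If the masked text then formats correctly (for example by reading the
file into a string, masking it, and building the tree with
HTML::TreeBuilder->new_from_content), that would at least confirm that
the placeholder is what upsets the formatter.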

If anybody has any suggestions, opinions, work-arounds or alternative
approaches, I'd be very grateful.


