HTML to XML in Perl?

comp.lang.perl.modules
Posted by Ilya Zakharevich on May 12, 2006, 7:05 am

Suppose I want to translate an HTML document into well-formed XML (so
that I can, e.g., apply xsltproc to the result). E.g., HTML::TreeBuilder
can apply the "usual heuristics" to parse HTML; how do I get XML out of it?

Thanks,
Ilya

Posted by John Bokma on May 12, 2006, 12:34 pm

> Suppose I want to translate an HTML document into well-formed XML (so
> that I can, e.g., apply xsltproc to the result). E.g., HTML::TreeBuilder
> can apply the "usual heuristics" to parse HTML; how do I get XML out of it?

Question: I use XML, not XHTML, at home, and use XML::Twig to convert it
to HTML. I can use xsltproc if I want to on the XML file.

You might want to traverse the parse tree that HTML::TreeBuilder generates.
Also, I'm not 100% sure, but it might be that HTML Tidy can do the XHTML
conversion for you:

Google...

"Validator fixes errors in HTML and XHTML. Converts HTML to XHTML. Free
Software."

http://www.google.com/search?q=html%20tidy%20xhtml

Sounds like it does :-D.
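
For the parse-tree route, here is a minimal sketch. It assumes the as_XML method that HTML::TreeBuilder objects inherit from HTML::Element is good enough for your input. The helper name html_to_xml and the sample markup are made up for illustration, and the output may still need checking with a real XML parser before it goes to xsltproc.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): parse tag-soup HTML with
# HTML::TreeBuilder's heuristics and serialize the resulting tree as XML.
sub html_to_xml {
    my ($html) = @_;

    # new_from_content parses the string and finishes the parse in one call.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # as_XML closes open elements and writes empty ones in <tag /> form.
    my $xml = $tree->as_XML;

    # Free the tree explicitly; older versions keep circular references.
    $tree->delete;

    return $xml;
}

# Sample input with unclosed <p> and <br> elements (illustrative only).
print html_to_xml('<title>Demo</title><p>One<br>Two<p>Three');
```

The printed result could then be saved to a file and passed to xsltproc. Note that as_XML emits no XML declaration or XHTML namespace, so a stylesheet that matches namespaced elements would need adjusting.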

--
John Bokma, freelance software developer & experienced Perl programmer:
http://castleamber.com/
