
How to render HTML as text (like lynx does)?

comp.lang.perl.modules
Posted by Adrian on April 18, 2005, 6:53 pm


Hi,

I've been playing around with a lot of modules but can't find
one that simply renders HTML as text, laying out the tables
so that things appear in approximately the right place on the
page according to the table structure. lynx does a good job
of it (e.g. 'lynx -dump http://groups.google.com', then redirect
the output to a file), but surely there's a good Perl module around?

Can anyone help?

Thanks,
Adrian.



Posted by Keith Keller on April 18, 2005, 10:28 pm



> I've been playing around with a lot of modules but can't find
> one that simply represents HTML as text, laying out the tables
> so that things appear in approximately the right place on the
> page according to table structures etc.. lynx does a good job
> of it (eg 'lynx http://groups.google.com' then redirect output
> to a file) but surely there's a good perl module around?

use HTML::FormatText;
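
For reference, here is a minimal sketch of typical usage. It assumes HTML::TreeBuilder (from the HTML-Tree distribution) is installed alongside HTML::FormatText; the sample markup and the margin values are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical sample input; any HTML string would do.
my $html = '<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>';

# Parse the markup into a tree, then let the formatter walk it.
my $tree      = HTML::TreeBuilder->new_from_content($html);
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
my $text      = $formatter->format($tree);

$tree->delete;    # release the parse tree
print $text;
```

To format a page from disk instead, HTML::TreeBuilder->new->parse_file($path) can build the tree.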

--keith

--
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://wombat.san-francisco.ca.us/cgi-bin/fom
see X- headers for PGP signature information



Posted by Adrian on April 18, 2005, 11:33 pm


Hi Keith,

Thanks for your reply.

Keith Keller wrote:
>
> > I've been playing around with a lot of modules but can't find
> > one that simply represents HTML as text, laying out the tables
> > so that things appear in approximately the right place on the
> > page according to table structures etc.. lynx does a good job
> > of it (eg 'lynx http://groups.google.com' then redirect output
> > to a file) but surely there's a good perl module around?
>
> use HTML::FormatText;
>

I tried HTML::FormatText, but it doesn't seem to render tables
correctly (or at all). Compare the output from HTML::FormatText
with that of lynx when dealing with basic HTML tables,

eg: http://www.uottawa.ca/student/infoservice/nav_e.html

HTML::FormatText just puts each table element (TD or TR)
on a new line.
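
A rough sketch of that comparison, run on a small hypothetical table rather than the page above. It assumes lynx is on the PATH and that HTML::TreeBuilder and HTML::FormatText are installed; the table contents and margin values are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical two-row table, just for the comparison.
my $html = <<'END_HTML';
<html><body><table>
<tr><td>Office</td><td>Room</td></tr>
<tr><td>Registrar</td><td>101</td></tr>
</table></body></html>
END_HTML

# Save it to a temporary .html file so lynx can read it from disk.
my ($out, $file) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$out} $html;
close $out or die "Cannot close $file: $!";

# Rendering 1: HTML::FormatText walking an HTML::TreeBuilder tree.
my $tree = HTML::TreeBuilder->new_from_content($html);
my $formatted = HTML::FormatText->new(leftmargin => 0, rightmargin => 72)->format($tree);
$tree->delete;

# Rendering 2: lynx -dump, read back through a pipe (list form, no shell).
open my $lynx, '-|', 'lynx', '-dump', '-nolist', $file
    or die "Cannot run lynx: $!";
my $dumped = do { local $/; <$lynx> };
close $lynx or die "lynx exited with status $?";

print "--- HTML::FormatText ---\n$formatted\n";
print "--- lynx -dump ---\n$dumped\n";
```

Reading lynx's output through a pipe like this is also a workable stopgap from within a Perl script, at the cost of depending on an external program.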

Any other suggestions?

Thanks,
Adrian.



Posted by Rudi on April 19, 2005, 9:57 pm


Adrian,

Not a solution, but maybe a starting point:
http://www.perl.com/pub/a/2003/09/17/perlcookbook.html

I just came across the same issue.

Rudi


Adrian wrote:
....

> The HTML::FormatText just sticks any table elements (TD or TR)
> on a new line.
>
> Any other suggestions?
>
> Thanks,
> Adrian.
>

