|
Posted by Adrian on April 18, 2005, 6:53 pm
Hi,
I've been playing around with a lot of modules but can't find
one that simply renders HTML as text, laying out the tables
so that things appear in approximately the right place on the
page according to the table structure, etc. lynx does a good job
of it (e.g. 'lynx -dump http://groups.google.com', then redirect the
output to a file), but surely there's a good Perl module around?
Can anyone help?
Thanks,
Adrian.
|
|
Posted by Keith Keller on April 18, 2005, 10:28 pm
> I've been playing around with a lot of modules but can't find
> one that simply represents HTML as text, laying out the tables
> so that things appear in approximately the right place on the
> page according to table structures etc.. lynx does a good job
> of it (eg 'lynx http://groups.google.com' then redirect output
> to a file) but surely there's a good perl module around?
use HTML::FormatText;
--keith
--
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://wombat.san-francisco.ca.us/cgi-bin/fom see X- headers for PGP signature information
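For reference, a minimal sketch of the usual HTML::FormatText pattern, based on its documented synopsis: parse the page with HTML::TreeBuilder, then pass the tree to a formatter. The sample HTML string and the margin values are illustrative assumptions, and both modules come from CPAN (the HTML-Tree and HTML-Format distributions), not the Perl core.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # HTML-Tree distribution (CPAN)
use HTML::FormatText;    # HTML-Format distribution (CPAN)

# Hypothetical sample page; in practice this would come from a file or a fetch.
my $html = '<html><body><h1>Title</h1>'
         . '<table><tr><td>Name</td><td>Qty</td></tr>'
         . '<tr><td>Widget</td><td>12</td></tr></table>'
         . '</body></html>';

# Parse the HTML into an element tree.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Format the tree as plain text with no left margin, wrapping at column 72.
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
my $text      = $formatter->format($tree);

# Break the tree's internal circular references so it can be freed.
$tree->delete;

print $text;
```

The rightmargin option sets the wrap column. How the table cells come out depends on the installed HTML::Formatter version.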
|
|
Posted by Adrian on April 18, 2005, 11:33 pm
Hi Keith,
Thanks for your reply.
Keith Keller wrote:
>
> > I've been playing around with a lot of modules but can't find
> > one that simply represents HTML as text, laying out the tables
> > so that things appear in approximately the right place on the
> > page according to table structures etc.. lynx does a good job
> > of it (eg 'lynx http://groups.google.com' then redirect output
> > to a file) but surely there's a good perl module around?
>
> use HTML::FormatText;
>
I tried HTML::FormatText, but it doesn't seem to render tables
correctly (or at all). Compare the output from HTML::FormatText
with that of lynx on basic HTML tables,
e.g. http://www.uottawa.ca/student/infoservice/nav_e.html
HTML::FormatText just puts every table element (TD or TR)
on its own line.
Any other suggestions?
Thanks,
Adrian.
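One way to get the lynx-like layout described above is to pad each column to the width of its widest cell. The sketch below does only that layout step, in plain core Perl. It assumes the cell text has already been pulled out of the HTML, for instance with HTML::TreeBuilder's look_down and as_trimmed_text methods. The format_table helper is hypothetical and not part of HTML::FormatText, and it ignores colspan, rowspan and nested tables.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out rows of cell strings as aligned text columns.
# Each argument is an array ref holding one row's cell strings.
sub format_table {
    my @rows = @_;

    # Find the widest cell in each column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # Left-justify every cell to its column width; short rows get blank cells.
    my @lines;
    for my $row (@rows) {
        my @cells = map {
            sprintf '%-*s', $width[$_], defined $row->[$_] ? $row->[$_] : ''
        } 0 .. $#width;
        my $line = join '  ', @cells;   # two spaces between columns
        $line =~ s/\s+$//;              # drop trailing padding
        push @lines, $line;
    }
    return join "\n", @lines, '';       # one newline after each row
}

print format_table(['Item', 'Qty'], ['Widget', '12']);
```

With the sample rows this prints "Item    Qty" and "Widget  12" on two lines, so the second column lines up the way a browser would show it.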
|
|
Posted by Rudi on April 19, 2005, 9:57 pm
Adrian,
Not a solution, but maybe a starting point:
http://www.perl.com/pub/a/2003/09/17/perlcookbook.html
I just came across the same issue.
Rudi
Adrian wrote:
....
> The HTML::FormatText just sticks any table elements (TD or TR)
> on a new line.
>
> Any other suggestions?
>
> Thanks,
> Adrian.
>
|
| Similar Threads | Posted |
| How to text in HTML::Element | October 23, 2004, 7:31 pm |
| How to *modify* text in HTML::Element | October 23, 2004, 8:16 pm |
| Problem with body text extraction with HTML::Parser | December 13, 2005, 3:28 pm |
| trying to use HTML::Mason on apache2 but scripts come up as plain text in the browser | October 23, 2006, 1:50 am |
| [RFC] HTML::Dashboard (Spreadsheet-like formatting for HTML tables) | April 16, 2007, 4:50 pm |
| text-chm | May 6, 2005, 10:53 pm |
| Help reading PDF to get text... | November 26, 2004, 3:50 am |
| ANNOUNCE: Text::Iconv 1.4 | July 18, 2004, 1:41 am |
| Text::CHM on SuSE 9.3 x86_64 | September 19, 2005, 8:08 pm |
| text::tagtemplate question. | December 6, 2004, 12:19 am |
|