Perl HTML::TableExtract Question


Hi !

I hope someone can help.

I want to extract data from a table with 2 columns.

A sample of the table can be generated with:-

" "

(Sorry about the long URL :-) )

What I want is the field in the top table labelled "Tot. Shares Out."

My Current Code is :-

#!/usr/bin/perl -w

use strict;
use HTML::TableExtract;

my $inFile = "/home/mas/development/URLTemp.tmp";
my $te = HTML::TableExtract->new( headers => [ 'Fundamental Data', '*' ]);
$te->parse_file( $inFile );
foreach my $ts ( $te->table_states ) {
    foreach my $row ( $ts->rows ) {
        print join( ",", @$row, "," ), "\n";
    }
}

But this seems to pick up a table lower down the page. That wouldn't be so
bad, since that table repeats the value I need, but how do I get an
un-labelled column?

Any help would be appreciated.


Re: Perl HTML::TableExtract Question


The headers approach will not work since there are no headers
on the table that contains the data that you are after.


"Tot. Shares Out." is the 7th column in the 12th row of the table
at depth=2 and count=1.
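
One way to find a table's depth and count is to extract every table with no constraints and print each one's coordinates. The following is only a sketch: the inline HTML is a made-up stand-in for the saved page, and the choice between tables() and table_states() assumes that the method name depends on the installed release of HTML::TableExtract.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the saved page: one outer table
# containing two nested tables.
my $html = <<'HTML';
<table><tr><td>outer
  <table><tr><td>inner A</td></tr></table>
  <table><tr><td>inner B</td></tr></table>
</td></tr></table>
HTML

# With no headers, depth or count constraints, every table matches.
my $te = HTML::TableExtract->new();
$te->parse($html);

# Assumption: newer releases call this tables(), older ones table_states().
my @tables = $te->can('tables') ? $te->tables : $te->table_states;

my @coords;
for my $ts (@tables) {
    my ( $depth, $count ) = $ts->coords;
    push @coords, "$depth,$count";
    my @rows = $ts->rows;
    print "depth=$depth count=$count rows=", scalar(@rows), "\n";
}
```

Printing a few cells from each table alongside its coordinates makes it easier to spot which depth and count hold the wanted data.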

   my $te = HTML::TableExtract->new( depth => 2, count => 1 );
   $te->parse_file( $inFile );
   my ($ts) = $te->table_states;    # the single table at depth 2, count 1
   my $total_outstanding = ($ts->rows)[11]->[6];
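
Fixed row and column indices stop working if the page layout shifts. A possible alternative, sketched here under assumptions, is to scan the extracted rows for the label text and take the next non-blank cell in the same row. The helper name value_after_label and the sample rows are hypothetical, and whether the value really sits to the right of the label on the actual page is not confirmed by anything in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first non-blank cell to the right of
# the first cell containing $label, or nothing if no such cell exists.
sub value_after_label {
    my ( $rows, $label ) = @_;
    for my $row (@$rows) {
        for my $i ( 0 .. $#$row ) {
            my $cell = $row->[$i];
            next unless defined $cell && index( $cell, $label ) >= 0;
            for my $j ( $i + 1 .. $#$row ) {
                my $val = $row->[$j];
                next unless defined $val && $val =~ /\S/;
                $val =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
                return $val;
            }
        }
    }
    return;
}

# Made-up rows in the same shape that $ts->rows returns
# (a list of array references, with undef for empty cells).
my @rows = (
    [ 'Market Capitalization', undef,              '1.2B' ],
    [ undef,                   'Tot. Shares Out.', ' 45.6M ' ],
);

print value_after_label( \@rows, 'Tot. Shares Out.' ), "\n";
```

With a real table object the call would look like value_after_label( [ $ts->rows ], 'Tot. Shares Out.' ).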

    Tad McClellan
    SGML consulting, Perl programming
    Fort Worth, Texas

Re: Perl HTML::TableExtract Question

Paul wrote:
Just a bit more info on this: the ", '*'" entry in the headers list doesn't
work; in fact it returns no data. Without it, the module assumes that the
rows below the header are what is wanted and returns:-

Market Capitalization,,

The real question is "How do I specify a row with a NULL header?"

Re: Perl HTML::TableExtract Question

Thanks for that Tad !! I got the same answer at about 0230 in the
morning :-(

It seems the page isn't very well constructed.

I spent lots of time looking for the new version of HTML::TableExtract
which is supposed to address rows as well as columns but could only find
fleeting references to it.

