
HTML::TableExtract with headers constraint, excluding right-most column

Posted by Jim Monty on May 15, 2005, 3:31 pm


I'm using the fine module HTML::TableExtract v1.10 by Matt Sisk to
extract data from an HTML table, but I'm getting unexpected behavior.
When I use the headers constraint on a simple three-column table and
request all three columns, all's well. If I specify just the two
right-most columns, all's still well. But if I exclude the right-most
column, I get a bogus first row of empty values.

C:\>cat table.pl
#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html = <<EOT;
<html><head><title>Names</title></head>
<body>
<table>
<tr><td>LastName</td><td>FirstName</td><td>MI</td></tr>
<tr><td>Doe</td><td>Jane</td><td></td></tr>
<tr><td>Doe</td><td>John</td><td></td></tr>
<tr><td>Public</td><td>John</td><td>Q</td></tr>
</table>
</body>
</html>
EOT

my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName MI ) ]
);

$te->parse($html);

my @rows = $te->rows;
print Dumper @rows;

exit 0;

__END__

C:\>perl table.pl
$VAR1 = [
'Doe',
'Jane',
''
];
$VAR2 = [
'Doe',
'John',
''
];
$VAR3 = [
'Public',
'John',
'Q'
];

C:\>

If I change

my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName MI ) ]
);

to

my $te = HTML::TableExtract->new(
headers => [ qw( FirstName MI ) ]
);

I still get good results:

C:\>perl table.pl
$VAR1 = [
'Jane',
''
];
$VAR2 = [
'John',
''
];
$VAR3 = [
'John',
'Q'
];

C:\>

But when I exclude the right-most column (in this case, "MI"):

my $te = HTML::TableExtract->new(
headers => [ qw( LastName FirstName ) ]
);

I get an unexpected (and unwanted) empty first row:

C:\>perl table.pl
$VAR1 = [
'',
''
];
$VAR2 = [
'Doe',
'Jane'
];
$VAR3 = [
'Doe',
'John'
];
$VAR4 = [
'Public',
'John'
];

C:\>

What's going on?
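Until the cause is known, one possible workaround is to discard any row whose cells are all empty. This is an assumption, not documented behavior of HTML::TableExtract. The sketch below uses only core Perl and hard-codes the rows from the failing run above in place of $te->rows. Note that it would also drop rows that are genuinely blank in the source table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rows as printed in the failing case above (headers LastName, FirstName).
# Hypothetical stand-in: in real use these would come from $te->rows.
my @rows = (
    [ '',       ''     ],
    [ 'Doe',    'Jane' ],
    [ 'Doe',    'John' ],
    [ 'Public', 'John' ],
);

# Workaround sketch (assumption): keep only rows that contain at least
# one defined, non-empty cell. Genuinely blank source rows are lost too.
my @clean = grep {
    grep { defined $_ && length $_ } @$_
} @rows;

# Prints one comma-separated line per surviving row.
print join(', ', @$_), "\n" for @clean;
```

Because requesting all three headers produced correct rows, another option under the same caveat is to request every column and drop MI afterward, for example with map { [ @{$_}[0, 1] ] } $te->rows.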

--
Jim Monty


