Posted by Jim Monty on May 15, 2005, 3:31 pm
I'm using the fine module HTML::TableExtract v1.10 by Matt Sisk to
extract data from an HTML table, but I'm getting unexpected behavior.
When I use the headers constraint on a simple three-column table and
request all three columns, all's well. If I specify just the two
right-most columns, all's still well. But if I exclude the right-most
column, I get a bogus first row of empty values.
C:\>cat table.pl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;
my $html = <<EOT;
<html><head><title>Names</title></head>
<body>
<table>
<tr><td>LastName</td><td>FirstName</td><td>MI</td></tr>
<tr><td>Doe</td><td>Jane</td><td></td></tr>
<tr><td>Doe</td><td>John</td><td></td></tr>
<tr><td>Public</td><td>John</td><td>Q</td></tr>
</table>
</body>
</html>
EOT
my $te = HTML::TableExtract->new(
    headers => [ qw( LastName FirstName MI ) ]
);
$te->parse($html);
my @rows = $te->rows;
print Dumper @rows;
exit 0;
__END__
C:\>perl table.pl
$VAR1 = [
'Doe',
'Jane',
''
];
$VAR2 = [
'Doe',
'John',
''
];
$VAR3 = [
'Public',
'John',
'Q'
];
C:\>
If I change
my $te = HTML::TableExtract->new(
    headers => [ qw( LastName FirstName MI ) ]
);
to
my $te = HTML::TableExtract->new(
    headers => [ qw( FirstName MI ) ]
);
I still get good results:
C:\>perl table.pl
$VAR1 = [
'Jane',
''
];
$VAR2 = [
'John',
''
];
$VAR3 = [
'John',
'Q'
];
C:\>
But when I exclude the right-most column (in this case, "MI"):
my $te = HTML::TableExtract->new(
    headers => [ qw( LastName FirstName ) ]
);
I get an unexpected (and unwanted) empty first row:
C:\>perl table.pl
$VAR1 = [
'',
''
];
$VAR2 = [
'Doe',
'Jane'
];
$VAR3 = [
'Doe',
'John'
];
$VAR4 = [
'Public',
'John'
];
C:\>
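As a stopgap, I could filter out any row whose cells are all empty. This is only a minimal sketch of a workaround, not an explanation of the behavior, and it would also throw away a data row that is legitimately empty. The helper name drop_empty_rows is my own invention and not part of HTML::TableExtract; the rows are hard-coded here to match the last dump above rather than taken from $te->rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): keep only rows
# that contain at least one defined, non-empty cell.
sub drop_empty_rows {
    return grep {
        my $row = $_;
        # Inner grep is in scalar context here, so it yields a count.
        grep { defined $_ && length $_ } @$row;
    } @_;
}

# Rows as shown in the last dump above, including the bogus first row.
my @rows = (
    [ '',       ''     ],
    [ 'Doe',    'Jane' ],
    [ 'Doe',    'John' ],
    [ 'Public', 'John' ],
);

my @clean = drop_empty_rows(@rows);
print join( ' ', @$_ ), "\n" for @clean;
```

In the real script, the hard-coded list would be replaced by the return value of $te->rows.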
What's going on?
--
Jim Monty