Parsing HTML with HTML::Tree



  I am trying to parse the following HTML content:

-- first part
<td class="storyTitle"> @
  <a href="/GeneralContent/MySearch.aspx?PagePrefix=IN&amp;
target="_new"> @

-- second part
<td class="storyTitle"> @
  <b> @
    "Something here"

I am using HTML::Tree to parse the HTML. Whenever there isn't any
<a href=.....> segment, as in the second part of the HTML, I would like
to print something else, such as "Error Occurred". Notice that both the
first and second parts of the HTML share the common text
<td class="storyTitle">, which I use for the search.

My problem is that I don't know what the following code will return
whenever <a href=...> is not found. I tried testing against "" and
undef, but that doesn't seem to work.

The following is some of my code and it doesn't work as I wish.

use strict;
use LWP::Simple;
use HTML::Tree;

if ($td->attr('class') eq 'storyTitle') {
  if (my $sym = $td->find('a')) {
    if ($sym->as_text() ne '') {
      print $sym->as_text() . "\n";
    } else {
      print "Error Occurred" . "\n";
    }
  }
}

Re: Parsing HTML with HTML::Tree

You have a logic problem.

You have written:

   if ( found a <a> )
       # do something

So your code cannot do anything if an <a> is not found.

If an <a> is not found, then the condition in
"if (my $sym = $td->find('a'))" is false and the program is done: none
of the code inside that block will be executed. So you want your code
to be structured something like this:

    if (my $sym = $td->find('a')) {
        print $sym->as_text(), "\n";
    } else {
        print "Error Occurred\n";
    }
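
For anyone who wants to try the structure above end to end, here is a minimal, self-contained sketch. The sample markup, the link target, and the helper name title_or_error are made up for illustration; they are not from the original page or from HTML::Tree. It relies on find() returning undef in scalar context when no matching element exists, which is why the else branch is reached.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup modeled on the two cases in the question:
# one storyTitle cell with a link, one without.
my $html = <<'HTML';
<table><tr>
  <td class="storyTitle"><a href="/story.aspx" target="_new">First story</a></td>
  <td class="storyTitle"><b>Something here</b></td>
</tr></table>
HTML

# title_or_error is a hypothetical helper name, not part of HTML::Tree.
sub title_or_error {
    my ($td) = @_;
    # In scalar context, find() returns the first matching element,
    # or undef when the cell contains no <a>.
    my $link = $td->find('a');
    return defined $link ? $link->as_text : 'Error Occurred';
}

my $tree    = HTML::TreeBuilder->new_from_content($html);
my @cells   = $tree->look_down(_tag => 'td', class => 'storyTitle');
my @results = map { title_or_error($_) } @cells;
print "$_\n" for @results;

$tree->delete;    # release the parse tree when done
```

With this sample input, the first cell should yield its link text and the second should fall through to "Error Occurred". Testing with defined (or plain truthiness, as in the if/else above) is enough; comparing the return value against "" is not needed.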

Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg0cm.j.dat/"

Re: Parsing HTML with HTML::Tree


  Thanks for your advice. You hit the nail on the head and it works
well now.

