
Parsing XML data as it arrives from LWP call

Subject | Author | Date
Parsing XML data as it arrives from LWP call | Steve B | 01-11-2005
Posted by Steve B on January 11, 2005, 1:57 am
Greetings,

I am trying to improve the performance of a Perl application that uses LWP to
request XML data from eBay, via their prescribed developer API, and then
parses it with XML::Node.

I would like to parse the resulting XML data as it is being returned, rather
than waiting for all of it to arrive, which is how my code is designed now.

The existing high-level flow is as follows:

1) Build the request XML input parameters.
2) Request the XML from eBay.
3) Register the XML::Node tag/variable names to save and the subroutines to
call during the XML::Node parse. XML::Node calls the registered subroutines
as it encounters the matching XML start/end tags.
4) Parse the returned XML data with the Perl XML::Node package.
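For reference, a minimal sketch of that whole-document flow. It is an illustration under assumptions, not the original program: the eBay reply is a hypothetical, heavily trimmed hard-coded string, and instead of XML::Node it uses XML::Parser (which appears in the environment list) with hand-written Start/Char/End handlers that emulate XML::Node-style path registrations such as `>eBay>SellerList>Item>Id`.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical, heavily trimmed stand-in for the eBay reply.
my $xml_reply = <<'XML';
<eBay><SellerList>
  <Item><Id>1001</Id><SiteId>0</SiteId></Item>
  <Item><Id>1002</Id><SiteId>3</SiteId></Item>
</SellerList></eBay>
XML

my @items;      # one hash per finished <Item>
my %current;    # fields collected for the <Item> being parsed
my @path;       # element names from the root down to the current element

# Emulation of XML::Node "char" registrations: path => field to collect text into.
my %char_fields = (
    '>eBay>SellerList>Item>Id'     => 'Id',
    '>eBay>SellerList>Item>SiteId' => 'SiteId',
);

my $parser = XML::Parser->new(Handlers => {
    Start => sub { my ($expat, $elem) = @_; push @path, $elem; },
    Char  => sub {
        my ($expat, $text) = @_;
        my $field = $char_fields{ '>' . join('>', @path) };
        # Expat may deliver one text node in several pieces, so append.
        $current{$field} .= $text if defined $field;
    },
    End   => sub {
        my ($expat, $elem) = @_;
        # Emulation of an "end" registration on >eBay>SellerList>Item.
        if ('>' . join('>', @path) eq '>eBay>SellerList>Item') {
            push @items, {%current};
            %current = ();
        }
        pop @path;
    },
});

# Current design: the complete reply must be in memory before this one call.
$parser->parse($xml_reply);

printf "Item %s on site %s\n", $_->{Id}, $_->{SiteId} for @items;
```

The single `$parser->parse($xml_reply)` call is the bottleneck the question is about: no handler runs until the entire reply has been downloaded.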

I have another LWP routine that parses image data as it arrives: it requests
a page and extracts all the <img> tags. However, I'm having difficulty
converting the XML::Node routine to the same design.

This is my attempt so far to redesign my XML::Node parsing so that it runs
while the data is being returned:

$posturl = 'https://api.' . $eBayURL . '/ws/api.dll';
sub LWPcallback {
    $XML_reply = @_->content;
    # ?????????
}
$objUserAgent = LWP::UserAgent->new; # Create user agent
$objRequest = HTTP::Request->new("POST", $posturl, $objHeader, $request); # Build the request
$xml_node = XML::Node->new(\&LWPcallback); # Define the parser
# Register XML tag/variables to save, and subroutines to call, during the parse
$xml_node->register(">eBay>SellerList>Item>Id", "char" => \$api_itemnum);
$xml_node->register(">eBay>SellerList>Item>SiteId", "char" => \$api_siteid);
$xml_node->register(">eBay>SellerList>Item", "end" => \&handle_item_end);
# etc.
$objResponse = $objUserAgent->request($objRequest, sub {...}); # Issue request and parse response

I'm confused about what parsing takes place, and when and where, in this
scenario, and I have the following initial questions:

1) What processing needs to take place in the LWPcallback subroutine?
2) Do I need to re-register the XML::Node tag/variables and subroutines in
the callback?

This is the working image-parsing code that I based the above design on:

use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI::URL;

$ua = LWP::UserAgent->new;
@imgs = ();
sub LWPcallback {
    my($tag, %attr) = @_;
    return if $tag ne 'img'; # we only look closer at <img ...>
    push(@imgs, values %attr);
}
$LWP_p = HTML::LinkExtor->new(\&LWPcallback);
$res = $ua->request(HTTP::Request->new(GET => $LWP_itemurl),
                    sub { $LWP_p->parse($_[0]) });
my $base = $res->base;
@imgs = map { $_ = url($_, $base)->abs; } @imgs;
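The image code works because HTML::LinkExtor (a subclass of HTML::Parser) accepts a document in arbitrary pieces and buffers an incomplete tag until the rest of it arrives. A minimal offline sketch of that behaviour, using a hypothetical page split by hand where a network read might split it, feeds each chunk the way the per-chunk callback passed to `$ua->request` does:

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Same callback shape as the image code: collect link attributes of <img> tags.
my @imgs;
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return if $tag ne 'img';
    push @imgs, values %attr;
});

# Hypothetical page, split in the middle of tags as a network read might split it.
my @chunks = (
    '<html><body><im',
    'g src="a.png" alt="A">',
    '<p>text</p><img sr',
    'c="b.png"></body></html>',
);

# One parse() call per chunk, as the request callback does for each read.
$p->parse($_) for @chunks;
$p->eof;    # tell the parser the document is complete so buffered input is flushed

print "$_\n" for @imgs;
```

Only `src` is reported for each image because `alt` is not a link attribute, so the partial tags are reassembled correctly across chunk boundaries.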


My environment -

OS - Red Hat EL 3 (Intel)
Perl Version - v5.8.1 built for i686-linux
Perl Modules -
perl-libwww-perl-5.65-6
>> rpm -qa | grep XML
perl-XML-Dumper-0.4-25
perl-XML-Twig-3.09-3
perl-XML-Encoding-1.01-23
perl-XML-Grove-0.46alpha-25
PyXML-0.7.1-9
perl-XML-Parser-2.31-15

Any assistance or advice is most appreciated.

Thanks,
Steve




Posted by Bart Lateur on January 11, 2005, 11:35 am
Steve B wrote:

>I am trying to improve the performance of a Perl application that uses LWP to
>request, and then parse XML (via XML::Node) data from eBay using their
>prescribed developer API.
>prescribed developer API.
>
>I would like to parse the resulting XML data as it's being returned as
>opposed to waiting for all the data to arrive as my code is designed now.

May I point you to a recent thread on Perlmonks,

        Incremental parsing of multiple XML streams?
        <http://perlmonks.org/?node_id=420383>

As a result, the starter of that thread has written an XML::SAX plugin
module to process incoming XML in chunks. I'm sure it'll be on CPAN
soon.

        <http://search.cpan.org/~nuffin/>

Ah, there it is: check out those two XML::SAX::* modules, ExpatNB and
Expat::Incremental; they are related.

I'd tackle it by having LWP fetch the page with a callback that receives it
in chunks (of, for example, 4k) and feeding those chunks to the incremental
parser one by one.

--
        Bart.
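A minimal sketch of the approach Bart describes, under stated assumptions. Instead of the XML::SAX modules he mentions, it swaps in XML::Parser's own non-blocking interface (`parse_start`, which returns an XML::Parser::ExpatNB object with `parse_more` and `parse_done`), since XML::Parser is already in Steve's environment. The subroutine names `make_item_parser` and `fetch_and_parse` are hypothetical, and the handlers are a simplification that only collects the Id and SiteId fields of each Item by element name. In this sketch the handlers are created once, before the request, and the LWP content callback only hands each chunk to `parse_more`, so nothing is re-registered per chunk.

```perl
use strict;
use warnings;
use XML::Parser;

# Build a non-blocking Expat parser whose handlers collect <Item> records.
# Hypothetical simplification: fields are matched by element name only.
sub make_item_parser {
    my ($items) = @_;            # array ref that receives one hash per <Item>
    my (%current, $field);
    my $parser = XML::Parser->new(Handlers => {
        Start => sub {
            my ($expat, $elem) = @_;
            $field = ($elem eq 'Id' || $elem eq 'SiteId') ? $elem : undef;
        },
        # Text can be split across chunks (and Char calls), so append.
        Char  => sub { $current{$field} .= $_[1] if defined $field },
        End   => sub {
            my ($expat, $elem) = @_;
            $field = undef;
            if ($elem eq 'Item') { push @$items, {%current}; %current = (); }
        },
    });
    return $parser->parse_start;   # XML::Parser::ExpatNB object
}

# Issue the request and feed each chunk to the parser as LWP reads it.
# $ua is an LWP::UserAgent and $req an HTTP::Request built by the caller.
sub fetch_and_parse {
    my ($ua, $req, $items) = @_;
    my $nb  = make_item_parser($items);
    my $res = $ua->request($req, sub {
        my ($chunk, $response, $protocol) = @_;
        $nb->parse_more($chunk);   # handlers fire for every complete element so far
    }, 4096);                      # read-size hint: roughly 4k chunks
    # If the callback died (e.g. malformed XML), LWP reports it in X-Died.
    if (my $died = $res->header('X-Died')) { die "Parsing aborted: $died" }
    die 'Request failed: ', $res->status_line, "\n" unless $res->is_success;
    $nb->parse_done;               # end of document; Expat complains if it was truncated
    return $res;
}

# Example use (needs network access and eBay credentials; not run here):
#   my $ua  = LWP::UserAgent->new;
#   my $req = HTTP::Request->new(POST => $posturl, $objHeader, $request);
#   my @items;
#   fetch_and_parse($ua, $req, \@items);
```

Keeping the handlers inside `make_item_parser` means the chunk-feeding logic can be exercised offline by calling `parse_more` with hand-made pieces, without touching LWP at all.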


Similar Threads (title, date posted)
Image data parsing October 27, 2004, 3:36 pm
How to solve memory problems while running a script parsing huge data July 13, 2004, 1:23 pm
Win32::Ole and Call by reference October 22, 2004, 10:06 am
How to call PL/SQL procs using Oraperl ? December 8, 2004, 5:22 pm
How can I call MFC functions from Perl July 19, 2005, 9:46 pm
System call fails in webserver February 11, 2006, 11:04 am
Can't call method on an undefined value at June 27, 2005, 2:56 pm
Stored Procedure call using MSSQL::DBLIB April 6, 2006, 5:06 pm
Crypt::CBC Can't call method "blocksize" on unblessed reference July 12, 2004, 10:01 am
XML::DOM parsing pb March 9, 2006, 1:27 pm
