LWP user agent grabs the intermediate wait page after POST instead of the actual result page

Newsgroup: comp.lang.perl.misc
Posted by bhabs on February 12, 2008, 12:50 am
Hi,

I wrote a small LWP-based Perl program to search for air fares on a
travel website using POST.

#!/usr/bin/perl
use strict;
use CGI;
use LWP;

my $web_browser = LWP::UserAgent->new();
push @{ $web_browser->requests_redirectable }, 'POST';
$web_browser->timeout(300);
my $web_response = ();

$web_response = $web_browser->post(
    'http://blabla.com/travel/InitialSearch.do',
    [
        'fromCity' => 'SFO',
        'toCIty'   => 'CVG',
        # .... the rest of the fields occur here
    ],
);

die "Error: ", $web_response->status_line()
    unless $web_response->is_success;

my @content = $web_response->content;
print "@content";

When I print the content, I see the "intermediate" wait page (where it
displays the progress bar using JavaScript; I matched the
content against "view source" in Internet Explorer).
I am unable to capture the final air fare page. It takes time for the
website to do the search and then display the air fare result page.
How do I make my program wait for the actual result instead of grabbing
the intermediate response?

Could anyone please help me on this?

Regards,
bhabs

Posted by Christian Winter on February 12, 2008, 1:15 am
bhabs wrote:
> I wrote a small LWP-based Perl program to search for air fares on a
> travel website using POST.
>
[...code snipped]
>
> When I print the content, I see the "intermediate" wait page (where it
> displays the progress bar using JavaScript; I matched the
> content against "view source" in Internet Explorer).
> I am unable to capture the final air fare page. It takes time for the
> website to do the search and then display the air fare result page.
> How do I make my program wait for the actual result instead of grabbing
> the intermediate response?

You have to simulate what the browser does, and from your
description, this is most likely a repeated Ajax request
to the server. Analyze the behaviour of the JavaScript
and see how it fetches the progress state and what it
does once the result is calculated, then craft those
actions yourself. Your best chance of seeing exactly what is going
on in the background is a network sniffer like Wireshark,
or a browser plugin like Firefox's Live HTTP Headers.

-Chris
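
A minimal sketch of the polling approach described above, not a
definitive implementation. Everything site-specific in it is an
assumption: the second URL (wherever the page's JavaScript asks for
progress), the "progress" marker used to recognise the wait page, and
the retry count and delay. The helper name poll_for_result is made up
for illustration. The cookie jar is there because the server most
likely ties the running search to a session cookie.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical marker: assume the wait page contains this text and the
# result page does not. Check "view source" for a real distinguishing string.
my $WAIT_MARKER = qr/progress/i;

# Call $fetch (a code ref returning a page body, or undef on failure)
# until it returns something that does not look like the wait page.
# Returns that page, or undef after $max_tries attempts.
sub poll_for_result {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $html = $fetch->();
        return $html if defined $html && $html !~ $WAIT_MARKER;
        sleep $delay if $try < $max_tries;
    }
    return;
}

# Run against a real site only when both URLs are given on the command line.
if (@ARGV == 2) {
    my ($search_url, $poll_url) = @ARGV;    # $poll_url is an assumption

    require LWP::UserAgent;
    # cookie_jar => {} keeps session cookies in memory between requests.
    my $ua = LWP::UserAgent->new(cookie_jar => {}, timeout => 300);
    push @{ $ua->requests_redirectable }, 'POST';

    # Start the search; the remaining form fields are omitted here as well.
    my $first = $ua->post($search_url, [ fromCity => 'SFO', toCIty => 'CVG' ]);
    die "Error: ", $first->status_line, "\n" unless $first->is_success;

    # Ask again every 5 seconds, at most 30 times.
    my $result = poll_for_result(
        sub {
            my $r = $ua->get($poll_url);
            return $r->is_success ? $r->decoded_content : undef;
        },
        30, 5,
    );
    die "Gave up waiting for the result page\n" unless defined $result;
    print $result;
}
```

Taking the fetch step as a code ref keeps the retry logic separate from
the request itself, so if the site turns out to poll with POST or to
return JSON, only the anonymous sub needs to change.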

Posted by Ben Morrow on February 12, 2008, 11:08 am
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated Ajax request
> to the server. Analyze the behaviour of the JavaScript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. Your best chance of seeing exactly what is going
> on in the background is a network sniffer like Wireshark,
> or a browser plugin like Firefox's Live HTTP Headers.

Or http://www.research.att.com/sw/tools/wsp/ , which will write a Perl
script to make the appropriate requests for you.

Ben


Posted by Tad J McClellan on February 12, 2008, 8:55 pm
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated Ajax request
> to the server. Analyze the behaviour of the JavaScript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. Your best chance of seeing exactly what is going
> on in the background is a network sniffer like Wireshark,


I like the Web Scraping Proxy for this; it logs the traffic in
the form of LWP Perl code:

http://www.research.att.com/sw/tools/wsp/


> or a browser plugin like Firefox's Live HTTP Headers.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"
