Posted by bhabs on February 12, 2008, 12:50 am
Hi,
I wrote a small LWP-based Perl program to search for air fares on a
travel website using POST.
#!/usr/bin/perl
use strict;
use CGI;
use LWP;

my $web_browser = LWP::UserAgent->new();
push @{ $web_browser->requests_redirectable }, 'POST';
$web_browser->timeout(300);

my $web_response = ();
$web_response = $web_browser->post(
    'http://blabla.com/travel/InitialSearch.do',
    [
        'fromCity' => 'SFO',
        'toCIty'   => 'CVG',
        # .... the rest of the fields occur here
    ],
);

die "Error: ", $web_response->status_line()
    unless $web_response->is_success;

my @content = $web_response->content;
print "@content";
When I print the content, I see the "intermediate" wait page (the one that
displays the progress bar using JavaScript; I matched the content against
"View Source" in Internet Explorer).
I am unable to capture the final air fare page. The website takes time to
run the search before it displays the air fare result page.
How do I make my program wait for the actual result instead of grabbing the
intermediate response?
Could anyone please help me with this?
Regards,
bhabs
Posted by Christian Winter on February 12, 2008, 1:15 am
bhabs wrote:
> I wrote a small LWP based perl program to search the air fare from a
> travel website using POST.
>
[...code snipped]
>
> When I print the content, I see the "intermediate" wait page (where it
> displays the progress bar using javascript.... => I matched the
> content with the "view source" from IExplorer)
> I am unable to capture the final air fare page. It takes time for the
> website to do the search and then display the air fare result page.
> How do I make my program wait for the actual result and not grab the
> intermediate response.
You have to simulate what the browser does, and from your
description, this is most likely a repeated Ajax request
to the server. Analyze the behaviour of the JavaScript
and see how it fetches the progress state and what it
does once the result is calculated, then craft those
actions yourself. Your best chance of seeing exactly what is going
on in the background is a network sniffer like Wireshark,
or a browser plugin like Firefox's Live HTTP Headers.
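As a rough illustration of "crafting those actions yourself", here is a minimal polling sketch. It is not taken from the site in question: the helper name poll_until_ready, its parameters, the defaults, and the idea that the wait page can be recognised by the word "progress" are all assumptions made for the example. The real status URL and the real "done" condition have to come from watching the browser's traffic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->() repeatedly until $is_done->($page)
# returns true. Returns the final page, or undef once max_tries is used up.
sub poll_until_ready {
    my (%arg) = @_;
    my $fetch     = $arg{fetch};      # coderef: performs one request, returns the body
    my $is_done   = $arg{is_done};    # coderef: true when the body is the result page
    my $max_tries = exists $arg{max_tries} ? $arg{max_tries} : 30;  # assumed default
    my $delay     = exists $arg{delay}     ? $arg{delay}     : 2;   # seconds, assumed default

    for my $try (1 .. $max_tries) {
        my $page = $fetch->();
        return $page if defined $page && $is_done->($page);
        # Pause between attempts, but not after the last one.
        sleep $delay if $delay > 0 && $try < $max_tries;
    }
    return;    # gave up: the result page never showed up
}

# Possible use with LWP (the URL and the 'progress' marker are placeholders):
#
#   use LWP::UserAgent;
#   my $ua   = LWP::UserAgent->new;
#   my $page = poll_until_ready(
#       fetch   => sub {
#           my $r = $ua->get('http://example.invalid/status');
#           return $r->is_success ? $r->decoded_content : undef;
#       },
#       is_done => sub { $_[0] !~ /progress/i },
#   );
```

Taking the request and the "done" test as code references keeps the loop independent of how the site actually reports progress, so the same loop works whether the JavaScript re-requests the same URL or calls a separate status URL.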
-Chris
Posted by Ben Morrow on February 12, 2008, 11:08 am
> bhabs wrote:
> > I wrote a small LWP based perl program to search the air fare from a
> > travel website using POST.
> >
> [...code snipped]
> >
> > When I print the content, I see the "intermediate" wait page (where it
> > displays the progress bar using javascript.... => I matched the
> > content with the "view source" from IExplorer)
> > I am unable to capture the final air fare page. It takes time for the
> > website to do the search and then display the air fare result page.
> > How do I make my program wait for the actual result and not grab the
> > intermediate response.
>
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated ajax request
> to the server. Analyze the behaviour of the javascript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. You best chances to see exactly what is going
> on in the background is with a network sniffer like wireshark,
> or a browser plugin like Firefox' Live HTTP Headers.
Or http://www.research.att.com/sw/tools/wsp/ , which will write a Perl
script to make the appropriate requests for you.
Ben
Posted by Tad J McClellan on February 12, 2008, 8:55 pm
> bhabs wrote:
>> I wrote a small LWP based perl program to search the air fare from a
>> travel website using POST.
>>
> [...code snipped]
>>
>> When I print the content, I see the "intermediate" wait page (where it
>> displays the progress bar using javascript.... => I matched the
>> content with the "view source" from IExplorer)
>> I am unable to capture the final air fare page. It takes time for the
>> website to do the search and then display the air fare result page.
>> How do I make my program wait for the actual result and not grab the
>> intermediate response.
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated ajax request
> to the server. Analyze the behaviour of the javascript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. You best chances to see exactly what is going
> on in the background is with a network sniffer like wireshark,
I like the Web Scraping Proxy for this; it logs the traffic in
the form of LWP Perl code:
http://www.research.att.com/sw/tools/wsp/
> or a browser plugin like Firefox' Live HTTP Headers.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"