
Using Perl to get data from website

Using Perl to get data from website fiazidris 03-07-2008
Posted by fiazidris on March 7, 2008, 4:58 am
I previously wrote a Perl script to fetch data from this URL:

http://www.bangkokflightservices.com/our_cargo_track.php

Some sample MAWBs (Master Air Waybill numbers):

724-26332482
724-61480672
724-61441122

and this was the final URL:

http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=
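
The mapping from a MAWB to the query string above can be sketched as follows. This is only an illustration: the sub names parse_mawb and tracking_url are made up here, and it assumes every MAWB is a 3-digit prefix, a hyphen, and an 8-digit serial, as in the samples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Endpoint taken from the URL quoted above.
my $BASE = 'http://203.151.118.123:8090/showc_track.php';

# Split "724-26332482" into (prefix, serial); returns an empty list
# if the input does not look like a MAWB (hypothetical helper).
sub parse_mawb {
    my ($mawb) = @_;
    return $mawb =~ /^\s*(\d{3})-(\d{8})\s*$/ ? ($1, $2) : ();
}

# Build the tracking URL for one MAWB (hypothetical helper).
sub tracking_url {
    my ($prefix, $serial) = @_;
    return "$BASE?m_prefix=$prefix&m_sn=$serial&h_prefix=HWB&h_sn=";
}

for my $mawb (qw(724-26332482 724-61480672 724-61441122)) {
    my ($prefix, $serial) = parse_mawb($mawb)
        or do { warn "invalid MAWB: '$mawb'\n"; next };
    print tracking_url($prefix, $serial), "\n";
}
```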

But the website has now changed, and the same script can no longer
extract the data. One change I noticed is that the URL is now embedded
in an iframe, with extra parameters:

<iframe src="http://203.151.118.123:8090/showc_track.php?
m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74be&ch=
" frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>

How can I programmatically obtain data for a list of MAWBs?
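
One way to avoid hard-coding the long ecy value would be to read the iframe's src attribute out of the outer page each time. The following is a rough sketch, not a tested solution: iframe_src is a made-up name, the sample HTML uses the placeholder ECY_VALUE instead of a real token, and it assumes the src attribute is double-quoted as shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the src of the first <iframe> in $html, or undef if there is
# none (hypothetical helper). Whitespace is stripped because the
# attribute value may be wrapped across lines.
sub iframe_src {
    my ($html) = @_;
    return unless $html =~ /<iframe\b[^>]*\bsrc\s*=\s*"([^"]*)"/is;
    (my $src = $1) =~ s/\s+//g;
    return $src;
}

# Sample page fragment; ECY_VALUE is a placeholder, not a real token.
my $page = <<'HTML';
<iframe src="http://203.151.118.123:8090/showc_track.php?
m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=ECY_VALUE&ch=
" frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>
HTML

my $src = iframe_src($page);
print defined $src ? "$src\n" : "no iframe found\n";
```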

Here is a sample script that I wrote which previously worked:

#!/usr/bin/perl
use strict;
use warnings;

# Value of the ecy parameter, copied from the iframe URL above
my $ecy = 'e076438db64c6190f7b9689a379b7f7093368f1652d14db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74be';

while (<>) {
    chomp;

    # A MAWB looks like 724-26332482: 3-character prefix, hyphen,
    # 8-character serial. Skip lines that are too short to be one.
    next if length($_) < 12;
    my $mprefix = substr($_, 0, 3);
    my $msn     = substr($_, 4, 8);

    my $currurl = 'http://203.151.118.123:8090/showc_track.php?m_prefix='
        . $mprefix . '&m_sn=' . $msn
        . '&h_prefix=HWB&h_sn=&ecy=' . $ecy . '&ch=';

    my $currresult = qx{curl -s '$currurl'};

    # Walk the response line by line and collect the text of every
    # element on a line that mentions style12.
    my $result = '';
    while ( $currresult =~ m#(.*)#g ) {
        my $currline = $1;

        if ( $currline =~ m#style12#i ) {
            if ( $currline =~ m#.*>(.*?)<.*#i ) {
                $result = $result . " / " . $1;
            }
        }
    }
    print "***$result\n";
}


Posted by Ben Morrow on March 7, 2008, 6:41 am

> Previously, I have written a perl script to access data from this URL:
>
> http://www.bangkokflightservices.com/our_cargo_track.php
>
> Some sample: MAWB - Master Airwaybill Number
>
> 724-26332482
> 724-61480672
> 724-61441122
>
> and this was the final URL:
>
> http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=
> 26332482&h_prefix=HWB&h_sn=
>
> But, now there is a change on the website and I couldn't extract
> through the same script. One change I noticed is the URL has changed
> to:
>
[url trimmed]
> <iframe src="http://203.151.118.123:8090/showc_track.php?
> m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=e076438db64c61..."
> frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>
>
> How can I programmatically obtain data for a list of MAWBs.

Yuck, what a horrible page. <input> without <form>... I would use
something like

#!/usr/bin/perl

use WWW::Mechanize;

my $baseurl =
'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
my $hawb = 'h_prefix=HAWB&h_sn=';

my $M = WWW::Mechanize->new(autocheck => 1);

while (<>) {
    chomp;

    # A MAWB is a 3-character prefix, a hyphen, and an 8-character serial
    my ($mprefix, $msn) = /(...)-(........)/ or do {
        warn "invalid MAWB: '$_'";
        next;
    };

    $M->get("$baseurl?m_prefix=$mprefix&m_sn=$msn&$hawb");
    $M->follow_link(url_regex => qr/showc_track/);
    my $content = $M->content;

    # process $content as before
}

You may need to adjust the follow_link call if there are several links on
the same page that match that regex; see perldoc WWW::Mechanize for the
arguments. If the server checks the Referer, you may also need to ->get
/our_cargo_track.php first.
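
A minimal sketch of those two adjustments, assuming WWW::Mechanize is installed. Whether the server really checks the Referer is a guess; fetch_tracking_page and query_url are hypothetical names, and h_prefix=HWB is taken from the URL in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $landing = 'http://www.bangkokflightservices.com/our_cargo_track.php';
my $baseurl = 'http://www.bangkokflightservices.com/our_cargo_track&trace.php';

# Build the query URL for one MAWB (hypothetical helper).
sub query_url {
    my ($mprefix, $msn) = @_;
    return "$baseurl?m_prefix=$mprefix&m_sn=$msn&h_prefix=HWB&h_sn=";
}

# Fetch the tracking page for one MAWB and return its HTML.
sub fetch_tracking_page {
    my ($mech, $mprefix, $msn) = @_;

    # Visit the landing page first, so that the next request carries
    # it as Referer (Mechanize sends the current page as Referer).
    $mech->get($landing);
    $mech->get(query_url($mprefix, $msn));

    # Report how many links match, then follow the first one; change
    # n to 2, 3, ... to pick a later match instead.
    my @links = $mech->find_all_links(url_regex => qr/showc_track/);
    warn scalar(@links), " links match showc_track\n" if @links != 1;
    $mech->follow_link(url_regex => qr/showc_track/, n => 1);
    return $mech->content;
}

# Only touch the network when input files are named on the command line.
if (@ARGV) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    while (my $line = <>) {
        chomp $line;
        my ($mprefix, $msn) = $line =~ /^(\d{3})-(\d{8})$/ or do {
            warn "invalid MAWB: '$line'\n";
            next;
        };
        print fetch_tracking_page($mech, $mprefix, $msn), "\n";
    }
}
```

To send an explicit Referer on every request instead, $mech->add_header(Referer => $landing) should also work.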

Ben


Posted by ifiaz on March 7, 2008, 9:46 am
> You may need to adjust the follow_link call if there are several links on
> the same page that match that regex; see perldoc WWW::Mechanize for the
> arguments. If the server checks the Referer, you may also need to ->get
> /our_cargo_track.php first.
>
> Ben

Thank you for your prompt response.

When I used the code with minor modifications, I still have the
problem that I can't access the data as the process throws me to
another page as below.

This is what the $content contains:

    <script> window.open ('http://www.bangkokflightservices.com/our_cargo_track.php') ;
        setTimeout("window.close();", 10);
    </script>

How do I get to the actual data page? Please guide me here, as I am a
newbie.

I don't know how to set the Referer header and so on.
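
For debugging, a small check like the following can tell this bounce page apart from a real result. It is only a sketch: bounce_target is a made-up name, and it assumes the bounce page always uses the window.open(...) pattern shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the URL passed to window.open() if $html looks like the
# bounce page, or undef otherwise (hypothetical helper).
sub bounce_target {
    my ($html) = @_;
    return unless $html =~ /window\.open\s*\(\s*'([^']*)'/;
    (my $url = $1) =~ s/\s+//g;    # the URL may be wrapped across lines
    return $url;
}

# The page content shown above, used here as sample input.
my $content = <<'HTML';
<script> window.open ('http://www.bangkokflightservices.com/
our_cargo_track.php') ;
        setTimeout("window.close();", 10);
</script>
HTML

if (defined(my $url = bounce_target($content))) {
    print "bounced to $url\n";
}
else {
    print "looks like a real result page\n";
}
```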


### This is the complete code I used.
#!/usr/bin/perl

use WWW::Mechanize;

my $baseurl =
'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
my $hawb = 'h_prefix=HAWB&h_sn=';

my $M = WWW::Mechanize->new(autocheck => 1);

## Added code for testing only
my $F = WWW::Mechanize->new(autocheck => 1);
$F->get("http://www.bangkokflightservices.com/our_cargo_track.php");
my $contentF = $F->content;
#print "$contentF\n";
#$M->add_header("Referer => 'http://www.bangkokflightservices.com/our_cargo_track.php'" )

while (<>) {
    chomp;

    my ($mprefix, $msn) = /(...)-(........)/ or do {
        warn "invalid MAWB: '$_'";
        next;
    };

    print "$mprefix $msn\n";

    $M->get("$baseurl?m_prefix=$mprefix&m_sn=$msn&$hawb");
    $M->follow_link(url_regex => qr/showc_track/);
    my $content = $M->content;

    print "$content\n"; # for debugging

    # process $content as before
    #
    while ( $content =~ m#(.*)#g ) {
        $currline = $1;

        if ($currline =~ m#style12#i) {
            $currline =~ m#.*>(.*?)<.*#i;
            $result = $result . " / " . $1;
        }
    }
    print "***$result\n";
    $result = '';
}

Posted by ifiaz on March 8, 2008, 10:34 am
Also, just so you know, in this part of the code:

my $baseurl =
'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
my $hawb = 'h_prefix=HAWB&h_sn=';

h_prefix should be HWB and not HAWB.

I have fixed that in my code, but I still have the same problem: it
throws me to a different page.





Posted by fiazidris on March 10, 2008, 4:42 am
> Also, please so you know,
>
> my $baseurl =
> 'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
> my $hawb = 'h_prefix=HAWB&h_sn=';
>
> h_prefix should be HWB and not HAWB.
>
> I have fixed that in my code and still the same problem that it throws
> me to a different page.
>

I have got as far as finding that the following URL works in a browser
(the prefix and serial can be changed):

http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=61441122&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74be&ch=%A0%A0%A0%A0

but this URL doesn't return results when fetched with Perl or curl.

Ben Morrow, please help.
