
Looking for modules to help download web pages...

Subject | Author | Date
Looking for modules to help download web pages... | Koppe | 07-20-2007
Posted by Koppe on July 20, 2007, 6:53 pm


I'm afraid I'm a bit of a newbie when it comes to Perl,
though I have some experience with other languages
(mostly C++).

I would like to make a script to automate downloading
some pages on the Web, and thought Perl should be
suitable for this. However, I'll undoubtedly need some
modules, and I have no idea which ones... So I would
appreciate suggestions as to which modules I may need and
should take a closer look at.

I'm planning on making something similar to 'wget', but
specialized for the type of pages I want; so it will mostly
be a matter of downloading web pages, saving them,
and parsing them for links to other web pages to download.
I may also need to save other page contents (e.g. images),
and maybe even content referred to by CSS (e.g. background
images). Many of the pages I'm after are PHP pages (but
AFAIK that is handled on the server side, isn't it?).

Some of the pages require log-in, so the script needs to be able
to recognize a password form, fill in the user name and
password and post it, as well as accept cookies. Pages containing
just a confirmation button for proceeding may also need to be
"pushed" by the script. There may also be a need to fill in and
send forms with things like date of birth, maybe also in the
form of drop-down lists.
Many of these are redirects; e.g. I want a page with text, but
unless I've previously logged in, specified my DOB or confirmed,
I'm redirected to forms. After I've filled in the form, I proceed
to the page I wanted. However, at least in my browser, these
pages (the one I want and the one I need to fill stuff in on) seem
to have the same URL and be "identical" from the browser's POV.

Some limited emulation of JavaScript would also be great, e.g.
the ability to "fake" a pop-up dialog box and "press" "OK" or
"Yes", to post some forms, and to follow redirects.

So any ideas for modules I ought to look at for accomplishing
some or all of the above would be very much appreciated.

-Koppe


Posted by Peter Wyzl on July 20, 2007, 9:05 pm


Big job... start with the LWP modules, which come preinstalled with many Perl
distributions. They will in turn lead you to many others that may be
helpful, e.g. for handling cookies.

Also search CPAN http://www.cpan.org/ for various other things you need.

P
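To make the LWP suggestion concrete, here is a minimal sketch (not a tested solution for any particular site) of logging in through an HTML form with LWP::UserAgent, HTTP::Cookies and HTML::Form, then saving the wanted page. It assumes the log-in form is the first form on the page that has a password field and that its fields are named "username" and "password"; those names, the build_login_request helper and the cookies.txt/page.html file names are hypothetical and will differ per site.

```perl
#!/usr/bin/perl
# Sketch only: log in through an HTML form with LWP and save the page.
# Hypothetical assumptions: the log-in form is the first form with a
# password field, and its fields are named 'username' and 'password'.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTML::Form;

# Find the first form containing a password input, fill in the credentials,
# and return the HTTP::Request that clicking its submit button would send.
# Returns nothing if the page has no such form.
sub build_login_request {
    my ($html, $base, $user_field, $pass_field, $user, $pass) = @_;
    for my $form (HTML::Form->parse($html, $base)) {
        next unless grep { $_->type eq 'password' } $form->inputs;
        $form->value($user_field => $user);
        $form->value($pass_field => $pass);
        return $form->click;
    }
    return;
}

# Run as: perl login.pl URL USER PASS
if (@ARGV == 3) {
    my ($url, $user, $pass) = @ARGV;

    # The cookie jar keeps the session cookie between requests;
    # autosave writes it to cookies.txt when the jar is destroyed.
    my $ua = LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies->new(file => 'cookies.txt', autosave => 1),
    );

    # GET requests follow redirects automatically, so if the site bounces
    # us to a log-in form, $res holds that form page.
    my $res = $ua->get($url);
    die 'GET failed: ', $res->status_line, "\n" unless $res->is_success;

    my $login = build_login_request($res->decoded_content, $res->base,
                                    'username', 'password', $user, $pass);
    if ($login) {
        my $post = $ua->request($login);
        die 'log-in failed: ', $post->status_line, "\n" if $post->is_error;
        # By default LWP does not follow redirects after a POST, so ask
        # for the wanted page again now that the session cookie is set.
        $res = $ua->get($url);
        die 'GET failed: ', $res->status_line, "\n" unless $res->is_success;
    }

    # Save the raw bytes exactly as the server sent them.
    open my $fh, '>:raw', 'page.html' or die "page.html: $!\n";
    print {$fh} $res->content;
    close $fh or die "page.html: $!\n";
}
```

Because every request made through $ua shares the same cookie jar, the session cookie set by the log-in POST is sent with the second GET automatically. That is one plausible reason the same URL can return a form the first time and the real page the second time.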


Posted by Sisyphus on July 20, 2007, 11:24 pm



.
.
> I would like to make a script to automate the downloading
> some pages on the Web
.
.

Sounds like you might be interested in WWW::Mechanize.

Cheers,
Rob
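A WWW::Mechanize version of the same idea, sketched under assumptions: Mechanize keeps cookies and follows redirects by itself, so the script only has to notice a log-in form, submit it and request the page again, then collect the links and images a wget-like crawler would fetch next. The field names "username" and "password", the page_urls helper and the page.html output name are hypothetical; a real crawler would also need a queue of URLs still to visit and a check that it stays on the intended site.

```perl
#!/usr/bin/perl
# Sketch only: a tiny wget-like fetcher built on WWW::Mechanize.
# Hypothetical assumptions: log-in fields are named 'username' and
# 'password', and the fetched page is saved as page.html.
use strict;
use warnings;
use WWW::Mechanize;

# Absolute URLs of the links and images on the page $mech is showing,
# i.e. the candidates a crawler would fetch next.
sub page_urls {
    my ($mech) = @_;
    my @links  = map { $_->url_abs->as_string } $mech->links;
    my @images = map { $_->url_abs->as_string } $mech->images;
    return (\@links, \@images);
}

# Run as: perl fetch.pl START_URL USER PASS
if (@ARGV == 3) {
    my ($start, $user, $pass) = @ARGV;

    # Mechanize keeps cookies in memory, follows redirects and, with
    # autocheck on, dies if a request fails.
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($start);

    # If we landed on a page with a password field, fill in the form and
    # submit it, then ask for the page we actually wanted again.
    if (grep { $_->find_input(undef, 'password') } $mech->forms) {
        $mech->submit_form(with_fields => { username => $user, password => $pass });
        $mech->get($start);
    }

    $mech->save_content('page.html');
    my ($links, $images) = page_urls($mech);
    print "link:  $_\n" for @$links;
    print "image: $_\n" for @$images;
}
```

Confirmation-only pages can probably be handled the same way, by selecting the form and calling $mech->click(), and a drop-down list (a select field) is set like any other field, using one of its option values.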


Posted by Tim Southerwood on July 21, 2007, 4:22 am


Sisyphus coughed up some electrons that declared:

>
> .
> .
>> I would like to make a script to automate the downloading
>> some pages on the Web
> .
> .
>
> Sounds like you might be interested in WWW::Mechanize.
>
> Cheers,
> Rob

Also consider LWP or Net::HTTP for a more traditional, lower-level approach.

Don't overlook driving wget as another way.

Cheers
Tim
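Tim's wget suggestion, sketched in Perl under assumptions: the script logs in once with --post-data, keeping the session cookie in a file, then mirrors the site with that cookie. The form field names "user" and "password", the SITE_USER/SITE_PASS environment variables, the two-level depth and the helper names are all hypothetical. The options used (--save-cookies, --keep-session-cookies, --load-cookies, --post-data, --recursive, --level, --no-parent, --page-requisites, --convert-links) are standard GNU Wget options, but check them against the manual of the installed version.

```perl
#!/usr/bin/perl
# Sketch only: drive wget from Perl instead of doing HTTP in Perl.
# Hypothetical assumptions: the log-in form posts fields named 'user' and
# 'password'; credentials come from the SITE_USER/SITE_PASS variables.
use strict;
use warnings;

# Encode key/value pairs as application/x-www-form-urlencoded
# (assumes ASCII or byte strings).
sub form_encode {
    my @pairs = @_;
    my @parts;
    while (@pairs) {
        my ($key, $value) = splice @pairs, 0, 2;
        push @parts, join '=', map {
            my $s = $_;
            $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
            $s;
        } $key, $value;
    }
    return join '&', @parts;
}

# wget command that logs in by POSTing the form and keeps the session cookie.
sub wget_login_cmd {
    my (%opt) = @_;
    return ('wget', '--save-cookies', $opt{cookies}, '--keep-session-cookies',
            '--post-data', $opt{post_data}, $opt{url});
}

# wget command that mirrors a URL a few levels deep, with the images and
# stylesheets each page needs, reusing a saved cookie file if given.
sub wget_mirror_cmd {
    my (%opt) = @_;
    my @cmd = ('wget', '--recursive', "--level=$opt{depth}", '--no-parent',
               '--page-requisites', '--convert-links');
    push @cmd, '--load-cookies', $opt{cookies} if defined $opt{cookies};
    return (@cmd, $opt{url});
}

# Run as: SITE_USER=... SITE_PASS=... perl mirror.pl LOGIN_URL START_URL
if (@ARGV == 2) {
    my ($login_url, $start_url) = @ARGV;
    my $post = form_encode(user     => $ENV{SITE_USER} // '',
                           password => $ENV{SITE_PASS} // '');
    system(wget_login_cmd(url => $login_url, cookies => 'cookies.txt',
                          post_data => $post)) == 0
        or die "wget log-in failed (status $?)\n";
    system(wget_mirror_cmd(url => $start_url, depth => 2,
                           cookies => 'cookies.txt')) == 0
        or die "wget mirror failed (status $?)\n";
}
```

Passing system() a list rather than a single string runs wget directly without a shell, so passwords and URLs containing & or ? need no extra quoting.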

Similar threads (title, date posted):
GD modules (September 1, 2004, 8:20 am)
Perl PDF modules - help please (August 11, 2004, 2:31 pm)
installing modules? (September 29, 2005, 1:52 pm)
Need new maintainers for my modules (January 8, 2005, 7:24 am)
3rd party modules (March 23, 2005, 4:56 pm)
Installing modules (August 5, 2005, 8:32 pm)
Captcha Modules (August 15, 2005, 5:35 pm)
writing modules in 'c' (August 27, 2005, 3:18 pm)
Modules with several packages (October 16, 2005, 10:50 am)
Which Perl modules to use (January 25, 2006, 12:37 pm)
