Looking for modules to help download web-pages...

I'm afraid I'm a bit of a newbie when it comes to Perl,
though I have some experience with other languages
(mostly C++).

I would like to make a script to automate the downloading
of some pages on the Web, and thought Perl should be
suitable for this.  However, I'll undoubtedly need some
modules, and I have no idea which ones...  So I would
appreciate suggestions as to which modules I may need and
should take a closer look at.

I'm planning on making something similar to 'wget', but
specialized to the type of pages I want; so it will mostly
be a matter of downloading web-pages, saving them,
and parsing them for links to other web-pages to download.
I may also need to save other page contents (e.g. images),
and maybe even content referred to by CSS (e.g. background
images).  Many of the pages I'm after are PHP-pages (but
AFAIK that is handled on the server-side, isn't it?).

Some of the pages require log-in, so the script needs to be
able to recognize a password-form, fill in user-name and
password and post it -- as well as accept cookies.  Pages
containing just a confirmation-button for proceeding may
also need to be "pushed" by the script.  There may also be
a need to fill in and send forms with things like
date-of-birth -- maybe also in the form of drop-down lists.
Many of these are redirects; e.g. I want a page with text, but
unless I've previously logged in, specified dob or confirmed,
I'm redirected to forms.  After I've filled in the form, I proceed
to the page I wanted.  However -- at least in my browser -- these
pages (the one I want and the one I need to fill stuff in on) seem
to have the same URL and be "identical" from the browser's pov.

Some limited emulation of JavaScript would also be great.  E.g.
the ability to "fake" a pop-up dialog-box and "press" "OK" or
"Yes"; for posting some forms; and for redirecting.

So any ideas for modules I ought to look at for accomplishing
some or all of the above would be very much appreciated.


Re: Looking for modules to help download web-pages...

Big job... start with the LWP modules (libwww-perl), which ship with many
Perl distributions; if yours lacks them, they are on CPAN.  They will in turn
lead you to many others that may be helpful, e.g. for handling cookies.

Also search CPAN http://www.cpan.org/ for the various other things you need.
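
To give an idea of the LWP route, here is a minimal sketch (not a finished
tool) that fetches one page with cookies enabled, saves it, and lists the
links found in it.  It assumes LWP::UserAgent and HTML::LinkExtor are
installed; the helper name extract_links and the output file page.html are
made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Hypothetical helper: return the absolute URLs of everything the page links
# to (a href, img src, link href, ...), resolved against $base.
sub extract_links {
    my ($html, $base) = @_;
    my $extor = HTML::LinkExtor->new(undef, $base);   # no callback: collect
    $extor->parse($html);
    $extor->eof;
    my @urls;
    for my $link ($extor->links) {
        my ($tag, %attrs) = @$link;                   # [tag, attr => url, ...]
        push @urls, map { "$_" } values %attrs;       # stringify URI objects
    }
    return @urls;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url = shift @ARGV;
    my $ua  = LWP::UserAgent->new(cookie_jar => {});  # in-memory cookie jar
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;

    # Save the body as received (page.html is just an example name).
    open my $out, '>:raw', 'page.html' or die "page.html: $!\n";
    print {$out} $res->content;
    close $out;

    print "$_\n" for extract_links($res->decoded_content, $res->base);
}
```

Run it as "perl fetch.pl http://example.com/" (script name and URL are
placeholders).  Feeding the printed URLs back into the same loop, with a hash
of already-seen URLs, would be one way to get wget-like recursion.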


Re: Looking for modules to help download web-pages...

Sounds like you might be interested in WWW::Mechanize.
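
A rough sketch of how the log-in case might look with it, assuming the form
fields are called "username" and "password" (made-up names; check the actual
page) and that a form with a password input means "not logged in yet".  The
helper is_login_form is hypothetical too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::Form;

# Hypothetical helper: true if the form has a password input, i.e. it looks
# like a log-in form.
sub is_login_form {
    my ($form) = @_;
    return defined $form->find_input(undef, 'password') ? 1 : 0;
}

# Only touch the network when URL, user and password are given.
if (@ARGV == 3) {
    my ($url, $user, $pass) = @ARGV;
    my $mech = WWW::Mechanize->new(autocheck => 1);   # die on HTTP errors
    $mech->get($url);                                 # cookies are kept

    # The wanted page and the form page may share one URL, so look at the
    # content rather than the address to see which of the two came back.
    if (grep { is_login_form($_) } $mech->forms) {
        $mech->submit_form(
            with_fields => { username => $user, password => $pass },
        );
    }

    $mech->save_content('page.html');                 # example file name
    print $_->url_abs, "\n" for $mech->links;
}
```

Drop-down lists can be set with $mech->select() and a lone confirmation
button pressed with $mech->click().  Note that WWW::Mechanize does not run
JavaScript, so pop-up dialogs and script-driven redirects would have to be
handled by requesting whatever URL the script would have gone to.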


Re: Looking for modules to help downlaod web-pages...

Sisyphus coughed up some electrons that declared:

Also LWP or Net::HTTP for more traditional approaches.

Don't overlook driving wget as another way.
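
For instance, something along these lines (a sketch only; the helper name
wget_command and its option names are made up, the wget flags are the usual
long options) builds the command in Perl and lets wget do the fetching,
including images and other page requisites:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a wget argument list for a shallow recursive
# download that also fetches page requisites (images, stylesheets).
sub wget_command {
    my (%opt) = @_;
    my @cmd = (
        'wget', '--recursive', '--level=' . ($opt{depth} // 2),
        '--page-requisites', '--convert-links', '--no-parent',
    );
    push @cmd, "--directory-prefix=$opt{dir}"   if defined $opt{dir};
    push @cmd, "--load-cookies=$opt{cookies}"   if defined $opt{cookies};
    return (@cmd, $opt{url});
}

# Only run wget when a URL is given on the command line.
if (@ARGV) {
    my @cmd = wget_command(url => $ARGV[0], dir => 'mirror');
    # List form of system() avoids the shell, so the URL needs no quoting.
    system(@cmd) == 0 or die "wget failed: exit status ", $? >> 8, "\n";
}
```

The Perl side can then concentrate on the parts wget cannot do by itself,
such as deciding which pages to ask for.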

