Caching robots.txt in LWP::RobotUA
Posted March 15, 2010, 8:15 pm
I'm using LWP::RobotUA to download a series of pages with the mirror() method:

    use strict;
    use warnings;
    use LWP::RobotUA;

    my %options = ('agent' => 'crawler', 'show_progress' => 1,
        'delay' => 10/60, 'from' => 'email@example.com');
    my $ua = LWP::RobotUA->new(%options);

    my @all_urls = (...);   # array of links populated from elsewhere

    foreach my $url (@all_urls) {
        my $filename = "$url.html";
        $ua->mirror($url, $filename);
    }
The problem is that LWP::RobotUA seems to make a GET request for the
robots.txt file each time I call the mirror() method, even though all
of the URLs are on the same domain. I'd expect the module to cache the
file, either in memory or on disk, because it's highly unlikely to
change between requests, but it doesn't seem to do so.
Do I need to write my own cache module, or tack on an existing one
from CPAN? I was hoping that calling mirror() would Just Work.
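In case it helps anyone answer: here's a sketch of the kind of thing I was imagining, using WWW::RobotRules::AnyDBM_File from CPAN so the parsed robots.txt rules persist on disk between runs. I'm assuming the 'rules' option to the LWP::RobotUA constructor accepts such an object in place of the default in-memory WWW::RobotRules; the database filename 'robots.db' is just a placeholder.

    use strict;
    use warnings;
    use LWP::RobotUA;
    use WWW::RobotRules::AnyDBM_File;

    # Keep robots.txt rules in an on-disk DBM file instead of the
    # default in-memory WWW::RobotRules object, so they survive
    # between runs of the crawler.
    my $rules = WWW::RobotRules::AnyDBM_File->new('crawler', 'robots.db');

    my $ua = LWP::RobotUA->new(
        'agent' => 'crawler',
        'from'  => 'email@example.com',
        'delay' => 10/60,
        'rules' => $rules,   # assumption: supply our own rules object here
    );

No idea yet whether this actually stops the repeated GETs within a single run, or only between runs.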
Thanks in advance!