# Opening files on the web for reading

Can anyone give me some Perl code to open an HTML file on the web (i.e. an
HTML file stored on somebody else's web server and not mine) for reading? Or
is it more complicated than that?

## Re: Opening files on the web for reading

You can use the LWP::Simple module. The example in the documentation
should tell you how to do it.
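For reference, that approach boils down to something like the following. This is only a minimal sketch: `fetch_page` is a made-up helper name, and LWP::Simple has to be installed.

```perl
use strict;
use warnings;

# Return the body of $url as one string, or die if the fetch fails.
# (fetch_page is a hypothetical helper, not part of LWP.)
sub fetch_page {
    my ($url) = @_;
    require LWP::Simple;                    # load LWP only when a fetch is requested
    my $content = LWP::Simple::get($url);   # get() returns undef on failure
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}
```

Called as `my $html = fetch_page('http://www.example.com/');` (placeholder URL), this gives you the whole document in memory, not a file handle.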

//Makholm

## Re: Opening files on the web for reading

Is there anything wrong with the answer in "perldoc -q HTML":

How do I fetch an HTML file?

jue

## Re: Opening files on the web for reading

Other than it not answering the question?  At least on my Perl version,
none of the answers there return a file handle opened for reading.  Now,
he might be content with fetching the whole thing (to a file or into
memory) and then reading from that, but I'd be inclined to give the benefit
of the doubt that he meant what he asked.
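If fetching everything first is acceptable, the "then reading from that" part can be done with an in-memory file handle. A minimal sketch, assuming the content has already been fetched into a string; `fh_from_string` is a made-up name:

```perl
use strict;
use warnings;

# Wrap an already-fetched document in a read-only, in-memory file handle.
# (fh_from_string is a hypothetical helper name.)
sub fh_from_string {
    my ($content) = @_;
    open my $fh, '<', \$content
        or die "in-memory open failed: $!";
    return $fh;
}

# Stand-in for content returned by e.g. LWP::Simple::get($url).
my $fh = fh_from_string("<html>\n<body>hi</body>\n</html>\n");
while (my $line = <$fh>) {
    print "LINE: $line";
}
close $fh;
```

The handle behaves like any other read handle, but the whole document sits in memory, so this is not streaming.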

LWP::UserAgent using a callback with for example :content_cb would "stream"
the data back, but not via a file handle.  One could probably come up with
an adaptor that ties a file handle front end to the callback backend.
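A minimal sketch of the callback form, for anyone who hasn't used it. `stream_url` and `$on_chunk` are made-up names; `:content_cb` is the documented LWP::UserAgent option:

```perl
use strict;
use warnings;

# Stream $url, passing each chunk of the body to $on_chunk as it arrives.
# (stream_url is a hypothetical wrapper around LWP::UserAgent->get.)
sub stream_url {
    my ($url, $on_chunk) = @_;
    require LWP::UserAgent;                 # load LWP only when a fetch is requested
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get(
        $url,
        ':content_cb' => sub {
            my ($chunk, $response, $protocol) = @_;
            $on_chunk->($chunk);            # chunk boundaries are arbitrary, not lines
        },
    );
    die $res->status_line, "\n" unless $res->is_success;
    return $res;
}
```

For example, `my $bytes = 0; stream_url('http://www.example.com/', sub { $bytes += length $_[0] });` counts the body size without holding the body in memory (the URL is a placeholder). Note that LWP drives the loop here; your code only gets called back.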

There might be a more direct way, but I don't know what it is.

Xho

## Re: Opening files on the web for reading

xhoster@gmail.com wrote:

Fair enough. I interpreted "to open an html file on the web [...] for
reading" as he just wants to get the content of that file (which as we
all know may not be a file in the first place), not to actually have a
read file handle to a URL.
At the very least his terminology is sloppy and your interpretation may
very well be closer to his intentions.

jue

## Re: Opening files on the web for reading

Quoth xhoster@gmail.com:

IO::All::LWP

Ben

## Re: Opening files on the web for reading

BM> Quoth xhoster@gmail.com:

BM> IO::All::LWP

Unfortunately, the docs say "The bad news is that the whole file is
stored in memory after getting it or before putting it. This may cause
problems if you are dealing with multi-gigabyte files!"

It would be nice to have a buffered reader/writer which wouldn't grab
the whole file, using the LWP callbacks, as xhoster suggests...  I
haven't seen such a module.

Ted

## Re: Opening files on the web for reading

And it doesn't seem as easy as I thought.  In order for the callback to be
invoked, the thing invoking the callback has to be "in control".  But to
read from a file handle, the thing reading is in control.  You'd have to
fork a process and in one have the callback invoker in control, streaming
data to the other process as it comes in and the callback is invoked.  So
then you would have portability problems.

It seems like it is easy to write a wrapper that turns an iterator into a
callback, but vice versa is not easy.
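The easy direction looks roughly like this. A minimal sketch with made-up names (`iterate_into`, `make_iterator`), where an iterator is any closure that returns one item per call and undef when exhausted:

```perl
use strict;
use warnings;

# Drive an iterator to exhaustion, pushing each item into a callback.
sub iterate_into {
    my ($next, $callback) = @_;
    while (defined(my $item = $next->())) {
        $callback->($item);
    }
    return;
}

# A toy iterator: hands out its items one per call, then undef.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

my @seen;
iterate_into(make_iterator(qw(a b c)), sub { push @seen, uc $_[0] });
print "@seen\n";    # prints "A B C"
```

The wrapper can stay in control the whole time, which is exactly what the reverse direction lacks.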

Xho

## Re: Opening files on the web for reading

On 25 Sep 2008 15:23:24 GMT xhoster@gmail.com wrote:

x> And it doesn't seem as easy as I thought.  In order for the callback to be
x> invoked, the thing invoking the callback has to be "in control".  But to
x> read from a file handle, the thing reading is in control.  You'd have to
x> fork a process and in one have the callback invoker in control, streaming
x> data to the other process as it comes in and the callback is invoked.  So
x> then you would have portability problems.

You can do it with buffering but it's ugly code I would not want to
write.  It's very easy to get it wrong.

x> It seems like it is easy to write a wrapper that turns an iterator into a
x> callback, but vice versa is not easy.

Right, since iterators are stateful, so you have to manufacture and
preserve the state when you only have a callback.
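To make the buffering trade-off concrete, here is a minimal sketch with made-up names (`callback_to_iterator`, `$producer`). It manufactures the iterator state by letting the callback-driven producer run to completion first, so everything ends up in memory; it does not give you streaming.

```perl
use strict;
use warnings;

# $producer is any routine that calls the callback it is given once per
# chunk (a stand-in for a :content_cb-style API).
sub callback_to_iterator {
    my ($producer) = @_;
    my @queue;
    # The producer stays in control until it has delivered every chunk.
    $producer->(sub { push @queue, $_[0] });
    # Only now can the caller take control and pull chunks one at a time.
    return sub { return shift @queue };
}

my $next = callback_to_iterator(sub {
    my ($cb) = @_;
    $cb->($_) for qw(one two three);
});

while (defined(my $chunk = $next->())) {
    print "chunk: $chunk\n";
}
```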

Ted

## Re: Opening files on the web for reading

That's not the issue: callbacks in Perl are closures, so they do have
state. The trouble is that you would need LWP::UserAgent->simple_request
and whatever is driving the <$FH> loop to be coroutines, and Perl doesn't
have 'yield'. Just for fun, here's an implementation using Coro:

    #!/usr/bin/perl

    use warnings;
    use strict;

    {
        package LWP::FH;

        use Coro;
        use Coro::Channel;
        use LWP::UserAgent;

        # <$obj> returns the next line, pulling chunks off the channel
        # until the buffer holds a complete one.
        use overload '<>' => sub {
            my ($s) = @_;
            my $eol;
            until (($eol = length($/) + index $s->{buf}, $/) > 0) {
                my $new = $s->{ch}->get;
                if (defined $new) {
                    $s->{buf} .= $new;
                }
                else {
                    # undef on the channel marks the end of the response
                    $eol = length $s->{buf};
                    last;
                }
            }
            return substr $s->{buf}, 0, $eol, "";
        };

        my $UA = LWP::UserAgent->new;

        sub new {
            my ($c, $url) = @_;
            my $s = bless {
                buf => "",
                ch  => Coro::Channel->new(1),
            }, $c;
            # Run the request in its own coroutine; each chunk goes
            # into the channel as LWP hands it to the callback.
            async {
                my ($UA, $s) = @_;
                $UA->get(
                    $url,
                    ":content_cb" => sub { $s->{ch}->put($_[0]); },
                );
                $s->{ch}->put(undef);
            } $UA, $s;
            return $s;
        }
    }

    my $FH = LWP::FH->new("http://perl.org");
    while (<$FH>) {
        print "LINE: $_";
    }

    __END__

Ben

## Re: Opening files on the web for reading

Quoth xhoster@gmail.com:

So use Net::HTTP::NB. Not quite as convenient as LWP::UA, but it

It's a real shame Perl doesn't have a decent lightweight userland thread
library, as this sort of thing is exactly what it would be useful for.
If I *wanted* to write select loops, I'd be writing C; since I'm writing
Perl, it would be nice if perl could handle the messy stuff for me :).

Ben

## Re: Opening files on the web for reading

Obviously I've got something wrong (or, as ever, I'm incompetent).  The
server must have some means to be told stop-feeding/resume-feeding.
Otherwise (if I understand networking the least bit) those gigabytes would
be buffered in the kernel.  What don't I know?

## Re: Opening files on the web for reading

Yes. See
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control
.

Once the kernel buffers are full, the receiving end instructs the
sending end to stop sending data.

Ben

## Re: Opening files on the web for reading

BM> Yes. See
BM> http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control
BM> .

BM> Once the kernel buffers are full, the receiving end instructs the
BM> sending end to stop sending data.

Also, HTTP 1.1 supports partial transfers of data, so you can open a
persistent connection and keep requesting small pieces.  I'd guess it's
better than TCP flow control if the goal was to allow random seeks, not
just sequential writes.  Handling errors and chunk boundaries would
be... let's say "interesting to the right developer." :)
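A rough sketch of the chunked idea, with made-up helper names (`range_header`, `fetch_chunk`). It assumes the server honours `Range` requests and answers with 206 Partial Content; error handling and the final short chunk are left out.

```perl
use strict;
use warnings;

# Build the Range header value for the $index-th chunk of $size bytes.
sub range_header {
    my ($index, $size) = @_;
    my $start = $index * $size;
    my $end   = $start + $size - 1;         # byte ranges are inclusive
    return "bytes=$start-$end";
}

# Fetch one chunk through an existing LWP::UserAgent object.
sub fetch_chunk {
    my ($ua, $url, $index, $size) = @_;
    my $res = $ua->get($url, Range => range_header($index, $size));
    # 206 means the server honoured the range; a plain 200 means it sent
    # the whole document instead.
    die "no partial content: ", $res->status_line, "\n"
        unless $res->code == 206;
    return $res->content;
}

# Usage (placeholder URL), reusing one persistent connection:
#   use LWP::UserAgent;
#   my $ua    = LWP::UserAgent->new(keep_alive => 1);
#   my $third = fetch_chunk($ua, 'http://www.example.com/big.html', 2, 65536);

print range_header(2, 65536), "\n";         # prints "bytes=131072-196607"
```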

Ted

## Re: Opening files on the web for reading

Aha, pleased to hear that.  What's worse is that I've read almost all (or
all?) of the dead trees I've found.  So $Subject.

-- 
Torvalds' goal for Linux is very simple: World Domination

## Re: Opening files on the web for reading

On Sep 24, 8:47 am, xhos...@gmail.com wrote:

Another possibility but still indirect (and w/o graceful error handling):

    use LWP::Simple;

    my $pid = open( my $fh, "-|" );
    die "fork: $!" unless defined $pid;
    if ( $pid ) { while ( <$fh> ) { ... } }
    else { getprint( ... ); }
    ...
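
A slightly fuller sketch of the same forking-open idea, with the child exiting explicitly so it never falls through into the parent's code. `open_producer` is a made-up name, and the demo uses a plain `print` as the producer so it runs without a network; for a URL the producer would be something like `sub { LWP::Simple::getprint($url) }`. It still relies on fork, so the portability caveat from earlier in the thread applies.

```perl
use strict;
use warnings;

# Run $producer in a forked child and return a handle reading its STDOUT.
# (open_producer is a hypothetical helper name.)
sub open_producer {
    my ($producer) = @_;
    my $pid = open(my $fh, '-|');
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: STDOUT is connected to the parent's read handle.
        $producer->();
        exit 0;                             # never return into the caller's code
    }
    return $fh;                             # parent: reads what the child prints
}

my $fh = open_producer(sub { print "line $_\n" for 1 .. 3 });
my @lines = <$fh>;
close $fh;                                  # also waits for the child
print scalar(@lines), " lines read\n";
```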

