Re: Problem in parsing from a pipe


Maybe I'm not too bright, but I don't get it :( Would you mind being a
little more verbose? I mean, you can do the same with 3 arg open:

  perl -wle '$cmd = q|rm -rf /; true|; open( $fh, "-|", "$cmd" ) or die $!'

Where is the difference?  I understand that using open( $fh, $file )
instead of open( $fh, "<$file" ) can in some cases lead to problems (if
$file becomes ">something"), but in this particular case we are reading
from a pipe anyway, and if $cmd has been manipulated (and we were
careless and haven't checked it) then the three-arg version will not be of
any help.

And anyway, I have always thought that preventing malicious input from the
users should be happening on an altogether different level, starting with
at least using taint mode -- am I wrong?


$ du -hs human_est.out
1.2G  human_est.out
$ du -hs nr
3.5G  nr


Maybe, but this is not going to happen. I want to stop reading a huge file
after I have collected all the information that I need from it - why should
I slurp 3.5 GB if I have what I need after reading 10k?


Re: Problem in parsing from a pipe


Think C<"|something">.  That would result in a "Can't open bidirectional
pipe" warning while opening the C<"something|"> pipe for writing.
It would then be run via the shell, with the output of F<something> just
going to I<STDOUT> uncaught.


Forget what I've said.  3-arg B<open> is in no way safer in this regard.
Splitting on spaces wouldn't help in all cases (though, I suppose, in
most).  F<perlipc> suggests going B<fork>/B<exec> to avoid shell
invocation, which is what I do.

So, let me rephrase: 3-arg B<open> avoids misinterpreting redirection
metachars as mode specs, while still relying on the shell for pipes.  Then --
3-arg B<open> used consistently (or constantly) is just a matter of

I've trusted Perl that much.  What a sad day.


No and yes.  (Maybe I'm wrong, again.)  A tainted string just indicates
that it hasn't been preprocessed, while the amount of preprocessing is
left to the coder's discretion.  I haven't fought taintedness a lot.  A
quite simple (but non-trivial, in my case) regexp removes taintedness.
Does it make a string safe?  Who knows; it depends on the task.
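
For illustration, a minimal sketch of that kind of regexp.  The sub name
and the allow-list are made up for this example, not taken from anyone's
real code.  Under B<-T>, the captured $1 is what comes out untainted;
whether the pattern is strict enough is still up to the coder.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only plain file names built from a small
# allow-list of characters.  Under taint mode (-T), a value captured by a
# regexp group is considered untainted; without -T this is plain validation.
sub untaint_filename {
    my ($name) = @_;
    return $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/ ? $1 : undef;
}

for my $input ('human_est.out', '/dev/null; rm -rf /') {
    my $clean = untaint_filename($input);
    print defined $clean ? "accepted: $clean\n" : "rejected: $input\n";
}
```

A pattern like C</(.*)/> would untaint just as well and check nothing,
which is the point being made above.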


Define "quite large".  (I think the sizes are in bytes.)

    perl -wle '
    open $fh, "<", "/proc/$$/stat";
    print +(split / /, <$fh>)[22]'

    time perl -wle '
    $x = " " x (512 * 1E6);
    open $fh, "<", "/proc/$$/stat";
    print +(split / /, <$fh>)[22]'
    Name "main::x" used only once: possible typo at -e line 2.

    real    1m51.687s
    user    0m3.588s
    sys     0m7.068s

    time perl -wle '
    $x = " " x (256 * 1E6);
    open $fh, "<", "/proc/$$/stat";
    print +(split / /, <$fh>)[22]'
    Name "main::x" used only once: possible typo at -e line 2.

    real    0m11.334s
    user    0m1.788s
    sys     0m1.668s

I have only 512 MB of real memory.  However, looking at the B<time>
output, I have to agree that loading even 1.2G (provided there is enough
virtual memory) would be quite exciting.  Remember, that's a string, not
an array.


I wasn't talking about quitting the file.  I was talking about quitting
c.l.p.m.

Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

Re: Problem in parsing from a pipe


None that I can see at the moment. But there is a difference between

    my $file = get_filename_from_user();
    open $fh, "$cmd $file|" or die;

and

    my $file = get_filename_from_user();
    open $fh, '-|', $cmd, $file or die;

In the first case, if the user enters "/dev/null; rm -rf /" as the file
name, the command "rm -rf /" will be executed, while in the second case
the single argument "/dev/null; rm -rf /" will be passed to $cmd (which
will probably complain that there is no file with this name).
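
A harmless way to see this for yourself is sketched below. It assumes a
Unix-like system with echo in the path; the "file name" is made up, and
a second echo stands in for the destructive command.

```perl
use strict;
use warnings;

# Hypothetical hostile "file name"; echo stands in for rm -rf /.
my $file = '/dev/null; echo INJECTED';

# String form: the shell parses the whole command line, so the text after
# the semicolon runs as a second command.
open(my $sh, "echo $file |") or die "Cannot start shell pipe: $!";
my @shell_lines = <$sh>;
close $sh;

# List form: echo is executed directly and receives $file as one argument.
open(my $direct, '-|', 'echo', $file) or die "Cannot run echo: $!";
my @list_lines = <$direct>;
close $direct;

print "string form gave ", scalar(@shell_lines), " line(s): @shell_lines";
print "list form gave ", scalar(@list_lines), " line(s): @list_lines";
```

In the string form the second echo really runs; in the list form its
text is just part of the argument that the first echo prints back.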

Of course, if the user already has a shell and invokes your script
interactively, that doesn't make any difference, either: The user can
simply invoke "rm -rf /" from the shell with exactly the same result.

However it is a good idea to err on the side of caution, because some
time later you might want to reuse your code in a library, CGI script
or cron job and then you must be paranoid to avoid having your system
wiped out by accident or malice. So get into the practice of being


That, too. But taint mode is only a tool which helps you to detect
untrusted input. It isn't foolproof.


You shouldn't. There are situations where it is a good idea to read the
whole file into memory and there are situations where it isn't. Clearly
if you only need the first 10k of a 3.5GB file it would be insane to
read all of it into memory. Even if you need to read the whole file it is
probably better to read it line by line. But that depends on your data
and what information you need to extract. "Always slurp" is just as
idiotic as "always read line by line".
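
As a rough sketch of the "stop as soon as you have what you need" case:
the temporary file and the "wanted:" record format below are invented
for the example and merely stand in for the real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a huge file: the record we want sits near the top.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "header\n", "wanted: 42\n";
print {$out} "filler $_\n" for 1 .. 1000;
close $out or die "Cannot close $path: $!";

open(my $in, '<', $path) or die "Cannot open $path: $!";
my ($value, $lines_read);
while (my $line = <$in>) {
    $lines_read = $.;                  # current line number of $in
    if ($line =~ /^wanted:\s*(\d+)/) {
        $value = $1;
        last;                          # the remaining lines are never processed
    }
}
close $in;

print "found $value after $lines_read line(s)\n";
```

The same loop works on a pipe handle. Expect close to report a failure
there if the command on the other end is killed by SIGPIPE because you
stopped reading early.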


Re: Problem in parsing from a pipe


It's well worth noting, actually, that (exactly-) 3-arg piped opens are
still unsafe, since they still invoke the shell. Quite what you're
supposed to do if you want to invoke a command without arguments safely
I'm not sure. Something like

    open my $fh, "-|" or exec $cmd $cmd;

(note the lack of comma in the arguments to exec) perhaps, which is ugly
and (probably) unportable.
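
Spelled out with error checks, that idea might look roughly like the
sketch below. It assumes a Unix-like system; pwd is only a placeholder
for some argument-less command, and the block form of the indirect
object is used, which F<perlipc> also shows.

```perl
use strict;
use warnings;
use POSIX ();

my $cmd = 'pwd';    # placeholder command, for illustration only

my $pid = open(my $fh, '-|');    # implicit fork; the child's STDOUT feeds $fh
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: with an indirect object, exec never hands $cmd to the shell.
    exec { $cmd } $cmd or do {
        warn "Cannot exec $cmd: $!\n";
        POSIX::_exit(127);       # leave the child without running END blocks
    };
}

# Parent: read whatever the command printed.
my @output = <$fh>;
close $fh or warn "$cmd exited with status $?\n";
print @output;
```

Checking for an undefined return from open matters here: in the
one-liner above, a failed fork would make the parent itself exec the
command.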

Of course, if you're execing a command taken from (untrusted) user input
you're insane anyway :). system LIST and open "-|" are meant for
situations like

    open my $SYMBOLS, "-|", "nm", $lib;

where you'd like to avoid needing to check $lib for shell metachars. If
you can assume that 1. your path is safe and 2. nm will treat arguments
as filenames (or at least never do anything dangerous) you can quite
safely pass that untrusted input without validation.

