Want your opinion on @ARGV file globbing

   Recently someone asked me to write a Perl script that would operate
on a bunch of input files specified at the command line.  This script
was meant for a Unix-ish system, but I developed it on an MSWin32
(Windows) machine.

   Normally, when I write a Perl script that takes an arbitrary number
of input files, I will include the line:

      @ARGV = map glob, @ARGV;  # for DOS globbing

I do this because the DOS shell passes in wildcard designators
unexpanded -- that is, if "*.txt" is specified as the only argument,
@ARGV will have "*.txt" as its only element.  Unix shells, on the
other hand, usually expand the wildcards, so the glob()bing doesn't
have to be done.

   So by using the above line I ensure that the script will have the
same wildcard-expanding behavior whether it is run in DOS or in Unix.

   However, if the above line is called when run under Unix, then
technically the wildcard expansions get run twice:  Once at the
command line, and once in my script.  This will be a problem if any of
the input files have wildcard characters or spaces in them.  For
example, if I have a file named "a b", and I call my Perl script with
"perl script.pl a*", then "a*" expands to include "a b", but then the
glob() call in the script expands that to "a" and "b", ignoring file
"a b" altogether.

   So to work around that problem, I wrote my script so that it only
did file globbing if it was running on a Windows platform, like this:

   if ($^O =~ m/^MSWin/i) {
      @ARGV = map glob, @ARGV;  # for DOS globbing
   }

This way, input arguments won't get "double-globbed."

   Happy with this, I sent my script to the person who needed it.  He
responded by saying that "the argument list [was] too long."  It turns
out that the wildcard expression he was using expanded out to nearly
16,000 files, which caused the Unix shell he was using to refuse to
run the resulting (long) command line.

   So I made a quick change to my script:  I removed the above if-
check and advised him to pass in the wildcarded arguments surrounded
by quotes.  That way the shell wouldn't expand out the wildcards,
leaving Perl to do it.

   That "work-around" worked great.  But that led me to ask:  In
future scripts, should I include the check for $^O before calling glob
()?  If I don't, then the files risk being "double-globbed" on Unix
systems -- but if I do, then I run the risk of the shell refusing to
call the script (without an available work-around).

   Of course, this is often a moot point, as more than 99% of the
input files I ever processed have no wildcard characters or spaces in
their filenames.  But that's a guarantee I can't always make.

   Perhaps I could still call glob() by default on all systems, but
include a command-line switch that forces that not to happen (in order
to prevent double-globbing).  That way, the switch could be mostly
ignored, but it is there in case it's ever needed.
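
   Something like this would do it (just a sketch; the "--no-glob"
switch name is arbitrary):

      use Getopt::Long;
      my $do_glob = 1;                     # glob by default on every platform
      GetOptions('glob!' => \$do_glob);    # --no-glob turns the expansion off
      @ARGV = map glob, @ARGV if $do_glob;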

   Or am I just overthinking this?  After all, glob()bing @ARGV in all
instances (that is, regardless of platform) has never given me a
problem (yet).  Maybe I should just leave it in (to be called all the
time) after all.

   What are your opinions on this?  Is there a convention you use that
addresses this issue?  Is there an alternate way you prefer to handle
it?

   Your thoughts and opinions are welcome.


   -- Jean-Luc

Re: Want your opinion on @ARGV file globbing

[problems using glob on Unix and Windows systems snipped]

Your problem is one of the reasons I never use glob to find files to
process. I always use File::Find and specify the top-level directory,
either as a default or a command-line parameter. Then, I can apply any
appropriate filters to the actual file name and directory.
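
For example (a bare-bones sketch; the default directory and the ".csv"
filter below are only placeholders):

   use File::Find;
   my $top = shift(@ARGV) || '.';   # top-level directory, default to cwd
   my @files;
   find(sub { push @files, $File::Find::name if -f && /\.csv$/ }, $top);
   # ... process @files ...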

Jim Gibson

Re: Want your opinion on @ARGV file globbing

[quoted text snipped]

Yes, you are. There is nothing wrong with the original version of your
script and he has a problem with his shell, not with your Perl program.
The correct solution is the same as for any program on UNIX when the
shell complains about a too long arg list: use the find utility with the
execute option.

Actually your "fix" made it worse because your forced globbing in your
Perl program blocks the user from naming files with a star or a tilde in
their filenames.
If you really want to offer this changed behaviour then I would do it at
most as an additional option, controlled by a command line parameter.
Otherwise your script behaves differently from any other Unix program and
this inconsistency will bite you sooner or later.


Re: Want your opinion on @ARGV file globbing

[quoted text snipped]

Agreed.  You should *not* do your own globbing by default on Unix.
If I type
    prog 'foo bar'
and it processes the two files "foo" and "bar", that's extremely
counterintuitive behavior; it's also difficult to work around
if I really want to process a file called "foo bar".

An option telling your program to do its own globbing wouldn't be
unreasonable, but personally I wouldn't use it; the right Unixish
solution is to use "find", "xargs", or something similar.

Or you could add an option to specify a file containing a list
of files.  If you're processing 16,000 files, generating a list
of them isn't much of a burden.
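
Roughly (a sketch only; the "--filelist" option name is arbitrary):

    use Getopt::Long;
    my $filelist;
    GetOptions('filelist=s' => \$filelist);
    if (defined $filelist) {
        open my $fh, '<', $filelist or die "Can't open '$filelist': $!";
        chomp(my @names = <$fh>);
        push @ARGV, @names;   # one filename per line, no shell involved
    }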

I'm not sure what the default behavior should be on Windows.
Consistency between the Unix and Windows versions argues for not
doing your own globbing by default.  Consistency between your program
and other Windows programs might argue for enabling it by default.

One thing you should look into: what does "glob" do with whitespace?
Many Windows file names contain spaces; you don't want to make it
difficult to process such files.

Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Re: Want your opinion on @ARGV file globbing

Jürgen Exner wrote:
[quoted text snipped]

Many programs do different things if invoked once on 16,000 files,
versus 16,000 times on one file each.  Take "sort", for example.


Re: Want your opinion on @ARGV file globbing

[quoted text snipped]

Obviously. However, in that case you need a different method to pass that
list of 16,000 values anyway, because it is not possible to pass them as
command line arguments.
Writing them to a file and loading that file via a -f option comes to
mind as one simple and effective solution. I am sure there are others.


Re: Want your opinion on @ARGV file globbing

jl_post@hotmail.com wrote:

[quoted text snipped]

If I am running a script on Linux, I'd generally expect it to work the
way almost every other Linux program works, and not double glob.

[quoted text snipped]

I often process input files that do have spaces in them, because we have
network drives that are cross mounted on both Linux and Windows, and
Windows users often include spaces in their file names.

[quoted text snipped]

I'd probably reverse that, and have it manually glob only with the
switch.  But I guess it depends on what you think would be more likely,
huge file lists or file lists with space/wildcards.

One thing to consider is failure mode.  I think an "argument list too
long" error is more likely to be noticed and correctly interpreted and
acted upon than a program which silently tries to process a double
globbed, and hence incorrect, list of files.

[quoted text snipped]

Are you sure you would know if it had?

[quoted text snipped]

I have some scripts which are routinely called on thousands of files.
However, when used routinely, all the files are in a standard location
following a standard naming convention.  So I have the program look at
@ARGV, after all switches are processed, and if it has exactly 1
argument and that one argument looks like, say, "ProjectXXX", then it
automatically does @ARGV = glob "/aldfj/dsf/ad/sdf/$ARGV[0]/klj*/*.csv";

If you have highly specialized needs, then you do highly specialized things.


Re: Want your opinion on @ARGV file globbing

On Tue, 02 Feb 2010 08:56:40 -0800, jl_post@hotmail.com wrote:
[quoted text snipped]

I would add an option (say, -g) to the program meaning that arguments should
be globbed internally, with -g enabled by default on MS Win.  Then people in
your user's unusual situation who don't want to or can't use find/xargs
have a solution.
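
Roughly (a sketch only, along those lines):

    use Getopt::Long;
    my $do_glob = ($^O =~ /^MSWin/i);     # globbing on by default under Windows
    GetOptions('g|glob!' => \$do_glob);   # -g turns it on, --no-glob turns it off
    @ARGV = map glob, @ARGV if $do_glob;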

Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/
