An efficient way to make a large directory tree?

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on

Let's say I have a list of directories:

a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call.  In other words, only mkpath the leaf nodes:

a/b/c/d
a/b/c/e
a/b/o
a/c/q
a/x/y

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0.  Is there a
better way of doing this?

Re: An efficient way to make a large directory tree?

[...]

> I'm trying to find the most efficient way to parse this list and come up
> with the fewest mkpath call.  In other words, only mkpath the leaf nodes:

[...]

> So far I'm parsing the list, then for each path, find the parent directory,
> use that as the key of a hash, and increment it by 1 for each path under that
> key, and only call mkpath on the hash elements with a value of 0.  Is there a
> better way of doing this?

That sounds like a pretty good technique.  From an algorithmic
perspective, it's O(n lg n), which probably the best you can expect
unless there's a limit to the depth of the directories or something
similar.  From a practical perspective it sounds fairly sound, too.
Perl's implementation of hashes is very efficient, and it sounds like
that's where the biggest bottleneck will be.  It also sounds
straightforward to code and to follow, which is a plus.

It would be interesting to know if it's more efficient to do this
calculation or just mkpath() everything.  Algorithmically a bunch of
mkpath()s would be O(n), but practically it would probably be slower,
because of the cost of system calls.

Just MHO, but hope it's helpful!

----ScottG.

Re: An efficient way to make a large directory tree?

On Fri, 03 Dec 2004 16:11:27 +0000, Nan Wang wrote:

>
> Let's say I have a list of directories:
>
> a/b/c
> a/b/c/d
> a/b/c/e
> a/b/o
> a/c
> a/c/q
> a/x
> a/x/y
>
> I'm trying to find the most efficient way to parse this list and come up
> with the fewest mkpath call.  In other words, only mkpath the leaf nodes:
>
> a/b/c/d
> a/b/c/e
> a/b/o
> a/c/q
> a/x/y
>
> So far I'm parsing the list, then for each path, find the parent directory,
> use that as the key of a hash, and increment it by 1 for each path under that
> key, and only call mkpath on the hash elements with a value of 0.  Is there a
> better way of doing this?
>

I'm trying to find a way to do the loop with map (and maybe grep) but I

rich@richg ~/perl \$ cat test.pl
#!/usr/bin/perl

use strict;
use warnings;

chomp (my @paths = reverse sort { \$a=~s#/#/#g <=> \$b=~s#/#/#g } <DATA>);
my \$seen = '';
foreach (@paths) {
next if \$seen =~ /:\$_/;
print "\$_\n";
\$seen .= ":\$_";
}

__END__
a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y
rich@richg ~/perl \$ ./test.pl
a/b/c/e
a/b/c/d
a/x/y
a/c/q
a/b/o

Re: An efficient way to make a large directory tree?

On Fri, 03 Dec 2004 17:06:15 +0000, Richard Gration wrote:
> I'm trying to find a way to do the loop with map (and maybe grep) but I
> can't find it atm

Got it.

rich@richg ~/perl \$ cat test.pl
#!/usr/bin/perl

use Data::Dump qw(dump);

use strict;
use warnings;

chomp (my @paths = reverse sort { \$a=~s#/#/#g <=> \$b=~s#/#/#g } <DATA>);
my \$seen = '';
print join "\n", grep map { \$seen =~ /:\$_/ ? undef : (\$_,\$seen .=
":\$_")[0] } @paths;

Re: An efficient way to make a large directory tree?

[...]

> my \$seen = '';
> print join "\n", grep map { \$seen =~ /:\$_/ ? undef : (\$_,\$seen .=
":\$_")[0] } @paths;

This would be more efficient, simpler, and more correct in the face of
directory names containing colons, if \$seen were a hash.  Also,
returning an empty list to map gets the item ignored, which can
eliminate the grep .  Something like this (untested):

my %seen;
print join "\n", map { \$seen ? () : (\$_,\$seen++)[0] } @paths;

Regardless, I'm not sure this is any better than the original.

---ScottG.

Re: An efficient way to make a large directory tree?

On Fri, 03 Dec 2004 12:26:31 -0500, Scott W Gifford wrote:

>
> [...]
>
>> my \$seen = '';
>> print join "\n", grep map { \$seen =~ /:\$_/ ? undef :
>> (\$_,\$seen .= ":\$_")[0] } @paths;
>
> This would be more efficient, simpler, and more correct in the face of
> directory names containing colons, if \$seen were a hash.  Also,
> returning an empty list to map gets the item ignored, which can
> eliminate the grep .  Something like this (untested):
>
>     my %seen;
>     print join "\n", map { \$seen ? () : (\$_,\$seen++)[0] }
>     @paths;

This doesn't work the same. Mine will not create /a/b if it's seen /a/b/c,
yours will. However the empty list I did not know about, thank you.

> Regardless, I'm not sure this is any better than the original.

No, nor me, but it's Friday afternoon and I've already succeeded ... I've
learned something :-)

map { chomp; \$seen =~ /:\$_/ ? () : (\$_,\$seen .= ":\$_")[0] } reverse sort {
\$a=~s#/#/#g <=> \$b=~s#/#/#g } <DATA>

Re: An efficient way to make a large directory tree?

[...]

>>     my %seen;
>>     print join "\n", map { \$seen ? () : (\$_,\$seen++)[0] }
>>     @paths;
>
> This doesn't work the same. Mine will not create /a/b if it's seen /a/b/c,
> yours will. However the empty list I did not know about, thank you.

You're right, sorry!

---ScottG.

Re: An efficient way to make a large directory tree?

#!/usr/bin/perl
print "\$_\n" for grep map { chomp; \$seen =~ /:\$_/ ? undef :
(\$_,\$seen .= ":\$_")[0] } reverse sort { \$a=~s#/#/#g <=> \$b=~s#/#/#g }
<DATA>
__END__
a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

Re: An efficient way to make a large directory tree?

Richard Gration wrote:
> reverse sort { \$a=~s#/#/#g <=> \$b=~s#/#/#g }

sort { \$b=~tr#/#/# <=> \$a=~tr#/#/# }

-Joe