undef takes forever

  I have a script that, over a period of several *days*, gradually
builds a very large Perl hash.  Periodically, it saves this large
hash to a file using the Storable module.  In the past, this
storage process has resulted in a corrupted (and unusable) file,
so the current version of the script tests the soundness of the
stored file by first saving the hash to a dummy file, then
retrieving the hash from that dummy file into a temporary variable
$temp, and making sure that $temp is defined and that %$temp has
the right number of keys.  If all this is as it should be, then
the dummy file is used to overwrite the old version of the hash
stored on disk.
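
In spirit, the check looks something like this (a stripped-down
sketch; the file names and the sub are made up for illustration,
not my actual code):

use Storable qw(store retrieve);

sub save_checked {
    my ($href, $file) = @_;
    my $tmp = "$file.new";

    store($href, $tmp) or die "store to $tmp failed: $!";

    # Read the dump back and make sure it looks sane before trusting it.
    my $temp = retrieve($tmp);
    die "stored hash failed verification"
        unless defined $temp && keys %$temp == keys %$href;

    rename $tmp, $file or die "rename $tmp -> $file failed: $!";
    # $temp is no longer needed past this point.
}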

  It turns out, however, that this version of the script is about
10x slower than the original version, which did not do this extra
check on the stored hash.  Using carefully placed print statements,
I determined that the bottleneck is not due to the extra retrieval
and checking steps, but to the deallocation of %$temp that happens
when $temp goes out of scope.  Since %$temp is very large and
useless once the check is done, I don't want it hanging around
longer than necessary, but the deallocation step takes 3-4 minutes!
This is about 100 times slower than the time it takes to allocate
%$temp in the first place!  It's crazy.  I confirmed this by
inserting an explicit statement "undef $temp" right before the end
of the enclosing scope, and noting (via print statements) that this
step is the script's worst bottleneck by far.

  It's the same thing if I make $temp file-global and skip the
explicit deallocation step.  The bottleneck then becomes every
assignment of a new value to $temp, which (except for the first
time) involves deallocating the previous contents of %$temp.

  Is there any way to speed up the deallocation of %$temp (and of
large hashes in general)?



Sent from a spam-bucket account; I check it once in a blue moon.  If
you still want to e-mail me, cut out the extension from my address,
and make the obvious substitutions on what's left.

Re: undef takes forever

You can't really be sure of the "soundness" of the file using this
method.  It's hard to come up with a recommendation without knowing
how much of that hash really needs to stay in memory at any given
time.  If, most of the time, you don't need to reference previously
computed elements of the hash, I'd recommend at least using a tied
hash backed by a DBM module.  Alternatively (my favorite at this
point), you can look at SQLite with Class::DBI.
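
If you go the DBM route, something along these lines (a rough sketch
with a made-up file name, not a drop-in replacement):

use Fcntl;
use DB_File;

# Tie the hash to an on-disk Berkeley DB file, so the data lives on
# disk instead of having to fit in RAM all at once.
my %big;
tie %big, 'DB_File', 'bighash.db', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie bighash.db: $!";

$big{$_} = $_ * 2 for 1 .. 1_000_000;   # updates go straight to the file

untie %big;    # flush and close; nothing huge left in memory to free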

Again, without code, I have no idea what you are talking about. How big is
this thing?

Over time, most of the memory your script is using is being paged out
to the hard drive.  On my Win 98 PIII 500 with 128 MB RAM, I ran the
following script:

#! perl

use strict;
use warnings;

print "Filling the hash now:\n";
my $t0 = time;
my $h;
$h->{$_} = $_ for (1 .. 750_000);
print <<EOT;
It took @{[ time - $t0 ]} seconds to fill the hash.
Now let's undef it:
EOT
$t0 = time;
undef $h;
print "It took @{[ time - $t0 ]} seconds to undef the hash.\n";

D:\Home> perl t.pl
Filling the hash now:
It took 11 seconds to fill the hash.
Now let's undef it:
It took 65 seconds to undef the hash.

On the other hand, with 500_000 elements instead of 750_000, I get:

D:\Home> perl t.pl
Filling the hash now:
It took 6 seconds to fill the hash.
Now let's undef it:
It took 3 seconds to undef the hash.

So, the solution seems to be to move away from holding all your data
in memory at once.


Re: undef takes forever

KKramsch wrote:

I think the extra slowness is due to your memory being full, causing
extensive swapping.

I'd try to move the check to an external script, which needn't be so
careful about releasing all of its memory.  If necessary, you can
short-circuit the garbage collection in that external script using a
carefully chosen exec().
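
Roughly like this, say (the script name, file name and key count are
invented for the example):

# In the main script: hand the freshly written dump to a separate
# perl process and just look at its exit status.
system($^X, 'check_dump.pl', 'hash.dump', scalar keys %hash) == 0
    or die "stored hash failed verification";

# check_dump.pl: load the dump, count the keys, get out quickly.
use Storable qw(retrieve);

my ($file, $expected) = @ARGV;
my $temp = retrieve($file);
exit 1 unless defined $temp && keys %$temp == $expected;

# Replace this process instead of exiting normally, so perl never
# walks the huge hash to free it piece by piece.
exec 'true';
exit 0;    # only reached if the exec itself fails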


Re: undef takes forever

How so?  Did you mean exit()?

...or even POSIX::_exit?  That bypasses detailed de-allocation and
frees all memory at once -- much faster.  It also bypasses any
DESTROY calls and END blocks, so it isn't always applicable.
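
I.e., at the end of the checking code, something like this (sketch
only; $ok stands for whatever the verification decided):

use POSIX ();

# ... verify the stored hash, setting $ok ...

# Leave immediately: no END blocks, no DESTROY methods, no piecemeal
# deallocation; the OS reclaims the process memory in one go.
POSIX::_exit($ok ? 0 : 1);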


Re: undef takes forever

Quoth anno4000@lublin.zrz.tu-berlin.de (Anno Siegel):
A common trick for programs which leak, at least on Unix, is to re-exec
yourself every so often (having arranged things so you can get back into
the state you were in, of course), which will 'deal' with the leaks.

I'm not sure if this applies here though: will perl go through a full GC
run if you exec, even though it doesn't need to free the memory? I guess
it may, in order to call DESTROY handlers...
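
Something like this, I mean (the state file and the --resume flag
are invented for the example):

# Every so often, write out enough state to pick up where we left
# off, then replace the running program with a fresh copy of itself.
save_state('state.dat');    # hypothetical helper
exec($^X, $0, '--resume', 'state.dat')
    or die "re-exec failed: $!";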


Heracles: Vulture! Here's a titbit for you / A few dried molecules of the gall
   From the liver of a friend of yours. / Excuse the arrow but I have no spoon.
(Ted Hughes,        [ Heracles shoots Vulture with arrow. Vulture bursts into ]
 /Alcestis/)        [ flame, and falls out of sight. ]         ben@morrow.me.uk

Re: undef takes forever

Oh, right.

Apparently not.  From the Camel:

    It [END and DESTROY] isn't run if, instead of exiting, the current
    process just morphs itself from one program to another via exec.

I can't find the equivalent passage in perldoc.


Re: undef takes forever

Anno Siegel wrote:

See the bottom of "perldoc -f exec":

           Note that "exec" will not call your "END" blocks, nor will it
           call any "DESTROY" methods in your objects.

I know of a person who used

    exec "true";

as a way to quickly get out of a perl script. "true" is a Unix/BSD/Linux
command line "tool" that doesn't do much at all. (I think its main
purpose is to be used in "make" setups.)

man true:  <http://www.rt.com/man/true.1.html>

