m// on very long lines leaks memory

comp.lang.perl.misc
Posted by ShaunJ on March 13, 2008, 5:26 pm
The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
./test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regexes
takes about 200 MB. However, the matching loop that follows leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour on other architectures or versions of Perl,
that would be interesting too.

Cheers,
Shaun

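#!/usr/bin/perl
use strict;
use warnings;
use English;

# First argument: a file of regex strings, one per line.
my $refile = shift;
open my $re_fh, '<', $refile or die "Can't open $refile: $!";
chomp (my @restrings = <$re_fh>);
close $re_fh;
my @re = map { qr/$_/ } @restrings;

# Second argument: the text to search, joined into one very long line.
my $textfile = shift;
open my $text_fh, '<', $textfile or die "Can't open $textfile: $!";
chomp (my @text = <$text_fh>);
close $text_fh;
my $text = join '', @text;

# Print the start offset of the first match of each regex.
foreach my $re (@re) {
        if ($text =~ m/$re/) {
                print $LAST_MATCH_START[0], "\n";
        }
}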

Posted by John W. Krahn on March 13, 2008, 5:47 pm
ShaunJ wrote:
> The following snippet leaks memory until it breaks and falls down when
> m// is used on a very long line. It works fine if the line lengths are
> short.
> [...]

I tested it and if I remove the English module it works fine.
(So don't use English.pm!)



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Posted by John W. Krahn on March 13, 2008, 5:53 pm
John W. Krahn wrote:
> I tested it and if I remove the English module it works fine.
> (So don't use English.pm!)

Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:

use English qw( -no_match_vars );
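
For reference, a minimal sketch of the matching loop with that import
applied. The helper name first_match_offsets is made up for illustration
and is not part of the original script. The point is only that
@LAST_MATCH_START (English's name for @-) is still available under
-no_match_vars, so the script does not need $PREMATCH, $MATCH or
$POSTMATCH at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English qw( -no_match_vars );   # do not import $PREMATCH, $MATCH, $POSTMATCH

# Hypothetical helper (not from the original post): return the start
# offset of the first match for each pattern that matches $text.
sub first_match_offsets {
    my ($text, @patterns) = @_;
    my @offsets;
    foreach my $re (map { qr/$_/ } @patterns) {
        if ($text =~ $re) {
            push @offsets, $LAST_MATCH_START[0];   # same value as $-[0]
        }
    }
    return @offsets;
}

# 'cat' does not match, so only two offsets are printed.
my @offsets = first_match_offsets('the quick brown fox', 'quick', 'cat', 'fox');
print "$_\n" for @offsets;
```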



John

Posted by ShaunJ on March 13, 2008, 6:51 pm
...
> > I tested it and if I remove the English module it works fine.
> > (So don't use English.pm!)
>
> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
>
> use English qw( -no_match_vars );

Wow, thanks! If I use either English.pm or $& (even without
English.pm) it uses up tons of memory with Perl 5.8.6 (on Mac OS X
10.4.11). If I use neither English.pm nor $& it works fine.

If I use Perl 5.10.0 built from source it works in every case.

Cheers,
Shaun

Posted by Uri Guttman on March 13, 2008, 7:24 pm

S> ...
>> > I tested it and if I remove the English module it works fine.
>> > (So don't use English.pm!)
>>
>> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
>>
>> use English qw( -no_match_vars );

S> Wow, thanks! If I use either English.pm or $& (even without
S> English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
S> 10.4.11). If I use neither English.pm or $& it works fine.

i was going to mention that but didn't want to get into this thread. $&
(which english.pm pulls in unless you pass that option) is a known memory
hog (not a leak). since $& is global, perl can't tell where it might be
read, so once it appears anywhere in the program every successful regex
match has to copy the entire string it matched against in case $& (or $`
or $') is used later. this is a well known issue and you should google
for more about it or read the notes on $& in perldoc perlvar.
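
one way to get the same information without $& is to rebuild it from the
offsets in @- and @+, which perlvar documents as equivalent substr()
calls. a minimal sketch (match_parts is a made-up helper name, not
something from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper: return (prematch, match, postmatch) for the first
# match of $re in $text, or an empty list if there is no match.
# It reads only @- and @+, never $`, $& or $'.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
    return (
        substr($text, 0, $start),               # what $` would hold
        substr($text, $start, $end - $start),   # what $& would hold
        substr($text, $end),                    # what $' would hold
    );
}

my ($pre, $match, $post) = match_parts('hello world', qr/o w/);
print "[$pre][$match][$post]\n";
```

the substr() calls copy only the pieces you actually ask for, and only
for the matches where you ask.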

S> If I use Perl 5.10.0 built from source it works for every case.

they seem to have fixed this problem (partially from what i heard but i
could be wrong) in 5.10. i still recommend never using $& and no one who
knows perl uses english.pm.
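
i can't say what changed internally in 5.10, but one thing 5.10 did add
is the /p modifier together with ${^PREMATCH}, ${^MATCH} and
${^POSTMATCH}, which are requested per match instead of globally. a
minimal sketch, assuming perl 5.10 or later (first_number is a
hypothetical helper used only for illustration):

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: return the first run of digits in $text, or undef.
# The /p modifier asks perl to keep the match variables for this match
# only, so ${^MATCH} can be read without ever mentioning $&.
sub first_number {
    my ($text) = @_;
    return $text =~ /\d+/p ? ${^MATCH} : undef;
}

say first_number('foo=42; bar=7') // 'no match';
say first_number('no digits here') // 'no match';
```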

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Architecture, Development, Training, Support, Code Review ------
----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
