
Speeding my script

Posted by Petyr David on February 22, 2008, 1:38 pm
I have a web page calling a Perl script that searches for patterns in
20,000+ files and returns links to the files and the lines matching
the pattern. I use a call to `find` and `egrep`.

Q: The script works, but it is straining under the load; the files run
to gigabytes.
How can I speed up the process? How simple is it to employ threads or
to split off new processes?

I know I should RTFM (LOL) and I will, but I'm just looking for some
quick guidance/suggestions

pseudo code:

cd root of document directory

load array with names of directories

foreach subdir in @dirnames
    cd $subdir
    lots of if statements to figure out which find command and which
    options to use
    @temp_array = `$long_find_grep_command`
    push @temp_array onto big array
    other processing
end foreach

What I'd like to do is search more than one subdirectory
simultaneously.

TX for your help
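
On the threads-or-processes question, here is a minimal sketch, assuming one forked child per subdirectory with the results sent back to the parent over a pipe. The names `search_dir` and `parallel_search` are hypothetical, and the pure-Perl scan only stands in for the real `find`/`egrep` command line, which the post does not show. Whether this is actually faster depends on where the time goes: if every subdirectory sits on the same disk, concurrent readers may simply compete for it.

```perl
use strict;
use warnings;
use File::Find;
use POSIX ();

# Scan one directory tree and return "path:line_number:text" for each match.
# Stand-in for the find/egrep command built by the real script.
sub search_dir {
    my ($dir, $regex) = @_;
    my @hits;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;
        my $path = $_;
        open(my $fh, '<', $path) or return;
        while (my $line = <$fh>) {
            chomp $line;
            push @hits, join(':', $path, $., $line) if $line =~ $regex;
        }
        close $fh;
    }}, $dir);
    return @hits;
}

# Search @dirs with at most $max_procs child processes running at once.
sub parallel_search {
    my ($regex, $max_procs, @dirs) = @_;
    my (@results, %reader);            # %reader: child pid => read end of its pipe
    my @queue = @dirs;
    while (@queue or %reader) {
        # Start children until the limit is reached or the queue is empty.
        while (@queue and keys(%reader) < $max_procs) {
            my $dir = shift @queue;
            pipe(my $rd, my $wr) or die "pipe: $!";
            my $pid = fork();
            die "fork: $!" unless defined $pid;
            if ($pid == 0) {           # child: search, write results, leave
                close $rd;
                print {$wr} "$_\n" for search_dir($dir, $regex);
                close $wr;
                POSIX::_exit(0);       # skip END blocks inherited from the parent
            }
            close $wr;                 # parent keeps only the read end
            $reader{$pid} = $rd;
        }
        # Drain one running child completely, then reap it.
        my ($pid) = sort { $a <=> $b } keys %reader;
        my $rd = delete $reader{$pid};
        chomp(my @lines = <$rd>);
        push @results, @lines;
        close $rd;
        waitpid($pid, 0);
    }
    return @results;
}

# Example call (hypothetical path):
# my @hits = parallel_search(qr/ossifrage/i, 4, glob('/path/to/docs/*/'));
```

The parent reads one child at a time, so a child with a lot of output waits on its pipe until its turn comes; a `select`-based reader would avoid that at the cost of more code.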



Posted by smallpond on February 22, 2008, 1:48 pm
> what I'd like to do is to be able to simultaneously be searching more
> than 1 subdirectory

Your idea is only likely to help if the directories reside on
different disks; otherwise it will slow down the search by thrashing
the disks.

Better would be to analyze the type of requests. Maybe there
are common searches you can cache. For example, a search for
/the magic words are squeamish ossifrage/ need only be performed
on files known to contain the common word "ossifrage".
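
One way to read the "ossifrage" example is as a word index that narrows the set of files before any full scan. The sketch below assumes an in-memory hash mapping each word to the files that contain it; `build_word_index`, `candidate_files` and `search_phrase` are hypothetical names, not anything from the original script.

```perl
use strict;
use warnings;

# Build word => { file => 1 } for the given files.
sub build_word_index {
    my @files = @_;
    my %index;
    for my $file (@files) {
        open(my $fh, '<', $file) or next;
        while (my $line = <$fh>) {
            $index{lc $1}{$file} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
    }
    return \%index;
}

# Files that contain every word of the phrase; only these need a full scan.
sub candidate_files {
    my ($index, $phrase) = @_;
    my @words = map { lc $_ } ($phrase =~ /(\w+)/g);
    return () unless @words;
    # Order the per-word file sets so the word found in the fewest files comes first.
    my @sets = sort { scalar(keys %$a) <=> scalar(keys %$b) }
               map  { $index->{$_} || {} } @words;
    my $smallest = shift @sets;
    my @candidates = grep {
        my $f = $_;
        my $missing = grep { !$_->{$f} } @sets;   # sets that lack this file
        !$missing;
    } keys %$smallest;
    return sort @candidates;
}

# Scan only the candidate files for the literal phrase.
sub search_phrase {
    my ($index, $phrase) = @_;
    my @hits;
    for my $file (candidate_files($index, $phrase)) {
        open(my $fh, '<', $file) or next;
        while (my $line = <$fh>) {
            chomp $line;
            push @hits, join(':', $file, $., $line) if $line =~ /\Q$phrase\E/i;
        }
        close $fh;
    }
    return @hits;
}
```

This only helps for plain-word queries; an arbitrary regular expression cannot be narrowed this way without extracting literal words from it first.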

Posted by cvh@LE on February 22, 2008, 2:23 pm

To me this very much sounds like the 20k+ files are changed too often.
If that is the case, you might well be able to speed up the process by
using an index of some sort that is updated at regular intervals by
another Perl process, e.g. one run from cron. I personally recommend
an SQL database of some sort against which your web requests run their
queries. This DB can be updated every x minutes.

Another idea could be to keep several flat-file index databases that
you query using awk in subprocesses, since awk can be a lot faster
than Perl in specific cases ...
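
A rough sketch of the periodically refreshed index, assuming a single flat index file written with the core `Storable` module and queried from Perl rather than the SQL database or awk suggested above. The names `update_index` and `files_with_word` and the index layout are hypothetical. The updater is meant to be run from cron every x minutes and rescans only files whose modification time has changed since the last run.

```perl
use strict;
use warnings;
use File::Find;
use Storable qw(store retrieve);

# Refresh the index file for the trees under @roots.
# Assumed layout: { files => { path => { mtime => N, words => [...] } } }
# Returns the number of files that had to be rescanned.
sub update_index {
    my ($index_file, @roots) = @_;
    my $index = -e $index_file ? retrieve($index_file) : { files => {} };
    my %seen;
    my $rescanned = 0;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;
        my $path = $_;
        $seen{$path} = 1;
        my $mtime = (stat $path)[9];
        my $old = $index->{files}{$path};
        return if $old and $old->{mtime} == $mtime;   # unchanged, keep old entry
        open(my $fh, '<', $path) or return;
        my %words;
        while (my $line = <$fh>) {
            $words{lc $1} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
        $index->{files}{$path} = { mtime => $mtime, words => [sort keys %words] };
        $rescanned++;
    }}, @roots);
    # Forget files that no longer exist.
    delete $index->{files}{$_} for grep { !$seen{$_} } keys %{ $index->{files} };
    store($index, $index_file);
    return $rescanned;
}

# Query side, as the web script might call it: files containing $word.
sub files_with_word {
    my ($index_file, $word) = @_;
    my $files = retrieve($index_file)->{files};
    my @hits = grep {
        my $f = $_;
        grep { $_ eq lc($word) } @{ $files->{$f}{words} };
    } keys %$files;
    return sort @hits;
}
```

The web request then reads only the index file and touches the documents themselves just to fetch the matching lines.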


Posted by J. Gleixner on February 22, 2008, 2:48 pm
Petyr David wrote:
> I know i should RTFM (LOL) and I will, but just looking for some
> quick guidance/suggestions

No need to LOL at your laziness.

Using find/grep on thousands of files and gigabytes of data is a poor
choice. Try looking at various indexing tools: htdig, glimpse,
Swish-e, etc.

Posted by Petyr David on February 22, 2008, 7:32 pm

Agreed, but it was my first project in Perl. It started out as a very,
very simple file searcher, and then a bunch of people asked if anyone
knew of file search software that could be implemented quickly.

I meekly raised my hand. Since then a lot of options have been added,
and I do believe that I either take this to the next step, using one
of the indexing tools mentioned, or I leave it "as is". I have plenty
of other things to do. It's just that I like programming. My other
responsibilities pay me plenty, but are boring and almost clerical in
nature.

TX to all for the help
