Speeding my script

I have a web page calling a Perl script that searches for patterns in 20,000+
files and returns links to the files and the lines matching the pattern.
I use calls to `find` and `egrep`.

Q: The script works, but it is straining under the load; files are in the
     How can I speed up the process? How simple is it to employ threads or
     split off new processes?

    I know I should RTFM (LOL), and I will, but I'm just looking for some
quick guidance/suggestions.

 pseudo code:

 cd root of document directory

 load array with names of directories

 foreach subdir in @dirnames

      cd $subdir
      lots of if statements to figure out which find command and which
      options to use
      push @temp_array onto big array
      other processing

 end foreach
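For comparison, the find/egrep step can also be done inside Perl with the core File::Find module, which avoids spawning separate find and egrep processes for every subdirectory. This is a minimal sketch, assuming the search pattern arrives as a Perl regex string; the helper name search_tree is hypothetical and not part of the original script.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $root recursively and return "path:line: text"
# strings for every line matching $pattern, roughly what `egrep -n` prints.
sub search_tree {
    my ($root, $pattern) = @_;
    my $re = qr/$pattern/;
    my @hits;
    find({
        no_chdir => 1,    # $_ holds the full path, same as $File::Find::name
        wanted   => sub {
            return unless -f $_;                  # regular files only
            open(my $fh, '<', $_) or return;      # skip unreadable files
            while (my $line = <$fh>) {
                next unless $line =~ $re;
                chomp $line;
                # $. is the line number within the file currently being read
                push @hits, sprintf('%s:%d: %s', $File::Find::name, $., $line);
            }
            close $fh;
        },
    }, $root);
    return @hits;
}
```

File::Find does not promise any particular traversal order, so sort the returned list if the order of results matters on the web page.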

What I'd like to do is search more than one subdirectory
simultaneously.

TX for your help -

Re: Speeding my script

Your idea is only likely to help if the directories reside on separate
disks; otherwise it will slow down the search by thrashing the disks.

It would be better to analyze the type of requests. Maybe there
are common searches you can cache. For example, a search for
/the magic words are squeamish ossifrage/ need only be performed
on files known to contain the uncommon word "ossifrage".
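A sketch of that pre-filter, assuming a hypothetical word index that maps each lowercased word to the set of files containing it. The full regex is then run only against the candidate files. Both helper names and the index layout are illustrative assumptions, not an existing tool.

```perl
use strict;
use warnings;

# Hypothetical index builder: word (lowercased) => { path => 1, ... }.
# In practice this would be built once, not on every web request.
sub build_word_index {
    my @files = @_;
    my %index;
    for my $file (@files) {
        open(my $fh, '<', $file) or next;
        while (my $line = <$fh>) {
            # record every \w+ token on the line
            $index{ lc $1 }{$file} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
    }
    return \%index;
}

# Run $regex only on files the index says contain $word.
sub prefiltered_search {
    my ($index, $word, $regex) = @_;
    my @candidates = sort keys %{ $index->{ lc $word } || {} };
    my @hits;
    for my $file (@candidates) {
        open(my $fh, '<', $file) or next;
        while (my $line = <$fh>) {
            push @hits, sprintf('%s:%d', $file, $.) if $line =~ $regex;
        }
        close $fh;
    }
    return @hits;
}
```

With the index in hand, prefiltered_search($index, 'ossifrage', qr/the magic words are squeamish ossifrage/) opens only the files listed under "ossifrage" rather than all 20,000; the rarer the filter word, the smaller the candidate set.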

Re: Speeding my script

To me this sounds very much like the 20k+ files don't change very often.
If this is the case, you might well be able to speed up the
process by using an index of some sort that is updated by another
Perl process at regular intervals, e.g. run from cron. I personally
recommend an SQL database of some sort against which your web requests
run their queries; this database can be updated every x minutes.
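The cron approach might look like the following sketch. To stay within core Perl it swaps the suggested SQL database for a hash serialized with Storable; that substitution, the index file location, and both helper names are assumptions. The index file should live outside the document tree so the indexer does not index its own output.

```perl
use strict;
use warnings;
use File::Find;
use Storable qw(nstore retrieve);

# Run from cron: index every regular file under $root into $index_file.
# Returns the number of distinct words indexed.
sub rebuild_index {
    my ($root, $index_file) = @_;
    my %index;    # word (lowercased) => { path => 1, ... }
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;
        open(my $fh, '<', $_) or return;
        while (my $line = <$fh>) {
            $index{ lc $1 }{$File::Find::name} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
    } }, $root);
    # Write to a temp file, then rename: readers never see a half-written
    # index (rename is atomic within one filesystem).
    my $tmp = "$index_file.tmp";
    nstore(\%index, $tmp) or die "store failed: $!";
    rename($tmp, $index_file) or die "rename failed: $!";
    return scalar keys %index;
}

# Called by the web script: one file read, no directory walk.
sub files_containing {
    my ($index_file, $word) = @_;
    my $index = retrieve($index_file);
    my @files = sort keys %{ $index->{ lc $word } || {} };
    return @files;
}
```

The web script would then run its regex only over the files that files_containing returns.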

Another idea would be to keep several flat-file index databases, which
you query using awk in subprocesses, since awk can be a lot
faster than Perl in specific cases.
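A sketch of the awk idea, assuming a hypothetical flat-file index format of one "word<TAB>path" line per occurrence, possibly split across several index files (for example one per top-level directory). Every awk process is started before any output is read, so they scan their files largely in parallel; each can run ahead until its pipe buffer fills.

```perl
use strict;
use warnings;

# Hypothetical helper: look up $word in several "word<TAB>path" index files,
# one awk subprocess per file, and return the matching paths in file order.
sub awk_lookup_many {
    my ($word, @index_files) = @_;
    my @pipes;
    for my $idx (@index_files) {
        # List-form open runs awk directly, so $word never passes through a shell.
        # Note: awk expands backslash escapes in -v values; assume plain words.
        open(my $awk, '-|', 'awk', '-F', "\t", '-v', "w=$word",
             '$1 == w { print $2 }', $idx)
            or die "cannot start awk on $idx: $!";
        push @pipes, [ $idx, $awk ];
    }
    my @paths;
    for my $p (@pipes) {
        my ($idx, $awk) = @$p;
        while (my $line = <$awk>) {
            chomp $line;
            push @paths, $line;
        }
        close $awk or warn "awk exited with status $? on $idx\n";
    }
    return @paths;
}
```

Whether this beats a pure-Perl lookup depends on the index size and the awk implementation, so it is worth timing both on real data.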

Re: Speeding my script

Petyr David wrote:

No need to LOL at your laziness.

Using find/grep on thousands of files and gigabytes of data is a poor
choice. Try looking at various indexing tools: htdig, glimpse,
Swish-e, etc.

Re: Speeding my script

Agreed, but it was my first project in Perl. It started out as a very,
very simple file searcher, and then a bunch of people asked if anyone
knew of file search software that could be implemented quickly.

I meekly raised my hand. Since then a lot of options have been added,
and I believe I should either take this to the next step, using one of
the indexing tools mentioned, or leave it "as is". I have plenty of other
things to do; it's just that I like programming. My other responsibilities
pay me plenty, but they are boring and almost clerical in nature.

TX to all for the help

Re: Speeding my script

That is going to take a long, long time.

That's an option. Look into File::Find, fork() and pipes. You could
create some pipes, fork several processes, do a select() on the handles,
and run the commands in parallel.
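A minimal sketch of the fork/pipe/select approach: one child per job, each exec'ing a command with its STDOUT connected to a pipe, while the parent collects output with the core IO::Select module. The job hash, its labels, and the command lists are hypothetical; in the original script each command would be whatever find/egrep invocation the if statements choose for a subdirectory.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical helper. $jobs is a hashref: label => [ command, args... ].
# Returns a hashref: label => everything that command wrote to STDOUT.
sub run_parallel {
    my ($jobs) = @_;
    my $sel = IO::Select->new;
    my (%label_of, %out, @pids);

    for my $label (sort keys %$jobs) {
        my @cmd = @{ $jobs->{$label} };
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: send STDOUT into the pipe, then become the command.
            close $reader;
            open(STDOUT, '>&', $writer) or POSIX::_exit(127);
            exec { $cmd[0] } @cmd or POSIX::_exit(127);    # list form: no shell
        }
        # Parent: keep only the read end and remember which job owns it.
        close $writer;
        push @pids, $pid;
        $label_of{$reader} = $label;
        $out{$label} = '';
        $sel->add($reader);
    }

    # Read from whichever pipes have data until every child closes its end.
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $n = sysread($fh, my $chunk, 65536);
            if ($n) {
                $out{ $label_of{$fh} } .= $chunk;
            }
            else {    # 0 = EOF, undef = read error; stop watching either way
                $sel->remove($fh);
                close $fh;
            }
        }
    }
    waitpid($_, 0) for @pids;    # reap children so none linger as zombies
    return \%out;
}
```

For example, run_parallel({ a => ['echo', 'hello a'], b => ['echo', 'hello b'] }) returns a hashref with one output string per label. This only pays off when the directories sit on separate disks, and with many subdirectories you would want to cap how many children run at once.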

This will still run awfully slowly, though.

If you don't need full regex capability, you could look into indices. If you
know one of the words, you can use it to filter out which documents to scan.

If you can get the words sorted, look into Search::Dict (or, use a tied hash)
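A sketch of the Search::Dict route, assuming a hypothetical index file of "word<TAB>path" lines sorted bytewise (for example with `LC_ALL=C sort`). look() binary-searches the open file and leaves the handle at the first line that sorts at or after the key, so only the run of matching lines is read rather than the whole file.

```perl
use strict;
use warnings;
use Search::Dict;

# Hypothetical helper: return every path listed for $word in a sorted
# "word<TAB>path" index file.
sub sorted_index_lookup {
    my ($index_file, $word) = @_;
    open(my $fh, '<', $index_file) or die "open $index_file: $!";
    look($fh, $word);    # seek to the first line that sorts >= $word
    my @paths;
    while (my $line = <$fh>) {
        chomp $line;
        my ($w, $path) = split /\t/, $line, 2;
        # Matching lines are contiguous; stop at the first one that differs.
        last unless defined $w && $w eq $word;
        push @paths, $path;
    }
    close $fh;
    return @paths;
}
```

Because TAB sorts before any letter, all "ossifrage" lines come before "ossifrages" lines, which is what lets the loop stop at the first mismatch.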

Best bet is to use an index though. Even if it's crude, a substantial amount
of your time is probably spent opening and closing files. (well, find/grep

An example of a "crude index" is the whatis database.

When you type 'apropos keyword' you're not opening a zillion manpages and
scanning them.

http://www.geniegate.com Custom web programming
Perl * Java * UNIX                        User Management Solutions

Re: Speeding my script

On Feb 23, 3:07 am, nos...@geniegate.com (Jamie) wrote:

But I do. I've considered Swish-e and will install it. Would I not be
able to use regexes with something like Swish-e?
