Posted by Petyr David on February 25, 2008, 11:03 am
On Feb 23, 3:07 am, nos...@geniegate.com (Jamie) wrote:
>
> >I have a web page calling a Perl script that searches for patterns in
> >20,000+ files and returns links to the files and lines found matching
> >the pattern. I use a call to `find` and `egrep`.
>
> That is going to take a long, long time.
>
> >Q: The script works, but it is straining under the load: the files are
> >in the GBs.
> >How can I speed up the process? How simple is it to employ threads or
> >split off new processes?
>
> That's an option. Check into File::Find, fork() and pipes. You could
> create some pipes, fork several processes, do a select on the handles
> and run the commands in parallel.
>
> This will still run awfully slow though.
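
For what it's worth, here is how I read the fork/pipes/select suggestion: a rough sketch only, not something I have run against the real data. The name parallel_grep is made up for illustration, and I'm assuming `grep -rnE` per subdirectory in place of my current `find` + `egrep` call. It starts one child per directory, gives each child its own pipe, and uses IO::Select to read whichever pipe has output ready.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical helper: run one recursive grep per directory in parallel
# and return the combined "file:line:text" output lines.
sub parallel_grep {
    my ($pattern, @dirs) = @_;
    my $select = IO::Select->new;
    my @pids;

    for my $dir (@dirs) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: point STDOUT at the pipe, then become grep.
            close $reader;
            open(STDOUT, '>&', $writer) or POSIX::_exit(126);
            exec('grep', '-rnE', '--', $pattern, $dir) or POSIX::_exit(127);
        }

        # Parent: keep only the read end of this child's pipe.
        close $writer;
        $select->add($reader);
        push @pids, $pid;
    }

    my (%buffer, @results);
    while ($select->count) {
        for my $fh ($select->can_read) {
            my $n = sysread($fh, my $chunk, 65536);
            if ($n) {
                # Buffer per handle so lines from different children
                # never get interleaved.
                $buffer{$fh} .= $chunk;
                next;
            }
            # EOF (or read error): this child has finished writing.
            push @results, split /\n/, delete($buffer{$fh}) // '';
            $select->remove($fh);
            close $fh;
        }
    }

    waitpid($_, 0) for @pids;   # reap the children
    return @results;
}
```

Called as parallel_grep('foo[0-9]+', '/data/a', '/data/b') with made-up paths, it should return the matching lines from both trees. If the disk is the bottleneck, running several greps at once may not help much, which I take to be the point of the warning above.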
>
> >What I'd like to do is search more than one subdirectory
> >simultaneously.
>
> If you don't need full regex capability, you could check into indices. If you
> know one of the words, you can use that to filter out which documents to scan.
>
> If you can get the words sorted, look into Search::Dict (or, use a tied hash)
>
> Best bet is to use an index though. Even if it's crude, a substantial amount
> of your time is probably spent opening and closing files. (well, find/grep
> anyway)
>
> An example of a "crude index" is the whatis database.
>
> When you type 'apropos keyword' you're not opening a zillion manpages and
> scanning them.
>
> Jamie
> --http://www.geniegate.com Custom web programming
> Perl * Java * UNIX User Management Solutions
> If you don't need full regex capability, you could check into indices. If you
> know one of the words, you can use that to filter out which documents to scan.
But I do need full regex capability. I've considered Swish-e and will
install it. Would I not be able to use regexes with something like Swish-e?
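
To check that I understand the "crude index" idea, here is a minimal sketch of how an index and a regex might be combined. It is an assumption on my part, not how Swish-e or the whatis database works. The names build_index and search_with_index are invented for illustration. The index maps each lowercased word to the files containing it, and the full regex then runs only over the files listed for one known word.

```perl
use strict;
use warnings;

# Build a crude inverted index: lowercased word => { file => 1 }.
sub build_index {
    my (@files) = @_;
    my %index;
    for my $file (@files) {
        open(my $fh, '<', $file) or next;   # skip unreadable files
        while (my $line = <$fh>) {
            # Record every word on the line against this file.
            $index{lc $1}{$file} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
    }
    return \%index;
}

# Apply the full regex, but only to files the index lists for $word.
sub search_with_index {
    my ($index, $word, $regex) = @_;
    my @hits;
    for my $file (sort keys %{ $index->{lc $word} || {} }) {
        open(my $fh, '<', $file) or next;
        while (my $line = <$fh>) {
            next unless $line =~ $regex;
            chomp $line;
            push @hits, "$file:$.:$line";   # file:line number:text
        }
        close $fh;   # explicit close resets $. for the next file
    }
    return @hits;
}
```

The index would be built once (and stored, e.g. in a tied hash) rather than per request. A search like search_with_index($index, 'error', qr/code \d+/) then opens only the files that contain the word "error". This only helps when the pattern is known to contain at least one literal word.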