Help with pattern matching - Page 2


Re: Help with pattern matching

On 11.04.2012 15:31, ExecMan wrote:

What are you doing?!!! Way too many slashes to be able to read this.
I guess you are trying to achieve what just a \Q would do.
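
For reference, a minimal sketch of what \Q does. The URL fragment and the log line are made up for illustration and are not taken from the original script:

```perl
use strict;
use warnings;

# Hypothetical URL fragment containing regex metacharacters (. ? +).
my $url  = '/page.php?id=1+2';
# Hypothetical log line that contains the URL literally.
my $line = 'GET /page.php?id=1+2 HTTP/1.1';

# Without \Q the metacharacters are treated as regex syntax,
# so the literal string does not match.
my $plain = ($line =~ /$url/) ? 1 : 0;

# \Q...\E (the same as quotemeta) backslash-escapes every non-word
# character, so the URL is matched literally.
my $quoted = ($line =~ /\Q$url\E/) ? 1 : 0;

print "without \\Q: $plain, with \\Q: $quoted\n";
```

No hand-written backslashes are needed; \Q applies to whatever the variable contains at match time.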

Did you even read Ben's and/or my answer?


Now you are reading the file multiple times. Do you really think that is

If the log file is really too large (probably it isn't) then read it
line by line as suggested in my previous posting in b2).

- Wolf

Re: Help with pattern matching

On Apr 11, 9:07 am, Wolf Behrenhoff wrote:

Ok, your solution seems to work.   Nice:

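open (my $fh, "<", "/home/httpdlogs/apache2/access_log")
  or die "Can't open log: $!";
my @log = <$fh>;    # slurps the whole file into memory
close ($fh);

# %url_tags maps tag => URL; %url_counts collects the hits per tag
foreach my $tag (keys(%url_tags)) {
  my $url = $url_tags{$tag};
  my $count = grep { /\Q$url/ } @log;
  $url_counts{$tag} = $count;
}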

I'm just worried about a 4-million-line file going into an array.  It's
fine as long as it does not take up too many resources, but if the file
is, say, 300 MB, that is a lot to put into an array.

Re: Help with pattern matching

On 04/11/12 09:38, ExecMan wrote:

There are many ways to do what you are asking; however, think about
what it is, exactly, that you're trying to count.  Is the 'url' to match
against the referrer?  Is it hits to certain pages?  Something else?

If what you're after doesn't occur in most of the lines, you might
be able to greatly reduce the number of lines you want to look at
by narrowing down your universe to only lines that might contain what
you're after.

open( FILE, "/bin/egrep 'abc123|thispage|zzzyyyxxx' /home/httpdlogs/apache2/access_log |" )
  or die "Can't run egrep: $!";


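Or do the filtering on the command line and pipe the result into your
script:

/bin/egrep 'abc123|thispage|zzzyyyxxx' /home/httpdlogs/apache2/access_log | yourscript

where yourscript (a placeholder name) reads from STDIN.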

The same approach can also be used to exclude lines, e.g. those
containing '.gif' or '.jpg', or any other string, using egrep's '-v'
option.
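
A rough sketch of the exclusion case, using the list form of a piped open so that no shell quoting is involved. It assumes a `grep` on the PATH that supports `-E` (equivalent to egrep). The helper name `open_filtered` and the sample lines are made up so that the example can run on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns a filehandle that yields only the lines
# of $path that do NOT match the extended regex $pattern.
# The list form of a piped open bypasses the shell entirely.
sub open_filtered {
    my ($path, $pattern) = @_;
    open(my $fh, '-|', 'grep', '-E', '-v', $pattern, $path)
        or die "Can't run grep: $!";
    return $fh;
}

# Made-up sample log written to a temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "GET /index.html\n", "GET /logo.gif\n",
             "GET /photo.jpg\n",  "GET /thispage\n";
close($tmp);

# Read the filtered lines one at a time, as with a normal file.
my $fh = open_filtered($path, '\.(gif|jpg)');
my @kept;
while (my $line = <$fh>) {
    chomp $line;
    push @kept, $line;
}
close($fh);

print "$_\n" for @kept;
```

Dropping the '-v' argument turns this back into the "keep only matching lines" filter.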

If you're after referrer, or certain strings in the URL part of the
line, then possibly parse every line and store the specific field
you're after in a hash, with the count as the value. Once
they are all gathered, go through that hash looking for those that
contain your 'urls'. That way you parse the file once, gathering only
the relevant data, then go through that as many times as you need.
You'll avoid having to run multiple regular expressions on every
single line in the log.
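
A minimal sketch of that parse-once idea. It assumes the log is in Apache "combined" format and that the field of interest is the request path. The sample lines and the contents of %url_tags are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical tag => URL-fragment map, standing in for %url_tags.
my %url_tags = (home => '/index.html', promo => '/promo?id=');

# Made-up lines in Apache "combined" format (an assumption about the log).
# In a real script these would come from while (my $line = <$fh>).
my @sample = (
    '1.2.3.4 - - [11/Apr/2012:09:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "UA"',
    '1.2.3.4 - - [11/Apr/2012:09:00:01 +0000] "GET /promo?id=7 HTTP/1.1" 200 128 "-" "UA"',
    '5.6.7.8 - - [11/Apr/2012:09:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "UA"',
    '5.6.7.8 - - [11/Apr/2012:09:00:03 +0000] "GET /logo.gif HTTP/1.1" 200 64 "-" "UA"',
);

# Pass 1: parse each line once and count requests per path.
my %hits;
for my $line (@sample) {
    # Capture the request path between the method and the protocol.
    next unless $line =~ /"[A-Z]+ (\S+) [^"]*"/;
    $hits{$1}++;
}

# Pass 2: scan the (much smaller) hash once per tag.
my %url_counts;
for my $tag (keys %url_tags) {
    my $url = $url_tags{$tag};
    $url_counts{$tag} = 0;
    for my $path (keys %hits) {
        # index() does a plain substring test, so no regex quoting is needed.
        $url_counts{$tag} += $hits{$path} if index($path, $url) >= 0;
    }
}

print "$_: $url_counts{$_}\n" for sort keys %url_counts;
```

The hash holds one entry per distinct path rather than one per log line, which is usually far smaller than the log itself.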

Possibly you could use Apache::ParseLog to do all of
the work and then simply grep/filter the output as needed.

See also:
perldoc -q 'How do I efficiently match many regular expressions at once'

Re: Help with pattern matching

On Apr 12, 12:18 pm, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid> wrote:

I love this.  From a simple programming question I am accumulating all
this wisdom in Perl.

Another thing I was wondering about is why to use the 'strict'
method?  Advantages?  Disadvantages?

Re: Help with pattern matching


You mean

    use strict;

?  The technical term is "Perl pragma".  The purpose is described at
the top of the man page as "strict - Perl pragma to restrict unsafe
constructs".

Since it restricts unsafe constructs, it is deemed a very, very good
idea indeed.  If it flags an error, it's much more likely (though not
guaranteed) that you're doing something wrong than that you're doing
something reasonable and need to turn off "use strict".

The usual idiom is

    use strict;
    use warnings;

(or vice versa).  At my workplace, people code in such a way that they often add
    no warnings 'uninitialized';
but I prefer to just not code that way, and I suspect that that's the
most common warning to turn off (though I could easily be wrong).
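
To make the benefit concrete, here is a small sketch of the kind of mistake "use strict" catches. The misspelled variable is invented for the demonstration, and the snippet is compiled with string eval only so that both outcomes can be shown in one runnable script:

```perl
use strict;
use warnings;

# A snippet with a typo: $cuont instead of $count.
my $snippet = 'my $count = 1; $cuont = $count + 1; 1;';

# Without strict, the typo silently creates a new global variable.
my $ok_without = (eval "no strict; no warnings; $snippet") ? 1 : 0;

# With strict, the undeclared variable is a compile-time error.
my $ok_with = (eval "use strict; $snippet") ? 1 : 0;
my $error   = $@;

print "without strict: ", ($ok_without ? "compiles" : "fails"), "\n";
print "with strict:    ", ($ok_with    ? "compiles" : "fails"), "\n";
print "error: $error" unless $ok_with;
```

In a normal script the error simply shows up when the script is compiled, before any of it runs.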

Tim McDaniel

Re: Help with pattern matching



Whether it is 'a lot' depends on how much memory you want to dedicate
to this task and on whether you're reasonably sure that the size of your
input will never exceed that. The latter is especially problematic
because an out-of-memory failure caused by 'a large input' is a very
bad situation in which to be rewriting the code that is supposed to
process that input. There's also the question of whether this task is so
much more important than all other tasks running on the same computer
that you're willing to maximize its resource usage in order to minimize
the wallclock time it needs to complete.

Except in cases where it is known that the input file will always be
'rather small', e.g., if it is a configuration file, the safe,
conservative choice is to process it line-by-line and assume that the
buffering layer between the Perl code and the system I/O facilities
will employ a 'sensible' buffering strategy.
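
As a sketch of the line-by-line variant applied to the counting problem from earlier in the thread: the sub name `count_urls`, the tags and the sample data are made up, and an in-memory filehandle stands in for the real access log so that the example runs on its own.

```perl
use strict;
use warnings;

# Count, for each tag, the lines containing that tag's URL fragment.
# Only one line is held in memory at a time.
sub count_urls {
    my ($fh, $url_tags) = @_;
    my %url_counts = map { ($_ => 0) } keys %$url_tags;
    while (my $line = <$fh>) {
        for my $tag (keys %$url_tags) {
            my $url = $url_tags->{$tag};
            $url_counts{$tag}++ if $line =~ /\Q$url\E/;
        }
    }
    return \%url_counts;
}

# Made-up data; with a real log, open the file path instead of \$log.
my %url_tags = (home => '/index.html', promo => '/promo?id=');
my $log = "GET /index.html\nGET /promo?id=7\nGET /index.html\n";
open(my $fh, '<', \$log) or die "Can't open in-memory log: $!";
my $counts = count_urls($fh, \%url_tags);
close($fh);

print "$_: $counts->{$_}\n" for sort keys %$counts;
```

The file is read exactly once, and memory use stays roughly constant regardless of how large the log grows.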

Re: Help with pattern matching

On Wed, 11 Apr 2012 15:43:46 +0100, Rainer Weikusat wrote:


This is such important advice. Absolutely spot on. Well worded too.
Should go in a FAQ somewhere as this subject actually comes up fairly

