How is 'split' working here ?



Code :

use strict ;

my $fname = "My-range-20080511-20080514.txt" ;
my ($pattern1, $pattern2);
$pattern1 = '(.*)-(\d+)-(\d+).txt$';

my @parts = split ( /$pattern1/, $fname) ;
print "$parts[0] Z $parts[1] Z $parts[2] Z $parts[3] Z\n" ;

Output :
 Z My-range Z 20080511 Z 20080514 Z

Why is $parts[0] a blank string? Besides, (\d+) is the second bracket
of pattern1, so I would expect it to be the second element, or
$parts[1], of @parts. Yet $parts[1] is something else. I am confused
as to how 'split' is working here.

Please advise. Thanks in advance.

Re: How is 'split' working here ?


You are splitting the string "My-range-20080511-20080514.txt".  The
pattern on which you are splitting actually matches the entire
string.  Therefore, there would normally be exactly two returned
elements - an empty string in the front, and an empty string in the
back.  By default, split() drops trailing empty strings, however.

Make it simpler:  Say my pattern is actually /-!-/.   Here's what I
would get for splitting each of these strings:

'foo-!-bar'    =>   ('foo', 'bar')

'foo-!--!-bar' =>   ('foo', '', 'bar')

'-!-bar'       =>   ('', 'bar')

'-!-bar-!-'    =>   ('', 'bar')  #trailing empty fields are dropped

'-!-'          =>   ()           #both fields are empty, so both are dropped

Your example is the equivalent of the last example.   Your split
pattern matches the entire string, so the piece in front of the match
is an empty string.   That is the empty string you see in $parts[0];
it is kept in your case because non-empty elements follow it, as
explained next.
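
If you want to check these cases yourself, here is a minimal sketch. The helper name show_split is made up for this post; it just splits on the literal separator -!- and reports how many fields came back.

```perl
use strict;
use warnings;

# show_split is a hypothetical helper (not part of Perl): it splits on
# the literal separator -!- and formats the resulting list for display.
sub show_split {
    my ($string) = @_;
    my @fields = split /-!-/, $string;
    return sprintf "%-16s => %d field(s): (%s)",
        "'$string'", scalar @fields,
        join( ', ', map { "'$_'" } @fields );
}

print show_split($_), "\n"
    for 'foo-!-bar', 'foo-!--!-bar', '-!-bar', '-!-bar-!-', '-!-';
```

Printing the field count makes the difference between an empty list and a list holding one empty string visible, which is easy to miss otherwise.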

HOWEVER, you did something else - you used capturing parentheses
within your pattern.  When you do that, split returns not only the
pieces of the string that have been split, but also whatever was
captured.   Let's take another look.  This time, pretend your pattern
is /-(!)-/.  That is, the same pattern, but now you're capturing the
exclamation point:

'foo-!-bar'    =>   ('foo', '!', 'bar')

'foo-!--!-bar' =>   ('foo', '!', '', '!', 'bar')

'-!-bar'       =>   ('', '!', 'bar')

'-!-bar-!-'    =>   ('', '!', 'bar', '!')  #remember, trailing empty fields are dropped

'-!-'          =>   ('', '!')              #again, trailing empty fields are dropped

This is what you did.  Your pattern matches the entire string, so you
get an empty string returned, but you also get one result for each
set of capturing parentheses.
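
To see the dropped trailing field with your own data, here is a small sketch using your filename and pattern unchanged. The variable names @default and @keep_all are just for illustration. Passing a negative LIMIT as the third argument tells split to keep trailing empty fields.

```perl
use strict;
use warnings;

my $fname   = "My-range-20080511-20080514.txt";
my $pattern = '(.*)-(\d+)-(\d+).txt$';

# Default behaviour: the trailing empty field is removed.
my @default = split /$pattern/, $fname;

# A negative LIMIT tells split to keep trailing empty fields.
my @keep_all = split /$pattern/, $fname, -1;

# Bracket each element so that empty strings are visible.
print scalar(@default),  " fields: ", join( '|', map { "[$_]" } @default ),  "\n";
print scalar(@keep_all), " fields: ", join( '|', map { "[$_]" } @keep_all ), "\n";
```

The first call returns four elements (the empty leading piece plus the three captures); the second returns five, because the empty piece after the match is kept as well.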

If you don't want those "extra" results, use non-capturing
parentheses:
$pattern1 = '(?:.*)-(?:\d+)-(?:\d+).txt$';

Alternatively, if what you're actually trying to do is find just those
captured parts, and your confusion is why that empty string appeared
in the first place, the answer is that you shouldn't be using split at
all.  You should just be using the normal =~ operator, like so:

my @parts = ($fname =~ /$pattern1/);
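
A slightly fuller sketch of that approach, in case it is useful. The variable names $name, $start and $end are my own, and I have made two small changes to the pattern that are assumptions about what you intend: a leading ^ anchor, and \. so that the dot before "txt" only matches a literal dot.

```perl
use strict;
use warnings;

my $fname = "My-range-20080511-20080514.txt";

# In list context a successful match returns the captured groups;
# a failed match returns an empty list, so the condition is false.
if ( my ( $name, $start, $end ) = $fname =~ /^(.*)-(\d+)-(\d+)\.txt$/ ) {
    print "name=$name start=$start end=$end\n";
}
else {
    print "'$fname' does not look like name-start-end.txt\n";
}
```

Testing the match in the condition means a filename that does not fit the pattern is reported, rather than silently leaving you with undefined values.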

Hope that helps,
Paul Lalli
