Posted by Gunnar Hjalmarsson on February 27, 2008, 8:21 pm
j ellings wrote:
>
> (html has been converted)
Yes, but why on earth did you post the data in that format?
<non-html data snipped>
> I am trying to capture the information between the <i><b>
> tags as these are the only unique delimiters between entries.
>
> My regex is as follows:
>
> while ($html =~ mgs) {
> #do something
> }
>
> Unfortunately, the regex will match the first instance (Z & A
> Newsstand), but ignore the second (Newstand) and then match on the
> third (Pudgies Deli).
>
> I can see that the match is working according to what I wrote; I am
> trying to fine tune it so that I can grab every match. Is there a way
> to include the previous <i><b> in the next match such that
> it will not skip a potential match?
A zero-width positive look-ahead assertion may be what you are after;
see "perldoc perlre".
while ($html =~ mgs) {
---------------------------------^^^------^
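Since the pattern itself did not survive the HTML conversion, here is a minimal sketch of the look-ahead idea. The sample data and the pattern are my own assumptions, not the original poster's; only the three entry names come from the question. The point is that (?=...) tests for the next <i><b> without consuming it, so the following iteration can start its match exactly there.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real HTML was snipped from the thread.
my $html = '<i><b>Z & A Newsstand</b></i> 12 Main St '
         . '<i><b>Newstand</b></i> 34 Oak Ave '
         . '<i><b>Pudgies Deli</b></i> 56 Elm Rd';

my @entries;
# (?=<i><b>|\z) is a zero-width positive look-ahead: it requires the next
# delimiter (or end of string) to follow, but leaves it unconsumed.
while ( $html =~ /<i><b>(.*?)(?=<i><b>|\z)/gs ) {
    push @entries, $1;
}

print "$_\n" for @entries;
```

Without the look-ahead, a pattern that ends in a plain <i><b> swallows the delimiter that the next entry needs, which would explain why every second entry was skipped.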
Another approach that doesn't slurp the whole file into a scalar variable:
local $/ = '<i><b>';
while ( my $html = <> ) {
    # do something
}
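A slightly fuller sketch of the same idea, reading from an in-memory filehandle so that it runs on its own. The sample data, the leading "header" text, and the variable names are assumptions for illustration. Note that each record comes back with the separator still attached at the end, so chomp is used to strip it, and the first record holds whatever precedes the first <i><b>.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real file.
my $data = 'header <i><b>Z & A Newsstand</b></i> 12 Main St '
         . '<i><b>Newstand</b></i> 34 Oak Ave '
         . '<i><b>Pudgies Deli</b></i> 56 Elm Rd';

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @records;
{
    # Treat '<i><b>' as the input record separator within this block only.
    local $/ = '<i><b>';
    while ( my $chunk = <$fh> ) {
        chomp $chunk;    # removes the trailing '<i><b>', if present
        push @records, $chunk;
    }
}
close $fh;

# The text before the first delimiter is not an entry.
shift @records;

print "$_\n" for @records;
```

With real input you would read from <> as shown above instead of the in-memory handle; the chomp and the handling of the first record apply either way.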
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl