preg_match_all

Subject Author Date
preg_match_all Anthony Smith 05-31-2008
Posted by Anthony Smith on May 31, 2008, 10:49 am
I am trying to take a web page and get all of the links. It almost
works, but I am missing a few links.
Here is what I am using.
preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
$s,$matches,PREG_SET_ORDER);


It will not pick up links like this:

<a class="highlight" href="browse.php?region=West
+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
<b>Next &gt;</b>
</a>


How do I get it to pick up hrefs like the one above?

Posted by Rik Wasmus on May 31, 2008, 11:08 am
Anthony Smith wrote:

> I am trying to take a web page and get all of the links. It almost
> works, but I am missing a few links.
> Here is what I am using.
> preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
> $s,$matches,PREG_SET_ORDER);
>
>
> It will not pick up links like this:
>
> <a class="highlight" href="browse.php?region=West
> +Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
> <b>Next &gt;</b>
> </a>
>
>
> How do I get it to pickup hrefs like the one above?

Add the /s modifier
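
A minimal sketch of why this helps, assuming the page is held in $s as in the original post (the sample HTML here is made up for illustration). Without the s modifier, the "." in (.*?) does not match newlines, so a link whose text spans several lines is skipped:

```php
<?php
// Illustrative input: the second link's text spans several lines.
$s = "<a href=\"one.php\">One</a>\n"
   . "<a class=\"highlight\" href=\"browse.php?p=2\">\n<b>Next &gt;</b>\n</a>";

// The pattern from the original post, without the /s modifier.
$pattern = '/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i';

// Without /s, "." stops at newlines, so only the single-line link matches.
$without = preg_match_all($pattern, $s, $m1, PREG_SET_ORDER);

// With /s (PCRE_DOTALL), "." also matches newlines and both links are found.
$with = preg_match_all($pattern . 's', $s, $m2, PREG_SET_ORDER);

echo "$without match(es) without /s, $with with /s\n";
echo $m2[1][1], "\n"; // href of the multi-line link
```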
--
Rik Wasmus
...spamrun finished

Posted by AnrDaemon on June 4, 2008, 3:00 pm
Greetings, Rik Wasmus.
In reply to Your message dated Saturday, May 31, 2008, 19:08:16,

>> I am trying to take a web page and get all of the links. It almost
>> works, but I am missing a few links.
>> Here is what I am using.
>> preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
>> $s,$matches,PREG_SET_ORDER);
>>
>>
>> It will not pick up links like this:
>>
>> <a class="highlight" href="browse.php?region=West
>> +Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
>> <b>Next &gt;</b>
>> </a>
>>
>>
>> How do I get it to pickup hrefs like the one above?

> Add the /s modifier

That would work, now that I think about it more.
But I would like to offer a slightly different approach:

preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)[^>]*|([^>\"\']+))>(.*?)</a>#is',
$s, $matches, PREG_SET_ORDER);


It has one downside: the URL will be in group (2) or group (3), depending on
whether the URL is quoted.
So you must pull the result out with a construction like

$url_link = empty($matches[N][3]) ? $matches[N][2] : $matches[N][3];
$url_text = $matches[N][4];


--


Posted by AnrDaemon on June 6, 2008, 6:44 am
Greetings, AnrDaemon.
In reply to Your message dated Wednesday, June 4, 2008, 23:00:34,

> preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)[^>]*|([^>\"\']+))>(.*?)</a>#is',
> $s, $matches, PREG_SET_ORDER);


The regexp should be written as
'#href=(?:([\"\'])([^\"\'>]\S*?)|([^>\"\'\s]+))[^>]*>(.*?)</a>#is'
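
One point worth checking before relying on this pattern: the lazy \S*? is followed by [^>]*, so for a quoted URL group (2) appears to capture only the first character, with [^>]* consuming the rest. A possible variant, offered as a sketch and not as the poster's pattern, closes the quoted branch with a backreference \1 to the opening quote. The group numbers stay the same, so the (2)/(3) selection still applies. The sample HTML is made up for illustration:

```php
<?php
// Illustrative input: one quoted multi-line link and one unquoted link.
$s = "<a class=\"highlight\" href=\"browse.php?region=West+Tennessee&amp;p=2\">\n"
   . "<b>Next &gt;</b>\n</a>\n"
   . "<a href=index.php>Home</a>";

// Variant (an assumption, not from the thread): \1 must match the same quote
// that opened the URL, so group 2 holds the whole quoted value.
$pattern = '#href=(?:(["\'])(.*?)\1|([^>"\'\s]+))[^>]*>(.*?)</a>#is';

preg_match_all($pattern, $s, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    // Group 3 is filled only for unquoted URLs; otherwise use group 2.
    $url  = empty($m[3]) ? $m[2] : $m[3];
    $text = trim($m[4]);
    $links[] = [$url, $text];
}

print_r($links);
```

Note that empty() also treats the string "0" as empty, so a stricter check such as $m[3] !== '' may be preferable if such URLs can occur.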


--


