Parsing content for links
Posted on February 21, 2007, 8:20 pm
I have HTML containing links stored in a content field in the database, and I need to verify whether those links are correct.
What I need is a PHP script that queries the database, then parses the content field to find all the <a href> tags and get each tag's href attribute value and link text.
Does anyone have a way of doing this, or a regex for it?
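As an alternative to a regex, here is a minimal sketch that uses PHP's built-in DOM extension (DOMDocument) to pull out each href and its link text, which copes with attribute order, quoting style and letter case without extra pattern work. extract_links() and check_links() are hypothetical helper names. check_links() only checks URL syntax with filter_var() and FILTER_VALIDATE_URL; it does not request the page, and it will report relative links as invalid. In the real script you would call these on each row's content field fetched from the database (for example through PDO); the query itself is left out here.

```php
<?php
// Sketch: extract href/text pairs with DOMDocument instead of a regex.
// extract_links() and check_links() are hypothetical helper names.

function extract_links(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy real-world HTML instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = [
                'href' => $a->getAttribute('href'),
                'text' => trim($a->textContent), // text of nested tags included
            ];
        }
    }
    return $links;
}

// Syntax check only: FILTER_VALIDATE_URL never contacts the server.
function check_links(array $links): array
{
    foreach ($links as &$link) {
        $link['valid'] = filter_var($link['href'], FILTER_VALIDATE_URL) !== false;
    }
    unset($link);
    return $links;
}

$html = '<p>See <a href="http://example.com/">Example <b>site</b></a> '
      . 'and <a class="x" href=\'not a url\'>broken</a>.</p>';

foreach (check_links(extract_links($html)) as $link) {
    printf("%s | %s | %s\n", $link['href'], $link['text'], $link['valid'] ? 'ok' : 'invalid');
}
// http://example.com/ | Example site | ok
// not a url | broken | invalid
```

Checking whether a link actually resolves would need an HTTP request on top of this, which is deliberately not shown.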
Re: Parsing content for links
Yeah, a regex would be easiest, and there should be plenty of them out there,
but I might do something like this:
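$re = '%
  <a\s(?:[^<>]*?\s)?     # href may or may not come first
  href\s*=\s*(["\'])     # capture single/double quote
  # match an absolute URI, captured as group 2
  (
    [\w.+-]+:(?://)?     # scheme
    [^?\#"\'\s>]+        # authority and path
    # possible query string and fragment
    (?: \? [^\#"\'\s>]* )?
    (?: \# [^"\'\s>]* )?
  )
  \1                     # captured quote from above
  [^<>]*                 # possible remaining attributes
  >(.*?)                 # link text, may contain nested tags
  </a>                   # closing <a> tag
%isx';
The URI will be in $match[2] and the link text in $match[3] ($match[1] only holds the quote character).
The i, s and x modifiers make the match case-insensitive, let the link text span lines, and allow the comments inside the pattern.
Just pass $re to the preg_* functions, e.g. preg_match_all($re, $content, $matches, PREG_SET_ORDER), and loop over $matches.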
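A minimal usage sketch, assuming the pattern from the reply with the URI captured as group 2 and the link text as group 3. It is written here on one line without the x modifier so the block runs on its own; find_links() is a hypothetical helper name. The pattern requires a scheme, so relative links such as /relative are not matched.

```php
<?php
// Sketch: run the link pattern with preg_match_all() and PREG_SET_ORDER.
// find_links() is a hypothetical helper; only absolute URIs (with a scheme) match.

function find_links(string $content): array
{
    // Group 1 = quote character, group 2 = URI, group 3 = link text.
    $re = '%<a\s(?:[^<>]*?\s)?href\s*=\s*(["\'])'
        . '([\w.+-]+:(?://)?[^?#"\'\s>]+(?:\?[^#"\'\s>]*)?(?:#[^"\'\s>]*)?)'
        . '\1[^<>]*>(.*?)</a>%is';

    $links = [];
    // PREG_SET_ORDER gives one array per match: $match[0] is the whole tag.
    if (preg_match_all($re, $content, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $links[] = ['href' => $match[2], 'text' => $match[3]];
        }
    }
    return $links;
}

$content = '<a href="http://example.com/a?x=1#top">First <b>link</b></a>'
         . ' <A TITLE="t" HREF=\'https://example.org/\'>Second</A>'
         . ' <a href="/relative">skipped</a>';

foreach (find_links($content) as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
// http://example.com/a?x=1#top => First <b>link</b>
// https://example.org/ => Second
```

Note that the link text keeps any nested markup (here <b>link</b>); wrap it in strip_tags() if you only want the visible text.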
Hope this helps,