HTML Dom Parser


So I was looking for a way to extract images (<img>) from a given
URL. I happened to stumble upon a nice little piece of code called
"PHP Simple HTML DOM Parser", found here: /

On the first page, I made a form where you can enter a URL, and then
the script tries to fetch the images.
It was indeed what I needed. Simple code like this did a portion of
the job:

// This is the script; download it from their SourceForge page.
include('simple_html_dom.php');

// Create DOM from URL or file.
// In my code, the empty string was replaced by a URL variable.
$dom = file_get_dom('');

// Find all <img>
foreach ($dom->find('img') as $element)
    echo $element->src . "<br />";


I understood the code, but I'm still a newbie in PHP. What I still
want to do is:

*Be able to specify that it only fetches .jpeg files, for example.
*Only allow images that are bigger than certain dimensions.
*For now it only gives me the URL (relative or absolute, depending on
the HTML of the source). What I also want is for it to display the
parsed images.

This is mainly for educational purposes, as the best way to learn PHP
is to keep writing small applications with it. So if anyone can point
me in the right direction, that would be great. And if you know of
another script with the same functionality, I'd like to hear about it
too; I like learning different ways to achieve something.


Re: HTML Dom Parser



Is it faster than PHP5's native DOM (don't mix up Dom & DOM in the manual)?
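
For comparison, here is a minimal sketch of the same image listing done with the DOM extension bundled with PHP. It assumes PHP 7 or later (for the type hints); extract_img_srcs() is a made-up helper name and the sample markup is inline so that nothing is fetched over the network.

```php
<?php
// Sketch: list <img> src values with PHP's bundled DOM extension.
// extract_img_srcs() is a hypothetical helper name, not part of any library.
function extract_img_srcs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Skip <img> elements that carry no src attribute at all.
        if ($img->hasAttribute('src')) {
            $srcs[] = $img->getAttribute('src');
        }
    }
    return $srcs;
}

// Inline sample markup so the sketch runs without network access.
$sample = '<html><body><img src="a.jpg"><p>text</p><img src="/img/b.png"></body></html>';
foreach (extract_img_srcs($sample) as $src) {
    echo $src, "\n";
}
```

Whether this is faster than the Simple HTML DOM script would have to be measured on real pages; the sketch only shows that the extraction step is about the same amount of code.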



preg_match() the src attribute you found, or use DOM & XPath with a more
sophisticated XPath query.
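
Both routes could look roughly like the sketch below. The helper names is_jpeg_src() and jpeg_srcs_via_xpath() are invented for illustration, and PHP 7 or later is assumed. Note that the XPath variant shown here is deliberately simple: it is case-sensitive and does not strip query strings, whereas the regex variant handles both.

```php
<?php
// is_jpeg_src() and jpeg_srcs_via_xpath() are hypothetical helper names.

// Regex route: test the path part of a src value for a .jpg/.jpeg ending.
function is_jpeg_src(string $src): bool
{
    // Drop any query string or fragment before looking at the extension.
    $path = (string) parse_url($src, PHP_URL_PATH);
    return preg_match('/\.jpe?g$/i', $path) === 1;
}

// XPath route: let the query itself select the matching <img> elements.
// XPath 1.0 has no ends-with(), so substring() and string-length() emulate it.
function jpeg_srcs_via_xpath(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = "//img[substring(@src, string-length(@src) - 3) = '.jpg'"
           . " or substring(@src, string-length(@src) - 4) = '.jpeg']";

    $srcs = [];
    foreach ($xpath->query($query) as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

// Inline sample markup; no network access needed.
$sample = '<img src="a.jpg"><img src="b.png"><img src="c.jpeg"><img src="d.JPG?x=1">';
print_r(jpeg_srcs_via_xpath($sample));
var_dump(is_jpeg_src('d.JPG?x=1'));
```

The file extension is only a hint, of course; a server can send any image type under any name.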


getimagesize(); keep in mind the relative URLs of the page and build a
proper URL string for this.
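
A rough sketch of those two steps follows. resolve_url() and image_is_at_least() are made-up helper names, and the resolver is simplified on purpose: it covers absolute, protocol-relative, root-relative and plain relative src values, but does not normalise "../" segments or honour a <base> tag. Calling getimagesize() on a remote URL also assumes allow_url_fopen is enabled. The example.com addresses are placeholders.

```php
<?php
// resolve_url() and image_is_at_least() are hypothetical helper names.

// Turn a src value found on a page into an absolute URL (simplified).
function resolve_url(string $base, string $src): string
{
    // Already absolute: starts with a scheme such as http: or https:.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Protocol-relative, e.g. //cdn.example.com/a.jpg
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }
    // Root-relative, e.g. /img/a.jpg
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }
    // Plain relative: append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// True when the image at $location is at least the given size in pixels.
function image_is_at_least(string $location, int $minWidth, int $minHeight): bool
{
    // getimagesize() returns false when the target cannot be read as an image.
    $info = @getimagesize($location);
    if ($info === false) {
        return false;
    }
    // Index 0 is the width, index 1 the height.
    return $info[0] >= $minWidth && $info[1] >= $minHeight;
}

// Demo of the resolver only, so nothing is fetched over the network.
$page = 'http://example.com/gallery/index.html';
foreach (['thumbs/a.jpg', '/img/b.png', '//cdn.example.com/c.jpg'] as $src) {
    echo resolve_url($page, $src), "\n";
}
```

Each getimagesize() call on a URL means a request to the remote server, so on pages with many images this check will be the slow part.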


Then output HTML, with img tags with the proper src attributes.
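
Something along these lines would do, as a sketch. render_img_tags() is a made-up helper name and the URLs are placeholders; the input is assumed to be a list of already absolute image URLs.

```php
<?php
// render_img_tags() is a hypothetical helper name.
function render_img_tags(array $urls): string
{
    $html = '';
    foreach ($urls as $url) {
        // Escape the URL so quotes or ampersands cannot break out of the attribute.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $html .= '<img src="' . $safe . '" alt="" /><br />' . "\n";
    }
    return $html;
}

echo render_img_tags([
    'http://example.com/gallery/thumbs/a.jpg',
    'http://example.com/img/b.jpg?w=200&h=100',
]);
```

The escaping matters because the src values come from someone else's page and should not be trusted as-is.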
--

Rik Wasmus
...spamrun finished
