Extracting between nested HTML tags

Hello all,

First of all, thanks for taking a look at this. I appreciate your
time. I have a string of HTML that defines a list:

<li class="one">ABC</li>
<li class="one">DEF</li>
 <li class="two">GHI
   <ul class="three">
       <li class="four">123</li>
       <li class="two">456
           <ul class="three">
                 <li class="one">RST</li>
                 <li class="one">UVW</li>
                 <li class="one">XYZ</li>
         <li class="four">789</li>
         <li class="four">123</li>
         <li class="four">456</li>
         <li class="four">789</li>
<li class="four">ABC</li>
<li class="four">DEF</li>
<li class="one">GHI</li>

I am looking to extract everything between "<li class="two">" and its
matching "</li>" (so from the start of that item to the end of the
nested list). I have tried regular expressions, but the "</li>" tag
also appears inside the nested list, so the match ends in the wrong
place. The string has no line breaks or spaces I could split on with
explode() either. I get the impression I need some form of XML
parsing, but I really don't know where to begin.
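To make the regex problem concrete, here is a minimal sketch. The shortened `$html` sample is hypothetical, modelled on the list above with the nested list's closing tags written out. A lazy pattern stops at the first `</li>` it meets, which closes a nested item; a greedy one runs on to the last `</li>` in the string, past the end of the item you want.

```php
<?php
// Hypothetical, shortened sample modelled on the list in the question,
// with the closing tags of the nested list written out.
$html = '<li class="one">ABC</li>'
      . '<li class="two">GHI<ul class="three">'
      . '<li class="four">123</li>'
      . '<li class="four">789</li>'
      . '</ul></li>'
      . '<li class="one">XYZ</li>';

// Lazy: stops at the FIRST "</li>", which closes a nested item.
preg_match('~<li class="two">(.*?)</li>~s', $html, $lazy);
echo $lazy[1], "\n";    // GHI<ul class="three"><li class="four">123

// Greedy: runs on to the LAST "</li>" in the string, past the end of the item.
preg_match('~<li class="two">(.*)</li>~s', $html, $greedy);
echo $greedy[1], "\n";  // ends with: </ul></li><li class="one">XYZ
```

Neither quantifier knows how deeply nested it is, which is why a parser that builds the element tree is the usual tool for this.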

I would really appreciate some tips or pointers on how to best go
about this, or a tutorial that will likely give me the info I need.

Thanks again


Re: Extracting between nested HTML tags

Something like this (untested, I might have some method names wrong; if it
doesn't work, check php.net/DOM):
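$htmlstring = ''; // put the string in there.
libxml_use_internal_errors(true);  // silence warnings about sloppy HTML
$doc = new DOMDocument();
$doc->loadHTML($htmlstring);       // parse the HTML into a DOM tree
$xpath = new DOMXPath($doc);
// select every <li> whose class attribute is exactly "two"
$result = $xpath->query("//li[@class='two']");
$htmlarray = array();
for ($i = 0; $node = $result->item($i); $i++) {
    // serialise the element together with its nested list
    $htmlarray[] = $doc->saveXML($node);
}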
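A self-contained sketch of the same XPath approach, run against a hypothetical well-formed copy of the sample list. The wrapping `<ul>` and the closing `</ul></li>` tags are assumptions, since the list as posted omits them. It uses only `DOMDocument::loadHTML`, `DOMXPath::query` and `DOMDocument::saveXML` from PHP's DOM extension.

```php
<?php
// Hypothetical, well-formed version of the list from the question
// (the wrapping <ul> and the nested closing tags are assumed).
$htmlstring = '<ul>'
    . '<li class="one">ABC</li>'
    . '<li class="two">GHI<ul class="three">'
    .   '<li class="four">123</li>'
    .   '<li class="two">456<ul class="three">'
    .     '<li class="one">RST</li>'
    .     '<li class="one">UVW</li>'
    .   '</ul></li>'
    .   '<li class="four">789</li>'
    . '</ul></li>'
    . '<li class="one">GHI</li>'
    . '</ul>';

libxml_use_internal_errors(true);   // keep libxml's HTML warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($htmlstring);
$xpath = new DOMXPath($doc);

// Every <li> whose class attribute is exactly "two", at any depth.
$htmlarray = array();
foreach ($xpath->query("//li[@class='two']") as $node) {
    // saveXML($node) serialises the element with all of its descendants,
    // so the nested <ul> comes along with it.
    $htmlarray[] = $doc->saveXML($node);
}

echo count($htmlarray), "\n";   // 2: the outer item and the nested one
echo $htmlarray[0], "\n";
```

Because the parser tracks nesting, the first result runs from `<li class="two">GHI` to its own closing `</li>`, with the inner `<li class="two">456` item included; the inner item is also returned separately, in document order.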
--

Rik Wasmus