Pulling a synopsis from text

I am trying to automatically pull a beginning section from submitted
text and return it with a More.. link. The submitted text is HTML
created by FCKeditor (http://www.fckeditor.net/).
The trouble I am running into is that the cutoff point is often inside
an element, i.e. after an opening <div> but before its closing </div>,
so the closing tag gets cut off.
The only idea I have come up with is to build an array of all possible
HTML tags and search for a close for each, but I am hoping there is a
cleaner method. Has anyone attempted such a feat previously?

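function getSynop($input="", $more_link="", $synop_size=750) {
   $tmp_str = substr($input, 0, $synop_size);
   $end_val = strrpos($tmp_str, ">") + 1;
   if($end_val < ($synop_size)) {
       $end_val = strrpos($tmp_str, ".") + 1;
   }
   if($end_val < ($synop_size)) {
       $end_val = strrpos($tmp_str, ">") + 1;
   }
   // the posted code breaks off after the opening "<a"; the rest of
   // the link is an assumption based on the $more_link parameter
   return substr($input, 0, $end_val) ." <a href='".$more_link."'>More..</a>";
}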

Re: Pulling a synopsis from text

crucialmoment wrote:
The trick here is to ignore the tags and only operate on what's between
the tags. Say if we have the following:

This is <div>a test</div> and this is only <div>a test.</div>

and we want 10 characters, we would look at "This is " and grab 8
characters. Then we look at "a test" and retain only 2 characters. As
we now have what we need, we will retain 0 characters from " and this
is only " and "a test.". The end result will be:

This is <div>a </div><div></div>

Once the empty tags are discarded we end up with

This is <div>a </div>

which is what we want.

Here's an implementation of the technique:


$s = 'This is some <strong>sample text</strong>. You are using '
   . '<a href="http://www.fckeditor.net/">FCKeditor</a>.';

function synop_callback($m) {
    global $synop_char_to_fetch;
    // the tag group is absent when the text run is not followed by a tag
    $tag = isset($m[2]) ? $m[2] : '';

    // got enough characters already, return just the tag
    if($synop_char_to_fetch <= 0) {
        return $tag;
    }

    // decode HTML entities to avoid undercounting
    $inner_html = $m[1];
    $inner_text = html_entity_decode($inner_html);

    if(strlen($inner_text) > $synop_char_to_fetch) {
        // retain up to $synop_char_to_fetch, ending
        // at a word boundary
        $r = preg_replace("/^(.{0,$synop_char_to_fetch}\b)?.*/s", '\1',
                          $inner_text);
        $inner_html = htmlspecialchars(rtrim($r));
    }

    // subtract the number of characters retained
    $synop_char_to_fetch -= strlen($inner_text);
    return "$inner_html$tag";
}

function synop_chop($s, $num) {
    // chop off extra text beyond $num characters
    global $synop_char_to_fetch;
    $synop_char_to_fetch = $num;
    $s = preg_replace_callback('/([^<]*)(<.*?>)?/s', 'synop_callback',
                               $s);

    // collapse empty tags; repeat because removing an inner pair
    // can leave its parent empty
    do {
        $r = $s;
        $s = preg_replace('/<(\S*?)[^>]*?>\s*<\/\1>/i', '', $r);
    } while($r != $s);

    // add ellipsis
    $s = preg_replace('/\.?$/', '...', trim($s), 1);
    return $s;
}

// prints: This is some <strong>sample</strong>...
echo synop_chop($s, 20);
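
For comparison, here is a minimal sketch of a different way to get the same result: instead of keeping every tag and collapsing the empty pairs afterwards, it stops at the cut and closes whatever is still open. It is not the code from this post. The function name synop_close_tags() and the short list of void elements are assumptions made for illustration, it counts bytes with strlen() just like the code above (multibyte text would need the mb_* functions), and it assumes no attribute value contains a ">".

```php
<?php
// Hypothetical helper: keep $limit characters of visible text,
// then close any tags that are still open at the cut.
function synop_close_tags($html, $limit) {
    // split into tags and text runs, keeping the tags as tokens
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
                         PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    // elements that never take a closing tag (partial list, assumed)
    $void = array('br', 'hr', 'img', 'input', 'meta', 'link');
    $open = array();   // stack of currently open tag names
    $out  = '';

    foreach ($tokens as $tok) {
        if (preg_match('/^<(\/?)([a-z][a-z0-9]*)\b[^>]*>$/i', $tok, $m)) {
            $name = strtolower($m[2]);
            if ($m[1] === '/') {
                // closing tag: pop it if it matches the innermost open tag
                if (end($open) === $name) {
                    array_pop($open);
                }
            } elseif (!in_array($name, $void) && substr($tok, -2) !== '/>') {
                $open[] = $name;
            }
            $out .= $tok;
            continue;
        }

        // text run: decode entities so "&amp;" counts as one character
        $text = html_entity_decode($tok);
        if (strlen($text) < $limit) {
            $limit -= strlen($text);
            $out .= $tok;
            continue;
        }

        // the budget runs out inside this run: cut it and stop reading
        $out .= htmlspecialchars(substr($text, 0, $limit));
        break;
    }

    // close the remaining tags, innermost first
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo synop_close_tags(
    'This is <div>a test</div> and this is only <div>a test.</div>', 10);
```

With the example from the top of this post and a budget of 10 characters, this gives the same "This is <div>a </div>" that the walk-through arrives at. Unlike synop_chop() it cuts mid-word, so the word-boundary regex above would still be needed for a clean ending.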


Re: Pulling a synopsis from text

1.  Get text.
2.  Remove tags.
3.  Take first <n> characters.

$text="This is some <div>text</div> isn't it interesting.  <b>send money.</b>  <i>and beer</i>";

function getSynop($input="", $more_link="", $synop_size=750) {
    // strip the markup first so the cut can never land inside a tag
    $syn=substr(strip_tags($input), 0, $synop_size);
    return $syn." <a href='".$more_link."'>more...</a>";
}

hope that helps!
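
One thing the strip_tags() approach leaves open is that substr() can cut in the middle of a word. A rough sketch of one way around that follows. The name getSynopWords(), the whitespace collapsing and the escaping of the link are assumptions added for illustration, not part of the post above, and like the original it counts bytes rather than multibyte characters.

```php
<?php
// Hypothetical variant of getSynop() that ends on a whole word.
function getSynopWords($input, $more_link, $synop_size = 750) {
    // drop the markup and collapse the runs of whitespace left behind
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($input)));

    if (strlen($text) > $synop_size) {
        // take one extra character so a cut that already falls
        // on a space does not cost a whole word
        $cut   = substr($text, 0, $synop_size + 1);
        $space = strrpos($cut, ' ');
        $text  = ($space !== false)
               ? substr($cut, 0, $space)
               : substr($cut, 0, $synop_size);
    }

    // escape the URL before placing it in the attribute
    $href = htmlspecialchars($more_link, ENT_QUOTES);
    return $text . " <a href='" . $href . "'>more...</a>";
}

$text = "This is some <div>text</div> isn't it interesting.  "
      . "<b>send money.</b>  <i>and beer</i>";
echo getSynopWords($text, 'more.php', 20);
```

With a limit of 20 the plain substr() version would stop at "This is some text is", while this one backs up to the last space and returns "This is some text" followed by the link.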


