string difference and similarity


I've got a series of data like this:

Long Sleeve White P/C Sm 32/33
Long Sleeve White P/C Med 32/33
...

What I'd like to do is extract the differences and the similarity. In
this case:

similar: Long Sleeve White P/C

difference: Med 32/33

If I were writing a function, I'd probably compare increasingly
longer substrings, but I'm thinking that PHP probably already has
functions for that. Which ones are they?

I found "xdiff_string_diff", but I don't really understand it or how I
would get the common text.

Jeff

Re: string difference and similarity

I haven't looked into that function yet. An alternative would be to
split the strings into arrays and use either array_intersect or
array_diff, or loop through one of the arrays checking whether each value
is in_array of the second. I'm not sure how your strings are created, so
it's hard to tell what would be appropriate, since:

I'm a crotchety old man.

is different than

Old man, I'm crotchety.

and not just by two characters.  :^)
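
A rough sketch of the array idea above, assuming the words are separated by single spaces. The helper name word_overlap and its return keys are made up for illustration; only explode, implode, array_intersect and array_diff are real PHP functions.

```php
<?php
// Hypothetical helper; the name and return keys are only for illustration.
function word_overlap(string $a, string $b): array
{
    // Assumes words are separated by single spaces.
    $wordsA = explode(' ', $a);
    $wordsB = explode(' ', $b);

    return [
        // words of $a that also occur anywhere in $b
        'similar'    => implode(' ', array_intersect($wordsA, $wordsB)),
        // words of $b that occur nowhere in $a
        'difference' => implode(' ', array_diff($wordsB, $wordsA)),
    ];
}

$result = word_overlap(
    'Long Sleeve White P/C Sm 32/33',
    'Long Sleeve White P/C Med 32/33'
);
printf("similar: %s\ndifference: %s\n", $result['similar'], $result['difference']);
```

Because this compares words regardless of position, the shared size 32/33 ends up in the similar part and only Med is reported as different, so it is not quite the split asked for in the original question.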

Re: string difference and similarity

On Thu, 13 Nov 2008 01:39:42 -0500, jeff@spam_me_not.com wrote:

If your string comparison needs are all as simple as your example,
using strspn() could probably suit your needs.  Perhaps something
like:

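<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// XOR yields "\0" wherever the two strings hold the same byte,
// so counting the leading "\0" bytes gives the common prefix length
$matchlen = strspn($s1 ^ $s2, "\0");

// everything before the 1st non-matching char
$same = substr($s1, 0, $matchlen);

// from the 1st non-matching char onward
$diff = substr($s2, $matchlen);

printf("Same: [%s]\nDiff: [%s]", $same, $diff);
?>

strspn() returns the length of the initial segment of its first
argument made up only of characters found in the second, which is a
character mask rather than a string to compare against. XORing the
two strings first turns every matching position into "\0", so the
result is the length of the initial matching segment. When writing a
function, I'd check to see if the strings are equal first, and
preemptively return the string or whatever suits your needs.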
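
Wrapped up as a function with the equality check suggested above, it might look something like this. This is only a sketch: the name split_common_prefix and the returned array keys are hypothetical, and it compares byte by byte with a plain loop, so it could split a multi-byte UTF-8 character.

```php
<?php
// Hypothetical helper; name and return shape are illustrative only.
function split_common_prefix(string $a, string $b): array
{
    // Identical strings: everything is shared, nothing differs.
    if ($a === $b) {
        return ['same' => $a, 'diff_a' => '', 'diff_b' => ''];
    }

    // Walk both strings byte by byte until they diverge
    // or the shorter one runs out.
    $max = min(strlen($a), strlen($b));
    $len = 0;
    while ($len < $max && $a[$len] === $b[$len]) {
        $len++;
    }

    return [
        'same'   => substr($a, 0, $len), // shared leading part
        'diff_a' => substr($a, $len),    // remainder of the first string
        'diff_b' => substr($b, $len),    // remainder of the second string
    ];
}

$parts = split_common_prefix(
    'Long Sleeve White P/C Sm 32/33',
    'Long Sleeve White P/C Med 32/33'
);
printf("Same: [%s]\nDiff: [%s]\n", $parts['same'], $parts['diff_b']);
```

The shared part keeps its trailing space here; trim() would drop it if that matters.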

If you need a more complex algorithm, see the manual:

<URL:http://php.net/manual/en/function.levenshtein.php>
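
For what it's worth, levenshtein() gives a number (how many single-character edits separate two strings), not the common text itself. A small sketch of how it might be used; the helper edit_distance_ratio is a made-up name, and dividing by the longer length is just one possible way to scale the result:

```php
<?php
// Hypothetical helper: edit distance scaled by the longer string's length,
// so 0.0 means identical and 1.0 means nothing in common.
function edit_distance_ratio(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 0.0; // two empty strings are identical
    }
    return levenshtein($a, $b) / $longest;
}

$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// number of insertions, deletions and substitutions needed
printf("Distance: %d\n", levenshtein($s1, $s2));
printf("Ratio: %.3f\n", edit_distance_ratio($s1, $s2));
```

A low ratio suggests the two strings are near-duplicates worth comparing further.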
--
Curtis
$email = str_replace('sig.invalid', 'gmail.com', $from);

Re: string difference and similarity

Jeff wrote:

The documentation looks a bit sparse, but this package features inline
diffs:

http://pear.php.net/package/Text_Diff

--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My bain-marie humor site: http://www.demogracia.com
--