metaphone(), levenshtein() and similar text

Hi Guys,

I have been writing a database search for my site. To increase the
accuracy and the chance of a successful result, I have used metaphone() and
similar_text() comparisons to find the database entries that contain the
most words closely resembling the entered search criteria (only words with
a similarity of 80% or more are recorded).  The value for each word over 80% is
stored in an array, and the average of that array is then used to gauge
the row's ranking in the search results.
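
For anyone trying to picture the scoring described above, here is a minimal sketch of it. The 80% cut-off and the averaging are from the post; the function name scoreRow() and the choice to run similar_text() on the metaphone() keys are assumptions, since the post does not say exactly how the two functions are combined.

```php
<?php
// Hypothetical helper: scores one row's text against a search phrase.
// Assumption: similarity is measured between metaphone keys of the words.
function scoreRow(string $rowText, string $search, float $threshold = 80.0): float
{
    $rowWords    = preg_split('/\s+/', strtolower(trim($rowText)), -1, PREG_SPLIT_NO_EMPTY);
    $searchWords = preg_split('/\s+/', strtolower(trim($search)), -1, PREG_SPLIT_NO_EMPTY);

    $hits = [];
    foreach ($searchWords as $s) {
        foreach ($rowWords as $r) {
            // similar_text() writes the similarity percentage into $percent.
            similar_text(metaphone($s), metaphone($r), $percent);
            if ($percent >= $threshold) {
                $hits[] = $percent; // only words at or above the cut-off are recorded
            }
        }
    }

    // Average of the recorded values; a row with no hits scores 0.
    return $hits ? array_sum($hits) / count($hits) : 0.0;
}
```

Calling scoreRow($row['text'], $userInput) for each row and sorting by the result would give the ranking ($row['text'] and $userInput are placeholder names).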

However, the rows being searched contain different numbers of
words, some with many words and some with very few.  This means that a
column with more words has a greater chance of containing
words that score above 80% against the search criteria.
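
To put a rough number on that effect, the sketch below assumes every unrelated word has the same small chance p of clearing the 80% cut-off by accident and that words are independent. Both are simplifying assumptions, and the 5% figure is made up for illustration. Under them, a row of n words has a chance of 1 - (1 - p)^n of containing at least one accidental hit.

```php
<?php
// Hypothetical illustration of the length bias.
// $p: assumed chance that a single unrelated word passes the cut-off.
// Words are treated as independent, which real text only approximates.
function chanceOfAccidentalHit(float $p, int $wordCount): float
{
    return 1.0 - pow(1.0 - $p, $wordCount);
}

foreach ([2, 10, 50] as $n) {
    printf("%2d words: %.1f%%\n", $n, 100 * chanceOfAccidentalHit(0.05, $n));
}
```

The chance climbs quickly with the word count, which is the unevenness between short and long rows.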

I was wondering if anybody knows a mathematical way of making this a more
even search, or has any tips on how I can make it more accurate.  My site
already searches using fulltext; this is just a backup catering for
results with similar spellings etc.
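
One possible way to even things out, offered as an assumption and not something anyone in the thread proposed, is to score from the search side: keep only the best match for each search word and divide by the number of search words. A long row then cannot pile up extra hits, and a search word with no close match pulls the score down. The name evenScore() is hypothetical.

```php
<?php
// Hypothetical alternative scoring: one value per search word,
// so the result does not grow with the number of words in the row.
function evenScore(string $rowText, string $search, float $threshold = 80.0): float
{
    $rowWords    = preg_split('/\s+/', strtolower(trim($rowText)), -1, PREG_SPLIT_NO_EMPTY);
    $searchWords = preg_split('/\s+/', strtolower(trim($search)), -1, PREG_SPLIT_NO_EMPTY);
    if (!$searchWords) {
        return 0.0;
    }

    $total = 0.0;
    foreach ($searchWords as $s) {
        $best = 0.0;
        foreach ($rowWords as $r) {
            similar_text(metaphone($s), metaphone($r), $percent);
            $best = max($best, $percent); // keep only the closest row word
        }
        // A search word with no match at or above the cut-off counts as 0.
        $total += ($best >= $threshold) ? $best : 0.0;
    }

    return $total / count($searchWords);
}
```

With this, a one-word row that matches the only search word scores the same as a fifty-word row containing that word, which may or may not be the behaviour wanted.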

I don't know if any of that made sense, but any input would be



Re: metaphone(), levenshtein() and similar text

It is difficult to understand from your description what the problem is. Are
you talking about cases where a database field containing, for example, just
two words, one of which matches the search word and the other of which does not,
is not reported in the search results? If so, why do you think it should
be reported? Only 50% of its words match the search criteria.
What do you mean by an "even" search?

