Using PHP to calculate MySQL fulltext relevancy scores by hand

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index.  The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4

This is the Fed.  Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query.  So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik.  Here it is:

   * Calculate the relevancy score of a search query if the "SELECT
MATCH () AGAINST ()" fulltext indexing query is unavailable
   * @access public
   * @param object $result Resultset
   * @param object $row Row of resultset to calculate relevancy
   * @param mixed $keyword
   * @param mixed $tableName name of table to perform counting query
to calculate $rows value
   * @return float $score
   * @see DBConnection
   * @see MySQLQuery
   * @see +fulltext+maximum+score&hl=en&lr=&
function calculate_relevancy($result, $row, $keyword, $tableName) {
        Explanation of integer variables:

        1) $w1 = intermediate weight
        2) $dcount = number of times word is present in document (or query
        3) $sumw1 = sum of all values of $w1 found
        4) $uniq = number of unique words in the document (or query row)
        5) $rows = total number of rows in the table
         6) $found = number of rows in the table that contain the word in
question (@sizeof($result))
        7) $qcount = number of times this word is present in the query

    list($w1, $dcount, $sumw1, $uniq, $rows, $found, $qcount) = array(0,
0, 0, 0, 0, 0, 0);

    $isSetUniq = false;

    for ($i = 0; $i < @sizeof($result); $i++) {
     foreach (@array_keys(get_object_vars($row)) as $field) {
      $dcount += @substr_count($result[$i]->$field, $keyword);
      if (!$isSetUniq) $uniq +=
@sizeof(array_flip(str_word_count($row->$field), 1));
     $w1 = log((float)$dcount) + 1;
     $sumw1Array[$i] += $w1;
     $qcount += $sumw1Array[$i];
     $dcount = 0;
     $isSetUniq = true;
    $found = @sizeof($result);

    if (!isset($rows)) {
     static $rows = 0;
     global $dbHost, $dbPort, $dbUser, $dbPwd, $dbDefaultName;
     $dbConnObj =& new DBConnection($dbHost, $dbPort, $dbUser, $dbPwd,
     $dbConn = $dbConnObj->connect();
     $query =& new MySQLQuery('SELECT count(r.*) AS rows FROM ' .
$this->getTempNameName() . ' r LIMIT 1', $dbConn);
     $tempTableRowCountResult = $query->getResult();
     $query = null;
     $dbConn = null;
     $dbConnObj = null;
     $rows = $tempTableRowCountResult[0]->rows;

    $score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);
    return $score;



This completely fails because it produces NAN due to $sumw1 always
equaling INF (infinity).  I am not math-savvy enough to know how to
fix this problem so I appeal to higher intelligence for help on this.


Re: Using PHP to calculate MySQL fulltext relevancy scores by hand

Phil Powell wrote:

Quoted text here. Click to load it

<snip code>

Quoted text here. Click to load it

try replacing the $sumw1 with:


Justin Koivisto -

Re: Using PHP to calculate MySQL fulltext relevancy scores by hand

Quoted text here. Click to load it

I had the formula interpretation wrong.  $sumw1 is the sum of all $w1
values which are the weights of each word that comprises the compound
keyword, for example, if your keyword is "Hello World" then you have
$w1 for "Hello" and $w1 for "World".


Site Timeline