Good search theory


I'm a webmaster for a college newspaper and I'm implementing an article
search. I'm running PHP with a MySQL database to store the weekly
stories. Does anyone know of an article that could offer good search theory?

My top priority right now is multiple search terms and relevance
sorting based on how many word hits are returned.

It's easy to search for a single word or term in a body of text. I can
just use the MySQL "WHERE `body` LIKE '%term%'" query. But what about
searching for two terms, or searching for the most relevant document
based on how many hits of the term are found?

I imagine I would split up the search query and run multiple "LIKE
'%term%'" queries to find multiple hits. I would have to pick some
arbitrary limit on the number of terms because searching each article
50 times is not an option.
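
A minimal sketch of the approach described above (split the query into terms, count hits per article, sort by hit count), written in Python rather than PHP purely for illustration. The names `tokenize`, `rank_articles` and `max_terms` are hypothetical, and the articles are assumed to be already loaded into a dict of id to body text. Note that it counts whole-word matches, which is not the same thing as a substring `LIKE` match.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_articles(query, articles, max_terms=5):
    """Return (article_id, hits) pairs sorted by total term hits, best first.

    `articles` maps a hypothetical article id to its body text.
    `max_terms` is an arbitrary cap on how many query terms are used.
    """
    # Drop duplicate terms while keeping their order, then apply the cap.
    terms = list(dict.fromkeys(tokenize(query)))[:max_terms]
    results = []
    for article_id, body in articles.items():
        words = tokenize(body)
        hits = sum(words.count(term) for term in terms)
        if hits > 0:
            results.append((article_id, hits))
    # Most hits first; ties are broken by article id for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results
```

Scanning every article on every search like this is fine for a prototype, but the cost grows with the size of the archive.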

Seems like there are a lot of choices in how to set up a good search
system and I'd like to get started on the right foot to reduce my work


Re: Good search theory

Just let Google do it.

Re: Good search theory

If it's an option for you, have a look at swish-e

I don't know if there is a PHP interface or not though. It's semi-difficult to
set up, but the folks who wrote it really did a good job. There are all kinds
of ways of setting up Swish-e for META tags and the like.

Proximity and phrases are quite difficult, tricky stuff, but swish-e handles them.

If swish-e won't work, another option might be Lucene.

Been a few years, but when I checked into it Lucene was quite good as well.
It's Java, which may be an issue if you're not already running servlets.
Surprisingly fast, especially considering it's Java.

Another option is ht://Dig.

Last I checked, it didn't do phrase matching, but it's quite mature. Been
around a long time, several people are using it. It's the easiest one I've
seen where setup is concerned. If you don't require phrase match, it's pretty

All of them that I've listed use an index and are pretty good at scale.
Wouldn't try to use them in place of, (With the possible exception of
multiple Lucene's) but I bet they would work well for your application.

One could probably fill a small library (or at least a full section of a
library) with books on the subject of searching full text. 'tis not an easy

Maybe I'm prejudiced, but in my opinion SQL databases are not really designed
for searching full text. (Been awhile, but I've been burned by them for
fulltext search in the past) I suppose for a few hundred articles and/or
highly custom search tools, an SQL database would work. (If your articles are
in XML, then such a database would be OK for searching in titles or maybe within
pre-determined XML containers like <var>..</var>)

The "issue" I take with them is that you are effectively using a database
AS an index. A database's primary goal is (or should be) data storage. Fulltext
indices are a different beast altogether.

SQL databases are excellent for setting up a prototype or "proof of concept"
but quickly break down when used for larger quantities of data. (This opinion
is based on a context-aware search tool built in 1999. 6 years is a long time
and things may have changed.)

They do make good URL storage devices, last index time, things like that.
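
To illustrate the distinction drawn above between a database used for storage and a full-text index, here is a rough sketch of an inverted index, the structure full-text indexes are commonly built on. It is written in Python for illustration only and is not how any of the tools mentioned is actually implemented. `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict

def build_index(articles):
    """Map each word to {article_id: occurrence count}."""
    index = defaultdict(dict)
    for article_id, body in articles.items():
        for word in re.findall(r"[a-z0-9]+", body.lower()):
            index[word][article_id] = index[word].get(article_id, 0) + 1
    return index

def search(index, query):
    """Return ids of articles containing every query term, ranked by total hits."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    # One posting list ({article_id: count}) per query term.
    postings = [index.get(term, {}) for term in terms]
    # Only articles that appear in every term's posting list match.
    matching = set(postings[0]).intersection(*postings[1:])
    scores = {doc: sum(p[doc] for p in postings) for doc in matching}
    # Highest total hit count first; ties are broken by article id.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

The index is built once, when articles are added, so a search only touches the posting lists for the query terms instead of scanning every article body.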

--                     Custom web programming
guhzo_42@lnubb.pbz (rot13)                User Management Solutions

Re: Good search theory

Thanks for the many solutions everyone. I'll start with Fulltext
because it will take the least effort to get something rudimentary
working in short order. I'll examine the other options listed as well.

Re: Good search theory

AaronV wrote:
You could look at fulltext searches.

Look especially at the MATCH bits to get the relevance of the result.
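
As a rough sketch of what the MATCH part can look like, the Python helper below builds a parameterized MySQL query that uses `MATCH ... AGAINST` both to filter and to produce a relevance score. The table name `articles`, the columns `id`, `title` and `body`, and the helper `build_fulltext_query` are all assumptions for illustration. The `%s` placeholders follow the style used by common Python MySQL drivers. A FULLTEXT index covering exactly the columns named in `MATCH()` must already exist.

```python
import re

# Table and column names are interpolated into the SQL, so restrict them
# to plain identifiers; the search text itself is always a bound parameter.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_fulltext_query(search_text, table="articles",
                         columns=("title", "body"), limit=20):
    """Return (sql, params) for a relevance-sorted MySQL full-text search."""
    for name in (table, *columns):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    match_expr = "MATCH({}) AGAINST (%s)".format(", ".join(columns))
    sql = (
        f"SELECT id, title, {match_expr} AS relevance "
        f"FROM {table} "
        f"WHERE {match_expr} "
        f"ORDER BY relevance DESC "
        f"LIMIT {int(limit)}"
    )
    # The search text is bound twice: once in SELECT, once in WHERE.
    return sql, (search_text, search_text)
```

The returned pair would be passed to a driver's `cursor.execute(sql, params)`. Repeating the same `MATCH` expression in the SELECT list and the WHERE clause is the usual way to both filter rows and expose the score for sorting.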


Re: Good search theory

Since your search will be done on a body of text, I would suggest using
MySQL's fulltext search.  It is more efficient and accurate than using
simple LIKE queries.  Fulltext searches will also allow you to
determine the relevancy of the results.  All the searches that I've
done over the years haven't ever worked "exactly" right, but fulltext
is as close as I've ever gotten.  Below are some links that hopefully
will point you in the right direction.

