- ted holden
April 11, 2005, 6:41 am
I'm looking for two or three forum webmasters who would like to try an
indexing and retrieval technology that could conceivably make everything
ever posted on a forum searchable with very little overhead.
The main parameters of the software are roughly as follows. It builds
search indices ten to a hundred times faster than any other indexing and
retrieval software. It normally indexes to file/sector pointers, which
means that, unlike Glimpse and Glimpse clones, it can index one or more
very large files and still search them quickly. It also gets faster at
searching as you give it longer and more search terms, which is the
opposite of how database-like techniques behave. Searching on more than
one word works more like an AI approach and drastically reduces the hits
you don't really want to see. The standard version, which assumes that the
text to be searched lives in one or more files in directories, creates
indices about 14% of the size of the text. Other software typically needs
50 to 100% or even more.
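The post doesn't describe the internals, so what follows is only a generic sketch, not the author's technique: a plain inverted index whose entries are (file, sector) pointers, with multi-word queries answered by intersecting the pointer sets, rarest word first. The 512-character "sector" size and the names `build_index` and `search` are hypothetical. It shows one common reason extra search terms cut down unwanted hits: each added word is another intersection, so the candidate set can only shrink, and a hit is a pointer the caller can seek to and read back.

```python
import re
from collections import defaultdict

SECTOR_SIZE = 512  # hypothetical "sector" size, measured in characters
WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(files):
    """Map each lower-cased word to the set of (file name, sector) pointers
    where it occurs. `files` maps a file name to its text."""
    index = defaultdict(set)
    for name, text in files.items():
        for match in WORD_RE.finditer(text.lower()):
            # A word is filed under the sector in which it starts.
            index[match.group()].add((name, match.start() // SECTOR_SIZE))
    return index


def search(index, query):
    """Return the pointers whose sector contains every word of the query."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return set()
    # Start from the rarest term so the candidate set is small from the outset;
    # every further term can only shrink it.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        if not result:
            break  # nothing left to intersect
        result &= posting
    return result


if __name__ == "__main__":
    docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog and the quick cat"}
    idx = build_index(docs)
    print(sorted(search(idx, "quick")))      # [('a.txt', 0), ('b.txt', 0)]
    print(sorted(search(idx, "quick fox")))  # [('a.txt', 0)]
```

A real engine would store posting lists compactly on disk rather than as Python sets; the sets here just keep the intersection logic visible.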
The Linux version is a shared library written in C++ which is normally
called from Python code, either under Zope or from something like a Python
XML-RPC server. Using it with a forum that keeps its threads in MySQL would
require someone to modify the Python code that retrieves file/sector areas
so that it instead generates SQL statements and retrieves the appropriate
text.
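As a rough illustration of that modification, here is a minimal sketch that assumes the index has been built so each hit identifies a forum post by a numeric id, and that posts live in a hypothetical table `posts(post_id, thread_id, body)`. It uses Python's built-in `sqlite3` as a stand-in for MySQL; with a MySQL driver such as mysqlclient the placeholder is `%s` rather than `?`. The function names `fetch_hits` and `serve`, and the host and port, are likewise hypothetical, and the search call into the C++ library itself is not shown.

```python
import sqlite3
from xmlrpc.server import SimpleXMLRPCServer


def fetch_hits(conn, post_ids):
    """Return (post_id, thread_id, body) rows for the given post ids.

    Stands in for a "read this file/sector" step: instead of seeking into a
    file, one parameterized SELECT pulls every matching post from the database.
    """
    ids = sorted(set(post_ids))  # de-duplicate and give a stable order
    if not ids:
        return []
    # One "?" placeholder per id (sqlite3 qmark style; MySQL drivers use "%s").
    placeholders = ",".join(["?"] * len(ids))
    sql = ("SELECT post_id, thread_id, body FROM posts "
           f"WHERE post_id IN ({placeholders}) ORDER BY post_id")
    return conn.execute(sql, ids).fetchall()


def serve(conn, host="localhost", port=8000):
    """Expose fetch_hits over XML-RPC. Blocks until interrupted."""
    server = SimpleXMLRPCServer((host, port), allow_none=True)
    server.register_function(lambda ids: fetch_hits(conn, ids), "fetch_hits")
    server.serve_forever()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, "
                 "thread_id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                     [(1, 10, "first post"), (2, 10, "a reply"), (3, 11, "new thread")])
    # Pretend the index returned posts 3 and 1 (with a duplicate).
    print(fetch_hits(conn, [3, 1, 1]))  # [(1, 10, 'first post'), (3, 11, 'new thread')]
```

Batching all hits into one `IN (...)` query keeps the round trips to the database down, and passing the ids as parameters rather than pasting them into the SQL string avoids injection problems.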
This software provides something like the power of much larger commercial
indexing and retrieval packages while running on a PC. Three years ago, on
500 MHz PCs (current at the time), it indexed the Project Gutenberg classic
literature collection, then approximately 2.7 GB of text, in about 13
minutes.
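As a back-of-the-envelope check on those figures (taking 1 GB as 1000 MB), the quoted run works out to an indexing rate of roughly 3.5 MB/s. If the 14% index-to-text ratio quoted above also held for this collection, which is an assumption rather than something stated here, the index would have been about 0.38 GB.

```latex
\frac{2.7\ \text{GB}}{13 \times 60\ \text{s}} = \frac{2700\ \text{MB}}{780\ \text{s}} \approx 3.5\ \text{MB/s},
\qquad 0.14 \times 2.7\ \text{GB} \approx 0.38\ \text{GB}
```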
Anybody interested in this should contact me: