WebClust - Clustering Search Engine
Posted on April 26, 2005, 5:29 am
clustering": the automatic organization of documents into meaningful
groups. WebClust queries one or more web search engines, parses their
result pages to extract the documents (titles, URLs, and short
descriptions) and groups the documents based on this information.
This process presents the best results of the web in a "horizontal"
topical arrangement in addition to a single vertical list.
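
To make the grouping step concrete, here is a minimal Python sketch. It is not WebClust's actual algorithm (the real engine is written in C++ and its method is not described here); it simply groups results under the non-query terms that the most documents share. The function names, the stopword list, the `max_clusters` and `min_size` parameters, and the sample results are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not WebClust's).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def terms(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def cluster_results(results, query, max_clusters=5, min_size=2):
    """Group search results under the terms shared by the most documents.

    results: list of dicts with "title", "url" and "snippet" keys.
    Returns a dict mapping a label term to the URLs of the results that
    contain it. A result may appear under several labels.
    """
    query_terms = terms(query)
    # Query terms appear in nearly every result, so they make poor labels.
    doc_terms = [terms(r["title"] + " " + r["snippet"]) - query_terms for r in results]
    # Document frequency: in how many results does each term occur?
    df = Counter(t for ts in doc_terms for t in ts)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [t for t, n in ranked if n >= min_size][:max_clusters]
    return {
        label: [r["url"] for r, ts in zip(results, doc_terms) if label in ts]
        for label in labels
    }


# Hypothetical parsed results, as they might look after scraping a result page.
results = [
    {"title": "Jaguar cars official site", "url": "http://example.com/1",
     "snippet": "Luxury cars and dealers"},
    {"title": "Jaguar animal facts", "url": "http://example.com/2",
     "snippet": "The jaguar is a big cat"},
    {"title": "Used Jaguar cars", "url": "http://example.com/3",
     "snippet": "Prices for used cars"},
    {"title": "Big cat conservation", "url": "http://example.com/4",
     "snippet": "Protecting the jaguar, a big cat of the Americas"},
]

clusters = cluster_results(results, "jaguar")
for label, urls in clusters.items():
    print(label, urls)
```

Each label with its URLs corresponds to one topical group, while the original `results` list is the plain ranked list. A real engine would presumably use better labels (phrases rather than single words) and merge near-duplicate groups such as "big" and "cat".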
WebClust offers a service similar to Vivisimo but is simpler, more
immediate and lighter: the core clustering engine is written in C++, and
clustering 200 documents takes about 200 ms on a 1 GHz Pentium-class
Linux machine. A small Linux box is sufficient to serve dozens of
queries per second.
The mission of WebClust is to use this data mining technique to make
sense of large amounts of textual information extracted from the
internet, intranets, or digital libraries.