Posted by Ron Savage on February 23, 2005, 11:16 am
The pure Perl module Search::InvertedIndex::Simple V 1.00
is available immediately from CPAN,
and from http://savage.net.au/Perl-modules.html.
On-line docs, and a *.ppd for ActivePerl are also
available from the latter site.
An extract from the docs:
The input to new(dataset => $a, keyset => $k) is an arrayref of data (each
element of which is a hashref), and an arrayref of keys.
The arrayref of data is in the format returned by many DBI methods, e.g. DBI's
fetchall_arrayref({}) and DBIx::SQLEngine's fetch_select().
The arrayref of keys is used to select a subset of the keys within each
hashref. These selected keys become the primary keys in the hashref returned by
the method build_index().
In the example in the synopsis, build_index() will return a hashref with the
primary keys 'address' and 'time'.
The values (assumed to be strings) from the arrayref of data corresponding to
those keys are used to create a set of secondary keys under each of these
primary keys.
The secondary keys are created by taking these values, growing them one
character at a time, and using these generated strings as the secondary keys in
the hashref returned by the method build_index().
In the example in the synopsis, build_index() will return a hashref where the
primary key 'address' will have these secondary keys: H, He, Hea, Heav, Heave,
Heaven, Her, Here, T, Th, The, Ther, There.
This means that all data values for the key 'address', and all prefixes of those
values, are used to create entries in the returned hashref.
Similarly, the primary key 'time' will have a set of secondary keys.
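
A minimal sketch of the prefix-growing step described above, in plain Perl. It
is not the module's own code. The helper name prefixes() and the sample values
'Here', 'Heaven' and 'There' are assumptions, chosen only because they are
consistent with the secondary keys listed for 'address'.

```perl
use strict;
use warnings;

# Return every prefix of a string, shortest first.
# prefixes() is a hypothetical helper, not part of the module.
sub prefixes
{
	my($value) = @_;

	return map { substr($value, 0, $_) } 1 .. length($value);
}

# Assumed sample values for the key 'address'.
my(@values) = ('Here', 'Heaven', 'There');

# Collect all prefixes, drop duplicates, and sort them.
my(@all) = map { prefixes($_) } @values;
my(%seen);
my(@unique) = grep { ! $seen{$_}++ } @all;
my(@keys)   = sort @unique;

print join(', ', @keys), "\n";
```

For these three values the sorted list is the same 13 secondary keys quoted
above, from H through There.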
It should be clear by now that these sets of secondary keys can be used for
searching for the existence of values, e.g. by using as input user-supplied data
of any length. At the same time, any number of keys can be searched for
simultaneously.
Consider:
my($indexer) = Search::InvertedIndex::Simple -> new(...);
my($index) = $indexer -> build_index();
Now we can tell instantaneously which elements of the dataset contain the
results of a multi-key search:
my(@index) = $$index{'address'}{'He'} -> intersection($$index{'time'}{'T'});
That is, @index = (1). In other words, $$d[1] contains the only hashref where we
have an address value starting with 'He' and a time value starting with 'T'.
Here, intersection() is a method available to objects of type Set::Array, and it
returns a list.
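
To make the multi-key search concrete, here is a self-contained sketch that
builds a comparable index with plain hashes standing in for Set::Array objects.
It is only an illustration under stated assumptions: the dataset is made up
(it is not the one from the synopsis), and build_index_sketch() and
intersection_sketch() are hypothetical names, not methods of the module.

```perl
use strict;
use warnings;

# Hypothetical dataset; not the one from the module's synopsis.
my($d) =
[
	{address => 'Here',   time => 'Noon'},	# Index: 0.
	{address => 'Heaven', time => 'Then'},	# Index: 1.
	{address => 'There',  time => 'Thus'},	# Index: 2.
];
my($k) = ['address', 'time'];

# Build $index{primary key}{prefix}{dataset index} = 1.
sub build_index_sketch
{
	my($dataset, $keyset) = @_;
	my(%index);

	for my $i (0 .. $#$dataset)
	{
		for my $key (@$keyset)
		{
			my($value) = $$dataset[$i]{$key};

			next if (! defined $value);

			# Every prefix of the value becomes a secondary key.
			$index{$key}{substr($value, 0, $_)}{$i} = 1 for 1 .. length($value);
		}
	}

	return \%index;
}

# Return the sorted dataset indexes present in both hashes.
sub intersection_sketch
{
	my($left, $right) = @_;

	return sort { $a <=> $b } grep { exists $$right{$_} } keys %$left;
}

my($index) = build_index_sketch($d, $k);
my(@index) = intersection_sketch($$index{'address'}{'He'}, $$index{'time'}{'T'});

print "@index\n";
```

With this made-up dataset, elements 0 and 1 have an address starting with 'He',
elements 1 and 2 have a time starting with 'T', so @index = (1).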
--
Cheers
Ron Savage, ron@savage.net.au on 23/02/2005
http://savage.net.au/index.html
Let the record show: Microsoft is not an Australian company