Full Text File Search with Indexing Service on Windows (cont.)

Here's the rest of the tutorial I started earlier:

Aside from text within a document, Indexing Service lets you search on
meta information stored in the files. For example, MusicArtist and
MusicAlbum let you find MP3 and other music files based on the singer
and album name; DocAuthor lets you find Office documents created by a
certain user; DocAppName lets you find files created by a particular
program; and so on.
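
As a rough sketch of a property search, the helper below builds a query
that matches on a metadata property instead of Contents. The function
name, the property-name check, and the quote-doubling escape are
illustrative assumptions, not part of the tutorial's code; the resulting
string would be passed to oledb_query() like any other query here.

```php
<?php
// Hypothetical helper (not from the original tutorial): builds a query that
// searches a metadata property such as DocAuthor or MusicArtist.
function build_property_query(string $dir, string $property, string $value): string
{
    // Accept only plain identifiers as property names.
    if (!preg_match('/^[A-Za-z][A-Za-z0-9_]*$/', $property)) {
        throw new InvalidArgumentException("Invalid property name: $property");
    }
    // Assumption: doubling single quotes escapes them inside SQL literals.
    // Double quotes are dropped because they delimit the phrase below.
    $value = str_replace(["'", '"'], ["''", ''], $value);
    $dir   = str_replace("'", "''", $dir);

    return "SELECT filename, size, path"
         . " FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')"
         . " WHERE CONTAINS($property, '\"$value\"')";
}

// Example: Office documents whose author field contains the given name.
echo build_property_query('C:\htdocs', 'DocAuthor', 'Chung Leong'), "\n";
```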

Indexing Service uses plug-ins known as iFilters to extract information
from files it indexes. A default installation of Windows has iFilters
for many common file formats like HTML, Word, PowerPoint, and Excel.
You can extend Indexing Service's capability by installing additional
iFilters. Many are listed at http://www.ifilter.org/, with support
available for PDF, Photoshop, ZIP, Visio, Open Office, and others.

In the previous example, we used CONTAINS(Contents, '$keyword') to
search for a particular keyword. Only files containing that exact word
would be returned. If $keyword is 'date', then Indexing Service would
find files containing the word 'date' but not those containing 'dates'.
To relax the criteria somewhat, we can use the FORMSOF (INFLECTIONAL,
<word>) construct. Example:

   $dir = 'C:\htdocs';
   $keyword = 'FORMSOF (INFLECTIONAL, date)';
   $sql = "SELECT filename, size, path
              FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
              WHERE CONTAINS(Contents, '$keyword')";
   $res = oledb_query($sql, $link);

Now Indexing Service will look for all the inflected forms of the word:
date, dates, dating, dated, etc. If the word specified is "good," then
it'd look for good, better, best, and well.

To search on a partial word, we use the * sign:

   $keyword = ' "kn*" ';

The double-quotation marks indicate a wild-card search. The above
pattern means any word starting with "kn" is considered a match.

Indexing Service also supports the <field> LIKE '%pattern%' and
<field> = 'value' SQL expressions. They are best avoided, however, as
they can be incredibly slow: matching against the value of a field
often means reading from the files.

To sort the results, we add an ORDER BY clause:

   $dir = 'C:\htdocs';
   $keyword = 'FORMSOF (INFLECTIONAL, good)';
   $sql = "SELECT filename, size, path
              FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
              WHERE CONTAINS(Contents, '$keyword')
              ORDER BY size DESC";
   $res = oledb_query($sql, $link);

The above example lists the files found from the biggest to the
smallest. "ORDER BY write DESC" would list the more recently modified
files first, while "ORDER BY create DESC" would list the more recently
created ones first. You can, of course, also use these file attributes
as search criteria.
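
To illustrate combining a sort order with a file attribute used as a
criterion, here is a minimal sketch. The function name, the list of
allowed sort columns, and the minimum-size parameter are assumptions
made for the example, not part of Indexing Service itself.

```php
<?php
// Hypothetical helper: adds a minimum-size filter and a sort column that is
// checked against a small whitelist before being placed in the query.
function build_sorted_query(string $dir, string $keyword, int $minSize, string $sortBy = 'size'): string
{
    $allowed = ['size', 'write', 'create', 'filename'];
    if (!in_array($sortBy, $allowed, true)) {
        throw new InvalidArgumentException("Cannot sort by: $sortBy");
    }
    // Assumption: doubling single quotes escapes them inside SQL literals.
    $keyword = str_replace("'", "''", $keyword);

    return "SELECT filename, size, path, write"
         . " FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')"
         . " WHERE CONTAINS(Contents, '$keyword')"
         . " AND size > $minSize"
         . " ORDER BY $sortBy DESC";
}

// Files over 10 KB that mention an inflected form of "good", newest first.
echo build_sorted_query('C:\htdocs', 'FORMSOF (INFLECTIONAL, good)', 10240, 'write'), "\n";
```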

Thus far we have been searching the computer's default catalog. If
searching will be done only in a particular folder, it's worthwhile to
create a separate catalog. You can do this in the Computer Management
console. To search a different catalog through OLE DB, specify the
catalog name in the connection string as the data source:

   $link = oledb_open("Provider=MSIDXS; Data Source=web_cat");
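
If the catalog name comes from configuration, a small helper along these
lines could build the connection string. The function name and the
character check are assumptions for illustration; only the
"Provider=MSIDXS; Data Source=..." format comes from the example above.

```php
<?php
// Hypothetical helper: builds the MSIDXS connection string for a named catalog.
function msidxs_connection_string(string $catalog): string
{
    // Reject characters (such as ';') that would alter the connection string.
    if (!preg_match('/^[A-Za-z0-9_ ]+$/', $catalog)) {
        throw new InvalidArgumentException("Invalid catalog name: $catalog");
    }
    return "Provider=MSIDXS; Data Source=$catalog";
}

echo msidxs_connection_string('web_cat'), "\n";
```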

Finally, what if you want to search files residing on a network server?
While it's possible to index a network drive, it's not terribly
efficient. Instead, you'd want to enable Indexing Service on that
computer and perform the search there.

To search a remote catalog, we prepend the SCOPE() statement with the
computer name and the catalog name:

   // UNC path; the backslashes are doubled because of PHP string escaping
   $dir = '\\\\fileserver\\projects';
   $keyword = 'FORMSOF (INFLECTIONAL, bad)';
   $sql = "SELECT filename, size, path
              FROM fileserver.System..SCOPE('DEEP TRAVERSAL OF \"$dir\"')
              WHERE CONTAINS(Contents, '$keyword')";
   $res = oledb_query($sql, $link);

Note that the double period is not a typo. Windows Authentication is
used to determine which files are visible. For the code above to work,
the web server has to run as a user on the network.
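
As a sketch of how the machine.catalog..SCOPE() prefix fits together,
the helper below assembles a remote query from its parts. The function
name and the name validation are assumptions for illustration; the
machine, catalog, and share names are the ones used in this post.

```php
<?php
// Hypothetical helper: prefixes SCOPE() with "<machine>.<catalog>.." so the
// query runs against a catalog on another computer.
function build_remote_query(string $machine, string $catalog, string $dir, string $keyword): string
{
    foreach ([$machine, $catalog] as $name) {
        if (!preg_match('/^[A-Za-z0-9_-]+$/', $name)) {
            throw new InvalidArgumentException("Invalid name: $name");
        }
    }
    // Assumption: doubling single quotes escapes them inside SQL literals.
    $keyword = str_replace("'", "''", $keyword);

    return "SELECT filename, size, path"
         . " FROM {$machine}.{$catalog}..SCOPE('DEEP TRAVERSAL OF \"$dir\"')"
         . " WHERE CONTAINS(Contents, '$keyword')";
}

// '\\\\fileserver\\projects' is the PHP literal for \\fileserver\projects.
echo build_remote_query('fileserver', 'System', '\\\\fileserver\\projects',
                        'FORMSOF (INFLECTIONAL, bad)'), "\n";
```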

Re: Full Text File Search with Indexing Service on Windows (cont.)

I just came across this and it is spectacular.  It works great and
makes using the indexing service to handle the heavy lifting of
searching a breeze.  Thank you.

Is there anywhere to find more advanced examples like boolean searches,
use of wildcard characters, or searching across multiple file


Chung Leong wrote:

Re: Full Text File Search with Indexing Service on Windows (cont.)

sutton128@yahoo.com wrote:

I'm not really an expert in Indexing Service. Here's something I just
came across:


The query string described in the document goes into the CONTAINS()
statement. I realize now that what I said about the double-quoted
strings was wrong. They're used for searching multiple words in a
sequence (i.e. a sentence). You can use the prefix* syntax without the
double quotes.

To specify multiple criteria, you just join them together in the
WHERE clause as you would when querying a database.


SELECT path, filename, size, write
FROM SCOPE()
WHERE CONTAINS(contents, 'love AND NOT sex')
AND size > 10240
AND write > '2006-01-01'
ORDER BY size

The statement above looks for files containing 'love' but not 'sex',
that are larger than 10K and modified some time this year, and lists
them from the smallest to biggest.
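
When the criteria are collected at run time, joining them could look
like the sketch below. The function name is made up for this example;
it only glues already-formed conditions together with AND and does no
escaping of its own.

```php
<?php
// Hypothetical helper: joins independent criteria into one WHERE clause.
// Empty or whitespace-only entries are skipped.
function build_where_clause(array $criteria): string
{
    $criteria = array_filter(array_map('trim', $criteria), 'strlen');
    if (!$criteria) {
        return '';
    }
    return 'WHERE ' . implode(' AND ', $criteria);
}

echo build_where_clause([
    "CONTAINS(contents, 'love AND NOT sex')",
    "size > 10240",
]), "\n";
```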

To do a wildcard match against the filename, you use the LIKE
'%pattern%' syntax.


SELECT path, filename, size, write
FROM SCOPE()
WHERE filename LIKE '%.mp3'

This statement looks for files with the .mp3 extension.

Re: Full Text File Search with Indexing Service on Windows (cont.)

Thanks for the reply and leads.  After I posted I was thinking about
the queries and realized about the WHERE ... AND ... thing.  Also
looking at how MS implements it in their search dialog helped me
understand what was going on.

Thanks again.

