Where to start?

microsoft.public.msn.search
Started by Tthe guy who br on 08-29-2006
Posted by Jeff R. on August 31, 2006, 8:05 pm
Okie dokie,

Here's the skinny: it's kind of a 10,000-foot overview, but I think it will
answer most of your questions.

The protocol handler is called by the indexer, and it is what allows the
indexer to crawl your database. You can set crawl scope rules in the indexer
to specify what you do and don't want indexed in your database.
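Conceptually, crawl scope rules are include/exclude patterns over item URLs. Below is a minimal Python model of that idea, assuming simple prefix rules where the most specific (longest) matching rule wins and URLs matched by no rule are skipped. The real rules live in the indexer's crawl scope settings, not in code like this, and `ScopeRule` and `in_scope` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScopeRule:
    prefix: str    # URL prefix the rule applies to (hypothetical prefix-style rule)
    include: bool  # True = index matching URLs, False = skip them


def in_scope(url: str, rules: list[ScopeRule]) -> bool:
    """Return True if the longest-prefix rule matching url is an include rule.

    Assumption for this sketch: a URL that no rule matches is not indexed.
    """
    matching = [r for r in rules if url.startswith(r.prefix)]
    if not matching:
        return False
    most_specific = max(matching, key=lambda r: len(r.prefix))
    return most_specific.include


# Index everything in the database except a hypothetical "temp" area
rules = [
    ScopeRule("myhandler://path/mydbfile.db/", include=True),
    ScopeRule("myhandler://path/mydbfile.db/temp/", include=False),
]

print(in_scope("myhandler://path/mydbfile.db/report.doc", rules))         # True
print(in_scope("myhandler://path/mydbfile.db/temp/scratch.txt", rules))   # False
print(in_scope("file:///C:/other.txt", rules))                            # False
```

The exclude rule for `temp/` wins over the broader include rule only because its prefix is longer; that precedence is the assumption this model makes.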

Now, the Protocol Handler (PH) will in essence get you from place to place in
your database and enumerate what's there. However, to break open and parse
the items and get their contents into the indexer, you need IFilters. An
IFilter pulls chunks of data out of an item; for each chunk you get the type
of data and its value, and these are passed to the indexer in pairs so it can
put them where they need to go. When you are done with an item, the PH looks
at the next item, decides what it is and applies the proper IFilter if one is
needed. For example, say you had a Word DOC, an Excel file and a PDF. The two
Office documents would probably use an IFilter that comes with Office, no
worries; for the PDF, however, you would need to download an IFilter from the
Add-ins site or write your own.
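To make the division of labor a bit more concrete, here is a toy Python model of the flow described above: a stand-in "protocol handler" enumerates items, a filter is picked by file extension, and the filter yields (type, value) pairs for the index. Everything here (`enumerate_items`, `FILTERS`, `text_filter`, `crawl`) is hypothetical; real protocol handlers and IFilters are COM components (an IFilter describes each chunk via `GetChunk` and hands over the data via `GetText`/`GetValue`), and this model folds the filter lookup into a single loop for brevity.

```python
import os
from typing import Callable, Iterator

# A "chunk" is modeled as a (property name, value) pair.
Chunk = tuple[str, object]


def text_filter(name: str, data: bytes) -> Iterator[Chunk]:
    """Hypothetical filter for plain-text items: emits property/value pairs."""
    yield ("filename", name)
    yield ("contents", data.decode("utf-8"))


# Hypothetical registry: file extension -> filter. No entry for ".pdf",
# mimicking a machine with no PDF IFilter installed.
FILTERS: dict[str, Callable[[str, bytes], Iterator[Chunk]]] = {
    ".txt": text_filter,
}


def enumerate_items(store: dict[str, bytes]) -> Iterator[tuple[str, bytes]]:
    """Stand-in for the protocol handler: walk the store and hand out items."""
    for name in sorted(store):
        yield name, store[name]


def crawl(store: dict[str, bytes]) -> dict[str, list[Chunk]]:
    """Stand-in for the indexer: choose a filter per item and collect its pairs."""
    index: dict[str, list[Chunk]] = {}
    for name, data in enumerate_items(store):
        ext = os.path.splitext(name)[1].lower()
        item_filter = FILTERS.get(ext)
        if item_filter is None:
            continue  # no filter for this type, so its contents are not indexed
        index[name] = list(item_filter(name, data))
    return index


store = {"notes.txt": b"hello world", "manual.pdf": b"%PDF-1.4 ..."}
print(crawl(store))  # only notes.txt appears; manual.pdf has no registered filter
```

Adding a PDF filter to `FILTERS` is the model's equivalent of installing one from the Add-ins site: the crawl loop itself does not change.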

Also, this technology is almost exactly the same as the one used by SharePoint
Server. If you look on MSDN you will find a wealth of information on it, as
well as a prewritten IFilter (with source) to pick apart and hack to your
heart's content! I believe there is also a doc and code for a PH, but don't
quote me on that.

Try this for the IFilters:
How To:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odc_SP2003_ta/html/ODC_HowToWriteaFilter.asp?frame=true

Premade:
http://addins.msn.com/addins_category_desktop.aspx

More Stuff:
http://channel9.msdn.com/wiki/default.aspx/Channel9.DesktopSearchIFilters

For PHs, you may find this interesting :)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp

Sorry it took so long to get back to you, but I decided to sleep a little
this week. :)

Good Luck!
JR

>
> Guess it kind of helps.
>
> So I need to write a Protocol handler. What protocol will the handler be
> handling?
>
> As I say my database is in a '.db' file. So would I be handling something
> like:
> myhandler://path/mydbfile.db
>
> If you'll excuse the questionable pseudo code, I want to do something like
> this:
> ---------
> filenames = SELECT filename FROM filestable
> foreach (nextfile in filenames)
> {
>     filemetadata = SELECT key, value FROM metadatatable WHERE filename = nextfile
>     foreach (keyvaluepair in filemetadata)
>     {
>         IndexThisPieceOfMetaData(nextfile, keyvaluepair)
>     }
> }
> ---------
>
> What does what & when in the process?
> - Is this the right process flow?
> 1) WDS is running and decides to index stuff
> 2) WDS finds my .db file in an indexable location
> 3) Someway, somehow WDS finds out my protocol handler is registered to
>    this type, so calls myhandler://path/mydbfile.db
> 4) My handler would receive the path/filename, open the database and start
>    doing "stuff" with my current API to get the metadata.
> 5) Metadata is sent to the index
>
>
> Questions:
> 1) Would I be right in assuming this happens when the file is changed?
>    Since this is a db file, this may change very often. Should I just
>    'live with it', 'make sure it is thread safe', or 'have some flag in my
>    db to say what changed'?
> 3) & 5) How/Why/What/When/Where?
>
> In the bit of pseudo code, what does
> IndexThisPieceOfMetaData(nextfile, keyvaluepair)
> actually do? How is this metadata sent to the index? How do I tell the
> index that I'm talking about a different file? Is it simply a case of one
> of the key/value pairs sent to the index being 'filename'/'<absolute path
> of the file>'?
>
>
> Is an IFilter used? If so, when?
>
>
>
> I've got a gazillion other questions, but I need to get clear how the
> whole thing hangs together, what takes responsibility for doing what, and
> when.
>
>
>
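For reference, here is the pseudo code from the question rewritten as runnable Python. It assumes the '.db' file is SQLite (the question doesn't say what engine sits behind the current API) and adds a hypothetical `modified` column to illustrate the "flag in my db to say what changed" option. It does not talk to the Windows Search indexer at all; it only shows the data a handler would need to hand out. Giving each file its own URL under the hypothetical `myhandler://` scheme from the question is one way to let the index tell one file from another, rather than relying on a 'filename' key/value pair.

```python
import sqlite3

# Assumption: SQLite storage; table/column names follow the pseudo code,
# plus a hypothetical "modified" timestamp column for change tracking.
BASE_URL = "myhandler://path/mydbfile.db/"  # hypothetical scheme from the question


def collect_items(conn: sqlite3.Connection,
                  since: float = 0.0) -> dict[str, list[tuple[str, str]]]:
    """Return {item URL: [(key, value), ...]} for files modified after `since`."""
    items: dict[str, list[tuple[str, str]]] = {}
    files = conn.execute(
        "SELECT filename FROM filestable WHERE modified > ? ORDER BY filename",
        (since,),
    ).fetchall()
    for (filename,) in files:
        rows = conn.execute(
            "SELECT key, value FROM metadatatable WHERE filename = ? ORDER BY key",
            (filename,),
        ).fetchall()
        # Each file becomes its own item with its own URL
        items[BASE_URL + filename] = [(key, value) for key, value in rows]
    return items


# Usage with a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE filestable (filename TEXT PRIMARY KEY, modified REAL);
    CREATE TABLE metadatatable (filename TEXT, key TEXT, value TEXT);
    INSERT INTO filestable VALUES ('a.txt', 100.0), ('b.txt', 200.0);
    INSERT INTO metadatatable VALUES
        ('a.txt', 'author', 'Ann'), ('a.txt', 'title', 'Alpha'),
        ('b.txt', 'author', 'Bob');
""")
print(collect_items(conn))             # both files, each with its own pairs
print(collect_items(conn, since=150))  # only b.txt was modified after t=150
```

Passing the time of the last crawl as `since` keeps a frequently changing database from being re-read in full each time, at the cost of maintaining the `modified` column on every write.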



Posted by Tthe guy who br on September 1, 2006, 1:53 pm
Cheers - hope you got some good sleep in :)

That's a good 10,000 ft overview - just what I needed.

Gonna go away and think about it.......



