- Full text search in PDF and Word files ?
- Ned Baldessin
September 19, 2005, 2:11 pm
I need to perform full text searches on a batch of PDF and Word files.
What is the best way to go?
After some research, I'm thinking of extracting the plain text from the
files with "pdftotext" and "catdoc", harmonizing the various possible
encodings to UTF-8, storing the text in a MySQL database, and then
using the full text search capabilities of MySQL.
Do you think that would work well? I am told that the files are mostly
text and won't be longer than 30 pages.
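For reference, the plan above could be sketched roughly like this in Python. This is only a minimal sketch under assumptions, not a tested setup: the table name "documents", its columns, and the helper function names are made up for illustration, the fallback encodings are a guess at what legacy Word files might contain, and the "%s" placeholder assumes a DB-API style MySQL driver. It relies on "pdftotext" accepting "-enc UTF-8" with "-" as the output file (stdout) and on "catdoc" accepting "-d utf-8" for the output charset.

```python
import subprocess
from pathlib import Path

# Hypothetical schema. MEDIUMTEXT leaves room for ~30 pages of plain text.
# Older MySQL versions only support FULLTEXT on MyISAM tables;
# InnoDB supports it from MySQL 5.6 onward.
CREATE_TABLE_SQL = """
CREATE TABLE documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    path VARCHAR(255) NOT NULL,
    body MEDIUMTEXT,
    FULLTEXT (body)
) DEFAULT CHARSET=utf8mb4
"""

# "%s" is the parameter placeholder used by common MySQL drivers (assumption).
INSERT_SQL = "INSERT INTO documents (path, body) VALUES (%s, %s)"
SEARCH_SQL = "SELECT path FROM documents WHERE MATCH(body) AGAINST (%s)"


def extraction_command(path):
    """Return the argv list that prints the file's plain text to stdout."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # "-" as the output file makes pdftotext write to stdout.
        return ["pdftotext", "-enc", "UTF-8", str(path), "-"]
    if suffix == ".doc":
        # -d sets the destination (output) charset.
        return ["catdoc", "-d", "utf-8", str(path)]
    raise ValueError(f"unsupported file type: {path}")


def to_utf8_text(raw, fallbacks=("cp1252",)):
    """Decode bytes as UTF-8, then try legacy encodings, then Latin-1."""
    for encoding in ("utf-8", *fallbacks):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this never fails.
    return raw.decode("latin-1")


def extract_text(path):
    """Run the matching extractor and return its output as a str."""
    result = subprocess.run(
        extraction_command(path), check=True, capture_output=True
    )
    return to_utf8_text(result.stdout)
```

With a database connection in hand, each file would be stored with something like cursor.execute(INSERT_SQL, (path, extract_text(path))) and queried with cursor.execute(SEARCH_SQL, ("some words",)).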
My email address doesn't ride a horse.
Re: Full text search in PDF and Word files ?
I do this with Oracle Text. The documents are not stored in the
database; Oracle is only used to index them (I store a filepath and
filename). Of course I do other things with Oracle too, but this has
been a superb solution for me, and faster than you could ever believe.
Essentially you get to search unlimited documents in their native
format without actually having to do any real work for it.