Posted by backpack on June 3, 2008, 5:29 pm
I actually just ended up using Python with the win32 extensions. The
only downside is that you have to work on a Windows machine
with Microsoft Word installed. At first I did try antiword. It
partially worked: it converted about a third of the documents (30,000
of them), and I'm not sure exactly why it didn't convert the rest.
antiword claimed the remaining documents weren't Word docs, but they
obviously were. Maybe an unsupported version of Word? For the hell of
it I also tried changing the file extensions to .rtf and running
UnRTF, but that didn't work either. The point of the original post was
that I was looking for a Perl module I can run in a Linux environment,
which makes Win32::OLE useless. Should've specified...
On May 29, 9:54 pm, dkco...@panix.com (David Combs) wrote:
>
>
>
> >On Fri, 9 May 2008, backpack wrote:
> >> Are there any perl modules that will allow you to convert
> >> MS-Word docs to plain text?
>
> >AFAIK there is no integrated Perl solution for this, but there
> >are several Perl bridges to external software that can do it, such
> >as:
>
> >- Win32::OLE, if you happen to do this on an MS Windows system.
>
> >- OpenOffice::UNO supposedly lets you control OpenOffice to do
> >almost anything, including opening a Word document and saving it
> >as a text file or extracting the text directly. I have used
> >OpenOffice UNO from Java before, but I'm not sure how much of UNO
> >is implemented in the Perl module.
>
> >- SWISH::Filters uses the external command catdoc to extract the
> >text from MS Word documents.
>
> >--
> >                              (stephan paul) Arif Sahari Wibowo
> >    _____  _____  _____  _____
> >   /____  /____/ /____/ /____
> >  _____/ /      /    / _____/      http://www.arifsaha.com/
>
> Has anyone compared these two to "antiword" as well?
>
> Thanks
>
> David