
Convert MS-Word to plain text

comp.lang.perl.modules
Posted by backpack on May 9, 2008, 1:29 pm
Are there any perl modules that will allow you to convert MS-Word docs
to plain text?

Posted by Ben Bullock on May 9, 2008, 7:09 pm
On Fri, 09 May 2008 10:29:33 -0700, backpack wrote:

> Are there any perl modules that will allow you to convert MS-Word docs
> to plain text?

You can use Win32::OLE to do this.

Posted by S P Arif Sahari Wibowo on May 17, 2008, 9:18 am
On Fri, 9 May 2008, backpack wrote:
> Are there any perl modules that will allow you to convert
> MS-Word docs to plain text?

AFAIK there is no integrated Perl solution for this, but there
are several Perl bridges to external software that can do it, such
as:

- Win32::OLE, if you happen to be doing this on an MS Windows system.

- OpenOffice::UNO supposedly lets you control OpenOffice to do
almost anything, including opening a Word document and saving it as a
text file or extracting the text directly. I have used OpenOffice UNO
from Java before, but I am not sure how much of UNO is implemented in
the Perl module.

- SWISH::Filters uses the external command catdoc to extract the text
out of MS Word documents.

--
(stephan paul) Arif Sahari Wibowo
_____ _____ _____ _____
/____ /____/ /____/ /____
_____/ / / / _____/ http://www.arifsaha.com/
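
A minimal Python sketch of the "bridge to external software" idea from the list above. It is an illustration under stated assumptions, not anything posted in the thread: the helper names find_converter and doc_to_text are hypothetical, and it assumes catdoc or antiword is installed and prints the extracted text to standard output when given a single file argument.

```python
import shutil
import subprocess

# External converters tried in order; both are programs named in this thread.
CANDIDATE_TOOLS = ("catdoc", "antiword")


def find_converter(candidates=CANDIDATE_TOOLS, which=shutil.which):
    """Return the name of the first candidate found on PATH, or None."""
    for tool in candidates:
        if which(tool):
            return tool
    return None


def doc_to_text(path, tool=None):
    """Run an external converter on `path` and return its stdout as text.

    Hypothetical helper: assumes the tool writes plain text to stdout
    when called with one file argument.
    """
    tool = tool or find_converter()
    if tool is None:
        raise RuntimeError("neither catdoc nor antiword was found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    result = subprocess.run([tool, path], capture_output=True, check=True)
    # The output encoding depends on the tool and locale; decode leniently.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling doc_to_text("report.doc") (a made-up file name) would then return the extracted text, or raise if the converter rejects the file.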

Posted by David Combs on May 29, 2008, 9:54 pm
>On Fri, 9 May 2008, backpack wrote:
>> Are there any perl modules that will allow you to convert
>> MS-Word docs to plain text?
>
>AFAIK there is no integrated Perl solution for this, but there
>are several Perl bridges to external software that can do it, such
>as:
>
>- Win32::OLE, if you happen to be doing this on an MS Windows system.
>
>- OpenOffice::UNO supposedly lets you control OpenOffice to do
>almost anything, including opening a Word document and saving it as a
>text file or extracting the text directly. I have used OpenOffice UNO
>from Java before, but I am not sure how much of UNO is implemented in
>the Perl module.
>
>- SWISH::Filters uses the external command catdoc to extract the text
>out of MS Word documents.

Also, has anyone compared these two to "antiword"?

Thanks

David



Posted by backpack on June 3, 2008, 5:29 pm
I actually just ended up using Python with the win32 extensions. The
only downside is that you have to work on a Windows machine
with Microsoft Word installed. At first I did try antiword, and it
partially worked. I'm not sure exactly why it didn't convert all the
documents, but it converted about a third (30,000 of them). The rest of
the documents antiword claimed weren't Word docs, but they obviously
were. Maybe an unsupported version of Word? And for the hell of it I
tried changing the extensions of the files to .rtf and running
UnRTF on them, but that didn't work either. The point of the original
post was that I was looking for a Perl module I can run in a Linux
environment, which makes Win32::OLE useless. Should've specified...


On May 29, 9:54 pm, dkco...@panix.com (David Combs) wrote:
> Also, has anyone compared these two to "antiword"?
>
> Thanks
>
> David
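
One way to look into backpack's report above that antiword rejected files which "obviously" were Word docs is to check the first bytes of each file instead of trusting its extension. The Python sketch below is only a diagnostic aid under stated assumptions; the names sniff_format and sniff_file are hypothetical. It relies on well-known file signatures: binary .doc files from Word 97 and later are OLE2 compound files, RTF files start with "{\rtf", and zip-based formats such as .docx start with "PK". An "ole2" result does not prove a file is a Word document, since other Office formats use the same container, and an "unknown" result does not explain why a converter refused the file.

```python
# Leading bytes ("magic numbers") of formats a ".doc" file may really contain.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound file (binary .doc)
RTF_MAGIC = b"{\\rtf"                            # Rich Text Format
ZIP_MAGIC = b"PK\x03\x04"                        # zip container (e.g. .docx)


def sniff_format(header):
    """Classify a file from its first few bytes."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))
```

Running sniff_file over the whole batch and counting the results would show whether the rejected two thirds share a format that the chosen converter simply does not handle.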

