
Module to get text from a PDF page?

Posted by H. Wade Minter on January 6, 2005, 8:52 pm
I'm looking for a Perl module that will give me the text from a page of a simple
(uncompressed, unencrypted) PDF. I've found several modules on CPAN that will
write text into PDFs, but nothing to get it out.

The closest possibilities look like PDF::API2 and Text::PDF. I've been working
with them, and they seem to be able to get at a lot of meta-information in a
PDF, but unable to get at the actual text in the file.

My workaround is to shell out to pdftotext to get the text, but I'd like to have
a pure-Perl solution if possible. Does anyone know of a module that can do this?
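
For reference, the workaround looks roughly like the sketch below. It assumes
pdftotext (from xpdf or poppler) is on the PATH; the sub names
pdftotext_command and page_text are just illustrative, and the file name and
page number come from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the pdftotext argument list for a single page.
# -f and -l set the first and last page to convert; the trailing "-"
# tells pdftotext to write the text to standard output.
sub pdftotext_command {
    my ($file, $page) = @_;
    die "page must be a positive integer\n"
        unless defined $page && $page =~ /^[1-9]\d*$/;
    return ('pdftotext', '-f', $page, '-l', $page, $file, '-');
}

# Run pdftotext and return the text of one page as a single string.
sub page_text {
    my ($file, $page) = @_;
    my @cmd = pdftotext_command($file, $page);

    # The list form of a pipe open bypasses the shell, so spaces or
    # quotes in the file name need no escaping.
    open(my $fh, '-|', @cmd) or die "cannot run pdftotext: $!\n";
    my $text = do { local $/; <$fh> };   # slurp all output
    close($fh) or die "pdftotext failed (exit status $?)\n";

    return defined $text ? $text : '';
}

# Only run when executed as a script: perl page_text.pl FILE PAGE
unless (caller) {
    my ($file, $page) = @ARGV;
    print page_text($file, $page) if defined $file;
}
```

This still depends on an external binary, which is exactly what I'd like to
avoid.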

Thanks,
Wade

