Posted by David Squire on August 28, 2006, 9:45 am
johny wrote:
> Hi,
> I am trying to read a PDF file using ActivePerl. I tried
> PDF::API2, but without success. For example, I need to get the text
> that is on the third line of the first page...
>
> or
>
> Is there any way I can save the PDF file as a .txt file and then
> read that file?
> Please help........
Do you need to use Perl? The command-line utility pdftotext is
available on most UNIX-like systems (and no doubt under Cygwin).
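As a rough sketch of how that could look for the "third line of the first page" case: the snippet below assumes pdftotext is on your PATH, and the function names (nth_line, pdf_line) and the file name document.pdf are just placeholders of mine, not anything standard. The -f and -l options limit conversion to a page range, -layout tries to preserve the physical layout, and "-" as the output file sends the text to stdout.

```shell
#!/bin/sh
# Print line N of standard input (N defaults to 1).
nth_line() {
    sed -n "${1:-1}p"
}

# Print line LINE of page PAGE of a PDF file.
# Usage: pdf_line FILE PAGE LINE
# Assumes pdftotext is installed; -f/-l select a single page,
# and "-" writes the extracted text to standard output.
pdf_line() {
    pdftotext -f "$2" -l "$2" -layout "$1" - | nth_line "$3"
}

# Example (document.pdf is a placeholder file name):
# pdf_line document.pdf 1 3
```

If you would rather save the text and read it afterwards, running "pdftotext document.pdf document.txt" writes the whole document to a .txt file. Note that what counts as the "third line" depends on how pdftotext reconstructs the layout, so check the output for your particular files.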
You need to be aware that there is no guarantee that you can get text
out of a PDF document. The PDF standard allows arbitrary encodings to be
used, so you would have to know what the glyph names mean to reconstruct
the text. In some cases the glyph names are not meaningful. See
http://www.glyphandcog.com/textext.html
That being said, pdftotext works in the great majority of cases.
DS