Reading PDF Headings and Page Numbers using PHP

I have a directory of PDF files that contain headings, subheadings,
and page numbers. I want to write a script that opens each PDF, reads
the headings and any subheadings, and writes them out to a file, so
that I can create some meta files (.pdf.desc). Most libraries I've
seen provide methods to write headings but not to read them.
How can I do this?



Re: Reading PDF Headings and Page Numbers using PHP

Good luck...

I tried to do something similar last year (I wanted to pull out just the
main body of the text, without headings, images, page numbers, etc.).
Even though I searched for a long time, I was unable to find any
libraries that would do this sort of thing. In the end, I downloaded
the PDF spec and rolled my own code. The spec is quite large, but it's
fairly well written, so you may be able to pick out just the bits you
need to implement. It took me about a week to read through the document
and write my code, but if you're an experienced developer (I'm not!)
then no doubt you'll be able to do it quicker than that.
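
For anyone landing on this thread later, here is one possible shortcut. It is a rough sketch, not what the poster above did. It assumes the PDFs carry a document outline (bookmarks), since that is where a PDF records headings together with their target pages; headings that exist only as large text on the page will not be found this way. It also assumes the pdftk command-line tool is installed, whose dump_data operation prints BookmarkTitle, BookmarkLevel and BookmarkPageNumber lines. The function names parseBookmarks, formatDesc and writeDescFile, and the tab-separated .desc layout, are made up for illustration.

```php
<?php
// Sketch only: assumes pdftk is on the PATH and the PDF has bookmarks.

/**
 * Parse the text printed by `pdftk <file> dump_data` into a list of
 * ['title' => string, 'level' => int, 'page' => int] records.
 */
function parseBookmarks(string $dump): array
{
    $bookmarks = [];
    $current = null;

    foreach (preg_split('/\R/', $dump) as $line) {
        if (!preg_match('/^Bookmark(Title|Level|PageNumber):\s*(.*)$/', $line, $m)) {
            continue; // Info*, NumberOfPages, BookmarkBegin and so on
        }
        $field = $m[1];
        $value = $m[2];

        if ($field === 'Title') {
            // A title line starts a new bookmark record.
            if ($current !== null) {
                $bookmarks[] = $current;
            }
            // dump_data writes special characters as XML entities.
            $title = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
            $current = ['title' => $title, 'level' => 1, 'page' => 0];
        } elseif ($current !== null) {
            $key = ($field === 'Level') ? 'level' : 'page';
            $current[$key] = (int) $value;
        }
    }
    if ($current !== null) {
        $bookmarks[] = $current;
    }
    return $bookmarks;
}

/**
 * Render bookmarks as one "title<TAB>page" line each, indented two
 * spaces per sub-heading level. The layout is an arbitrary choice.
 */
function formatDesc(array $bookmarks): string
{
    $out = '';
    foreach ($bookmarks as $b) {
        $indent = str_repeat('  ', max(0, $b['level'] - 1));
        $out .= $indent . $b['title'] . "\t" . $b['page'] . "\n";
    }
    return $out;
}

/**
 * Run pdftk on one PDF and write <name>.pdf.desc next to it.
 * Returns false if pdftk produced no output or the write failed.
 */
function writeDescFile(string $pdfPath): bool
{
    $dump = shell_exec('pdftk ' . escapeshellarg($pdfPath) . ' dump_data');
    if (!is_string($dump) || $dump === '') {
        return false;
    }
    $desc = formatDesc(parseBookmarks($dump));
    return file_put_contents($pdfPath . '.desc', $desc) !== false;
}
```

To cover a whole directory you could loop over glob('/path/to/pdfs/*.pdf') (a placeholder path) and call writeDescFile() on each entry. If a file comes back with an empty .desc, it most likely has no outline, and you are back to parsing the page content yourself as described above.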

