Posted by Hilary Cotter on October 14, 2005, 9:02 pm
The print job may generate something called an image overlay, which is basically
an image. Many PDFs contain plain text formatting information, which the
iFilter understands. For the image overlay, the PDF contains binary
information, which the PDF iFilter does not understand.
For instance, if I crack open one PDF, this is what I see:
obj<</Type/Font/Encoding/WinAnsiEncoding/BaseFont/HelveticaNeue-Italic/FirstChar
32/LastChar 146/Subtype/Type1/FontDescriptor 9 0 R/Widths[278 259 426 556
556 926 630 278 259 259 352 600 278 389 278 333 556 556 556 556 556 556 556
556 556 556 278 278 600 600 600 556 800 667 685 722 704 611 574 759 722 259
519 667 556 870 722 759 648 759 685 648 574 722 611 926 611 611 611 259 333
259 600 500 222 519 593 537 593 537 296 574 556 222 222 481 222 852 556 574
593 593 333 481 315 556 481 759 481 481 444 333 222 333 600 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 278]>>
endobj
8 0 obj<</Type/Font/Encoding/WinAnsiEncoding/BaseFont/HelveticaNeue-BoldItalic/FirstChar
32/LastChar 121/Subtype/Type1/FontDescriptor 10 0 R/Widths[278 296 481 556
556 963 685 278 296 296 407 600 278 407 278 389 556 556 556 556 556 556 556
556 556 556 278 278 600 600 600 574 800 685 722 741 741 667 593 759 741 296
556 722 574 907 741 778 667 778 722 648 611 741 630 944 667 648 648 333 389
333 600 500 259 574 611 556 611 574 352 611 611 259 259 556 259 907 611 593
611 611 389 519 370 611 519 815 519 519]>>
endobj
9 0 obj<</Type/FontDescriptor/FontBBox[-166 -214 1106
957]/FontName/HelveticaNeue-Italic/Flags 96/StemV 85/StemH 75/CapHeight
714/XHeight 517/Ascent 714/Descent -198/ItalicAngle -12>>
endobj
10 0 obj<</Type/FontDescriptor/FontBBox[-166 -218 1129
975]/FontName/HelveticaNeue-BoldItalic/Flags 262240/StemV 142/StemH
107/CapHeight 714/XHeight 517/Ascent 714/Descent -182/ItalicAngle -12>>
endobj
11 0 obj<</Type/Font/Name/Helv/Encoding 139 0
R/BaseFont/Helvetica/Subtype/Type1>>
endobj
12 0 obj<</Encoding 13 0 R/Font 14 0 R>>
endobj
13 0 obj<</PDFDocEncoding 139 0 R>>
endobj
14 0 obj<</Helv 11 0 R/ZaDb 39 0 R/HeBo 138 0 R>>
endobj
15 0
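Dictionaries like `/Type/Font` above are a hint that a page carries real text. A rough way to check a file for this is to count font dictionaries against image XObjects in the raw bytes. This is a minimal Python sketch under stated assumptions: `text_vs_image_hint` is a hypothetical helper, not how the iFilter decides, and it only sees uncompressed objects (objects packed into PDF 1.5 object streams are missed).

```python
import re

# Hypothetical diagnostic helper; a rough heuristic, not the iFilter's logic.
# Matches "/Type/Font" (also "/Type /Font") but not "/Type/FontDescriptor".
FONT_RE = re.compile(rb"/Type\s*/Font\b")
# Matches image XObject dictionaries such as "/Subtype/Image".
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def text_vs_image_hint(pdf_bytes: bytes) -> dict:
    """Count font and image dictionaries visible in the raw PDF bytes.

    Only uncompressed object dictionaries are seen: objects packed into
    PDF 1.5 object streams (/Type/ObjStm) are invisible to this scan.
    """
    fonts = len(FONT_RE.findall(pdf_bytes))
    images = len(IMAGE_RE.findall(pdf_bytes))
    if fonts > 0:
        verdict = "has fonts (likely has a text layer)"
    elif images > 0:
        verdict = "image-only (likely no extractable text)"
    else:
        verdict = "unknown"
    return {"fonts": fonts, "images": images, "verdict": verdict}
```

To try it on a file, read it in binary mode and pass the bytes, e.g. `text_vs_image_hint(open("printed.pdf", "rb").read())` (the file name is only an example).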
Cracking open another, I see this:
%PDF-1.5
%
1 0 obj<</Contents 3 0 R/Type/Page/Parent 16 0 R/Rotate 0/MediaBox[0 0 792
1224]/CropBox[0 0 612 792]/Resources 2 0 R>>
endobj
2 0 obj<</Font<</F1 220 0 R/F4 224 0 R/F5 239 0 R/F6 7 0
R>>/ProcSet[/PDF/Text]/ExtGState<</GS1 246 0 R>>>>
endobj
3 0 obj<</Length 4678/Filter/FlateDecode>>stream
[FlateDecode-compressed binary stream data; not readable as text]
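The stream above is marked `/Filter/FlateDecode`, i.e. zlib-compressed. Inflating it shows which drawing operators the page uses: text is drawn inside `BT` ... `ET` blocks with `Tj`/`TJ`, while images are usually painted from an XObject with the `Do` operator. A minimal Python sketch, assuming a single FlateDecode filter and a stream dictionary without nested `<<...>>`; the helper names are hypothetical and this is a diagnostic, not what the PDF iFilter does internally.

```python
import re
import zlib

# Simplifying assumptions for this sketch: the stream dictionary has no nested
# "<<...>>", FlateDecode is the only filter, and data is delimited by the
# "stream"/"endstream" keywords rather than by /Length.
STREAM_RE = re.compile(
    rb"<<([^<>]*/FlateDecode[^<>]*)>>\s*stream\r?\n(.*?)\r?\nendstream",
    re.S,
)
# A text object: "BT" followed (eventually) by a Tj or TJ show-text operator.
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.S)


def inflate_streams(pdf_bytes: bytes):
    """Yield the decompressed contents of each FlateDecode stream."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj returns what it can from truncated or trailing data.
            yield zlib.decompressobj().decompress(match.group(2))
        except zlib.error:
            continue  # not plain zlib data (e.g. a filter chain); skip it


def has_text_operators(pdf_bytes: bytes) -> bool:
    """True if any inflated stream contains a BT ... Tj/TJ text block."""
    return any(TEXT_OP_RE.search(s) for s in inflate_streams(pdf_bytes))
```

For example, `has_text_operators(open("printed.pdf", "rb").read())` (file name hypothetical) returning `False` would be consistent with the page being drawn as an image overlay. Random bytes in inflated image data could in principle contain `BT` and `Tj`, so treat a `True` as a hint rather than proof.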
--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
>I have so far successfully been able to index pdf-files that I have SCANNED
>and saved as pdf-documents.
>
> However, when I PRINT directly to a pdf-file (pdf995), for instance a
> print of a webpage, these files are not being indexed. Is there any
> explanation for this, and how do I solve this problem?
>
> Thanks.
>
> Mogens