
Different types of pdf-files??

microsoft.public.msn.search
Posted by mogens on October 9, 2005, 11:20 am


So far I have successfully been able to index PDF files that I have
SCANNED and saved as PDF documents.

However, when I PRINT directly to a pdf-file (pdf995), for instance a
print of a webpage, these files are not being indexed. Is there any
explanation for this, and how do I solve this problem?

Thanks.

Mogens


Posted by Hilary Cotter on October 14, 2005, 9:02 pm


Printing may generate something called an image overlay, which is basically
an image. Many PDFs contain plain-text formatting information, which the
iFilter understands. For an image overlay, the PDF contains binary
information, which the PDF iFilter does not understand.

For instance, if I crack open one PDF, this is what I see:

obj<</Type/Font/Encoding/WinAnsiEncoding/BaseFont/HelveticaNeue-Italic/FirstChar
32/LastChar 146/Subtype/Type1/FontDescriptor 9 0 R/Widths[278 259 426 556
556 926 630 278 259 259 352 600 278 389 278 333 556 556 556 556 556 556 556
556 556 556 278 278 600 600 600 556 800 667 685 722 704 611 574 759 722 259
519 667 556 870 722 759 648 759 685 648 574 722 611 926 611 611 611 259 333
259 600 500 222 519 593 537 593 537 296 574 556 222 222 481 222 852 556 574
593 593 333 481 315 556 481 759 481 481 444 333 222 333 600 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 278]>>
endobj
8 0 obj<</Type/Font/Encoding/WinAnsiEncoding/BaseFont/HelveticaNeue-BoldItalic/FirstChar
32/LastChar 121/Subtype/Type1/FontDescriptor 10 0 R/Widths[278 296 481 556
556 963 685 278 296 296 407 600 278 407 278 389 556 556 556 556 556 556 556
556 556 556 278 278 600 600 600 574 800 685 722 741 741 667 593 759 741 296
556 722 574 907 741 778 667 778 722 648 611 741 630 944 667 648 648 333 389
333 600 500 259 574 611 556 611 574 352 611 611 259 259 556 259 907 611 593
611 611 389 519 370 611 519 815 519 519]>>
endobj
9 0 obj<</Type/FontDescriptor/FontBBox[-166 -214 1106
957]/FontName/HelveticaNeue-Italic/Flags 96/StemV 85/StemH 75/CapHeight
714/XHeight 517/Ascent 714/Descent -198/ItalicAngle -12>>
endobj
10 0 obj<</Type/FontDescriptor/FontBBox[-166 -218 1129
975]/FontName/HelveticaNeue-BoldItalic/Flags 262240/StemV 142/StemH
107/CapHeight 714/XHeight 517/Ascent 714/Descent -182/ItalicAngle -12>>
endobj
11 0 obj<</Type/Font/Name/Helv/Encoding 139 0
R/BaseFont/Helvetica/Subtype/Type1>>
endobj
12 0 obj<</Encoding 13 0 R/Font 14 0 R>>
endobj
13 0 obj<</PDFDocEncoding 139 0 R>>
endobj
14 0 obj<</Helv 11 0 R/ZaDb 39 0 R/HeBo 138 0 R>>
endobj
15 0

Cracking open another, I see this:

%PDF-1.5
%
1 0 obj<</Contents 3 0 R/Type/Page/Parent 16 0 R/Rotate 0/MediaBox[0 0 792
1224]/CropBox[0 0 612 792]/Resources 2 0 R>>
endobj
2 0 obj<</Font<</F1 220 0 R/F4 224 0 R/F5 239 0 R/F6 7 0
R>>/ProcSet[/PDF/Text]/ExtGState<</GS1 246 0 R>>>>
endobj
3 0 obj<</Length 4678/Filter/FlateDecode>>stream
[binary FlateDecode-compressed stream data, truncated]

--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>I have so far successfully been able to index pdf-files that I have SCANNED
>and saved as pdf-documents.
>
> However, when I PRINT directly to a pdf-file (pdf995), for instance a
> print of a webpage, these files are not being indexed. Is there any
> explanation for this, and how do I solve this problem?
>
> Thanks.
>
> Mogens



