PDFlib TET PDF IFilter 4.0 Manual

5.3 PDF Documents are not or not completely indexed

Maximum file size. The registry value MaxDownloadSize determines the maximum
file size of documents which will be crawled and indexed (unit: MB, default: 16).

Maximum growth factor. The registry value MaxGrowFactor contains a factor by
which the MaxDownloadSize value is multiplied to determine the maximum amount of
text that will be indexed for a document. This factor is necessary because the text can be
compressed inside the file, as is usually the case for PDF documents (unit: none, default: 4).
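
The relation between the two values can be illustrated with a small Python sketch. The function name `max_indexed_text_mb` is made up for this illustration and is not part of any Microsoft or PDFlib API; the code only multiplies the two registry values as described above.

```python
def max_indexed_text_mb(max_download_size_mb=16, max_grow_factor=4):
    """Upper bound (in MB) on the extracted text indexed per document.

    The defaults are the default registry values quoted in the text.
    """
    return max_download_size_mb * max_grow_factor


print(max_indexed_text_mb())       # default settings: 16 * 4 = 64
print(max_indexed_text_mb(32, 4))  # hypothetical raised file size limit: 128
```

Raising either value raises the text limit proportionally, so a larger MaxDownloadSize may already be sufficient without touching MaxGrowFactor.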

With the default settings a maximum of 64 MB (16 MB x 4) of extracted text per document will
be indexed. Depending on the exact product and version, the MaxDownloadSize and
MaxGrowFactor registry values can be found under one of the following keys in the Windows
registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\\Search\Applications\\Gathering Manager
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\\Search\Applications\\Gathering Manager
HKEY_LOCAL_MACHINE\Software\Microsoft\SPSSearch\Gathering Manager
The GUID will vary from installation to installation. A Microsoft article on this topic can
be found here:
office.microsoft.com/en-us/sharepoint-server-help/specifying-the-file-size-that-sharepointportal-server-2003-can-crawl-HA001164841.aspx

SharePoint Server and Search Server are subject to limitations which affect indexing of
large documents. Since these limitations are not well explained in the Microsoft documentation,
the notes in this section collect information based on Microsoft support articles and
blogs. These notes are not authoritative; if in doubt, please contact Microsoft for guidance.

Chunk buffer size. Another limitation affects the total number of unique words per
document which can be indexed. The value CB_ChunkBufferSizeInMegaBytes determines
the space which is reserved for the collection of unique words per document (unit: MB,
default: 8).

Bytes reserved for document. The value CB_MinBytesReservedForDoc depends on the
CB_ChunkBufferSizeInMegaBytes value. It should be 2 MB less than the value of
CB_ChunkBufferSizeInMegaBytes, although the default values do not follow this relation
(unit: bytes, default: 3,145,728, i.e. 3 MB).
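
The "2 MB less" rule can be written down as a short Python sketch. The helper name `recommended_min_bytes_reserved` is hypothetical, and the conversion assumes 1 MB = 1024 x 1024 bytes, which is consistent with the default of 3,145,728 bytes being exactly 3 MB.

```python
MB = 1024 * 1024  # assumption: binary megabytes, matching 3,145,728 bytes = 3 MB

DEFAULT_CHUNK_BUFFER_MB = 8              # default CB_ChunkBufferSizeInMegaBytes
DEFAULT_MIN_BYTES_RESERVED = 3_145_728   # default CB_MinBytesReservedForDoc


def recommended_min_bytes_reserved(chunk_buffer_mb):
    """Value in bytes that is 2 MB less than the chunk buffer size."""
    return (chunk_buffer_mb - 2) * MB


recommended = recommended_min_bytes_reserved(DEFAULT_CHUNK_BUFFER_MB)
print(recommended)                                # 6291456 (6 MB)
print(recommended == DEFAULT_MIN_BYTES_RESERVED)  # False: the defaults deviate from the rule
```

If the chunk buffer is enlarged, for example to a hypothetical 16 MB, the same rule would give 14 x 1,048,576 = 14,680,064 bytes.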

Depending on the exact product and version, the CB_ChunkBufferSizeInMegaBytes and
CB_MinBytesReservedForDoc registry values can be found under the following keys in the
Windows registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\\Search\Global\Gathering Manager
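
To check which values are actually set on a server, the registry can be inspected with Regedit or with a short script. The following is a minimal sketch using Python's standard `winreg` module (Windows only). The function name `read_gathering_values` is made up for this illustration, and the subkey path must be supplied by the caller, since it depends on the product, version, and installation.

```python
FILE_SIZE_VALUE_NAMES = ("MaxDownloadSize", "MaxGrowFactor")
CB_VALUE_NAMES = ("CB_ChunkBufferSizeInMegaBytes", "CB_MinBytesReservedForDoc")


def read_gathering_values(subkey, names=FILE_SIZE_VALUE_NAMES):
    """Read the named values below HKEY_LOCAL_MACHINE\\<subkey>.

    Returns a dict mapping each name to its data, or to None if the
    value does not exist under the key.
    """
    import winreg  # standard library, available on Windows only

    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        for name in names:
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                data = None  # value is not present under this key
            result[name] = data
    return result


# Example call (Windows only), using the one key listed above that has no
# installation-specific components:
# print(read_gathering_values(r"Software\Microsoft\SPSSearch\Gathering Manager"))
```

The script only reads the values. Whether a missing value means that the documented default applies is not confirmed here and should be verified against Microsoft's documentation.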