17.05.2014 Views

PDFlib TET PDF IFilter 4.0 Manual


5.3 PDF Documents are not or not completely indexed

SharePoint Server and Search Server are subject to limitations which affect indexing of large documents. Since these limitations are not well explained in the Microsoft documentation, the following notes collect information based on Microsoft support articles and blogs. These notes are not authoritative; if in doubt, please contact Microsoft for guidance.

Maximum file size. The registry value MaxDownloadSize determines the maximum file size of documents which will be crawled and indexed (unit: MB, default: 16).

Maximum growth factor. The registry value MaxGrowFactor contains a factor by which the MaxDownloadSize value is multiplied to determine the maximum amount of text for a document that will be indexed. This factor is necessary because the text can be compressed inside the file, as is usually the case for PDF documents (unit: none, default: 4).

With the default settings a maximum of 64 MB of extracted text per document will be indexed. Depending on the exact product and version, the MaxDownloadSize and MaxGrowFactor registry values can be found under the following keys in the Windows registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\
\Search\Applications\\Gathering Manager

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\
\Search\Applications\\Gathering Manager

HKEY_LOCAL_MACHINE\Software\Microsoft\SPSSearch\Gathering Manager

The GUID will vary from installation to installation. A Microsoft article on this topic can be found here:

office.microsoft.com/en-us/sharepoint-server-help/specifying-the-file-size-that-sharepointportal-server-2003-can-crawl-HA001164841.aspx
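The relationship between MaxDownloadSize and MaxGrowFactor can be illustrated with a small calculation. The following Python sketch is not part of the product; the function names are hypothetical, and it assumes that a document is affected whenever its size or its amount of extracted text is strictly greater than the respective limit. It only reports which limit a document would exceed and makes no claim about how the crawler reacts.

```python
# Illustrative sketch only: function names are hypothetical, and the
# strict "greater than" comparison is an assumption, not documented behavior.

def max_indexed_text_mb(max_download_size_mb=16, max_grow_factor=4):
    """Maximum amount of extracted text per document, in MB.

    This is MaxDownloadSize multiplied by MaxGrowFactor.
    """
    return max_download_size_mb * max_grow_factor


def exceeded_limits(file_size_mb, extracted_text_mb,
                    max_download_size_mb=16, max_grow_factor=4):
    """Return the names of the registry values whose limit is exceeded."""
    exceeded = []
    if file_size_mb > max_download_size_mb:
        exceeded.append("MaxDownloadSize")
    text_limit_mb = max_indexed_text_mb(max_download_size_mb, max_grow_factor)
    if extracted_text_mb > text_limit_mb:
        exceeded.append("MaxGrowFactor")
    return exceeded


if __name__ == "__main__":
    # Default settings: 16 MB * 4 = 64 MB of extracted text per document.
    print(max_indexed_text_mb())
    # A 10 MB PDF which expands to 100 MB of text exceeds the text limit.
    print(exceeded_limits(10, 100))
```

With the default values the first call yields 64, which matches the 64 MB limit for extracted text per document.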

Chunk buffer size. Another limitation affects the total number of unique words per document which can be indexed. The value CB_ChunkBufferSizeInMegaBytes determines the space which is reserved for the collection of unique words per document (unit: MB, default: 8).

Bytes reserved for document. The value CB_MinBytesReservedForDoc depends on the CB_ChunkBufferSizeInMegaBytes value. It should be 2 MB less than the value of CB_ChunkBufferSizeInMegaBytes, although this relation does not hold for the default values (unit: bytes, default: 3,145,728).
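Since the two values use different units (MB and bytes), the "2 MB less" rule is easy to get wrong. The following Python sketch computes the value which the rule suggests for a given chunk buffer size. The function name is hypothetical, and the sketch assumes 1 MB = 1,048,576 bytes, which is consistent with the default of 3,145,728 bytes being exactly 3 MB.

```python
# Illustrative sketch only: the function name is hypothetical.
# Assumes 1 MB = 1,048,576 bytes (3,145,728 bytes is then exactly 3 MB).

BYTES_PER_MB = 1024 * 1024


def suggested_min_bytes_reserved(chunk_buffer_size_mb):
    """Value for CB_MinBytesReservedForDoc (in bytes) according to the rule
    "2 MB less than CB_ChunkBufferSizeInMegaBytes"."""
    if chunk_buffer_size_mb <= 2:
        raise ValueError("chunk buffer size must be larger than 2 MB")
    return (chunk_buffer_size_mb - 2) * BYTES_PER_MB


if __name__ == "__main__":
    # For the default chunk buffer size of 8 MB the rule gives 6 MB,
    # which differs from the actual default of 3,145,728 bytes (3 MB).
    print(suggested_min_bytes_reserved(8))
    # For an enlarged chunk buffer of 64 MB the rule gives 62 MB.
    print(suggested_min_bytes_reserved(64))
```

For the default chunk buffer size of 8 MB the rule yields 6,291,456 bytes, which illustrates that the shipped default of 3,145,728 bytes does not follow the rule.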

Depending on the exact product and version, the CB_ChunkBufferSizeInMegaBytes and CB_MinBytesReservedForDoc registry values can be found under the following keys in the Windows registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\
\Search\Global\Gathering Manager
