PDFlib TET PDF IFilter 4.0 Manual
Note: TET PDF IFilter does not apply any language-specific processing beyond language detection. It
is up to the IFilter client to use the LCID information. While some IFilter clients (e.g. SharePoint,
SQL Server) include sophisticated LCID treatment, other IFilter clients may completely ignore
the LCID information.
Table 2.1 Common LCID values and the corresponding primary and secondary language
LCID    Primary language                    Secondary language (country)
0x0000  Neutral locale language             Neutral sublanguage
0x0401  Arabic (ar)                         Saudi Arabia (SA)
0x0404  Chinese (zh)                        Traditional (Hant)
0x0407  German (de)                         Germany (DE)
0x0409  English (en)                        United States (US)
0x040c  French (fr)                         France (FR)
0x0410  Italian (it)                        Italy (IT)
0x0411  Japanese (ja)                       Japan (JP)
0x0413  Dutch (nl)                          Netherlands (NL)
0x0419  Russian (ru)                        Russia (RU)
0x0804  Chinese (zh)                        Simplified (Hans)
0x0c0a  Spanish (es)                        Spain (ES)
0x0800  System default locale language
0x1000  Unspecified custom locale language  Unspecified custom sublanguage
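The values in Table 2.1 follow the standard Windows LCID layout: within the low 16 bits, the lower 10 bits hold the primary language identifier and the upper 6 bits hold the sublanguage identifier. The following Python sketch illustrates this split. It is not part of TET PDF IFilter, and the function name split_lcid is chosen for illustration only.

```python
def split_lcid(lcid):
    """Split a Windows LCID into (primary language ID, sublanguage ID)."""
    langid = lcid & 0xFFFF    # low 16 bits: language identifier
    primary = langid & 0x3FF  # bits 0-9: primary language
    sub = langid >> 10        # bits 10-15: sublanguage
    return primary, sub


# Both Chinese entries in Table 2.1 share primary language 0x04
# and differ only in the sublanguage.
for lcid in (0x0404, 0x0804, 0x0409, 0x0c0a):
    primary, sub = split_lcid(lcid)
    print(f"0x{lcid:04x}: primary=0x{primary:02x} sub=0x{sub:02x}")
```

This also explains the special rows of the table: 0x0800 and 0x1000 both have a primary language of 0 (neutral) and differ only in the sublanguage bits.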
XML configuration for LCIDs. LCIDs that override or supplement automatic LCID
detection can be specified in the LocaleId element of the XML configuration file.
The detection attribute accepts the values auto, disabled, and script; the default is auto.
If detection=disabled, all attributes other than default are ignored. The script setting activates
script analysis, but disables statistical analysis.
The default attribute can be used to specify a global LCID setting which will be used<br />
for all text if detection=disabled. If this attribute is missing, the system locale will be<br />
used.<br />
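The interaction of the detection and default attributes can be sketched in Python as follows. This is an illustration of the rules stated above, not the IFilter implementation. The XML fragments are hypothetical standalone elements built only from the attribute names mentioned in the text, the function name forced_lcid is invented, and using 0x0800 (the system default entry in Table 2.1) as a stand-in for the system locale is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: 0x0800 stands in for "use the system locale".
SYSTEM_DEFAULT_LCID = 0x0800


def forced_lcid(fragment):
    """Return the global LCID applied to all text, or None if LCIDs
    are assigned by automatic detection."""
    elem = ET.fromstring(fragment)
    detection = elem.get("detection", "auto")  # auto is the default
    if detection != "disabled":
        return None  # auto or script: detection assigns the LCIDs
    default = elem.get("default")
    if default is None:
        return SYSTEM_DEFAULT_LCID  # attribute missing: system locale
    # Hexadecimal values start with 0x; anything else is decimal.
    base = 16 if default.lower().startswith("0x") else 10
    return int(default, base)


# Hypothetical fragment: detection switched off, German (Germany) for all text.
print(hex(forced_lcid('<LocaleId detection="disabled" default="0x0407"/>')))
```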
For all attributes except detection, a numeric value in decimal or hexadecimal syntax
can be specified. Hexadecimal values must start with 0x. Table 2.2 lists the supported
script attributes and their default values. LCIDs for text in all other scripts will be assigned
automatically.
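As a minimal illustration of the numeric syntax rule, the following Python sketch reads an attribute value as hexadecimal only if it starts with 0x, and as decimal otherwise. The helper name parse_lcid is hypothetical and the code is not taken from the product.

```python
def parse_lcid(value):
    """Parse an LCID attribute value given in decimal or 0x-prefixed hex."""
    text = value.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)  # int() accepts the 0x prefix with base 16
    return int(text, 10)


# The same LCID (English, United States) in both notations.
assert parse_lcid("0x0409") == parse_lcid("1033")
```

Under this rule a value written without the prefix, such as 0409, would be read as the decimal number 409 rather than as English (United States), so the 0x prefix matters when copying values from Table 2.1.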