17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual



4.4 TET Connector for Oracle

The TET connector for Oracle attaches TET to an Oracle database so that PDF documents can be indexed and queried with Oracle Text. The PDF documents can be referenced via their path name in the database, or stored directly in the database as BLOBs.

Note Protected documents can be indexed with the shrug option under certain conditions (see Chapter 5.1, »Indexing protected PDF Documents«, page 49, for details). The Connector files are prepared for this, but you must enable the option manually.

Requirements and installation. The TET connector has been tested with Oracle 10g and Oracle 11g. In order to use the TET connector you must specify the AL32UTF8 database character set when creating the database. This is always the case for the Universal edition of Oracle Express (but not for the Western European edition). AL32UTF8 is the database character set recommended by Oracle, and also works best with TET for indexing PDF documents. However, it is also possible to connect TET to Oracle Text with other character sets according to one of the following methods:

> Starting with Oracle Text 11.1.0.7 the database can perform the required character set conversion. Please refer to the section »Using USER_FILTER with Charset and Format Columns« in the Oracle Text 11.1.0.7 documentation, available at download.oracle.com/docs/cd/B28359_01/text.111/b28304/cdatadic.htm#sthref497.

> With Oracle Text 11.1.0.6 or earlier the UTF-8 text generated by the TET filter script must be converted to the database character set. This can be achieved by adding a character set conversion command to the filter script:

  Unix: call iconv (open-source software) or uconv (part of the free ICU Unicode library) in tetfilter.sh.

  Windows: call a suitable code page converter in tetfilter.bat.
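The second method can be sketched for Unix as follows. This is only a minimal sketch: it assumes that iconv is installed and that the database character set is ISO-8859-1 (chosen purely for illustration; substitute your own). The function name convert_to_db_charset and the variable DB_CHARSET are hypothetical and are not part of the shipped tetfilter.sh. Where exactly the call belongs depends on how the filter script writes its output.

```shell
#!/bin/sh
# Assumption: the target database character set. ISO-8859-1 is an example only.
DB_CHARSET="ISO-8859-1"

# Hypothetical helper: read UTF-8 text on stdin and write it to stdout
# in the database character set.
convert_to_db_charset() {
    iconv -f UTF-8 -t "$DB_CHARSET"
}

# Possible use: pipe the UTF-8 text produced by the filter through the helper.
#   some_command_producing_utf8 | convert_to_db_charset > "$output_file"
```

Characters that do not exist in the target character set make iconv fail by default, so a production script would need to decide how to handle them.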

In order to take advantage of the TET Connector for Oracle you must make the TET filter script available to Oracle as follows:

> Copy the TET filter script to a directory where Oracle can find it:

  Unix: copy connectors/Oracle/tetfilter.sh to $ORACLE_HOME/ctx/bin

  Windows: copy connectors/Oracle/tetfilter.bat to %ORACLE_HOME%\bin

> Make sure that the TETDIR variable in the TET filter script (tetfilter.sh or tetfilter.bat, respectively) points to the TET installation directory.

> If required you can supply more TET options for the global, document, or page level in the TETOPT, DOCOPT, and PAGEOPT variables (see Chapter 10, »TET Library API Reference«, page 121, for option list details). This is especially useful for supplying the TET license key, e.g.:

TETOPT="license=aaaaaaa-bbbbbb-cccccc-dddddd-eeeeee"

See Section 0.2, »Applying the TET License Key«, page 8, for more options for supplying the TET license key.
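On Unix, the copy step above might be scripted roughly as shown below. This is a sketch under stated assumptions: install_tet_filter is a hypothetical helper that does not ship with TET, the connectors/Oracle directory is assumed to sit directly below the TET installation directory, and the chmod call reflects the assumption that Oracle needs an executable copy of the script.

```shell
#!/bin/sh
# Hypothetical helper (not part of TET): copy the filter script into the
# directory where Oracle looks for it.
#   $1 = TET installation directory (assumed to contain connectors/Oracle)
#   $2 = Oracle home directory, i.e. the value of $ORACLE_HOME
install_tet_filter() {
    tet_dir=$1
    oracle_home=$2
    cp "$tet_dir/connectors/Oracle/tetfilter.sh" "$oracle_home/ctx/bin/" || return 1
    # Assumption: the installed copy must be executable.
    chmod +x "$oracle_home/ctx/bin/tetfilter.sh"
}

# Possible use (the TET path is illustrative only):
#   install_tet_filter /opt/tet "$ORACLE_HOME"
```

After copying, the TETDIR line and any TETOPT, DOCOPT, or PAGEOPT settings still have to be edited in the installed script by hand.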

Granting privileges to the Oracle user. The examples below assume an Oracle user with appropriate privileges to create and query an index. The following commands grant appropriate privileges to the user HR (these commands must be issued as the system user and must be adjusted as appropriate):

SQL> GRANT CTXAPP TO HR;<br />

SQL> GRANT EXECUTE ON CTX_CLS TO HR;<br />

SQL> GRANT EXECUTE ON CTX_DDL TO HR;<br />
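If the same privileges are needed for a user other than HR, the three statements above could be generated from a small shell function. This is a sketch only; grant_statements is a hypothetical helper name, and the way the output is fed to the database (for example via sqlplus, as suggested in the comment) is an assumption about your environment.

```shell
#!/bin/sh
# Hypothetical helper: print the three GRANT statements shown above
# for the Oracle user name given as $1.
grant_statements() {
    user=$1
    for priv in "CTXAPP" "EXECUTE ON CTX_CLS" "EXECUTE ON CTX_DDL"; do
        printf 'GRANT %s TO %s;\n' "$priv" "$user"
    done
}

# Possible use (assumes sqlplus is on the PATH; it prompts for the password):
#   grant_statements HR | sqlplus -S system
```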

