- Page 1 and 2: PDFlib GmbH München, Germany www.p
- Page 3: Contents 0 First Steps with TET 7 0
- Page 8 and 9: 0.2 Applying the TET License Key Us
- Page 11 and 12: 1 Introduction The PDFlib Text Extr
- Page 13 and 14: may extract some text which is not
- Page 15 and 16: 2 TET Command-Line Tool 2.1 Command
- Page 17 and 18: Constructing TET command lines. The
- Page 19 and 20: Extract images and generate TETML i
- Page 21 and 22: 3 TET Library Language Bindings Thi
- Page 23 and 24: TET_CATCH(tet) { printf("Error %d i
- Page 25 and 26: 3.4 COM Binding Installing the TET
- Page 27 and 28: 3.6 .NET Binding The .NET edition o
- Page 29 and 30: 3.8 PHP Binding Installing the TET
- Page 31 and 32: 3.9 Python Binding Installing the T
- Page 33 and 34: Exception Handling in RPG. TET clie
- Page 35 and 36: 4 TET Connectors TET connectors pro
- Page 37 and 38: 4.2 TET Connector for the Lucene Se
- Page 39 and 40: Indexing metadata fields. The TET c
- Page 41 and 42: 4.4 TET Connector for Oracle The TE
- Page 43 and 44: these options on the command line.
- Page 45 and 46: TET PDF IFilter is freely available
- Page 47 and 48: tracts text and metadata from the P
- Page 49 and 50: 5 Configuration 5.1 Indexing protec
- Page 51 and 52: 5.2 Resource Configuration and File
- Page 53 and 54: tet/3.0/resource /tet/3.0/resource/
- Page 55 and 56: Geometry. The geometry features may
- Page 57 and 58: 6 Text Extraction 6.1 Document Doma
- Page 59 and 60: How to search with Acrobat 8/9: not
- Page 61 and 62: 6.2 Unicode Concepts Unicode encodi
- Page 63 and 64: artificial characters which will be
- Page 65 and 66: width (x, y) beta fontsize baseline
- Page 67 and 68: 6.4 Support for Chinese, Japanese,
- Page 69 and 70: Options for text filtering. There a
- Page 71 and 72: 6.6 Content Analysis PDF documents
- Page 73 and 74: Dehyphenation. Hyphenated words at
- Page 75 and 76: Table 6.3 Document styles docstyle=
- Page 77 and 78: Fig. 6.5 Sample font reports create
- Page 79 and 80: glyphmapping {{fontname=Warnock* to
- Page 81 and 82: 7 Image Extraction 7.1 Image Extrac
- Page 83 and 84: 7.2 Image Geometry Using TET_get_im
- Page 85 and 86: 7.3 Image Analysis Image merging. S
- Page 87 and 88: 7.4 Restrictions and Caveats Image
- Page 89 and 90: 8 TET Markup Language (TETML) 8.1 C
- Page 91 and 92: Depending on the sele
- Page 93 and 94: You can specify the amount of text
- Page 95 and 96: Table 8.3 TETML elements TETML elem
- Page 97 and 98: 8.4 Transforming TETML with XSLT Ve
- Page 99 and 100: Run the program as follows: nxslt3.
- Page 101 and 102: 8.5 XSLT Samples The TET distributi
- Page 103 and 104: Alphabetical list of words in the d
- Page 105 and 106: 9 The pCOS Interface The pCOS (PDFl
- Page 107 and 108: 9.2 Handling Basic PDF Data Types p
- Page 109 and 110: 9.3 Composite Data Structures and I
- Page 111 and 112: Path prefixes. Prefixes can be used
- Page 113 and 114: Table 9.3 Universal pseudo objects
- Page 115 and 116: Table 9.4 Pseudo objects for PDF ob
- Page 117 and 118: Table 9.5 Pseudo objects for resour
- Page 119 and 120: 9.6 Encrypted PDF Documents pCOS su
- Page 121 and 122: 10 TET Library API Reference 10.1 O
- Page 123 and 124: 10.2 General Functions Perl PHP res
- Page 125 and 126: Returns Scope Bindings The converte
- Page 127 and 128: 10.3 Exception Handling C++ string
- Page 129 and 130: 10.4 Document Functions C++ int ope
- Page 131 and 132: Table 10.3 Document options for TET
- Page 133 and 134: C++ C int open_document_callback(vo
- Page 135 and 136: Table 10.5 Page options for TET_ope
- Page 137 and 138: Table 10.6 Suboptions for the conte
- Page 139 and 140: Table 10.7 Suboptions for the layou
- Page 141 and 142: Table 10.9 Suboptions for the struc
- Page 143 and 144: C++ const TET_char_info *get_char_i
- Page 145 and 146: 10.7 Image Retrieval Functions C++
- Page 147 and 148: C++ int write_image_file(int doc, i
- Page 149 and 150: 10.8 TET Markup Language (TETML) Fu
- Page 151 and 152: 10.9 Option Handling C++ void set_o
- Page 153 and 154: 10.10 pCOS Functions The full pCOS
- Page 155 and 156: If the object has type stream all f
- Page 157 and 158: A TET Library Quick Reference The f
- Page 159: B Revision History Revision history
- Page 162 and 163: unsupported types 87 XMP metadata 8