PDFlib Text Extraction Toolkit (TET) Manual
Text Extraction Toolkit (TET)<br />
Version 4.0<br />
Copyright © 1997-2010 PDFlib GmbH. All rights reserved.<br />
Protected by European and U.S. patents.<br />
PDFlib GmbH<br />
Franziska-Bilek-Weg 9, 80339 München, Germany<br />
www.pdflib.com<br />
+49 • 89 • 452 33 84-0<br />
FAX +49 • 89 • 452 33 84-99<br />
PDFlib mailing list and archive: tech.groups.yahoo.com/group/pdflib<br />
Sales: sales@pdflib.com<br />
Support: support@pdflib.com<br />
Adobe, Acrobat, PostScript, and XMP are trademarks of Adobe Systems Inc. AIX, IBM, OS/390, WebSphere, iSeries, and zSeries are trademarks of International Business Machines Corporation. ActiveX, Microsoft, OpenType, and Windows are trademarks of Microsoft Corporation. Apple, Macintosh and TrueType are trademarks of Apple Computer, Inc. Unicode and the Unicode logo are trademarks of Unicode, Inc. Unix is a trademark of The Open Group. Java and Solaris are trademarks of Sun Microsystems, Inc. HKS is a registered trademark of the HKS brand association: Hostmann-Steinberg, K+E Printing Inks, Schmincke.<br />
<br />
TET contains modified parts of the following third-party software:<br />
Zlib compression library, Copyright © 1995-2002 Jean-loup Gailly and Mark Adler<br />
TIFFlib image library, Copyright © 1988-1997 Sam Leffler, Copyright © 1991-1997 Silicon Graphics, Inc.<br />
Cryptographic software written by Eric Young, Copyright © 1995-1998 Eric Young (eay@cryptsoft.com)<br />
Independent JPEG Group's JPEG software, Copyright © 1991-1998, Thomas G. Lane<br />
Cryptographic software developed by the OpenSSL Project, Copyright © 1998-2002 The OpenSSL Project (www.openssl.org)<br />
Expat XML parser, Copyright © 1998, 1999, 2000 Thai Open Source Software Center Ltd<br />
ICU International Components for Unicode, Copyright © 1995-2009 International Business Machines Corporation and others<br />
TET contains the RSA Security, Inc. MD5 message digest algorithm.
0 TET<br />
0.1<br />
0.2 TET<br />
1<br />
1.1 TET<br />
1.2 TET<br />
1.3<br />
1.4 TET 4.0<br />
2 TET<br />
2.1<br />
2.2 TET<br />
2.3<br />
3 TET<br />
3.1<br />
3.2 C<br />
3.3 C++<br />
3.4 COM<br />
3.5 Java<br />
3.6 .NET<br />
3.7 Perl<br />
3.8 PHP<br />
3.9 Python<br />
3.10 REALbasic<br />
3.11 RPG<br />
4 TET<br />
4.1 Adobe Acrobat TET Plugin<br />
4.2 Lucene TET<br />
4.3 Solr TET<br />
4.4 Oracle TET<br />
4.5 Microsoft TET PDF IFilter<br />
4.6 MediaWiki TET<br />
5<br />
5.1 PDF<br />
5.2<br />
5.3<br />
6<br />
6.1 PDF<br />
6.2<br />
6.3<br />
6.4<br />
6.5<br />
6.6<br />
7 Unicode<br />
7.1 Unicode<br />
7.2 Unicode<br />
7.3 Unicode<br />
7.4<br />
7.5 Unicode<br />
8<br />
8.1<br />
8.2<br />
8.3<br />
8.4<br />
8.5<br />
8.6<br />
9 TET TETML<br />
9.1 TETML<br />
9.2 TETML<br />
9.3 TETML TETML<br />
9.4 TETML XSLT<br />
9.5 XSLT<br />
10 pCOS<br />
11 TET API<br />
11.1<br />
11.2<br />
11.3<br />
11.4<br />
11.5<br />
11.6<br />
11.7<br />
11.8<br />
11.9<br />
11.10 TET TETML<br />
11.11 pCOS<br />
A TET<br />
B
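0 TET<br />
0.1 <br />
The CMap files required for extracting CJK text are located in the resource/cmap directory of the TET distribution. They can be made available as follows:<br />
> With the TET command-line tool, supply the name of the directory containing the CMap files with the --searchpath option.<br />
> With the TET library, set the searchpath option at runtime:<br />
set_option("searchpath=/CMap/ / / ");<br />
As an alternative to the searchpath option, the environment variable TETRESOURCEFILE can point to a resource file which contains a suitable searchpath entry.<br />
resource/glyphlst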
0.2 Applying the TET License Key<br />
A license key can be stored in a license file named licensekeys.txt. A license file is a text file in the following format:<br />
PDFlib license file 1.0<br />
# Licensing information for PDFlib GmbH products<br />
TET 4.0 ... ...<br />
The license file can be made known to TET in one of the following ways:<br />
> Place licensekeys.txt in one of the default locations which TET searches.<br />
> Set the licensefile option with set_option( ):<br />
tet.set_option("licensefile=/path/to/licensekeys.txt");<br />
> Supply the licensefile option to the TET command-line tool with --tetopt:<br />
tet --tetopt "licensefile /path/to/your/licensekeys.txt" ...<br />
If the path name contains space characters, enclose it in braces:<br />
tet --tetopt "licensefile {/path/to/your/license file.txt}" ...<br />
> Set the environment variable PDFLIBLICENSEFILE to the name of the license file, e.g. on Unix:<br />
export PDFLIBLICENSEFILE="/path/to/licensekeys.txt"<br />
On iSeries the environment variable can be set in the QSTRUP startup program:<br />
ADDENVVAR ENVVAR(PDFLIBLICENSEFILE) VALUE() LEVEL(*SYS)
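> On Windows, the name of the license file can alternatively be stored in the following registry value:<br />
HKLM\SOFTWARE\PDFlib\PDFLIBLICENSEFILE<br />
The license key itself can be stored in one of the following registry values:<br />
HKLM\SOFTWARE\PDFlib\TET4\license<br />
HKLM\SOFTWARE\PDFlib\TET4\4.0\license<br />
The registry can be edited with regedit. For a 32-bit TET binary on 64-bit Windows, use the 32-bit version of the registry editor:<br />
%systemroot%\syswow64\regedit<br />
On Unix systems the license file is searched in the following directories, e.g. below /usr/local:<br />
/PDFlib/TET/4.0<br />
/PDFlib/TET<br />
/PDFlib<br />
The license key can also be supplied directly. With the TET command-line tool, use the license option with --tetopt:<br />
tet --tetopt "license ... ..." ......<br />
With the TET library, set the license option with set_option( ):<br />
> COM:<br />
oTET.set_option "license=... ..."<br />
> C:<br />
TET_set_option(tet, "license=... ...");<br />
> C++ and Java:<br />
tet.set_option("license=... ...");<br />
> Perl and PHP:<br />
tet->set_option("license=... ...");<br />
> RPG:<br />
d licensekey s 20<br />
d licenseval s 50<br />
c eval licenseopt='license=... ...'+x'00'<br />
c callp TET_set_option(TET:licenseopt:0)<br />
The license option must be set immediately after creating the TET object, i.e. after calling TET_new( ).<br />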
A license file can contain several license keys, one per line:<br />
PDFlib license file 2.0<br />
# Licensing information for PDFlib GmbH products<br />
TET 4.0 ... ... ...1...<br />
TET 4.0 ... ... ...2...<br />
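The license file layout shown above is simple enough to generate from a deployment script. The following Python sketch writes such a file and exports PDFLIBLICENSEFILE for child processes. The helper name `write_license_file`, its parameters, and the placeholder key are illustrative assumptions and not part of TET; the key is treated as an opaque string because its exact form is not spelled out here, and the choice of header version for a given set of keys is left to the caller.

```python
import os
import tempfile
from pathlib import Path


def write_license_file(path, keys, product="TET", version="4.0", file_format="1.0"):
    """Write a license file in the layout shown above, one key per line.

    All parameter names are hypothetical; keys are copied verbatim.
    """
    lines = [
        f"PDFlib license file {file_format}",
        "# Licensing information for PDFlib GmbH products",
    ]
    lines += [f"{product} {version} {key}" for key in keys]
    target = Path(path)
    target.write_text("\n".join(lines) + "\n", encoding="ascii")
    return target


if __name__ == "__main__":
    # Placeholder key: replace with the key received from the vendor.
    target = write_license_file(
        Path(tempfile.gettempdir()) / "licensekeys.txt", ["YOUR-LICENSE-KEY"]
    )
    # Only affects this process and the processes it starts afterwards.
    os.environ["PDFLIBLICENSEFILE"] = str(target)
    print(target.read_text(encoding="ascii"))
```

Setting the environment variable from the script only helps when the same script subsequently launches the tet tool or loads the library; for a permanent setup the shell profile or the registry entries described in this section are the more natural place.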
www.pdflib.com<br />
sales@pdflib.com<br />
support@pdflib.com
1 <br />
1.1 TET <br />
<strong>PDFlib</strong> Comprehensive Object System <br />
1.2 TET <br />
<br />
<br />
<br />
<br />
<br />
<br />
>
<br />
<br />
> <br />
<br />
> <br />
<br />
> <br />
<br />
1.3 <br />
<br />
<br />
<br />
<br />
<br />
> extractor <br />
<br />
> image_resources <br />
<br />
> dumper <br />
<br />
> fontfilter <br />
<br />
> glyphinfo dropcap <br />
shadow hyphenation <br />
> tetml <br />
<br />
> get_attachments <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> concordance.xsl <br />
> fontfilter.xsl
fontfinder.xsl <br />
<br />
> fontstat.xsl <br />
> index.xsl <br />
> metadata.xsl <br />
<br />
> solr.xsl <br />
> table.xsl <br />
> tetml2html.xsl <br />
> textonly.xsl <br />
<br />
<br />
<br />
<br />
<br />
> <br />
> <br />
> <br />
> <br />
<br />
<br />
> <br />
> <br />
<br />
www.pdflib.com/tet-cookbook<br />
<br />
<br />
www.pdflib.com/pcos-cookbook<br />
<br />
<br />
1.4 TET 4.0 <br />
<br />
> <br />
> <br />
> <br />
<br />
> <br />
> <br />
> <br />
> <br />
>
<br />
<br />
>
2 TET Command-Line Tool<br />
2.1 Command-Line Options<br />
The tet command-line tool is called as follows:<br />
tet [options] filename...<br />
The options --docopt, --tetopt, --imageopt and --pageopt supply option lists for the corresponding TET library functions.<br />
-- : ends the list of options<br />
@filename : reads command-line options from a response file<br />
--docopt option list : option list for open_document( )<br />
--firstpage, -f integer | last : first page to process; last denotes the last page, last-1 the page before the last page<br />
--format utf8 | utf16 : format of the text output (default: utf8)<br />
--help, -? : displays help for the command-line options<br />
--image, -i : extracts images from the document<br />
--imageloop page | resource : kind of enumeration used for --image<br />
page : images are enumerated per page; file names follow the pattern < >_p< >_< >.[tif|jpg|jpx]<br />
resource : image resources are enumerated; --firstpage and --lastpage have no effect; file names follow the pattern < >_I< ID>.[tif|jpg|jpx], where I< ID> corresponds to the Image/@id attribute in TETML<br />
--imageopt option list : option list for write_image_file( )<br />
--lastpage, -l integer | last : last page to process; last and last-1 are supported (default: last)<br />
--outfile, -o filename : name of the output file; - denotes standard output; by default the name is derived from the input file name by replacing .pdf or .PDF with .txt or .tetml<br />
--pageopt option list : option list for open_page( ) or process_page( ); text extraction uses granularity=page<br />
--password, -p password : password for encrypted documents<br />
--searchpath, -s path : directory in which files are searched<br />
--targetdir, -t dirname : directory for the generated output files<br />
--tetml, -m glyph | word | wordplus | line | page : generates TETML output in the specified mode<br />
--tetopt option list : option list for set_option( ); use --format instead of the outputformat option<br />
--text : extracts text from the document<br />
--verbose, -v 0 | 1 | 2 | 3 : verbosity level<br />
--version, -V : displays the TET version number<br />
The options --image, --text and --tetml can be combined.
2.2 TET <br />
<br />
> searchpath <br />
> <br />
<br />
> <br />
<br />
> <br />
--password <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
" <br />
*.pdf <br />
.pdf <br />
*.pdf *.PDF <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
@filename <br />
<br />
<br />
> <br />
<br />
> " <br />
> <br />
> <br />
\" <br />
> \\ <br />
<br />
@filename
> <br />
>
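2.3 Command-Line Examples<br />
2.3.1 Extracting Text<br />
Extract the text from file.pdf and store it in file.txt:<br />
tet file.pdf<br />
Exclude the first and the last page from text extraction:<br />
tet --firstpage 2 --lastpage last-1 file.pdf<br />
tet -f 2 -l last-1 file.pdf<br />
Supply a directory which is searched for CMap files:<br />
tet --searchpath /usr/local/cmaps file.pdf<br />
tet -s /usr/local/cmaps file.pdf<br />
Extract the text in UTF-16 format and store it in file.utf16:<br />
tet --format utf16 --outfile file.utf16 file.pdf<br />
tet --format utf16 -o file.utf16 file.pdf<br />
Extract the text from all PDF documents in the directory in and store the generated *.txt files in the directory out:<br />
tet --targetdir out in/*.pdf<br />
tet -t out in/*.pdf<br />
Restrict text extraction to a rectangular area of the page with the includebox page option:<br />
tet --pageopt "includebox={{0 0 200 200}}" file.pdf<br />
Read command-line options from a response file named options and process all PDF documents in the current directory:<br />
tet @options *.pdf<br />
2.3.2 Extracting Images<br />
Extract the images from file.pdf page by page and store them as file*.tif/file*.jpg in the directory out:<br />
tet --targetdir out --image file.pdf<br />
tet -t out -i file.pdf<br />
Extract the image resources from file.pdf and store them as file*.tif/file*.jpg in the directory out:<br />
tet --targetdir out --image --imageloop resource file.pdf<br />
tet -t out -i --imageloop resource file.pdf<br />
Extract the images from file.pdf with image merging disabled:<br />
tet --targetdir out --image --pageopt "imageanalysis={merge={disable}}" file.pdf<br />
tet -t out -i --pageopt "imageanalysis={merge={disable}}" file.pdf<br />
2.3.3 Generating TETML<br />
Generate TETML in word mode for file.pdf and store it in file.tetml:<br />
tet --tetml word file.pdf<br />
tet -m word file.pdf<br />
Generate TETML without Options elements:<br />
tet --docopt "tetml={elements={options=false}}" --tetml word file.pdf<br />
Generate TETML in word mode with all glyph details and store it in file.tetml:<br />
tet --tetml word --pageopt "tetml={glyphdetails={all}}" file.pdf<br />
tet -m word --pageopt "tetml={glyphdetails={all}}" file.pdf<br />
Extract images and generate TETML in a single call:<br />
tet --image --tetml word file.pdf<br />
tet -i -m word file.pdf<br />
Generate TETML with top-down coordinates:<br />
tet --tetml word --pageopt "topdown={output}" file.pdf<br />
tet -m word --pageopt "topdown={output}" file.pdf<br />
2.3.4 Advanced Options<br />
Enable the checkglyphlists document option:<br />
tet --docopt checkglyphlists file.pdf<br />
Apply a Unicode folding which maps all blank characters to U+0020:<br />
tet --docopt "fold={{[:blank:] U+0020}}" file.pdf<br />
Disable punctuation characters as word boundaries:<br />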
tet --pageopt "contentanalysis={punctuationbreaks=false}" file.pdf
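When the tet tool is driven from a script, assembling the command as an argument list avoids the shell quoting issues that option lists with braces and spaces can cause. The sketch below builds such a list from options that appear in the examples above. The function `build_tet_command` and its parameter names are illustrative assumptions; only the option names themselves are taken from this chapter.

```python
import shlex


def build_tet_command(pdf_files, first_page=None, last_page=None,
                      page_options=None, target_dir=None, tetml_mode=None):
    """Return an argument list for the tet command-line tool (hypothetical helper)."""
    args = ["tet"]
    if first_page is not None:
        args += ["--firstpage", str(first_page)]
    if last_page is not None:
        args += ["--lastpage", str(last_page)]
    if page_options:
        # The option list is one argument, so no manual quoting is required.
        args += ["--pageopt", page_options]
    if target_dir:
        args += ["--targetdir", target_dir]
    if tetml_mode:
        args += ["--tetml", tetml_mode]
    args += list(pdf_files)
    return args


cmd = build_tet_command(["file.pdf"], first_page=2, last_page="last-1",
                        page_options="includebox={{0 0 200 200}}")
# shlex.join renders the list as a copy-and-paste safe shell command.
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tet executable is on the search path; the sketch deliberately stops short of running it.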
3TET <br />
<br />
<br />
<br />
<br />
3.1 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
> <br />
> <br />
<br />
<br />
<br />
> <br />
> <br />
> <br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
delete( ) get_apiname( ) get_errnum( ) get_errmsg( )<br />
<br />
<br />
> <br />
> <br />
> <br />
open_document( )<br />
open_page( )
get_errnum( ) get_errmsg( ) get_apiname( )
3.2 C <br />
<br />
TET_TRY( ) <br />
TET_CATCH( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
> TET_TRY( ) TET_CATCH( ) <br />
> TET_new( ) <br />
TET_new( ) <br />
<br />
> TET_delete( ) <br />
<br />
> <br />
<br />
<br />
<br />
<br />
volatile volatile <br />
<br />
<br />
> <br />
TET_CATCH( ) TET_EXIT_TRY( )<br />
<br />
> <br />
<br />
<br />
<br />
<br />
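volatile int pageno;<br />
...<br />
if ((tet = TET_new()) == (TET *) 0)<br />
{<br />
    printf("out of memory\n");<br />
    return(2);<br />
}<br />
TET_TRY(tet)<br />
{<br />
    /* n_pages: number of pages in the document */<br />
    for (pageno = 1; pageno <= n_pages; ++pageno)<br />
    {<br />
        /* process page */<br />
        if (/* error happened */)<br />
        {<br />
            TET_EXIT_TRY(tet);<br />
            return -1;<br />
        }<br />
    }<br />
    /* statements which directly or indirectly call API functions */<br />
}<br />
TET_CATCH(tet)<br />
{<br />
    printf("Error %d in %s() on page %d: %s\n",<br />
        TET_get_errnum(tet), TET_get_apiname(tet), pageno, TET_get_errmsg(tet));<br />
}<br />
TET_delete(tet);<br />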
<br />
<br />
length <br />
length <br />
length <br />
<br />
host <br />
ebcdic <br />
<br />
<br />
<br />
<br />
<br />
<br />
> \xEF\xBB\xBF <br />
> \x57\x8B\xAB <br />
<br />
> winansi <br />
ebcdic <br />
TET_utf16_to_utf8( )
3.3 C++ <br />
tetlib.h <br />
tetlib.h<br />
tetlib.h tet.hpp <br />
tet.cpp <br />
TET_<br />
<br />
<br />
<br />
<br />
<br />
> std::wstring <br />
<br />
wstring <br />
<br />
<br />
> <br />
basic_string <br />
<br />
> <br />
<br />
<br />
<br />
<br />
wstring wchar_t wstring <br />
<br />
<br />
L <br />
\u \U <br />
<br />
<br />
<br />
<br />
<br />
> pdflib <br />
<br />
<br />
using namespace pdflib;<br />
> wstring <br />
<br />
L <br />
const wstring pageoptlist = L"granularity=page";
TETTET::Exception get_errmsg( ) <br />
wstring wcerr <br />
<br />
> tet.cpp <br />
<br />
<br />
<br />
<br />
<br />
<br />
> tet.hpp wstring <br />
<br />
#define TETCPP_TET_WSTRING 0<br />
> tet.hpp pdflib <br />
#define TETCPP_USE_PDFLIB_NAMESPACE 0<br />
<br />
try/catch <br />
TET::Exception<br />
<br />
<br />
<br />
<br />
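try {<br />
    ...some TET instructions...<br />
} catch (TET::Exception &ex) {<br />
    wcerr << L"Error " << ex.get_errnum() << L" in " << ex.get_apiname()<br />
        << L"(): " << ex.get_errmsg() << endl;<br />
}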
3.4 COM <br />
<br />
<br />
<br />
> <br />
...\TET 4.0 32-bit\bind\COM\bin\tet_com.dll <br />
<br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
tlbimp.exe <br />
<br />
tlbimp tet_com.dll /namespace:tet_com /out:Interop.tet_com.dll<br />
<br />
tet_com.dll <br />
<br />
using tet_com;<br />
...<br />
static tet_com.ITET tet;<br />
...<br />
tet = new TET();<br />
...
3.5 Java <br />
com.pdflib.TET <br />
<br />
<br />
<br />
<br />
> libtet_java.so libtet_java.jnilib<br />
<br />
<br />
> pdf_tet.dll <br />
<br />
tet.jar tet <br />
tet.jar <br />
CLASSPATH -classpath tet.jar<br />
<br />
<br />
java.library.path <br />
<br />
java -Djava.library.path=. extractor<br />
<br />
System.out.println(System.getProperty("java.library.path"));<br />
<br />
<br />
<br />
<br />
> <br />
<br />
<br />
<br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
String(byte[] bytes)
enc SJIS UTF8 UTF-16 <br />
<br />
String(byte[] bytes, String enc)<br />
enc <br />
<br />
byte[] getBytes(String enc)<br />
<br />
<br />
<br />
<br />
> Javadoc <br />
<br />
> ... <br />
<br />
Java <br />
<br />
TETException <br />
<br />
<br />
TET tet = null;<br />
try {<br />
    ...some TET instructions...<br />
} catch (TETException e) {<br />
    System.err.print("TET exception occurred:\n");<br />
    System.err.print("[" + e.get_errnum() + "] " + e.get_apiname() + ": " +<br />
        e.get_errmsg() + "\n");<br />
} catch (Exception e) {<br />
    System.err.println(e.getMessage());<br />
} finally {<br />
    if (tet != null) {<br />
        tet.delete(); /* delete the TET object */<br />
    }<br />
}<br />
throws
3.6 .NET <br />
<br />
<br />
<br />
TET_dotnet.dll<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
bin <br />
bin TETlib_<br />
dotnet.dll <br />
C:\Inetpub\wwwroot\bin\TET_dotnet.dll<br />
C:\Inetpub\wwwroot\WebApplicationX\bin\TET_dotnet.dll<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Full <br />
<br />
<br />
<br />
<br />
<br />
<br />
TET_dotnet.TETException <br />
get_errnum get_errmsg get_apiname
3.7 Perl <br />
<br />
<br />
<br />
<br />
use <br />
<br />
<br />
<br />
<br />
<br />
<br />
tetlib_pl.pm <strong>PDFlib</strong>/TET.pm <br />
@INC -I <br />
<br />
perl -I/path/to/tet extractor.pl<br />
tetlib_pl.so tetlib_pl.bundletetlib_pl.pm <strong>PDFlib</strong>/TET.pm<br />
<br />
<br />
perl -e 'use Config; print $Config{sitearchexp};'<br />
auto/tetlib_pl <br />
<br />
/usr/lib/perl5/site_perl/5.10/i686-linux<br />
<br />
tetlib_pl.dll tetlib_pl.pm<strong>PDFlib</strong>/TET.pm<br />
<br />
<br />
perl -e "use Config; print $Config{sitearchexp};"<br />
<br />
C:\Program Files\Perl5.10\site\lib<br />
eval <br />
<br />
eval {<br />
    ...some TET instructions...<br />
};<br />
die "Exception caught: $@" if $@;
3.8 PHP <br />
<br />
<br />
<br />
<br />
<br />
<br />
<strong>PDFlib</strong>-in-PHP-HowTo <br />
<br />
<br />
<br />
<br />
> php.ini <br />
extension=libtet_php.dll ; Windows<br />
extension=libtet_php.so ; Unix and Mac OS X<br />
extension=libtet_php.sl ; HP-UX<br />
php.ini extension_dir <br />
<br />
<br />
<br />
<br />
<br />
tet <br />
<strong>PDFlib</strong> TET Support<br />
enabled<br />
<br />
<br />
> <br />
<br />
dl("libtet_php.dll"); # Windows<br />
dl("libtet_php.so"); # Unix and Mac OS X<br />
dl("libtet_php.sl"); # HP-UX<br />
<br />
<br />
> <br />
<br />
> <br />
<br />
<br />
try/catch <br />
try {<br />
    ...some TET instructions...<br />
} catch (TETException $e) {<br />
    print "TET exception occurred:\n";<br />
    print "[" . $e->get_errnum() . "] " . $e->get_apiname() . ": " .<br />
        $e->get_errmsg() . "\n";<br />
}<br />
catch (Exception $e) {<br />
    print $e;<br />
}
3.9 Python <br />
<br />
<br />
<br />
<br />
<br />
> tetlib_py.so<br />
> tetlib_py.pyd<br />
<br />
<br />
<br />
try:<br />
    ...some TET instructions...<br />
except TETException:<br />
    print('TET exception occurred!')
3.10 REALbasic <br />
<br />
<br />
TET.rbx <br />
Plugins <br />
TET.framework /Library/Frameworks <br />
<br />
<br />
> <br />
> <br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
> TET <br />
> TETException RuntimeException <br />
<br />
<br />
<br />
<br />
<br />
<br />
TETException <br />
try/<br />
catch <br />
<br />
Exception err As TETException<br />
MsgBox("TETextractor: [" + _<br />
Str(err.get_errnum()) + "] " + err.get_apiname() + ": " + err.get_errmsg())
3.11 RPG <br />
<br />
/copy <br />
<br />
<br />
%ucs2 <br />
<br />
%char <br />
%CHAR %UCS2 <br />
<br />
<br />
<br />
<br />
length <br />
<br />
<br />
<br />
D <br />
<br />
d/copy QRPGLESRC,TETLIB<br />
<br />
<br />
d/copy tetsrclib/QRPGLESRC,TETLIB<br />
<br />
<br />
<br />
<br />
CRTBNDDIR BNDDIR(TETLIB/TETLIB) TEXT('TETlib Binding Directory')<br />
<br />
<br />
<br />
<br />
ADDBNDDIRE BNDDIR(TETLIB/TETLIB) OBJ((TETLIB/TETLIB *SRVPGM))<br />
CRTBNDRPG <br />
<br />
CRTBNDRPG PGM(TETLIB/EXTRACTOR) SRCFILE(TETLIB/QRPGLESRC) SRCMBR(*PGM) DFTACTGRP(*NO)<br />
BNDDIR(TETLIB/TETLIB)
monitor/on-error/endmon <br />
*PSSR <br />
<br />
<br />
c eval tet=TET_new<br />
*<br />
c monitor<br />
*<br />
c callp TET_set_option(tet:globaloptlist)<br />
c eval doc=TET_open_document(tet:%ucs2(%trim(parm1)):docoptlist)<br />
:<br />
:<br />
* Error Handling<br />
c on-error<br />
* Do something with this error<br />
* don't forget to free the TET object<br />
c callp TET_delete(tet)<br />
c endmon
4TET<br />
<br />
<br />
<br />
4.1 Adobe Acrobat TET Plugin<br />
<br />
<br />
<br />
<br />
www.pdflib.com/products/tet-plugin
4.2 Lucene TET <br />
<br />
<br />
lucene.apache.org <br />
shrug <br />
<br />
<br />
<br />
<br />
<br />
> <br />
> <br />
> <br />
lucene-core-2.4.0.jar <br />
<br />
> <br />
<br />
<br />
> /connectors/lucene cd<br />
> lucene-core-2.4.0.jar <br />
> TetReader.java <br />
<br />
<br />
<br />
<br />
PdfDocument.java <br />
<br />
> ant index /bind/data <br />
<br />
> ant search <br />
<br />
<br />
<br />
<br />
ant index <br />
devserver (1)$ ant index<br />
Buildfile: build.xml<br />
...<br />
index:<br />
[java] adding ../data/Whitepaper-XMP-metadata-in-PDFlib-products.pdf<br />
[java] adding ../data/Whitepaper-PDFA-with-PDFlib-products.pdf<br />
[java] adding ../data/FontReporter.pdf<br />
[java] adding ../data/TET-PDF-IFilter-datasheet.pdf<br />
[java] adding ../data/PDFlib-datasheet.pdf<br />
[java] 1255 total milliseconds<br />
BUILD SUCCESSFUL<br />
Total time: 2 seconds<br />
devserver (1)$ ant search<br />
Buildfile: build.xml<br />
compile:<br />
search:<br />
[java] Enter query:<br />
PDFlib<br />
[java] Searching for: pdflib<br />
[java] 5 total matching documents<br />
[java] 1. ../data/PDFlib-datasheet.pdf<br />
[java] Title: PDFlib, PDFlib+PDI, Personalization Server Datasheet<br />
[java] 2. ../data/Whitepaper-PDFA-with-PDFlib-products.pdf<br />
[java] Title: Whitepaper: Creating PDF/A with PDFlib<br />
[java] 3. ../data/FontReporter.pdf<br />
[java] Title: PDFlib FontReporter 1.3 Manual<br />
[java] 4. ../data/TET-PDF-IFilter-datasheet.pdf<br />
[java] Title: PDFlib TET PDF IFilter Datasheet<br />
[java] 5. ../data/Whitepaper-XMP-metadata-in-PDFlib-products.pdf<br />
[java] Title: Whitepaper: XMP Metadata support in PDFlib Products<br />
[java] Press (q)uit or enter number to jump to a page.<br />
q<br />
[java] Enter query:<br />
title:FontReporter<br />
[java] Searching for: title:fontreporter<br />
[java] 1 total matching documents<br />
[java] 1. ../data/FontReporter.pdf<br />
[java] Title: PDFlib FontReporter 1.3 Manual<br />
[java] Press (q)uit or enter number to jump to a page.<br />
q<br />
[java] Enter query:<br />
BUILD SUCCESSFUL<br />
Total time: 57 seconds<br />
<strong>PDFlib</strong> <br />
title FontReporter <br />
q <br />
build.xml <br />
<br />
<br />
build.properties <br />
windows.properties unix.properties <br />
/tmp <br />
<br />
ant -Dlucene.jar=/tmp/lucene-core-2.4.0.jar index
lucene.apache.org/java/2_4_0/demo3.html <br />
Configuration <br />
configuration.jsp <br />
<br />
/<br />
bind/lucene/index <br />
<br />
<br />
> path <br />
> modified <br />
> contents <br />
> <br />
<br />
<br />
String objType = tet.pcos_get_string(tetHandle, "type:/Info/Subject");<br />
if (!objType.equals("null"))<br />
{<br />
doc.add(new Field("summary", tet.pcos_get_string(tetHandle,<br />
"/Info/Subject"), Field.Store.YES, Field.Index.ANALYZED));<br />
}<br />
> font <br />
PdfDocument.java
4.3 Solr TET <br />
<br />
<br />
<br />
lucene.apache.org/solr <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
shrug <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
solr.xsl glyph <br />
<br />
<br />
_s <br />
<br />
<br />
<br />
<br />
<br />
<strong>PDFlib</strong>-FontReporter-E.pdf<br />
<strong>PDFlib</strong> GmbH<br />
2008-07-08T15:05:39+00:00<br />
FrameMaker 7.0<br />
2008-07-08T15:05:39+00:00<br />
Acrobat Distiller 7.0.5 (Windows)<br />
<strong>PDFlib</strong> FontReporter<br />
<strong>PDFlib</strong> FontReporter 1.3 Manual<br />
<strong>PDFlib</strong><br />
GmbH<br />
Munchen<br />
...
4.4 Oracle TET <br />
<br />
<br />
<br />
<br />
shrug <br />
<br />
<br />
<br />
AL32UTF8 <br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
<br />
download.oracle.com/docs/cd/B28359_01/text.111/b28304/cdatadic.htm#sthref497<br />
> <br />
<br />
tetfilter.sh <br />
iconv uconv <br />
<br />
tetfilter.bat <br />
<br />
<br />
> <br />
<br />
connectors/Oracle/tetfilter.sh $ORACLE_HOME/ctx/bin <br />
connectors/Oracle/tetfilter.bat %ORACLE_HOME%\bin <br />
> tetfilter.shtetfilter.bat TETDIR <br />
<br />
> <br />
<br />
<br />
<br />
TETOPT="license=aaaaaaa-bbbbbb-cccccc-dddddd-eeeeee"
HR system <br />
<br />
SQL> GRANT CTXAPP TO HR;<br />
SQL> GRANT EXECUTE ON CTX_CLS TO HR;<br />
SQL> GRANT EXECUTE ON CTX_DDL TO HR;<br />
SQL> GRANT EXECUTE ON CTX_DOC TO HR;<br />
SQL> GRANT EXECUTE ON CTX_OUTPUT TO HR;<br />
SQL> GRANT EXECUTE ON CTX_QUERY TO HR;<br />
SQL> GRANT EXECUTE ON CTX_REPORT TO HR;<br />
SQL> GRANT EXECUTE ON CTX_THES TO HR;<br />
<br />
<br />
> <br />
/connectors/Oracle<br />
> tetsetup_a.sql tetpath <br />
<br />
> sqlplus pdftable_a <br />
tetindex_a <br />
tetsetup_a.sql <br />
<br />
SQL> @tetsetup_a.sql<br />
> <br />
SQL> select * from pdftable_a where CONTAINS(pdffile, 'Whitepaper', 1) > 0;<br />
> <br />
SQL> execute ctx_ddl.sync_index('tetindex_a')<br />
> <br />
<br />
SQL> @tetcleanup_a.sql<br />
<br />
<br />
<br />
tet_pdf_loader <br />
<br />
/Info/Title <br />
length:pages <br />
<br />
> <br />
/connectors/Oracle<br />
> sqlplus pdftable_b <br />
tetindex_b
SQL> @tetsetup_b.sql<br />
> <br />
<br />
<br />
ojdbc14.jar tet_pdf_loader.java <br />
ant <br />
<br />
<br />
localhost <br />
xe HR <br />
<br />
ant -Dtet.jdbc.connection=jdbc:oracle:thin:@localhost:1521:xe<br />
-Dtet.jdbc.user=HR -Dtet.jdbc.password=HR<br />
> <br />
SQL> execute ctx_ddl.sync_index('tetindex_b')<br />
> <br />
SQL> select * from pdftable_b where CONTAINS(pdffile, 'Whitepaper', 1) > 0;<br />
> <br />
<br />
SQL> @tetcleanup_b.sql
4.5 Microsoft TET PDF IFilter<br />
<br />
www.pdflib.com/products/tetpdf-ifilter<br />
Title Subject Author<br />
> <br />
<br />
>
4.6 MediaWiki TET <br />
<br />
<br />
<br />
www.mediawiki.org/wiki/MediaWiki<br />
shrug <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
> <br />
> <br />
<br />
> <br />
<br />
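> Copy connectors/MediaWiki/PDFIndexer.php from the TET distribution to extensions/PDFIndexer/PDFIndexer.php in the MediaWiki installation.<br />
> Copy resource/cmap from the TET distribution to extensions/PDFIndexer/resource/cmap in the MediaWiki installation.<br />
> Add the following lines to LocalSettings.php:<br />
# PDF indexer extension<br />
include("extensions/PDFIndexer/PDFIndexer.php");<br />
> In includes/DefaultSettings.php, add pdf to the list of file extensions which are allowed for upload:<br />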
<br />
/**<br />
* <br />
* <br />
*/<br />
$wgFileExtensions = array( 'png', 'gif', 'jpg', 'jpeg', 'pdf' );<br />
<br />
PDFIndexer.php
> <br />
> <br />
<br />
> <br />
<br />
<br />
> <br />
<br />
<br />
<br />
<br />
DebugLogFile <br />
Image <br />
Advanced search <br />
Image <br />
LocalSettings.php <br />
<br />
$wgNamespacesToBeSearchedDefault = array(<br />
    NS_MAIN => true,<br />
    NS_IMAGE => true,<br />
);
5 <br />
5.1 PDF <br />
<br />
> <br />
> <br />
<br />
> <br />
<br />
dumper <br />
encrypt/master encrypt/user encrypt/nocopy <br />
pcosmode <br />
<br />
<br />
open_document( ) requiredmode <br />
nocopy <br />
<br />
<br />
<br />
if ((int) tet.pcos_get_number(doc, "pcosmode") == 2 ||<br />
((int) tet.pcos_get_number(doc, "pcosmode") == 1 &&<br />
(int) tet.pcos_get_number(doc, "encrypt/nocopy") == 0))<br />
{<br />
/* text extraction is allowed */<br />
}<br />
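The condition in the fragment above can be isolated into a small predicate, which makes the combinations easy to check. This is a minimal sketch: it assumes the two numbers have already been retrieved with pcos_get_number( ) for the paths pcosmode and encrypt/nocopy as shown above, and the function name `text_extraction_allowed` is purely illustrative.

```python
def text_extraction_allowed(pcosmode: int, nocopy: int) -> bool:
    """Mirror the check above: pcosmode 2, or pcosmode 1 with nocopy unset."""
    return pcosmode == 2 or (pcosmode == 1 and nocopy == 0)


# Enumerate a few combinations of (pcosmode, encrypt/nocopy).
for mode, nocopy in [(2, 1), (1, 0), (1, 1), (0, 0)]:
    print(mode, nocopy, text_extraction_allowed(mode, nocopy))
```

Keeping the decision in one function avoids repeating the pCOS queries at every place where the application needs to know whether it may extract text.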
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
> <br />
<br />
> <br />
>
open_document( ) shrug <br />
<br />
<br />
<br />
<br />
> open_document( ) shrug <br />
> open_document( ) <br />
<br />
> open_<br />
document( ) <br />
> nocopy=true<br />
<br />
> nocopy=true <br />
<br />
> shrug true <br />
> pcosmode <br />
<br />
<br />
<br />
<br />
int doc = tet.open_document(filename, "shrug");<br />
...<br />
if ((int) tet.pcos_get_number(doc, "shrug") == 1)<br />
{<br />
/* */<br />
}<br />
else<br />
{<br />
/* */<br />
}
5.2 <br />
<br />
<br />
<br />
<br />
set_option( ) <br />
<br />
Unix PostScript Resource <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
open_document( ) <br />
open_document( ) glyphmapping <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
set_option( ) <br />
<br />
<br />
<br />
<br />
> <br />
> \ <br />
<br />
> <br />
> <br />
> <br />
<br />
> <br />
PS-Resources-1.0
<br />
<br />
> <br />
<br />
<br />
<br />
<br />
<br />
searchpath <br />
<br />
<br />
PS-Resources-1.0<br />
searchpath<br />
glyphlist<br />
codelist<br />
encoding<br />
.<br />
searchpath<br />
/usr/local/lib/cmaps<br />
/users/kurt/myfonts<br />
.<br />
glyphlist<br />
myglyphlist=/usr/lib/sample.gl<br />
.<br />
codelist<br />
mycodelist=/usr/lib/sample.cl<br />
.<br />
encoding<br />
myencoding=sample.enc<br />
.<br />
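The structure of the sample resource file can be read off directly: a header line, a list of category names closed by a line containing a single period, and then one section per category, each closed the same way. The Python sketch below parses exactly that structure. It is based only on the sample above, so anything the sample does not show (comments, escape sequences, line continuation) is ignored, and the function name `parse_upr` is an assumption made for illustration.

```python
def parse_upr(text):
    """Parse a resource file laid out like the sample above into a dict.

    Returns {category: [entry, ...]}; entries are kept as raw strings.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "PS-Resources-1.0":
        raise ValueError("missing PS-Resources-1.0 header")
    it = iter(lines[1:])
    sections = {}
    # First block: the declared category names, terminated by ".".
    for line in it:
        if line == ".":
            break
        sections[line] = []
    # Remaining blocks: category name, its entries, then ".".
    current = None
    for line in it:
        if line == ".":
            current = None
        elif current is None:
            current = line
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


SAMPLE = """PS-Resources-1.0
searchpath
glyphlist
.
searchpath
/usr/local/lib/cmaps
/users/kurt/myfonts
.
glyphlist
myglyphlist=/usr/lib/sample.gl
.
"""

resources = parse_upr(SAMPLE)
print(resources["searchpath"])
```

Entries of the form name=file are left unsplit on purpose; a caller that needs the pair can use `entry.partition("=")`.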
<br />
searchpath <br />
<br />
<br />
searchpath <br />
<br />
<br />
searchpath <br />
<br />
<br />
searchpath <br />
<br />
HKLM\SOFTWARE\PDFlib\TET4\4.0\SearchPath<br />
HKLM\SOFTWARE\PDFlib\TET4\SearchPath<br />
HKLM\SOFTWARE\PDFlib\SearchPath<br />
<br />
SearchPath
C:\Program Files\PDFlib\TET 4.0 32bit\resource<br />
C:\Program Files\PDFlib\TET 4.0 32bit\resource\cmap<br />
searchpath <br />
/PDFlib/TET/4.0/resource/icc<br />
/PDFlib/TET/4.0/resource/fonts<br />
/PDFlib/TET/4.0/resource/cmap<br />
/PDFlib/TET/4.0<br />
/PDFlib/TET<br />
/PDFlib<br />
searchpath <br />
set_<br />
option( ) <br />
<br />
<br />
> TETRESOURCEFILE <br />
<br />
<br />
> TETRESOURCEFILE <br />
<br />
upr (MVS )<br />
/tet/4.0/tet.upr (iSeries )<br />
tet.upr (WindowsUnix )<br />
<br />
> <br />
HKLM\SOFTWARE\<strong>PDFlib</strong>\TET\4.0\resourcefile<br />
< <br />
>/tet.upr <br />
<br />
<br />
> <br />
resourcefile <br />
set_option("resourcefile=/ / /tet.upr");<br />
<br />
set_option( )<br />
<br />
<br />
<br />
set_option("glyphlist={myglyphnames=/usr/local/glyphnames.gl}");
set_option( ) <br />
<br />
<br />
<br />
<br />
<br />
\ <br />
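> \x introduces a sequence of two hexadecimal digits (0-9, A-F, a-f), e.g. \x0D<br />
> \nnn denotes a sequence of three octal digits (0-7), e.g. \015; the sequence \000 is ignored<br />
> \\ denotes a single backslash<br />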
>
5.3 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
open_page( ) <br />
<br />
> docstyle=searchengine<br />
<br />
<br />
> skipengines={image}<br />
<br />
<br />
> contentanalysis={merge=0}<br />
<br />
<br />
<br />
<br />
> contentanalysis={dehyphenate=false}<br />
<br />
<br />
> contentanalysis={shadowdetect=false}<br />
<br />
<br />
<br />
<br />
<br />
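The options listed above are plain option list strings, so an application that switches several of them on and off may prefer to assemble the string from a data structure. The following sketch does that for nested options. The helper name `optlist` is hypothetical, and placing several suboptions inside one pair of braces is an assumption about the option list syntax that goes slightly beyond the single-suboption examples shown here.

```python
def optlist(options):
    """Render a (possibly nested) dict as an option list string."""
    parts = []
    for key, value in options.items():
        if isinstance(value, dict):
            # Nested suboptions are wrapped in one pair of braces.
            parts.append(f"{key}={{{optlist(value)}}}")
        elif isinstance(value, bool):
            parts.append(f"{key}={'true' if value else 'false'}")
        else:
            parts.append(f"{key}={value}")
    return " ".join(parts)


page_opts = optlist({
    "granularity": "page",
    "contentanalysis": {"dehyphenate": False, "shadowdetect": False},
})
print(page_opts)
```

The resulting string is meant to be handed to the function which accepts the respective options; values that are themselves lists, such as `{image}`, can be passed through unchanged as strings.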
> <br />
<br />
open_page( ) granularity=word get_text( ) <br />
<br />
> get_text( ) <br />
open_page( ) granularity=page <br />
<br />
<br />
> <br />
open_page( ) contentanalysis={lineseparator=U+0020} granularity=page <br />
get_text( )
open_page( ) granularity=word <br />
> <br />
contentanalysis={punctuationbreaks=false} <br />
<br />
<br />
> get_char_info( ) <br />
<br />
get_text( ) <br />
> open_page( ) includebox <br />
excludebox <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
unknownchar=?<br />
<br />
fold={{[:Private_Use:] remove} {[U+FFFD] remove} default}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> open_page( ) granularity=word <br />
> open_document( ) password <br />
<br />
shrug
get_char_info( ) <br />
get_text( ) <br />
<br />
glyph wordplus Glyph <br />
<br />
unknown="true"<br />
unknownchar <br />
unknown <br />
<br />
> <br />
<br />
ignoreinvisibletext=true<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
> <br />
<br />
<br />
> <br />
<br />
<br />
> <br />
<br />
<br />
<br />
<br />
> get_char_info( ) uv<br />
char_info <br />
get_text( ) <br />
> open_page( )granularity=glyphword <br />
granularity=glyph
checkglyphlists=true
6 <br />
6.1 PDF <br />
<br />
<br />
<br />
get_text( ) get_image( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
> <br />
<br />
<br />
<br />
Plug-Ins <strong>PDFlib</strong> TET Plugin... TET Find <br />
<br />
> Acrobat <br />
PDF<br />
<br />
> extractor <br />
> /TET/Document/Pages/Page<br />
<br />
> ...<br />
> <br />
> <br />
<br />
<br />
<br />
> dumper <br />
> /TET/Document/DocInfo<br />
<br />
<br />
> ... <br />
<br />
>
dumper <br />
> /TET/Document/DocInfo/Custom<br />
<br />
<br />
> ... ...<br />
<br />
> <br />
> <br />
<br />
XMP <br />
<br />
> dumper <br />
> /TET/Document/Metadata
TouchUp <br />
... <br />
<br />
> <br />
> image_metadata<br />
> /TET/Document/Pages/Resources/Images/Image/Metadata<br />
<br />
<br />
<br />
> <br />
> ...<br />
> <br />
> fields<br />
> <br />
<br />
<br />
<br />
<br />
> <br />
> <br />
<br />
> <br />
<br />
<br />
> annotations<br />
> <br />
<br />
<br />
<br />
> <br />
> <br />
<br />
> <br />
<br />
<br />
<br />
> bookmarks<br />
>
> <br />
<br />
<br />
> get_attachments <br />
> /TET/Document/Attachments/Attachment/Document<br />
<br />
<br />
<br />
<br />
> <br />
<br />
> <br />
PDF <br />
> <br />
> get_attachments <br />
> /TET/Document/Attachments/Attachment/Document<br />
<br />
<br />
<br />
> <br />
... ... <br />
... <br />
<br />
<br />
<br />
> <br />
<br />
> <br />
> dumper <br />
> /TET/Document/@pdfa /TET/Document/@pdfe /TET/Document/@pdfx
6.2 <br />
<br />
<br />
CropBox <br />
MediaBox Rotate <br />
<br />
<br />
1 pt = 1 inch / 72 = 25.4 mm / 72 = 0.3528 mm<br />
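The unit relation above can be expressed as a small conversion helper. This is only an illustrative sketch: the class and method names (`PointUnits`, `pointsToMm`, `mmToPoints`) are hypothetical and not part of the TET API.

```java
/** Hypothetical helper (not part of TET): converts between PDF points and millimeters. */
final class PointUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** 1 pt = 1 inch / 72 = 25.4 mm / 72 */
    static double pointsToMm(double points) {
        return points * MM_PER_INCH / POINTS_PER_INCH;
    }

    /** Inverse conversion, e.g. for building an includebox from millimeter values. */
    static double mmToPoints(double mm) {
        return mm * POINTS_PER_INCH / MM_PER_INCH;
    }

    public static void main(String[] args) {
        System.out.println("1 pt   = " + pointsToMm(1) + " mm");
        System.out.println("595 pt = " + pointsToMm(595) + " mm");
        System.out.println("210 mm = " + mmToPoints(210) + " pt");
    }
}
```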
<br />
<br />
<br />
<br />
<br />
<br />
<br />
y<br />
<br />
<br />
<br />
topdown <br />
<br />
<br />
> <br />
<br />
> <br />
<br />
... <br />
<br />
<br />
<br />
<br />
<br />
open_page( ) clippingarea <br />
<br />
unlimited <br />
cropbox <br />
<br />
open_page( )includebox<br />
excludebox
includebox <br />
excludebox <br />
<br />
<br />
<br />
<br />
get_char_info( ) <br />
<br />
<br />
> uv <br />
<br />
<br />
uv <br />
<br />
uv <br />
uv <br />
uv <br />
> type
[Figure: glyph geometry, showing (x, y), width, alpha, beta, fontsize, and baseline]<br />
<br />
<br />
<br />
<br />
<br />
<br />
(x, y) width <br />
uv <br />
alpha (x, y) width <br />
° fontsize <br />
> unknown <br />
unknownchar <br />
<br />
unknownchar <br />
<br />
<br />
> <br />
<br />
> (x, y) <br />
<br />
<br />
(x, y) <br />
y topdown <br />
> width <br />
<br />
width<br />
width <br />
width <br />
<br />
width
[Figure: font metrics, showing font size, capheight, ascender, baseline, and descender]<br />
<br />
> alpha <br />
° ° <br />
alpha <br />
° alpha beta topdown <br />
<br />
> beta <br />
alpha <br />
° beta ° <br />
<br />
> fontid <br />
<br />
<br />
<br />
> fontsize <br />
<br />
> textrendering <br />
<br />
<br />
open_page( ) ignoreinvisibletext <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
T H x <br />
x <br />
f d j p
x <br />
<br />
<br />
<br />
/* query the ascender and descender of font i via pCOS */<br />
path = "fonts[" + i + "]/ascender";<br />
System.out.println("ascender=" + p.pcos_get_number(doc, path));<br />
path = "fonts[" + i + "]/descender";<br />
System.out.println("descender=" + p.pcos_get_number(doc, path));<br />
get_char_info( ) get_char_info( ) <br />
fonts[] <br />
<br />
FontDescriptor <br />
<br />
<br />
get_char_info x, y width alpha <br />
<br />
<br />
x end = lrx = x + width * cos(alpha)<br />
y end = lry = y + width * sin(alpha)<br />
alpha <br />
x end = lrx = x + width<br />
y end = lry = y<br />
<br />
beta <br />
<br />
urx = x + width * cos(alpha) - dir * height * sin(alpha)<br />
ury = y + width * sin(alpha) + dir * height * cos(alpha)<br />
topdown=true dir=-1 topdown=false dir=1 <br />
<br />
<br />
<br />
height = fontsize * ascender / 1000<br />
<br />
translate(x,y);<br />
rotate(alpha);<br />
skew(0, -beta);
if (abs(beta) > 90)<br />
scale(1, -1);<br />
<br />
urx = x + width<br />
ury = y + dir * height<br />
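The end point and corner calculations can be sketched in plain Java as follows. This is a hedged illustration, not TET code: it assumes that the terms lost from the formulas are the glyph width, a glyph height, and a direction factor that is -1 for topdown=true and 1 for topdown=false. The names `GlyphBox`, `endPoint`, `upperRight`, `height`, and `dir` are hypothetical; x, y, width, and alpha correspond to the values returned by get_char_info( ), with alpha in degrees.

```java
/** Hypothetical sketch (not part of TET): corner points of an upright glyph box (beta = 0). */
final class GlyphBox {

    /** End point (lower right corner): the reference point moved by width along the baseline. */
    static double[] endPoint(double x, double y, double width, double alphaDegrees) {
        double a = Math.toRadians(alphaDegrees);
        return new double[] { x + width * Math.cos(a), y + width * Math.sin(a) };
    }

    /** Upper right corner; "height" and the topdown direction factor are assumed inputs. */
    static double[] upperRight(double x, double y, double width, double height,
                               double alphaDegrees, boolean topdown) {
        double a = Math.toRadians(alphaDegrees);
        int dir = topdown ? -1 : 1;
        return new double[] {
            x + width * Math.cos(a) - dir * height * Math.sin(a),
            y + width * Math.sin(a) + dir * height * Math.cos(a)
        };
    }

    public static void main(String[] args) {
        double[] end = endPoint(100, 200, 50, 0);
        double[] ur = upperRight(100, 200, 50, 10, 0, false);
        System.out.println("end point:   " + end[0] + ", " + end[1]);
        System.out.println("upper right: " + ur[0] + ", " + ur[1]);
    }
}
```

For horizontal text (alpha = 0) the trigonometric terms drop out, so the end point is simply shifted by the width along the x axis.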
<br />
<br />
x end = x<br />
y end = y - <br />
<br />
<br />
ulx = x - /2 * cos(alpha)<br />
uly = y - /2 * sin(alpha)<br />
lrx = ulx + * cos(alpha) + * * sin(alpha)<br />
lry = uly + * sin(alpha) - * * cos(alpha)<br />
topdown=true =-1 topdown=false =1
6.3 <br />
6.3.1 CMap<br />
<br />
<br />
<br />
> Adobe-Japan1 6<br />
> Adobe-CNS1 5<br />
> Adobe-GB1 5<br />
> Adobe-Korea1 2<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
6.3.2 <br />
<br />
<br />
> <br />
<br />
<br />
> alpha ° <br />
alpha=0° ° <br />
> <br />
<br />
<br />
int count = (int) p.pcos_get_number(doc, "length:fonts");<br />
for (int i = 0; i < count; i++)<br />
{<br />
if (p.pcos_get_number(doc, "fonts[" + i + "]/vertical") != 0)<br />
{<br />
/* font i uses vertical writing mode */<br />
vertical = true;<br />
}<br />
}<br />
> <br />
<br />
<br />
decompose={vertical=_none}
6.3.3 narrow wide vertical <br />
<br />
<br />
<br />
wide narrow <br />
decompose <br />
<br />
decompose={wide=_none narrow=_none}<br />
small square vertical <br />
wide narrow <br />
<br />
<br />
<br />
decompose={none}<br />
<br />
<br />
decompose <br />
<br />
narrow <br />
<br />
small<br />
<br />
<br />
<br />
square <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
U+30F2<br />
<br />
U+002C<br />
<br />
<br />
U+30AD U+30ED<br />
<br />
<br />
U+FF66<br />
<br />
U+FE50<br />
<br />
U+3314
6.4 <br />
<br />
<br />
<br />
<br />
6.4.1 <br />
<br />
word <br />
<br />
<br />
contentanalysis={bidi=logical}<br />
<br />
contentanalysis={bidi=visual}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
bidilevel <br />
<br />
contentanalysis={bidilevel=rtl}<br />
get_char_info( ) Glyph <br />
<br />
<br />
<br />
<br />
<br />
<br />
6.4.2 <br />
<br />
<br />
<br />
<br />
decompose
decompose <br />
<br />
decompose <br />
<br />
decompose=none<br />
<br />
decompose=<br />
{final=_all medial=_all initial=_all isolated=_all}<br />
<br />
decompose=<br />
{final=_none medial=_none initial=_none isolated=_none}<br />
<br />
U+FEB2<br />
<br />
U+FEB3<br />
<br />
U+FD0E<br />
<br />
U+FEB4<br />
<br />
U+FEB2<br />
<br />
U+FEB3<br />
<br />
U+FD0E<br />
<br />
U+FEB4<br />
<br />
U+0633<br />
<br />
U+0633<br />
<br />
U+0633 U+0631<br />
<br />
U+0633<br />
<br />
U+FEB2<br />
<br />
U+FEB3<br />
<br />
U+FD0E<br />
<br />
U+FEB4<br />
<br />
<br />
<br />
<br />
fold <br />
<br />
<br />
fold <br />
<br />
fold <br />
fold={{[U+0640] remove}} <br />
fold={default}<br />
<br />
fold={{[U+0640] preserve}}<br />
<br />
U+0640<br />
<br />
U+0640<br />
<br />
<br />
U+0640
6.5 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
> <br />
> <br />
> <br />
> <br />
<br />
<br />
open_page( ) granularity get_text( ) <br />
<br />
> granularity=glyph <br />
<br />
<br />
<br />
<br />
<br />
<br />
> granularity=word <br />
<br />
<br />
<br />
<br />
<br />
> granularity=line
granularity=page <br />
<br />
<br />
granularity=word TET_get_text( ) <br />
<br />
<br />
open_page( ) wordseparator lineseparator <br />
<br />
lineseparator=U+000A<br />
granularity=glyph <br />
<br />
<br />
glyph <br />
<br />
<br />
> <br />
<br />
<br />
<br />
> <br />
<br />
open_page( ) punctuationbreaks false <br />
<br />
contentanalysis={punctuationbreaks=false}<br />
<br />
<br />
<br />
<br />
punctuationbreaks=true <br />
<br />
punctuationbreaks=false
open_page( ) <br />
contentanalysis={dehyphenate=false}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
open_page( ) <br />
<br />
contentanalysis={shadowdetect=false}
ä <br />
a ¨
6.6 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
open_page( ) <br />
docstyle=papers<br />
docstyle <br />
<br />
> book <br />
> business <br />
> fancy <br />
> forms <br />
> generic <br />
> magazines <br />
<br />
> papers <br />
> science <br />
<br />
> searchengine <br />
<br />
<br />
<br />
> spacegrid<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
structureanalysis <br />
layoutanalysis <br />
structureanalysis={list=true bullets={{fontname=ZapfDingbats}}}<br />
layoutanalysis = {layoutrowhint={full separation=preservecolumns}}<br />
layoutdetect=2<br />
layouteffort=high
docstyle=book docstyle=business docstyle=fancy<br />
docstyle=magazines docstyle=papers docstyle=science<br />
docstyle=spacegrid
7 Unicode <br />
7.1 Unicode <br />
<br />
<br />
<br />
www.unicode.org<br />
<br />
<br />
> <br />
<br />
<br />
>
BMP <br />
<br />
<br />
> PUA <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> UTF-8 <br />
<br />
<br />
 à <br />
> UTF-16 <br />
<br />
<br />
<br />
<br />
<br />
> UTF-32 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
þÿ<br />
ÿþ<br />
þÿ<br />
ÿþ
decompose <br />
<br />
<br />
get_text( ) <br />
get_char_info( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
<br />
> lineseparator/wordseparator
7.2 Unicode <br />
<br />
<br />
<br />
granularity=word <br />
7.2.1 <br />
<br />
<br />
<br />
fontsizerange <br />
<br />
<br />
<br />
fontsizerange={10 50}<br />
textrendering=3 <br />
<br />
<br />
<br />
textrendering=3 <br />
<br />
<br />
get_char_info( ) TET_char_info textrendering <br />
<br />
Glyph/@textrendering <br />
<br />
ignoreinvisibletext=true<br />
7.2.2 word <br />
granularity=word line page <br />
<br />
<br />
TET_char_info <br />
attributes <br />
Glyph/@hyphenation <br />
<br />
contentanalysis={dehyphenate=false}<br />
<br />
get_char_info( ) Glyph
contentanalysis={keephyphenglyphs=true}<br />
get_char_info( ) TET_char_info attributes TET_ATTR_<br />
DEHYPHENATION_ARTIFACT <br />
Glyph/@dehyphenation artifact <br />
<br />
<br />
<br />
TET_char_info attributes <br />
Glyph/@shadow<br />
<br />
<br />
contentanalysis={shadowdetect=false}<br />
<br />
<br />
<br />
<br />
unknownchar <br />
<br />
fold <br />
<br />
<br />
fold={{[:Private_Use:] remove} {[U+FFFD] remove} default}<br />
<br />
TET_char_info unknown <br />
Glyph/@unknown
7.3 Unicode <br />
<br />
<br />
<br />
> fold <br />
<br />
<br />
> decompose <br />
<br />
<br />
> normalize <br />
<br />
<br />
7.3.1 Unicode <br />
<br />
<br />
> <br />
> <br />
> <br />
<br />
<br />
TET_char_info <br />
<br />
<br />
<br />
fold <br />
<br />
fold <br />
fold <br />
fold={ {[:blank:] U+0020} } fold={ {_dehyphenation remove} }<br />
!<br />
<br />
fold={ {[:blank:] U+0020 } {_dehyphenation remove} }<br />
<br />
fold open_document( )
fold <br />
<br />
<br />
<br />
<br />
fold={{[^U+0020-U+00FF] remove}}<br />
<br />
fold={{[:Alphabetic=No:] remove}}<br />
<br />
U+0104<br />
<br />
U+0037<br />
<br />
U+0041<br />
<br />
<br />
<br />
U+0041<br />
<br />
fold={{[^[:General_Category=Decimal_Number:]] remove}}<br />
-<br />
<br />
<br />
fold={{[:Private_Use:] remove} {[U+FFFD] remove} default}<br />
<br />
fold={{[:General_Category=Dash_Punctuation:] remove}}<br />
<br />
U+0037<br />
<br />
U+0041<br />
<br />
U+FFFF<br />
<br />
U+002D<br />
<br />
U+0037<br />
<br />
<br />
<br />
<br />
fold={{[:Bidi_Control:] remove}}<br />
U+200E<br />
<br />
<br />
<br />
fold={{[:blank:] U+0020}}<br />
<br />
U+00A0<br />
<br />
U+0020<br />
<br />
fold={{[:Dash:] U+002D}}<br />
<br />
fold={{[:Unassigned:] U+FFFD}}<br />
<br />
<br />
<br />
<br />
_dehyphenation <br />
fold={{_dehyphenation preserve}}<br />
<br />
fold={{[U+0640] preserve}}<br />
<br />
<br />
fold={ {[U+2018] U+0027} {[U+2019] U+0027} {[U+201C] U+0022} {[U+201D] U+0022}}<br />
<br />
U+2011<br />
<br />
U+03A2<br />
<br />
U+002D<br />
<br />
U+0640<br />
<br />
U+201C<br />
<br />
U+002D<br />
<br />
U+FFFD<br />
<br />
U+002D<br />
<br />
U+0640<br />
<br />
U+002D U+0022
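To make the effect of such folding rules concrete, the following sketch applies three of them to an already extracted Java string: the quotation mark mapping shown above, removal of U+0640, and mapping of blank characters to U+0020. It is only an emulation for illustration, not how TET implements the fold option; the class name `FoldSketch` is hypothetical, and the Unicode set [:blank:] is approximated here by "tab or space separator".

```java
/** Hypothetical post-processing sketch (not part of TET) that mimics a few fold rules. */
final class FoldSketch {

    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == 0x0640) {
                continue;                                   // like {[U+0640] remove}
            }
            if (cp == 0x2018 || cp == 0x2019) {
                cp = 0x0027;                                // single quotes to U+0027
            } else if (cp == 0x201C || cp == 0x201D) {
                cp = 0x0022;                                // double quotes to U+0022
            } else if (cp == '\t' || Character.getType(cp) == Character.SPACE_SEPARATOR) {
                cp = 0x0020;                                // approximation of {[:blank:] U+0020}
            }
            out.appendCodePoint(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u201CA\u00A0B\u201D"));
    }
}
```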
granularity=glyph <br />
<br />
default <br />
<br />
fold={ {_dehyphenation preserve} default }<br />
fold <br />
default <br />
fold <br />
<br />
<br />
fold={{[:blank:] U+0020}}<br />
<br />
U+00A0<br />
<br />
U+0020<br />
<br />
unknownchar <br />
fold={{[:Private_Use:] unknownchar}}<br />
<br />
fold={{_dehyphenation remove}}<br />
<br />
fold={{[U+0640] remove}}<br />
<br />
<br />
<br />
fold={{[:Control:] remove} {[:Unassigned:] remove}}<br />
<br />
U+E001<br />
<br />
U+002D<br />
<br />
U+0640<br />
<br />
<br />
U+000C U+03A2<br />
<br />
U+FFFD
7.3.2 Unicode <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G729<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
U+00C4<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
U+0041 U+0308<br />
<br />
U+00C4 U+2261<br />
decompose <br />
<br />
<br />
U+00C4 U+2261<br />
<br />
canonical 1 <br />
<br />
<br />
U+00C0<br />
U+0041 U+0300<br />
<br />
U+F9F4<br />
<br />
U+2126<br />
<br />
U+3070<br />
<br />
U+FB2F<br />
<br />
U+6797<br />
<br />
U+03A9<br />
<br />
<br />
<br />
<br />
<br />
U+2126 U+306F U+2126 U+306F U+3099<br />
<br />
<br />
U+05D0 U+05B8<br />
1. www.unicode.org/Public/5.2.0/charts/
U+0633<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
decompose <br />
decompose <br />
<br />
<br />
<br />
<br />
decompose open_document( ) <br />
decompose <br />
<br />
<br />
decompose={none}<br />
<br />
U+FEB2<br />
<br />
U+FEB4<br />
<br />
U+FEB3<br />
<br />
U+00C4 U+2248<br />
<br />
<br />
decompose={wide=_none narrow=_none}<br />
<br />
decompose={canonical=_all}<br />
circle <br />
<br />
decompose={none circle=_all}<br />
<br />
<br />
decompose={circle=_all}
decompose <br />
<br />
U+00C4 U+2248<br />
<br />
circle <br />
<br />
<br />
U+3251<br />
U+0032 U+0031<br />
compat 1<br />
final<br />
font<br />
fraction 1<br />
initial<br />
isolated<br />
medial<br />
narrow<br />
nobreak<br />
none<br />
small<br />
square<br />
sub 1<br />
super 1<br />
vertical<br />
wide<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
decompose <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
U+FB01<br />
<br />
U+FEB2<br />
<br />
U+2102<br />
<br />
U+00BC<br />
<br />
U+FEB3<br />
<br />
U+FD0E<br />
<br />
U+FEB4<br />
<br />
U+FF66<br />
<br />
U+00A0<br />
<br />
<br />
U+FE50<br />
<br />
U+3314<br />
<br />
U+2081<br />
<br />
U+00AA<br />
<br />
U+2122<br />
<br />
U+FE37<br />
<br />
U+FFE1<br />
<br />
<br />
<br />
<br />
U+0066 U+0069<br />
<br />
U+0633<br />
<br />
U+0043<br />
<br />
<br />
<br />
U+0031 U+2044 U+0034<br />
<br />
U+0633<br />
<br />
<br />
U+0633 U+0631<br />
<br />
U+0633<br />
<br />
U+30F2<br />
<br />
U+0020<br />
<br />
U+002C<br />
<br />
<br />
U+30AD U+30ED<br />
<br />
U+0031<br />
<br />
U+0061<br />
<br />
<br />
U+0054 U+004D<br />
<br />
U+007B<br />
<br />
U+00A3
fraction <br />
_all <br />
<br />
<br />
granularity=glyph <br />
<br />
decompose <br />
<br />
canonical<br />
compat<br />
fraction<br />
sub<br />
super<br />
all others<br />
<br />
canonical={[U+0374 U+037E U+0387 U+1FBE U+1FEF U+1FFD U+2000 U+2001 U+2126 U+212A U+212B U+2329-U+232A]}<br />
<br />
_all <br />
U+00C4<br />
<br />
compat={[U+FB00-U+FB17]}<br />
<br />
_all <br />
U+0132<br />
<br />
fraction=_none<br />
<br />
<br />
<br />
<br />
<br />
<br />
U+0039 U+00BD<br />
U+0039 U+0031 U+2044 U+0032<br />
<br />
sub={[U+208A-U+208E]}<br />
super={[U+207A-U+207E]}<br />
<br />
fraction <br />
<br />
<br />
U+2122<br />
U+0054 U+004D<br />
<br />
circle=_all final=_all ... vertical=_all wide=_all
7.3.3 Unicode <br />
<br />
<br />
<br />
> <br />
> <br />
> <br />
> <br />
<br />
www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G21796 www.unicode.org/reports/tr15/<br />
<br />
normalize<br />
<br />
normalize=nfc<br />
decompose normalize <br />
normalize none <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Source | NFC | NFD | NFKC | NFKD<br />
U+00C4 | U+00C4 | U+0041 U+0308 | U+00C4 | U+0041 U+0308<br />
U+0041 U+0308 | U+00C4 | U+0041 U+0308 | U+00C4 | U+0041 U+0308<br />
U+0308 U+0041 | U+0308 U+0041 | U+0308 U+0041 | U+0308 U+0041 | U+0308 U+0041<br />
U+FB01 | U+FB01 | U+FB01 | U+0066 U+0069 | U+0066 U+0069<br />
U+0033 U+2075 | U+0033 U+2075 | U+0033 U+2075 | U+0033 U+0035 | U+0033 U+0035<br />
U+212B | U+00C5 | U+0041 U+030A | U+00C5 | U+0041 U+030A<br />
U+2122 | U+2122 | U+2122 | U+0054 U+004D | U+0054 U+004D<br />
U+2163 | U+2163 | U+2163 | U+0049 U+0056 | U+0049 U+0056<br />
U+FB48 | U+05E8 U+05BC | U+05E8 U+05BC | U+05E8 U+05BC | U+05E8 U+05BC<br />
U+AC00 | U+AC00 | U+1100 U+1161 | U+AC00 | U+1100 U+1161<br />
U+3062 | U+3062 | U+3061 U+3099 | U+3062 | U+3061 U+3099<br />
U+32C9 | U+32C9 | U+32C9 | U+0031 U+0030 U+6708 | U+0031 U+0030 U+6708
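The four normalization forms can be tried out independently of TET with the standard `java.text.Normalizer` class. The sketch below prints the normalized forms of a few of the sample sequences listed above; it illustrates the Unicode normalization forms themselves and is not a statement about how the normalize option is implemented. The class name `NormalizationForms` and the helper `toUPlus` are hypothetical.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

/** Sketch: compares NFC, NFD, NFKC and NFKD using the JDK's own normalizer. */
final class NormalizationForms {

    /** Formats a string as a sequence of U+XXXX values. */
    static String toUPlus(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String[] samples = { "\u00C4", "A\u0308", "\uFB01", "\u212B", "\uAC00" };
        for (String sample : samples) {
            StringBuilder row = new StringBuilder(toUPlus(sample));
            for (Normalizer.Form form : Normalizer.Form.values()) {
                row.append(" | ").append(form).append(": ")
                   .append(toUPlus(Normalizer.normalize(sample, form)));
            }
            System.out.println(row);
        }
    }
}
```

The compatibility forms (NFKC, NFKD) replace characters such as the ligature U+FB01 by their plain equivalents, while the canonical forms (NFC, NFD) leave them untouched.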
7.4 <br />
U+FFFF<br />
<br />
U+1DXXX <br />
U+20000 <br />
<br />
<br />
get_char_info( ) uv <br />
<br />
<br />
get_text( )
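The relation between a supplementary code point such as U+20000 and its UTF-16 representation can be checked with plain Java. The sketch assumes, as the fragments above suggest, that the uv value of get_char_info( ) carries a full code point while a UTF-16 string holds a surrogate pair for the same character; the class `SupplementaryChars` and its methods are hypothetical helpers, not TET API.

```java
/** Sketch: converts between code points beyond U+FFFF and UTF-16 surrogate pairs. */
final class SupplementaryChars {

    /** Code point (UTF-32 value) to a Java string, which is UTF-16 internally. */
    static String toUtf16(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** UTF-16 string to the sequence of code points it encodes. */
    static int[] toUtf32(String utf16) {
        return utf16.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = toUtf16(0x20000);
        System.out.println("UTF-16 code units: " + s.length());
        System.out.printf("high surrogate: U+%04X, low surrogate: U+%04X%n",
                (int) s.charAt(0), (int) s.charAt(1));
        System.out.printf("code point: U+%X%n", s.codePointAt(0));
    }
}
```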
7.5 Unicode <br />
<br />
<br />
<br />
Unicode <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
fold <br />
<br />
<br />
<br />
<br />
fold={ {[:Private_Use:] remove} }<br />
get_char_info( ) unknown <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
open_document( ) glyphmapping <br />
<br />
<br />
<br />
<br />
> forceencoding <br />
WinAnsiEncoding MacRomanEncoding <br />
<br />
> codelist tounicodecmap <br />
codelist
glyphlist <br />
<br />
> glyphrule <br />
encodinghint <br />
<br />
> <br />
encodinghint <br />
glyphrule encoding <br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
<br />
> <br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
1. <strong>PDFlib</strong> FontReporter Plugin www.pdflib.com/products/fontreporter
> <br />
> <br />
<br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
>
x 0x <br />
<br />
.cl <br />
codelist <br />
mycodelist>.gl mycodelist> <br />
searchpath <br />
<br />
.cl <br />
<br />
name <br />
set_option("codelist {name name.cl}");<br />
<br />
<br />
<br />
a b c d e <br />
<br />
% GlobeLogosOneUnicode<br />
x61 x0054 x0068 x0065 x0020 % The<br />
x62 x0042 x006F % Bo<br />
x63 x0073 x0074 x006F x006E x0020 % ston<br />
x64 x0047 x006C x006F % Glo<br />
x65 x0062 x0065 % be<br />
open_document( ) <br />
GlobeLogosOne.cl <br />
<br />
glyphmapping {{fontname=GlobeLogosOne codelist=GlobeLogosOne}}
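To illustrate what the code list above expresses, the following sketch parses such lines into a map from character code to Unicode string. It is a simplified reading of the format under stated assumptions: every value is hexadecimal with an x or 0x prefix, the first value on a line is the code, the remaining values are the Unicode sequence, and everything after % is a comment. The class `CodeListSketch` is hypothetical; TET reads .cl files itself, so nothing like this is needed in an application.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not part of TET): reads code list lines into code -> Unicode text. */
final class CodeListSketch {

    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> mapping = new LinkedHashMap<>();
        for (String raw : lines) {
            int comment = raw.indexOf('%');                 // strip trailing comment
            String line = (comment >= 0 ? raw.substring(0, comment) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] tokens = line.split("\\s+");
            StringBuilder unicode = new StringBuilder();
            for (int i = 1; i < tokens.length; i++) {
                unicode.appendCodePoint(parseHex(tokens[i]));
            }
            mapping.put(parseHex(tokens[0]), unicode.toString());
        }
        return mapping;
    }

    /** Accepts values such as x61, x0054 or 0x0054. */
    static int parseHex(String token) {
        String digits = token.toLowerCase();
        if (digits.startsWith("0x")) {
            digits = digits.substring(2);
        } else if (digits.startsWith("x")) {
            digits = digits.substring(1);
        }
        return Integer.parseInt(digits, 16);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "% GlobeLogosOneUnicode",
                "x61 x0054 x0068 x0065 x0020 % The",
                "x62 x0042 x006F % Bo");
        parse(lines).forEach((code, text) ->
                System.out.printf("0x%02X -> \"%s\"%n", code, text));
    }
}
```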
cmap <br />
cmap <br />
Warnock open_document( ) <br />
glyphmapping {{fontname=Warnock* tounicodecmap=warnock}}<br />
<br />
<br />
<br />
<br />
<br />
> <br />
> <br />
<br />
\% <br />
<br />
> <br />
<br />
> <br />
x 0x <br />
<br />
> <br />
<br />
.gl <br />
glyphlist <br />
myglyphlist>.gl myglyphlist> <br />
searchpath <br />
<br />
.gl <br />
<br />
name <br />
set_option("glyphlist {name name.gl}");<br />
<br />
<br />
% TeXUnicode<br />
precedesequal<br />
similarequal<br />
negationslash<br />
union<br />
prime<br />
0x227C<br />
0x2243<br />
0x2044<br />
0x222A<br />
0x2032<br />
1. partners.adobe.com/public/developer/en/acrobat/5411.ToUnicode.pdf
CMSY open_document( ) <br />
glyphmapping {{fontname=CMSY* glyphlist=tarski}}<br />
<br />
<br />
<br />
<br />
G00, G01, G02, <br />
<br />
<br />
<br />
open_document( ) encodinghint <br />
<br />
<br />
encodinghint= cp1250 <br />
<br />
<br />
open_document( ) glyphmapping fontname<br />
glyphrule <br />
> fontname <br />
> prefix <br />
> base <br />
> encoding <br />
T1, T2, T3, c00, c01, c02, , cFF <br />
00, , FF <br />
open_document( ) <br />
<br />
glyphmapping {{fontname=T* glyphrule={prefix=c base=hex encoding=winansi} }}<br />
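The idea behind the rule prefix=c base=hex can be sketched as follows: a numerical glyph name such as c41 is split into the prefix and a number in the given base, and the number is then interpreted as a code in the specified encoding. The sketch covers only the name-to-code step; the class `GlyphRuleSketch` and its method are hypothetical and merely illustrate the principle, they do not reproduce TET's internal processing.

```java
/** Hypothetical sketch (not part of TET): derives a code from a numerical glyph name. */
final class GlyphRuleSketch {

    /**
     * Returns the number encoded in the glyph name (e.g. "c41" with prefix "c" and
     * radix 16 yields 0x41), or -1 if the name does not follow the rule.
     */
    static int codeFromGlyphName(String glyphName, String prefix, int radix) {
        if (!glyphName.startsWith(prefix)) {
            return -1;
        }
        String digits = glyphName.substring(prefix.length());
        if (digits.isEmpty() || digits.charAt(0) == '+' || digits.charAt(0) == '-') {
            return -1;                                      // no digits, or a sign instead of a digit
        }
        try {
            return Integer.parseInt(digits, radix);
        } catch (NumberFormatException e) {
            return -1;                                      // not a number in the given base
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] { "c00", "c41", "cFF", "G00" }) {
            System.out.println(name + " -> " + codeFromGlyphName(name, "c", 16));
        }
    }
}
```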
<br />
<br />
<br />
<br />
fontoutline <br />
<br />
<br />
open_document( ) fontoutline <br />
<br />
WarnockPro <br />
<br />
TET_set_option("fontoutline {WarnockPro WarnockPro.otf}");
8 <br />
8.1 <br />
<br />
<br />
> .tif <br />
<br />
<br />
<br />
<br />
<br />
> .jpg DCTDecode <br />
<br />
<br />
<br />
> .jpx JPXDecode <br />
<br />
<br />
<br />
> write_image_file( ) <br />
filename <br />
<br />
> get_image_data( ) <br />
<br />
<br />
<br />
<br />
<br />
Image/@extractedAs<br />
<br />
<br />
int imageType = tet.write_image_file(doc, tet.imageid, "typeonly");<br />
/* map the numerical image type to a format name */<br />
String imageFormat;<br />
switch (imageType) {<br />
case 10:<br />
imageFormat = "TIFF";<br />
break;<br />
case 20:<br />
imageFormat = "JPEG";<br />
break;
case 30:<br />
imageFormat = "JPEG2000";<br />
break;<br />
case 40:<br />
imageFormat = "RAW";<br />
break;<br />
default:<br />
System.err.println("write_image_file() returned unknown value "<br />
+ imageType + ", skipping image, error: "<br />
+ tet.get_errmsg());<br />
}<br />
<br />
<br />
www.pdflib.com/knowledge-base/xmp-metadata/<br />
<br />
<br />
write_image_file( ) get_image_data( ) keepxmp <br />
false <br />
<br />
image_metadata
8.2 <br />
<br />
<br />
<br />
<br />
> <br />
<br />
<br />
> <br />
<br />
<br />
> <br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
> <br />
> <br />
> <br />
<br />
<br />
images[ ]/mergetype
imageanalysis={merge={disable}}<br />
open_page( ) <br />
<br />
> images[ ] length:images <br />
<br />
<br />
length:images <br />
images[ ]/mergetype <br />
artificial <br />
> images[ ] <br />
images[ ] <br />
images[ ]/mergetype consumed <br />
<br />
<br />
<br />
> <br />
> <br />
<br />
<br />
image_count <br />
<br />
No of raw image resources before merging: 82<br />
No of placed images: 12<br />
No of images after merging (all types): 83<br />
normal images: 1<br />
artificial (merged) images: 1<br />
consumed images: 81<br />
No of relevant (normal or artificial) image resources: 2<br />
<br />
<br />
imageanalysis<br />
smallimages maxarea maxcount <br />
<br />
<br />
imageanalysis={smallimages={disable}}
8.3 <br />
<br />
> <br />
<br />
<br />
<br />
<br />
PlacedImage <br />
> <br />
<br />
<br />
<br />
Image <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
get_image_info( ) imageid <br />
PlacedImage/@image <br />
<br />
get_image_info( ) <br />
imageid <br />
Image/@id <br />
<br />
<br />
<br />
< >_p< >_< >.<br />
[tif|jpg|jpx]<br />
< >_I< ID>.<br />
[tif|jpg|jpx]
8.4 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
--imageloop page <br />
images_per_page images_in_memory <br />
images_per_page <br />
<br />
<br />
get_image_info( ) <br />
imageid pcos_get_number( ) <br />
<br />
write_image_file( ) get_image_data( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
--imageloop resource <br />
image_resources <br />
<br />
<br />
<br />
<br />
<br />
<br />
pcos_get_number( ) <br />
length:images <br />
mergetype <br />
<br />
<br />
write_image_file( ) get_image_data( )
8.5 <br />
get_image_info( ) <br />
image_info <br />
> x y <br />
<br />
y topdown<br />
<br />
> width height <br />
<br />
> alpha alpha<br />
alpha <br />
alpha alpha beta topdown <br />
<br />
> beta alpha <br />
beta beta <br />
beta beta<br />
beta abs(beta) <br />
<br />
> imageid <br />
write_image_file( ) get_image_data( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
while (tet.get_image_info(page) == 1) {<br />
String imagePath = "images[" + tet.imageid + "]";<br />
/* pixel dimensions of the image resource */<br />
int width = (int) tet.pcos_get_number(doc, imagePath + "/Width");<br />
int height = (int) tet.pcos_get_number(doc, imagePath + "/Height");<br />
/* resolution in dpi: pixels per 72 points of placed size */<br />
double xDpi = 72 * width / tet.width;<br />
double yDpi = 72 * height / tet.height;<br />
...<br />
}<br />
[Figure: image geometry, showing (x, y), width, height, and alpha]<br />
<br />
<br />
determine_image_resolution
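The resolution calculation used in the sample can be isolated into a small function that runs without TET. The sketch assumes that the placed size is given in points (1/72 inch), as returned in the width and height members of get_image_info( ); the class `ImageResolution` and its method `dpi` are hypothetical names.

```java
/** Hypothetical sketch (not part of TET): resolution of a placed image in dots per inch. */
final class ImageResolution {

    /**
     * @param pixels              pixel count of the image resource along one axis
     * @param placedSizeInPoints  size of the placed image along the same axis, in points
     */
    static double dpi(int pixels, double placedSizeInPoints) {
        if (placedSizeInPoints <= 0) {
            throw new IllegalArgumentException("placed size must be positive");
        }
        return 72.0 * pixels / placedSizeInPoints;
    }

    public static void main(String[] args) {
        // A 600 pixel wide image placed 144 pt (2 inches) wide
        System.out.println("xDpi = " + dpi(600, 144.0));
    }
}
```

Since the same image resource may be placed several times at different sizes, the resolution is a property of each placed image rather than of the image resource.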
8.6 <br />
<br />
> <br />
> <br />
<br />
> <br />
<br />
<br />
<br />
<br />
> <br />
> <br />
<br />
> <br />
<br />
<br />
<br />
> <br />
<br />
> <br />
<br />
<br />
> <br />
<br />
<br />
> <br />
write_image_file( ) <br />
> <br />
<br />
> <br />
>
9 TET TETML<br />
9.1 TETML <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
--tetml <br />
file.tetml <br />
tet --tetml word file.pdf<br />
<br />
<br />
<br />
<br />
tetml <br />
<br />
<br />
<br />
<br />
tetml <br />
<br />
<br />
www.unicode.org/reports/tr16 <br />
<br />
> <br />
> <br />
> <br />
> <br />
> <br />
> <br />
> <br />
>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<strong>PDFlib</strong> GmbH<br />
2008-07-08T15:05:39+00:00<br />
FrameMaker 7.0<br />
2008-09-30T23:15:19+02:00<br />
Acrobat Distiller 7.0.5 (Windows)<br />
<strong>PDFlib</strong> FontReporter<br />
<strong>PDFlib</strong> FontReporter 1.3 Manual<br />
<br />
<br />
<br />
<br />
...XMP...<br />
<br />
<br />
<br />
tetml={} <br />
<br />
<br />
tetml={} granularity=word <br />
<br />
<br />
<br />
<strong>PDFlib</strong><br />
<br />
<br />
<br />
GmbH<br />
<br />
<br />
<br />
Munchen<br />
<br />
<br />
......<br />
<br />
<br />
<br />
<br />
<br />
<br />
......<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Glyph <br />
<br />
<br />
<strong>PDFlib</strong><br />
<br />
P<br />
D<br />
F<br />
l<br />
i<br />
b<br />
<br />
<br />
<br />
GmbH<br />
<br />
G<br />
m<br />
b<br />
H<br />
<br />
<br />
<br />
Munchen<br />
<br />
M<br />
u<br />
n
c<br />
h<br />
e<br />
n<br />
<br />
9.2 TETML <br />
<br />
<br />
<br />
<br />
<br />
> glyph <br />
<br />
<br />
> word Box <br />
<br />
<br />
<br />
Line <br />
tetml <br />
> wordplus word <br />
topdown <br />
wordplus <br />
wordplus <br />
<br />
<br />
<br />
> line Line <br />
Para line <br />
<br />
<br />
> page <br />
<br />
<br />
<br />
<br />
<br />
<br />
glyph Glyph<br />
word<br />
wordplus<br />
Para Word<br />
Line<br />
Para Word<br />
Line<br />
Table Row Cell Box <br />
Table Row Cell Box Glyph<br />
line Para Line <br />
page Para Table Row Cell
--tetml <br />
wordplus <br />
tet --tetml wordplus file.pdf<br />
<br />
<br />
> process_page( ) granularity <br />
<br />
> granularity=glyph word <br />
tetml glyphdetails <br />
<br />
wordplus <br />
<br />
granularity=word tetml={ glyphdetails={all} }<br />
<br />
<br />
<br />
<br />
<br />
<br />
glyph granularity=glyph tetml={glyphdetails={all}}<br />
word granularity=word <br />
wordplus granularity=word tetml={glyphdetails={all}}<br />
Line word granularity=word tetml={elements={line}}<br />
Line <br />
wordplus<br />
granularity=word<br />
tetml={glyphdetails={all} elements={line}}<br />
line granularity=line <br />
page granularity=page <br />
<br />
<br />
<br />
<br />
--docopt open_document( ) <br />
tetml <br />
elements <br />
<br />
<br />
tetml={ elements={nodocxmp} }
engines <br />
<br />
engines={noimage}<br />
<br />
/TET/Document/Options <br />
tetml={ elements={nooptions} }<br />
<br />
--pageopt <br />
process_page( ) <br />
tetml Glyph <br />
Glyph <br />
<br />
tetml={ glyphdetails={font} }<br />
Line <br />
tetml={ glyphdetails={font} elements={line} }<br />
Glyph sub sup <br />
<br />
tetml={ glyphdetails={sub sup} }<br />
all Glyph <br />
<br />
tetml={ glyphdetails={all} }<br />
<br />
<br />
topdown={output}<br />
<br />
<br />
<br />
contentanalysis={nopunctuationbreaks}<br />
page <br />
<br />
contentanalysis={lineseparator=U+0020}<br />
<br />
/TET/Document/Pages/Page/Options <br />
<br />
tetml={ elements={nooptions} }
Exception <br />
<br />
Object 'objects[49]/Subtype' does not exist<br />
Exception
9.3 TETML TETML <br />
<br />
<br />
http://www.pdflib.com/XML/TET3/TET-3.0<br />
<br />
http://www.pdflib.com/XML/TET3/TET-3.0.xsd<br />
<br />
<br />
<br />
<br />
<br />
<br />
Attachment<br />
Attachments<br />
Box<br />
Cell<br />
ColorSpace<br />
ColorSpaces<br />
Content<br />
Creation<br />
DocInfo<br />
Document<br />
Encryption<br />
<br />
Document <br />
<br />
name level pagenumber<br />
Attachment <br />
llx lly Box urx ury <br />
Box <br />
<br />
<br />
llx lly urx ury ulx uly lrx lry <br />
<br />
colSpan<br />
<br />
alternate base components id name<br />
ColorSpace <br />
<br />
granularity dehyphenation dropcap font geometry <br />
shadow sub sup <br />
<br />
<br />
platform tetVersion date<br />
<br />
<br />
filename pageCount filesize linearized pdfVersion pdfa <br />
pdfe pdfx tagged<br />
<br />
keylength algorithm description masterpassword userpassword noprint <br />
nomodify nocopy noannots noassemble noforms noaccessible nohiresprint <br />
plainmetadata
Exception<br />
Font<br />
Fonts<br />
Glyph<br />
<br />
<br />
<br />
Exception <br />
errnum<br />
name <br />
fullname <br />
embedded fullname id type name vertical<br />
Font <br />
<br />
<br />
Glyph <br />
Box <br />
x y width alpha beta shadow dropcap font size <br />
sub sup textrendering unknown dehyphenation <br />
Image<br />
Images<br />
Line<br />
Metadata<br />
Options<br />
Page<br />
Pages<br />
Para<br />
PlacedImage<br />
Resources<br />
Row<br />
Table<br />
TET<br />
<strong>Text</strong><br />
Word<br />
<br />
bitsPerComponent colorspace extractedAs height id mask <br />
maskonly mergetype width<br />
Image <br />
Line Word <br />
<br />
<br />
<br />
<br />
number height width topdown <br />
<br />
<br />
<br />
alpha beta height image width x y <br />
<br />
<br />
<br />
<br />
4.0 3 <br />
<br />
<br />
<br />
topdown
9.4 Transforming TETML with XSLT<br />
eXtensible Stylesheet Language Transformations <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
www.w3.org/TR/xslt <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
> <br />
> <br />
> <br />
<br />
> libxslt <br />
> <br />
<br />
<br />
<br />
FontReporter.tetml tetml2html.xsl <br />
toc-generate 0 <br />
FontReporter.html <br />
> www.saxonica.com
java -jar saxon9.jar -o FontReporter.html FontReporter.tetml tetml2html.xsl toc-generate=0<br />
> xmlsoft.org/<br />
XSLT <br />
<br />
xsltproc --output FontReporter.html --param toc-generate 0 tetml2html.xsl FontReporter.tetml<br />
> <br />
<br />
Xalan -o FontReporter.html -p toc-generate 0 FontReporter.tetml tetml2html.xsl<br />
> msxsl.exe<br />
<br />
<br />
www.microsoft.com/downloads/details.aspx?familyid=2FB55371-C94E-4373-B0E9-DB4816552E41<br />
<br />
msxsl.exe FontReporter.tetml tetml2html.xsl -o FontReporter.html toc-generate=0<br />
<br />
<br />
<br />
runxslt <br />
<br />
<br />
<br />
<br />
runxslt <br />
<br />
> javax.xml.transform <br />
runxslt.java ant <br />
build.xml<br />
<br />
> System.Xml.Xsl.XslTransform <br />
runxslt.ps1 <br />
<br />
> <br />
MSXML2.DOMDocument <br />
runxslt.vbs
<br />
> javax.xml.transform <br />
> www.php.net/<br />
manual/en/intro.xsl.php <br />
<br />
<br />
xml <br />
<br />
<br />
.xml <br />
<br />
<br />
9.5 XSLT Samples<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> <br />
<br />
ant <br />
> <br />
<br />
<br />
<br />
<br />
> <br />
<br />
<br />
Attachments xsl:template <br />
<br />
> <br />
<br />
<br />
concordance.xsl word wordplus <br />
<br />
<br />
<br />
List of words in the document along with the number of occurrences:<br />
the 207<br />
font 107<br />
of 100<br />
a 92<br />
in 83<br />
and 75<br />
fonts 64<br />
PDF 60<br />
FontReporter 58<br />
...<br />
fontfilter.xsl glyph wordplus
Text containing font 'TheSansBold-Plain' with size greater than 10:<br />
[TheSansBold-Plain/24] Contents<br />
[TheSansBold-Plain/13.98] 1<br />
[TheSansBold-Plain/13.98] Installing<br />
[TheSansBold-Plain/13.98] PDFlib<br />
[TheSansBold-Plain/13.98] FontReporter<br />
[TheSansBold-Plain/13.98] 2<br />
[TheSansBold-Plain/13.98] Working<br />
[TheSansBold-Plain/13.98] with<br />
[TheSansBold-Plain/13.98] FontReporter<br />
[TheSansBold-Plain/13.98] A<br />
[TheSansBold-Plain/13.98] Revision<br />
[TheSansBold-Plain/13.98] History<br />
[TheSansBold-Plain/24] 1<br />
[TheSansBold-Plain/24] Installing<br />
[TheSansBold-Plain/24] PDFlib<br />
[TheSansBold-Plain/24] FontReporter<br />
...<br />
fontfinder.xsl glyph wordplus <br />
<br />
<br />
<br />
<br />
TheSansExtraBold-Plain used on:<br />
page 1:<br />
(111, 636), (165, 636), (219, 636), (292, 636), (301, 636), (178, 603), (221, 603), (226, 603),<br />
(272, 603), (277, 603), (102, 375), (252, 375), (261, 375), (267, 375)<br />
TheSans-Plain used on:<br />
page 1:<br />
(102, 266), (119, 266), (179, 266), (208, 266), (296, 266), (346, 266), (367, 266)<br />
...<br />
fontstat.xsl glyph wordplus <br />
<br />
<br />
<br />
<br />
19894 total glyphs in the document; breakdown by font:<br />
68.71% ThesisAntiqua-Normal: 13669 glyphs<br />
22.89% TheSans-Italic: 4553 glyphs<br />
6.38% TheSansBold-Plain: 1269 glyphs<br />
0.9% TheSansMonoCondensed-Plain: 179 glyphs<br />
0.49% TheSansBold-Italic: 98 glyphs<br />
0.27% TheSansExtraBold-Plain: 54 glyphs<br />
0.21% TheSerif-Caps: 42 glyphs<br />
0.15% TheSans-Plain: 29 glyphs<br />
0.01% Gen_TheSans-Plain: 1 glyphs
index.xsl word wordplus <br />
<br />
<br />
<br />
Alphabetical list of words in the document along with their page number:<br />
A<br />
about 2 7 8<br />
access 8 12<br />
accessible 11<br />
achieving 9 12<br />
Acrobat 2 5 7 8 9 10 11 14 15 17<br />
ActiveX 2<br />
actual 9 12<br />
actually 11 12 14<br />
addition 9<br />
Additional 12<br />
additions 17<br />
address 9 12<br />
addressed 9<br />
addressing 9<br />
Adobe 2 5 8 12 14<br />
...<br />
metadata.xsl <br />
<br />
<br />
<br />
dc:creator = PDFlib GmbH<br />
xmp:CreatorTool = FrameMaker 7.0<br />
table.xsl word wordplus page <br />
<br />
<br />
<br />
<br />
tetml2html.xsl wordplus <br />
<br />
<br />
<br />
> H1 H2 <br />
> <br />
<br />
> <br />
<br />
> <br />
resource tet --image --tetml file.pdf
textonly.xsl <br />
Text
10 The pCOS Interface<br />
PDFlib Comprehensive Object Syntax <br />
<br />
<br />
<br />
<br />
www.pdflib.com/pcos-cookbook/
11 TET Library API Reference<br />
11.1 <br />
<br />
<br />
optlist <br />
<br />
<br />
<br />
<br />
<br />
<br />
sprintf( ) <br />
<br />
AppendFormat( ) <br />
<br />
Append( ) <br />
<br />
AppendFormat( ) Append( ) <br />
<br />
11.2 Option List Syntax<br />
<br />
<br />
> <br />
<br />
<br />
> {} <br />
> <br />
<br />
<br />
> <br />
> <br />
<br />
<br />
>
key=value<br />
key = value<br />
key value<br />
key1 = value1 key2 = value2<br />
<br />
<br />
<br />
<br />
<br />
key value2 <br />
key=value1 key=value2<br />
<br />
<br />
{} <br />
searchpath={/usr/lib/tet d:\tet}<br />
(2)<br />
<br />
} { <br />
<br />
<br />
fold={ {[:Private_Use:] remove} {[U+FFFD] remove} }<br />
(2)<br />
<br />
<br />
fold={ {[:Private_Use:] remove} }<br />
(1)<br />
<br />
<br />
<br />
<br />
<br />
<br />
contentanalysis <br />
punctuationbreaks <br />
contentanalysis={punctuationbreaks=false}<br />
glyphmapping <br />
<br />
glyphmapping={ {fontname=GlobeLogosOne codelist=GlobeLogosOne} }<br />
glyphmapping
glyphmapping { {fontname=CMSY* glyphlist=tarski} {fontname=ZEH* glyphlist=zeh}}<br />
<br />
fontname <br />
glyphmapping={ {fontname={Globe Logos One} codelist=GlobeLogosOne} }<br />
<br />
fonttypes={Type1 TrueType}<br />
<br />
default <br />
<br />
fold={ {[:Private_Use:] remove} {[U+FFFD] remove} default }<br />
<br />
includeboxes={{10 20 30 40}}<br />
<br />
<br />
key1 {value1}key2 {value2}<br />
!<br />
Unknown option 'value2' <br />
<br />
key{value}<br />
key={{value1}{value2}}<br />
!<br />
!<br />
<br />
key={open brace {}<br />
!<br />
Braces aren't balanced in option list 'key={open brace {}' <br />
<br />
<br />
key={closing brace \} and open brace \{}<br />
!<br />
<br />
<br />
filename={C:\path\name\}<br />
filename={C:\path\name\\}<br />
!<br />
!
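The brace rules illustrated by the examples above can be checked mechanically. The following Python sketch is not part of TET: `braces_balanced` and `top_level_items` are hypothetical helper names, and treating a backslash as escaping the character that follows it is an assumption drawn from the examples in this section.

```python
def braces_balanced(optlist):
    """Return True if all unescaped braces in optlist are balanced."""
    depth = 0
    i = 0
    while i < len(optlist):
        ch = optlist[i]
        if ch == "\\":
            i += 2  # assumption: a backslash escapes the next character
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0


def top_level_items(text):
    """Split a list value into its top-level items.

    Items are separated by whitespace; a {...} group counts as one item
    and its outer braces are dropped. Escapes are kept verbatim.
    """
    items, buf, depth, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])
            i += 2
            continue
        if ch == "{":
            if depth > 0:
                buf.append(ch)
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth > 0:
                buf.append(ch)
            elif depth == 0:
                items.append("".join(buf))
                buf = []
        elif ch.isspace() and depth == 0:
            if buf:
                items.append("".join(buf))
                buf = []
        else:
            buf.append(ch)
        i += 1
    if buf:
        items.append("".join(buf))
    return items
```

Applied to the list value `/usr/lib/tet d:\tet` the second helper yields two items, matching the count given for the `searchpath` example, while `filename={C:\path\name\}` is reported as unbalanced because the final backslash escapes the closing brace.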
11.3 <br />
<br />
<br />
{} <br />
password={ secret string }<br />
contents={length=3mm}<br />
(3)<br />
(1)<br />
{} \ <br />
<br />
password={weird\}string}<br />
()<br />
<br />
<br />
filename={C:\path\name\\}<br />
(1)<br />
<br />
{}<br />
<br />
<br />
<br />
PDFlib <br />
<br />
<br />
<br />
<br />
<br />
x X 0x 0X U+ <br />
xAD 0xAD U+00AD <br />
shy #xAD #173 <br />
<br />
unknownchar=?<br />
unknownchar=63<br />
unknownchar=x3F<br />
unknownchar=0x3F<br />
unknownchar=U+003F<br />
lineseparator={CRLF}<br />
()<br />
(10)<br />
(16)<br />
(16)<br />
(Unicode)<br />
()<br />
<br />
replacementchar=3<br />
(U+0033 THREE, not U+0003!)<br />
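The `unknownchar` examples above all name the same character in different notations, and `replacementchar=3` shows that a single character is taken literally. A minimal Python sketch of that interpretation follows; `parse_unichar` is a hypothetical helper, not a TET function, and it deliberately ignores name references such as `shy`.

```python
import re


def parse_unichar(value):
    """Return the code point denoted by a Unichar-style option value.

    Assumed rules, taken from the examples in this section:
    - a single character stands for itself ("?" or "3")
    - x, X, 0x, 0X, U+ and #x introduce a hexadecimal value
    - a plain or #-prefixed run of digits is a decimal value
    """
    if len(value) == 1:
        return ord(value)
    m = re.fullmatch(r"(?:0[xX]|[xX]|U\+|#x)([0-9A-Fa-f]+)", value)
    if m:
        return int(m.group(1), 16)
    m = re.fullmatch(r"#?([0-9]+)", value)
    if m:
        return int(m.group(1))
    raise ValueError("unsupported Unichar notation: " + value)
```

Under these assumptions `?`, `63`, `x3F`, `0x3F` and `U+003F` all map to the same code point, while `3` maps to U+0033 rather than U+0003.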
<br />
>
U+FB00-U+FB17 <br />
<br />
U+0048U+006C<br />
> <br />
<br />
\uhhhh U+hhhh<br />
U+hhhhh<br />
\x{hhhhhh}<br />
\Uhhhhhhhh<br />
\\<br />
> <br />
type <br />
www.unicode.org/Public/UNIDATA/PropertyAliases.txt value <br />
www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt <br />
[:type=value:]<br />
[:^type=value:]<br />
\p{type=value}<br />
\P{type=value}<br />
type= <br />
<br />
> <br />
[[:letter:] [:number:]]<br />
[[:letter:] & [U+0061-U+007A]]<br />
[[:letter:]-[U+0061-U+007A]]<br />
[^U+0061-U+007A] <br />
<br />
<br />
<br />
unicode.org/cldr/utility/list-unicodeset.jsp<br />
true false <br />
true name=false noname <br />
usehostfonts<br />
nousehostfonts<br />
(usehostfonts=true)<br />
(usehostfonts=false)<br />
<br />
<br />
clippingarea=cropbox
[U+0061-U+007A]<br />
[U+0640]<br />
[\x{0640}]<br />
[U+FB00-U+FB17]<br />
[^U+0061-U+007A]<br />
[:Lu:]<br />
[:UppercaseLetter:]<br />
[:L:]<br />
[:Letter:]<br />
[:General_Category=Dash_Punctuation:]<br />
[:Alphabetic=No:]<br />
[:Private_Use:]<br />
<br />
a z <br />
<br />
<br />
<br />
a z <br />
<br />
<br />
<br />
Dash_Punctuation <br />
<br />
<br />
<br />
<br />
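To make the simplest Unicode set patterns above concrete, the following Python sketch tests membership for a single code point or range, optionally negated with `^`. It is only an illustration: `simple_set_contains` is a hypothetical name, and property-based sets such as `[:Lu:]` or set operations are not covered.

```python
import re

_SIMPLE_SET = re.compile(
    r"\[(\^?)U\+([0-9A-Fa-f]{4,6})(?:-U\+([0-9A-Fa-f]{4,6}))?\]"
)


def simple_set_contains(pattern, ch):
    """Check whether ch belongs to a set like [U+0061-U+007A] or [^U+0640]."""
    m = _SIMPLE_SET.fullmatch(pattern)
    if not m:
        raise ValueError("pattern not supported by this sketch: " + pattern)
    negated = m.group(1) == "^"
    low = int(m.group(2), 16)
    high = int(m.group(3), 16) if m.group(3) else low
    inside = low <= ord(ch) <= high
    return inside != negated
```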
<br />
-12345<br />
0<br />
0xFF<br />
<br />
<br />
size = -123.45<br />
size = -123,45<br />
size = -1.2345E2<br />
size = -1.2345e+2
11.4 <br />
x y <br />
<br />
<br />
includebox = {{0 0 500 100} {0 500 500 600}}
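As an illustration of how a list of rectangles such as the `includebox` value above breaks down into numbers, here is a small Python sketch. The helper names are hypothetical, the order of the four values (lower left x and y, then upper right x and y) is an assumption, and the comma handling mirrors the `size = -123,45` example among the number formats.

```python
import re


def parse_number(token):
    """Parse a float, accepting a period or a comma as decimal separator."""
    return float(token.replace(",", "."))


def parse_rectangles(value):
    """Return a list of 4-tuples from a value like {{0 0 500 100} {...}}."""
    rects = []
    # Each innermost {...} group is expected to hold exactly four numbers.
    for group in re.findall(r"\{([^{}]*)\}", value):
        numbers = [parse_number(t) for t in group.split()]
        if len(numbers) != 4:
            raise ValueError("a rectangle needs four numbers: " + group)
        rects.append(tuple(numbers))
    return rects
```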
11.5 <br />
11.5.1 Option Handling<br />
C++ void set_option(string optlist)<br />
C# Java void set_option(String optlist)<br />
Perl PHP set_option(string optlist)<br />
VB RB Sub set_option(optlist As String)<br />
C void TET_set_option(TET *tet, const char *optlist)<br />
<br />
optlist <br />
<br />
searchpath <br />
<br />
asciifile cmap codelist encoding <br />
filenamehandling fontoutline glyphlist license licensefile logging userlog <br />
outputformat resourcefile searchpath<br />
<br />
<br />
<br />
<br />
TET_set_option( ) <br />
<br />
asciifile<br />
cmap 1, 2<br />
codelist 1, 2<br />
encoding 1, 2<br />
filenamehandling<br />
<br />
<br />
<br />
true false<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
unicode legacy <br />
ascii <br />
basicebcdic <br />
basicebcdic_37<br />
<br />
honorlang utf8 UTF-8 cpXXXX CPXXXX iso8859-x ISO-8859-x <br />
legacy auto <br />
honorlang <br />
unicode
TET_set_option( ) <br />
<br />
fontoutline 1, 2<br />
glyphlist 1, 2<br />
hostfont 1, 2<br />
license<br />
licensefile<br />
logging 1<br />
userlog<br />
outputformat<br />
resourcefile<br />
searchpath 1<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
open_document*( ) <br />
<br />
<br />
TET_open_document*( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
TET_new( ) <br />
<br />
<br />
TET_get_text( ) <br />
ebcdicutf8 <br />
utf8<br />
utf8 <br />
<br />
ebcdicutf8 <br />
<br />
<br />
utf16 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
tet.upr upr
11.5.2 Setup<br />
C TET *TET_new(void)<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Java void delete( )<br />
C# void Dispose( )<br />
C void TET_delete(TET *tet)<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Dispose( ) <br />
<br />
11.5.3 PDFlib Virtual File System (PVF)<br />
C++ void create_pvf(string filename, const void *data, size_t size, string optlist)<br />
C# Java void create_pvf(String filename, byte[] data, String optlist)<br />
Perl PHP create_pvf(string filename, string data, string optlist)<br />
VB RB Sub create_pvf(filename As String, data, optlist As String)<br />
C void TET_create_pvf(TET *tet,<br />
const char *filename, int len, const void *data, size_t size, const char *optlist)<br />
<br />
<br />
filename <br />
<br />
len filename <br />
len=0 <br />
data <br />
<br />
<br />
size
optlist<br />
copy<br />
<br />
<br />
<br />
TET_delete_pvf( ) <br />
TET_delete( ) <br />
<br />
<br />
<br />
filename <br />
filename <br />
<br />
copy TET_delete_pvf( ) <br />
<br />
TET_create_pvf( ) <br />
<br />
copy<br />
<br />
<br />
<br />
<br />
false copy <br />
<br />
C++ int delete_pvf(string filename)<br />
C# Java int delete_pvf(String filename)<br />
Perl PHP int delete_pvf(string filename)<br />
VB RB Function delete_pvf(filename As String) As Long<br />
C int TET_delete_pvf(TET *tet, const char *filename, int len)<br />
<br />
<br />
filename<br />
TET_create_pvf( ) <br />
len filename <br />
len=0 <br />
<br />
<br />
<br />
<br />
filename <br />
filename <br />
<br />
filename TET_delete( ) <br />
<br />
TET_create_pvf( ) copy <br />
copy
11.5.4 Unicode Conversion Functions<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
printf( ) <br />
<br />
<br />
C++ string utf8_to_utf16(string utf8string, string ordering)<br />
Perl PHP string utf8_to_utf16(string utf8string, string ordering)<br />
C const char *TET_utf8_to_utf16(TET *tet, const char *utf8string, const char *ordering, int *size)<br />
<br />
utf8string <br />
<br />
<br />
ordering <br />
> utf16 <br />
<br />
> utf16le \xFF\xFE<br />
<br />
> utf16be <br />
\xFE\xFF <br />
size <br />
<br />
<br />
<br />
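The effect of the `ordering` values on the result can be pictured with plain Python codecs. This sketch does not call TET. It assumes that `utf16le` and `utf16be` prepend the byte order marks shown above (`\xFF\xFE` and `\xFE\xFF`) and that `utf16` means the platform's native byte order without a BOM; the function merely borrows the name of the API call for readability.

```python
import codecs
import sys


def utf8_to_utf16(utf8_bytes, ordering):
    """Convert UTF-8 bytes to UTF-16 bytes according to ordering."""
    text = utf8_bytes.decode("utf-8")
    if ordering == "utf16le":
        # Little endian, prefixed with the LE BOM FF FE.
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    if ordering == "utf16be":
        # Big endian, prefixed with the BE BOM FE FF.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    if ordering == "utf16":
        # Assumption: native byte order, no BOM.
        native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
        return text.encode(native)
    raise ValueError("unknown ordering: " + ordering)
```

For the two characters U+0048 and U+006C, `utf16le` gives the bytes FF FE 48 00 6C 00 and `utf16be` gives FE FF 00 48 00 6C.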
C++ string utf16_to_utf8(string utf16string)<br />
Perl PHP string utf16_to_utf8(string utf16string)<br />
C const char *TET_utf16_to_utf8(TET *tet, const char *utf16string, int len, int *size)<br />
<br />
utf16string
len<br />
utf16string <br />
size <br />
<br />
<br />
\xEF\xBB\xBF<br />
<br />
<br />
C++ string utf32_to_utf16(string utf32string, string ordering)<br />
Perl PHP string utf32_to_utf16(string utf32string, string ordering)<br />
C const char *TET_utf32_to_utf16(TET *tet, const char *utf32string, int len, const char *ordering, int *size)<br />
<br />
utf32string <br />
<br />
len<br />
utf32string <br />
ordering <br />
> utf16 <br />
<br />
> utf16le <br />
\xFF\xFE <br />
> utf16be <br />
\xFE\xFF <br />
size <br />
<br />
<br />
<br />
<br />
<br />
C++ string utf8_to_utf32(string utf8string, string ordering)<br />
Perl PHP string utf8_to_utf32(string utf8string, string ordering)<br />
C const char *TET_utf8_to_utf32(TET *tet, const char *utf8string, const char *ordering, int *size)<br />
<br />
utf8string <br />
<br />
<br />
ordering<br />
<br />
size
C++ string utf32_to_utf8(string utf32string)<br />
Perl PHP string utf32_to_utf8(string utf32string)<br />
C const char *TET_utf32_to_utf8(TET *tet, const char *utf32string, int len, int *size)<br />
<br />
utf32string <br />
<br />
len<br />
utf32string <br />
size <br />
<br />
<br />
<br />
\xEF\xBB\xBF<br />
<br />
<br />
<br />
C++ string utf16_to_utf32(string utf16string, string ordering)<br />
Perl PHP string utf16_to_utf32(string utf16string, string ordering)<br />
C const char *TET_utf16_to_utf32(TET *tet, const char *utf16string, int len, const char *ordering, int *size)<br />
<br />
utf16string <br />
<br />
len<br />
utf16string <br />
ordering<br />
<br />
size
11.5.5 Exception Handling<br />
C++ string get_apiname( )<br />
C# Java String get_apiname( )<br />
Perl PHP string get_apiname( )<br />
VB RB Function get_apiname( ) As String<br />
C const char *TET_get_apiname(TET *tet)<br />
<br />
<br />
<br />
<br />
C++ string get_errmsg( )<br />
C# Java String get_errmsg( )<br />
Perl PHP string get_errmsg( )<br />
VB RB Function get_errmsg( ) As String<br />
C const char *TET_get_errmsg(TET *tet)<br />
<br />
<br />
<br />
<br />
<br />
C++ int get_errnum( )<br />
C# Java int get_errnum( )<br />
Perl PHP long get_errnum( )<br />
VB RB Function get_errnum( ) As Long<br />
C int TET_get_errnum(TET *tet)<br />
<br />
<br />
<br />
<br />
<br />
C TET_TRY(tet)<br />
C TET_CATCH(tet)<br />
C TET_RETHROW(tet)<br />
C TET_EXIT_TRY(tet)<br />
<br />
TET_CATCH( ) TET_TRY( )
TET_RETHROW( ) <br />
<br />
<br />
<br />
11.5.6 Logging<br />
<br />
<br />
TET_set_option( ) <br />
<br />
TET_set_option( ) <br />
<br />
logging<br />
userlog<br />
<br />
<br />
<br />
<br />
> TET_set_option( ) logging <br />
tet.set_option("logging={filename=debug.log remove}")<br />
> TETLOGGING <br />
<br />
TET_set_option( ) logging <br />
<br />
<br />
disable<br />
enable<br />
filename<br />
flush<br />
remove<br />
stringlimit<br />
<br />
disable <br />
false<br />
<br />
stdout stderr <br />
<br />
filename <br />
tet.log / /tmp <br />
true <br />
<br />
<br />
false false<br />
true <br />
false
TET_set_option( ) logging <br />
<br />
classes<br />
<br />
<br />
<br />
<br />
{api=1 warning=1} <br />
api api=2 <br />
<br />
api=3 <br />
<br />
filesearch <br />
<br />
resource <br />
<br />
user userlog <br />
warning <br />
warning=2 TET_get_errmsg( )
11.6 Document Functions<br />
C++ int open_document(string filename, string optlist)<br />
C# Java int open_document(String filename, String optlist)<br />
Perl PHP long open_document(string filename, string optlist)<br />
VB RB Function open_document(filename As String, optlist As String) As Long<br />
C int TET_open_document(TET *tet, const char *filename, int len, const char *optlist)<br />
<br />
filename <br />
searchpath <br />
<br />
<br />
<br />
len = 0 <br />
<br />
<br />
len filename <br />
len=0 <br />
optlist <br />
checkglyphlists decompose encodinghint fold glyphmapping <br />
lineseparator normalize inmemory password repair requiredmode shrug tetml <br />
usehostfonts wordseparator zoneseparator<br />
<br />
<br />
<br />
TET_get_errmsg( ) <br />
<br />
<br />
<br />
<br />
password <br />
<br />
requiredmode <br />
shrug <br />
<br />
<br />
<br />
<br />
QSYS.lib <br />
QSYS.lib <br />
<br />
QSYS.lib
TET_open_document( ) TET_open_document_callback( ) <br />
<br />
checkglyphlists<br />
decompose<br />
encodinghint<br />
<br />
true condition=allfonts <br />
<br />
<br />
false<br />
<br />
<br />
<br />
<br />
normalize <br />
none <br />
normalize decompose=none <br />
<br />
<br />
none <br />
default <br />
<br />
<br />
canonical circle compat final font fraction initial isolated medial narrow nobreak <br />
small square sub super vertical wide<br />
<br />
<br />
<br />
<br />
<br />
_all <br />
<br />
_none <br />
<br />
<br />
none <br />
winansi
TET_open_document( ) TET_open_document_callback( ) <br />
<br />
fold<br />
glyphmapping<br />
keeppua<br />
lineseparator<br />
<br />
<br />
<br />
<br />
lineseparator wordseparator <br />
<br />
<br />
<br />
none <br />
<br />
default <br />
<br />
<br />
fold <br />
<br />
<br />
<br />
_dehyphenation<br />
<br />
TET_get_char_info( ) attributes <br />
@dehyphenation <br />
<br />
<br />
(Unichar) <br />
<br />
remove <br />
preserve <br />
unknownchar<br />
unknownchar <br />
<br />
<br />
<br />
<br />
* <br />
<br />
<br />
<br />
<br />
<br />
fold={{[:Private_Use:] preserve}} fold={{[:Private_Use:] unknownchar}} <br />
<br />
granularity=zone page
TET_open_document( ) TET_open_document_callback( ) <br />
<br />
normalize<br />
inmemory<br />
password<br />
repair<br />
requiredmode<br />
shrug<br />
<br />
<br />
none <br />
nfc <br />
nfd <br />
nfkc <br />
nfkd <br />
decompose <br />
normalize normalize none <br />
decompose=none normalize <br />
decompose <br />
TET_open_document( ) true <br />
<br />
false <br />
false<br />
<br />
<br />
<br />
<br />
<br />
shrug <br />
<br />
<br />
<br />
auto <br />
force <br />
auto <br />
none <br />
<br />
minimum <br />
restricted full <br />
<br />
<br />
requiredmode=minimum <br />
full<br />
true <br />
shrug <br />
false
TET_open_document( ) TET_open_document_callback( ) <br />
<br />
tetml<br />
<br />
TET_process_page( ) <br />
<br />
elements <br />
<br />
docinfo /TET/Document/DocInfo <br />
docxmp /TET/Document/Metadata <br />
options /TET/Document/Options /TET/Document/Pages/Page/Options<br />
encodingname<br />
<br />
UTF-8 <br />
_none <br />
<br />
UTF-8 encoding="UTF-8" <br />
<br />
<br />
<br />
<br />
filename filename <br />
TET_get_xml_data( ) <br />
<br />
<br />
unknownchar<br />
usehostfonts<br />
wordseparator<br />
<br />
<br />
<br />
unknownchar <br />
fold={{[:Private_Use:] unknownchar}} fold={{[:Private_Use:] remove}} <br />
true <br />
<br />
true<br />
granularity=line page <br />
<br />
<br />
<br />
TET_open_document( ) TET_open_document_callback( ) glyphmapping <br />
<br />
codelist<br />
fontname<br />
fonttypes<br />
<br />
<br />
<br />
<br />
<br />
<br />
*<br />
<br />
* Type1 MMType1 TrueType CIDFontType2 <br />
CIDFontType0 Type3*
TET_open_document( ) TET_open_document_callback( ) glyphmapping <br />
<br />
forceencoding<br />
forcettsymbolencoding<br />
globalglyphlist<br />
glyphlist<br />
glyphrule<br />
override<br />
ignoretounicodecmap<br />
tounicodecmap<br />
<br />
winansi macroman Custom <br />
<br />
<br />
MacRoman WinAnsi MacExpert <br />
<br />
<br />
<br />
<br />
auto <br />
auto -<br />
encodinghint <br />
<br />
encodinghint builtin<br />
<br />
builtin <br />
<br />
<br />
true <br />
false<br />
<br />
<br />
<br />
prefix <br />
base <br />
ascii <br />
1 <br />
auto <br />
<br />
dec <br />
hex <br />
encoding <br />
none <br />
true <br />
false<br />
glyphlist glyphrule true <br />
<br />
<br />
true<br />
<br />
<br />
* MSTT* <br />
<br />
winansi macroman macroman_apple macroman_euro <br />
ebcdic ebcdic_37 iso8859-X cpXXXX U+XXXX
C++ int open_document_callback(void *opaque, size_t filesize,<br />
size_t (*readproc)(void *opaque, void *buffer, size_t size),<br />
int (*seekproc)(void *opaque, long offset),<br />
string optlist)<br />
C int TET_open_document_callback(TET *tet, void *opaque, size_t filesize,<br />
size_t (*readproc)(void *opaque, void *buffer, size_t size),<br />
int (*seekproc)(void *opaque, long offset),<br />
const char *optlist)<br />
<br />
opaque <br />
<br />
<br />
filesize<br />
<br />
readproc size buffer <br />
<br />
<br />
seekproc offset <br />
<br />
<br />
optlist<br />
<br />
<br />
<br />
<br />
TET_open_document( ) <br />
TET_open_document( ) <br />
<br />
C++ void close_document(int doc)<br />
C# Java void close_document(int doc)<br />
Perl PHP TET_close_document(resource tet, long doc)<br />
VB RB Sub close_document(doc As Long)<br />
C void TET_close_document(TET *tet, int doc)<br />
<br />
doc<br />
TET_open_document*( ) <br />
TET_delete( )
11.7 Page Functions<br />
C++ int open_page(int doc, int pagenumber, string optlist)<br />
C# Java int open_page(int doc, int pagenumber, String optlist)<br />
Perl PHP long open_page(long doc, long pagenumber, string optlist)<br />
VB RB Function open_page(doc As Long, pagenumber As Long, optlist As String) As Long<br />
C int TET_open_page(TET *tet, int doc, int pagenumber, const char *optlist)<br />
<br />
doc<br />
TET_open_document*( ) <br />
pagenumber <br />
TET_pcos_get_number( ) length:pages <br />
optlist <br />
clippingarea contentanalysis docstyle excludebox fontsizerange <br />
granularity ignoreinvisibletext imageanalysis includebox layoutanalysis <br />
layouteffort skipengines structureanalysis topdown<br />
<br />
<br />
TET_get_errmsg( )
TET_open_page( ) TET_process_page( ) <br />
<br />
clippingarea<br />
docstyle<br />
excludebox<br />
granularity<br />
contentanalysis<br />
fontsizerange<br />
ignoreinvisibletext<br />
imageanalysis<br />
<br />
includebox <br />
cropbox <br />
mediabox <br />
cropbox <br />
bleedbox <br />
trimbox <br />
artbox <br />
unlimited <br />
granularity=glyph <br />
<br />
<br />
<br />
<br />
book <br />
business <br />
fancy <br />
forms <br />
generic <br />
magazines <br />
papers <br />
science <br />
searchengine<br />
<br />
<br />
spacegrid <br />
<br />
<br />
<br />
<br />
<br />
unlimited <br />
{ 0 unlimited }<br />
TET_get_text( ) glyph <br />
<br />
word <br />
glyph <br />
<br />
word <br />
<br />
line <br />
<br />
page <br />
<br />
true <br />
false
TET_open_page( ) TET_process_page( ) <br />
<br />
includebox<br />
layouteffort<br />
skipengines<br />
layoutanalysis<br />
structureanalysis<br />
topdown<br />
<br />
<br />
<br />
granularity=glyph <br />
<br />
<br />
<br />
none low medium high extra <br />
low<br />
<br />
<br />
<br />
<br />
text <br />
image <br />
granularity=glyph <br />
<br />
y <br />
<br />
<br />
input true <br />
false <br />
includebox excludebox<br />
output true <br />
false <br />
TET_char_info y alpha beta<br />
TET_image_info y alpha beta<br />
Glyph/@y Glyph/@alpha Glyph/@beta Box/@lly Box/@ury PlacedImage/@y PlacedImage/@alpha PlacedImage/@beta<br />
TET_open_page( ) TET_process_page( ) contentanalysis <br />
<br />
bidi<br />
bidilevel<br />
dehyphenate<br />
<br />
granularity=glyph <br />
<br />
logical <br />
visual <br />
<br />
logical <br />
<br />
auto <br />
auto <br />
ltr <br />
rtl <br />
<br />
true <br />
keephyphens <br />
true
TET_open_page( ) TET_process_page( ) contentanalysis <br />
<br />
dropcapsize<br />
dropcapratio<br />
includeboxorder<br />
keep<br />
hyphenglyphs<br />
lineseparator<br />
linespacing<br />
maxwords<br />
<br />
<br />
<br />
<br />
<br />
dropcapsize dropcapratio <br />
<br />
<br />
<br />
includebox <br />
<br />
0 <br />
<br />
<br />
<br />
1 <br />
<br />
<br />
<br />
<br />
<br />
<br />
2 <br />
<br />
<br />
<br />
<br />
true dehyphenate=true <br />
get_char_info( ) Glyph <br />
<br />
fold={{_dehyphenation remove}} <br />
get_text( ) <br />
false<br />
<br />
small medium <br />
large medium<br />
unlimited <br />
<br />
<br />
<br />
<br />
unlimited
TET_open_page( ) TET_process_page( ) contentanalysis <br />
<br />
merge<br />
punctuation<br />
breaks<br />
superscript<br />
wordseparator<br />
<br />
<br />
0 <br />
<br />
1 <br />
<br />
<br />
2 <br />
<br />
<br />
<br />
keep <br />
split punctuationbreaks <br />
keep <br />
true <br />
true<br />
granularity=word true <br />
<br />
true<br />
<br />
0 <br />
1 <br />
2 <br />
<br />
TET_open_document*( ) <br />
TET_open_page( ) TET_process_page( ) layoutanalysis <br />
numericentities<br />
shadowdetect<br />
<br />
layoutastable<br />
layoutcolumnhint<br />
layoutdetect<br />
<br />
true <br />
<br />
false true<br />
<br />
multicolumn <br />
multicolumn<br />
<br />
none <br />
singlecolumn<br />
<br />
<br />
0 <br />
1 <br />
2 <br />
<br />
<br />
3
TET_open_page( ) TET_process_page( ) layoutanalysis <br />
<br />
mergetables<br />
splithint<br />
layoutrowhint<br />
standalonefontsize<br />
supertablecolumns<br />
tabledetect<br />
<br />
none <br />
full <br />
none <br />
separation <br />
<br />
preservecolumns<br />
<br />
<br />
<br />
thick <br />
<br />
<br />
<br />
<br />
thin <br />
<br />
<br />
layoutanalysis = {layoutrowhint={full separation=thick}}<br />
<br />
<br />
none <br />
down <br />
none <br />
up <br />
updown <br />
<br />
<br />
includebox includebox <br />
<br />
x <br />
<br />
y <br />
<br />
<br />
layoutastable=true <br />
<br />
<br />
<br />
<br />
0 <br />
1 <br />
2
TET_open_page( ) TET_process_page( ) imageanalysis <br />
<br />
smallimages<br />
merge<br />
<br />
<br />
<br />
disable true false<br />
maxarea <br />
<br />
maxcount <br />
<br />
<br />
<br />
<br />
<br />
disable true false<br />
gap <br />
<br />
TET_open_page( ) TET_process_page( ) structureanalysis <br />
<br />
bullets<br />
list<br />
paragraph<br />
table<br />
<br />
list=true <br />
<br />
bulletchars<br />
<br />
fontname <br />
<br />
fontname <br />
bulletchars <br />
<br />
<br />
bullets={{fontname=ZapfDingbats}}<br />
bullets={{bulletchars={U+2022}}}<br />
bullets={{fontname=KozGoPro-Medium bulletchars={U+2460 U+2461 U+2462 U+2463 U+2464}}}<br />
false false <br />
<br />
true false <br />
<br />
true false
C++ void close_page(int page)<br />
C# Java void close_page(int page)<br />
Perl PHP close_page(long page)<br />
VB RB Sub close_page(page As Long)<br />
C void TET_close_page(TET *tet, int page)<br />
<br />
page<br />
TET_open_page( ) <br />
<br />
TET_close_document( )
11.8 <br />
C++ string get_text(int page)<br />
C# Java String get_text(int page)<br />
Perl PHP string get_text(long page)<br />
VB RB Function get_text(page As Long) As String<br />
C const char *TET_get_text(TET *tet, int page, int *len)<br />
<br />
page<br />
TET_open_page( ) <br />
len <br />
outputformat=utf16 <br />
outputformat=utf8 <br />
<br />
<br />
<br />
TET_open_page( ) granularity granularity=glyph <br />
<br />
<br />
<br />
<br />
TET_get_errnum( ) <br />
TET_set_option( ) outputformat <br />
<br />
<br />
*len=0
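The `len` value depends on the output format. Assuming, as the parameter description above suggests, that it counts UTF-16 code units for `outputformat=utf16` and bytes for `outputformat=utf8`, the two lengths of the same text can differ. The Python sketch below only illustrates that difference with hypothetical helper names; it does not call TET.

```python
def utf16_code_units(text):
    """Number of 16-bit units needed for text (surrogate pairs count twice)."""
    return len(text.encode("utf-16-le")) // 2


def utf8_byte_count(text):
    """Number of bytes needed for text in UTF-8."""
    return len(text.encode("utf-8"))
```

A character outside the Basic Multilingual Plane, for example U+1D11E, occupies two UTF-16 code units but four UTF-8 bytes, so buffer sizes should be derived from the length that matches the chosen output format.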
C++ const TET_char_info *get_char_info(int page)<br />
C# Java int get_char_info(int page)<br />
Perl PHP object get_char_info(long page)<br />
VB RB Function get_char_info(page As Long) As Long<br />
C const TET_char_info *TET_get_char_info(TET *tet, int page)<br />
<br />
page<br />
TET_open_page( ) <br />
<br />
<br />
TET_get_glyph_info( ) <br />
TET_get_text( ) <br />
<br />
<br />
<br />
TET_get_text( ) <br />
<br />
<br />
M <br />
N N N>0 N <br />
M <br />
> granularity=glyph <br />
N=1 <br />
M=1 <br />
M>1 TET_get_char_info( ) <br />
<br />
> glyph <br />
<br />
N M <br />
N M <br />
<br />
<br />
glyph TET_get_text( ) <br />
<br />
<br />
<br />
<br />
<br />
TET_get_char_info( ) TET_close_page( ) <br />
<br />
<br />
TET_get_char_info( )
TET_get_text( ) <br />
<br />
TET_char_info <br />
<br />
TET_get_text( ) <br />
<br />
<br />
<br />
unknown false <br />
get_text( ) <br />
<br />
<br />
<br />
get_text( ) <br />
nil <br />
TET_char_info <br />
<br />
get_text( ) <br />
<br />
<br />
<br />
long <br />
TET_char_info <br />
<br />
<br />
<br />
uv<br />
type<br />
<br />
glyph <br />
granularity=glyph <br />
<br />
<br />
<br />
<br />
<br />
0 <br />
1 <br />
x y <br />
width uv <br />
<br />
10 <br />
11 <br />
12
TET_char_info <br />
<br />
<br />
<br />
attributes<br />
<br />
<br />
0 <br />
1 <br />
2 <br />
3 <br />
4 <br />
5 contentanalysis={keephyphenglyphs=true} <br />
<br />
6 <br />
unknown<br />
false <br />
unknownchar true <br />
x, y <br />
x y <br />
<br />
width<br />
alpha<br />
beta<br />
fontid<br />
fontsize<br />
textrendering<br />
<br />
<br />
<br />
<br />
alpha <br />
<br />
alpha <br />
<br />
beta <br />
abs(beta) <br />
fonts[ ] fontid <br />
<br />
<br />
<br />
<br />
<br />
0 <br />
1 <br />
2 <br />
3 <br />
4 <br />
5 <br />
6 <br />
7
11.9 <br />
C++ const TET_image_info *get_image_info(int page)<br />
C# Java int get_image_info(int page)<br />
Perl PHP object image_info TET_get_image_info(long page)<br />
VB RB Function get_image_info(page As Long) As Long<br />
C const TET_image_info *TET_get_image_info(TET *tet, int page)<br />
<br />
<br />
page<br />
TET_open_page( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
TET_get_image_info( ) TET_close_page( ) <br />
<br />
<br />
TET_get_image_info( ) <br />
<br />
<br />
TET_image_info<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
nil <br />
TET_image_info <br />
<br />
<br />
<br />
TET_image_info
long <br />
<br />
TET_image_info <br />
<br />
<br />
<br />
<br />
x, y <br />
width,<br />
height<br />
alpha<br />
beta<br />
imageid<br />
<br />
<br />
alpha <br />
alpha <br />
alpha beta<br />
beta <br />
beta abs(beta) <br />
<br />
images[ ] <br />
<br />
C++ int write_image_file(int doc, int imageid, string optlist)<br />
C# Java int write_image_file(int doc, int imageid, String optlist)<br />
Perl PHP long write_image_file(long doc, long imageid, string optlist)<br />
VB RB Function write_image_file(doc As Long, imageid As Long, optlist As String) As Long<br />
C int TET_write_image_file(TET *tet, int doc, int imageid, const char *optlist)<br />
<br />
doc<br />
TET_open_document*( ) <br />
imageid TET_get_image_info( ) <br />
imageid images <br />
length:images <br />
optlist <br />
compression filename keepxmp typeonly<br />
<br />
TET_get_errmsg( ) <br />
<br />
<br />
<br />
> <br />
> .tif <br />
> .jpg <br />
> .jpx <br />
> .raw
typeonly <br />
<br />
tetlib.h <br />
TET_write_image_file( ) TET_get_image_data( ) <br />
<br />
compression<br />
filename 1<br />
keepxmp<br />
typeonly 1<br />
<br />
auto <br />
auto <br />
none <br />
<br />
typeonly filename<br />
<br />
Image/@id attribute <br />
<br />
I<br />
imageid imageid <br />
true <br />
true<br />
<br />
TET_get_image_data( ) <br />
false<br />
TET_write_image_file( ) <br />
C++ const char *get_image_data(int doc, size_t *length, int imageid, string optlist)<br />
C# Java final byte[ ] get_image_data(int doc, int imageid, String optlist)<br />
Perl PHP string get_image_data(long doc, long imageid, string optlist)<br />
VB RB Function get_image_data(doc As Long, imageid As Long, optlist As String)<br />
C const char * TET_get_image_data(TET *tet, int doc, size_t *length, int imageid, const char *optlist)<br />
<br />
doc<br />
TET_open_document*( ) <br />
length <br />
<br />
imageid TET_get_image_info( ) <br />
imageid images <br />
length:images <br />
optlist <br />
compression keepxmp
TET_get_errmsg( )
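<br />
In the C and C++ bindings, TET_get_image_data( ) returns a pointer together with a byte count in length. Image data is binary and may contain null bytes, so it should be written using the reported length rather than treated as a C string. The following is a minimal sketch of that step; `save_buffer` is a hypothetical helper, not part of the TET API, and the sketch makes no assumption about how long the buffer returned by TET stays valid.<br />

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the TET API): writes exactly
 * `length` bytes from `data` to `filename` in binary mode.
 * Returns 0 on success, -1 on failure. */
int save_buffer(const char *filename, const char *data, size_t length)
{
    FILE *fp;
    size_t written;

    assert(filename != NULL && (data != NULL || length == 0));

    fp = fopen(filename, "wb");
    if (fp == NULL)
        return -1;

    written = (length > 0) ? fwrite(data, 1, length, fp) : 0;

    /* fclose() can report a delayed write error, so check it as well. */
    if (fclose(fp) != 0 || written != length)
        return -1;

    return 0;
}
```

When the image should simply end up in a file on disk, TET_write_image_file( ) with the filename option covers that case directly; a helper like this is only useful when the data is fetched into memory first.<br />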
11.10 TET TETML <br />
C++ int process_page(int doc, int pagenumber, string optlist)<br />
C# Java int process_page(int doc, int pagenumber, String optlist)<br />
Perl PHP long process_page(long doc, long pagenumber, string optlist)<br />
VB RB Function process_page(doc As Long, pagenumber As Long, optlist As String) As Long<br />
C int TET_process_page(TET *tet, int doc, int pagenumber, const char *optlist)<br />
<br />
doc<br />
TET_open_document*( ) <br />
pagenumber <br />
TET_pcos_get_number( ) length:pages <br />
trailer=true pagenumber <br />
optlist <br />
> pagenumber=0<br />
clippingarea contentanalysis excludebox fontsizerange granularity <br />
ignoreinvisibletext imageanalysis includebox layoutanalysis skipengines<br />
> tetml<br />
TET_process_page( ) <br />
<br />
tetml<br />
<br />
<br />
elements <br />
line granularity=word Para <br />
Word Line false<br />
glyphdetails<br />
granularity=glyph word Glyph <br />
<br />
false <br />
all <br />
dehyphenation<br />
dehyphenation <br />
<br />
dropcap dropcap <br />
<br />
geometry x y width alpha beta <br />
font font fontsize textrendering unknown <br />
sub sub <br />
sup sup <br />
trailer true <br />
<br />
<br />
pagenumber=0 <br />
trailer=true <br />
TET_process_page( ) false
Exception<br />
<br />
TET_open_document*( ) <br />
TET_get_xml_data( ) <br />
<br />
TET_open_document*( ) <br />
<br />
TET_open_document*( ) <br />
TET_process_page( ) TET_get_xml_data( ) <br />
<br />
<br />
<br />
trailer <br />
<br />
pagenumber=0 pagenumber <br />
<br />
<br />
TET_close_document( ) <br />
TET_process_page( ) <br />
C++ const char *get_xml_data(int doc, size_t *length, string optlist)<br />
C# Java final byte[ ] get_xml_data(int doc, String optlist)<br />
Perl PHP string get_xml_data(long doc, string optlist)<br />
VB RB Function get_xml_data(doc As Long, optlist As String)<br />
C const char * TET_get_xml_data(TET *tet, int doc, size_t *length, const char *optlist)<br />
<br />
doc<br />
TET_open_document*( ) <br />
length <br />
length <br />
optlist<br />
<br />
<br />
<br />
<br />
*len=0<br />
TET_open_document*( ) TET_process_page( ) <br />
outputformat<br />
<br />
TET_process_page( ) TET_get_xml_data( ) <br />
<br />
TET_close_document( ) <br />
TET_get_xml_data( )<br />
TET_process_page( ) TET_close_document( ) <br />
<br />
TET_open_document*( ) tetml filename <br />
<br />
<br />
<br />
<br />
TET_get_xml_data( ) <br />
<br />
<br />
<br />
<br />
<br />
bytes
11.11 pCOS <br />
<br />
<br />
C++ double pcos_get_number(int doc, string path)<br />
C# Java double pcos_get_number(int doc, String path)<br />
Perl PHP float pcos_get_number(int doc, String path)<br />
VB RB Function pcos_get_number(doc As Long, path As String) As Double<br />
C double TET_pcos_get_number(TET *tet, int doc, const char *path, ...)<br />
<br />
<br />
doc<br />
path<br />
TET_open_document*( ) <br />
<br />
key <br />
%s %d %% <br />
<br />
<br />
<br />
<br />
true <br />
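<br />
In the C binding the path argument accepts printf-style placeholders (%s, %d, %%) followed by additional arguments, while the other bindings take a finished path string. The following is a minimal sketch of building such a path by hand; `build_pcos_path` is a hypothetical helper, not part of the TET API, and the path `bookmarks[%d]/Title` serves only as an illustration.<br />

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not part of the TET API): formats a pCOS path
 * into buf using printf-style placeholders such as %d and %s.
 * Returns 0 on success, -1 if formatting failed or the result
 * did not fit into buf. */
int build_pcos_path(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int written;

    assert(buf != NULL && format != NULL);

    va_start(args, format);
    written = vsnprintf(buf, size, format, args);
    va_end(args);

    return (written < 0 || (size_t) written >= size) ? -1 : 0;
}
```

In C the same result is available without a helper, because the placeholders can be passed straight to the pCOS functions, for example `TET_pcos_get_string(tet, doc, "bookmarks[%d]/Title", i)`.<br />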
<br />
C++ string pcos_get_string(int doc, string path)<br />
C# Java String pcos_get_string(int doc, String path)<br />
Perl PHP String pcos_get_string(int doc, String path)<br />
VB RB Function pcos_get_string(doc As Long, path As String) As String<br />
C const char *TET_pcos_get_string(TET *tet, int doc, const char *path, ...)<br />
<br />
doc<br />
path<br />
TET_open_document*( ) <br />
<br />
key <br />
%s %d %% <br />
<br />
<br />
<br />
<br />
<br />
<br />
true false <br />
<br />
/Info/
* nocopy=false plainmetadata=true <br />
bookmarks[...]/Title pages[...]/Annots/Contents <br />
nocopy=false <br />
<br />
<br />
TET_pcos_get_stream( ) <br />
<br />
<br />
<br />
<br />
printf( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
C++ const unsigned char *pcos_get_stream(int doc, int *length, string optlist, string path)<br />
C# Java final byte[ ] pcos_get_stream(int doc, String optlist, String path)<br />
Perl PHP String pcos_get_stream(int doc, String optlist, String path)<br />
VB RB Function pcos_get_stream(doc As Long, optlist As String, path As String)<br />
C const unsigned char *TET_pcos_get_stream(TET *tet, int doc, int *length, const char *optlist, const char *path, ...)<br />
stream fstream <br />
doc<br />
TET_open_document*( ) <br />
length <br />
<br />
optlist<br />
path<br />
<br />
<br />
key <br />
%s %d %%
stream keepfilter=true <br />
<br />
fstream <br />
<br />
convert <br />
<br />
<br />
/Root/Metadata <br />
nocopy=false plainmetadata=true stream <br />
fstream <br />
<br />
<br />
TET_pcos_get_string( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
bytes<br />
<br />
<br />
<br />
<br />
TET_pcos_get_stream( ) <br />
<br />
convert<br />
keepfilter<br />
<br />
<br />
none <br />
none <br />
unicode TET_pcos_get_string( ) <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
true filterinfo<br />
false<br />
<br />
true false
A TET Library Quick Reference<br />
(C) <br />
<br />
<br />
<br />
<br />
(C) TET *TET_new(void)<br />
void delete( )<br />
PVF <br />
<br />
void create_pvf(String filename, byte[] data, String optlist)<br />
int delete_pvf(String filename)<br />
Unicode <br />
<br />
(C) const char *TET_utf8_to_utf16(TET *tet, const char *utf8string, const char *ordering, int *size)<br />
(C) const char *TET_utf16_to_utf8(TET *tet, const char *utf16string, int len, int *size)<br />
(C) const char *TET_utf32_to_utf16(TET *tet, const char *utf32string, int len, const char *ordering, int *size)<br />
(C) const char *TET_utf8_to_utf32(TET *tet, const char *utf8string, const char *ordering, int *size)<br />
(C) const char *TET_utf32_to_utf8(TET *tet, const char *utf32string, int len, int *size)<br />
(C) const char *TET_utf16_to_utf32(TET *tet, const char *utf16string, int len, const char *ordering, int *size)<br />
<br />
String get_apiname( )<br />
String get_errmsg( )<br />
int get_errnum( )<br />
<br />
int open_document(String filename, String optlist)<br />
(C) int TET_open_document_callback(TET *tet, void *opaque, size_t filesize, size_t (*readproc)(void *opaque, void *buffer, size_t size), int (*seekproc)(void *opaque, long offset), const char *optlist)<br />
void close_document(int doc)<br />
int open_page(int doc, int pagenumber, String optlist)<br />
void close_page(int page)<br />
<br />
String get_text(int page)<br />
int get_char_info(int page)<br />
<br />
int get_image_info(int page)<br />
int write_image_file(int doc, int imageid, String optlist)<br />
final byte[ ] get_image_data(int doc, int imageid, String optlist)<br />
TET TETML <br />
<br />
int process_page(int doc, int pagenumber, String optlist)<br />
final byte[ ] get_xml_data(int doc, String optlist)<br />
<br />
void set_option(String optlist)<br />
pCOS <br />
<br />
double pcos_get_number(int doc, String path)<br />
String pcos_get_string(int doc, String path)<br />
final byte[ ] pcos_get_stream(int doc, String optlist, String path)<br />
B <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
> TET 4.0 <br />
> TET 3.0 <br />
> TET 2.3 <br />
> TET 2.0 <br />
> TET 2.1.0 PHP RPG <br />
<br />
> TET 2.0.0 <br />
> TET 1.1 <br />
> TET 1.0.2 TET_open_doc_callback( ) <br />
<br />
> TET 1 1