
PDFlib Text Extraction Toolkit (TET) Manual


Text Extraction Toolkit (TET)

Version 4.0

Copyright © 1997-2010 PDFlib GmbH. All rights reserved.

Protected by European and U.S. patents.

PDFlib GmbH

Franziska-Bilek-Weg 9, 80339 München, Germany

www.pdflib.com

+49 • 89 • 452 33 84-0

FAX +49 • 89 • 452 33 84-99

PDFlib mailing list: tech.groups.yahoo.com/group/pdflib

Licensing: sales@pdflib.com

PDFlib support: support@pdflib.com

Adobe, Acrobat, PostScript, and XMP are trademarks of Adobe Systems Inc. AIX, IBM, OS/390, WebSphere, iSeries, and zSeries are trademarks of International Business Machines Corporation. ActiveX, Microsoft, OpenType, and Windows are trademarks of Microsoft Corporation. Apple, Macintosh and TrueType are trademarks of Apple Computer, Inc. Unicode and the Unicode logo are trademarks of Unicode, Inc. Unix is a trademark of The Open Group. Java and Solaris are trademarks of Sun Microsystems, Inc. HKS is a registered trademark of the HKS brand association: Hostmann-Steinberg, K+E Printing Inks, Schmincke.

TET contains parts of the following third-party software:

Zlib, Copyright © 1995-2002 Jean-loup Gailly and Mark Adler

TIFFlib, Copyright © 1988-1997 Sam Leffler, Copyright © 1991-1997 Silicon Graphics, Inc.

Cryptographic software written by Eric Young, Copyright © 1995-1998 Eric Young (eay@cryptsoft.com)

Independent JPEG Group's JPEG software, Copyright © 1991-1998, Thomas G. Lane

Cryptographic software, Copyright © 1998-2002 The OpenSSL Project (www.openssl.org)

Expat XML parser, Copyright © 1998, 1999, 2000 Thai Open Source Software Center Ltd

ICU International Components for Unicode, Copyright © 1995-2009 International Business Machines Corporation and others

TET contains the RSA Security, Inc. MD5 message digest algorithm.


0 TET
0.1
0.2 TET
1
1.1 TET
1.2 TET
1.3
1.4 TET 4.0
2 TET
2.1
2.2 TET
2.3
3 TET
3.1
3.2 C
3.3 C++
3.4 COM
3.5 Java
3.6 .NET
3.7 Perl
3.8 PHP
3.9 Python
3.10 REALbasic
3.11 RPG
4 TET
4.1 Adobe Acrobat TET Plugin
4.2 Lucene TET
4.3 Solr TET
4.4 Oracle TET
4.5 Microsoft TET PDF IFilter
4.6 MediaWiki TET
5
5.1 PDF
5.2
5.3
6
6.1 PDF
6.2
6.3
6.4
6.5
6.6
7 Unicode
7.1 Unicode
7.2 Unicode
7.3 Unicode
7.4
7.5 Unicode
8
8.1
8.2
8.3
8.4
8.5
8.6
9 TET TETML
9.1 TETML
9.2 TETML
9.3 TETML TETML
9.4 TETML XSLT
9.5 XSLT
10 pCOS
11 TET API
11.1
11.2
11.3
11.4
11.5
11.6
11.7
11.8
11.9
11.10 TET TETML
11.11 pCOS
A TET
B


0 TET

0.1 <br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

resource/cmap <br />

<br />

<br />

<br />

<br />

> <br />

--searchpath <br />

> searchpath <br />

<br />

set_option("searchpath=/CMap/ / / ");<br />

<br />

searchpath TETRESOURCEFILE <br />

<br />

<br />

resource/glyphlst




0.2 TET <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

licensekeys.txt <br />

<br />

<br />

# Licensing information for PDFlib GmbH products
PDFlib license file 1.0
TET 4.0 ... ...

<br />

<br />

<br />

<br />

> licensekeys.txt <br />

<br />

<br />

> set_option( ) licensefile <br />

tet.set_option("licensefile=/path/to/licensekeys.txt");

> --tetopt licensefile <br />

<br />

tet --tetopt "licensefile /path/to/your/licensekeys.txt" ...<br />

<br />

tet --tetopt "licensefile {/path/to/your/license file.txt}" ...<br />

> <br />

<br />

<br />

export PDFLIBLICENSEFILE="/path/to/licensekeys.txt"<br />

<br />

QSTRUP <br />

<br />

ADDENVVAR ENVVAR(PDFLIBLICENSEFILE) VALUE() LEVEL(*SYS)
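The two --tetopt examples above differ only in the braces placed around a file name that contains spaces. The following Python sketch illustrates that quoting rule under the assumption that a space is the only character that needs braces; the helper name licensefile_optlist is hypothetical and not part of TET.

```python
def licensefile_optlist(path):
    """Build a 'licensefile ...' option string for --tetopt.

    Assumption: a path is wrapped in braces only when it contains a space,
    mirroring the two command-line examples in the text.
    """
    value = "{" + path + "}" if " " in path else path
    return "licensefile " + value


if __name__ == "__main__":
    # Path without spaces: passed through unchanged
    print(licensefile_optlist("/path/to/your/licensekeys.txt"))
    # Path with a space: wrapped in braces
    print(licensefile_optlist("/path/to/your/license file.txt"))
```

The resulting string would be passed as the single argument that follows --tetopt.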


HKLM\SOFTWARE\PDFlib\PDFLIBLICENSEFILE

HKLM\SOFTWARE\PDFlib\TET4\license

HKLM\SOFTWARE\PDFlib\TET4\4.0\license

<br />

<br />

<br />

<br />

<br />

<br />

regedit <br />

... <br />

<br />

%systemroot%\syswow64\regedit<br />

<br />

<br />

<br />

<br />

<br />

/PDFlib/TET/4.0

/PDFlib/TET

/PDFlib

/usr/local <br />

<br />

<br />

<br />

<br />

<br />

tet --tetopt "license ... ..." ......<br />

<br />

<br />

<br />

> <br />

oTET.set_option "license=... ..."<br />

> <br />

TET_set_option(tet, "license=... ...");


tet.set_option("license=... ...");<br />

> <br />

tet->set_option("license=... ...");<br />

> <br />

d licensekey s 20<br />

d licenseval s 50<br />

c eval licenseopt='license=... ...'+x'00'<br />

c callp TET_set_option(TET:licenseopt:0)<br />

license <br />

TET_new( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<strong>PDFlib</strong> license file 2.0<br />

# Licensing information for <strong>PDFlib</strong> GmbH products<br />

TET 4.0 ... ... ...1...<br />

TET 4.0 ... ... ...2...<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

ü<br />

www.pdflib.com<br />

• • <br />

• • <br />

sales@pdflib.com<br />

support@pdflib.com




1 <br />


1.1 TET <br />


<strong>PDFlib</strong> Comprehensive Object System <br />


1.2 TET <br />


1.3 <br />

<br />

<br />

<br />

<br />

<br />

> extractor <br />

<br />

> image_resources <br />

<br />

> dumper <br />

<br />

> fontfilter <br />

<br />

> glyphinfo dropcap <br />

shadow hyphenation <br />

> tetml <br />

<br />

> get_attachments <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> concordance.xsl <br />

> fontfilter.xsl


fontfinder.xsl <br />

<br />

> fontstat.xsl <br />

> index.xsl <br />

> metadata.xsl <br />

<br />

> solr.xsl <br />

> table.xsl <br />

> tetml2html.xsl <br />

> textonly.xsl <br />

<br />

<br />

<br />

<br />

<br />

> <br />

> <br />

> <br />

> <br />

<br />

<br />

> <br />

> <br />

<br />

www.pdflib.com/tet-cookbook<br />

<br />

<br />

www.pdflib.com/pcos-cookbook<br />

<br />

<br />

1.4 TET 4.0 <br />



2 TET

2.1 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

tet [<options>] <filename>...

The --docopt, --tetopt, --imageopt, and --pageopt options supply option lists to the corresponding API functions. Options can also be read from a response file given as @filename.

--docopt <option list>: option list for open_document( )
--firstpage, -f <page number> | last: first page to process; last denotes the last page, last-1 the page before it
--format utf8 | utf16: format of the text output (default: utf8)
--help, -?: display help
--image, -i: extract images
--imageloop page | resource: image enumeration used with --image. Image files are named <filename>_p<page number>_<image number>.[tif|jpg|jpx] in page mode and <filename>_I<image ID>.[tif|jpg|jpx] in resource mode, where I<image ID> corresponds to Image/@id in TETML.
--imageopt <option list>: option list for write_image_file( )
--lastpage, -l <page number> | last: last page to process; last and last-1 work as for --firstpage
--outfile, -o <filename>: name of the output file
--pageopt <option list>: option list for open_page( ) or process_page( )
--password, -p <password>: password for the document
--searchpath, -s <path>: directory which is searched for resources
--targetdir, -t <directory>: output directory
--tetml, -m glyph | word | wordplus | line | page: create TETML output with the given granularity
--tetopt <option list>: option list for set_option( )
--text: extract text
--verbose, -v 0 | 1 | 2 | 3: verbosity level
--version, -V: display version information


2.2 TET <br />

<br />

> searchpath <br />

> <br />

<br />

> <br />

<br />

> <br />

--password <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

" <br />

*.pdf <br />

.pdf <br />

*.pdf *.PDF <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

@filename <br />

<br />

<br />

> <br />

<br />

> " <br />

> <br />

> <br />

\" <br />

> \\ <br />

<br />

@filename


> <br />

>


2.3 <br />

<br />

<br />

<br />

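2.3.1 Extracting text

Extract the text from file.pdf in UTF-8 format and store it in file.txt:

tet file.pdf

Exclude the first and the last page from text extraction (long and short option forms):

tet --firstpage 2 --lastpage last-1 file.pdf
tet -f 2 -l last-1 file.pdf

Supply a directory which is searched for CMaps and other resources:

tet --searchpath /usr/local/cmaps file.pdf
tet -s /usr/local/cmaps file.pdf

Extract the text in UTF-16 format and store it in file.utf16:

tet --format utf16 --outfile file.utf16 file.pdf
tet --format utf16 -o file.utf16 file.pdf

Extract the text from all PDF files in the directory in and store the generated *.txt files in the directory out:

tet --targetdir out in/*.pdf
tet -t out in/*.pdf

Restrict text extraction to an area of the page by supplying a list of page options:

tet --pageopt "includebox={{0 0 200 200}}" file.pdf

Read command-line options from a response file called options and process all PDF files in the current directory:

tet @options *.pdf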
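When tet is driven from a script, the command lines shown above can be assembled programmatically. The following Python sketch only builds the argument list; it uses option names that appear in this chapter (--firstpage, --lastpage, --format, --outfile, --pageopt), while the function name build_tet_command and its parameters are hypothetical helpers, not part of TET.

```python
import shlex


def build_tet_command(pdf, firstpage=None, lastpage=None, fmt=None,
                      outfile=None, pageopt=None):
    """Assemble an argument list for the tet command-line tool.

    Only options that were actually supplied are added, so calling the
    function with just a file name yields the plain "tet file.pdf" form.
    """
    args = ["tet"]
    if firstpage is not None:
        args += ["--firstpage", str(firstpage)]
    if lastpage is not None:
        args += ["--lastpage", str(lastpage)]
    if fmt is not None:
        args += ["--format", fmt]
    if outfile is not None:
        args += ["--outfile", outfile]
    if pageopt is not None:
        # Passed as one argument; no shell quoting is needed in list form
        args += ["--pageopt", pageopt]
    args.append(pdf)
    return args


if __name__ == "__main__":
    cmd = build_tet_command("file.pdf", firstpage=2, lastpage="last-1")
    # shlex.join (Python 3.8+) renders the list as a shell-safe string
    print(shlex.join(cmd))
```

Keeping the command as a list and handing it to subprocess.run avoids shell quoting problems with option lists such as includebox={{0 0 200 200}}.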

2.3.2 Extracting images

Extract the images from file.pdf in page mode and store them as file*.tif/file*.jpg in the directory out:

tet --targetdir out --image file.pdf
tet -t out -i file.pdf

Extract the images from file.pdf in resource mode and store them as file*.tif/file*.jpg in the directory out:

tet --targetdir out --image --imageloop resource file.pdf
tet -t out -i --imageloop resource file.pdf

Extract the images from file.pdf without image merging by supplying a list of page options:

tet --targetdir out --image --pageopt "imageanalysis={merge={disable}}" file.pdf
tet -t out -i --pageopt "imageanalysis={merge={disable}}" file.pdf

2.3.3 Generating TETML

Generate TETML output in word mode for file.pdf and store it in file.tetml:

tet --tetml word file.pdf
tet -m word file.pdf

Generate TETML output without any Options elements by supplying a list of document options:

tet --docopt "tetml={elements={options=false}}" --tetml word file.pdf

Generate TETML output in word mode with all glyph details and store it in file.tetml:

tet --tetml word --pageopt "tetml={glyphdetails={all}}" file.pdf
tet -m word --pageopt "tetml={glyphdetails={all}}" file.pdf

Extract images and generate TETML output in the same run:

tet --image --tetml word file.pdf
tet -i -m word file.pdf

Generate TETML output with the topdown page option:

tet --tetml word --pageopt "topdown={output}" file.pdf
tet -m word --pageopt "topdown={output}" file.pdf

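2.3.4 Advanced options

Supply the document option checkglyphlists:

tet --docopt checkglyphlists file.pdf

Supply a document option list with a fold that maps the characters in [:blank:] to U+0020:

tet --docopt "fold={{[:blank:] U+0020}}" file.pdf

Supply a page option list which sets punctuationbreaks=false for content analysis:

tet --pageopt "contentanalysis={punctuationbreaks=false}" file.pdf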


3 TET

<br />

<br />

<br />

<br />

3.1 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

> <br />

> <br />

<br />

<br />

<br />

> <br />

> <br />

> <br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

delete( ) get_apiname( ) get_errnum( ) get_errmsg( )<br />

<br />

<br />

> <br />

> <br />

> <br />

open_document( )<br />

open_page( )


get_errnum( ) get_errmsg( ) get_apiname( )


3.2 C <br />

<br />

TET_TRY( ) <br />

TET_CATCH( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

> TET_TRY( ) TET_CATCH( ) <br />

> TET_new( ) <br />

TET_new( ) <br />

<br />

> TET_delete( ) <br />

<br />

> <br />

<br />

<br />

<br />

<br />

volatile volatile <br />

<br />

<br />

> <br />

TET_CATCH( ) TET_EXIT_TRY( )<br />

<br />

> <br />

<br />

<br />

<br />

<br />

volatile int pageno;
...
if ((tet = TET_new()) == (TET *) 0)
{
    printf("out of memory\n");
    return(2);
}

TET_TRY(tet)
{
    /* The loop condition and the error check were lost in extraction;
     * n_pages and the test below are reconstructed placeholders. */
    for (pageno = 1; pageno <= n_pages; ++pageno)
    {
        /* process page */
        if (/* error happened */ 0)
        {
            TET_EXIT_TRY(tet);
            return -1;
        }
    }
    /* statements that directly or indirectly call API functions */
}
TET_CATCH(tet)
{
    printf("Error %d in %s() on page %d: %s\n",
        TET_get_errnum(tet), TET_get_apiname(tet), pageno, TET_get_errmsg(tet));
}
TET_delete(tet);

<br />

<br />

length <br />

length <br />

length <br />

<br />

host <br />

ebcdic <br />

<br />

<br />

<br />

<br />

<br />

<br />

> \xEF\xBB\xBF <br />

> \x57\x8B\xAB <br />

<br />

> winansi <br />

ebcdic <br />

TET_utf16_to_utf8( )


3.3 C++ <br />

tetlib.h <br />

tetlib.h<br />

tetlib.h tet.hpp <br />

tet.cpp <br />

TET_<br />

<br />

<br />

<br />

<br />

<br />

> std::wstring <br />

<br />

wstring <br />

<br />

<br />

> <br />

basic_string <br />

<br />

> <br />

<br />

<br />

<br />

<br />

wstring wchar_t wstring <br />

<br />

<br />

L <br />

\u \U <br />

<br />

<br />

<br />

<br />

<br />

> pdflib <br />

<br />

<br />

using namespace pdflib;<br />

> wstring <br />

<br />

L <br />

const wstring pageoptlist = L"granularity=page";


TETTET::Exception get_errmsg( ) <br />

wstring wcerr <br />

<br />

> tet.cpp <br />

<br />

<br />

<br />

<br />

<br />

<br />

> tet.hpp wstring <br />

<br />

#define TETCPP_TET_WSTRING 0<br />

> tet.hpp pdflib <br />

#define TETCPP_USE_PDFLIB_NAMESPACE 0<br />

<br />

try/catch <br />

TET::Exception<br />

<br />

<br />

<br />

<br />

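try {
    ...some TET instructions...
} catch (TET::Exception &ex) {
    // The output statement was cut off after wcerr; reconstructed here
    wcerr << L"Error " << ex.get_errnum()
          << L" in " << ex.get_apiname()
          << L"(): " << ex.get_errmsg() << endl;
}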


3.4 COM <br />

<br />

<br />

<br />

> <br />

...\TET 4.0 32-bit\bind\COM\bin\tet_com.dll <br />

<br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

tlbimp.exe <br />

<br />

tlbimp tet_com.dll /namespace:tet_com /out:Interop.tet_com.dll<br />

<br />

tet_com.dll <br />

<br />

using TET_com;
...
static TET_com.ITET tet;
...
tet = new TET();
...


3.5 Java <br />

com.pdflib.TET <br />

<br />

<br />

<br />

<br />

> libtet_java.so libtet_java.jnilib<br />

<br />

<br />

> pdf_tet.dll <br />

<br />

tet.jar tet <br />

tet.jar <br />

CLASSPATH -classpath tet.jar<br />

<br />

<br />

java.library.path <br />

<br />

java -Djava.library.path=. extractor<br />

<br />

System.out.println(System.getProperty("java.library.path"));<br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

<br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

String(byte[] bytes)


enc SJIS UTF8 UTF-16 <br />

<br />

String(byte[] bytes, String enc)<br />

enc <br />

<br />

byte[] getBytes(String enc)<br />

<br />

<br />

<br />

<br />

> Javadoc <br />

<br />

> ... <br />

<br />

Java <br />

<br />

TETException <br />

<br />

<br />

TET tet = null;
try {
    ...some TET instructions...
} catch (TETException e) {
    System.err.print("TET exception occurred:\n");
    System.err.print("[" + e.get_errnum() + "] " + e.get_apiname() + ": " +
        e.get_errmsg() + "\n");
} catch (Exception e) {
    System.err.println(e.getMessage());
} finally {
    if (tet != null) {
        tet.delete(); /* delete the TET object */
    }
}

throws


3.6 .NET <br />

<br />

<br />

<br />

TET_dotnet.dll<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

bin <br />

bin TETlib_<br />

dotnet.dll <br />

C:\Inetpub\wwwroot\bin\TET_dotnet.dll<br />

C:\Inetpub\wwwroot\WebApplicationX\bin\TET_dotnet.dll<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

Full <br />

<br />

<br />

<br />

<br />

<br />

<br />

TET_dotnet.TETException <br />

get_errnum get_errmsg get_apiname


3.7 Perl <br />

<br />

<br />

<br />

<br />

use <br />

<br />

<br />

<br />

<br />

<br />

<br />

tetlib_pl.pm <strong>PDFlib</strong>/TET.pm <br />

@INC -I <br />

<br />

perl -I/path/to/tet extractor.pl<br />

tetlib_pl.so tetlib_pl.bundletetlib_pl.pm <strong>PDFlib</strong>/TET.pm<br />

<br />

<br />

perl -e 'use Config; print $Config{sitearchexp};'<br />

auto/tetlib_pl <br />

<br />

/usr/lib/perl5/site_perl/5.10/i686-linux<br />

<br />

tetlib_pl.dll tetlib_pl.pm<strong>PDFlib</strong>/TET.pm<br />

<br />

<br />

perl -e "use Config; print $Config{sitearchexp};"<br />

<br />

C:\Program Files\Perl5.10\site\lib<br />

eval <br />

<br />

eval {
    ...some TET instructions...
};
die "Exception caught: $@" if $@;


3.8 PHP <br />

<br />

<br />

<br />

<br />

<br />

<br />

<strong>PDFlib</strong>-in-PHP-HowTo <br />

<br />

<br />

<br />

<br />

> php.ini <br />

extension=libtet_php.dll<br />

extension=libtet_php.so<br />

extension=libtet_php.sl<br />

; Windows <br />

; UnixMac OS X <br />

; HP-UX <br />

php.ini extension_dir <br />

<br />

<br />

<br />

<br />

<br />

tet <br />

<strong>PDFlib</strong> TET Support<br />

enabled<br />

<br />

<br />

> <br />

<br />

dl("libtet_php.dll");<br />

dl("libtet_php.so");<br />

dl("libtet_php.sl");<br />

# Windows <br />

# UnixMac OS X <br />

# HP-UX <br />

<br />

<br />

> <br />

<br />

> <br />

<br />

<br />

try/catch <br />

try {
    ...some TET instructions...
} catch (TETException $e) {
    print "TET exception occurred:\n";
    print "[" . $e->get_errnum() . "] " . $e->get_apiname() . ": " .
        $e->get_errmsg() . "\n";
}
catch (Exception $e) {
    print $e;
}


3.9 Python <br />

<br />

<br />

<br />

<br />

<br />

> tetlib_py.so<br />

> tetlib_py.pyd<br />

<br />

<br />

<br />

try:
    ...some TET instructions...
except TETException:
    print('TET exception occurred!')


3.10 REALbasic <br />

<br />

<br />

TET.rbx <br />

Plugins <br />

TET.framework /Library/Frameworks <br />

<br />

<br />

> <br />

> <br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

> TET <br />

> TETException RuntimeException <br />

<br />

<br />

<br />

<br />

<br />

<br />

TETException <br />

try/<br />

catch <br />

<br />

Exception err As TETException
MsgBox("TETextractor: [" + _
Str(err.get_errnum()) + "] " + err.get_apiname() + ": " + err.get_errmsg())


3.11 RPG <br />

<br />

/copy <br />

<br />

<br />

%ucs2 <br />

<br />

%char <br />

%CHAR %UCS2 <br />

<br />

<br />

<br />

<br />

length <br />

<br />

<br />

<br />

D <br />

<br />

d/copy QRPGLESRC,TETLIB<br />

<br />

<br />

d/copy tetsrclib/QRPGLESRC,TETLIB<br />

<br />

<br />

<br />

<br />

CRTBNDDIR BNDDIR(TETLIB/TETLIB) TEXT('TETlib Binding Directory')<br />

<br />

<br />

<br />

<br />

ADDBNDDIRE BNDDIR(TETLIB/TETLIB) OBJ((TETLIB/TETLIB *SRVPGM))<br />

CRTBNDRPG <br />

<br />

CRTBNDRPG PGM(TETLIB/EXTRACTOR) SRCFILE(TETLIB/QRPGLESRC) SRCMBR(*PGM) DFTACTGRP(*NO)<br />

BNDDIR(TETLIB/TETLIB)


monitor/on-error/endmon <br />

*PSSR <br />

<br />

<br />

c eval p=TET_new<br />

*<br />

c monitor<br />

*<br />

c callp TET_set_option(tet:globaloptlist)<br />

c eval doc=TET_open_document(tet:%ucs2(%trim(parm1)):docoptlist)<br />

:<br />

:<br />

* Error Handling<br />

c on-error<br />

* Do something with this error<br />

* don't forget to free the TET object<br />

c callp TET_delete(tet)<br />

c endmon


4 TET

<br />

<br />

<br />

4.1 Adobe Acrobat TET Plugin<br />

<br />

<br />

<br />

<br />

www.pdflib.com/products/tet-plugin




4.2 Lucene TET <br />

<br />

<br />

lucene.apache.org <br />

shrug <br />

<br />

<br />

<br />

<br />

<br />

> <br />

> <br />

> <br />

lucene-core-2.4.0.jar <br />

<br />

> <br />

<br />

<br />

> /connectors/lucene cd<br />

> lucene-core-2.4.0.jar <br />

> TetReader.java <br />

<br />

<br />

<br />

<br />

PdfDocument.java <br />

<br />

> ant index /bind/data <br />

<br />

> ant search <br />

<br />

<br />

<br />

<br />

ant index <br />

devserver (1)$ ant index
Buildfile: build.xml
...
index:
[java] adding ../data/Whitepaper-XMP-metadata-in-PDFlib-products.pdf
[java] adding ../data/Whitepaper-PDFA-with-PDFlib-products.pdf
[java] adding ../data/FontReporter.pdf
[java] adding ../data/TET-PDF-IFilter-datasheet.pdf
[java] adding ../data/PDFlib-datasheet.pdf
[java] 1255 total milliseconds
BUILD SUCCESSFUL
Total time: 2 seconds
devserver (1)$ ant search
Buildfile: build.xml
compile:
search:
[java] Enter query:
PDFlib
[java] Searching for: pdflib
[java] 5 total matching documents
[java] 1. ../data/PDFlib-datasheet.pdf
[java] Title: PDFlib, PDFlib+PDI, Personalization Server Datasheet
[java] 2. ../data/Whitepaper-PDFA-with-PDFlib-products.pdf
[java] Title: Whitepaper: Creating PDF/A with PDFlib
[java] 3. ../data/FontReporter.pdf
[java] Title: PDFlib FontReporter 1.3 Manual
[java] 4. ../data/TET-PDF-IFilter-datasheet.pdf
[java] Title: PDFlib TET PDF IFilter Datasheet
[java] 5. ../data/Whitepaper-XMP-metadata-in-PDFlib-products.pdf
[java] Title: Whitepaper: XMP Metadata support in PDFlib Products
[java] Press (q)uit or enter number to jump to a page.
q
[java] Enter query:
title:FontReporter
[java] Searching for: title:fontreporter
[java] 1 total matching documents
[java] 1. ../data/FontReporter.pdf
[java] Title: PDFlib FontReporter 1.3 Manual
[java] Press (q)uit or enter number to jump to a page.
q
[java] Enter query:
BUILD SUCCESSFUL
Total time: 57 seconds

<strong>PDFlib</strong> <br />

title FontReporter <br />

q <br />

build.xml <br />

<br />

<br />

build.properties <br />

windows.properties unix.properties <br />

/tmp <br />

<br />

ant -Dlucene.jar=/tmp/lucene-core-2.4.0.jar index


lucene.apache.org/java/2_4_0/demo3.html <br />

Configuration <br />

configuration.jsp <br />

<br />

/<br />

bind/lucene/index <br />

<br />

<br />

> path <br />

> modified <br />

> contents <br />

> <br />

<br />

<br />

String objType = tet.pcos_get_string(tetHandle, "type:/Info/Subject");
if (!objType.equals("null"))
{
    doc.add(new Field("summary", tet.pcos_get_string(tetHandle,
        "/Info/Subject"), Field.Store.YES, Field.Index.ANALYZED));
}

> font <br />

PdfDocument.java


4.3 Solr TET <br />

<br />

<br />

<br />

lucene.apache.org/solr <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

shrug <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

solr.xsl glyph <br />

<br />

<br />

_s <br />

<br />

<br />

<br />

<br />

<br />

<strong>PDFlib</strong>-FontReporter-E.pdf<br />

<strong>PDFlib</strong> GmbH<br />

2008-07-08T15:05:39+00:00<br />

FrameMaker 7.0<br />

2008-07-08T15:05:39+00:00<br />

Acrobat Distiller 7.0.5 (Windows)<br />

<strong>PDFlib</strong> FontReporter<br />

<strong>PDFlib</strong> FontReporter 1.3 Manual<br />

<strong>PDFlib</strong><br />

GmbH<br />

Munchen<br />

...


4.4 Oracle TET <br />

<br />

<br />

<br />

<br />

shrug <br />

<br />

<br />

<br />

AL32UTF8 <br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

download.oracle.com/docs/cd/B28359_01/text.111/b28304/cdatadic.htm#sthref497<br />

> <br />

<br />

tetfilter.sh <br />

iconv uconv <br />

<br />

tetfilter.bat <br />

<br />

<br />

> <br />

<br />

connectors/Oracle/tetfilter.sh $ORACLE_HOME/ctx/bin <br />

connectors/Oracle/tetfilter.bat %ORACLE_HOME%\bin <br />

> tetfilter.shtetfilter.bat TETDIR <br />

<br />

> <br />

<br />

<br />

<br />

TETOPT="license=aaaaaaa-bbbbbb-cccccc-dddddd-eeeeee"


HR system <br />

<br />

SQL> GRANT CTXAPP TO HR;
SQL> GRANT EXECUTE ON CTX_CLS TO HR;
SQL> GRANT EXECUTE ON CTX_DDL TO HR;
SQL> GRANT EXECUTE ON CTX_DOC TO HR;
SQL> GRANT EXECUTE ON CTX_OUTPUT TO HR;
SQL> GRANT EXECUTE ON CTX_QUERY TO HR;
SQL> GRANT EXECUTE ON CTX_REPORT TO HR;
SQL> GRANT EXECUTE ON CTX_THES TO HR;

<br />

<br />

> <br />

/connectors/Oracle<br />

> tetsetup_a.sql tetpath <br />

<br />

> sqlplus pdftable_a <br />

tetindex_a <br />

tetsetup_a.sql <br />

<br />

SQL> @tetsetup_a.sql<br />

> <br />

SQL> select * from pdftable_a where CONTAINS(pdffile, 'Whitepaper', 1) > 0;<br />

> <br />

SQL> execute ctx_ddl.sync_index('tetindex_a')<br />

> <br />

<br />

SQL> @tetcleanup_a.sql<br />

<br />

<br />

<br />

tet_pdf_loader <br />

<br />

/Info/Title <br />

length:pages <br />

<br />

> <br />

/connectors/Oracle<br />

> sqlplus pdftable_b <br />

tetindex_b


SQL> @tetsetup_b.sql<br />

> <br />

<br />

<br />

ojdbc14.jar tet_pdf_loader.java <br />

ant <br />

<br />

<br />

localhost <br />

xe HR <br />

<br />

ant -Dtet.jdbc.connection=jdbc:oracle:thin:@localhost:1521:xe<br />

-Dtet.jdbc.user=HR -Dtet.jdbc.password=HR<br />

> <br />

SQL> execute ctx_ddl.sync_index('tetindex_b')<br />

> <br />

SQL> select * from pdftable_b where CONTAINS(pdffile, 'Whitepaper', 1) > 0;<br />

> <br />

<br />

SQL> @tetcleanup_b.sql


4.5 Microsoft TET PDF IFilter<br />

<br />

www.pdflib.com/products/tetpdf-ifilter<br />


Title Subject Author<br />

> <br />

<br />

>


4.6 MediaWiki TET <br />

<br />

<br />

<br />

www.mediawiki.org/wiki/MediaWiki<br />

shrug <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

> <br />

> <br />

<br />

> <br />

<br />

> /connectors/MediaWiki/PDFIndexer.php /extensions/PDFIndexer/PDFIndexer.php <br />

> /resource/<br />

cmap /extensions/<br />

PDFIndexer/resource/cmap <br />

> LocalSettings.php <br />

# PDF <br />

include("extensions/PDFIndexer/PDFIndexer.php");<br />

> /includes/DefaultSettings.php .pdf <br />

<br />

/**<br />

* <br />

* <br />

*/<br />

$wgFileExtensions = array( 'png', 'gif', 'jpg', 'jpeg', 'pdf' );<br />

<br />

PDFIndexer.php


> <br />

> <br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

<br />

<br />

DebugLogFile <br />

Image <br />

Advanced search <br />

Image <br />

LocalSettings.php <br />

<br />

$wgNamespacesToBeSearchedDefault = array(
    NS_MAIN => true,
    NS_IMAGE => true,
);


5 <br />

5.1 PDF <br />

<br />

> <br />

> <br />

<br />

> <br />

<br />

dumper <br />

encrypt/master encrypt/user encrypt/nocopy <br />

pcosmode <br />

<br />

<br />

open_document( ) requiredmode <br />

nocopy <br />

<br />

<br />

<br />

if ((int) tet.pcos_get_number(doc, "pcosmode") == 2 ||
    ((int) tet.pcos_get_number(doc, "pcosmode") == 1 &&
     (int) tet.pcos_get_number(doc, "encrypt/nocopy") == 0))
{
    /* text extraction allowed */
}
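The condition above combines two pCOS values, pcosmode and encrypt/nocopy. As a minimal sketch, the same decision can be written as a pure Python function that takes both values as plain integers; the function name may_extract_text is hypothetical, and querying the values from TET is left out so that the logic can be checked on its own.

```python
def may_extract_text(pcosmode, nocopy):
    """Mirror the check shown above.

    pcosmode: integer value of the "pcosmode" pCOS path
    nocopy:   integer value of the "encrypt/nocopy" pCOS path
    Returns True for pcosmode 2, or for pcosmode 1 when nocopy is 0.
    """
    return pcosmode == 2 or (pcosmode == 1 and nocopy == 0)


if __name__ == "__main__":
    # Enumerate the combinations covered by the condition
    for mode in (0, 1, 2):
        for nocopy in (0, 1):
            print(mode, nocopy, may_extract_text(mode, nocopy))
```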

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

> <br />

<br />

> <br />

>


open_document( ) shrug <br />

<br />

<br />

<br />

<br />

> open_document( ) shrug <br />

> open_document( ) <br />

<br />

> open_<br />

document( ) <br />

> nocopy=true<br />

<br />

> nocopy=true <br />

<br />

> shrug true <br />

> pcosmode <br />

<br />

<br />

<br />

<br />

int doc = tet.open_document(filename, "shrug");
...
if ((int) tet.pcos_get_number(doc, "shrug") == 1)
{
    /* only indexing allowed */
}
else
{
    /* content may be delivered to the user */
}


5.2 <br />

<br />

<br />

<br />

<br />

set_option( ) <br />

<br />

Unix PostScript Resource <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

open_document( ) <br />

open_document( ) glyphmapping <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

set_option( ) <br />

<br />

<br />

<br />

<br />

> <br />

> \ <br />

<br />

> <br />

> <br />

> <br />

<br />

> <br />

PS-Resources-1.0


<br />

<br />

> <br />

<br />

<br />

<br />

<br />

<br />

searchpath <br />

<br />

<br />

PS-Resources-1.0<br />

searchpath<br />

glyphlist<br />

codelist<br />

encoding<br />

.<br />

searchpath<br />

/usr/local/lib/cmaps<br />

/users/kurt/myfonts<br />

.<br />

glyphlist<br />

myglyphlist=/usr/lib/sample.gl<br />

.<br />

codelist<br />

mycodelist=/usr/lib/sample.cl<br />

.<br />

encoding<br />

myencoding=sample.enc<br />

.<br />

<br />

searchpath <br />

<br />

<br />

searchpath <br />

<br />

<br />

searchpath <br />

<br />

<br />

searchpath <br />

<br />

HKLM\SOFTWARE\<strong>PDFlib</strong>\TET4\4.0\SearchPath<br />

HKLM\SOFTWARE\<strong>PDFlib</strong>\TET4\SearchPath<br />

HKLM\SOFTWARE\<strong>PDFlib</strong>\SearchPath<br />

<br />

SearchPath


C:\Program Files\<strong>PDFlib</strong>\TET 4.0 32bit\resource<br />

C:\Program Files\<strong>PDFlib</strong>\TET 4.0 32bit\resource\cmap<br />

searchpath <br />

/<strong>PDFlib</strong>/TET/4.0/resource/icc<br />

/<strong>PDFlib</strong>/TET/4.0/resource/fonts<br />

/<strong>PDFlib</strong>/TET/4.0/resource/cmap<br />

/<strong>PDFlib</strong>/TET/4.0<br />

/<strong>PDFlib</strong>/TET<br />

/<strong>PDFlib</strong><br />

searchpath <br />

set_option( ) <br />

<br />

<br />

> TETRESOURCEFILE <br />

<br />

<br />

> TETRESOURCEFILE <br />

<br />

upr (MVS)<br />

/tet/4.0/tet.upr (iSeries)<br />

tet.upr (Windows, Unix)<br />

<br />

> <br />

HKLM\SOFTWARE\<strong>PDFlib</strong>\TET\4.0\resourcefile<br />

< <br />

>/tet.upr <br />

<br />

<br />

> <br />

resourcefile <br />

set_option("resourcefile=/ / /tet.upr");<br />

<br />

set_option( )<br />

<br />

<br />

<br />

set_option("glyphlist={myglyphnames=/usr/local/glyphnames.gl}");


set_option( ) <br />

<br />

<br />

<br />

<br />

<br />

\ <br />

> \x 0 9 A F a f \x0D<br />

> \nnn 0 7 \015 \000 <br />

> \\ <br />

>


5.3 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

open_page( ) <br />

<br />

> docstyle=searchengine<br />

<br />

<br />

> skipengines={image}<br />

<br />

<br />

> contentanalysis={merge=0}<br />

<br />

<br />

<br />

<br />

> contentanalysis={dehyphenate=false}<br />

<br />

<br />

> contentanalysis={shadowdetect=false}<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

open_page( ) granularity=word get_text( ) <br />

<br />

> get_text( ) <br />

open_page( ) granularity=page <br />

<br />

<br />

> <br />

open_page( ) contentanalysis={lineseparator=U+0020} granularity=page <br />

get_text( )


open_page( ) granularity=word <br />

> <br />

contentanalysis={punctuationbreaks=false} <br />

<br />

<br />

> get_char_info( ) <br />

<br />

get_text( ) <br />

> open_page( ) includebox <br />

excludebox <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

unknownchar=?<br />

<br />

fold={{[:Private_Use:] remove} {[U+FFFD] remove} default}<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> open_page( ) granularity=word <br />

> open_document( ) password <br />

<br />

shrug


get_char_info( ) <br />

get_text( ) <br />

<br />

glyph wordplus Glyph <br />

<br />

unknown="true"<br />

unknownchar <br />

unknown <br />

<br />

> <br />

<br />

ignoreinvisibletext=true<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

<br />

<br />

> get_char_info( ) uv<br />

char_info <br />

get_text( ) <br />

> open_page( )granularity=glyphword <br />

granularity=glyph


checkglyphlists=true


6 <br />

6.1 PDF <br />

<br />

<br />

<br />

get_text( ) get_image( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

> <br />

<br />

<br />

<br />

Plug-Ins <strong>PDFlib</strong> TET Plugin... TET Find <br />

<br />

> Acrobat <br />

PDF<br />

<br />

> extractor <br />

> /TET/Document/Pages/Page<br />

<br />

> ...<br />

> <br />

> <br />

<br />

<br />

<br />

> dumper <br />

> /TET/Document/DocInfo<br />

<br />

<br />

> ... <br />

<br />

>


dumper <br />

> /TET/Document/DocInfo/Custom<br />

<br />

<br />

> ... ...<br />

<br />

> <br />

> <br />

<br />

XMP <br />

<br />

> dumper <br />

> /TET/Document/Metadata


TouchUp <br />

... <br />

<br />

> <br />

> image_metadata<br />

> /TET/Document/Pages/Resources/Images/Image/Metadata<br />

<br />

<br />

<br />

> <br />

> ...<br />

> <br />

> fields<br />

> <br />

<br />

<br />

<br />

<br />

> <br />

> <br />

<br />

> <br />

<br />

<br />

> annotations<br />

> <br />

<br />

<br />

<br />

> <br />

> <br />

<br />

> <br />

<br />

<br />

<br />

> bookmarks<br />

>


> <br />

<br />

<br />

> get_attachments <br />

> /TET/Document/Attachments/Attachment/Document<br />

<br />

<br />

<br />

<br />

> <br />

<br />

> <br />

PDF <br />

> <br />

> get_attachments <br />

> /TET/Document/Attachments/Attachment/Document<br />

<br />

<br />

<br />

> <br />

... ... <br />

... <br />

<br />

<br />

<br />

> <br />

<br />

> <br />

> dumper <br />

> /TET/Document/@pdfa /TET/Document/@pdfe /TET/Document/@pdfx


6.2 <br />

<br />

<br />

CropBox <br />

MediaBox Rotate <br />

<br />

<br />

1 pt = 1 inch / 72 = 25.4 mm / 72 = 0.3528 mm<br />
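The unit relation above translates directly into conversion helpers. This is a minimal sketch; the class and method names are illustrative and not part of TET.

```java
/** Conversions between PDF points, inches and millimeters (1 pt = 1/72 inch). */
class PointUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double pointsToInches(double pt) {
        return pt / POINTS_PER_INCH;
    }

    static double pointsToMm(double pt) {
        return pt * MM_PER_INCH / POINTS_PER_INCH;
    }

    static double mmToPoints(double mm) {
        return mm * POINTS_PER_INCH / MM_PER_INCH;
    }

    public static void main(String[] args) {
        // Width of a 595 pt wide page in millimeters
        System.out.println(pointsToMm(595));
    }
}
```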

<br />

<br />

<br />

<br />

<br />

<br />

<br />

y<br />

<br />

<br />

<br />

topdown <br />

<br />

<br />

> <br />

<br />

> <br />

<br />

... <br />

<br />

<br />

<br />

<br />

<br />

open_page( ) clippingarea <br />

<br />

unlimited <br />

cropbox <br />

<br />

open_page( ) includebox<br />

excludebox


includebox <br />

excludebox <br />

<br />

<br />

<br />

<br />

get_char_info( ) <br />

<br />

<br />

> uv <br />

<br />

<br />

uv <br />

<br />

uv <br />

uv <br />

uv <br />

> type


[Figure: glyph geometry, labeled with the reference point (x, y), width, fontsize, baseline, and the angles alpha and beta]<br />

<br />

<br />

<br />

<br />

<br />

<br />

(x, y) width <br />

uv <br />

alpha (x, y) width <br />

° fontsize <br />

> unknown <br />

unknownchar <br />

<br />

unknownchar <br />

<br />

<br />

> <br />

<br />

> (x, y) <br />

<br />

<br />

(x, y) <br />

y topdown <br />

> width <br />

<br />

width<br />

width <br />

width <br />

<br />

width


[Figure: font metrics, labeled with font size, capheight, ascender, baseline, and descender]<br />

<br />

> alpha <br />

° ° <br />

alpha <br />

° alpha beta topdown <br />

<br />

> beta <br />

alpha <br />

° beta ° <br />

<br />

> fontid <br />

<br />

<br />

<br />

> fontsize <br />

<br />

> textrendering <br />

<br />

<br />

open_page( ) ignoreinvisibletext <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

T H x <br />

x <br />

f d j p


x <br />

<br />

<br />

<br />

/* */<br />

path = "fonts[" + i + "]/ascender";<br />

System.out.println("=" + p.pcos_get_number(doc, path));<br />

path = "fonts[" + i + "]/descender";<br />

System.out.println("=" + p.pcos_get_number(doc, path));<br />

get_char_info( ) get_char_info( ) <br />

fonts[] <br />

<br />

FontDescriptor <br />

<br />

<br />

get_char_info x, y width alpha <br />

<br />

<br />

x_end = lrx = x + width * cos(alpha)<br />

y_end = lry = y + width * sin(alpha)<br />

alpha <br />

x_end = lrx = x + width<br />

y_end = lry = y<br />

<br />

beta <br />

<br />

urx = x + width * cos(alpha) - dir * height * sin(alpha)<br />

ury = y + width * sin(alpha) + dir * height * cos(alpha)<br />

topdown=true dir=-1 topdown=false dir=1 <br />
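The corner calculations can be sketched in Java as follows. This is an illustration under stated assumptions, not TET code: the class `GlyphBox` and its methods are hypothetical, `alpha` is assumed to be given in degrees, `height` is assumed to be the glyph height in points, and the direction factor is -1 for topdown coordinates and +1 otherwise.

```java
/** Glyph box corners from reference point, width, height and baseline angle. */
class GlyphBox {

    /** End point of the glyph: the reference point moved by width along the baseline. */
    static double[] endPoint(double x, double y, double width, double alphaDeg) {
        double a = Math.toRadians(alphaDeg);
        return new double[] { x + width * Math.cos(a), y + width * Math.sin(a) };
    }

    /** Upper right corner: the end point moved by height perpendicular to the baseline. */
    static double[] upperRight(double x, double y, double width, double height,
                               double alphaDeg, boolean topdown) {
        double a = Math.toRadians(alphaDeg);
        int dir = topdown ? -1 : 1;   // y axis points down in topdown coordinates
        return new double[] {
            x + width * Math.cos(a) - dir * height * Math.sin(a),
            y + width * Math.sin(a) + dir * height * Math.cos(a)
        };
    }

    public static void main(String[] args) {
        double[] ur = upperRight(10, 20, 5, 8, 0, false);
        System.out.println(ur[0] + ", " + ur[1]);
    }
}
```

For horizontal text (alpha=0) the formulas reduce to adding the width to x and the signed height to y.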

<br />

<br />

<br />

height = fontsize * ascender / 1000<br />

<br />

translate(x,y);<br />

rotate(alpha);<br />

skew(0, -beta);


if (abs(beta) > 90)<br />

scale(1 -1);<br />

<br />

urx = x + width<br />

ury = y + dir * height<br />

<br />

<br />

x end = x<br />

y end = y - <br />

<br />

<br />

ulx = x - width/2 * cos(alpha)<br />

uly = y - width/2 * sin(alpha)<br />

lrx = ulx + width * cos(alpha) + dir * height * sin(alpha)<br />

lry = uly + width * sin(alpha) - dir * height * cos(alpha)<br />

topdown=true dir=-1 topdown=false dir=1


6.3 <br />

6.3.1 CMap<br />

<br />

<br />

<br />

> Adobe-Japan1 6<br />

> Adobe-CNS1 5<br />

> Adobe-GB1 5<br />

> Adobe-Korea1 2<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

6.3.2 <br />

<br />

<br />

> <br />

<br />

<br />

> alpha ° <br />

alpha=0° ° <br />

> <br />

<br />

<br />

count = p.pcos_get_number(doc, "length:fonts");<br />

for (i=0; i < count; i++)<br />

{<br />

if (p.pcos_get_number(doc, "fonts[" + i + "]/vertical") != 0)<br />

{<br />

/* font uses vertical writing mode */<br />

vertical = true;<br />

}<br />

}<br />

> <br />

<br />

<br />

decompose={vertical=_none}


6.3.3 narrow wide vertical <br />

<br />

<br />

<br />

wide narrow <br />

decompose <br />

<br />

decompose={wide=_none narrow=_none}<br />

small square vertical <br />

wide narrow <br />

<br />

<br />

<br />

decompose={none}<br />

<br />

<br />

decompose <br />

<br />

narrow <br />

<br />

small<br />

<br />

<br />

<br />

square <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

U+30F2<br />

<br />

U+002C<br />

<br />

<br />

U+30AD U+30ED<br />

<br />

<br />

U+FF66<br />

<br />

U+FE50<br />

<br />

U+3314


6.4 <br />

<br />

<br />

<br />

<br />

6.4.1 <br />

<br />

word <br />

<br />

<br />

contentanalysis={bidi=logical}<br />

<br />

contentanalysis={bidi=visual}<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

bidilevel <br />

<br />

contentanalysis={bidilevel=rtl}<br />

get_char_info( ) Glyph <br />

<br />

<br />

<br />

<br />

<br />

<br />

6.4.2 <br />

<br />

<br />

<br />

<br />

decompose


decompose <br />

<br />

decompose <br />

<br />

decompose=none<br />

<br />

decompose=<br />

{final=_all medial=_all initial=_all isolated=_all}<br />

<br />

decompose=<br />

{final=_none medial=_none initial=_none isolated=_none}<br />

<br />

U+FEB2<br />

<br />

U+FEB3<br />

<br />

U+FD0E<br />

<br />

U+FEB4<br />

<br />

U+FEB2<br />

<br />

U+FEB3<br />

<br />

U+FD0E<br />

<br />

U+FEB4<br />

<br />

U+0633<br />

<br />

U+0633<br />

<br />

U+0633 U+0631<br />

<br />

U+0633<br />

<br />

U+FEB2<br />

<br />

U+FEB3<br />

<br />

U+FD0E<br />

<br />

U+FEB4<br />

<br />

<br />

<br />

<br />

fold <br />

<br />

<br />

fold <br />

<br />

fold <br />

fold={{[U+0640] remove}} <br />

fold={default}<br />

<br />

fold={{[U+0640] preserve}}<br />

<br />

U+0640<br />

<br />

U+0640<br />

<br />

<br />

U+0640


6.5 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

> <br />

> <br />

> <br />

> <br />

<br />

<br />

open_page( ) granularity get_text( ) <br />

<br />

> granularity=glyph <br />

<br />

<br />

<br />

<br />

<br />

<br />

> granularity=word <br />

<br />

<br />

<br />

<br />

<br />

> granularity=line


granularity=page <br />

<br />

<br />

granularity=word TET_get_text( ) <br />

<br />

<br />

open_page( ) wordseparator lineseparator <br />

<br />

lineseparator=U+000A<br />

granularity=glyph <br />

<br />

<br />

glyph <br />

<br />

<br />

> <br />

<br />

<br />

<br />

> <br />

<br />

open_page( ) punctuationbreaks false <br />

<br />

contentanalysis={punctuationbreaks=false}<br />

<br />

<br />

<br />

<br />

punctuationbreaks=true <br />

<br />

punctuationbreaks=false


open_page( ) <br />

contentanalysis={dehyphenate=false}<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

open_page( ) <br />

<br />

contentanalysis={shadowdetect=false}


ä <br />

a ¨


6.6 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

open_page( ) <br />

docstyle=papers<br />

docstyle <br />

<br />

> book <br />

> business <br />

> fancy <br />

> forms <br />

> generic <br />

> magazines <br />

<br />

> papers <br />

> science <br />

<br />

> searchengine <br />

<br />

<br />

<br />

> spacegrid<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

structureanalysis <br />

layoutanalysis <br />

structureanalysis={list=true bullets={{fontname=ZapfDingbats}}}<br />

layoutanalysis = {layoutrowhint={full separation=preservecolumns}}<br />

layoutdetect=2<br />

layouteffort=high


docstyle=book docstyle=business docstyle=fancy<br />

docstyle=magazines docstyle=papers docstyle=science<br />

docstyle=spacegrid


<br />

<br />

<br />

<br />

5<br />

<br />

<br />

<br />

.<br />

<br />

<br />

<br />

REFERENCES<br />

<br />

<br />

<br />

<br />

<br />

...<br />


7 Unicode <br />

7.1 Unicode <br />

<br />

<br />

<br />

www.unicode.org<br />

<br />

<br />

> <br />

<br />

<br />

>


BMP <br />

<br />

<br />

> PUA <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> UTF-8 <br />

<br />

<br />

 à <br />

> UTF-16 <br />

<br />

<br />

<br />

<br />

<br />

> UTF-32 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

þÿ<br />

ÿþ<br />

þÿ<br />

ÿþ


decompose <br />

<br />

<br />

get_text( ) <br />

get_char_info( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

> lineseparator/wordseparator


7.2 Unicode <br />

<br />

<br />

<br />

granularity=word <br />

7.2.1 <br />

<br />

<br />

<br />

fontsizerange <br />

<br />

<br />

<br />

fontsizerange={10 50}<br />

textrendering=3 <br />

<br />

<br />

<br />

textrendering=3 <br />

<br />

<br />

get_char_info( ) TET_char_info textrendering <br />

<br />

Glyph/@textrendering <br />

<br />

ignoreinvisibletext=true<br />

7.2.2 word <br />

granularity=word line page <br />

<br />

<br />

TET_char_info <br />

attributes <br />

Glyph/@hyphenation <br />

<br />

contentanalysis={dehyphenate=false}<br />

<br />

get_char_info( ) Glyph


contentanalysis={keephyphenglyphs=true}<br />

get_char_info( ) TET_char_info attributes TET_ATTR_DEHYPHENATION_ARTIFACT <br />

Glyph/@dehyphenation artifact <br />

<br />

<br />

<br />

TET_char_info attributes <br />

Glyph/@shadow<br />

<br />

<br />

contentanalysis={shadowdetect=false}<br />

<br />

<br />

<br />

<br />

unknownchar <br />

<br />

fold <br />

<br />

<br />

fold={{[:Private_Use:] remove} {[U+FFFD] remove} default}<br />

<br />

TET_char_info unknown <br />

Glyph/@unknown


7.3 Unicode <br />

<br />

<br />

<br />

> fold <br />

<br />

<br />

> decompose <br />

<br />

<br />

> normalize <br />

<br />

<br />

7.3.1 Unicode <br />

<br />

<br />

> <br />

> <br />

> <br />

<br />

<br />

TET_char_info <br />

<br />

<br />

<br />

fold <br />

<br />

fold <br />

fold <br />

fold={ {[:blank:] U+0020} } fold={ {_dehyphenation remove} } WRONG!<br />

<br />

fold={ {[:blank:] U+0020 } {_dehyphenation remove} }<br />

<br />

fold open_document( )


fold <br />

<br />

<br />

<br />

<br />

fold={{[^U+0020-U+00FF] remove}}<br />

<br />

fold={{[:Alphabetic=No:] remove}}<br />

<br />

U+0104<br />

<br />

U+0037<br />

<br />

U+0041<br />

<br />

<br />

<br />

U+0041<br />

<br />

fold={{[^[:General_Category=Decimal_Number:]] remove}}<br />

-<br />

<br />

<br />

fold={{[:Private_Use:] remove} {[U+FFFD] remove} default}<br />

<br />

fold={{[:General_Category=Dash_Punctuation:] remove}}<br />

<br />

U+0037<br />

<br />

U+0041<br />

<br />

U+FFFF<br />

<br />

U+002D<br />

<br />

U+0037<br />

<br />

<br />

<br />

<br />

fold={{[:Bidi_Control:] remove}}<br />

U+200E<br />

<br />

<br />

<br />

fold={{[:blank:] U+0020}}<br />

<br />

U+00A0<br />

<br />

U+0020<br />

<br />

fold={{[:Dash:] U+002D}}<br />

<br />

fold={{[:Unassigned:] U+FFFD}}<br />

<br />

<br />

<br />

<br />

_dehyphenation <br />

fold={{_dehyphenation preserve}}<br />

<br />

fold={{[U+0640] preserve}}<br />

<br />

<br />

fold={ {[U+2018] U+0027} {[U+2019] U+0027} {[U+201C] U+0022} {[U+201D]<br />

U+0022}}<br />

<br />

U+2011<br />

<br />

U+03A2<br />

<br />

U+002D<br />

<br />

U+0640<br />

<br />

U+201C<br />

<br />

U+002D<br />

<br />

U+FFFD<br />

<br />

U+002D<br />

<br />

U+0640<br />

<br />

U+002D U+0022


granularity=glyph <br />

<br />

default <br />

<br />

fold={ {_dehyphenation preserve} default }<br />

fold <br />

default <br />

fold <br />

<br />

<br />

fold={{[:blank:] U+0020}}<br />

<br />

U+00A0<br />

<br />

U+0020<br />

<br />

unknownchar <br />

fold={{[:Private_Use:] unknownchar}}<br />

<br />

fold={{_dehyphenation remove}}<br />

<br />

fold={{[U+0640] remove}}<br />

<br />

<br />

<br />

fold={{[:Control:] remove} {[:Unassigned:] remove}}<br />

<br />

U+E001<br />

<br />

U+002D<br />

<br />

U+0640<br />

<br />

<br />

U+000C U+03A2<br />

<br />

U+FFFD


7.3.2 Unicode <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G729<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

U+00C4<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

U+0041 U+0308<br />

<br />

U+00C4 U+2261<br />

decompose <br />

<br />

<br />

U+00C4 U+2261<br />

<br />

canonical 1 <br />

<br />

<br />

U+00C0<br />

U+0041 U+0300<br />

<br />

U+F9F4<br />

<br />

U+2126<br />

<br />

U+3070<br />

<br />

U+FB2F<br />

<br />

U+6797<br />

<br />

U+03A9<br />

<br />

<br />

<br />

<br />

<br />

U+2126 U+306F U+2126 U+306F U+3099<br />

<br />

<br />

U+05D0 U+05B8<br />

1. www.unicode.org/Public/5.2.0/charts/


U+0633<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

decompose <br />

decompose <br />

<br />

<br />

<br />

<br />

decompose open_document( ) <br />

decompose <br />

<br />

<br />

decompose={none}<br />

<br />

U+FEB2<br />

<br />

U+FEB4<br />

<br />

U+FEB3<br />

<br />

U+00C4 U+2248<br />

<br />

<br />

decompose={wide=_none narrow=_none}<br />

<br />

decompose={canonical=_all}<br />

circle <br />

<br />

decompose={none circle=_all}<br />

<br />

<br />

decompose={circle=_all}


decompose <br />

<br />

U+00C4 U+2248<br />

<br />

circle <br />

<br />

<br />

U+3251<br />

U+0032 U+0031<br />

compat 1<br />

final<br />

font<br />

fraction 1<br />

initial<br />

isolated<br />

medial<br />

narrow<br />

nobreak<br />

none<br />

small<br />

square<br />

sub 1<br />

super 1<br />

vertical<br />

wide<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

decompose <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

U+FB01<br />

<br />

U+FEB2<br />

<br />

U+2102<br />

<br />

U+00BC<br />

<br />

U+FEB3<br />

<br />

U+FD0E<br />

<br />

U+FEB4<br />

<br />

U+FF66<br />

<br />

U+00A0<br />

<br />

<br />

U+FE50<br />

<br />

U+3314<br />

<br />

U+2081<br />

<br />

U+00AA<br />

<br />

U+2122<br />

<br />

U+FE37<br />

<br />

U+FFE1<br />

<br />

<br />

<br />

<br />

U+0066 U+0069<br />

<br />

U+0633<br />

<br />

U+0043<br />

<br />

<br />

<br />

U+0031 U+2044 U+0034<br />

<br />

U+0633<br />

<br />

<br />

U+0633 U+0631<br />

<br />

U+0633<br />

<br />

U+30F2<br />

<br />

U+0020<br />

<br />

U+002C<br />

<br />

<br />

U+30AD U+30ED<br />

<br />

U+0031<br />

<br />

U+0061<br />

<br />

<br />

U+0054 U+004D<br />

<br />

U+007B<br />

<br />

U+00A3


fraction <br />

_all <br />

<br />

<br />

granularity=glyph <br />

<br />

decompose <br />

<br />

canonical<br />

compat<br />

fraction<br />

sub<br />

super<br />

all others<br />

<br />

canonical={[U+0374 U+037E U+0387 U+1FBE U+1FEF U+1FFD U+2000 U+2001 U+2126 U+212A<br />

U+212B U+2329-U+232A]}<br />

<br />

_all <br />

U+00C4<br />

<br />

compat={[U+FB00-U+FB17]}<br />

<br />

_all <br />

U+0132<br />

<br />

fraction=_none<br />

<br />

<br />

<br />

<br />

<br />

<br />

U+0039 U+00BD<br />

U+0039 U+0031 U+2044 U+0032<br />

<br />

sub={[U+208A-U+208E]}<br />

super={[U+207A-U+207E]}<br />

<br />

fraction <br />

<br />

<br />

U+2122<br />

U+0054 U+004D<br />

<br />

circle=_all final=_all ... vertical=_all wide=_all


7.3.3 Unicode <br />

<br />

<br />

<br />

> <br />

> <br />

> <br />

> <br />

<br />

www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G21796 www.unicode.org/reports/tr15/<br />

<br />

normalize<br />

<br />

normalize=nfc<br />

decompose normalize <br />

normalize none <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

U+00C4<br />

<br />

U+00C4<br />

<br />

U+0041 U+0308<br />

<br />

U+00C4<br />

<br />

U+0041 U+0308<br />

<br />

U+0041 U+0308<br />

<br />

U+00C4<br />

<br />

U+0041 U+0308<br />

<br />

U+00C4<br />

<br />

U+0041 U+0308<br />

<br />

U+0308 U+0041<br />

<br />

U+0308 U+0041<br />

<br />

U+0308 U+0041<br />

<br />

U+0308 U+0041<br />

<br />

U+0308 U+0041<br />

<br />

U+FB01<br />

<br />

U+FB01<br />

<br />

U+FB01<br />

<br />

<br />

U+0066 U+0069<br />

<br />

U+0066 U+0069<br />

<br />

<br />

U+0033 U+2075<br />

<br />

<br />

U+0033 U+2075<br />

<br />

<br />

U+0033 U+2075<br />

<br />

<br />

U+0033 U+0035<br />

<br />

U+0033 U+0035<br />

<br />

U+212B<br />

<br />

U+00C5<br />

<br />

U+0041 U+030A<br />

<br />

U+00C5<br />

<br />

U+0041 U+030A<br />

<br />

U+2122<br />

<br />

U+2122<br />

<br />

U+2122<br />

<br />

<br />

U+0054 U+004D<br />

<br />

U+0054 U+004D<br />

<br />

U+2163<br />

<br />

U+2163<br />

<br />

U+2163<br />

<br />

<br />

U+0049 U+0056<br />

<br />

U+0049 U+0056


U+FB48<br />

<br />

<br />

U+05E8 U+05BC<br />

<br />

<br />

U+05E8 U+05BC<br />

<br />

<br />

U+05E8 U+05BC<br />

<br />

U+05E8 U+05BC<br />

<br />

U+AC00<br />

<br />

U+AC00<br />

<br />

<br />

U+1100 U+1161<br />

<br />

U+AC00<br />

<br />

U+1100 U+1161<br />

<br />

U+FB48 U+3062<br />

<br />

U+FB48 U+3062<br />

<br />

<br />

U+3061 U+3099<br />

<br />

U+FB48 U+3062<br />

<br />

U+3061 U+3099<br />

<br />

U+32C9<br />

<br />

U+32C9<br />

<br />

U+32C9<br />

<br />

<br />

<br />

U+0031 U+0030 U+6708<br />

<br />

U+0031 U+0030 U+6708
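The four normalization forms can be explored outside TET with the Java standard library. The sketch below uses `java.text.Normalizer`, not the TET `normalize` option; the class and helper names are illustrative.

```java
import java.text.Normalizer;

/** Prints the code points of a string under the four Unicode normalization forms. */
class NormalizationForms {

    /** Formats a string as a space-separated list of U+XXXX values. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** Normalizes s to the given form and returns the resulting code points. */
    static String normalize(String s, Normalizer.Form form) {
        return codePoints(Normalizer.normalize(s, form));
    }

    public static void main(String[] args) {
        String[] samples = { "\u00C4", "A\u0308", "\uFB01", "\u212B", "\u2122" };
        for (String s : samples) {
            for (Normalizer.Form form : Normalizer.Form.values()) {
                System.out.println(codePoints(s) + " " + form + " -> " + normalize(s, form));
            }
        }
    }
}
```

The canonical forms (NFC, NFD) leave compatibility characters such as the ligature U+FB01 untouched, while the compatibility forms (NFKC, NFKD) replace them with their equivalents.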


7.4 <br />

U+FFFF<br />

<br />

U+1DXXX <br />

U+20000 <br />

<br />

<br />

get_char_info( ) uv <br />

<br />

<br />

get_text( )
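Characters beyond U+FFFF occupy one UTF-32 value but two UTF-16 code units (a surrogate pair). The sketch below shows the relation with the Java standard library only; it assumes the text is handled as UTF-16, as in Java strings, and the class and method names are illustrative.

```java
/** Relation between a UTF-32 value and its UTF-16 code units. */
class SupplementaryChars {

    /** Returns the UTF-16 code units of a code point as hexadecimal values. */
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    /** Counts characters (code points) rather than UTF-16 code units. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x20000));
        System.out.println(utf16Units(0x20000) + ", length " + s.length()
                + ", characters " + codePointCount(s));
    }
}
```

Code that iterates over extracted text by `char` should therefore use the code point methods when supplementary characters may occur.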


7.5 Unicode <br />

<br />

<br />

<br />

Unicode <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

fold <br />

<br />

<br />

<br />

<br />

fold={ {[:Private_Use:] remove} }<br />

get_char_info( ) unknown <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

open_document( ) glyphmapping <br />

<br />

<br />

<br />

<br />

> forceencoding <br />

WinAnsiEncoding MacRomanEncoding <br />

<br />

> codelist tounicodecmap <br />

codelist


glyphlist <br />

<br />

> glyphrule <br />

encodinghint <br />

<br />

> <br />

encodinghint <br />

glyphrule encoding <br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

> <br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

1. <strong>PDFlib</strong> FontReporter Plugin www.pdflib.com/products/fontreporter


> <br />

> <br />

<br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

>


x 0x <br />

<br />

.cl <br />

codelist <br />

<mycodelist>.cl <mycodelist> <br />

searchpath <br />

<br />

.cl <br />

<br />

name <br />

set_option("codelist {name name.cl}");<br />

<br />

<br />

<br />

a b c d e <br />

<br />

% GlobeLogosOneUnicode<br />

x61 x0054 x0068 x0065 x0020 % The<br />

x62 x0042 x006F % Bo<br />

x63 x0073 x0074 x006F x006E x0020 % ston<br />

x64 x0047 x006C x006F % Glo<br />

x65 x0062 x0065 % be<br />
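To check a code list before handing it to TET, the file format shown above can be read with a few lines of Java. This is a sketch under assumptions, not the TET parser: values prefixed with `x` or `0x` are taken as hexadecimal, unprefixed values are assumed to be decimal, everything after `%` is treated as a comment, and escaped percent signs are not handled. The class `CodeListParser` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads code list lines of the form: code unicode1 [unicode2 ...] % comment */
class CodeListParser {

    /** Parses x61 or 0x61 as hexadecimal; unprefixed values as decimal (assumption). */
    static int parseValue(String token) {
        String t = token.toLowerCase();
        if (t.startsWith("0x")) {
            return Integer.parseInt(t.substring(2), 16);
        }
        if (t.startsWith("x")) {
            return Integer.parseInt(t.substring(1), 16);
        }
        return Integer.parseInt(t);
    }

    /** Maps each code to the Unicode string built from the remaining values of its line. */
    static Map<Integer, String> parse(String text) {
        Map<Integer, String> map = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int comment = line.indexOf('%');
            if (comment >= 0) {
                line = line.substring(0, comment);   // strip the comment
            }
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue;                            // empty or incomplete line
            }
            StringBuilder unicode = new StringBuilder();
            for (int i = 1; i < tokens.length; i++) {
                unicode.appendCodePoint(parseValue(tokens[i]));
            }
            map.put(parseValue(tokens[0]), unicode.toString());
        }
        return map;
    }

    public static void main(String[] args) {
        String sample = "x61 x0054 x0068 x0065 x0020 % The\n"
                      + "x62 x0042 x006F % Bo\n";
        parse(sample).forEach((code, text) ->
                System.out.println(Integer.toHexString(code) + " -> [" + text + "]"));
    }
}
```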

open_document( ) <br />

GlobeLogosOne.cl <br />

<br />

glyphmapping {{fontname=GlobeLogosOne codelist=GlobeLogosOne}}


cmap <br />

cmap <br />

Warnock open_document( ) <br />

glyphmapping {{fontname=Warnock* tounicodecmap=warnock}}<br />

<br />

<br />

<br />

<br />

<br />

> <br />

> <br />

<br />

\% <br />

<br />

> <br />

<br />

> <br />

x 0x <br />

<br />

> <br />

<br />

.gl <br />

glyphlist <br />

<myglyphlist>.gl <myglyphlist> <br />

searchpath <br />

<br />

.gl <br />

<br />

name <br />

set_option("glyphlist {name name.gl}");<br />

<br />

<br />

% Unicode values for TeX glyph names<br />

precedesequal 0x227C<br />

similarequal 0x2243<br />

negationslash 0x2044<br />

union 0x222A<br />

prime 0x2032<br />

1. partners.adobe.com/public/developer/en/acrobat/5411.ToUnicode.pdf


CMSY open_document( ) <br />

glyphmapping {{fontname=CMSY* glyphlist=tarski}}<br />

<br />

<br />

<br />

<br />

G00, G01, G02, ... <br />

<br />

<br />

<br />

open_document( ) encodinghint <br />

<br />

<br />

encodinghint= cp1250 <br />

<br />

<br />

open_document( ) glyphmapping fontname<br />

glyphrule <br />

> fontname <br />

> prefix <br />

> base <br />

> encoding <br />

T1, T2, T3, c00, c01, c02, ..., cFF <br />

00, ..., FF <br />

open_document( ) <br />

<br />

glyphmapping {{fontname=T* glyphrule={prefix=c base=hex encoding=winansi} }}<br />

<br />

<br />

<br />

<br />

fontoutline <br />

<br />

<br />

open_document( ) fontoutline <br />

<br />

WarnockPro <br />

<br />

TET_set_option("fontoutline {WarnockPro WarnockPro.otf}");


8 <br />

8.1 <br />

<br />

<br />

> .tif <br />

<br />

<br />

<br />

<br />

<br />

> .jpg DCTDecode <br />

<br />

<br />

<br />

> .jpx JPXDecode <br />

<br />

<br />

<br />

> write_image_file( ) <br />

filename <br />

<br />

> get_image_data( ) <br />

<br />

<br />

<br />

<br />

<br />

Image/@extractedAs<br />

<br />

<br />

int imageType = tet.write_image_file(doc, tet.imageid, "typeonly");<br />

/* Map the numerical image type to a format name */<br />

String imageFormat;<br />

switch (imageType) {<br />

case 10:<br />

imageFormat = "TIFF";<br />

break;<br />

case 20:<br />

imageFormat = "JPEG";<br />

break;<br />

case 30:<br />

imageFormat = "JPEG2000";<br />

break;<br />

case 40:<br />

imageFormat = "RAW";<br />

break;<br />

default:<br />

System.err.println("write_image_file() returned "<br />

+ imageType + ", skipping image, error: "<br />

+ tet.get_errmsg());<br />

}<br />

<br />

<br />

www.pdflib.com/knowledge-base/xmp-metadata/<br />

<br />

<br />

write_image_file( ) get_image_data( ) keepxmp <br />

false <br />

<br />

image_metadata


8.2 <br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

> <br />

> <br />

> <br />

<br />

<br />

images[ ]/mergetype


imageanalysis={merge={disable}}<br />

open_page( ) <br />

<br />

> images[ ] length:images <br />

<br />

<br />

length:images <br />

images[ ]/mergetype <br />

artificial <br />

> images[ ] <br />

images[ ] <br />

images[ ]/mergetype consumed <br />

<br />

<br />

<br />

> <br />

> <br />

<br />

<br />

image_count <br />

<br />

No of raw image resources before merging: 82<br />

No of placed images: 12<br />

No of images after merging (all types): 83<br />

normal images: 1<br />

artificial (merged) images: 1<br />

consumed images: 81<br />

No of relevant (normal or artificial) image resources: 2<br />

<br />

<br />

imageanalysis<br />

smallimages maxarea maxcount <br />

<br />

<br />

imageanalysis={smallimages={disable}}


8.3 <br />

<br />

> <br />

<br />

<br />

<br />

<br />

PlacedImage <br />

> <br />

<br />

<br />

<br />

Image <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

get_image_info( ) imageid <br />

PlacedImage/@image <br />

<br />

get_image_info( ) <br />

imageid <br />

Image/@id <br />

<br />

<br />

<br />

< >_p< >_< >.<br />

[tif|jpg|jpx]<br />

< >_I< ID>.<br />

[tif|jpg|jpx]


8.4 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

--imageloop page <br />

images_per_page images_in_memory <br />

images_per_page <br />

<br />

<br />

get_image_info( ) <br />

imageid pcos_get_number( ) <br />

<br />

write_image_file( ) get_image_data( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

--imageloop resource <br />

image_resources <br />

<br />

<br />

<br />

<br />

<br />

<br />

pcos_get_number( ) <br />

length:images <br />

mergetype <br />

<br />

<br />

write_image_file( ) get_image_data( )


8.5 <br />

get_image_info( ) <br />

image_info <br />

> x y <br />

<br />

y topdown<br />

<br />

> width height <br />

<br />

> alpha alpha<br />

alpha <br />

alpha alpha beta topdown <br />

<br />

> beta alpha <br />

beta beta <br />

beta beta<br />

beta abs(beta) <br />

<br />

> imageid <br />

write_image_file( ) get_image_data( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

while (tet.get_image_info(page) == 1) {<br />

String imagePath = "images[" + tet.imageid + "]";<br />

int width = (int) tet.pcos_get_number(doc, imagePath + "/Width");<br />

int height = (int) tet.pcos_get_number(doc, imagePath + "/Height");<br />

double xDpi = 72 * width / tet.width;<br />

double yDpi = 72 * height / tet.height;<br />

...<br />

}<br />

[Figure: geometry of a placed image, labeled with the reference point (x, y), width, height, and the angle alpha]<br />

<br />

<br />

determine_image_resolution
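The resolution formula can be checked in isolation. The following sketch is not part of TET: the class `ImageResolution` and the method `dpi` are illustrative, the pixel count is assumed to come from the image's `Width` or `Height` entry, and the placed size is assumed to be the `width` or `height` value in points reported for the placed image.

```java
/** Resolution of a placed image: pixels per inch at its size on the page. */
class ImageResolution {

    /**
     * @param pixels       number of pixels along one axis of the image
     * @param sizeInPoints size of the placed image along the same axis (72 points per inch)
     */
    static double dpi(int pixels, double sizeInPoints) {
        if (sizeInPoints <= 0) {
            throw new IllegalArgumentException("placed size must be positive");
        }
        return 72.0 * pixels / sizeInPoints;
    }

    public static void main(String[] args) {
        // A 600 pixel wide image placed 144 pt (2 inches) wide
        System.out.println(dpi(600, 144.0));
    }
}
```

Since an image can be scaled differently in both directions, the horizontal and vertical resolution are computed separately, as in the loop above.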


8.6 <br />

<br />

> <br />

> <br />

<br />

> <br />

<br />

<br />

<br />

<br />

> <br />

> <br />

<br />

> <br />

<br />

<br />

<br />

> <br />

<br />

> <br />

<br />

<br />

> <br />

<br />

<br />

> <br />

write_image_file( ) <br />

> <br />

<br />

> <br />

>




9 TET Markup Language (TETML)<br />

9.1 TETML <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

--tetml <br />

file.tetml <br />

tet --tetml word file.pdf<br />

<br />

<br />

<br />

<br />

tetml <br />

<br />

<br />

<br />

<br />

tetml <br />

<br />

<br />

www.unicode.org/reports/tr16 <br />

<br />

> <br />

> <br />

> <br />

> <br />

> <br />

> <br />

> <br />

>


<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<strong>PDFlib</strong> GmbH<br />

2008-07-08T15:05:39+00:00<br />

FrameMaker 7.0<br />

2008-09-30T23:15:19+02:00<br />

Acrobat Distiller 7.0.5 (Windows)<br />

<strong>PDFlib</strong> FontReporter<br />

<strong>PDFlib</strong> FontReporter 1.3 Manual<br />

<br />

<br />

<br />

<br />

...XMP...<br />

<br />

<br />

<br />

tetml={} <br />

<br />

<br />

tetml={} granularity=word <br />

<br />

<br />

<br />

<strong>PDFlib</strong><br />

<br />

<br />

<br />

GmbH<br />

<br />

<br />

<br />

Munchen<br />

<br />

<br />

......<br />


<br />

<br />

<br />

<br />

<br />

<br />

......<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

Glyph <br />

<br />

<br />

<strong>PDFlib</strong><br />

<br />

P<br />

D<br />

F<br />

l<br />

i<br />

b<br />

<br />

<br />

<br />

GmbH<br />

<br />

G<br />

m<br />

b<br />

H<br />

<br />

<br />

<br />

Munchen<br />

<br />

M<br />

u<br />

n


c<br />

h<br />

e<br />

n<br />

<br />


9.2 TETML <br />

<br />

<br />

<br />

<br />

<br />

> glyph <br />

<br />

<br />

> word Box <br />

<br />

<br />

<br />

Line <br />

tetml <br />

> wordplus word <br />

topdown <br />

wordplus <br />

wordplus <br />

<br />

<br />

<br />

> line Line <br />

Para line <br />

<br />

<br />

> page <br />

<br />

<br />

<br />

<br />

<br />

<br />

glyph Glyph<br />

word<br />

wordplus<br />

Para Word<br />

Line<br />

Para Word<br />

Line<br />

Table Row Cell Box <br />

Table Row Cell Box Glyph<br />

line Para Line <br />

page Para Table Row Cell


--tetml <br />

wordplus <br />

tet --tetml wordplus file.pdf<br />

<br />

<br />

> process_page( ) granularity <br />

<br />

> granularity=glyph word <br />

tetml glyphdetails <br />

<br />

wordplus <br />

<br />

granularity=word tetml={ glyphdetails={all} }<br />

<br />

<br />

<br />

<br />

<br />

<br />

glyph granularity=glyph tetml={glyphdetails={all}}<br />

word granularity=word <br />

wordplus granularity=word tetml={glyphdetails={all}}<br />

Line word granularity=word tetml={elements={line}}<br />

Line <br />

wordplus<br />

granularity=word<br />

tetml={glyphdetails={all} elements={line}}<br />

line granularity=line <br />

page granularity=page <br />

<br />

<br />

<br />

<br />

--docopt open_document( ) <br />

tetml <br />

elements <br />

<br />

<br />

tetml={ elements={nodocxmp} }


engines <br />

<br />

engines={noimage}<br />

<br />

/TET/Document/Options <br />

tetml={ elements={nooptions} }<br />

<br />

--pageopt <br />

process_page( ) <br />

tetml Glyph <br />

Glyph <br />

<br />

tetml={ glyphdetails={font} }<br />

Line <br />

tetml={ glyphdetails={font} elements={line} }<br />

Glyph sub sup <br />

<br />

tetml={ glyphdetails={sub sup} }<br />

all Glyph <br />

<br />

tetml={ glyphdetails={all} }<br />

<br />

<br />

topdown={output}<br />

<br />

<br />

<br />

contentanalysis={nopunctuationbreaks}<br />

page <br />

<br />

contentanalysis={lineseparator=U+0020}<br />

<br />

/TET/Document/Pages/Page/Options <br />

<br />

tetml={ elements={nooptions} }


Exception <br />

<br />

Object 'objects[49]/Subtype' does not exist<br />

Exception


9.3 TETML TETML <br />

<br />

<br />

http://www.pdflib.com/XML/TET3/TET-3.0<br />

<br />

http://www.pdflib.com/XML/TET3/TET-3.0.xsd<br />

<br />

<br />

<br />

<br />

<br />

<br />

Attachment<br />

Attachments<br />

Box<br />

Cell<br />

ColorSpace<br />

ColorSpaces<br />

Content<br />

Creation<br />

DocInfo<br />

Document<br />

Encryption<br />

<br />

Document <br />

<br />

name level pagenumber<br />

Attachment <br />

llx lly Box urx ury <br />

Box <br />

<br />

<br />

llx lly urx ury ulx uly lrx lry <br />

<br />

colSpan<br />

<br />

alternate base components id name<br />

ColorSpace <br />

<br />

granularity dehyphenation dropcap font geometry <br />

shadow sub sup <br />

<br />

<br />

platform tetVersion date<br />

<br />

<br />

filename pageCount filesize linearized pdfVersion pdfa <br />

pdfe pdfx tagged<br />

<br />

keylength algorithm description masterpassword userpassword noprint <br />

nomodify nocopy noannots noassemble noforms noaccessible nohiresprint <br />

plainmetadata


Exception<br />

Font<br />

Fonts<br />

Glyph<br />

<br />

<br />

<br />

Exception <br />

errnum<br />

name <br />

fullname <br />

embedded fullname id type name vertical<br />

Font <br />

<br />

<br />

Glyph <br />

Box <br />

x y width alpha beta shadow dropcap font size <br />

sub sup textrendering unknown dehyphenation <br />

Image<br />

Images<br />

Line<br />

Metadata<br />

Options<br />

Page<br />

Pages<br />

Para<br />

PlacedImage<br />

Resources<br />

Row<br />

Table<br />

TET<br />

Text<br />

Word<br />

<br />

bitsPerComponent colorspace extractedAs height id mask <br />

maskonly mergetype width<br />

Image <br />

Line Word <br />

<br />

<br />

<br />

<br />

number height width topdown <br />

<br />

<br />

<br />

alpha beta height image width x y <br />

<br />

<br />

<br />

<br />

4.0 3 <br />

<br />

<br />

<br />

topdown




9.4 TETML XSLT <br />

eXtensible Stylesheet Language Transformations <br />


www.w3.org/TR/xslt <br />


> libxslt <br />

> <br />

<br />

<br />

<br />

FontReporter.tetml tetml2html.xsl <br />

toc-generate 0 <br />

FontReporter.html <br />

> www.saxonica.com


java -jar saxon9.jar -o FontReporter.html FontReporter.tetml tetml2html.xsl toc-generate=0<br />

> xmlsoft.org/<br />

XSLT <br />

<br />

xsltproc --output FontReporter.html --param toc-generate 0 tetml2html.xsl FontReporter.tetml<br />

> <br />

<br />

Xalan -o FontReporter.html -p toc-generate 0 FontReporter.tetml tetml2html.xsl<br />

> msxsl.exe<br />

<br />

<br />

www.microsoft.com/downloads/details.aspx?familyid=2FB55371-C94E-4373-B0E9-DB4816552E41<br />

<br />

msxsl.exe FontReporter.tetml tetml2html.xsl -o FontReporter.html toc-generate=0<br />

<br />

<br />

<br />

runxslt <br />

<br />

<br />

<br />

<br />

runxslt <br />

<br />

> javax.xml.transform <br />

runxslt.java ant <br />

build.xml<br />

<br />

> System.Xml.Xsl.XslTransform <br />

runxslt.ps1 <br />

<br />

> <br />

MSXML2.DOMDocument <br />

runxslt.vbs


<br />

> javax.xml.transform <br />

> www.php.net/manual/en/intro.xsl.php <br />

<br />

<br />

xml <br />

<br />

<br />

.xml <br />

<br />

<br />


9.5 XSLT <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> <br />

<br />

ant <br />

> <br />

<br />

<br />

<br />

<br />

> <br />

<br />

<br />

Attachments xsl:template <br />

<br />

> <br />

<br />

<br />

concordance.xsl word wordplus <br />

<br />

<br />

<br />

List of words in the document along with the number of occurrences:<br />

the 207<br />

font 107<br />

of 100<br />

a 92<br />

in 83<br />

and 75<br />

fonts 64<br />

PDF 60<br />

FontReporter 58<br />

...<br />
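The same kind of word count can be sketched outside XSLT. The following Python example is an illustration only and is not a port of concordance.xsl: it assumes `Word`/`Text` elements in the TETML namespace from section 9.3, applies no case folding or filtering, and uses a hand-written `SAMPLE` fragment.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tet": "http://www.pdflib.com/XML/TET3/TET-3.0"}

# Hypothetical fragment in word granularity, for illustration only.
SAMPLE = """<TET xmlns="http://www.pdflib.com/XML/TET3/TET-3.0">
 <Document><Pages><Page number="1"><Content><Para>
  <Word><Text>the</Text></Word><Word><Text>font</Text></Word>
  <Word><Text>the</Text></Word><Word><Text>of</Text></Word>
  <Word><Text>the</Text></Word><Word><Text>font</Text></Word>
 </Para></Content></Page></Pages></Document>
</TET>"""


def concordance(tetml_text):
    """Return (word, count) pairs, most frequent first."""
    root = ET.fromstring(tetml_text)
    texts = (t.text for t in root.iterfind(".//tet:Word/tet:Text", NS))
    counts = Counter(t for t in texts if t)
    # Ties are broken alphabetically so the listing is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for word, count in concordance(SAMPLE):
    print(word, count)
```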

fontfilter.xsl glyph wordplus


Text containing font 'TheSansBold-Plain' with size greater than 10:<br />

[TheSansBold-Plain/24] Contents<br />

[TheSansBold-Plain/13.98] 1<br />

[TheSansBold-Plain/13.98] Installing<br />

[TheSansBold-Plain/13.98] PDFlib<br />

[TheSansBold-Plain/13.98] FontReporter<br />

[TheSansBold-Plain/13.98] 2<br />

[TheSansBold-Plain/13.98] Working<br />

[TheSansBold-Plain/13.98] with<br />

[TheSansBold-Plain/13.98] FontReporter<br />

[TheSansBold-Plain/13.98] A<br />

[TheSansBold-Plain/13.98] Revision<br />

[TheSansBold-Plain/13.98] History<br />

[TheSansBold-Plain/24] 1<br />

[TheSansBold-Plain/24] Installing<br />

[TheSansBold-Plain/24] PDFlib<br />

[TheSansBold-Plain/24] FontReporter<br />

...<br />

fontfinder.xsl glyph wordplus <br />

<br />

<br />

<br />

<br />

TheSansExtraBold-Plain used on:<br />

page 1:<br />

(111, 636), (165, 636), (219, 636), (292, 636), (301, 636), (178, 603), (221, 603), (226, 603),<br />

(272, 603), (277, 603), (102, 375), (252, 375), (261, 375), (267, 375)<br />

TheSans-Plain used on:<br />

page 1:<br />

(102, 266), (119, 266), (179, 266), (208, 266), (296, 266), (346, 266), (367, 266)<br />

...<br />

fontstat.xsl glyph wordplus <br />

<br />

<br />

<br />

<br />

19894 total glyphs in the document; breakdown by font:<br />

68.71% ThesisAntiqua-Normal: 13669 glyphs<br />

22.89% TheSans-Italic: 4553 glyphs<br />

6.38% TheSansBold-Plain: 1269 glyphs<br />

0.9% TheSansMonoCondensed-Plain: 179 glyphs<br />

0.49% TheSansBold-Italic: 98 glyphs<br />

0.27% TheSansExtraBold-Plain: 54 glyphs<br />

0.21% TheSerif-Caps: 42 glyphs<br />

0.15% TheSans-Plain: 29 glyphs<br />

0.01% Gen_TheSans-Plain: 1 glyphs
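A per-font glyph breakdown like the one above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a reimplementation of fontstat.xsl: it assumes that each `Glyph` element carries a `font` attribute referring to the `id` attribute of a `Font` element (both attribute names appear in the element table of section 9.3), and the `SAMPLE` fragment is hand-written and simplified.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tet": "http://www.pdflib.com/XML/TET3/TET-3.0"}

# Hypothetical, simplified fragment; element nesting is abbreviated.
SAMPLE = """<TET xmlns="http://www.pdflib.com/XML/TET3/TET-3.0">
 <Document><Pages>
  <Page number="1"><Content><Para><Word><Box>
   <Glyph font="F0">P</Glyph><Glyph font="F0">D</Glyph>
   <Glyph font="F0">F</Glyph><Glyph font="F1">l</Glyph>
  </Box></Word></Para></Content></Page>
  <Resources><Fonts>
   <Font id="F0" name="ThesisAntiqua-Normal"/>
   <Font id="F1" name="TheSans-Italic"/>
  </Fonts></Resources>
 </Pages></Document>
</TET>"""


def font_stats(tetml_text):
    """Return (font name, glyph count, percentage) sorted by count."""
    root = ET.fromstring(tetml_text)
    names = {f.get("id"): f.get("name") for f in root.iterfind(".//tet:Font", NS)}
    counts = Counter(g.get("font") for g in root.iterfind(".//tet:Glyph", NS))
    total = sum(counts.values())
    return [(names.get(fid, fid), n, round(100.0 * n / total, 2))
            for fid, n in counts.most_common()]


for name, n, pct in font_stats(SAMPLE):
    print(f"{pct}% {name}: {n} glyphs")
```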


index.xsl word wordplus <br />

<br />

<br />

<br />

Alphabetical list of words in the document along with their page number:<br />

A<br />

about 2 7 8<br />

access 8 12<br />

accessible 11<br />

achieving 9 12<br />

Acrobat 2 5 7 8 9 10 11 14 15 17<br />

ActiveX 2<br />

actual 9 12<br />

actually 11 12 14<br />

addition 9<br />

Additional 12<br />

additions 17<br />

address 9 12<br />

addressed 9<br />

addressing 9<br />

Adobe 2 5 8 12 14<br />

...<br />

metadata.xsl <br />

<br />

<br />

<br />

dc:creator = PDFlib GmbH<br />

xmp:CreatorTool = FrameMaker 7.0<br />

table.xsl word wordplus page <br />

<br />

<br />

<br />

<br />

tetml2html.xsl wordplus <br />

<br />

<br />

<br />

> H1 H2 <br />

> <br />

<br />

> <br />

<br />

> <br />

resource tet --image --tetml file.pdf


textonly.xsl <br />

Text


10 pCOS <br />

PDFlib Comprehensive Object Syntax <br />

<br />

<br />

<br />

<br />

www.pdflib.com/pcos-cookbook/




11 TET API <br />

11.1 <br />

<br />

<br />

optlist <br />

<br />

<br />

<br />

<br />

<br />

<br />

sprintf( ) <br />

<br />

AppendFormat( ) <br />

<br />

Append( ) <br />

<br />

AppendFormat( ) Append( ) <br />

<br />

11.2 <br />

<br />

<br />

> <br />

<br />

<br />

> {} <br />

> <br />

<br />

<br />

> <br />

> <br />

<br />

<br />

>


key=value<br />

key = value<br />

key value<br />

key1 = value1 key2 = value2<br />

<br />

<br />

<br />

<br />

<br />

key value2 <br />

key=value1 key=value2<br />

<br />

<br />

{} <br />

searchpath={/usr/lib/tet d:\tet}<br />

(2)<br />

<br />

} { <br />

<br />

<br />

fold={ {[:Private_Use:] remove} {[U+FFFD] remove} }<br />

(2)<br />

<br />

<br />

fold={ {[:Private_Use:] remove} }<br />

(1)<br />

<br />

<br />

<br />

<br />

<br />

<br />

contentanalysis <br />

punctuationbreaks <br />

contentanalysis={punctuationbreaks=false}<br />

glyphmapping <br />

<br />

glyphmapping={ {fontname=GlobeLogosOne codelist=GlobeLogosOne} }<br />

glyphmapping


glyphmapping { {fontname=CMSY* glyphlist=tarski} {fontname=ZEH* glyphlist=zeh}}<br />

<br />

fontname <br />

glyphmapping={ {fontname={Globe Logos One} codelist=GlobeLogosOne} }<br />

<br />

fonttypes={Type1 TrueType}<br />

<br />

default <br />

<br />

fold={ {[:Private_Use:] remove} {[U+FFFD] remove} default }<br />

<br />

includeboxes={{10 20 30 40}}<br />

<br />

<br />

key1 {value1}key2 {value2}<br />

!<br />

Unknown option 'value2' <br />

<br />

key{value}<br />

key={{value1}{value2}}<br />

!<br />

!<br />

<br />

key={open brace {}<br />

!<br />

Braces aren't balanced in option list 'key={open brace {}' <br />

<br />

<br />

key={closing brace \} and open brace \{}<br />

!<br />

<br />

<br />

filename={C:\path\name\}<br />

filename={C:\path\name\\}<br />

!<br />

!
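To make the brace and separator rules above concrete, here is a minimal Python sketch of an option list splitter. It is an illustration only and does not reproduce TET's own parser or its exact diagnostics: `parse_optlist` is a hypothetical helper, it returns raw string values, it accepts only key/value pairs (no valueless boolean shorthand), and it treats a backslash inside braces as escaping the next character.

```python
def parse_optlist(text):
    """Split a top-level option list into (key, value) string pairs.

    A braced value is returned without its outer braces, so a nested
    list can be passed to parse_optlist() again.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace() or ch == "=":
            i += 1                          # separators between key and value
        elif ch == "{":
            depth, j = 1, i + 1
            while j < n and depth:
                if text[j] == "\\" and j + 1 < n:
                    j += 2                  # escaped character such as \{ or \}
                    continue
                depth += {"{": 1, "}": -1}.get(text[j], 0)
                j += 1
            if depth:
                raise ValueError("Braces aren't balanced in option list %r" % text)
            tokens.append(text[i + 1:j - 1])
            i = j
        else:
            j = i
            while j < n and not text[j].isspace() and text[j] not in "={":
                j += 1
            tokens.append(text[i:j])
            i = j
    if len(tokens) % 2:
        raise ValueError("Option without value in option list %r" % text)
    return list(zip(tokens[0::2], tokens[1::2]))


print(parse_optlist("key1 = value1 key2 = value2"))
print(parse_optlist(r"searchpath={/usr/lib/tet d:\tet}"))
print(parse_optlist(parse_optlist("contentanalysis={punctuationbreaks=false}")[0][1]))
```

With these rules `filename={C:\path\name\}` is rejected, because the final backslash escapes the closing brace, while `filename={C:\path\name\\}` is accepted, which matches the two file name examples above.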


11.3 <br />

<br />

<br />

{} <br />

password={ secret string }<br />

contents={length=3mm}<br />

(3)<br />

(1)<br />

{} \ <br />

<br />

password={weird\}string}<br />

()<br />

<br />

<br />

filename={C:\path\name\\}<br />

(1)<br />

<br />

{}<br />

<br />

<br />

<br />

PDFlib <br />

<br />

<br />

<br />

<br />

<br />

x X 0x 0X U+ <br />

xAD 0xAD U+00AD <br />

shy #xAD #173 <br />

<br />

unknownchar=?<br />

unknownchar=63<br />

unknownchar=x3F<br />

unknownchar=0x3F<br />

unknownchar=U+003F<br />

lineseparator={CRLF}<br />

()<br />

(10)<br />

(16)<br />

(16)<br />

(Unicode)<br />

()<br />

<br />

replacementchar=3<br />

(U+0033 THREE, not U+0003!)<br />

<br />
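The equivalent spellings of a Unicode character shown above can be summarized in a small Python sketch. It is an approximation under stated assumptions, not TET's own logic: `parse_unichar` is a hypothetical helper, a single character is always taken literally (which is why `3` yields U+0033), decimal numbers need at least two digits, and named references such as `shy` are looked up in Python's HTML entity table, which may not match the set of names TET accepts.

```python
import html.entities


def parse_unichar(s):
    """Return the code point denoted by one Unichar option value."""
    if len(s) == 1:
        return ord(s)                       # literal character, e.g. "3" -> U+0033
    for prefix in ("0x", "0X", "U+", "#x", "x", "X"):
        if s.startswith(prefix):
            try:
                return int(s[len(prefix):], 16)
            except ValueError:
                break                       # not hex after all, e.g. a name like "xi"
    if s.startswith("#") and s[1:].isdigit():
        return int(s[1:], 10)               # numerical reference such as #173
    if s.isdigit():
        return int(s, 10)                   # plain decimal value such as 63
    if s in html.entities.name2codepoint:
        return html.entities.name2codepoint[s]
    raise ValueError("cannot interpret %r as a Unicode character" % s)


for spelling in ("?", "63", "x3F", "0x3F", "U+003F"):
    print(spelling, "->", "U+%04X" % parse_unichar(spelling))
```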

>


U+FB00-U+FB17 <br />

<br />

U+0048U+006C<br />

> <br />

<br />

\uhhhh U+hhhh<br />

U+hhhhh<br />

\x{hhhhhh}<br />

\Uhhhhhhhh<br />

\\<br />

> <br />

type <br />

www.unicode.org/Public/UNIDATA/PropertyAliases.txt value <br />

www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt <br />

[:type=value:]<br />

[:^type=value:]<br />

\p{type=value}<br />

\P{type=value}<br />

type= <br />

<br />

> <br />

[[:letter:] [:number:]]<br />

[[:letter:] & [U+0061-U+007A]]<br />

[[:letter:]-[U+0061-U+007A]]<br />

[^U+0061-U+007A] <br />

<br />

<br />

<br />

unicode.org/cldr/utility/list-unicodeset.jsp<br />

true false <br />

true name=false noname <br />

usehostfonts<br />

nousehostfonts<br />

(usehostfonts=true)<br />

(usehostfonts=false)<br />

<br />

<br />

clippingarea=cropbox


[U+0061-U+007A]<br />

[U+0640]<br />

[\x{0640}]<br />

[U+FB00-U+FB17]<br />

[^U+0061-U+007A]<br />

[:Lu:]<br />

[:UppercaseLetter:]<br />

[:L:]<br />

[:Letter:]<br />

[:General_Category=Dash_Punctuation:]<br />

[:Alphabetic=No:]<br />

[:Private_Use:]<br />

<br />

a z <br />

<br />

<br />

<br />

a z <br />

<br />

<br />

<br />

Dash_Punctuation <br />

<br />

<br />

<br />

<br />

<br />

-12345<br />

0<br />

0xFF<br />

<br />

<br />

size = -123.45<br />

size = -123,45<br />

size = -1.2345E2<br />

size = -1.2345e+2
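The integer and float spellings above can be normalized with standard library calls. The following Python sketch is illustrative only: `parse_integer` and `parse_float` are hypothetical helpers, and the sketch simply assumes that a comma and a period are interchangeable decimal separators, as the `size` examples suggest.

```python
def parse_integer(s):
    """Accept decimal integers and hexadecimal values with a 0x prefix."""
    return int(s.strip(), 0)


def parse_float(s):
    """Accept a period or a comma as decimal separator, plus exponents."""
    return float(s.strip().replace(",", "."))


print([parse_integer(v) for v in ("-12345", "0", "0xFF")])
print([parse_float(v) for v in ("-123.45", "-123,45", "-1.2345E2", "-1.2345e+2")])
```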


11.4 <br />

x y <br />

<br />

<br />

includebox = {{0 0 500 100} {0 500 500 600}}
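As a rough illustration of what a rectangle list such as the `includebox` value above describes, the following Python sketch parses the list and tests whether a point falls inside any rectangle. It assumes each rectangle is given as lower left x, lower left y, upper right x, upper right y. The helper names are hypothetical, and the sketch does not claim to reproduce how TET decides whether a glyph belongs to a box.

```python
import re


def parse_rectangles(value):
    """Parse '{{llx lly urx ury} ...}' into a list of 4-tuples of floats."""
    rects = []
    # Only innermost brace pairs match, i.e. the individual rectangles.
    for body in re.findall(r"\{([^{}]*)\}", value):
        nums = [float(x) for x in body.split()]
        if len(nums) != 4:
            raise ValueError("a rectangle needs exactly four numbers: %r" % body)
        rects.append(tuple(nums))
    return rects


def inside_any(x, y, rects):
    """Return True if the point (x, y) lies in at least one rectangle."""
    return any(llx <= x <= urx and lly <= y <= ury
               for llx, lly, urx, ury in rects)


boxes = parse_rectangles("{{0 0 500 100} {0 500 500 600}}")
print(boxes)
print(inside_any(250, 50, boxes), inside_any(250, 300, boxes))
```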


11.5 <br />

11.5.1 <br />

C++ void set_option(string optlist)<br />

C# Java void set_option(String optlist)<br />

Perl PHP set_option(string optlist)<br />

VB RB Sub set_option(optlist As String)<br />

C void TET_set_option(TET *tet, const char *optlist)<br />

<br />

optlist <br />

<br />

searchpath <br />

<br />

asciifile cmap codelist encoding <br />

filenamehandling fontoutline glyphlist license licensefile logging userlog <br />

outputformat resourcefile searchpath<br />

<br />

<br />

<br />

<br />

TET_set_option( ) <br />

<br />

asciifile<br />

cmap 1, 2<br />

codelist 1, 2<br />

encoding 1, 2<br />

filenamehandling<br />

<br />

<br />

<br />

true false<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

unicode legacy <br />

ascii <br />

basicebcdic <br />

basicebcdic_37<br />

<br />

honorlang utf8 UTF-8 cpXXXX CPXXXX iso8859-x ISO-8859-x <br />

legacy auto <br />

honorlang <br />

unicode


TET_set_option( ) <br />

<br />

fontoutline 1, 2<br />

glyphlist 1, 2<br />

hostfont 1, 2<br />

license<br />

licensefile<br />

logging 1<br />

userlog<br />

outputformat<br />

resourcefile<br />

searchpath 1<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

open_document*( ) <br />

<br />

<br />

TET_open_document*( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

TET_new( ) <br />

<br />

<br />

TET_get_text( ) <br />

ebcdicutf8 <br />

utf8<br />

utf8 <br />

<br />

ebcdicutf8 <br />

<br />

<br />

utf16 <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

tet.upr upr


11.5.2 <br />

C TET *TET_new(void)<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

Java void delete( )<br />

C# void Dispose( )<br />

C void TET_delete(TET *tet)<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

Dispose( ) <br />

<br />

11.5.3 PDFlib Virtual Filesystem (PVF)<br />

C++ void create_pvf(string filename, const void *data, size_t size, string optlist)<br />

C# Java void create_pvf(String filename, byte[] data, String optlist)<br />

Perl PHP create_pvf(string filename, string data, string optlist)<br />

VB RB Sub create_pvf(filename As String, data, optlist As String)<br />

C void TET_create_pvf(TET *tet, const char *filename, int len, const void *data, size_t size, const char *optlist)<br />

<br />

<br />

filename <br />

<br />

len filename <br />

len=0 <br />

data <br />

<br />

<br />

size


optlist<br />

copy<br />

<br />

<br />

<br />

TET_delete_pvf( ) <br />

TET_delete( ) <br />

<br />

<br />

<br />

filename <br />

filename <br />

<br />

copy TET_delete_pvf( ) <br />

<br />

TET_create_pvf( ) <br />

<br />

copy<br />

<br />

<br />

<br />

<br />

false copy <br />

<br />

C++ int delete_pvf(string filename)<br />

C# Java int delete_pvf(String filename)<br />

Perl PHP int delete_pvf(string filename)<br />

VB RB Function delete_pvf(filename As String) As Long<br />

C int TET_delete_pvf(TET *tet, const char *filename, int len)<br />

<br />

<br />

filename<br />

TET_create_pvf( ) <br />

len filename <br />

len=0 <br />

<br />

<br />

<br />

<br />

filename <br />

filename <br />

<br />

filename TET_delete( ) <br />

<br />

TET_create_pvf( ) copy <br />

copy


11.5.4 Unicode <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

printf( ) <br />

<br />

<br />

C++ string utf8_to_utf16(string utf8string, string ordering)<br />

Perl PHP string utf8_to_utf16(string utf8string, string ordering)<br />

C const char *TET_utf8_to_utf16(TET *tet, const char *utf8string, const char *ordering, int *size)<br />

<br />

utf8string <br />

<br />

<br />

ordering <br />

> utf16 <br />

<br />

> utf16le \xFF\xFE<br />

<br />

> utf16be <br />

\xFE\xFF <br />

size <br />

<br />

<br />

<br />

C++ string utf16_to_utf8(string utf16string)<br />

Perl PHP string utf16_to_utf8(string utf16string)<br />

C const char *TET_utf16_to_utf8(TET *tet, const char *utf16string, int len, int *size)<br />

<br />

utf16string


len<br />

utf16string <br />

size <br />

<br />

<br />

\xEF\xBB\xBF<br />

<br />

<br />

C++ string utf32_to_utf16(string utf32string, string ordering)<br />

Perl PHP string utf32_to_utf16(string utf32string, string ordering)<br />

C const char *TET_utf32_to_utf16(TET *tet, const char *utf32string, int len, const char *ordering, int *size)<br />

<br />

utf32string <br />

<br />

len<br />

utf32string <br />

ordering <br />

> utf16 <br />

<br />

> utf16le <br />

\xFF\xFE <br />

> utf16be <br />

\xFE\xFF <br />

size <br />

<br />

<br />

<br />

<br />

<br />

C++ string utf8_to_utf32(string utf8string, string ordering)<br />

Perl PHP string utf8_to_utf32(string utf8string, string ordering)<br />

C const char *TET_utf8_to_utf32(TET *tet, const char *utf8string, const char *ordering, int *size)<br />

<br />

utf8string <br />

<br />

<br />

ordering<br />

<br />

size


C++ string utf32_to_utf8(string utf32string)<br />

Perl PHP string utf32_to_utf8(string utf32string)<br />

C const char *TET_utf32_to_utf8(TET *tet, const char *utf32string, int len, int *size)<br />

<br />

utf32string <br />

<br />

len<br />

utf32string <br />

size <br />

<br />

<br />

<br />

\xEF\xBB\xBF<br />

<br />

<br />

<br />

C++ string utf16_to_utf32(string utf16string, string ordering)<br />

Perl PHP string utf16_to_utf32(string utf16string, string ordering)<br />

C const char *TET_utf16_to_utf32(TET *tet, const char *utf16string, int len, const char *ordering, int *size)<br />

<br />

utf16string <br />

<br />

len<br />

utf16string <br />

ordering<br />

<br />

size
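The byte order marks mentioned in this section (`\xFF\xFE`, `\xFE\xFF`, `\xEF\xBB\xBF`) can be observed with a short Python sketch. It uses only Python's own codecs and does not call TET. The function names mirror the API above for readability, but the `with_bom` parameter is a hypothetical addition, only the `utf16le` and `utf16be` orderings are covered, and whether TET emits a BOM for a given `ordering` is not asserted here.

```python
import codecs

# Byte order marks and Python codec names for the two explicit orderings.
BOMS = {"utf16le": codecs.BOM_UTF16_LE, "utf16be": codecs.BOM_UTF16_BE}
CODECS = {"utf16le": "utf-16-le", "utf16be": "utf-16-be"}


def utf8_to_utf16(utf8_bytes, ordering, with_bom=True):
    """Convert UTF-8 bytes (with or without BOM) to UTF-16 bytes."""
    text = utf8_bytes.decode("utf-8-sig")       # strips a leading UTF-8 BOM
    payload = text.encode(CODECS[ordering])
    return BOMS[ordering] + payload if with_bom else payload


def utf16_to_utf8(utf16_bytes, with_bom=False):
    """Convert BOM-prefixed UTF-16 bytes to UTF-8 bytes."""
    text = utf16_bytes.decode("utf-16")         # the BOM selects the byte order
    payload = text.encode("utf-8")
    return codecs.BOM_UTF8 + payload if with_bom else payload


print(utf8_to_utf16(b"Hl", "utf16le"))
print(utf8_to_utf16(b"Hl", "utf16be"))
print(utf16_to_utf8(b"\xfe\xff\x00H\x00l", with_bom=True))
```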


11.5.5 <br />

C++ string get_apiname( )<br />

C# Java String get_apiname( )<br />

Perl PHP string get_apiname( )<br />

VB RB Function get_apiname( ) As String<br />

C const char *TET_get_apiname(TET *tet)<br />

<br />

<br />

<br />

<br />

C++ string get_errmsg( )<br />

C# Java String get_errmsg( )<br />

Perl PHP string get_errmsg( )<br />

VB RB Function get_errmsg( ) As String<br />

C const char *TET_get_errmsg(TET *tet)<br />

<br />

<br />

<br />

<br />

<br />

C++ int get_errnum( )<br />

C# Java int get_errnum( )<br />

Perl PHP long get_errnum( )<br />

VB RB Function get_errnum( ) As Long<br />

C int TET_get_errnum(TET *tet)<br />

<br />

<br />

<br />

<br />

<br />

C TET_TRY(tet)<br />

C TET_CATCH(tet)<br />

C TET_RETHROW(tet)<br />

C TET_EXIT_TRY(tet)<br />

<br />

TET_CATCH( ) TET_TRY( )


TET_RETHROW( ) <br />

<br />

<br />

<br />

11.5.6 <br />

<br />

<br />

TET_set_option( ) <br />

<br />

TET_set_option( ) <br />

<br />

logging<br />

userlog<br />

<br />

<br />

<br />

<br />

> TET_set_option( ) logging <br />

tet.set_option("logging", "filename=debug.log remove")<br />

> TETLOGGING <br />

<br />

TET_set_option( ) logging <br />

<br />

<br />

disable<br />

enable<br />

filename<br />

flush<br />

remove<br />

stringlimit<br />

<br />

disable <br />

false<br />

<br />

stdout stderr <br />

<br />

filename <br />

tet.log / /tmp <br />

true <br />

<br />

<br />

false false<br />

true <br />

false


TET_set_option( ) logging <br />

<br />

classes<br />

<br />

<br />

<br />

<br />

{api=1 warning=1} <br />

api api=2 <br />

<br />

api=3 <br />

<br />

filesearch <br />

<br />

resource <br />

<br />

user userlog <br />

warning <br />

warning=2 TET_get_errmsg( )


11.6 <br />

C++ int open_document(string filename, string optlist)<br />

C# Java int open_document(String filename, String optlist)<br />

Perl PHP long open_document(string filename, string optlist)<br />

VB RB Function open_document(filename As String, optlist As String) As Long<br />

C int TET_open_document(TET *tet, const char *filename, int len, const char *optlist)<br />

<br />

filename <br />

searchpath <br />

<br />

<br />

<br />

len = 0 <br />

<br />

<br />

len filename <br />

len=0 <br />

optlist <br />

checkglyphlists decompose encodinghint fold glyphmapping <br />

lineseparator normalize inmemory password repair requiredmode shrug tetml <br />

usehostfonts wordseparator zoneseparator<br />

<br />

<br />

<br />

TET_get_errmsg( ) <br />

<br />

<br />

<br />

<br />

password <br />

<br />

requiredmode <br />

shrug <br />

<br />

<br />

<br />

<br />

QSYS.lib <br />

QSYS.lib <br />

<br />

QSYS.lib


TET_open_document( ) TET_open_document_callback( ) <br />

<br />

checkglyphlists<br />

decompose<br />

encodinghint<br />

<br />

true condition=allfonts <br />

<br />

<br />

false<br />

<br />

<br />

<br />

<br />

normalize <br />

none <br />

normalize decompose=none <br />

<br />

<br />

none <br />

default <br />

<br />

<br />

canonical circle compat final font fraction initial isolated medial narrow nobreak <br />

small square sub super vertical wide<br />

<br />

<br />

<br />

<br />

<br />

_all <br />

<br />

_none <br />

<br />

<br />

none <br />

winansi


TET_open_document( ) TET_open_document_callback( ) <br />

<br />

fold<br />

glyphmapping<br />

keeppua<br />

lineseparator<br />

<br />

<br />

<br />

<br />

lineseparator wordseparator <br />

<br />

<br />

<br />

none <br />

<br />

default <br />

<br />

<br />

fold <br />

<br />

<br />

<br />

_dehyphenation<br />

<br />

TET_get_char_info( ) attributes <br />

@dehyphenation <br />

<br />

<br />

(Unichar) <br />

<br />

remove <br />

preserve <br />

unknownchar<br />

unknownchar <br />

<br />

<br />

<br />

<br />

* <br />

<br />

<br />

<br />

<br />

<br />

fold={{[:Private_Use:] preserve}} fold={{[:Private_Use:] unknownchar}} <br />

<br />

granularity=zone page


TET_open_document( ) TET_open_document_callback( ) <br />

<br />

normalize<br />

inmemory<br />

password<br />

repair<br />

requiredmode<br />

shrug<br />

<br />

<br />

none <br />

nfc <br />

nfd <br />

nfkc <br />

nfkd <br />

decompose <br />

normalize normalize none <br />

decompose=none normalize <br />

decompose <br />

TET_open_document( ) true <br />

<br />

false <br />

false<br />

<br />

<br />

<br />

<br />

<br />

shrug <br />

<br />

<br />

<br />

auto <br />

force <br />

auto <br />

none <br />

<br />

minimum <br />

restricted full <br />

<br />

<br />

requiredmode=minimum <br />

full<br />

true <br />

shrug <br />

false


TET_open_document( ) TET_open_document_callback( ) <br />

<br />

tetml<br />

<br />

TET_process_page( ) <br />

<br />

elements <br />

<br />

docinfo /TET/Document/DocInfo <br />

docxmp /TET/Document/Metadata <br />

options /TET/Document/Options /TET/Document/Pages/Page/Options<br />

encodingname<br />

<br />

UTF-8 <br />

_none <br />

<br />

UTF-8 encoding="UTF-8" <br />

<br />

<br />

<br />

<br />

filename filename <br />

TET_get_xml_data( ) <br />

<br />

<br />

unknownchar<br />

usehostfonts<br />

wordseparator<br />

<br />

<br />

<br />

unknownchar <br />

fold={{[:Private_Use:] unknownchar}} fold={{[:Private_Use:] remove}} <br />

true <br />

<br />

true<br />

granularity=line page <br />

<br />

<br />

<br />

TET_open_document( ) TET_open_document_callback( ) glyphmapping <br />

<br />

codelist<br />

fontname<br />

fonttypes<br />

<br />

<br />

<br />

<br />

<br />

<br />

*<br />

<br />

* Type1 MMType1 TrueType CIDFontType2 <br />

CIDFontType0 Type3*


TET_open_document( ) TET_open_document_callback( ) glyphmapping <br />

<br />

forceencoding<br />

forcettsymbolencoding<br />

globalglyphlist<br />

glyphlist<br />

glyphrule<br />

override<br />

ignoretounicodecmap<br />

tounicodecmap<br />

<br />

winansi macroman Custom <br />

<br />

<br />

MacRoman WinAnsi MacExpert <br />

<br />

<br />

<br />

<br />

auto <br />

auto -<br />

encodinghint <br />

<br />

encodinghint builtin<br />

<br />

builtin <br />

<br />

<br />

true <br />

false<br />

<br />

<br />

<br />

prefix <br />

base <br />

ascii <br />

1 <br />

auto <br />

<br />

dec <br />

hex <br />

encoding <br />

none <br />

true <br />

false<br />

glyphlist glyphrule true <br />

<br />

<br />

true<br />

<br />

<br />

* MSTT* <br />

<br />

winansi macroman macroman_apple macroman_euro <br />

ebcdic ebcdic_37 iso8859-X cpXXXX U+XXXX


C++ int open_document_callback(void *opaque, size_t filesize,<br />

size_t (*readproc)(void *opaque, void *buffer, size_t size),<br />

int (*seekproc)(void *opaque, long offset),<br />

string optlist)<br />

C int TET_open_document_callback(TET *tet, void *opaque, size_t filesize,<br />

size_t (*readproc)(void *opaque, void *buffer, size_t size),<br />

int (*seekproc)(void *opaque, long offset),<br />

const char *optlist)<br />

<br />

opaque <br />

<br />

<br />

filesize<br />

<br />

readproc size buffer <br />

<br />

<br />

seekproc offset <br />

<br />

<br />

optlist<br />

<br />

<br />

<br />

<br />

TET_open_document( ) <br />

TET_open_document( ) <br />

<br />

C++ void close_document(int doc)<br />

C# Java void close_document(int doc)<br />

Perl PHP TET_close_document(resource tet, long doc)<br />

VB RB Sub close_document(doc As Long)<br />

C void TET_close_document(TET *tet, int doc)<br />

<br />

doc<br />

TET_open_document*( ) <br />

TET_delete( )


11.7 <br />

C++ int open_page(int doc, int pagenumber, string optlist)<br />

C# Java int open_page(int doc, int pagenumber, String optlist)<br />

Perl PHP long open_page(long doc, long pagenumber, string optlist)<br />

VB RB Function open_page(doc As Long, pagenumber As Long, optlist As String) As Long<br />

C int TET_open_page(TET *tet, int doc, int pagenumber, const char *optlist)<br />

<br />

doc<br />

TET_open_document*( ) <br />

pagenumber <br />

TET_pcos_get_number( ) length:pages <br />

optlist <br />

clippingarea contentanalysis docstyle excludebox fontsizerange <br />

granularity ignoreinvisibletext imageanalysis includebox layoutanalysis <br />

layouteffort skipengines structureanalysis topdown<br />

<br />

<br />

TET_get_errmsg( )


TET_open_page( ) TET_process_page( ) <br />

<br />

clippingarea<br />

docstyle<br />

excludebox<br />

granularity<br />

contentanalysis<br />

fontsizerange<br />

ignoreinvisibletext<br />

imageanalysis<br />

<br />

includebox <br />

cropbox <br />

mediabox <br />

cropbox <br />

bleedbox <br />

trimbox <br />

artbox <br />

unlimited <br />

granularity=glyph <br />

<br />

<br />

<br />

<br />

book <br />

business <br />

fancy <br />

forms <br />

generic <br />

magazines <br />

papers <br />

science <br />

searchengine<br />

<br />

<br />

spacegrid <br />

<br />

<br />

<br />

<br />

<br />

unlimited <br />

{ 0 unlimited }<br />

TET_get_text( ) glyph <br />

<br />

word <br />

glyph <br />

<br />

word <br />

<br />

line <br />

<br />

page <br />

<br />

true <br />

false


TET_open_page( ) TET_process_page( ) <br />

<br />

includebox<br />

layouteffort<br />

skipengines<br />

layoutanalysis<br />

structureanalysis<br />

topdown<br />

<br />

<br />

<br />

granularity=glyph <br />

<br />

<br />

<br />

none low medium high extra <br />

low<br />

<br />

<br />

<br />

<br />

text <br />

image <br />

granularity=glyph <br />

<br />

y <br />

<br />

<br />

input true <br />

false <br />

includebox excludebox<br />

output true <br />

false <br />

TET_char_info y alpha beta<br />

TET_image_info y alpha beta<br />

Glyph/@y Glyph/@alpha Glyph/@beta Box/@lly Box/@ury PlacedImage/@y PlacedImage/@alpha PlacedImage/@beta<br />

TET_open_page( ) TET_process_page( ) contentanalysis <br />

<br />

bidi<br />

bidilevel<br />

dehyphenate<br />

<br />

granularity=glyph <br />

<br />

logical <br />

visual <br />

<br />

logical <br />

<br />

auto <br />

auto <br />

ltr <br />

rtl <br />

<br />

true <br />

keephyphens <br />

true


TET_open_page( ) TET_process_page( ) contentanalysis <br />

<br />

dropcapsize<br />

dropcapratio<br />

includeboxorder<br />

keep<br />

hyphenglyphs<br />

lineseparator<br />

linespacing<br />

maxwords<br />

<br />

<br />

<br />

<br />

<br />

dropcapsize dropcapratio <br />

<br />

<br />

<br />

includebox <br />

<br />

0 <br />

<br />

<br />

<br />

1 <br />

<br />

<br />

<br />

<br />

<br />

<br />

2 <br />

<br />

<br />

<br />

<br />

true dehyphenate=true <br />

get_char_info( ) Glyph <br />

<br />

fold={{_dehyphenation remove}} <br />

get_text( ) <br />

false<br />

<br />

small medium <br />

large medium<br />

unlimited <br />

<br />

<br />

<br />

<br />

unlimited


TET_open_page( ) TET_process_page( ) contentanalysis <br />

<br />

merge<br />

punctuation<br />

breaks<br />

superscript<br />

wordseparator<br />

<br />

<br />

0 <br />

<br />

1 <br />

<br />

<br />

2 <br />

<br />

<br />

<br />

keep <br />

split punctuationbreaks <br />

keep <br />

true <br />

true<br />

granularity=word true <br />

<br />

true<br />

<br />

0 <br />

1 <br />

2 <br />

<br />

TET_open_document*( ) <br />

TET_open_page( ) TET_process_page( ) layoutanalysis <br />

numericentities<br />

shadowdetect<br />

<br />

layoutastable<br />

layoutcolumnhint<br />

layoutdetect<br />

<br />

true <br />

<br />

false true<br />

<br />

multicolumn <br />

multicolumn<br />

<br />

none <br />

singlecolumn<br />

<br />

<br />

0 <br />

1 <br />

2 <br />

<br />

<br />

3


TET_open_page( ) TET_process_page( ) layoutanalysis <br />

<br />

mergetables<br />

splithint<br />

layoutrowhint<br />

standalonefontsize<br />

supertablecolumns<br />

tabledetect<br />

<br />

none <br />

full <br />

none <br />

separation <br />

<br />

preservecolumns<br />

<br />

<br />

<br />

thick <br />

<br />

<br />

<br />

<br />

thin <br />

<br />

<br />

layoutanalysis = {layoutrowhint={full separation=thick}}<br />

<br />

<br />

none <br />

down <br />

none <br />

up <br />

updown <br />

<br />

<br />

includebox includebox <br />

<br />

x <br />

<br />

y <br />

<br />

<br />

layoutastable=true <br />

<br />

<br />

<br />

<br />

0 <br />

1 <br />

2


Suboptions for the imageanalysis option of TET_open_page( ) and TET_process_page( )<br />

<br />

smallimages<br />

merge<br />

<br />

<br />

<br />

disable true false<br />

maxarea <br />

<br />

maxcount <br />

<br />

<br />

<br />

<br />

<br />

disable true false<br />

gap <br />

<br />

Suboptions for the structureanalysis option of TET_open_page( ) and TET_process_page( )<br />

<br />

bullets<br />

list<br />

paragraph<br />

table<br />

<br />

list=true <br />

<br />

bulletchars<br />

<br />

fontname <br />

<br />

fontname <br />

bulletchars <br />

<br />

<br />

bullets={{fontname=ZapfDingbats}}<br />

bullets={{bulletchars={U+2022}}}<br />

bullets={{fontname=KozGoPro-Medium bulletchars={U+2460 U+2461 U+2462 U+2463 U+2464}}}<br />

false false <br />

<br />

true false <br />

<br />

true false
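Nested option lists such as bullets={{fontname=ZapfDingbats}} are easy to mistype, because each list level needs its own pair of braces. The following is a minimal sketch, assuming only standard C, of a check that counts braces before an option list is passed to TET_open_page( ) or TET_process_page( ). The helper name optlist_braces_balanced is hypothetical and not part of the TET API. It does not replace TET's own option list parser and knows nothing about quoting rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a TET function.
 * Returns 1 if every '{' in optlist has a matching '}', otherwise 0.
 * This is a plain brace counter; it does not validate option names. */
int optlist_braces_balanced(const char *optlist)
{
    int depth = 0;

    assert(optlist != NULL);

    for (const char *p = optlist; *p != '\0'; p++) {
        if (*p == '{') {
            depth++;
        } else if (*p == '}') {
            if (--depth < 0)
                return 0;   /* closing brace without an opening one */
        }
    }
    return depth == 0;      /* unclosed braces leave depth > 0 */
}
```

A caller could run this check on a composed option list and only then hand the string to the library, so that a missing brace is reported before any page is opened.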


C++ void close_page(int page)<br />

C# Java void close_page(int page)<br />

Perl PHP close_page(long page)<br />

VB RB Sub close_page(page As Long)<br />

C void TET_close_page(TET *tet, int page)<br />

<br />

page<br />

TET_open_page( ) <br />

<br />

TET_close_document( )


11.8 Text and Metrics Retrieval Functions<br />

C++ string get_text(int page)<br />

C# Java String get_text(int page)<br />

Perl PHP string get_text(long page)<br />

VB RB Function get_text(page As Long) As String<br />

C const char *TET_get_text(TET *tet, int page, int *len)<br />

<br />

page<br />

TET_open_page( ) <br />

len <br />

outputformat=utf16 <br />

outputformat=utf8 <br />

<br />

<br />

<br />

TET_open_page( ) granularity granularity=glyph <br />

<br />

<br />

<br />

<br />

TET_get_errnum( ) <br />

TET_set_option( ) outputformat <br />

<br />

<br />

*len=0


C++ const TET_char_info *get_char_info(int page)<br />

C# Java int get_char_info(int page)<br />

Perl PHP object get_char_info(long page)<br />

VB RB Function get_char_info(page As Long) As Long<br />

C const TET_char_info *TET_get_char_info(TET *tet, int page)<br />

<br />

page<br />

TET_open_page( ) <br />

<br />

<br />

TET_get_glyph_info( ) <br />

TET_get_text( ) <br />

<br />

<br />

<br />

TET_get_text( ) <br />

<br />

<br />

M <br />

N N N>0 N <br />

M <br />

> granularity=glyph <br />

N=1 <br />

M=1 <br />

M>1 TET_get_char_info( ) <br />

<br />

> glyph <br />

<br />

N M <br />

N M <br />

<br />

<br />

glyph TET_get_text( ) <br />

<br />

<br />

<br />

<br />

<br />

TET_get_char_info( ) TET_close_page( ) <br />

<br />

<br />

TET_get_char_info( )


TET_get_text( ) <br />

<br />

TET_char_info <br />

<br />

TET_get_text( ) <br />

<br />

<br />

<br />

unknown false <br />

get_text( ) <br />

<br />

<br />

<br />

get_text( ) <br />

nil <br />

TET_char_info <br />

<br />

get_text( ) <br />

<br />

<br />

<br />

long <br />

TET_char_info <br />

<br />

<br />

<br />

uv<br />

type<br />

<br />

glyph <br />

granularity=glyph <br />

<br />

<br />

<br />

<br />

<br />

0 <br />

1 <br />

x y <br />

width uv <br />

<br />

10 <br />

11 <br />

12


TET_char_info <br />

<br />

<br />

<br />

attributes<br />

<br />

<br />

0 <br />

1 <br />

2 <br />

3 <br />

4 <br />

5 contentanalysis={keephyphenglyphs=true} <br />

<br />

6 <br />

unknown<br />

false <br />

unknownchar true <br />

x, y <br />

x y <br />

<br />

width<br />

alpha<br />

beta<br />

fontid<br />

fontsize<br />

textrendering<br />

<br />

<br />

<br />

<br />

alpha <br />

<br />

alpha <br />

<br />

beta <br />

abs(beta) <br />

fonts[ ] fontid <br />

<br />

<br />

<br />

<br />

<br />

0 <br />

1 <br />

2 <br />

3 <br />

4 <br />

5 <br />

6 <br />

7


11.9 Image Retrieval Functions<br />

C++ const TET_image_info *get_image_info(int page)<br />

C# Java int get_image_info(int page)<br />

Perl PHP object get_image_info(long page)<br />

VB RB Function get_image_info(page As Long) As Long<br />

C const TET_image_info *TET_get_image_info(TET *tet, int page)<br />

<br />

<br />

page<br />

TET_open_page( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

TET_get_image_info( ) TET_close_page( ) <br />

<br />

<br />

TET_get_image_info( ) <br />

<br />

<br />

TET_image_info<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

nil <br />

TET_image_info <br />

<br />

<br />

<br />

TET_image_info


long <br />

<br />

TET_image_info <br />

<br />

<br />

<br />

<br />

x, y <br />

width,<br />

height<br />

alpha<br />

beta<br />

imageid<br />

<br />

<br />

alpha <br />

alpha <br />

alpha beta<br />

beta <br />

beta abs(beta) <br />

<br />

images[ ] <br />

<br />

C++ int write_image_file(int doc, int imageid, string optlist)<br />

C# Java int write_image_file(int doc, int imageid, String optlist)<br />

Perl PHP long write_image_file(long doc, long imageid, string optlist)<br />

VB RB Function write_image_file(doc As Long, imageid As Long, optlist As String) As Long<br />

C int TET_write_image_file(TET *tet, int doc, int imageid, const char *optlist)<br />

<br />

doc<br />

TET_open_document*( ) <br />

imageid TET_get_image_info( ) <br />

imageid images <br />

length:images <br />

optlist <br />

compression filename keepxmp typeonly<br />

<br />

TET_get_errmsg( ) <br />

<br />

<br />

<br />

> <br />

> .tif <br />

> .jpg <br />

> .jpx <br />

> .raw
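The four suffixes listed above (.tif, .jpg, .jpx, .raw) correspond to the image formats that TET_write_image_file( ) can produce. The following is a minimal sketch, assuming only standard C, of how a caller might predict the resulting file name from a base name. The enum image_format and both helper functions are hypothetical stand-ins for illustration. The actual format constants are defined in tetlib.h and are not reproduced here, and the assumption that the suffix is simply appended to the filename option should be checked against the option description.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the image format codes; the real constants
 * are defined in tetlib.h and may use different names and values. */
typedef enum { FMT_TIFF, FMT_JPEG, FMT_JPX, FMT_RAW } image_format;

/* Map a format to the file suffix listed in the manual. */
const char *image_suffix(image_format fmt)
{
    switch (fmt) {
    case FMT_TIFF: return ".tif";
    case FMT_JPEG: return ".jpg";
    case FMT_JPX:  return ".jpx";
    case FMT_RAW:  return ".raw";
    }
    return NULL;    /* unknown format value */
}

/* Write base + suffix into buf. Returns the length of the name,
 * or -1 if the format is unknown or buf is too small. */
int image_filename(char *buf, size_t size, const char *base, image_format fmt)
{
    const char *suffix = image_suffix(fmt);
    int n;

    assert(buf != NULL && base != NULL && size > 0);

    if (suffix == NULL)
        return -1;
    n = snprintf(buf, size, "%s%s", base, suffix);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

Keeping the suffix mapping in one place makes it straightforward to adjust once the real constants from tetlib.h are substituted for the placeholder enum.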


typeonly <br />

<br />

tetlib.h <br />

TET_write_image_file( ) TET_get_image_data( ) <br />

<br />

compression<br />

filename 1<br />

keepxmp<br />

typeonly 1<br />

<br />

auto <br />

auto <br />

none <br />

<br />

typeonly filename<br />

<br />

Image/@id attribute <br />

<br />

I<br />

imageid imageid <br />

true <br />

true<br />

<br />

TET_get_image_data( ) <br />

false<br />

TET_write_image_file( ) <br />

C++ const char *get_image_data(int doc, size_t *length, int imageid, string optlist)<br />

C# Java final byte[ ] get_image_data(int doc, int imageid, String optlist)<br />

Perl PHP string get_image_data(long doc, long imageid, string optlist)<br />

VB RB Function get_image_data(doc As Long, imageid As Long, optlist As String)<br />

C const char * TET_get_image_data(TET *tet, int doc, size_t *length, int imageid, const char *optlist)<br />

<br />

doc<br />

TET_open_document*( ) <br />

length <br />

<br />

imageid TET_get_image_info( ) <br />

imageid images <br />

length:images <br />

optlist <br />

compression keepxmp


TET_get_errmsg( )


11.10 TET Markup Language (TETML) Functions<br />

C++ int process_page(int doc, int pagenumber, string optlist)<br />

C# Java int process_page(int doc, int pagenumber, String optlist)<br />

Perl PHP long process_page(long doc, long pagenumber, string optlist)<br />

VB RB Function process_page(doc As Long, pagenumber As Long, optlist As String) As Long<br />

C int TET_process_page(TET *tet, int doc, int pagenumber, const char *optlist)<br />

<br />

doc<br />

TET_open_document*( ) <br />

pagenumber <br />

TET_pcos_get_number( ) length:pages <br />

trailer=true pagenumber <br />

optlist <br />

> pagenumber=0<br />

clippingarea contentanalysis excludebox fontsizerange granularity <br />

ignoreinvisibletext imageanalysis includebox layoutanalysis skipengines<br />

> tetml<br />

TET_process_page( ) <br />

<br />

tetml<br />

<br />

<br />

elements <br />

line granularity=word Para <br />

Word Line false<br />

glyphdetails<br />

granularity=glyph word Glyph <br />

<br />

false <br />

all <br />

dehyphenation<br />

dehyphenation <br />

<br />

dropcap dropcap <br />

<br />

geometry x y width alpha beta <br />

font font fontsize textrendering unknown <br />

sub sub <br />

sup sup <br />

trailer true <br />

<br />

<br />

pagenumber=0 <br />

trailer=true <br />

TET_process_page( ) false


Exception<br />

<br />

TET_open_document*( ) <br />

TET_get_xml_data( ) <br />

<br />

TET_open_document*( ) <br />

<br />

TET_open_document*( ) <br />

TET_process_page( ) TET_get_xml_data( ) <br />

<br />

<br />

<br />

trailer <br />

<br />

pagenumber=0 pagenumber <br />

<br />

<br />

TET_close_document( ) <br />

TET_process_page( ) <br />

C++ const char *get_xml_data(int doc, size_t *length, string optlist)<br />

C# Java final byte[ ] get_xml_data(int doc, String optlist)<br />

Perl PHP string get_xml_data(long doc, string optlist)<br />

VB RB Function get_xml_data(doc As Long, optlist As String)<br />

C const char * TET_get_xml_data(TET *tet, int doc, size_t *length, const char *optlist)<br />

<br />

doc<br />

TET_open_document*( ) <br />

length <br />

length <br />

optlist<br />

<br />

<br />

<br />

<br />

*length=0<br />

TET_open_document*( ) TET_process_page( ) <br />

outputformat<br />

<br />

TET_process_page( ) TET_get_xml_data( ) <br />

<br />

TET_close_document( ) <br />

TET_get_xml_data( )<br />

TET_process_page( ) TET_close_document( ) <br />

<br />

TET_open_document*( ) tetml filename <br />

<br />

<br />

<br />

<br />

TET_get_xml_data( ) <br />

<br />

<br />

<br />

<br />

<br />

bytes


11.11 pCOS Functions<br />

<br />

<br />

C++ double pcos_get_number(int doc, string path)<br />

C# Java double pcos_get_number(int doc, String path)<br />

Perl PHP float pcos_get_number(int doc, String path)<br />

VB RB Function pcos_get_number(doc as Long, path As String) As Double<br />

C double TET_pcos_get_number(TET *tet, int doc, const char *path, ...)<br />

<br />

<br />

doc<br />

path<br />

TET_open_document*( ) <br />

<br />

key <br />

%s %d %% <br />

<br />

<br />

<br />

<br />

true <br />

<br />
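In the C binding, the path parameter of the pCOS functions accepts printf( )-style placeholders (%s, %d, and %% for a literal percent sign), followed by the matching additional arguments. The following is a minimal sketch, assuming only standard C, of how such a path expands. The helper format_pcos_path is hypothetical and not part of the TET API. It only mimics the expansion with vsnprintf( ) so that the resulting path string can be inspected, and it does not query any document.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper, not a TET function.
 * Expands a printf-style pCOS path template into buf.
 * Returns the length of the expanded path, or -1 if it does not fit. */
int format_pcos_path(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL && size > 0);

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);   /* same placeholder rules as printf */
    va_end(ap);

    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

For example, the template "bookmarks[%d]/Title" with the argument 3 expands to "bookmarks[3]/Title". As with printf( ), the caller is responsible for supplying arguments whose number and types match the placeholders.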

C++ string pcos_get_string(int doc, string path)<br />

C# Java String pcos_get_string(int doc, String path)<br />

Perl PHP String pcos_get_string(int doc, String path)<br />

VB RB Function pcos_get_string(doc as Long, path As String) As String<br />

C const char *TET_pcos_get_string(TET *tet, int doc, const char *path, ...)<br />

<br />

doc<br />

path<br />

TET_open_document*( ) <br />

<br />

key <br />

%s %d %% <br />

<br />

<br />

<br />

<br />

<br />

<br />

true false <br />

<br />

/Info/


* nocopy=false plainmetadata=true <br />

bookmarks[...]/Title pages[...]/Annots/Contents <br />

nocopy=false <br />

<br />

<br />

TET_pcos_get_stream( ) <br />

<br />

<br />

<br />

<br />

printf( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

C++ const unsigned char *pcos_get_stream(int doc, int *length, string optlist, string path)<br />

C# Java final byte[ ] pcos_get_stream(int doc, String optlist, String path)<br />

Perl PHP String pcos_get_stream(int doc, String optlist, String path)<br />

VB RB Function pcos_get_stream(doc as Long, optlist As String, path As String)<br />

C const unsigned char *TET_pcos_get_stream(TET *tet, int doc, int *length, const char *optlist,<br />

const char *path, ...)<br />

stream fstream <br />

doc<br />

TET_open_document*( ) <br />

length <br />

<br />

optlist<br />

path<br />

<br />

<br />

key <br />

%s %d %%


stream keepfilter=true <br />

<br />

fstream <br />

<br />

convert <br />

<br />

<br />

/Root/Metadata <br />

nocopy=false plainmetadata=true stream <br />

fstream <br />

<br />

<br />

TET_pcos_get_string( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

bytes<br />

<br />

<br />

<br />

<br />

TET_pcos_get_stream( ) <br />

<br />

convert<br />

keepfilter<br />

<br />

<br />

none <br />

none <br />

unicode TET_pcos_get_string( ) <br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

true filterinfo<br />

false<br />

<br />

true false


A TET Library Quick Reference<br />

(C) <br />

<br />

<br />

<br />

<br />

(C) TET *TET_new(void)<br />

void delete( )<br />

PVF<br />

void create_pvf(String filename, byte[] data, String optlist)<br />

int delete_pvf(String filename)<br />

Unicode<br />

(C) const char *TET_utf8_to_utf16(TET *tet, const char *utf8string, const char *ordering, int *size)<br />

(C) const char *TET_utf16_to_utf8(TET *tet, const char *utf16string, int len, int *size)<br />

(C) const char *TET_utf32_to_utf16(TET *tet, const char *utf32string, int len, const char *ordering, int *size)<br />

(C) const char *TET_utf8_to_utf32(TET *tet, const char *utf8string, const char *ordering, int *size)<br />

(C) const char *TET_utf32_to_utf8(TET *tet, const char *utf32string, int len, int *size)<br />

(C) const char *TET_utf16_to_utf32(TET *tet, const char *utf16string, int len, const char *ordering, int *size)<br />

String get_apiname( )<br />

String get_errmsg( )<br />

int get_errnum( )<br />

int open_document(String filename, String optlist)<br />

(C) int TET_open_document_callback(TET *tet, void *opaque, size_t filesize, size_t (*readproc)(void *opaque, void *buffer, size_t size), int (*seekproc)(void *opaque, long offset), const char *optlist)<br />

void close_document(int doc)<br />

int open_page(int doc, int pagenumber, String optlist)<br />

void close_page(int page)<br />

String get_text(int page)<br />

int get_char_info(int page)<br />

int get_image_info(int page)<br />

int write_image_file(int doc, int imageid, String optlist)<br />

final byte[ ] get_image_data(int doc, int imageid, String optlist)<br />

TET TETML<br />

int process_page(int doc, int pagenumber, String optlist)<br />

final byte[ ] get_xml_data(int doc, String optlist)<br />

void set_option(String optlist)<br />

pCOS<br />

double pcos_get_number(int doc, String path)<br />

String pcos_get_string(int doc, String path)<br />

final byte[ ] pcos_get_stream(int doc, String optlist, String path)


B Revision History<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

> TET 4.0 <br />

> TET 3.0 <br />

> TET 2.3 <br />

> TET 2.0 <br />

> TET 2.1.0 PHP RPG <br />

<br />

> TET 2.0.0 <br />

> TET 1.1 <br />

> TET 1.0.2 TET_open_doc_callback( ) <br />

<br />

> TET 1 1







