17.05.2014 Views

PDFlib Text Extraction Toolkit (TET) Manual


convenient user interface features, it is useful as a replacement for Acrobat’s built-in copy and find features. PDFlib TET can successfully process many documents for which Acrobat provides only garbage when trying to extract the text. The TET Plugin provides the following functions:

> Copy the text from a PDF document in plain text to the system clipboard or a disk file. Enhanced clipboard controls facilitate the use of copy/paste.
> Convert a PDF to TETML and place it in the clipboard or a disk file.
> Copy XMP document metadata to the clipboard or a disk file.
> Extract images from the document as TIFF, JPEG, or JPEG 2000 files.
> Find words in the document.
> Detailed configuration settings are available to adjust text and image extraction to your requirements. Configuration sets can be saved and reloaded.

Advantages over Acrobat’s copy function. The TET Plugin offers several advantages over Acrobat’s built-in copy facility:

> The output can be customized to match different application requirements.
> TET is able to correctly interpret the text in many cases where Acrobat copies only garbage to the clipboard.
> Unknown glyphs (for which a proper Unicode mapping cannot be established) are highlighted in red and can be replaced with a user-selected character (e.g., a question mark).
> TET processes documents much faster than Acrobat.
> Images can be selected interactively for export, or all images on the page or in the document can be extracted.
> Tiny image fragments are merged into usable images.
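
The replacement of unknown glyphs mentioned in the list above can be illustrated with a small post-processing sketch. This is not the TET Plugin's implementation or API. It assumes that unmappable glyphs arrive in the extracted text as U+FFFD (REPLACEMENT CHARACTER) or as code points in the Basic Multilingual Plane's Private Use Area, and the function names `is_unknown` and `replace_unknown` are hypothetical.

```python
# Illustrative sketch only; not part of PDFlib TET.
# Assumption: a glyph without a usable Unicode mapping shows up in the
# extracted text as U+FFFD or as a Private Use Area code point (U+E000..U+F8FF).

REPLACEMENT_CHARACTER = "\ufffd"
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF


def is_unknown(ch: str) -> bool:
    """Return True if ch is treated as an unmapped glyph under the assumption above."""
    return ch == REPLACEMENT_CHARACTER or PUA_FIRST <= ord(ch) <= PUA_LAST


def replace_unknown(text: str, replacement: str = "?") -> str:
    """Replace every unmapped glyph in text with a user-selected character."""
    return "".join(replacement if is_unknown(ch) else ch for ch in text)


if __name__ == "__main__":
    sample = "caf\ufffd and \ue001 symbol"
    print(replace_unknown(sample))
```

The default replacement of a question mark mirrors the example given in the list; any other string can be passed as the second argument.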

36 Chapter 4: TET Connectors
