12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§125 ANNOYANCE-FILTER PDF TEXT EXTRACTOR 105

125. PDF text extractor.

The pdfTextExtractor decodes Portable Document Format (.pdf) files by opening a pipe to the pdftotext program. Since this program cannot read a PDF document from standard input, we transcribe the PDF stream to a temporary file which is passed to pdftotext on the command line; the extracted text is directed to standard output, whence it can be read through the pipe. The temporary file is deleted after the PDF decoding is complete. Naturally, this facility is available only if the system provides pdftotext and the machinery needed to connect to it.
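The flow described above (temporary file, pipe to the external program, read, delete) can be sketched in a few lines of POSIX C++. This is a minimal illustration only, not the program's own nextString implementation: the helper name filterThroughTempFile, its parameters, and the /tmp/pdfextractXXXXXX template are assumptions made for the example. It also assumes that mkstemp and popen are available and that pdftotext, when given "-" as the output file name, writes the extracted text to standard output.

```cpp
#include <cassert>
#include <cstdio>      // popen, pclose, fread, std::remove
#include <cstdlib>     // mkstemp
#include <stdexcept>
#include <string>
#include <unistd.h>    // write, close

// Hypothetical helper: copy `data` to a temporary file, run
// `tool <tempfile> <trailingArgs>` through a pipe, and return everything
// the tool writes to standard output. The temporary file is deleted
// before returning, whether or not the tool could be started.
std::string filterThroughTempFile(const std::string &data,
                                  const std::string &tool = "pdftotext",
                                  const std::string &trailingArgs = "-")
{
    // mkstemp replaces the trailing XXXXXX and creates the file atomically.
    char tempfn[] = "/tmp/pdfextractXXXXXX";
    int fd = mkstemp(tempfn);
    if (fd < 0) {
        throw std::runtime_error("cannot create temporary file");
    }

    // Transcribe the in-memory document to the temporary file.
    std::size_t off = 0;
    while (off < data.size()) {
        ssize_t n = write(fd, data.data() + off, data.size() - off);
        if (n <= 0) {
            ::close(fd);
            std::remove(tempfn);
            throw std::runtime_error("cannot write temporary file");
        }
        off += static_cast<std::size_t>(n);
    }
    ::close(fd);

    // The generated name contains only characters that are safe in a
    // shell command, so it is passed without quoting.
    std::string command = tool + " " + tempfn + " " + trailingArgs;
    FILE *ip = popen(command.c_str(), "r");
    if (ip == NULL) {
        std::remove(tempfn);
        throw std::runtime_error("cannot open pipe to " + tool);
    }

    // Read the tool's standard output through the pipe until end of file.
    std::string text;
    char buf[4096];
    std::size_t got;
    while ((got = fread(buf, 1, sizeof buf, ip)) > 0) {
        text.append(buf, got);
    }

    pclose(ip);
    std::remove(tempfn);   // delete the temporary file once decoding is done
    return text;
}
```

Calling filterThroughTempFile(pdfBytes) would run pdftotext on the saved copy of the document. Passing another tool name, such as "cat" with empty trailing arguments, exercises the same plumbing on a system where pdftotext is not installed.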

〈 Class definitions 10 〉 +≡
#ifdef HAVE_PDF_DECODER
class pdfTextExtractor : public applicationStringParser {
protected:
    bool initialised;
#ifdef HAVE_FDSTREAM_COMPATIBILITY
    fdistream is;
#else
    ifstream is;
#endif
    FILE *ip;
#ifdef HAVE_MKSTEMP
    char tempfn[256];
#else
    char tempfn[L_tmpnam + 2];
#endif

public:
    pdfTextExtractor(mailFolder *f = NULL)
        : applicationStringParser(f), initialised(false), ip(NULL) { }

    ~pdfTextExtractor()
    {
        close();
    }

    virtual string name(void) const
    {
        return "PDF";
    }

    bool nextString(string &s);

    virtual void close(void)
    {
        if (ip != NULL) {
#ifndef HAVE_FDSTREAM_COMPATIBILITY
            is.close();
#endif
            pclose(ip);
            remove(tempfn);
            ip = NULL;
        }
        applicationStringParser::close();
        initialised = false;
    }
};
#endif
