The Annoyance Filter.pdf - Fourmilab

§256 ANNOYANCE-FILTER DEVELOPMENT LOG

Added a parser diagnostic to mailFolder::nextLine to indicate when an applicationStringParser is
closed.

The close method of pdfTextExtractor failed to close the input stream it used to read the output
from the pipe connected to pdftotext, which (for some bizarre reason) caused the raw binary PDF
file to be returned instead of the decoded text. I added the requisite close of the stream.

When pdfTextExtractor was transcribing the decoded attachment to the temporary file to be read
by pdftotext, it checked for end of file but not for error conditions. I modified it to use isOK() to
govern the copy loop.

The flashTextExtractor and its parent flashStream were not restartable because they did not
propagate the close up to the applicationStringParser from which all are derived, and because
flashTextExtractor did not reset its own initialised and textOnly flags at end of file. Fixed.
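
The general pattern behind this fix, a derived class that resets its own state and then hands the close on to its base class, might look like the sketch below. The class names BaseParser and DerivedExtractor and all their members are hypothetical stand-ins, not the actual annoyance-filter classes; only the flag names initialised and textOnly are taken from the entry above.

```cpp
// Hypothetical stand-in for the base parser class.
class BaseParser {
public:
    virtual ~BaseParser() = default;
    virtual void close() { open = false; }
    void start() { open = true; }
    bool isOpen() const { return open; }

private:
    bool open = false;
};

// Hypothetical stand-in for a derived extractor with its own state.
class DerivedExtractor : public BaseParser {
public:
    void begin() {
        start();
        initialised = true;
        textOnly = true;
    }

    void close() override {
        // Reset the derived class's own state...
        initialised = false;
        textOnly = false;
        // ...then propagate the close up to the base class, so the
        // object can be started again for the next attachment.
        BaseParser::close();
    }

    bool isReset() const { return !initialised && !textOnly && !isOpen(); }

private:
    bool initialised = false;
    bool textOnly = false;
};
```

Omitting either half (the reset or the call to the base class) leaves stale state behind, which is the kind of failure that only shows up on the second use of the object.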

Because the flashStream decoder usually terminates upon seeing a stagEnd tag in the input stream,
it stopped reading from the MIME decoder before end of file was encountered. This caused an extraneous
blank line to be inserted in the transcript at the end of the MIME-encoded data and before the part
end sentinel. I added logic to flashTextExtractor::nextString to call get8() until an end of file is
reported before returning the logical end of file for the flash stream.

The input stream close I added to pdfTextExtractor::close ran afoul of the fdistream logic used
to cope with gcc 3 which, helpfully, does not define a close method. I made the close conditional on
HAVE_FDSTREAM_COMPATIBILITY not being defined.

Our attempt to rebuild the Win32 version was torpedoed by getopt in yet another innovative
way. This time, the care we took to avoid including our own getopt.h stabbed us in the back, because
the library’s version (I still haven’t figured out why it’s being included) doesn’t define the
long version of getopt, and wants a different symbol to do so than our include file. So, I added WIN32
conditional code before the include of our version to force it to be included and define the long option
version of getopt. This GCC/Cygwin “compatibility” is turning out to be a running bad joke.

Release 1.0a.

2003 September 23

A file whose name contained the string “.gz” (or whatever other compressed file extension was configured)
would be fed through the decompressor even if the sequence was embedded in the middle of the
file name. I modified the tests to deem a file compressed only if the compressed file type string appears
at the end of the file name. This applies both to files named directly on the command line and to files
within directories.
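
The corrected test amounts to a suffix comparison rather than a substring search. A minimal sketch, assuming a hypothetical helper named hasCompressedSuffix (the actual function and variable names in annoyance-filter are not shown in this log):

```cpp
#include <string>

// Hypothetical helper: true only when `fileName` ends with `suffix`.
// A substring search such as fileName.find(suffix) would also match
// "mail.gz.txt", which is the behaviour the entry above corrects.
bool hasCompressedSuffix(const std::string &fileName,
                         const std::string &suffix) {
    if (suffix.empty() || fileName.size() < suffix.size()) {
        return false;
    }
    return fileName.compare(fileName.size() - suffix.size(),
                            suffix.size(), suffix) == 0;
}
```

With this test, "folder.gz" is treated as compressed while "folder.gz.txt" and "archive.gzip" are not.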

A PDF file which has been marked by its creator as view-only will not be processed by pdftotext:
no output is generated and the message “Error: Copying of text from this document is not
allowed.” is sent to standard output. There’s nothing we can do about this, absent making a version
of pdftotext which bypasses the PDF file security mechanisms. While there’s something to be said for
this, it’s well beyond the mandate of annoyance-filter.

An assertion added to flashStream::ignoreTag in the process of debugging problems due to multiple
flash attachments could fail when --bsdfolder mode was used to scan a mail or junk folder. I
commented out the assertion.

2003 September 24

Phil Karn (KA9Q) reported that on the latest Debian distribution, compilations failed due to a missing
definition of assert. As far as I can determine, assert.h was pulled in by other includes in earlier
libraries, but now must be included explicitly. I added the requisite includes to annoyance-filter.w
and statlib.w.
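
The portable habit is to include the header for every facility a file uses directly, rather than relying on some other header to drag it in. A small sketch; checkedDivide is a hypothetical function used only to give the assertion something to guard:

```cpp
#include <assert.h>  // included explicitly; do not rely on other headers

// Hypothetical example function guarded by an assertion.
int checkedDivide(int numerator, int denominator) {
    assert(denominator != 0);
    return numerator / denominator;
}
```

Whether another standard header happens to include assert.h is an implementation detail of a particular library version, so code that omits the explicit include can break on a library upgrade without any change to the source.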
