The Annoyance Filter.pdf - Fourmilab
§256 ANNOYANCE-FILTER DEVELOPMENT LOG
Added a parser diagnostic to mailFolder::nextLine to indicate when an applicationStringParser is
closed.
The close method of pdfTextExtractor failed to close the input stream it used to read the output
from the pipe connected to pdftotext, which caused (for some bizarre reason) the raw binary PDF
file to be returned instead of the decoded text. I added the requisite close of the stream.
When pdfTextExtractor was transcribing the decoded attachment to the temporary file to be read
by pdftotext, it checked for end of file but not for error conditions. I modified it to use isOK() to govern
the copy loop.
The flashTextExtractor and its parent flashStream were not restartable because they did not
propagate the close up to the applicationStringParser from which all are derived, and because
flashTextExtractor did not reset its own initialised and textOnly members at end of file. Fixed.
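The two ingredients of restartability described here, passing the close up the inheritance chain and clearing per-run state at end of file, might look roughly like the following. Every class and member name in this sketch (BaseParser, StreamDecoder, TextExtractor, start, endOfFile, isOpen) is a hypothetical stand-in chosen for illustration; only initialised and textOnly echo names from the entry above, and their types are assumed to be bool.

```cpp
// Hypothetical three-level hierarchy mirroring "parser -> stream -> extractor".
class BaseParser {
public:
    virtual ~BaseParser() {}
    virtual void open() { isOpen = true; }
    virtual void close() { isOpen = false; }
    bool isOpen = false;
};

class StreamDecoder : public BaseParser {
public:
    void close() override {
        // Decoder-level cleanup would go here; then pass the close upward
        // so the base class also returns to its closed state.
        BaseParser::close();
    }
};

class TextExtractor : public StreamDecoder {
public:
    bool initialised = false;   // assumed to be a simple flag
    bool textOnly = false;      // assumed to be a simple flag

    void start() {
        open();
        initialised = true;
    }

    // Called when the logical end of file is reached.
    void endOfFile() {
        initialised = false;    // the next attachment starts from scratch
        textOnly = false;
        close();                // resolves to StreamDecoder::close()
    }
};
```

If any level of the chain omits the upward call, the base object stays marked open and a second attachment processed by the same object inherits stale state.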
Because the flashStream decoder usually terminates upon seeing a stagEnd tag in the input stream,
it failed to read from the MIME decoder until end of file was encountered. This caused an extraneous
blank line to be inserted in the transcript at the end of the MIME-encoded data and before the part
end sentinel. I added logic to flashTextExtractor::nextString to call get8() until an end of file is
reported before returning the logical end of file for the flash stream.
The input stream close I added to pdfTextExtractor::close ran afoul of the fdistream logic used
to cope with gcc 3 which, helpfully, does not define a close method. I made the close conditional on
HAVE_FDSTREAM_COMPATIBILITY not being defined.
Our attempt to rebuild the Win32 version was torpedoed by getopt in yet another innovative
way. This time, the care we took to avoid including our own getopt.h stabbed us in the back, because
the library’s version (I still haven’t figured out why it’s being included) doesn’t define the
long version of getopt, and wants a different symbol to do so than our include file. So, I added WIN32
conditional code before the include of our version to force it to be included and define the long option
version of getopt. This GCC/Cygwin “compatibility” is turning out to be a running bad joke.
Release 1.0a.<br />
2003 September 23<br />
A file whose name contained the string “.gz” (or whatever other compressed file extension was configured)<br />
would be fed through the decompressor even if the sequence was embedded in the middle of the<br />
file name. I modified the tests to deem a file compressed only if the Compressed file type string appears<br />
at the end of the file name. This applies both to files named directly on the command line and files<br />
within directories.<br />
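The distinction between "contains the extension" and "ends with the extension" can be sketched like this. The function name hasCompressedSuffix is hypothetical, and the suffix is passed as a parameter here purely for illustration; how the program stores its configured extension is not shown in this log.

```cpp
#include <string>

// True only when `name` ends with `suffix` (for example ".gz").
// A match in the middle of the name, as in "mail.gz.txt", is rejected.
// (hasCompressedSuffix is a hypothetical name chosen for this sketch.)
bool hasCompressedSuffix(const std::string &name, const std::string &suffix) {
    if (suffix.empty() || name.size() < suffix.size()) {
        return false;
    }
    // Compare only the last suffix.size() characters of the name.
    return name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

A substring search such as name.find(".gz") != std::string::npos would reproduce the original bug, since it accepts the extension anywhere in the name.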
A PDF file which has been marked by its creator as view-only will not be processed by pdftotext:
no output is generated and the message “Error: Copying of text from this document is not
allowed.” is sent to standard output. There’s nothing we can do about this, absent making a version
of pdftotext which bypasses the PDF file security mechanisms. While there’s something to be said for
this, it’s well beyond the mandate of annoyance-filter.
An assertion added to flashStream::ignoreTag in the process of debugging problems due to multiple
flash attachments could fail when --bsdfolder mode was used to scan a mail or junk folder. I
commented out the assertion.
2003 September 24<br />
Phil Karn (KA9Q) reported that on the latest Debian distribution, compilations failed due to a missing
definition of assert. As far as I can determine, assert.h was pulled in by other includes in earlier
libraries, but now must be included explicitly. I added the requisite includes to annoyance-filter.w
and statlib.w.
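The general rule behind this fix is that any source file using assert should include the header itself instead of relying on another header to bring it in. A minimal sketch follows; meanOf is a hypothetical function that exists only to show an assertion in context, and it is not taken from statlib.w.

```cpp
#include <cassert>   // C++ spelling of <assert.h>; declares the assert macro

// Arithmetic mean of `count` values.
// (meanOf is a hypothetical example function.)
double meanOf(const double *values, int count) {
    // This use relies on the include above, not on some other header
    // happening to pull in the assert macro.
    assert(count > 0);
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum / count;
}
```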