The Annoyance Filter.pdf - Fourmilab
are not supported, as they were a disastrously poor idea; you can generally treat them as ordinary UNIX folders. By default, folders are parsed using UNIX semantics. A new --bsdfolder option marks the next --mail or --junk folder as obeying BSD rules. Note that you must specify --bsdfolder before each BSD-style folder; it is not modal. This is a change in default behaviour: folders were previously parsed using BSD rules, while UNIX rules are now the default.
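A minimal C++ sketch of the non-modal behaviour, assuming a simplified argument list in which --mail and --junk are each followed by a folder name. The FolderSpec structure and the collectFolders function are hypothetical illustrations, not the program's actual option processing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one folder named on the command line.
struct FolderSpec {
    std::string path;
    bool junk;      // true for --junk, false for --mail
    bool bsdRules;  // parse with BSD rather than UNIX folder semantics
};

// Sketch: --bsdfolder applies only to the next --mail or --junk folder.
// The flag is consumed as soon as that folder is recorded, so it must be
// repeated before every BSD-style folder; it is not modal.
std::vector<FolderSpec> collectFolders(const std::vector<std::string> &args) {
    std::vector<FolderSpec> folders;
    bool nextIsBsd = false;  // UNIX semantics are the default
    for (std::size_t i = 0; i < args.size(); i++) {
        const std::string &arg = args[i];
        if (arg == "--bsdfolder") {
            nextIsBsd = true;
        } else if ((arg == "--mail" || arg == "--junk") && i + 1 < args.size()) {
            folders.push_back({args[i + 1], arg == "--junk", nextIsBsd});
            nextIsBsd = false;  // reset after a single folder
            i++;                // skip the folder name just consumed
        }
    }
    return folders;
}
```

For example, the arguments `--bsdfolder --mail a --mail b` yield folder a parsed with BSD rules and folder b parsed with the default UNIX rules.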
The very large case statement which processes command line options ran afoul of CWEAVE’s maximum tokens-per-scrap capacity limit. I added a cweb/cweave-bigger.ch change file to raise the limit from 2000 to 5000 tokens, and modified cweb/Makefile to apply the change file when building CWEAVE. I probably ought to break the option-processing case statement into a separate piece for each option, but as there’s little or nothing to be said about each one, that really wouldn’t improve the readability of the code.
2003 February 20
Completed the implementation of --autoprune. This new option lets you specify a memory size, in bytes, at which a dictionary to which words are being added with the --mail or --junk options will automatically be pruned by discarding all words which appear only once. A new dictionaryWord::estimateMemoryRequirement method estimates the memory occupied by an in-memory word, and this is used to compute the total dictionary size. dictionary::purge has been extended to accept an optional argument which, if nonzero, causes the pruning of the dictionary to be based on the number of occurrences of each word rather than on our ability to compute its probability.
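The following simplified C++ sketch shows how a per-word memory estimate can be summed into a dictionary size and used to trigger an occurrence-based purge. Only the names dictionaryWord, dictionary, estimateMemoryRequirement and purge come from the log; the members, the assumed per-entry overhead, the stand-in probability rule (kAssumedMinForProbability) and the addWithAutoprune helper are all hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-in for the program's class.
struct dictionaryWord {
    std::string text;
    unsigned occurrences = 0;

    // Approximate bytes used by one in-memory word: the object itself,
    // its characters, and an assumed per-entry container overhead.
    std::size_t estimateMemoryRequirement() const {
        const std::size_t assumedNodeOverhead = 4 * sizeof(void *);
        return sizeof(dictionaryWord) + text.size() + assumedNodeOverhead;
    }
};

class dictionary {
public:
    void add(const std::string &w) {
        dictionaryWord &entry = words[w];
        entry.text = w;
        entry.occurrences++;
    }

    // Total estimated size, summed over every word.
    std::size_t estimateMemoryRequirement() const {
        std::size_t total = 0;
        for (const auto &kv : words) total += kv.second.estimateMemoryRequirement();
        return total;
    }

    // Nonzero argument: discard words seen fewer than minOccurrences times.
    // Zero: fall back to a probability-based rule, modelled here (an
    // assumption) as a fixed minimum count needed to compute a probability.
    void purge(unsigned minOccurrences = 0) {
        const unsigned kAssumedMinForProbability = 5;
        const unsigned limit = minOccurrences != 0 ? minOccurrences : kAssumedMinForProbability;
        for (auto it = words.begin(); it != words.end();) {
            if (it->second.occurrences < limit) it = words.erase(it);
            else ++it;
        }
    }

    std::size_t size() const { return words.size(); }

private:
    std::map<std::string, dictionaryWord> words;
};

// Hypothetical --autoprune check made after each word is added.
void addWithAutoprune(dictionary &d, const std::string &w, std::size_t autopruneBytes) {
    d.add(w);
    if (d.estimateMemoryRequirement() > autopruneBytes) {
        d.purge(2);  // keep only words which appear at least twice
    }
}
```

Calling purge(2) discards exactly the words which appear only once; a real estimate would depend on the program's actual data structures and allocator.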
If the user sets --autoprune too low, we can fall into a thrashing situation when the non-unique words in the dictionary alone exceed the pruning threshold, since every prune then leaves the dictionary above the threshold and the next added word triggers yet another prune. To keep this from happening, whenever the dictionary size after an automatic prune exceeds 90% of the --autoprune threshold, the threshold is increased by 25%.
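The anti-thrashing rule reduces to a small calculation. In this hypothetical helper (the name adjustAutopruneThreshold is an assumption, not taken from the program), the 90% test is done in integer arithmetic by comparing ten times the post-prune size with nine times the threshold.

```cpp
#include <cstddef>

// Hypothetical helper: after an automatic prune leaves the dictionary at
// postPruneBytes, raise the --autoprune threshold by 25% if the surviving
// words still occupy more than 90% of it, so the next prune is not
// triggered almost immediately.
std::size_t adjustAutopruneThreshold(std::size_t postPruneBytes,
                                     std::size_t threshold) {
    // 10 * size > 9 * threshold means "size exceeds 90% of threshold".
    if (postPruneBytes * 10 > threshold * 9) {
        threshold += threshold / 4;  // increase by 25% (rounded down)
    }
    return threshold;
}
```

With a threshold of 1,000,000 bytes, a prune that leaves 950,000 bytes raises the threshold to 1,250,000, while one that leaves 800,000 bytes leaves it unchanged.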
2003 February 21
Modified the makew32.sh script to build with gcc 3.x rather than 2.x. Note that this means the source should be configured (with ./configure) for a gcc 3.x build before creating winarch to transport to the Cygwin machine.
When building on Cygwin with gcc 3, getopt.h was somehow included twice. To prevent this, I changed the include guard around our local copy to test __GETOPT_H__, matching the symbol used by the library’s include file.
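Why matching the guard symbol works can be seen in a self-contained sketch. The two blocks below stand in for the library's getopt.h and the local copy, pasted into one file rather than reached through two #include directives; the declarations inside them are hypothetical. Whichever copy the preprocessor sees first defines __GETOPT_H__, so the other is skipped and nothing is declared twice. (Names beginning with a double underscore are normally reserved for the implementation; the local copy uses this one only to agree with the library.)

```cpp
// ---- first copy (standing in for the library's getopt.h) ----
#ifndef __GETOPT_H__
#define __GETOPT_H__
struct getoptDemoOption {  // hypothetical declaration
    const char *name;
    int hasArgument;
};
const int getoptCopiesSeen = 1;
#endif

// ---- second copy (the local getopt.h), using the same guard ----
#ifndef __GETOPT_H__
#define __GETOPT_H__
struct getoptDemoOption {  // would be a redefinition error if compiled
    const char *name;
    int hasArgument;
};
const int getoptCopiesSeen = 2;
#endif
```

With different guard symbols in the two copies, both blocks would be compiled and the duplicate struct definition would be rejected.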
Updated the cygwin.dll included in the Win32 executable distribution to the January 24, 2003 version we’re currently using on Ovni.
Release 1.0.
2003 June 24
As reported and fixed by Wolfgang Schnerring, utilities/splitmail.pl had an assignment statement in the dispose_of_message subroutine which was missing the dollar sign before the variable name. I integrated his fix. Thank you!
2003 August 27
A pdfTextExtractor was not restartable: once instantiated, it could only be used once. Calling close and then re-initialising it with setMailFolder from the parent applicationStringParser class left the extractor at end of file. This required fixes both in pdfTextExtractor, whose close method failed to reset initialised to false, and in applicationStringParser, whose close method did not reset the eof and error flags.
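A simplified C++ sketch of the two fixes, assuming a parser that reads from an in-memory string. The class names, close, setMailFolder and the eof, error and initialised flags come from the log; setMailFolder taking a std::string, nextChar and atEof are hypothetical. Without the resets in the two close methods, the eof flag would survive re-initialisation and the first read would immediately report end of file.

```cpp
#include <cstddef>
#include <string>

class applicationStringParser {
public:
    virtual ~applicationStringParser() = default;

    // Re-initialise the parser on a new source (simplified to a string).
    void setMailFolder(const std::string &source) {
        input = source;
        position = 0;
    }

    // Fix 1: clear the end-of-file and error flags so that a later
    // setMailFolder() starts from a clean state.
    virtual void close() {
        eof = false;
        error = false;
    }

    bool atEof() const { return eof; }

protected:
    std::string input;
    std::size_t position = 0;
    bool eof = false;
    bool error = false;
};

class pdfTextExtractor : public applicationStringParser {
public:
    // Returns the next character of extracted text, or -1 at end of input.
    int nextChar() {
        if (eof || error) {
            return -1;  // stale flags make a reused parser look finished
        }
        if (!initialised) {
            initialised = true;  // assumed: one-time per-document setup goes here
        }
        if (position >= input.size()) {
            eof = true;
            return -1;
        }
        return static_cast<unsigned char>(input[position++]);
    }

    // Fix 2: reset initialised, then let the base class clear its flags.
    void close() override {
        initialised = false;
        applicationStringParser::close();
    }

private:
    bool initialised = false;
};
```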
2003 August 28