
The Annoyance Filter.pdf - Fourmilab

224 DEVELOPMENT LOG ANNOYANCE-FILTER §256

are not supported, as they were a disastrously poor idea; you can generally treat them as usual UNIX folders. By default, folders are parsed using UNIX semantics. A new --bsdfolder option marks the next --mail or --junk folder as using BSD rules. Note that you must specify --bsdfolder before each BSD-style folder; it is not modal. This is a change in default behaviour: folders were previously parsed using BSD rules, while UNIX is now the default.
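Because --bsdfolder is not modal, it behaves like a one-shot flag that the next --mail or --junk consumes. The sketch below models only that behaviour; it assumes each --mail/--junk is followed by a folder name, and FolderSpec and parseFolderOptions are hypothetical names, not the program's actual option-processing code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one folder named on the command line.
struct FolderSpec {
    std::string kind;  // "mail" or "junk"
    std::string path;  // folder file name
    bool bsd;          // true: parse with BSD rules; false: UNIX rules
};

// Sketch of non-modal option handling: --bsdfolder arms a one-shot flag
// that the next --mail or --junk consumes and then clears.
std::vector<FolderSpec> parseFolderOptions(const std::vector<std::string> &args) {
    std::vector<FolderSpec> folders;
    bool bsdNext = false;  // UNIX semantics are the default
    for (std::size_t i = 0; i < args.size(); i++) {
        const std::string &a = args[i];
        if (a == "--bsdfolder") {
            bsdNext = true;
        } else if ((a == "--mail" || a == "--junk") && i + 1 < args.size()) {
            folders.push_back(FolderSpec{a.substr(2), args[i + 1], bsdNext});
            bsdNext = false;  // the flag covers this folder only
            ++i;              // skip over the folder name just consumed
        }
    }
    return folders;
}
```

With the arguments `--bsdfolder --mail m1 --mail m2`, only m1 is parsed with BSD rules; m2 reverts to the UNIX default unless --bsdfolder is repeated before it.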

The very large case statement which processes command line options ran afoul of CWEAVE’s maximum tokens-per-scrap capacity limit. I added a cweb/cweave-bigger.ch file to increase the limit from 2000 to 5000 tokens, and modified cweb/Makefile to apply the change file when building CWEAVE. I probably ought to break the option-processing case into one piece for each option, but as there’s little or nothing to be said about each one, that really wouldn’t improve the readability of the code.

2003 February 20

Completed the implementation of --autoprune. This new option permits you to specify a memory size, in bytes, at which a dictionary to which words are being added with the --mail or --junk options will automatically be pruned by discarding all words which appear only once. A new dictionaryWord::estimateMemoryRequirement method estimates the memory occupied by an in-memory word, and this is used to compute the total dictionary size. dictionary::purge has been extended to accept an optional argument which, if nonzero, causes the dictionary to be pruned based on the number of occurrences of each word rather than on our ability to compute its probability.
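A minimal sketch of how a per-word memory estimate and occurrence-based pruning can work together, assuming a std::map keyed by word. DictionaryWord, Dictionary, purgeByOccurrences, and maybeAutoprune are hypothetical names loosely modelled on the classes above; the size formula (including the assumed four-pointer node overhead) is illustrative, not the program's own, and the probability-based default mode of purge is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for dictionaryWord: occurrence counts per category.
struct DictionaryWord {
    unsigned mailCount = 0;
    unsigned junkCount = 0;

    unsigned occurrences() const { return mailCount + junkCount; }

    // Rough size of one map entry: the key/value pair, the key's characters
    // (assumed heap-allocated), and an ASSUMED node overhead of four pointers.
    static std::size_t estimateMemoryRequirement(const std::string &key) {
        return sizeof(std::pair<const std::string, DictionaryWord>)
               + key.size() + 1 + 4 * sizeof(void *);
    }
};

// Hypothetical dictionary that keeps a running memory estimate.
class Dictionary {
    std::map<std::string, DictionaryWord> words;
    std::size_t estimatedBytes = 0;

public:
    void add(const std::string &w, bool junk) {
        auto it = words.find(w);
        if (it == words.end()) {
            it = words.emplace(w, DictionaryWord()).first;
            estimatedBytes += DictionaryWord::estimateMemoryRequirement(w);
        }
        if (junk) it->second.junkCount++; else it->second.mailCount++;
    }

    std::size_t size() const { return words.size(); }
    std::size_t estimatedMemory() const { return estimatedBytes; }

    // Occurrence-based pruning: drop every word seen at most maxOccurrences
    // times; an argument of 1 removes words that appear only once.
    void purgeByOccurrences(unsigned maxOccurrences) {
        for (auto it = words.begin(); it != words.end(); ) {
            if (it->second.occurrences() <= maxOccurrences) {
                std::size_t bytes = DictionaryWord::estimateMemoryRequirement(it->first);
                assert(estimatedBytes >= bytes);
                estimatedBytes -= bytes;
                it = words.erase(it);  // erase returns the next iterator
            } else {
                ++it;
            }
        }
    }

    // --autoprune check: prune singletons once the estimate passes the limit.
    bool maybeAutoprune(std::size_t thresholdBytes) {
        if (estimatedBytes <= thresholdBytes) return false;
        purgeByOccurrences(1);
        return true;
    }
};
```

Keeping a running total, rather than re-summing every word after each addition, makes the threshold check cheap enough to perform as words are added.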

If the user sets --autoprune too low, we can fall into a thrashing situation when the non-unique words in the dictionary alone exceed the pruning threshold: since pruning discards only words which appear once, it cannot bring the dictionary back below the threshold, and another prune is triggered almost immediately. To keep this from happening, whenever the dictionary size after an automatic prune exceeds 90% of the --autoprune threshold, the threshold is increased by 25%.
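The anti-thrashing rule reduces to a small calculation. This sketch assumes sizes are byte counts held in std::size_t; adjustAutopruneThreshold is a hypothetical name. It tests "exceeds 90%" in integer arithmetic (10 × size > 9 × threshold) to avoid floating-point rounding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: after an automatic prune, if the dictionary is still
// above 90% of the --autoprune threshold, raise the threshold by 25%.
std::size_t adjustAutopruneThreshold(std::size_t sizeAfterPrune,
                                     std::size_t threshold) {
    assert(threshold > 0);
    // size > 0.9 * threshold, written without floating point
    if (10 * sizeAfterPrune > 9 * threshold) {
        threshold += threshold / 4;  // grow by 25%
    }
    return threshold;
}
```

For example, with a 1,000,000-byte threshold, a post-prune size of 950,000 bytes exceeds 900,000, so the threshold becomes 1,250,000; a post-prune size of exactly 900,000 leaves it unchanged. Repeated applications compound, so the threshold eventually climbs past the size of the non-unique words.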

2003 February 21

Modified the makew32.sh script to build with gcc 3.x rather than 2.x. Note that this means the source should be configured with ./configure for a gcc 3.x build before creating winarch to transport to the Cygwin machine.

When building on Cygwin with gcc 3, getopt.h managed to get included twice for some reason. I changed the include-guard condition around our local copy to __GETOPT_H__, agreeing with the symbol used by the library’s getopt.h, to prevent this from happening.
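The fix works because two headers protected by the same guard macro are mutually exclusive: whichever is seen first defines the macro, and the other is skipped entirely. The sketch below simulates both headers in a single file; getoptDemoOption and getoptDemoSource are hypothetical stand-ins, since the actual contents of the two getopt.h files are not shown here.

```cpp
// First "inclusion": stands in for the library's getopt.h.
#ifndef __GETOPT_H__
#define __GETOPT_H__
struct getoptDemoOption { const char *name; int has_arg; };
constexpr int getoptDemoSource = 1;  // 1 = library copy was used
#endif

// Second "inclusion": stands in for the local copy. Because it now tests the
// same symbol, it is skipped, avoiding a redefinition of getoptDemoOption.
#ifndef __GETOPT_H__
#define __GETOPT_H__
struct getoptDemoOption { const char *name; int has_arg; };
constexpr int getoptDemoSource = 2;  // 2 = local copy was used
#endif

static_assert(getoptDemoSource == 1, "local getopt.h copy should be skipped");
```

Had the local copy used a different guard name, both blocks would be compiled and the duplicate struct definition would be rejected.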

Updated the cygwin.dll included in the Win32 executable distribution to the January 24, 2003 version we’re currently using on Ovni.

Release 1.0.

2003 June 24

As reported and fixed by Wolfgang Schnerring, utilities/splitmail.pl had an assignment statement in the dispose_of_message subroutine which was missing the dollar sign before the variable name. I integrated his fix. Thank you!

2003 August 27

A pdfTextExtractor was not restartable: once instantiated, it could only be used once. Calling close and then re-initialising it with the parent applicationStringParser class’s setMailFolder left the extractor at end of file. This required fixes both in pdfTextExtractor, where the close method failed to reset initialised to false, and in applicationStringParser, whose close method did not reset the eof and error flags.
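The general pattern is that close must return every layer of the class hierarchy to its freshly constructed state. Here is a minimal sketch under stated assumptions: StringParserSketch and PdfExtractorSketch are hypothetical classes that read from an in-memory string rather than a mail folder, and the "setup" step merely counts how often per-document initialisation runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical base class modelled on applicationStringParser.
class StringParserSketch {
protected:
    std::string source;
    std::size_t pos = 0;
    bool eof = false;    // latched once input is exhausted
    bool error = false;  // would be set on read failures (not modelled)

public:
    virtual ~StringParserSketch() = default;

    // Like setMailFolder, this does NOT clear eof or error.
    void setMailFolder(const std::string &text) { source = text; pos = 0; }

    // Returns the next byte, or -1 at end of input or after an error.
    int nextByte() {
        if (eof || error) return -1;
        if (pos >= source.size()) { eof = true; return -1; }
        return static_cast<unsigned char>(source[pos++]);
    }

    bool atEnd() const { return eof; }

    // Base-class fix: clear the latched flags so the object can be reused.
    virtual void close() { eof = false; error = false; pos = 0; source.clear(); }
};

// Hypothetical derived class modelled on pdfTextExtractor.
class PdfExtractorSketch : public StringParserSketch {
    bool initialised = false;  // per-document setup has run
    int setups = 0;

public:
    std::string extractAll() {
        if (!initialised) { initialised = true; setups++; }
        std::string out;
        for (int c; (c = nextByte()) != -1; ) out += static_cast<char>(c);
        return out;
    }

    int setupCount() const { return setups; }

    // Derived-class fix: reset its own state, then let the base reset its.
    void close() override { initialised = false; StringParserSketch::close(); }
};
```

Without either override resetting its flags, a second setMailFolder followed by extractAll would return an empty string, because eof stays latched, and the per-document setup would be skipped.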

2003 August 28
