12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab



210 DEVELOPMENT LOG ANNOYANCE-FILTER §256

Added a name method to MIMEdecoder and all its children, then took advantage of that to dispense with the horrific duplication of decoder diagnostic code in 〈Verify Content-Transfer-Encoding and activate decoder if necessary 160〉. What was previously dispersed among the several branches of the decoder activation is now collected together in a single case after the decoder has been chosen.
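As a rough illustration of that refactoring, here is a minimal sketch in which each decoder reports its own name, so the diagnostic message is produced in one place after the decoder has been chosen. The class names `Base64Decoder` and `QuotedPrintableDecoder`, the helpers `chooseDecoder` and `describeActivation`, and the message wording are all hypothetical; only `MIMEdecoder` and its name method come from the log, and the real class has a richer interface than shown here.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the MIMEdecoder base class.
class MIMEdecoder {
public:
    virtual ~MIMEdecoder() {}
    // Each child reports a human-readable name for diagnostics.
    virtual std::string name() const = 0;
};

// Hypothetical children; the names are assumptions for illustration.
class Base64Decoder : public MIMEdecoder {
public:
    std::string name() const { return "Base64"; }
};

class QuotedPrintableDecoder : public MIMEdecoder {
public:
    std::string name() const { return "Quoted-Printable"; }
};

// Pick a decoder from a Content-Transfer-Encoding value.
// Returns an empty pointer when no decoder applies.
std::unique_ptr<MIMEdecoder> chooseDecoder(const std::string &encoding) {
    if (encoding == "base64") {
        return std::unique_ptr<MIMEdecoder>(new Base64Decoder);
    }
    if (encoding == "quoted-printable") {
        return std::unique_ptr<MIMEdecoder>(new QuotedPrintableDecoder);
    }
    return std::unique_ptr<MIMEdecoder>();
}

// The single diagnostic site: no per-branch message code is needed,
// because the chosen decoder supplies its own name.
std::string describeActivation(const std::string &encoding) {
    std::unique_ptr<MIMEdecoder> decoder = chooseDecoder(encoding);
    if (decoder) {
        return "Activating " + decoder->name() + " decoder";
    }
    return "No decoder for encoding \"" + encoding + "\"";
}
```

Adding another decoder then only requires a new child class and one more branch in the selection function; the diagnostic code stays untouched.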

Modified Makefile.in to delete the fussy core.process files Linux has taken to produce.

Fixed configure.in to specify -Wall if we’re building with GCC.

2002 September 20

On Solaris, GCC is prone to hang if invoked with -O2 (at least as of version 2.95.3). I twiddled the configure.in to change the compile option to -O for Solaris builds.

ctangle and cweave spewed copious warnings on a GCC -Wall build. To avoid modifying these programs, which are perfectly compliant ANSI C, I changed Makefile.in to suppress the -Wall option for them when the compiler is detected as GCC.

make dist didn’t do a make distclean before generating the distribution archive, which could result in build-specific files being included in the archive. Fixed.

2002 September 21

Added documentation on how to integrate annoyance-filter into a .forward pipeline to Procmail, and build a .procmailrc rule set for typical user-level filtering. It’s 03:40 and I’m going to get some sleep before proofing this text; at the moment it’s something between a random scribble and a first draft.

Okay, I just couldn’t stand it... I just had to take another crack at the infernal dictionary::purge method. One of the many bees in my bonnet buzzed the idea into my ear that I could avoid both the extra memory consumption of yesterday’s scheme and the risk of instability in the container by testing the probability of the first item in the map, adding it to the queue of survivors if its probability is significant, then performing an erase(begin()). Cool, huh? No iterators, no mess, no two copies of any word in memory.

The hits just keep on coming... the stupid built-in purge in dictionary::resetCat also ran afoul of the “stale iterator” problem. I blew it away; henceforth, it’s up to you to do a --purge after a --clearmail or --clearjunk. With the new tolerance for un-purged dictionaries, no great harm will be done if you forget.
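The purge scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `WordStats` record, the `Dictionary` typedef, the free function `purge`, and the significance test (distance of the probability from 0.5) are all hypothetical, since the log does not say how the real dictionary::purge decides that a probability is significant.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-word record; the real dictionary stores more.
struct WordStats {
    double junkProbability;
};

typedef std::map<std::string, WordStats> Dictionary;

// Assumed significance test: how far the probability lies from 0.5.
bool isSignificant(const WordStats &w, double threshold) {
    return std::fabs(w.junkProbability - 0.5) >= threshold;
}

// Drain the map from the front. Each word is tested, queued if it
// survives, and erased at once, so no iterator is ever held across
// an erase and at most one word is briefly duplicated.
// Returns the number of words discarded.
std::size_t purge(Dictionary &dict, double threshold) {
    std::deque<std::pair<std::string, WordStats> > survivors;
    std::size_t discarded = 0;

    while (!dict.empty()) {
        Dictionary::iterator first = dict.begin();
        if (isSignificant(first->second, threshold)) {
            survivors.push_back(std::make_pair(first->first, first->second));
        } else {
            ++discarded;
        }
        dict.erase(first);   // equivalent to erase(begin())
    }

    // Reload the now-empty map from the queue of survivors.
    while (!survivors.empty()) {
        dict.insert(survivors.front());
        survivors.pop_front();
    }
    return discarded;
}
```

Because the only iterator used is a fresh begin() obtained on each pass, there is nothing left to go stale when an element is erased.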

Added a \subsection macro to create subheads within documentation sections. The section number is automatically grabbed from the cwebmac.tex definition, but lower-level numbering is manual, permitting you to add additional levels of hierarchy with a specification like:

\subsection{4.2.1}{Twiddling little details}.

It turns out that all the cheesy mess I put in to patch the user’s home directory into the annoyance-filter-run script wasn’t necessary after all, since sendmail is kind enough to change to the user’s home directory before piping a message to a program. This means we can just cd to .annoyance-filter relative to the home directory. This also means one can remove the absolute path name from the .forward file, which cleans up the documentation on integration with Procmail.

Added a rather tacky check target to the Makefile.in to serve as a “sanity check” that doesn’t require an extensive training database. The scheme is to train the program with the source code for annoyance-filter.w serving as the mail collection and statlib.w the junk bucket. Then those programs themselves are tested, and the transcripts verified to confirm they were correctly classified. Astute observers will ask where I get off using something which isn’t a well-formed mail folder to train the program. Well, it works thanks to a gimmick I put into the probability calculation to keep it from dividing by zero if one or both of the message counts were zero. That keeps anything untoward from
