The Annoyance Filter.pdf - Fourmilab
210 DEVELOPMENT LOG ANNOYANCE-FILTER §256
Added a name method to MIMEdecoder and all its children, then took advantage of that to dispense
with the horrific duplication of decoder diagnostic code in 〈 Verify Content-Transfer-Encoding and
activate decoder if necessary 160 〉. What was previously dispersed among the several branches of the
decoder activation is now collected together in a single case after the decoder has been chosen.
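
A minimal sketch of the refactoring described above, not the actual annoyance-filter code. Only the MIMEdecoder class and the idea of a name method come from this log; the child class names, the chooseDecoder helper, and the diagnostic wording are hypothetical stand-ins chosen for illustration.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Base class: every decoder can report a human-readable name.
class MIMEdecoder {
public:
    virtual ~MIMEdecoder() = default;
    virtual std::string name() const = 0;
};

// Hypothetical children; the real program's class names may differ.
class Base64Decoder : public MIMEdecoder {
public:
    std::string name() const override { return "Base64"; }
};

class QuotedPrintableDecoder : public MIMEdecoder {
public:
    std::string name() const override { return "Quoted-Printable"; }
};

// Hypothetical helper: pick a decoder from the Content-Transfer-Encoding
// value, or return a null pointer when no decoding is required.
std::unique_ptr<MIMEdecoder> chooseDecoder(const std::string &encoding) {
    if (encoding == "base64") {
        return std::unique_ptr<MIMEdecoder>(new Base64Decoder);
    }
    if (encoding == "quoted-printable") {
        return std::unique_ptr<MIMEdecoder>(new QuotedPrintableDecoder);
    }
    return nullptr;
}

// The diagnostic is built in one place, after the decoder has been chosen,
// instead of being repeated in every branch of the selection logic.
std::string describeActivation(const std::string &encoding) {
    std::unique_ptr<MIMEdecoder> decoder = chooseDecoder(encoding);
    std::ostringstream os;
    if (decoder) {
        os << "Activating " << decoder->name() << " decoder";
    } else {
        os << "No decoder needed for encoding: " << encoding;
    }
    return os.str();
}
```

Calling describeActivation("base64") goes through the same diagnostic path as any other encoding, so adding a new decoder only requires a new child class and one more branch in the selection helper.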
Modified Makefile.in to delete the fussy core.process files Linux has taken to produce.
Fixed configure.in to specify -Wall if we’re building with GCC.
2002 September 20
On Solaris, GCC is prone to hang if invoked with -O2 (at least as of version 2.95.3). I twiddled the
configure.in to change the compile option to -O for Solaris builds.
ctangle and cweave spewed copious warnings on a GCC -Wall build. To avoid modifying these
programs, which are perfectly compliant ANSI C, I changed Makefile.in to suppress the -Wall option
for them when the compiler is detected as GCC.
make dist didn’t do a make distclean before generating the distribution archive, which could result
in build-specific files being included in the archive. Fixed.
2002 September 21
Added documentation on how to integrate annoyance-filter into a .forward pipeline to Procmail,
and how to build a .procmailrc rule set for typical user-level filtering. It’s 03:40 and I’m going to get some
sleep before proofing this text; at the moment it’s something between a random scribble and a first
draft.
Okay, I just couldn’t stand it... I just had to take another crack at the infernal dictionary::purge
method. One of the many bees in my bonnet buzzed the idea into my ear that I could avoid both the
extra memory consumption of yesterday’s scheme and the risk of instability in the container by testing
the probability of the first item in the map, adding it to the queue of survivors if its probability is
significant, then performing an erase(begin()). Cool, huh? No iterators, no mess, no two copies of any
word in memory.
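
The drain-from-the-front idea can be sketched as follows. This is an illustration under stated assumptions, not the real dictionary::purge: the WordMap type, the purgeInsignificant and isSignificant names, the "distance from 0.5" significance test, and the step that puts the survivors back into the map are all hypothetical. The log only says that the first item is tested, queued if significant, and then erased.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the dictionary: word -> junk probability.
typedef std::map<std::string, double> WordMap;

// Assumed significance test: a word matters if its probability is at least
// `threshold` away from the neutral value 0.5.
bool isSignificant(double probability, double threshold) {
    return std::fabs(probability - 0.5) >= threshold;
}

// Drain the map from the front. Each step looks only at begin(), so no
// iterator is ever held across an erase and none can go stale.
// Returns the number of words discarded.
std::size_t purgeInsignificant(WordMap &words, double threshold) {
    std::deque<std::pair<std::string, double> > survivors;
    std::size_t removed = 0;

    while (!words.empty()) {
        WordMap::iterator first = words.begin();
        if (isSignificant(first->second, threshold)) {
            // Copy the entry to the queue just before it leaves the map.
            survivors.emplace_back(first->first, first->second);
        } else {
            ++removed;
        }
        words.erase(first);
    }

    // Assumed final step: rebuild the map from the survivors. They were
    // taken in key order, so hinting at end() makes each insert cheap.
    for (std::size_t i = 0; i < survivors.size(); ++i) {
        words.emplace_hint(words.end(), survivors[i].first, survivors[i].second);
    }
    return removed;
}
```

In this sketch a surviving word exists in both containers only for the instant between the copy and the erase, which is the closest a std::map with const keys allows to "no two copies" without C++17 node extraction.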
The hits just keep on coming... the stupid built-in purge in dictionary::resetCat also ran afoul of the
“stale iterator” problem. I blew it away; henceforth, it’s up to you to do a --purge after a --clearmail
or --clearjunk. With the new tolerance for un-purged dictionaries, no great harm will be done if you
forget.
Added a \subsection macro to create subheads within documentation sections. The section number
is automatically grabbed from the cwebmac.tex definition, but lower level numbering is manual, permitting
you to add additional levels of hierarchy with a specification like:
\subsection{4.2.1}{Twiddling little details}.
It turns out that all the cheesy mess I put in to patch the user’s home directory into the annoyance-filter-run
script wasn’t necessary after all since sendmail is kind enough to change to the user’s home directory
before piping a message to a program. This means we can just cd to .annoyance-filter relative to
the home directory. This also means one can remove the absolute path name from the .forward file,
which cleans up the documentation on integration with Procmail.
Added a rather tacky check target to the Makefile.in to serve as a “sanity check” that doesn’t
require an extensive training database. The scheme is to train the program with the source code
for annoyance-filter.w serving as the mail collection and statlib.w as the junk bucket. Then those
programs themselves are tested, and the transcripts verified to confirm they were correctly classified.
Astute observers will ask where I get off using something which isn’t a well-formed mail folder to train
the program. Well, it works thanks to a gimmick I put into the probability calculation to keep it from
dividing by zero if one or both of the message counts were zero. That keeps anything untoward from