The Annoyance Filter.pdf - Fourmilab


§256 ANNOYANCE-FILTER DEVELOPMENT LOG

happening when we’re missing message headers, and the difference in the word content of the two files is so extreme that they reliably score correctly.

Added a new Perl gizmo, TestFolder/testfolder.pl, which walks through a mail folder, breaks out each message, and passes it through annoyance-filter to obtain the probability and classification. (The annoyance-filter command is defined by a string within the Perl program, so you can modify it as you wish to evaluate the effects of other settings.) At the end of the folder, the total message count, the number of messages scored as junk and as mail, and the mean probability of the messages in the folder are printed.
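testfolder.pl itself is Perl and is not reproduced in the log. As a rough sketch of the folder-walking half of its job, the C++ below splits a Unix mbox folder into messages at lines beginning with “From ” and computes the end-of-folder totals from per-message results. The names `splitMbox`, `Verdict`, `FolderSummary`, and `summarize` are hypothetical, and the step that actually runs annoyance-filter on each message is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a Unix mbox folder into messages. Each line beginning with "From "
// starts a new message. (Sketch only: unescaped "From " lines inside
// message bodies are not handled.)
std::vector<std::string> splitMbox(const std::string &folder) {
    std::vector<std::string> messages;
    std::istringstream in(folder);
    std::string line;
    while (std::getline(in, line)) {
        if (messages.empty() || line.compare(0, 5, "From ") == 0) {
            messages.emplace_back();
        }
        messages.back() += line;
        messages.back() += '\n';
    }
    return messages;
}

// Hypothetical per-message result from running the filter on one message.
struct Verdict {
    double probability;  // junk probability reported by the filter
    bool junk;           // true if the message was classified as junk
};

// Totals printed at the end of the folder.
struct FolderSummary {
    std::size_t total = 0;
    std::size_t junk = 0;
    std::size_t mail = 0;
    double meanProbability = 0.0;
};

FolderSummary summarize(const std::vector<Verdict> &verdicts) {
    FolderSummary s;
    double sum = 0.0;
    for (const Verdict &v : verdicts) {
        ++s.total;
        if (v.junk) {
            ++s.junk;
        } else {
            ++s.mail;
        }
        sum += v.probability;
    }
    if (s.total > 0) {
        s.meanProbability = sum / static_cast<double>(s.total);
    }
    return s;
}
```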

Added a “back” command to SplitMail/splitmail.pl. As you walk through a mail folder, the start address of each message you’ve seen is kept on a stack. The “b” command pops the stack and backs up to the previous message. This should reduce the pain when you’re sorting a folder and accidentally hit “d” when you meant to save the message somewhere. You can even go back after a search operation.
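The back mechanism amounts to a stack of message start offsets. splitmail.pl is Perl; a minimal C++ sketch of the same bookkeeping follows, with the class `MessageHistory` and its methods hypothetical. It assumes that every move away from a message, whether to the next one or to a search hit, pushes the current start offset, which is what would let “b” undo a search as well.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>

// Tracks the start offset (byte address in the folder file) of the message
// being shown, plus a stack of the messages seen before it.
class MessageHistory {
public:
    explicit MessageHistory(std::size_t firstOffset) : current_(firstOffset) {}

    std::size_t current() const { return current_; }

    // Move to another message (the next one, or the result of a search),
    // remembering where we came from.
    void moveTo(std::size_t offset) {
        seen_.push(current_);
        current_ = offset;
    }

    // The "b" command: pop the stack and return to the previous message.
    // Returns false, leaving the position unchanged, if there is no history.
    bool back() {
        if (seen_.empty()) {
            return false;
        }
        current_ = seen_.top();
        seen_.pop();
        return true;
    }

private:
    std::size_t current_;
    std::stack<std::size_t> seen_;
};
```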

Moved splitmail.pl and testfolder.pl from their own dedicated directories into a new utilities directory, which Makefile.in includes in the archive. If and when these utilities require common code, such as the CSV parser, it will be easier to manage them all in the same directory.

Added help, requested by the “?” key, to splitmail.pl at both the disposition prompt and the “more” prompt while viewing message text. If you assign additional folder destinations to disposition keys, they are automatically included in the help output.
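One way to get that automatic listing is to generate the help text from the same table that maps disposition keys to folders, so a newly assigned key cannot be left out. A C++ sketch under that assumption; `DispositionTable`, `helpText`, and the wording of the help lines are hypothetical, not splitmail.pl’s actual output.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical disposition table: key typed at the prompt -> destination folder.
using DispositionTable = std::map<char, std::string>;

// Build the "?" help text. Fixed commands come first; every folder key in
// the table is appended, so newly assigned keys appear automatically.
std::string helpText(const DispositionTable &folders) {
    std::string help =
        "  b  Back up to the previous message\n"
        "  ?  Show this help\n";
    for (const auto &entry : folders) {
        help += "  ";
        help += entry.first;
        help += "  Save to ";
        help += entry.second;
        help += '\n';
    }
    return help;
}
```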

Now that splitmail.pl is equipped with a “back” mechanism, there’s no reason not to interpret a void disposition as a request to advance to the next message; if it’s a fat-finger, just go back. Trolling through a target-sparse folder can now be done at the expense of only one keystroke per message.
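The void-disposition rule is easy to state in code: a reply containing nothing but white space means “next message.” A hypothetical C++ sketch of the prompt’s reply parsing; the `Action` values and the `interpret` function are illustrative names, and only the “b” and “?” keys come from the log.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Action { Next, Back, Help, Save, Unknown };

// Interpret one reply typed at the disposition prompt. An empty (void)
// reply advances to the next message; a mistaken advance can be undone
// with "b". `folders` maps additional keys to destination folders.
Action interpret(const std::string &reply,
                 const std::map<char, std::string> &folders) {
    std::string::size_type i = reply.find_first_not_of(" \t\r\n");
    if (i == std::string::npos) {
        return Action::Next;  // void disposition: just move on
    }
    char key = reply[i];
    if (key == 'b') return Action::Back;
    if (key == '?') return Action::Help;
    if (folders.count(key) != 0) return Action::Save;
    return Action::Unknown;
}
```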

2002 September 22

Went ahead and added code to dereference symbolic links up to 50 deep when deciding whether files are gzip compressed in mailFolder. What the heck, it’s the equinox (well, it was a couple of hours ago) and the full Moon to boot: better to write silly code than to try balancing eggs on their little ends!
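The log doesn’t say how mailFolder decides that a file is compressed. Assuming the test is whether the name ends in “.gz”, links have to be followed first, since a link named `inbox` may point to `inbox.gz`. A C++17 sketch using `std::filesystem`; the function names `dereference` and `looksGzipped` are hypothetical, and code of 2002 vintage would more likely have called `readlink()` directly. Giving up after 50 links also keeps a symlink loop from running forever.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Follow a chain of symbolic links, at most maxDepth of them, and return
// the path the chain ends at. A relative link target is interpreted
// relative to the directory containing the link.
fs::path dereference(fs::path p, int maxDepth = 50) {
    for (int depth = 0; depth < maxDepth && fs::is_symlink(p); ++depth) {
        fs::path target = fs::read_symlink(p);
        p = target.is_absolute() ? target : p.parent_path() / target;
    }
    return p;
}

// Assumed test: the file is gzip compressed if the final target's name
// ends in ".gz".
bool looksGzipped(const fs::path &file) {
    return dereference(file).extension().string() == ".gz";
}
```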

Much work on the documentation today, but little on the code. Slowly the python peristalsis moves us toward release.

2002 September 23

We’re off to see the lizard, the wonderful lizard of WIN32! Naturally, all of our carefully crafted code to set up pipelines to decompress dictionaries evaporated under the harsh sun of WIN32. I added conditional compilation to disable everything that incompetent empire, self-defined by its own limes and rusty Gates, doesn’t comprehend.
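The conditional code itself isn’t shown in the log. As a sketch of the pattern, the C++ below reads a “.gz” dictionary through a `gzip -dc` pipeline with POSIX `popen()` and compiles that path out on WIN32. The `_WIN32` test, the `HAVE_PIPES` macro, and the `Dictionary`, `openDictionary`, and `closeDictionary` names are all assumptions; the configuration symbol the real build tests isn’t given here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumed configuration switch: no decompression pipelines on WIN32.
#if defined(_WIN32)
#  define HAVE_PIPES 0
#else
#  define HAVE_PIPES 1
#endif

struct Dictionary {
    std::FILE *fp = nullptr;  // null if the file could not be opened
    bool piped = false;       // true if fp came from popen()
};

static bool endsWith(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Open a dictionary for reading. A ".gz" file is decompressed through a
// "gzip -dc" pipeline where pipes are compiled in; on builds without
// pipes, fp is left null for it. (Sketch only: the file name is not
// escaped for the shell.)
Dictionary openDictionary(const std::string &name) {
    Dictionary d;
    if (endsWith(name, ".gz")) {
#if HAVE_PIPES
        std::string command = "gzip -dc '" + name + "'";
        d.fp = popen(command.c_str(), "r");
        d.piped = (d.fp != nullptr);
#endif
        return d;
    }
    d.fp = std::fopen(name.c_str(), "r");
    return d;
}

// Close with pclose() or fclose() to match how the stream was opened.
void closeDictionary(Dictionary &d) {
    if (d.fp == nullptr) {
        return;
    }
#if HAVE_PIPES
    if (d.piped) {
        pclose(d.fp);
        d.fp = nullptr;
        return;
    }
#endif
    std::fclose(d.fp);
    d.fp = nullptr;
}
```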

Building for WIN32 with DJGPP resulted in a natter about comparison of the size type of a multimap to an unsigned int. The Linux compiler accepted this without a quibble. I added a static cast to clear up the confusion.
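A hypothetical example of the kind of comparison involved: `size()` returns the container’s `size_type`, an unsigned type whose width need not match `unsigned int`, and casting one operand to the other’s type makes the intended comparison explicit. The log doesn’t say which operand the actual fix cast; converting the `unsigned int` to `size_type`, as here, cannot lose information on mainstream platforms. `WordMap` and `hasAtLeast` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

using WordMap = std::multimap<std::string, int>;

// Does the multimap hold at least `limit` entries? Converting `limit` to
// WordMap::size_type gives both operands the same type, so there is no
// mixed-type comparison for the compiler to complain about.
bool hasAtLeast(const WordMap &m, unsigned int limit) {
    return m.size() >= static_cast<WordMap::size_type>(limit);
}
```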

OK, it built on WIN32 with DJGPP 2.953 and even passed the rudimentary tests I threw at it. So I copied the executable back to the development directory, then discovered and fixed numerous bugs in the archive creation code in Makefile.in when the WIN32 distribution is enabled. Got better. A zipped WIN32 build is now posted in the Web directory and linked to from the home page.

The configure.in script didn’t check for the -lm math library. This somehow managed to work on Linux and Solaris, but failed on FreeBSD. I added the necessary AC_CHECK_LIB macro. (Reported by Neil Darlow.)

Fixed several typos in the documentation of computeJunkProbability and reformatted the formula as a stacked fraction so it fits better on the page.
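The entry doesn’t reproduce the formula. For reference, the combining rule from Paul Graham’s “A Plan for Spam,” the usual way of merging the junk probabilities p_1, …, p_n of the n most significant words into a single message probability P, looks like this when set as a stacked fraction; that computeJunkProbability uses exactly this form is an assumption, not something this entry states.

```latex
P = \frac{\displaystyle\prod_{i=1}^{n} p_i}
         {\displaystyle\prod_{i=1}^{n} p_i + \prod_{i=1}^{n} \left(1 - p_i\right)}
```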
