The Annoyance Filter.pdf - Fourmilab

§256 ANNOYANCE-FILTER DEVELOPMENT LOG

to be looking for contain regular expression meta-characters, it’s a lot more convenient to specify an explicit target than remember what they all are and quote them.

Corrected the diagnostic for an unknown character set in a header line string to say “Header line” rather than the obsolete and misleading “Subject line” it used to say.

Added “us-ascii” to the list of character sets which require no multi-byte decoding or interpretation when they appear in header line quoted strings. Junk mail sometimes encodes even ASCII subject lines (and sometimes other headers) as Base64 or Quoted-Printable to hide the text from naïve filters.
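
The character set test described above might look roughly like the following sketch. The function name needsNoMultiByteDecoding and every list entry other than us-ascii are assumptions made for illustration; they are not taken from the program's source.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// Hypothetical list of character sets needing no multi-byte decoding.
// Only "us-ascii" is named in the log entry; the others are assumed examples.
static const std::array<const char *, 3> singleByteCharsets = {
    "us-ascii", "iso-8859-1", "iso-8859-15"
};

// Return true if text in the named character set can be used as-is,
// without multi-byte decoding. Names are compared without regard to case.
bool needsNoMultiByteDecoding(std::string charset) {
    std::transform(charset.begin(), charset.end(), charset.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return std::any_of(singleByteCharsets.begin(), singleByteCharsets.end(),
                       [&](const char *known) { return charset == known; });
}
```

With a test of this kind, a header such as =?US-ASCII?B?...?= is decoded from Base64 and then treated as plain text instead of triggering the unknown character set diagnostic.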

Added a script to build under Cygwin, makew32.sh. Attempting to link in our own copies of getopt.c and getopt1.c runs afoul of the Cygwin linker (why?), so I removed them from the compiles and link done by this script.

Building on Cygwin failed because the library I was using didn’t define in_addr_t. I’d seen this earlier on Solaris, but had inadvertently added a new reference since I’d last tested there. I changed the offending reference (in a static_cast of all places) to our cop-out type u_int32_t, which autoconf guarantees will always be there. With that fix, the program built and worked on Cygwin, including POP3 proxying!

The check for non-white space following a soft line break in a Quoted-Printable MIME part failed for a POP3 proxy message containing CR/LF line terminators. I broadened the definition of white space in 〈 Character is white space 62 〉 to include carriage return.
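
A minimal sketch of the broadened test follows, assuming white space means space, tab, and now carriage return. The function names isWhiteSpace and isSoftLineBreak are hypothetical, and the program's actual section 62 may differ.

```cpp
#include <cstddef>
#include <string>

// White space test broadened to accept carriage return, so a line ending
// in CR/LF is treated the same as one ending in LF alone.
inline bool isWhiteSpace(char c) {
    return c == ' ' || c == '\t' || c == '\r';
}

// Return true if the '=' at index eq of line is a Quoted-Printable soft
// line break: nothing but white space follows it up to the end of the
// line (scanning stops at LF if one is present).
bool isSoftLineBreak(const std::string &line, std::size_t eq) {
    if (eq >= line.size() || line[eq] != '=') {
        return false;
    }
    for (std::size_t i = eq + 1; i < line.size(); i++) {
        if (line[i] == '\n') {
            break;
        }
        if (!isWhiteSpace(line[i])) {
            return false;
        }
    }
    return true;
}
```

Without '\r' in isWhiteSpace, a line such as "text=\r\n" arriving through the POP3 proxy would be rejected as having non-white space after the soft break.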

2002 November 3<br />

Scribbled a first cut README.WIN file to be included in the Win32 executable archive which explains the issues involving the included Cygwin DLL. I modified Makefile.in to include this file, the DLL, and COPYING.GNU (the GPL) in the Win32 archive.

Tested the Win32 archive on a Cygwin-free machine. Seems to work OK, including POP3 proxy from another machine on the LAN.

Verified that POP3 proxy on a Cygwin-free machine running Windows 98 works with the version of Outlook furnished with that system, which can be configured to retrieve messages from “localhost” on our default port of 9110. Note, however, that one must first configure the account (defaulting to port 110), then edit the properties of the account, using the “Advanced” tab to specify the POP3 port of 9110.

Messages embedded within other messages with the Content-Type specification of message/rfc822 did not have their own MIME parts correctly decoded because mailFolder failed to scan the header of the embedded message for its own Content-Type and boundary specifications. Fixed. This should get rid of the previously mysterious long gibberish strings which decoded out of forwarded messages with image and other binary attachments. The strings were due to the Base64 decoder not being activated for the embedded message’s attachments.
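
To illustrate the kind of header scan the fix requires, here is a rough sketch that recognises a message/rfc822 part and extracts the boundary parameter from a Content-Type value. The helper names isEmbeddedMessage and extractBoundary are hypothetical, this is not the code in mailFolder, and it does not handle every parameter syntax that MIME permits.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case copy of a string, used for case-insensitive matching.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return true if a Content-Type value names an embedded message, whose
// own header must then be scanned for Content-Type and boundary.
bool isEmbeddedMessage(const std::string &contentType) {
    return toLower(contentType).compare(0, 14, "message/rfc822") == 0;
}

// Return the boundary parameter of a Content-Type value, or an empty
// string if there is none. Quoted and unquoted forms are accepted.
std::string extractBoundary(const std::string &contentType) {
    const std::string key = "boundary=";
    std::string::size_type p = toLower(contentType).find(key);
    if (p == std::string::npos) {
        return "";
    }
    p += key.size();
    if (p < contentType.size() && contentType[p] == '"') {
        std::string::size_type q = contentType.find('"', p + 1);
        if (q == std::string::npos) {
            return contentType.substr(p + 1);
        }
        return contentType.substr(p + 1, q - p - 1);
    }
    std::string::size_type q = contentType.find_first_of("; \t\r\n", p);
    return contentType.substr(p, q == std::string::npos ? q : q - p);
}
```

When a part is identified as an embedded message, running the same scan over its header yields the inner boundary, so nested Base64 attachments can be routed to the decoder instead of being read as text.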

2002 November 5<br />

Implemented the first cut of fast dictionary support. Having created a dictionary in memory, you can export it to a file in fast dictionary format with the --fwrite option. The --fread option loads such a dictionary and, if loaded, it takes precedence over a regular dictionary. This permits fast classification of messages without all the overhead of creating a full-fledged in-memory dictionary.

Added memory-mapping of the fast dictionary when HAVE_MMAP is defined. In the interest of code commonality, the header fields are read from an istrstream bound to the memory mapped block, but access to the hash and word tables is pure pointer-whack.
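
The general idea of memory-mapped table access can be sketched as below for a POSIX system. The file layout (a 32-bit entry count followed by fixed-size records sorted by hash), the FastEntry record, and the MappedDictionary class are all assumptions made for illustration, since the log does not describe the real fast dictionary format. The sketch also reads its one header field with memcpy rather than through an istrstream.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size record; not the program's actual layout.
struct FastEntry {
    std::uint32_t hash;   // hash of the word
    float probability;    // junk probability for the word
};

class MappedDictionary {
public:
    MappedDictionary() = default;
    MappedDictionary(const MappedDictionary &) = delete;
    MappedDictionary &operator=(const MappedDictionary &) = delete;
    ~MappedDictionary() { unmap(); }

    // Map the file read-only and validate its assumed header.
    bool map(const char *path) {
        unmap();
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) {
            return false;
        }
        struct stat st;
        if (::fstat(fd, &st) != 0 ||
            st.st_size < static_cast<off_t>(sizeof(std::uint32_t))) {
            ::close(fd);
            return false;
        }
        std::size_t len = static_cast<std::size_t>(st.st_size);
        void *p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);              // the mapping survives closing the descriptor
        if (p == MAP_FAILED) {
            return false;
        }
        base = p;
        length = len;
        std::memcpy(&count, base, sizeof count);
        if ((length - sizeof count) / sizeof(FastEntry) < count) {
            unmap();              // header claims more entries than the file holds
            return false;
        }
        // Table access is by raw pointer into the mapped block.
        entries = reinterpret_cast<const FastEntry *>(
            static_cast<const char *>(base) + sizeof count);
        return true;
    }

    // Binary search of the mapped table; returns nullptr if absent.
    const FastEntry *find(std::uint32_t hash) const {
        if (entries == nullptr) {
            return nullptr;
        }
        const FastEntry *end = entries + count;
        const FastEntry *p = std::lower_bound(
            entries, end, hash,
            [](const FastEntry &e, std::uint32_t h) { return e.hash < h; });
        return (p != end && p->hash == hash) ? p : nullptr;
    }

    void unmap() {
        if (base != nullptr) {
            ::munmap(base, length);
        }
        base = nullptr;
        entries = nullptr;
        length = 0;
        count = 0;
    }

private:
    void *base = nullptr;
    std::size_t length = 0;
    std::uint32_t count = 0;
    const FastEntry *entries = nullptr;
};
```

Because nothing is copied out of the mapping, opening the dictionary costs little more than the mmap call itself, and pages are read from disk only as lookups touch them.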

Fixed a typo in configure.in which caused a harmless but ugly warning when running the script.
