The Annoyance Filter.pdf - Fourmilab
§256 ANNOYANCE-FILTER DEVELOPMENT LOG 219
to be looking for contain regular expression meta-characters, it’s a lot more convenient to specify an
explicit target than remember what they all are and quote them.
Corrected the diagnostic for an unknown character set in a header line string to say “Header line” rather
than the obsolete and misleading “Subject line” it used to say.
Added “us-ascii” to the list of character sets which require no multi-byte decoding or interpretation
when they appear in header line quoted strings. Junk mail sometimes encodes even ASCII subject lines
(and sometimes other headers) as Base64 or Quoted-Printable to hide the text from naïve filters.
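
To make the header case concrete, here is a minimal sketch of how an RFC 2047 encoded word (`=?charset?encoding?text?=`) might be split apart and its character set checked before any multi-byte decoding is attempted. The names `EncodedWord`, `parseEncodedWord`, and `needsNoMultiByteDecoding` are hypothetical and not taken from the program, and every entry in the list other than `us-ascii` is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Pieces of an RFC 2047 encoded word: =?charset?encoding?text?=
struct EncodedWord {
    std::string charset;  // lower-cased, e.g. "us-ascii"
    char encoding;        // 'B' (Base64) or 'Q' (Quoted-Printable), upper-cased
    std::string text;     // payload, still encoded
};

// Returns a lower-cased copy of s (ASCII only).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Splits an encoded word into its parts; returns false on malformed input.
bool parseEncodedWord(const std::string &word, EncodedWord &out) {
    const std::string::size_type n = word.size();
    if (n < 8 || word.compare(0, 2, "=?") != 0 || word.compare(n - 2, 2, "?=") != 0) {
        return false;
    }
    const std::string::size_type q1 = word.find('?', 2);  // terminates the charset
    if (q1 == std::string::npos || q1 == 2 || q1 + 3 > n - 2 || word[q1 + 2] != '?') {
        return false;
    }
    const char enc = static_cast<char>(
        std::toupper(static_cast<unsigned char>(word[q1 + 1])));
    if (enc != 'B' && enc != 'Q') {
        return false;
    }
    out.charset = toLower(word.substr(2, q1 - 2));
    out.encoding = enc;
    out.text = word.substr(q1 + 3, n - 2 - (q1 + 3));
    return true;
}

// Hypothetical list: only "us-ascii" comes from the log entry above;
// "iso-8859-1" is an assumed example of another single-byte set.
bool needsNoMultiByteDecoding(const std::string &charset) {
    static const char *const simple[] = { "us-ascii", "iso-8859-1" };
    const std::string cs = toLower(charset);
    for (const char *name : simple) {
        if (cs == name) {
            return true;
        }
    }
    return false;
}
```

A subject such as `=?US-ASCII?Q?Cheap_pills?=` would parse into the character set `us-ascii`, encoding `Q`, and the still-encoded payload, which can then go straight to the Quoted-Printable decoder without any character set interpretation.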
Added a script to build under Cygwin, makew32.sh. Attempting to link in our own copies of getopt.c
and getopt1.c runs afoul of the Cygwin linker (why?), so I removed them from the compile and link
steps performed by this script.
Building on Cygwin failed because the library I was using didn’t define in_addr_t. I’d seen this earlier
on Solaris, but had inadvertently added a new reference since I’d last tested there. I changed the
offending reference (in a static_cast, of all places) to our cop-out type u_int32_t, which autoconf
guarantees will always be there. With that fix, the program built and worked on Cygwin, including
POP3 proxying!
The check for non-white space following a soft line break in a Quoted-Printable MIME part failed for a
POP3 proxy message containing CR/LF line terminators. I broadened the definition of white space in
〈 Character is white space 62 〉 to include carriage return.
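
As an illustration of the fix, the following is a minimal sketch and not the code of section 62 itself. It assumes white space means space, tab, and carriage return, and the helper names `isWhiteSpace` and `isSoftLineBreak` are hypothetical.

```cpp
#include <string>

// Assumed definition of white space: space, tab, and (after the fix)
// carriage return, so CR/LF-terminated lines are handled.
inline bool isWhiteSpace(char c) {
    return c == ' ' || c == '\t' || c == '\r';
}

// Returns true if the '=' at position eq is a soft line break, that is,
// if only white space follows it on the line. The line is expected to
// have had its trailing LF removed already; any CR may still be present.
inline bool isSoftLineBreak(const std::string &line, std::string::size_type eq) {
    if (eq >= line.size() || line[eq] != '=') {
        return false;
    }
    for (std::string::size_type i = eq + 1; i < line.size(); ++i) {
        if (!isWhiteSpace(line[i])) {
            return false;
        }
    }
    return true;
}
```

With carriage return left out of `isWhiteSpace`, a line such as `foo=` followed by CR would be rejected as a soft line break, which matches the failure described for messages arriving through the POP3 proxy.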
2002 November 3<br />
Scribbled a first-cut README.WIN file to be included in the Win32 executable archive, which explains the
issues involving the included Cygwin DLL. I modified Makefile.in to include this file, the DLL, and
COPYING.GNU (the GPL) in the Win32 archive.
Tested the Win32 archive on a Cygwin-free machine. Seems to work OK, including POP3 proxy from
another machine on the LAN.
Verified that POP3 proxy on a Cygwin-free machine running Windows 98 works with the version of
Outlook furnished with that system, which can be configured to retrieve messages from “localhost” on
our default port of 9110. Note, however, that one must first configure the account (defaulting to port
110), then edit the properties of the account, using the “Advanced” tab to specify the POP3 port of
9110.
Messages embedded within other messages with the Content-Type specification of message/rfc822 did
not have their own MIME parts correctly decoded because mailFolder failed to scan the header of the
embedded message for its own Content-Type and boundary specifications. Fixed. This should get rid
of the previously mysterious long gibberish strings which decoded out of forwarded messages with image
and other binary attachments. The strings were due to the Base64 decoder not being activated for the
embedded message’s attachments.
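
The header scan an embedded message needs can be sketched roughly as follows. This is an illustration under assumptions and not mailFolder's actual code: the helper names `isEmbeddedMessage` and `extractBoundary` are hypothetical, and the parameter parsing is deliberately simplified (it ignores comments and continuation lines).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lower-cased copy of s (ASCII only).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if a Content-Type value announces an embedded message, whose own
// header must then be scanned for Content-Type and boundary.
bool isEmbeddedMessage(const std::string &contentType) {
    const std::string key = "message/rfc822";
    return toLower(contentType).compare(0, key.size(), key) == 0;
}

// Returns the boundary parameter of a Content-Type value, or an empty
// string if there is none. Handles quoted and unquoted forms.
std::string extractBoundary(const std::string &contentType) {
    const std::string key = "boundary=";
    std::string::size_type p = toLower(contentType).find(key);
    if (p == std::string::npos) {
        return "";
    }
    p += key.size();
    if (p < contentType.size() && contentType[p] == '"') {
        const std::string::size_type end = contentType.find('"', p + 1);
        if (end == std::string::npos) {
            return "";  // unterminated quoted string
        }
        return contentType.substr(p + 1, end - p - 1);
    }
    const std::string::size_type end = contentType.find_first_of("; \t\r\n", p);
    return contentType.substr(p, end == std::string::npos ? end : end - p);
}
```

In a parser along these lines, meeting a part for which `isEmbeddedMessage` is true would restart header processing for the embedded message, so that its own boundary and transfer encodings (Base64 in particular) are recognised for the attachments it carries.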
2002 November 5<br />
Implemented the first cut of fast dictionary support. Having created a dictionary in memory, you can
export it to a file in fast dictionary format with the --fwrite option. The --fread option loads such a
dictionary and, if loaded, it takes precedence over a regular dictionary. This permits fast classification
of messages without all the overhead of creating a full-fledged in-memory dictionary.
Added memory-mapping of the fast dictionary when HAVE_MMAP is defined. In the interest of code
commonality, the header fields are read from an istrstream bound to the memory mapped block, but
access to the hash and word tables is pure pointer-whack.
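
The general technique can be sketched as below for a POSIX system. The file layout here is hypothetical (a `uint32_t` count followed by that many `uint32_t` values in host byte order) and is not the real fast dictionary format; the names `writeTable` and `MappedTable` are likewise invented for illustration. The sketch copies the header field out with `memcpy` where the program uses an istrstream, then indexes the table directly through a pointer into the mapped block.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Writes the hypothetical layout: entry count, then the entries.
bool writeTable(const char *path, const std::vector<std::uint32_t> &values) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::uint32_t count = static_cast<std::uint32_t>(values.size());
    out.write(reinterpret_cast<const char *>(&count), sizeof count);
    if (count > 0) {
        out.write(reinterpret_cast<const char *>(values.data()),
                  static_cast<std::streamsize>(count * sizeof(std::uint32_t)));
    }
    out.flush();
    return out.good();
}

// Maps the file read-only and exposes the table without copying it.
class MappedTable {
public:
    explicit MappedTable(const char *path) {
        const int fd = open(path, O_RDONLY);
        if (fd < 0) {
            return;
        }
        struct stat st;
        if (fstat(fd, &st) == 0 &&
            st.st_size >= static_cast<off_t>(sizeof(std::uint32_t))) {
            void *p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                base_ = static_cast<const unsigned char *>(p);
                length_ = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
        if (base_ != nullptr) {
            std::uint32_t n;
            std::memcpy(&n, base_, sizeof n);  // header field
            if (n <= (length_ - sizeof n) / sizeof(std::uint32_t)) {
                count_ = n;
                // Table access is plain pointer arithmetic into the mapping.
                table_ = reinterpret_cast<const std::uint32_t *>(base_ + sizeof n);
            }
        }
    }
    ~MappedTable() {
        if (base_ != nullptr) {
            munmap(const_cast<unsigned char *>(base_), length_);
        }
    }
    MappedTable(const MappedTable &) = delete;
    MappedTable &operator=(const MappedTable &) = delete;

    bool ok() const { return table_ != nullptr; }
    std::uint32_t size() const { return count_; }
    // The caller is responsible for keeping i below size().
    std::uint32_t at(std::uint32_t i) const { return table_[i]; }

private:
    const unsigned char *base_ = nullptr;
    std::size_t length_ = 0;
    std::uint32_t count_ = 0;
    const std::uint32_t *table_ = nullptr;
};
```

Because the operating system pages the file in on demand, a lookup touches only the parts of the table it actually reads, which is what makes this approach attractive for classifying a single message without building the full in-memory dictionary.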
Fixed a typo in configure.in which caused a harmless but ugly warning when running the script.