The Annoyance Filter.pdf - Fourmilab
§6 ANNOYANCE-FILTER INTEGRATING WITH PROCMAIL
The action line which pipes the message to annoyance-filter is continued onto a second line here in order
to fit on the page. Procmail permits continuations of this form, but will equally accept the command all on
one line with the backslash removed.
6.3.2. Routing by annoyance-filter classification
Once the message has been filtered by annoyance-filter, subsequent rules can test for its classification
and route the message accordingly. The following rules dispatch messages it classifies as junk to a junk
folder used by the blacklist, while messages judged to be legitimate mail and those with an intermediate
probability are sent to the user’s mailbox. (With the default settings, annoyance-filter will always classify
a message as mail or junk, but if the --threshjunk and --threshmail settings are changed so as to create
a gap between them, intermediate classification can occur.) Strictly speaking, the latter two recipes could
be omitted, since any message which fails to trigger any Procmail rule is sent to the user’s mailbox by
default. The variable $ORGMAIL is defined by Procmail as the user’s mailbox; using it avoids hard-coding
a path name which depends on the user name and mail system configuration.
:0 H:
* ^X-Annoyance-Filter-Classification: Junk
junk
:0 H:
* ^X-Annoyance-Filter-Classification: Mail
$ORGMAIL
:0 H:
* ^X-Annoyance-Filter-Classification: Indeterminate
$ORGMAIL
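For readers who want the dispatch logic spelled out, here is a minimal Python sketch of what the three recipes above amount to. It is not part of annoyance-filter or Procmail: the function name `route` and the folder name `inbox` (a stand-in for $ORGMAIL) are assumptions made purely for illustration. Only the header name and the `junk` folder come from the recipes.

```python
from email import message_from_string

# Header name as used in the recipes above.
HEADER = "X-Annoyance-Filter-Classification"
JUNK_FOLDER = "junk"
DEFAULT_MAILBOX = "inbox"  # hypothetical stand-in for Procmail's $ORGMAIL


def route(raw_message: str) -> str:
    """Return the name of the folder a classified message would go to."""
    msg = message_from_string(raw_message)
    classification = (msg.get(HEADER) or "").strip().lower()
    # Only a Junk classification diverts the message.
    if classification.startswith("junk"):
        return JUNK_FOLDER
    # Mail, Indeterminate, and unclassified messages all reach the mailbox,
    # which is why the second and third recipes are optional.
    return DEFAULT_MAILBOX
```

A message with no classification header at all takes the same path as one marked Mail or Indeterminate, mirroring Procmail's default delivery.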
Even if you set the mail and junk probability thresholds so that messages can be classified as “Indeterminate”,
you’re unlikely to see many so categorised. As long as the collections of mail and junk you used to train
annoyance-filter are sufficiently large and representative, the vast majority of messages will usually be
scored near the extremes of probability. If you’re seeing a lot of Indeterminate messages, you should sort
them manually, add them to the appropriate collection, and re-train annoyance-filter.
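The relationship between the two thresholds and the three classifications can be sketched as follows. This is an illustrative Python model, not the program’s actual code: the function name, the example threshold values, and the treatment of a probability exactly equal to a threshold are all assumptions.

```python
def classify(junk_probability: float,
             thresh_junk: float = 0.9,
             thresh_mail: float = 0.9) -> str:
    """Map a junk probability to Junk, Mail, or Indeterminate.

    thresh_junk and thresh_mail model the --threshjunk and --threshmail
    settings; the values used here are illustrative assumptions.
    """
    if not 0.0 <= junk_probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if thresh_mail > thresh_junk:
        raise ValueError("mail threshold must not exceed junk threshold")
    # At or above the junk threshold: junk (boundary handling is assumed).
    if junk_probability >= thresh_junk:
        return "Junk"
    # At or below the mail threshold: legitimate mail.
    if junk_probability <= thresh_mail:
        return "Mail"
    # Anything in the gap between the two thresholds.
    return "Indeterminate"
```

When the two thresholds are equal there is no gap, so every probability lands in Junk or Mail. Lowering the mail threshold, as in `classify(0.5, thresh_junk=0.9, thresh_mail=0.1)`, opens a gap in which a message is reported as Indeterminate.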
If you have other Procmail recipes for handling specific categories of mail, you would normally place
the annoyance-filter related recipes after them, at the very end of the procmailrc file. That way
annoyance-filter’s evaluation is used as the final guardian at the gate before a message is delivered to
your mailbox.
6.3.3. Other useful .procmailrc rules
The following subsections have nothing at all to do with annoyance-filter, really. You can set up a
.procmailrc file based exclusively on annoyance-filter classifications as described above. Still, in many
cases a few Procmail rules are worthwhile in addition to annoyance-filter filtering. Here are some frequently
used categories. You would normally place these rules before the annoyance-filter rules discussed in section
6.3.2.
6.3.3.1. Whitelist
Most people have a short list of folks with whom they correspond regularly. It’s embarrassing if the content
of a message from one of them is mistakenly identified as junk mail. To prevent this, define a “whitelist” as
the first rule in your Procmail configuration after the filter command; messages which match its patterns
avoid further scrutiny and are delivered directly to your mailbox. You should generally include your own
address in the whitelist, as well as addresses of administrative accounts on machines you’re responsible for,
but be careful: junk mailers increasingly use sender addresses such as root to exploit whitelists. Here’s
user blohard’s whitelist definition. Multiple conditions in a Procmail recipe are normally combined with a
logical AND (∧) operation. Since the whitelist requires an OR (∨) operation, we manufacture one by a trivial
application of Procmail’s weighted scoring facilities. Procmail patterns are regular expressions compatible
with those used by egrep, so metacharacters such as “.” must be quoted to be treated literally in patterns.