12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§6 ANNOYANCE-FILTER INTEGRATING WITH PROCMAIL

The action line which pipes the message to annoyance-filter is continued onto a second line here in order to fit on the page. Procmail permits continuations of this form, but will equally accept the command all on one line with the backslash removed.

6.3.2. Routing by annoyance-filter classification

Once the message has been filtered by annoyance-filter, subsequent rules can test for its classification and route the message accordingly. The following rules dispatch messages it classifies as junk to a junk folder used by the blacklist, while messages judged to be legitimate mail and those with an intermediate probability are sent to the user’s mailbox. (With the default settings, annoyance-filter will always classify a message as mail or junk, but if the --threshjunk and --threshmail settings are changed so as to create a gap between them, intermediate classification can occur.) Strictly speaking, the latter two recipes could be omitted, since any message which fails to trigger any Procmail rule is sent to the user’s mailbox by default. The variable $ORGMAIL is defined by Procmail as the user’s mailbox; using it avoids hard-coding a path name which depends on the user name and mail system configuration.

:0 H:
* ^X-Annoyance-Filter-Classification: Junk
junk

:0 H:
* ^X-Annoyance-Filter-Classification: Mail
$ORGMAIL

:0 H:
* ^X-Annoyance-Filter-Classification: Indeterminate
$ORGMAIL
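
Because unmatched messages fall through to default delivery, the routing can be reduced to the junk recipe alone. The following is a minimal sketch of that shorter form, assuming the same header name and junk folder as in the recipes above:

```procmail
# Only junk needs an explicit recipe. Messages classified as Mail or
# Indeterminate match no recipe and fall through to default delivery.
:0 H:
* ^X-Annoyance-Filter-Classification: Junk
junk
```

Keeping the explicit Mail and Indeterminate recipes costs nothing and documents the intent, so which form to prefer is largely a matter of taste.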

Even if you set the mail and junk probability thresholds so that messages can be classified as “Indeterminate”, you’re unlikely to see many so categorised. As long as the collections of mail and junk you used to train annoyance-filter are sufficiently large and representative, the vast majority of messages will usually be scored near the extremes of probability. If you’re seeing a lot of Indeterminate messages, you should sort them manually, add them to the appropriate collection, and re-train annoyance-filter.
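
If Indeterminate messages are to be sorted manually, it may be convenient to collect them in a folder of their own rather than in the main mailbox. The sketch below shows one way to do so; the folder name `indeterminate` is a hypothetical choice, not something annoyance-filter or Procmail prescribes:

```procmail
# Hypothetical review folder: collect Indeterminate messages separately
# so they can be sorted by hand and added to the training collections.
:0 H:
* ^X-Annoyance-Filter-Classification: Indeterminate
indeterminate
```

This recipe would take the place of the Indeterminate recipe that delivers to $ORGMAIL.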

If you have other Procmail recipes for handling specific categories of mail, you would normally place the annoyance-filter related recipes after them, at the very end of the procmailrc file. That way annoyance-filter’s evaluation is used as the final guardian at the gate before a message is delivered to your mailbox.
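
The ordering described above might look like the following sketch. The mailing-list recipe is purely illustrative: the `List-Id` pattern and the `lists` folder are hypothetical and stand in for whatever category-specific recipes you already have.

```procmail
# Category-specific recipes come first (hypothetical mailing-list example).
:0:
* ^List-Id:.*announce\.example\.org
lists

# annoyance-filter routing comes last, as the final check before delivery.
:0 H:
* ^X-Annoyance-Filter-Classification: Junk
junk
```

Procmail stops processing a message at the first delivering recipe that matches, so a message filed by the list recipe never reaches the junk test.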

6.3.3. Other useful .procmailrc rules

The following subsections have nothing at all to do with annoyance-filter, really. You can set up a .procmailrc file based exclusively on annoyance-filter classifications as described above. Still, in many cases a few Procmail rules are worthwhile in addition to annoyance-filter filtering. Here are some frequently used categories. You would normally place these rules before the annoyance-filter rules discussed in section 6.3.2.

6.3.3.1. Whitelist

Most people have a short list of folks with whom they correspond regularly. It’s embarrassing if the content of a message from one of them is mistakenly identified as junk mail. To prevent this, define a “whitelist” as the first rule in your Procmail configuration after the filter command; messages which match its patterns avoid further scrutiny and are delivered directly to your mailbox. You should generally include your own address in the whitelist, as well as addresses of administrative accounts on machines you’re responsible for, but be careful: junk mailers increasingly use sender addresses such as root to exploit whitelists. Here’s user blohard’s whitelist definition. Multiple conditions in a Procmail recipe are normally combined with a logical AND (∧) operation. Since the whitelist requires an OR (∨) operation, we manufacture one by a trivial application of Procmail’s weighted scoring facilities. Procmail patterns are regular expressions identical to those used by egrep, so metacharacters such as “.” must be quoted with a backslash to be treated literally in patterns.
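
The OR-by-scoring idiom described above can be sketched as follows. This is an illustrative sketch, not the whitelist definition from the original text: both addresses are hypothetical placeholders. Each condition prefixed with `1^0` adds 1 to the recipe’s score the first time it matches, and the recipe fires if the final score is positive, so a match on any one condition is enough.

```procmail
# Whitelist sketch with hypothetical addresses.
# "1^0" scores 1 for a match; any positive total triggers delivery,
# which turns the usual AND of conditions into an OR.
# Note the "\." used to match a literal dot.
:0:
* 1^0 ^From:.*blohard@example\.com
* 1^0 ^From:.*root@mail\.example\.com
$ORGMAIL
```

Adding a correspondent is then a matter of appending one more `1^0` condition line.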
