12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§4 ANNOYANCE-FILTER OPTIONS

--phraselimit n

Limit the length of phrases assembled according to the --phrasemin and --phrasemax options to n characters. This permits ignoring “phrases” consisting of gibberish from mail headers and un-decoded content. These items will usually be discarded by a --prune anyway, but skipping them as they are generated keeps the dictionary from bloating in the first place. The default value is 48 characters.

--phrasemin n

Calculate probabilities of phrases consisting of a minimum of n words. The default of 1 calculates probabilities for single words.

--phrasemax n

Calculate probabilities of phrases consisting of a maximum of n words. The default of 1 calculates probabilities for single words. If you set this too large, the dictionary may grow to an absurd size.
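The interaction of the three phrase options can be illustrated with a minimal Python sketch. It is not taken from the program's source: the function name `assemble_phrases`, the joining of words with single spaces, and the treatment of the character limit as inclusive are all assumptions made for illustration.

```python
def assemble_phrases(words, phrase_min=1, phrase_max=1, phrase_limit=48):
    """Return runs of phrase_min..phrase_max consecutive words.

    Hypothetical helper: words are joined with single spaces, and any
    phrase longer than phrase_limit characters is skipped (assumed
    inclusive limit), mirroring --phrasemin, --phrasemax and
    --phraselimit as described above.
    """
    phrases = []
    for start in range(len(words)):
        for length in range(phrase_min, phrase_max + 1):
            end = start + length
            if end > len(words):
                break  # not enough words left for a longer phrase
            phrase = " ".join(words[start:end])
            if len(phrase) <= phrase_limit:
                phrases.append(phrase)
    return phrases
```

With three input words, a minimum of 1 and a maximum of 2, this sketch yields five dictionary entries where single-word processing would yield three, which suggests why a large maximum inflates the dictionary quickly.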

--plot fname

After loading the dictionary, create a plot in fname.png of the histogram of words, binned by their probability of appearance in junk mail. To generate the histogram, the Gnuplot and Netpbm utilities must be installed on the system; if they are absent, the --plot option will not be available.
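The binning step behind such a histogram can be sketched in Python as follows. This only illustrates the idea of grouping words by junk probability; the function name `bin_probabilities` and the number of bins are assumptions, and the program itself delegates the actual plotting to external utilities.

```python
def bin_probabilities(probabilities, bins=20):
    """Count how many junk probabilities fall into each equal-width bin.

    Hypothetical helper: probabilities are assumed to lie in [0, 1];
    a probability of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * bins
    for p in probabilities:
        index = min(int(p * bins), bins - 1)
        counts[index] += 1
    return counts
```

A dictionary that discriminates well between legitimate and junk mail would be expected to pile up counts in the first and last bins, with few words in the middle.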

--pop3port n

The POP3 proxy server activated by a subsequent --pop3server option will listen for connections on port n. If no --pop3port is specified, the server will listen on the default port of 9110. On most systems, you’ll have to run the program as root if you wish the proxy server to listen on a port numbered 1023 or less.

--pop3server server[:port]

Activate a POP3 proxy server which relays requests made on the previously specified --pop3port (or the default of 9110 if no port is specified) to the specified server, which may be given either as an IP address in “dotted quad” notation such as 10.89.11.131 or a fully-qualified domain name like pop.someisp.tld. The port on which the server listens for POP3 connections may be specified after the server, prefixed by a colon (“:”); if no port is specified, the IANA assigned POP3 port 110 will be used. The POP3 proxy server will pass each message received on behalf of a requestor through the classifier and return the annotated transcript to the requestor, who may then filter it based on the classification appended to the message header. You must load a dictionary before activating the POP3 proxy server, and the --pop3server option must be the last on the command line. The server continues to run and service requests until manually terminated.
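How a server[:port] argument of this form might be split into its parts is sketched below in Python. The function name `parse_server_spec` is hypothetical and the code does not reproduce the program's own parser; it handles only the two forms named above (dotted quad and domain name) and falls back to port 110 when no port is given.

```python
def parse_server_spec(spec, default_port=110):
    """Split a 'server[:port]' string into (host, port).

    Hypothetical helper: assumes the host is a dotted-quad address or
    a domain name, so the last colon (if any) introduces the port.
    """
    host, separator, port_text = spec.rpartition(":")
    if not separator:
        # No colon present: the whole string is the host name.
        return spec, default_port
    port = int(port_text)  # raises ValueError if the port is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

For example, `parse_server_spec("pop.someisp.tld")` gives the host with port 110, while `parse_server_spec("10.89.11.131:1110")` gives port 1110.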

--pop3trace

Write a trace of POP3 proxy server operations to standard error. Each trace message (apart from the dump of the body of multi-line replies to clients) is prefixed with the label “POP3:␣”.

--prune

After loading the dictionary from --mail and --junk folders, this option discards words which appear sufficiently infrequently that their probability cannot be reliably estimated. One usually --prunes the dictionary before using --write to save it for subsequent runs.
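The idea of pruning can be sketched in Python as below. The text above does not state the cutoff the program applies, so the `min_occurrences` threshold, the function name `prune`, and the dictionary layout (word mapped to a pair of counts in legitimate and junk mail) are all assumptions for illustration only.

```python
def prune(dictionary, min_occurrences=5):
    """Drop words seen too rarely for a reliable probability estimate.

    Hypothetical helper: dictionary maps each word to a tuple
    (mail_count, junk_count); min_occurrences is an assumed cutoff on
    the combined count, not the program's actual criterion.
    """
    return {
        word: counts
        for word, counts in dictionary.items()
        if counts[0] + counts[1] >= min_occurrences
    }
```

A word seen once in junk and never in legitimate mail would otherwise receive an extreme probability on the strength of a single observation, which is the kind of entry pruning is meant to remove.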

--ptrace

Include a token-by-token trace in the --pdiag output file. This helps when adjusting the parser’s criteria for recognising tokens. Setting this option without also specifying a --pdiag file will have no effect other than perhaps to exercise your fingers typing it on the command line.

--read, -r fname

Load a dictionary (previously created with the --write option) from file fname.
