The Annoyance Filter.pdf - Fourmilab
§4 ANNOYANCE-FILTER OPTIONS<br />
−−phraselimit n<br />
Limit the length of phrases assembled according to the −−phrasemin and −−phrasemax<br />
options to n characters. This permits ignoring “phrases” consisting of gibberish from mail<br />
headers and un-decoded content. In most cases these items will be discarded by a −−prune<br />
in any case, but skipping them as they are generated keeps the dictionary from bloating<br />
in the first place. The default value is 48 characters.<br />
−−phrasemin n<br />
Calculate probabilities of phrases consisting of a minimum of n words. The default of 1<br />
calculates probabilities for single words.<br />
−−phrasemax n<br />
Calculate probabilities of phrases consisting of a maximum of n words. The default of 1<br />
calculates probabilities for single words. If you set this too large, the dictionary may grow<br />
to an absurd size.<br />
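The interaction of −−phrasemin, −−phrasemax, and −−phraselimit can be sketched as follows. This is a minimal illustration, not the program's own code: the function name `assemble_phrases`, the use of a single space to join words, and the reading of the limit as "at most n characters" are all assumptions made for the example.

```python
def assemble_phrases(words, phrase_min=1, phrase_max=1, phrase_limit=48):
    """Build phrases of phrase_min..phrase_max consecutive words.

    Hypothetical sketch: phrases are joined with one space and any phrase
    longer than phrase_limit characters is skipped as it is generated.
    """
    phrases = []
    for start in range(len(words)):
        for n in range(phrase_min, phrase_max + 1):
            if start + n > len(words):
                break  # not enough words left for a phrase of n words
            phrase = " ".join(words[start:start + n])
            if len(phrase) <= phrase_limit:
                phrases.append(phrase)
    return phrases


# With the defaults (min 1, max 1) only single words are produced.
single_words = assemble_phrases(["buy", "cheap", "pills"])

# Allowing two-word phrases adds every adjacent pair.
with_pairs = assemble_phrases(["buy", "cheap", "pills"], phrase_min=1, phrase_max=2)
```

Even in this toy form, each increase in the maximum adds roughly one more phrase per word of input, which is why a large −−phrasemax inflates the dictionary so quickly.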
−−plot fname<br />
After loading the dictionary, create a plot in fname.png of the histogram of words, binned<br />
by their probability of appearance in junk mail. In order to generate the histogram the<br />
GNUPLOT and NETPBM utilities must be installed on the system; if they are absent, the<br />
−−plot option will not be available.<br />
−−pop3port n<br />
The POP3 proxy server activated by a subsequent −−pop3server option will listen for<br />
connections on port n. If no −−pop3port is specified, the server will listen on the default<br />
port of 9110. On most systems, you’ll have to run the program as root if you wish the<br />
proxy server to listen on a port numbered 1023 or less.<br />
−−pop3server server[:port]<br />
Activate a POP3 proxy server which relays requests made on the previously specified<br />
−−pop3port, or the default of 9110 if no port is specified, to the specified server, which<br />
may be given either as an IP address in “dotted quad” notation such as 10.89.11.131 or a<br />
fully-qualified domain name like pop.someisp.tld. The port on which the server listens<br />
for POP3 connections may be specified after the server name, prefixed by a colon (“:”); if no<br />
port is specified, the IANA-assigned POP3 port 110 will be used. The POP3 proxy server<br />
will pass each message received on behalf of a requestor through the classifier and return<br />
the annotated transcript to the requestor, who may then filter it based on the classification<br />
appended to the message header. You must load a dictionary before activating the POP3<br />
proxy server, and the −−pop3server option must be the last on the command line. The<br />
server continues to run and service requests until manually terminated.<br />
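The server[:port] argument and the two port defaults can be illustrated with a short sketch. The names `parse_server_spec` and `needs_privilege` are hypothetical helpers written for this example; they are not part of annoyance-filter. The sketch handles only host names and dotted-quad addresses, as described above.

```python
DEFAULT_LISTEN_PORT = 9110  # proxy's default listening port (--pop3port)
DEFAULT_POP3_PORT = 110     # IANA-assigned POP3 port on the upstream server


def parse_server_spec(spec, default_port=DEFAULT_POP3_PORT):
    """Split 'server[:port]' into (host, port), applying the default port."""
    host, sep, port_text = spec.partition(":")
    if not host:
        raise ValueError("server name or address is missing")
    if not sep:
        return host, default_port
    port = int(port_text)  # raises ValueError if the port is not numeric
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return host, port


def needs_privilege(port):
    """On most systems, listening on ports 1023 and below requires root."""
    return port <= 1023


upstream = parse_server_spec("pop.someisp.tld")         # default port applies
upstream_alt = parse_server_spec("10.89.11.131:995")    # explicit port
```

Under these assumptions the default listening port of 9110 needs no special privilege, whereas asking the proxy to listen on 110 itself would.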
−−pop3trace<br />
Write a trace of POP3 proxy server operations to standard error. Each trace message<br />
(apart from the dump of the body of multi-line replies to clients) is prefixed with the label<br />
“POP3:␣”.<br />
−−prune<br />
After loading the dictionary from −−mail and −−junk folders, this option discards words<br />
which appear sufficiently infrequently that their probability cannot be reliably estimated.<br />
One usually −−prunes the dictionary before using −−write to save it for subsequent runs.<br />
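The idea behind pruning can be sketched as below. The text above does not state the criterion the program uses, so the threshold `min_occurrences`, the dictionary layout (word mapped to a pair of mail and junk counts), and the function name `prune_dictionary` are assumptions chosen only to illustrate dropping rarely seen words.

```python
def prune_dictionary(dictionary, min_occurrences=5):
    """Drop words seen too rarely for a probability estimate to be meaningful.

    Hypothetical layout: dictionary maps word -> (mail_count, junk_count).
    The threshold of 5 is an illustrative value, not the program's.
    """
    return {
        word: counts
        for word, counts in dictionary.items()
        if sum(counts) >= min_occurrences
    }


sample = {"mortgage": (2, 40), "zxqj7": (0, 1), "meeting": (31, 0)}
pruned = prune_dictionary(sample)
```

A word seen once in junk and never in legitimate mail would otherwise look like a perfect junk indicator, which is the kind of unreliable estimate pruning is meant to remove.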
−−ptrace<br />
Include a token-by-token trace in the −−pdiag output file. This helps when adjusting<br />
the parser’s criteria for recognising tokens. Setting this option without also specifying a<br />
−−pdiag file will have no effect other than perhaps to exercise your fingers typing it on<br />
the command line.<br />
−−read, −r fname<br />
Load a dictionary (previously created with the −−write option) from file fname.