12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§189 ANNOYANCE-FILTER CLASSIFY MESSAGE

189. Given the list of most significant tokens, we now use Bayes’ theorem to compute the aggregate probability that the message is junk. If $p_i$ is the probability that word $i$ of the most significant $n$ (nExtremal) words in a message appears in junk mail, the probability that the message as a whole is junk is:

$$\frac{\prod_{i=1}^{n} p_i}{\prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i)}$$
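As a worked illustration of the combining rule, the ratio of the product of the $p_i$ to the sum of that product and the product of the $(1 - p_i)$, consider three tokens. The values $p_1 = 0.99$, $p_2 = 0.95$, $p_3 = 0.20$ are made up for this example and do not come from the program or its dictionary.

```latex
\begin{align*}
\prod_{i=1}^{3} p_i       &= 0.99 \times 0.95 \times 0.20 = 0.1881 \\
\prod_{i=1}^{3} (1 - p_i) &= 0.01 \times 0.05 \times 0.80 = 0.0004 \\
\frac{\prod_{i=1}^{3} p_i}{\prod_{i=1}^{3} p_i + \prod_{i=1}^{3} (1 - p_i)}
                          &= \frac{0.1881}{0.1885} \approx 0.9979
\end{align*}
```

Two strongly junk-like tokens outweigh the single token that leans toward legitimate mail, so the combined estimate stays close to 1.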

〈Compute probability message is junk from most significant tokens 189〉 ≡
    unsigned int n = min(static_cast<multimap<double, string>::size_type>(nExtremal),
                         rtokens.size());
    multimap<double, string>::const_reverse_iterator rp = rtokens.rbegin();
    double probP = 1, probQ = 1;

    if (verbose) {
        cerr << "Rank   Probability   Token" << endl;
    }
    for (unsigned int i = 0; i < n; i++) {
        double p;

        if (fd->isDictionaryLoaded()) {
            p = fd->find(rp->second);
            if (p < 0) {
                p = unknownWordProbability;
            }
        }
        else {
            dictionary::iterator dp = d->find(rp->second);
            p = ((dp == d->end()) || (dp->second.getJunkProbability() < 0)) ? unknownWordProbability :
                dp->second.getJunkProbability();
        }
        if (verbose) {
            cerr << setw(3) << setiosflags(ios::right) << (i + 1) << "      " << setw(9) << setprecision(5) <<
                setiosflags(ios::left) << p << "  " << rp->second << endl;
        }
        probP *= p;
        probQ *= (1 - p);
        rp++;
    }
    junkProb = probP / (probP + probQ);
    if (verbose) {
        cerr << "ProbP = " << probP << ", ProbQ = " << probQ << endl;
    }

This code is used in section 185.
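The combining step can also be tried in isolation. The following is a minimal stand-alone sketch, not the program's own code: the function name combineJunkProbability and the plain std::map used in place of the program's dictionary classes are hypothetical, and it assumes the multimap key is a significance score, so that the most significant tokens sit at the end and are reached through rbegin().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-alone version of the combining step.
// rtokens:         tokens keyed by a significance score (an assumption),
//                  so the most significant tokens are at the end.
// junkProbability: hypothetical stand-in for the dictionary lookup,
//                  mapping a token to its junk probability.
double combineJunkProbability(
    const std::multimap<double, std::string> &rtokens,
    const std::map<std::string, double> &junkProbability,
    std::size_t nExtremal,
    double unknownWordProbability)
{
    assert(unknownWordProbability >= 0.0 && unknownWordProbability <= 1.0);

    // Use at most nExtremal tokens, fewer if the message has fewer.
    const std::size_t n = std::min(nExtremal, rtokens.size());
    auto rp = rtokens.rbegin();
    double probP = 1.0, probQ = 1.0;

    for (std::size_t i = 0; i < n; ++i, ++rp) {
        // Tokens that are missing, or stored with a negative probability,
        // fall back to the unknown-word probability.
        auto dp = junkProbability.find(rp->second);
        const double p = (dp == junkProbability.end() || dp->second < 0.0)
                             ? unknownWordProbability
                             : dp->second;
        probP *= p;          // running product of p_i
        probQ *= (1.0 - p);  // running product of (1 - p_i)
    }
    return probP / (probP + probQ);
}
```

With no tokens at all, both products stay at 1 and the function returns 0.5, which expresses no evidence either way. With made-up probabilities 0.99, 0.95 and 0.20 for the three most significant tokens, the result is about 0.9979.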
