12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


158 CLASSIFY MESSAGE ANNOYANCE-FILTER §188

188. Having obtained the list of tokens in the message, we now filter it by the significance of the probability that each token appears in junk or legitimate mail. Significance is simply the absolute value of the difference between the token’s junkProbability and 0.5, the probability for a token equally likely to appear in junk and legitimate mail. We construct a multimap called rtokens which maps this significance value to the token string; since any number of tokens may have the same significance, we must use a multimap rather than a map.
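To make the choice of container concrete, here is a minimal sketch, not taken from the program itself, that computes the significance of a few tokens and stores them in both a map and a multimap. The names `significance`, `Ranked`, and `rankSampleTokens`, and the sample tokens and probabilities, are invented for illustration.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <utility>

// Significance of a token: distance of its junk probability from the
// neutral value 0.5.
double significance(double junkProbability) {
    return std::fabs(junkProbability - 0.5);
}

// Holds the same data in both container types so they can be compared.
struct Ranked {
    std::map<double, std::string> unique;    // one token per significance
    std::multimap<double, std::string> all;  // every token retained
};

// Insert three made-up tokens. "viagra" (0.75) and "meeting" (0.25) are
// equally far from 0.5, so they share the key 0.25.
Ranked rankSampleTokens() {
    const std::pair<const char *, double> samples[] = {
        {"viagra", 0.75}, {"meeting", 0.25}, {"hello", 0.5}
    };
    Ranked r;
    for (const auto &s : samples) {
        double sig = significance(s.second);
        // map::insert refuses a key that is already present, so the
        // second token with significance 0.25 is silently dropped.
        r.unique.insert(std::make_pair(sig, std::string(s.first)));
        // multimap::insert always succeeds.
        r.all.insert(std::make_pair(sig, std::string(s.first)));
    }
    return r;
}
```

With these samples the map ends up holding two entries and the multimap three, which is exactly the loss of tokens the multimap is there to prevent.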

We count on multimap being an ordered collection class which, when traversed by its reverse iterator, returns tokens in order of decreasing significance. This holds for all the STL implementations I’m aware of, and it is more than an accident of implementation: the C++ standard requires the iterators of associative containers such as multimap to visit elements in non-descending key order, and the fact that multimap requires only the < operator for ordering effectively mandates a binary tree implementation.
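The reverse traversal relied upon here can be sketched as follows. This is an illustrative helper under the assumption that the keys are significance values as described above; the function name `mostSignificantFirst` is hypothetical and does not appear in the program.

```cpp
#include <map>
#include <string>
#include <vector>

// Return the tokens of a significance-keyed multimap from most to least
// significant by walking it with reverse iterators. Forward iteration
// visits keys in non-descending order, so reverse iteration visits them
// in non-ascending order.
std::vector<std::string> mostSignificantFirst(
        const std::multimap<double, std::string> &rtokens) {
    std::vector<std::string> ordered;
    for (std::multimap<double, std::string>::const_reverse_iterator
             r = rtokens.rbegin(); r != rtokens.rend(); ++r) {
        ordered.push_back(r->second);
    }
    return ordered;
}
```

An alternative design would be to declare the container as `std::multimap<double, std::string, std::greater<double> >` (from `<functional>`) and iterate forward; the result is the same apart from the relative order of tokens with equal significance.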

〈 Classify message tokens by probability of significance 188 〉 ≡
    multimap<double, string> rtokens;

    for (set<string>::iterator t = utokens.begin(); t != utokens.end(); t++) {
        double pdiff;
        dictionary::iterator dp;

        if (fd->isDictionaryLoaded()) {
            // A negative result from find means the token is unknown.
            pdiff = fd->find(*t);
            if (pdiff < 0) {
                pdiff = unknownWordProbability;
            }
            pdiff = abs(pdiff - 0.5);
        }
        else {
            // Use the stored junk probability only if the token is present
            // and its probability is non-negative.
            if (((dp = d->find(*t)) != d->end()) && (dp->second.getJunkProbability() >= 0)) {
                pdiff = abs(dp->second.getJunkProbability() - 0.5);
            }
            else {
                pdiff = abs(unknownWordProbability - 0.5);
            }
        }
        // Key each token by its significance.
        rtokens.insert(make_pair(pdiff, *t));
    }

This code is cited in section 256.<br />

This code is used in section 185.
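For experimenting with this section in isolation, the following is a minimal self-contained sketch of the same logic. It makes several assumptions that are not part of the program: the dictionary is modelled as a plain `std::map` from token to junk probability (the typedef `SimpleDictionary` is hypothetical), a negative stored probability stands for "not available", and `unknownWordProbability` is passed in as a parameter. Only the `d` branch is modelled; the `fd` branch applies the same fallback when `find` returns a negative value.

```cpp
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the program's dictionary: token -> junk
// probability, with a negative value meaning "not available".
typedef std::map<std::string, double> SimpleDictionary;

// Map each unique token to its significance |p - 0.5|, substituting
// unknownWordProbability for tokens that are missing from the dictionary
// or that have a negative stored probability.
std::multimap<double, std::string> classifyBySignificance(
        const std::set<std::string> &utokens,
        const SimpleDictionary &dict,
        double unknownWordProbability) {
    std::multimap<double, std::string> rtokens;
    for (std::set<std::string>::const_iterator t = utokens.begin();
         t != utokens.end(); ++t) {
        SimpleDictionary::const_iterator dp = dict.find(*t);
        double p = (dp != dict.end() && dp->second >= 0)
                       ? dp->second
                       : unknownWordProbability;
        rtokens.insert(std::make_pair(std::fabs(p - 0.5), *t));
    }
    return rtokens;
}
```

The sketch calls `std::fabs` explicitly so that the floating-point absolute value is selected regardless of which headers happen to be in scope.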
