12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§228 ANNOYANCE-FILTER MAIN PROGRAM

228. If --autoprune is specified, the memory consumed by the dictionary is estimated as tokens
are added and, if the threshold is exceeded, all unique words are pruned from the dictionary. If, after
the prune is complete, the dictionary still exceeds 90% of the --autoprune setting, we are in danger
of beginning to thrash, pruning over and over to no effect. If this is the case, we automatically
increase the --autoprune setting by 25% to stave off thrashing (while, of course, running the risk of
paging thrashing if physical memory is exceeded).

〈 Prune unique words from dictionary if autoPrune threshold is exceeded 228 〉 ≡
    if ((autoPrune != 0) && (dict.estimateMemoryRequirement() > autoPrune)) {
        if (verbose) {
            cerr << "Dictionary size " << dict.estimateMemoryRequirement() <<
                "; starting automatic prune." << endl;
        }
        dict.purge(1);
        if (dict.estimateMemoryRequirement() > ((autoPrune * 9) / 10)) {
            cerr << "Dictionary size after --autoprune is larger than 90%" << endl;
            cerr << "of --autoprune setting of " << autoPrune << " bytes." << endl;
            autoPrune = static_cast<unsigned int>(autoPrune * 1.25);
            cerr << "Increasing --autoprune threshold 25% to " << autoPrune <<
                " to avoid thrashing." << endl;
        }
    }

This code is used in section 227.
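
The prune-then-raise logic above can be tried in isolation with the following minimal sketch. Everything in it is an assumption made for illustration: TokenCounts, estimateMemory, purge and autoPruneStep are hypothetical stand-ins, not the program's dictionary class, and the memory estimate is a crude guess (key length plus a fixed per-entry overhead). It also assumes that purge(1) removes tokens seen at most once.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the dictionary: token -> occurrence count.
using TokenCounts = std::map<std::string, unsigned int>;

// Crude memory estimate (an assumption): key bytes plus a fixed
// per-entry overhead. The real estimator is defined elsewhere.
std::size_t estimateMemory(const TokenCounts &d) {
    const std::size_t perEntryOverhead = 32;
    std::size_t total = 0;
    for (const auto &entry : d) {
        total += entry.first.size() + perEntryOverhead;
    }
    return total;
}

// Remove every token whose count is at most maxCount.
void purge(TokenCounts &d, unsigned int maxCount) {
    for (auto it = d.begin(); it != d.end(); ) {
        if (it->second <= maxCount) {
            it = d.erase(it);   // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}

// Prune when the estimate exceeds the threshold. If the dictionary is
// still above 90% of the threshold afterwards, raise the threshold by
// 25% so the next few tokens do not trigger another futile prune.
// Returns the (possibly raised) threshold; 0 disables pruning.
std::size_t autoPruneStep(TokenCounts &d, std::size_t threshold) {
    if (threshold != 0 && estimateMemory(d) > threshold) {
        purge(d, 1);
        if (estimateMemory(d) > (threshold * 9) / 10) {
            threshold = static_cast<std::size_t>(threshold * 1.25);
        }
    }
    return threshold;
}
```

With three two-character tokens the estimate is 102 bytes. Against a threshold of 100, a dictionary containing one singleton shrinks to 68 bytes and the threshold stays put, while a dictionary with no singletons cannot shrink and the threshold rises to 125.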

229. The updateProbability function recomputes word probabilities in the dictionary. It should be
called after any changes are made to the contents of the dictionary. Any operation which recomputes the
probabilities makes us ineligible for optimising out the probability computation when loading the first
dictionary, so we clear the singleDictionaryRead flag.

〈 Global functions 184 〉 +≡
    static void updateProbability(void)
    {
        dict.computeJunkProbability(messageCount[dictionaryWord::Mail],
            messageCount[dictionaryWord::Junk], mailBias, minOccurrences);
        singleDictionaryRead = false;
    }
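
The body of computeJunkProbability is not shown in this section, so the following is only a sketch of how its four arguments might combine, assuming a Paul Graham-style per-token estimate. The function junkProbability, its clamping to [0.01, 0.99], and the use of -1 to mean "not enough data" are all assumptions for illustration; the program's actual computation may differ.

```cpp
#include <algorithm>

// Hypothetical Graham-style estimate of the probability that a token
// indicates junk. NOT the program's computeJunkProbability; a sketch
// under stated assumptions.
//   mailCount, junkCount : occurrences of the token in each category
//   nMail, nJunk         : number of messages seen in each category
//   mailBias             : weight applied to occurrences in legitimate mail
//   minOccurrences       : minimum weighted count before a token is rated
// Returns -1.0 when the token cannot be rated.
double junkProbability(unsigned int mailCount, unsigned int junkCount,
                       unsigned int nMail, unsigned int nJunk,
                       double mailBias, unsigned int minOccurrences) {
    const double weighted = mailCount * mailBias + junkCount;
    if (nMail == 0 || nJunk == 0 || weighted <= 0.0 ||
        weighted < minOccurrences) {
        return -1.0;
    }
    // Per-category frequencies, each capped at 1.
    const double mailFreq = std::min(1.0, (mailCount * mailBias) / nMail);
    const double junkFreq = std::min(1.0, static_cast<double>(junkCount) / nJunk);
    const double p = junkFreq / (mailFreq + junkFreq);
    // Clamp so that no single token is ever treated as conclusive.
    return std::max(0.01, std::min(0.99, p));
}
```

Under these assumptions a token seen only in junk is rated 0.99, one seen only in legitimate mail is rated 0.01, and one below the minOccurrences cut-off is left unrated. Because every rating depends on the message counts of both categories, any change to the dictionary or to the counts invalidates all ratings at once, which is consistent with the advice to call updateProbability after every change.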

230. The printDictionary function dumps the dictionary in human-readable form to a specified output
stream, which defaults to cout. It calls updateProbability first so that the probabilities it prints
are current.

〈 Global functions 184 〉 +≡
    static void printDictionary(ostream &os = cout)
    {
        updateProbability();
        os << "Dictionary contains " << dict.size() << " unique tokens." << endl;
        for (dictionary::iterator dp = dict.begin(); dp != dict.end(); dp++) {
            dp->second.describe(os);
        }
    }
