The Annoyance Filter.pdf - Fourmilab
§228 ANNOYANCE-FILTER MAIN PROGRAM 183
228. If --autoprune is specified, the memory consumed by the dictionary is estimated as tokens
are added and, if the threshold is exceeded, all unique words are pruned from the dictionary. If, after
the prune is complete, the dictionary still exceeds 90% of the --autoprune setting, we are in danger
of beginning to thrash, pruning over and over to no effect. If this is the case, we automatically
increase the --autoprune setting by 25% to stave off thrashing (while, of course, running the risk of
paging thrashing if physical memory is exceeded).
〈 Prune unique words from dictionary if autoPrune threshold is exceeded 228 〉 ≡
  if ((autoPrune ≠ 0) ∧ (dict.estimateMemoryRequirement() > autoPrune)) {
    if (verbose) {
      cerr ≪ "Dictionary␣size␣" ≪ dict.estimateMemoryRequirement() ≪
          ";␣starting␣automatic␣prune." ≪ endl;
    }
    dict.purge(1);
    if (dict.estimateMemoryRequirement() > ((autoPrune ∗ 9)/10)) {
      cerr ≪ "Dictionary␣size␣after␣--autoprune␣is␣larger␣than␣90%" ≪ endl;
      cerr ≪ "of␣--autoprune␣setting␣of␣" ≪ autoPrune ≪ "␣bytes." ≪ endl;
      autoPrune = static_cast〈unsigned int〉(autoPrune ∗ 1.25);
      cerr ≪ "Increasing␣--autoprune␣threshold␣25%␣to␣" ≪ autoPrune ≪
          "␣to␣avoid␣thrashing." ≪ endl;
    }
  }
This code is used in section 227.
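The thrash guard described in section 228 can be tried in isolation with a small stand-in dictionary. The following is only a sketch under stated assumptions: ToyDictionary, kPerEntryOverhead, and pruneIfNeeded are hypothetical names that do not appear in the program, the memory estimate is a crude guess, and purge(1) is assumed to drop words seen at most once.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Assumed fixed bookkeeping cost per dictionary entry, in bytes (illustrative only).
constexpr std::size_t kPerEntryOverhead = 32;

// Hypothetical stand-in for the program's dictionary: word -> occurrence count.
struct ToyDictionary {
    std::map<std::string, unsigned> counts;

    void add(const std::string &word) { ++counts[word]; }

    // Crude estimate: key length plus the assumed per-entry overhead.
    std::size_t estimateMemoryRequirement() const {
        std::size_t bytes = 0;
        for (const auto &entry : counts) {
            bytes += entry.first.size() + kPerEntryOverhead;
        }
        return bytes;
    }

    // Remove every word seen at most maxOccurrences times (assumed semantics).
    void purge(unsigned maxOccurrences) {
        for (auto it = counts.begin(); it != counts.end();) {
            if (it->second <= maxOccurrences) {
                it = counts.erase(it);
            } else {
                ++it;
            }
        }
    }
};

// Prune when the estimate exceeds the threshold; if the prune frees too
// little (more than 90% of the threshold is still in use), raise the
// threshold by 25%. A threshold of 0 disables pruning.
// Returns the threshold to use from now on.
std::size_t pruneIfNeeded(ToyDictionary &dict, std::size_t autoPrune,
                          std::ostream &log = std::cerr) {
    if (autoPrune != 0 && dict.estimateMemoryRequirement() > autoPrune) {
        dict.purge(1);
        if (dict.estimateMemoryRequirement() > (autoPrune * 9) / 10) {
            autoPrune = static_cast<std::size_t>(autoPrune * 1.25);
            log << "Raising prune threshold to " << autoPrune
                << " bytes to avoid thrashing." << std::endl;
        }
    }
    return autoPrune;
}
```

With a dictionary in which every word has been seen twice, the prune removes nothing, so the function returns a larger threshold. With a dictionary of words seen once each, the prune empties the dictionary and the threshold is returned unchanged.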
229. The updateProbability function recomputes word probabilities in the dictionary. It should be
called after any changes are made to the contents of the dictionary. Any operation which recomputes the
probabilities makes us ineligible for optimising out the probability computation when loading the first
dictionary, so we clear the singleDictionaryRead flag.
〈 Global functions 184 〉 +≡
  static void updateProbability(void)
  {
    dict.computeJunkProbability(messageCount[dictionaryWord::Mail],
        messageCount[dictionaryWord::Junk], mailBias, minOccurrences);
    singleDictionaryRead = false;
  }
230. The printDictionary function dumps the dictionary in human-readable form to a specified output
stream, which defaults to cout.
〈 Global functions 184 〉 +≡
  static void printDictionary(ostream &os = cout)
  {
    updateProbability();
    os ≪ "Dictionary␣contains␣" ≪ dict.size() ≪ "␣unique␣tokens." ≪ endl;
    for (dictionary::iterator dp = dict.begin(); dp ≠ dict.end(); dp++) {
      dp→second.describe(os);
    }
  }
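The shape of printDictionary, a dump routine whose stream parameter defaults to standard output, is easy to exercise against a string stream. The sketch below assumes a simplified word record: ToyWord, WordTable, printWordTable, and the layout of the describe output are all hypothetical and are not the program's own types or format.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical word record; the real dictionaryWord class holds more state.
struct ToyWord {
    std::string text;
    unsigned nMail;
    unsigned nJunk;

    // Write a one-line, human-readable description (format is an assumption).
    void describe(std::ostream &os) const {
        os << text << "  mail: " << nMail << "  junk: " << nJunk << '\n';
    }
};

// Hypothetical dictionary type: an ordered map keyed by the word's text.
using WordTable = std::map<std::string, ToyWord>;

// Dump the table to os, which defaults to std::cout as in the section above.
void printWordTable(const WordTable &table, std::ostream &os = std::cout) {
    os << "Dictionary contains " << table.size() << " unique tokens.\n";
    for (const auto &entry : table) {
        entry.second.describe(os);
    }
}
```

Calling printWordTable(table) writes to standard output, while passing a std::ostringstream captures the same text for inspection. Because the stand-in table is a std::map, entries come out in key order. The sketch leaves out the probability update that the original performs before printing.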