12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§27 ANNOYANCE-FILTER DICTIONARY

27. Plot a histogram of the distribution of words in the dictionary by probability. Words with negative probability are ignored, so there is no need to purge before plotting.

〈 Class implementations 11 〉 +≡
#ifdef HAVE_PLOT_UTILITIES
#define PLOT_DEBUG
  void dictionary::plotProbabilityHistogram(string fileName, unsigned int nBins) const
  {
    if (verbose) {
      cerr << "Plotting probability histogram to " << fileName << ".png" << endl;
    }
    ofstream gp((fileName + ".gp").c_str()), dat((fileName + ".dat").c_str());
    〈 Build histogram of word probabilities 28 〉;
    〈 Write GNUPLOT data table for probability histogram 29 〉;
    /* Create GNUPLOT instructions to plot data */
    gp << "set term pbm small color" << endl;
    gp << "set ylabel \"Number of Words\"" << endl;
    gp << "set xlabel \"Probability\"" << endl;
    gp << "plot \"" << fileName << ".dat\" using 1:2 title \"\" with boxes" << endl;
    string command("gnuplot ");
    command += fileName + ".gp | pnmtopng >" + fileName + ".png";
#ifdef PLOT_DEBUG
    cout << command << endl;
#else
    command += " 2>/dev/null";
#endif
    gp.close();
    dat.close();
    system(command.c_str());
#ifndef PLOT_DEBUG  /* Delete the temporary files used to create the plot */
    remove((fileName + ".gp").c_str());
    remove((fileName + ".dat").c_str());
#endif
  }
#endif  /* HAVE_PLOT_UTILITIES */

28. Walk through the dictionary and sort the probabilities of words into nBins equally sized bins, counting the number of words that fall in each bin to form the histogram.

〈 Build histogram of word probabilities 28 〉 ≡
  vector<unsigned int> hist(nBins);
  for (const_iterator mp = begin(); mp != end(); mp++) {
    if (mp->second.getJunkProbability() >= 0) {
      unsigned int bin = static_cast<unsigned int>(mp->second.getJunkProbability() * nBins);
      hist[bin]++;
    }
  }

This code is used in section 27.
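
The binning step in section 28 can be tried in isolation. The following is a minimal sketch, not part of the original program: the function name binProbabilities and its vector-of-doubles input are hypothetical stand-ins for the dictionary walk. It also adds one thing the listing does not show, a clamp that folds a probability of exactly 1.0 into the last bin; whether the program already guarantees probabilities strictly below 1 elsewhere is not visible in this section, so the clamp is an assumption made here for safety.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the original source: counts how many
// probabilities fall into each of nBins equal-width bins covering [0, 1].
// Negative values are skipped, mirroring the test in section 28.
std::vector<unsigned int> binProbabilities(const std::vector<double> &probs,
                                           std::size_t nBins)
{
    assert(nBins > 0);
    std::vector<unsigned int> hist(nBins, 0);
    for (double p : probs) {
        if (p < 0) {
            continue;            // negative probability: ignore the word
        }
        std::size_t bin = static_cast<std::size_t>(p * nBins);
        if (bin >= nBins) {
            bin = nBins - 1;     // assumption: fold p == 1.0 into the last bin
        }
        hist[bin]++;
    }
    return hist;
}
```

With ten bins, the values 0.0 and 0.05 land in bin 0, 0.5 lands in bin 5, and both 0.99 and 1.0 land in bin 9. Without the clamp, 1.0 would index one element past the end of the vector.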
