12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§191 ANNOYANCE-FILTER CLASSIFY MESSAGE 161

191. If we’re producing a message transcript, just before writing it add the annotations to the end of the header which indicate the junk probability and classification of the message based on the threshold settings. After these, other annotations requested by the --annotate option are appended.
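The two annotations can be illustrated with a minimal, self-contained sketch. The helper names `classify` and `junkProbabilityLine`, the `prefix` parameter (standing in for the program's `Xfile` string), and the threshold values used in the usage note are assumptions made for illustration; they are not taken from the program itself.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: pick the classification label from the junk
// probability and the two thresholds, testing the junk threshold first.
std::string classify(double junkProb, double junkThreshold, double mailThreshold) {
    if (junkProb >= junkThreshold) {
        return "Junk";
    }
    if (junkProb <= mailThreshold) {
        return "Mail";
    }
    return "Indeterminate";
}

// Hypothetical helper: build the junk-probability header line.
// Probabilities below 0.001 are forced to zero so the stream never
// falls back to scientific notation, which is harder to parse.
std::string junkProbabilityLine(const std::string &prefix, double junkProb,
                                const std::string &lineEnd) {
    double jp = (junkProb < 0.001) ? 0 : junkProb;
    std::ostringstream os;
    os << prefix << "-Junk-Probability: " << std::setprecision(3) << jp << lineEnd;
    return os.str();
}
```

With illustrative thresholds of 0.9 and 0.1, `classify(0.5, 0.9, 0.1)` returns `"Indeterminate"`, and `junkProbabilityLine("X-Example", 0.00001, "")` returns `"X-Example-Junk-Probability: 0"`.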

The test for the end of the message header where we insert the annotations is a little curious. When we’re processing a message received from a POP3Proxy server, the transcript will contain the CR from the CR/LF termination sequences as required by POP3. (The final line feed will have been stripped by getline as the message was read.) Preserving these terminators allows us to use the standard mechanisms of mailFolder without lots of special flags, so we deem a line the end of the header if it’s either zero length (read from a UNIX mail folder with getline) or contains a single CR (received from a POP3 server). In the latter case, we set transEndl so as to terminate annotations we add to the transcript with CR/LF as well.
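The end-of-header test described above can be sketched in plain C++ as follows. The function names `findHeaderEnd` and `annotate` are hypothetical and chosen only for this illustration; the sketch assumes the transcript is a `std::list<std::string>` whose lines have already had their trailing line feed stripped.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical helper: locate the line that ends the header. A zero-length
// line marks a UNIX mail folder; a lone "\r" marks a message received via
// POP3, in which case lineEnd is set to "\r" so that added lines match.
// If no such line exists, the returned iterator is end().
std::list<std::string>::iterator findHeaderEnd(std::list<std::string> &transcript,
                                               std::string &lineEnd) {
    lineEnd = "";
    std::list<std::string>::iterator p = transcript.begin();
    for (; p != transcript.end(); ++p) {
        if (p->empty()) {
            break;
        }
        if (*p == "\r") {
            lineEnd = "\r";
            break;
        }
    }
    return p;
}

// Hypothetical helper: insert one annotation just before the end of the
// header, or append it to the message when no header end was found.
void annotate(std::list<std::string> &transcript, const std::string &annotation) {
    std::string lineEnd;
    std::list<std::string>::iterator p = findHeaderEnd(transcript, lineEnd);
    transcript.insert(p, annotation + lineEnd);
}
```

Because `std::list::insert` places the new element before the iterator and leaves the iterator valid, repeated insertions at the same position keep annotations in the order they were added.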

〈 Add annotation to message transcript 191 〉 ≡
  ostringstream os;
  list<string>::iterator p;
  string transEndl = "";
  /∗ Find the end of the header in the message. If this fails simply append the annotations to the end of the message. ∗/
  for (p = messageTranscript.begin(); p ≠ messageTranscript.end(); p++) {
    if (p→length() ≡ 0) {
      break;
    }
    if (∗p ≡ "\r") {
      transEndl = "\r";
      break;
    }
  }

  double jp = junkProb;
  /∗ If the probability is small enough that it would be edited in scientific notation, force it to zero so it’s easier to parse. ∗/
  if (jp < 0.001) {
    jp = 0;
  }

  os ≪ Xfile ≪ "-Junk-Probability:␣" ≪ setprecision(3) ≪ jp ≪ transEndl;
  messageTranscript.insert(p, os.str());
  os.str("");
  os ≪ Xfile ≪ "-Classification:␣";
  if (junkProb ≥ junkThreshold) {
    os ≪ "Junk";
  }
  else if (junkProb ≤ mailThreshold) {
    os ≪ "Mail";
  }
  else {
    os ≪ "Indeterminate";
  }
  os ≪ transEndl;
  messageTranscript.insert(p, os.str());

  if (Annotate('w')) {
    addSignificantWordDiagnostics(messageTranscript, p, rtokens, transEndl);
  }
  if (Annotate('p') ∨ Annotate('d')) {
    while (¬parserDiagnostics.empty()) {
      ostringstream os;
