
The Annoyance Filter.pdf - Fourmilab

§144 ANNOYANCE-FILTER MAIL FOLDER 125

144. A clever junk mail author might try to evade filtering based on the header items we include in the --transcript by including his own, on the assumption that a downstream filter would not notice the duplicate items and would filter on the first one it found. To prevent this, and to make it more convenient to feed transcripts back through the program (for testing the effects of different settings or for training on new messages), we detect header lines which begin with our Xfile sentinel followed by a hyphen (compared after both have been canonicalised by stringCanonicalise) and delete them completely from the transcript. The isSpoofedHeader flag causes continuation lines, if any, to be deleted as well. (At this writing we never use continuations of our header items, but better safe than sorry.)

〈 Check for lines with our sentinel already present in the header 144 〉 ≡

    bool isSpoofedHeader = false;
    if (inHeader) {
        // Compare canonicalised copies of the line and of the sentinel
        string sc = s, scx = Xfile;
        stringCanonicalise(sc);
        stringCanonicalise(scx);
        scx += '-';
        if (sc.substr(0, scx.length()) == scx) {
            // Remove the spoofed line from the end of the transcript, if one is kept
            if (tlist != NULL) {
                tlist->pop_back();
            }
            isSpoofedHeader = true;    // also delete any continuation lines
        }
    }

This code is cited in section 256.

This code is used in section 143.
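The deletion rule of section 144 can be tried outside the program with a small standalone sketch. It assumes a hypothetical sentinel value "X-Annoyance-Filter" for Xfile and models stringCanonicalise as plain lower-casing; neither assumption comes from this section. Rather than popping a line back off the transcript as the program does, the hypothetical stripSpoofedHeaders function simply declines to copy it, and it treats any header line beginning with a space or tab as a continuation of the item before it.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Assumption: stringCanonicalise is modelled as plain lower-casing here;
// the program's own routine may do more.
static std::string canonicalise(std::string s) {
    for (char &c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical helper: copy a message, dropping every header line that
// begins with "<sentinel>-" (compared after canonicalisation) together with
// any continuation lines (leading space or tab) that follow it. The blank
// line ending the header, and everything after it, is copied unchanged.
std::vector<std::string> stripSpoofedHeaders(
        const std::vector<std::string> &lines, const std::string &sentinel) {
    const std::string prefix = canonicalise(sentinel) + "-";
    std::vector<std::string> out;
    bool inHeader = true;
    bool isSpoofedHeader = false;
    for (const std::string &line : lines) {
        if (inHeader && line.empty()) {
            inHeader = false;  // blank line ends the header
        }
        if (inHeader) {
            bool continuation = line[0] == ' ' || line[0] == '\t';
            if (!continuation) {
                // A new header item: decide whether it carries our sentinel
                isSpoofedHeader =
                    canonicalise(line).substr(0, prefix.length()) == prefix;
            }
            if (isSpoofedHeader) {
                continue;  // skip the spoofed item and its continuations
            }
        }
        out.push_back(line);
    }
    return out;
}
```

Given a header containing a forged item such as "X-Annoyance-Filter-Classification: Mail" followed by an indented continuation line, the function returns the remaining header lines, the blank separator, and the body unchanged. A sentinel-prefixed line in the body is left alone, since only header lines are checked.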

145. When processing mail folders in bulk, as when generating a dictionary, we want to identify parser diagnostics with the message to which they refer. While processing the header, we save the Message-ID tag, with which reportParserDiagnostic prefixes the message in its --verbose mode output. Messages which lack a Message-ID header item must be identified from the "From␣" line: RFC 2822 specifies that a Message-ID should be present, but it is an optional field.

〈 Save Message-ID for diagnostics 145 〉 ≡

    {
        string arg;

        if (inHeader && compareHeaderField(s, "message-id", arg)) {
            messageID = arg;
            lastMessageID = "";
        }
    }

This code is used in section 141.
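A standalone sketch of the bookkeeping in section 145 follows. The hypothetical headerFieldValue stands in for the program's compareHeaderField, whose signature and matching rules are not shown here: it assumes a header item is the field name, matched without regard to case, immediately followed by a colon, and it returns the value with surrounding white space trimmed. The hypothetical diagnosticLabel then picks the label a diagnostic would carry: the Message-ID if the header has one, otherwise the "From␣" line, assumed to be the first line of the message. Folded (multi-line) header items are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for compareHeaderField: if `line` is the header item
// `field` (name matched without regard to case, followed directly by a
// colon), store its value with surrounding white space trimmed in `value`.
bool headerFieldValue(const std::string &line, const std::string &field,
                      std::string &value) {
    if (line.size() <= field.size() || line[field.size()] != ':') {
        return false;
    }
    for (std::size_t i = 0; i < field.size(); i++) {
        if (std::tolower(static_cast<unsigned char>(line[i])) !=
            std::tolower(static_cast<unsigned char>(field[i]))) {
            return false;
        }
    }
    const char *blanks = " \t\r";
    std::size_t start = line.find_first_not_of(blanks, field.size() + 1);
    if (start == std::string::npos) {
        value.clear();  // item present but empty
    } else {
        std::size_t end = line.find_last_not_of(blanks);
        value = line.substr(start, end - start + 1);
    }
    return true;
}

// Hypothetical helper: the label a diagnostic for this message would carry.
// Uses the Message-ID if the header has one, otherwise the "From " line
// assumed to open the message.
std::string diagnosticLabel(const std::vector<std::string> &message) {
    for (const std::string &line : message) {
        if (line.empty()) {
            break;  // blank line ends the header
        }
        std::string arg;
        if (headerFieldValue(line, "message-id", arg)) {
            return arg;
        }
    }
    return message.empty() ? std::string() : message.front();
}
```

Because the scan stops at the blank line that ends the header, a "Message-ID:" line appearing in the body is never mistaken for the message's own identifier.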
