12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§152 ANNOYANCE-FILTER MAIL FOLDER 131

152. If we’re in the body of text encoded in a multiple-byte character set, pass the text through the interpreter to convert it into a form we can better recognise.

〈 Decode multiple byte character set 152 〉 ≡
  if ((mbi ≠ Λ) ∧ (¬inHeader) ∧ (¬inPartHeader)) {
    s = mbi→decodeLine(s);
  }

This code is used in section 130.<br />
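The guard in section 152 can be tried in isolation with a stand-in decoder. The sketch below is only an illustration: the `LineDecoder` interface, the `ToyDecoder` class, and the `filterLine` helper are hypothetical names introduced here, and the real class behind `mbi` is not shown in this section. It assumes only what the fragment states, namely that a line is passed through `decodeLine` when a decoder is active and we are in neither the message header nor a part header.

```cpp
#include <string>

// Hypothetical decoder interface; the actual type of mbi is not shown here.
class LineDecoder {
public:
    virtual ~LineDecoder() = default;
    virtual std::string decodeLine(const std::string &s) = 0;
};

// Toy decoder for illustration only: replaces every byte with the high
// bit set by '?'. A real multiple-byte interpreter would do far more.
class ToyDecoder : public LineDecoder {
public:
    std::string decodeLine(const std::string &s) override {
        std::string out;
        for (unsigned char c : s) {
            out += (c < 0x80) ? static_cast<char>(c) : '?';
        }
        return out;
    }
};

// Mirrors the condition in section 152: decode only body text, and only
// when a decoder is active (mbi is not null).
std::string filterLine(LineDecoder *mbi, bool inHeader, bool inPartHeader,
                       const std::string &s) {
    if (mbi != nullptr && !inHeader && !inPartHeader) {
        return mbi->decodeLine(s);
    }
    return s;
}
```

Header lines are returned untouched, so header parsing keeps seeing the raw text while body lines reach the parser in decoded form.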

153. If we’ve just reached the end of a MIME part header, determine if the body which follows requires decoding. If so, activate the appropriate decoder and place it in the pipeline between the raw mail folder and our parsing code.

〈 Activate MIME decoder if required 153 〉 ≡
  if (multiPart) {
    assert(mdp ≡ Λ);
#ifdef TYPE_LOG /* If TYPE_LOG is defined, we create a file containing all of the part properties we’ve seen. You can obtain a list of things you may need to worry about by processing one of the fields n of this file with a command like cut -fn /tmp/typelog.txt | sort | uniq. */
    typeLog ≪ mimeContentType ≪ "\t" ≪ mimeContentTypeCharset ≪ "\t" ≪
        mimeContentTransferEncoding ≪ endl;
#endif
    〈 Check for change of sentinel within message 154 〉;
    〈 Check for application file types for which we have a decoder 155 〉;
    〈 Detect binary parts worth parsing for embedded ASCII strings 156 〉;
    〈 Test for Content-Types we always ignore 157 〉
    〈 Process Content-Types we are interested in parsing 158 〉;
  }

This code is cited in section 256.<br />

This code is used in section 141.<br />
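The `TYPE_LOG` comment in section 153 suggests summarising one field of the log with `cut`, `sort`, and `uniq`. For readers who would rather do this from C++, the following is a rough sketch of the same idea. The function name `distinctField` is hypothetical, the log is assumed to be tab-separated with one part per line as written by the `typeLog` statement, and, unlike `cut`, the sketch skips any line that has no field `n`.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Collects the distinct values of the n-th (1-based) tab-separated field.
// std::set keeps the values sorted and unique, much like "sort | uniq".
std::set<std::string> distinctField(std::istream &log, std::size_t n) {
    std::set<std::string> values;
    std::string line;
    while (std::getline(log, line)) {
        std::istringstream fields(line);
        std::string field;
        std::size_t index = 0;
        while (std::getline(fields, field, '\t')) {
            if (++index == n) {
                values.insert(field);
                break;
            }
        }
    }
    return values;
}
```

Calling `distinctField(in, 1)` on an open log stream would list the content types seen, field 2 the character sets, and field 3 the transfer encodings.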

154. The sentinel which delimits parts of a multi-part message may be changed in the middle of the message by a Content-Type of multipart/alternative specifying a new boundary=. Detect this and change the part boundary on the fly. These parts usually seem devoid of content, but just in case fake a content type of text/plain so anything which may be there gets looked at.

〈 Check for change of sentinel within message 154 〉 ≡
  if (mimeContentType ≡ "multipart/alternative") {
    if (mimeContentTypeBoundary ≠ "") {
      partBoundaryStack.push(partBoundary);
      partBoundary = mimeContentTypeBoundary;
    }
    else {
      if (Annotate('d')) {
        ostringstream os;

        os ≪ "Boundary␣missing␣from␣Content-Type␣of␣multipart/alternative.";
        reportParserDiagnostic(os);
      }
    }
  }

This code is used in section 153.
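The boundary switch in section 154 can be exercised on its own with a small stand-in for the parser state. This is a sketch under stated assumptions, not the program's actual code: `BoundaryState`, `BoundaryChange`, and `checkSentinelChange` are hypothetical names, and `restoreBoundary` assumes that the enclosing boundary is popped back when the nested part ends, which this section does not show.

```cpp
#include <stack>
#include <string>

enum class BoundaryChange { Unchanged, Switched, MissingBoundary };

// Hypothetical stand-in for the parser fields used in section 154.
struct BoundaryState {
    std::string partBoundary;                   // sentinel currently in force
    std::stack<std::string> partBoundaryStack;  // enclosing sentinels, innermost on top
};

// Mirrors section 154: a multipart/alternative header carrying a boundary
// saves the current sentinel and switches to the new one.
BoundaryChange checkSentinelChange(BoundaryState &st,
                                   const std::string &mimeContentType,
                                   const std::string &mimeContentTypeBoundary) {
    if (mimeContentType != "multipart/alternative") {
        return BoundaryChange::Unchanged;
    }
    if (mimeContentTypeBoundary.empty()) {
        // The original reports a parser diagnostic here when annotation 'd' is on.
        return BoundaryChange::MissingBoundary;
    }
    st.partBoundaryStack.push(st.partBoundary);
    st.partBoundary = mimeContentTypeBoundary;
    return BoundaryChange::Switched;
}

// Assumption, not shown in the source: reinstate the enclosing sentinel.
// Returns false if there is nothing to restore.
bool restoreBoundary(BoundaryState &st) {
    if (st.partBoundaryStack.empty()) {
        return false;
    }
    st.partBoundary = st.partBoundaryStack.top();
    st.partBoundaryStack.pop();
    return true;
}
```

Using a stack rather than a single saved value lets nested multipart/alternative sections each remember the boundary of the part that encloses them.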
