The Annoyance Filter.pdf - Fourmilab
§152 ANNOYANCE-FILTER MAIL FOLDER 131
152. If we’re in the body of text encoded in a multiple-byte character set, pass the text through the
interpreter to convert it into a form we can better recognise.
〈 Decode multiple byte character set 152 〉 ≡
  if ((mbi ≠ Λ) ∧ (¬inHeader) ∧ (¬inPartHeader)) {
    s = mbi→decodeLine(s);
  }
This code is used in section 130.
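The guard above can be tried out in isolation. What follows is only a minimal sketch in plain C++: MultiByteInterpreter, HexPairInterpreter and decodeIfBody are hypothetical names invented for illustration, and the toy interpreter (which rewrites each pair of high-bit bytes as a hexadecimal token) is an assumption, not the program’s actual decoder.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the interpreter that mbi points to in the text.
struct MultiByteInterpreter {
    virtual ~MultiByteInterpreter() = default;
    virtual std::string decodeLine(const std::string &s) = 0;
};

// Toy interpreter (an assumption for illustration): every pair of bytes whose
// first byte has the high bit set becomes a space-delimited hex token.
struct HexPairInterpreter : MultiByteInterpreter {
    std::string decodeLine(const std::string &s) override {
        std::string out;
        for (std::size_t i = 0; i < s.size();) {
            const unsigned c1 = static_cast<unsigned char>(s[i]);
            if (c1 >= 0x80 && i + 1 < s.size()) {
                const unsigned c2 = static_cast<unsigned char>(s[i + 1]);
                char buf[8];
                std::snprintf(buf, sizeof buf, " %02X%02X ", c1, c2);
                out += buf;
                i += 2;
            } else {
                out += s[i];    // ASCII, or a dangling final byte: copy as is
                i += 1;
            }
        }
        return out;
    }
};

// The guard from the section: decode only when an interpreter is active and
// we are in neither the message header nor a MIME part header.
std::string decodeIfBody(MultiByteInterpreter *mbi, bool inHeader,
                         bool inPartHeader, const std::string &s) {
    if ((mbi != nullptr) && !inHeader && !inPartHeader) {
        return mbi->decodeLine(s);
    }
    return s;
}
```

With no interpreter active, or while still inside either kind of header, the line passes through unchanged, so header parsing always sees the raw text.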
153. If we’ve just reached the end of a MIME part header, determine if the body which follows
requires decoding. If so, activate the appropriate decoder and place it in the pipeline between the raw
mail folder and our parsing code.
〈 Activate MIME decoder if required 153 〉 ≡
  if (multiPart) {
    assert(mdp ≡ Λ);
#ifdef TYPE_LOG /∗ If TYPE_LOG is defined, we create a file containing all of the part properties
        we’ve seen. You can obtain a list of things you may need to worry about by processing one of
        the fields n of this file with a command like cut -fn /tmp/typelog.txt | sort | uniq. ∗/
    typeLog ≪ mimeContentType ≪ "\t" ≪ mimeContentTypeCharset ≪ "\t" ≪
        mimeContentTransferEncoding ≪ endl;
#endif
    〈 Check for change of sentinel within message 154 〉;
    〈 Check for application file types for which we have a decoder 155 〉;
    〈 Detect binary parts worth parsing for embedded ASCII strings 156 〉;
    〈 Test for Content-Types we always ignore 157 〉
    〈 Process Content-Types we are interested in parsing 158 〉;
  }
This code is cited in section 256.
This code is used in section 141.
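To make the idea of activating a decoder concrete, here is a rough sketch of choosing one from the Content-Transfer-Encoding of the part. Everything named below (MimeDecoder, QuotedPrintableDecoder, activateDecoder) is hypothetical and chosen for illustration only; the sections cited above may well go about it differently. Only quoted-printable is handled, base64 is left out to keep the example short, and a null result means the body needs no decoding.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical decoder interface; mdp in the text would point at one of these.
struct MimeDecoder {
    virtual ~MimeDecoder() = default;
    virtual std::string decodeLine(const std::string &line) = 0;
};

// Decodes =XX escapes and drops a trailing soft line break ("=").
struct QuotedPrintableDecoder : MimeDecoder {
    std::string decodeLine(const std::string &line) override {
        std::string out;
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (line[i] == '=' && i + 2 < line.size() &&
                std::isxdigit(static_cast<unsigned char>(line[i + 1])) &&
                std::isxdigit(static_cast<unsigned char>(line[i + 2]))) {
                out += static_cast<char>(
                    std::stoi(line.substr(i + 1, 2), nullptr, 16));
                i += 2;                 // skip the two hex digits
            } else if (line[i] == '=' && i + 1 == line.size()) {
                // soft line break: emit nothing
            } else {
                out += line[i];
            }
        }
        return out;
    }
};

// Returns the decoder to splice into the pipeline, or null if none is needed.
// The comparison is case-insensitive, as MIME header values are.
std::unique_ptr<MimeDecoder> activateDecoder(std::string transferEncoding) {
    std::transform(transferEncoding.begin(), transferEncoding.end(),
                   transferEncoding.begin(), [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    if (transferEncoding == "quoted-printable") {
        return std::unique_ptr<MimeDecoder>(new QuotedPrintableDecoder());
    }
    return nullptr;     // 7bit, 8bit, binary: body is passed through as is
}
```

Returning a null pointer for encodings that need no work mirrors the convention in the text, where a pointer equal to Λ means nothing is active.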
154. The sentinel which delimits parts of a multi-part message may be changed in the middle of the
message by a Content-Type of multipart/alternative specifying a new boundary=. Detect this and
change the part boundary on the fly. These parts usually seem devoid of content, but just in case fake
a content type of text/plain so anything which may be there gets looked at.
〈 Check for change of sentinel within message 154 〉 ≡
  if (mimeContentType ≡ "multipart/alternative") {
    if (mimeContentTypeBoundary ≠ "") {
      partBoundaryStack.push(partBoundary);
      partBoundary = mimeContentTypeBoundary;
    }
    else {
      if (Annotate('d')) {
        ostringstream os;
        os ≪ "Boundary␣missing␣from␣Content-Type␣of␣multipart/alternative.";
        reportParserDiagnostic(os);
      }
    }
  }
This code is used in section 153.
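The boundary switch above amounts to a small stack discipline, which can be sketched on its own. In this sketch BoundaryTracker and both of its member functions are hypothetical names. The section shows only the push; restoring the outer boundary when the closing delimiter (two hyphens, the boundary, two more hyphens) arrives is our assumption about how the stack would be unwound, as is storing the boundary without its leading hyphens.

```cpp
#include <stack>
#include <string>

// Hypothetical holder for the two variables used in the section.
struct BoundaryTracker {
    std::string partBoundary;                    // boundary now in effect
    std::stack<std::string> partBoundaryStack;   // enclosing boundaries

    // Mirrors the section: a nested multipart/alternative saves the current
    // boundary and switches to the new one. Returns false if the boundary is
    // missing, which is the case the section reports as a diagnostic.
    bool changeBoundary(const std::string &newBoundary) {
        if (newBoundary.empty()) {
            return false;
        }
        partBoundaryStack.push(partBoundary);
        partBoundary = newBoundary;
        return true;
    }

    // Assumption, not shown in the section: on the closing delimiter of the
    // current part, go back to the enclosing boundary, if there is one.
    bool closeIfTerminator(const std::string &line) {
        if (line != "--" + partBoundary + "--") {
            return false;
        }
        if (partBoundaryStack.empty()) {
            partBoundary.clear();
        } else {
            partBoundary = partBoundaryStack.top();
            partBoundaryStack.pop();
        }
        return true;
    }
};
```

A stack rather than a single saved value allows alternatives nested inside alternatives to unwind in the right order.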