The Annoyance Filter.pdf - Fourmilab
122 MAIL FOLDER ANNOYANCE-FILTER §139
139. Each message in a folder begins with a line containing the text “From␣” starting in the first
column. Well, more or less.... In the beginning there were BSD mail folders, in which messages were
simply concatenated together with the start of each message indicated by a line beginning with the
“From␣” sentinel. In this scheme, any line in a message body which matches this pattern must be
quoted, usually by inserting a “>” character in column 1, but this is not universal. This was kind of
ugly, and could cause problems when messages began to contain content other than human-readable
text, so then there were Sun message folders, where each message header indicated the number of bytes
in the message with a “Content-Length” header item. You can imagine how disastrous this was in
the typical UNIX environment where people pass mail folders and messages through all kinds of text
filters: ’nuff said; better forgotten. These days the most common form of text file mail folder is a
compromise in which the basic BSD scheme is used, but the “From␣” sentinel only designates the start
of a message if it appears following a blank line. This makes quoting unnecessary in many cases in body
copy, while remaining robust against editing and easy to parse with simple programs.
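The quoting convention described above can be sketched from the point of view of a program writing a message into a folder. This is only an illustration under stated assumptions: the helper name quoteFromLines, its bsdFolder parameter, and the representation of a body as a vector of lines are all hypothetical and are not part of this program, which only reads folders.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the program: return a copy of a message
// body in which every line a folder reader could mistake for the start of a
// message is quoted by inserting ">" in column 1.
std::vector<std::string> quoteFromLines(const std::vector<std::string> &body,
                                        bool bsdFolder) {
    const std::string sentinel = "From ";
    std::vector<std::string> out;
    // Assumption: the body follows the blank line which ends the header,
    // so its first line counts as "following a blank line".
    bool lastLineBlank = true;
    for (const std::string &line : body) {
        bool looksLikeSentinel =
            line.compare(0, sentinel.size(), sentinel) == 0;
        // Original BSD rule: quote every match.  Compromise rule: quote
        // only a match which follows a blank line.
        if (looksLikeSentinel && (bsdFolder || lastLineBlank)) {
            out.push_back(">" + line);
        } else {
            out.push_back(line);
        }
        lastLineBlank = line.empty();
    }
    return out;
}
```

With bsdFolder false, a body line such as “From the start it was odd” in the middle of a paragraph is left alone; only a matching line directly after a blank line is quoted.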
If BSDfolder is set, we follow the original BSD semantics and recognise any line beginning with “From␣”
as the start of a new message. Otherwise, we only treat the sentinel as denoting the start of a message
if it follows a blank line or appears at the start of the folder.
Upon finding the start of a message, we increment the number of messages in the folder, mark the
start of a new message, and set the inHeader flag to indicate we’re parsing the header section of the
message.
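The difference between the two rules is easiest to see by counting message starts in the same folder both ways. The following is a minimal stand-alone sketch, not the program's own code: the function name countMessages and the folder-as-vector-of-lines representation are assumptions made for illustration, and treating the start of the folder as if it followed a blank line is one simple way (assumed here) to honour the “start of the folder” case.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: count message starts in a folder given as a list of
// lines.  If bsdFolder is true every "From " line starts a message;
// otherwise the sentinel counts only after a blank line or at the very
// start of the folder.
std::size_t countMessages(const std::vector<std::string> &folder,
                          bool bsdFolder) {
    const std::string sentinel = "From ";
    std::size_t nMessages = 0;
    bool lastLineBlank = true;   // assumption: start of folder acts like a blank line
    for (const std::string &line : folder) {
        bool hasSentinel = line.compare(0, sentinel.size(), sentinel) == 0;
        if (hasSentinel && (bsdFolder || lastLineBlank)) {
            ++nMessages;         // the real parser also resets its header state here
        }
        lastLineBlank = line.empty();
    }
    return nMessages;
}
```

A folder whose first message contains an unquoted body line beginning with “From␣” in mid-paragraph yields one extra message under the BSD rule, which is why writers of BSD folders must quote every such line.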
One complication is that some mail systems which store messages as files in a directory do not include
the “From␣” sentinel at the start of message files. We use the expectingNewMessage flag to cope with
this. This flag is set at the start of every new file we begin to read (whether a concatenated mail
folder or a file within a directory). When this flag is set, the first nonblank line in the file is considered
the start of a message, even if it doesn’t begin with the “From␣” sentinel.
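How the flag interacts with the sentinel test can be sketched as follows. This is a simplified illustration, assuming the non-BSD rule only: countMessagesInFiles and the nested-vector input are hypothetical, and std::isspace stands in for the program's own isISOspace.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: count messages across several files, some of which
// may lack the "From " sentinel (one message per file in a directory).
std::size_t countMessagesInFiles(
        const std::vector<std::vector<std::string>> &files) {
    const std::string sentinel = "From ";
    std::size_t nMessages = 0;
    for (const std::vector<std::string> &file : files) {
        bool expectingNewMessage = true;   // set at the start of every file
        bool lastLineBlank = false;
        for (const std::string &line : file) {
            bool hasSentinel =
                line.compare(0, sentinel.size(), sentinel) == 0;
            // Nonempty line whose first character is not white space.
            bool startsWithText = !line.empty() &&
                !std::isspace(static_cast<unsigned char>(line[0]));
            if ((hasSentinel && lastLineBlank) ||
                (expectingNewMessage && startsWithText)) {
                ++nMessages;
                expectingNewMessage = false;   // only the first such line qualifies
            }
            lastLineBlank = line.empty();
        }
    }
    return nMessages;
}
```

A file which opens directly with a header line such as “Received:” is thus counted as one message, while a conventional folder is still split at each sentinel which follows a blank line.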
#define messageSentinel "From␣" /∗ First line of each message in folder ∗/
〈 Check for start of new message in folder 139 〉 ≡
#ifdef BSD_DIAG
    if (s.substr(0, (sizeof messageSentinel) − 1) ≡ messageSentinel) {
        if (¬BSDfolder ∧ ¬lastLineBlank) {
            cerr ≪ "***␣NonBSD␣From␣line␣ditched:␣" ≪ s ≪ endl;
        }
    }
#endif
    if (((s.substr(0, (sizeof messageSentinel) − 1) ≡ messageSentinel) ∧ (BSDfolder ∨ lastLineBlank)) ∨
        (expectingNewMessage ∧ (s.length() > 0) ∧ (¬isISOspace(s[0])))) {
        nMessages++;
        newMessage = true;
        expectingNewMessage = false;
        inHeader = true;
        multiPart = false;
        inPartHeader = false;
        partHeaderLines = 0;
        bodyContentType = bodyContentTypeCharset = bodyContentTypeName =
            bodyContentTransferEncoding = "";
        fromLine = s; /∗ Save last “From␣” line for diagnostics ∗/
        lastFromLine = lastMessageID = messageID = "";
        while (¬partBoundaryStack.empty()) {
            ostringstream os;
            os ≪ "Orphaned␣part␣boundary␣on␣stack:␣\"" ≪ partBoundaryStack.top() ≪ "\"";
            reportParserDiagnostic(os);
            partBoundaryStack.pop();