12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


122 MAIL FOLDER ANNOYANCE-FILTER §139

139. Each message in a folder begins with a line containing the text “From␣” starting in the first column. Well, more or less. . . . In the beginning there were BSD mail folders, in which messages were simply concatenated together, with the start of each message indicated by a line beginning with the “From␣” sentinel. In this scheme, any line in a message body which matches this pattern must be quoted, usually by inserting a “>” character in column 1, but this is not universal. This was kind of ugly, and could cause problems when messages began to contain content other than human-readable text, so then there were Sun message folders, where each message header indicated the number of bytes in the message with a “Content-Length” header item. You can imagine how disastrous this was in the typical UNIX environment where people pass mail folders and messages through all kinds of text filters: ’nuff said; better forgotten. These days the most common form of text file mail folder is a compromise in which the basic BSD scheme is used, but the “From␣” sentinel only designates the start of a message if it appears following a blank line. This avoids quoting many cases in body copy, while remaining robust against editing and easy to parse with simple programs.
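The quoting convention described above can be sketched as follows. This is only an illustration of the common “>” convention, not code from this program: the function name quoteFromLines is hypothetical, and it quotes every body line that starts with the sentinel, which is what the original BSD scheme requires.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (an assumption, not part of the program described
// here): prefix '>' to every line of a message body that begins with the
// "From " sentinel, so a BSD-style reader will not take it for the start
// of a new message. Every output line is terminated with '\n'.
std::string quoteFromLines(const std::string &body) {
    const std::string sentinel = "From ";
    std::istringstream in(body);
    std::string line, out;
    while (std::getline(in, line)) {
        // compare() only looks at the first sentinel.size() characters
        if (line.compare(0, sentinel.size(), sentinel) == 0) {
            out += '>';
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

Under the blank-line compromise, a writer would only need to quote such lines when they follow a blank line; the sketch does not attempt that refinement.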

If BSDfolder is set, we follow the original BSD semantics and recognise any “From␣” as beginning a new message. Otherwise, we only treat the sentinel as denoting the start of a message if it follows a blank line or appears at the start of the folder.

Upon finding the start of a message, we increment the number of messages in the folder, mark the start of a new message, and set the inHeader flag to indicate we’re parsing the header section of the message.

One complication is that some mail systems which store messages as files in a directory do not include the “From␣” sentinel at the start of message files. We use the expectingNewMessage flag to cope with this. This flag gets set at the start of every new file we begin to read (whether a concatenated mail folder or a file within a directory). When this flag is set, the first nonblank line in the file is considered the start of a message, even if it isn’t the “From␣” sentinel.
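The three rules above (BSD semantics, the blank-line compromise, and the first nonblank line of a new file) can be tried out in isolation with a small stand-alone sketch. It is a simplification under stated assumptions: the function countMessages and its bsdFolder parameter are illustrative names, the whole file is passed in as one string, std::isspace stands in for the program's isISOspace, and lastLineBlank is assumed to start out true so that a sentinel at the very start of the folder is accepted.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Count message starts in the text of a single file (a sketch only;
// countMessages and bsdFolder are hypothetical names).
int countMessages(const std::string &fileText, bool bsdFolder) {
    const std::string sentinel = "From ";
    std::istringstream in(fileText);
    std::string s;
    int nMessages = 0;
    bool lastLineBlank = true;        // assumed: start of folder acts like "after a blank line"
    bool expectingNewMessage = true;  // set at the start of every new file
    while (std::getline(in, s)) {
        bool isSentinel = s.compare(0, sentinel.size(), sentinel) == 0;
        // First nonblank line of a file starts a message even without "From "
        bool firstNonBlank = expectingNewMessage && !s.empty() &&
                             !std::isspace(static_cast<unsigned char>(s[0]));
        if ((isSentinel && (bsdFolder || lastLineBlank)) || firstNonBlank) {
            nMessages++;
            expectingNewMessage = false;
        }
        lastLineBlank = s.empty();
    }
    return nMessages;
}
```

For a folder in which an unquoted “From␣” line appears in the middle of a body paragraph, calling countMessages with bsdFolder set to true counts that line as a message start, while false does not. A file from a one-message-per-file store, which begins with an ordinary header line, is counted as one message in either mode.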

#define messageSentinel "From "  /* First line of each message in folder */

〈 Check for start of new message in folder 139 〉 ≡
#ifdef BSD_DIAG
    if (s.substr(0, (sizeof messageSentinel) - 1) == messageSentinel) {
        if (!BSDfolder && !lastLineBlank) {
            cerr << "*** NonBSD From line ditched: " << s << endl;
        }
    }
#endif
    if (((s.substr(0, (sizeof messageSentinel) - 1) == messageSentinel) && (BSDfolder || lastLineBlank)) ||
        (expectingNewMessage && (s.length() > 0) && (!isISOspace(s[0])))) {
        nMessages++;
        newMessage = true;
        expectingNewMessage = false;
        inHeader = true;
        multiPart = false;
        inPartHeader = false;
        partHeaderLines = 0;
        bodyContentType = bodyContentTypeCharset = bodyContentTypeName =
            bodyContentTransferEncoding = "";
        fromLine = s;  /* Save last "From " line for diagnostics */
        lastFromLine = lastMessageID = messageID = "";
        while (!partBoundaryStack.empty()) {
            ostringstream os;
            os << "Orphaned part boundary on stack: \"" << partBoundaryStack.top() << "\"";
            reportParserDiagnostic(os);
            partBoundaryStack.pop();
