The Annoyance Filter.pdf - Fourmilab
A BRIEF HISTORY OF ANNOYANCE-FILTER (§9)
deep enough for high diving. No more laps round the worry room for me!
I must confess to some inside information in this regard. While working for a regrettable employer in an
odious swamp, I’d twigged to the fact that network television advertisers tagged their commercials with a
signature in the vertical retrace interval to permit audit bureaux to measure how many network affiliates
actually broadcast each commercial. This tag appeared to me the Achilles’ heel of television advertising.
As long as one could distinguish tagged commercials from an un-tagged program, it would be more or less
straightforward to detect when a commercial was being transmitted and pause the VCR until the program
resumed.
If only. . . . In reality, only nationally broadcast commercials bore the tag, and only some of them. Local
commercials were never tagged. This created a difficult marketing dilemma for my grand scheme. While it
might have been possible to block some of the most ubiquitous and irritating commercials on mass-market
network series, the bottom feeders who watch those shows probably enjoyed the commercials and wouldn’t
be prospects for my gadget, while those like myself, infuriated by incessant commercials interrupting late
night movies, would find the device ineffective since local commercials on independent stations were never
tagged. Real-time analysis of video or even audio in the 1970’s and early 80’s was technologically out of
the question for a product aimed at a mass consumer market. So, I put the idea of an annoyance filter for
television aside and occupied myself with other endeavours.
We now arrive at the late 1980’s. I’d spent the last decade or so filling up the money bin more or less flat
out, and having reached a level I judged more than adequate, I began to turn my attention to matters I’d
neglected during those laser-focused years.
Writing science fiction, for one thing. There was something about the advertising filter which had dug its
way into my brain so deeply that nothing could dislodge it. The year is 1989; the Berlin Wall is about to
tumble; and I’m scribbling a story about two programmers spending the downtime between Christmas and
New Year’s Day (the period when I’d accomplished about half of my own productive work over the previous
half decade) prowling the nascent Internet for evidence of an extraterrestrial message already received, but
not recognised as such. In “We’ll Return, After this Message,” it is an annoyance filter which recognises an
extraterrestrial message for what it is, advertising, and as von Mises observed, distinguishable by its own
strident clamouring for attention.
A decade later, in the very years in which I set my science fiction story, I launched my own search for a
message from our Creator hidden in the most obvious of locations—no results so far. Yet still I scour the
Net.
Which brings us, more or less, to the present. The idea of an annoyance filter continued to intermittently
occupy my thoughts, especially as the volume of junk arriving in my mailbox incessantly mounted despite
ongoing efforts to filter it with increasingly voluminous and clever Procmail rules. Then, in August 2002, my
friend and colleague Kern Sibbald brought to my attention Paul Graham’s brilliant design for an adaptable,
Bayesian filter to discriminate junk and legitimate mail by word frequencies measured in actual samples of
mail pre-sorted into those categories. Now that sounded promising! Here was a design which was simple in
concept, theoretically sound, and best of all, it seemed to work. Graham implemented his prototype filter in
the “Arc” Lisp dialect used in his research. I decided to build a deployable tool in industrial-strength C++,
founded on his design, and handling all the details required so the filter could, as much as possible, interpret
mail the same way a human would—decoding, translating, and extracting wherever necessary to defeat the
techniques junk mailers adopt to hide their content from naïve filtering utilities.
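
To make the idea of discriminating mail by word frequencies concrete, here is a minimal sketch of a simplified naive Bayes scorer. It is an illustration only: it is neither Graham's exact formulas nor the code of annoyance-filter. All names (`WordCounts`, `tokenize`, `train`, `wordProbability`, `junkProbability`) are hypothetical, and the clamping bounds of 0.01 and 0.99 and the neutral score of 0.5 for unseen words are assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Word counts gathered from mail pre-sorted into junk and legitimate piles.
// (Hypothetical structure, for illustration only.)
struct WordCounts {
    std::map<std::string, int> junk;   // word -> occurrences in junk samples
    std::map<std::string, int> legit;  // word -> occurrences in legitimate samples
    int junkMessages = 0;
    int legitMessages = 0;
};

// Split on whitespace; a real tokeniser would be far more careful.
std::vector<std::string> tokenize(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Add one pre-sorted sample message to the counts.
void train(WordCounts &c, const std::string &message, bool isJunk) {
    for (const auto &w : tokenize(message)) {
        if (isJunk) ++c.junk[w]; else ++c.legit[w];
    }
    if (isJunk) ++c.junkMessages; else ++c.legitMessages;
}

// Estimate the probability that a message containing word w is junk.
double wordProbability(const WordCounts &c, const std::string &w) {
    assert(c.junkMessages > 0 && c.legitMessages > 0);
    auto count = [&](const std::map<std::string, int> &m) {
        auto it = m.find(w);
        return it == m.end() ? 0 : it->second;
    };
    const int j = count(c.junk);
    const int l = count(c.legit);
    if (j + l == 0) return 0.5;  // unseen word: treated as neutral (assumption)
    const double jf = std::min(1.0, static_cast<double>(j) / c.junkMessages);
    const double lf = std::min(1.0, static_cast<double>(l) / c.legitMessages);
    // Clamp so that no single word can decide the outcome with certainty.
    return std::max(0.01, std::min(0.99, jf / (jf + lf)));
}

// Combine the per-word probabilities, treating words as independent.
double junkProbability(const WordCounts &c, const std::string &message) {
    double pJunk = 1.0, pLegit = 1.0;
    for (const auto &w : tokenize(message)) {
        const double p = wordProbability(c, w);
        pJunk *= p;
        pLegit *= 1.0 - p;
    }
    return pJunk / (pJunk + pLegit);
}
```

After training on a few junk and legitimate samples, `junkProbability` returns a value near 1 for messages made of words seen mostly in junk and near 0 for the reverse. Multiplying many probabilities can underflow on long messages, so a production filter would probably sum logarithms or score only a limited number of the most telling words.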
This is not a simple task. Consider—you can probably sort out a message you’re interested in reading
from unsolicited junk in a fraction of a second, but that assumes it’s presented to you after all of the mail
transfer and content encodings have been peeled away to reveal the true colours of the content. Long gone
are the days when E-mail was predominantly ASCII text. Today, it’s more than likely to be HTML (if not a
Flash animation or some other horror), often transmitted in Quoted-Printable or Base64 encodings largely
in the interest of “stealth”—to hide the content from filters not equipped with the decoding facilities of a
full-fledged mail client.
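
As a rough illustration of what peeling away these two transfer encodings involves, the following sketch decodes Base64 and Quoted-Printable bodies along the lines of the MIME rules in RFC 2045. The function names `decodeBase64` and `decodeQuotedPrintable` are hypothetical and are not taken from annoyance-filter. The handling of malformed input (skipping stray characters, keeping a bad `=` escape literally) is an assumption made for the example.

```cpp
#include <string>

// Decode a Base64 body. Characters outside the alphabet (such as line
// breaks) are skipped, and '=' padding ends the data.
std::string decodeBase64(const std::string &in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buffer = 0;  // accumulates 6 bits per input character
    int bits = 0;             // number of bits in buffer not yet emitted
    for (char ch : in) {
        if (ch == '=') break;
        const std::string::size_type v = alphabet.find(ch);
        if (v == std::string::npos) continue;
        buffer = (buffer << 6) | static_cast<unsigned int>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Decode a Quoted-Printable body: "=XX" is one byte written as two hex
// digits, and "=" at the end of a line is a soft line break to be removed.
std::string decodeQuotedPrintable(const std::string &in) {
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] != '=') {
            out.push_back(in[i]);
            continue;
        }
        // Soft line break: "=" followed by LF or CRLF.
        if (i + 1 < in.size() && in[i + 1] == '\n') {
            i += 1;
            continue;
        }
        if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 2;
            continue;
        }
        // Hex escape: "=XX".
        if (i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;
                continue;
            }
        }
        out.push_back('=');  // malformed escape: keep it literally (assumption)
    }
    return out;
}
```

A word such as "Free" can be written as `Fr=65e` in Quoted-Printable, or buried entirely inside a Base64 block, so a filter that matches only the raw message text never sees it. Decoding first lets the filter look at the words a human reader would see.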