
The Annoyance Filter.pdf - Fourmilab

§9 A BRIEF HISTORY OF ANNOYANCE-FILTER

deep enough for high diving. No more laps round the worry room for me!

I must confess to some inside information in this regard. While working for a regrettable employer in an odious swamp, I’d twigged to the fact that network television advertisers tagged their commercials with a signature in the vertical retrace interval to permit audit bureaux to measure how many network affiliates actually broadcast each commercial. This tag appeared to me to be the Achilles’ heel of television advertising. As long as one could distinguish tagged commercials from an untagged program, it would be more or less straightforward to detect when a commercial was being transmitted and pause the VCR until the program resumed.

If only. . . . In reality, only nationally broadcast commercials bore the tag, and only some of them. Local commercials were never tagged. This created a difficult marketing dilemma for my grand scheme. While it might have been possible to block some of the most ubiquitous and irritating commercials on mass-market network series, the bottom feeders who watch those shows probably enjoyed the commercials and wouldn’t be prospects for my gadget. Those like myself, infuriated by incessant commercials interrupting late-night movies, would find the device ineffective, since local commercials on independent stations were never tagged. Real-time analysis of video or even audio in the 1970’s and early 80’s was technologically out of the question for a product aimed at a mass consumer market. So, I put the idea of an annoyance filter for television aside and occupied myself with other endeavours.

We now arrive at the late 1980’s. I’d spent the last decade or so filling up the money bin more or less flat out, and having reached a level I judged more than adequate, I began to turn my attention to matters I’d neglected during those laser-focused years.

Writing science fiction, for one thing. There was something about the advertising filter which had dug its way into my brain so deeply that nothing could dislodge it. The year is 1989; the Berlin Wall is about to tumble; and I’m scribbling a story about two programmers spending the downtime between Christmas and New Year’s Day (the period when I’d accomplished about half of my own productive work over the previous half decade) prowling the nascent Internet for evidence of an extraterrestrial message already received, but not recognised as such. In “We’ll Return, After this Message”, it is an annoyance filter which recognises an extraterrestrial message for what it is: advertising, which, as von Mises observed, is distinguishable by its own strident clamouring for attention.

A decade later, in the very years in which I set my science fiction story, I launched my own search for a message from our Creator hidden in the most obvious of locations: no results so far. Yet still I scour the Net.

Which brings us, more or less, to the present. The idea of an annoyance filter continued to intermittently occupy my thoughts, especially as the volume of junk arriving in my mailbox incessantly mounted despite ongoing efforts to filter it with increasingly voluminous and clever Procmail rules. Then, in August 2002, my friend and colleague Kern Sibbald brought to my attention Paul Graham’s brilliant design for an adaptable Bayesian filter that discriminates junk from legitimate mail by word frequencies measured in actual samples of mail pre-sorted into those categories. Now that sounded promising! Here was a design which was simple in concept, theoretically sound, and, best of all, seemed to work. Graham implemented his prototype filter in the “Arc” Lisp dialect used in his research. I decided to build a deployable tool in industrial-strength C++, founded on his design and handling all the details required so the filter could, as much as possible, interpret mail the same way a human would: decoding, translating, and extracting wherever necessary to defeat the techniques junk mailers adopt to hide their content from naïve filtering utilities.
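
Graham’s scheme rests on two steps. First, estimate a per-word junk probability from hand-sorted mail. Second, combine those probabilities over the words of a new message. The following is a minimal C++ sketch of that idea and is not this program’s own code. The names Corpus, junkProbability, and scoreMessage are hypothetical. The constants are assumptions modelled on Graham’s published description in “A Plan for Spam”:

- doubling the legitimate-mail counts;
- ignoring words whose weighted count is below five;
- assigning 0.4 to unknown words;
- clamping probabilities to the range 0.01 to 0.99;
- combining only the 15 probabilities farthest from 0.5.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Word counts gathered from a corpus of mail sorted by hand (hypothetical type).
struct Corpus {
    std::map<std::string, int> counts;  // occurrences of each word
    int messages = 0;                   // number of messages in the corpus
};

// Estimated probability that a message containing `word` is junk.
double junkProbability(const std::string& word,
                       const Corpus& good, const Corpus& bad) {
    auto lookup = [](const Corpus& c, const std::string& w) {
        auto it = c.counts.find(w);
        return it == c.counts.end() ? 0 : it->second;
    };
    double g = 2.0 * lookup(good, word);  // doubled to bias against false positives
    double b = lookup(bad, word);
    if (g + b < 5.0) return 0.4;          // too rare to judge: treat as unknown
    double pg = std::min(1.0, g / std::max(1, good.messages));
    double pb = std::min(1.0, b / std::max(1, bad.messages));
    double p = pb / (pg + pb);            // pg + pb > 0 because g + b >= 5
    return std::max(0.01, std::min(0.99, p));
}

// Combine the most decisive word probabilities into a score for the message.
double scoreMessage(const std::vector<std::string>& words,
                    const Corpus& good, const Corpus& bad,
                    std::size_t keep = 15) {
    std::set<std::string> distinct(words.begin(), words.end());
    std::vector<double> probs;
    for (const auto& w : distinct) {
        probs.push_back(junkProbability(w, good, bad));
    }
    // Sort so the probabilities farthest from a neutral 0.5 come first.
    std::sort(probs.begin(), probs.end(), [](double x, double y) {
        return std::fabs(x - 0.5) > std::fabs(y - 0.5);
    });
    if (probs.size() > keep) probs.resize(keep);
    double junk = 1.0, legit = 1.0;
    for (double p : probs) {
        junk *= p;
        legit *= 1.0 - p;
    }
    return junk / (junk + legit);         // 0.5 when there are no words
}
```

Given a message already reduced to words, scoreMessage returns a value near 1 for probable junk and near 0 for probable legitimate mail. Where to draw the line between the two is left to the caller.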

This is not a simple task. Consider: you can probably sort out a message you’re interested in reading from unsolicited junk in a fraction of a second, but that assumes it’s presented to you after all of the mail transfer and content encodings have been peeled away to reveal the true colours of the content. Long gone are the days when E-mail was predominantly ASCII text. Today, it’s more than likely to be HTML (if not a Flash animation or some other horror), often transmitted in Quoted-Printable or Base64 encodings, largely in the interest of “stealth”, to hide the content from filters not equipped with the decoding facilities of a full-fledged mail client.
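
A filter has to undo such encodings before it can count any words. As an illustration, here is a minimal C++ sketch of Quoted-Printable decoding following RFC 2045, section 6.7. Each “=XX” hexadecimal escape becomes the octet it stands for, and an “=” at the end of a line (a soft line break) joins that line to the next. The function names are hypothetical rather than taken from annoyance-filter. The sketch also omits Base64 and the RFC’s rule for discarding trailing whitespace on encoded lines.

```cpp
#include <cstddef>
#include <string>

// Value of one hexadecimal digit, or -1 if `c` is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;  // lenient: RFC 2045 uses upper case
    return -1;
}

// Decode a Quoted-Printable body (RFC 2045, section 6.7), hypothetical helper.
std::string decodeQuotedPrintable(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '=') {
            out += in[i];
            continue;
        }
        // Soft line break: "=" at end of line joins it to the next line.
        if (i + 1 < in.size() && in[i + 1] == '\n') {
            i += 1;
            continue;
        }
        if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 2;
            continue;
        }
        // Encoded octet "=XX".
        if (i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += '=';  // malformed escape: pass the "=" through unchanged
    }
    return out;
}
```

For example, a word disguised as “V=49AGRA” decodes to “VIAGRA”, which a word-frequency filter can then recognise.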
