12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab



98. Application string parsers.

An application string parser reads files in application-defined formats (for example, word processor documents, spreadsheets, and page description languages) and returns the strings included in the file. Unlike tokenParser in “byte stream” mode, there is nothing heuristic in the operation of an application string parser: it must understand the structure of the application data file in order to identify and extract the strings within it.

The applicationStringParser class is the abstract parent of all specific application string parsers. It provides common services to derived classes and defines the external interface. When initialising an applicationStringParser, the caller must supply a pointer to the mailFolder from which it will be invoked; the parser calls that folder’s nextByte method to obtain decoded binary bytes of the application file. It would be much cleaner if we could simply supply an arbitrary function which returned the next byte of the stream we’re decoding, but that runs afoul of C++’s rules for taking the address of class members. Consequently, we’re forced to make applicationStringParser co-operate with mailFolder to obtain decoded bytes.
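The restriction mentioned above can be illustrated with a small sketch. A pointer to a non-static member function is a different type from a plain function pointer and can only be called together with an object, so a parser that accepted only a plain "next byte" function could not be handed the folder's nextByte method directly. The stub mailFolder below, the assumed nextByte signature (an int, with -1 at end of stream), and the names byteSource, folderByteSource and readViaMember are all hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for mailFolder: serves bytes from a string.
// The signature of nextByte (int, -1 at end of stream) is an assumption.
class mailFolder {
    std::string data;
    std::size_t pos;
public:
    explicit mailFolder(const std::string &d) : data(d), pos(0) {}
    int nextByte() {
        return (pos < data.size()) ? static_cast<unsigned char>(data[pos++]) : -1;
    }
};

typedef int (*byteSource)();                    // plain function: no object attached
typedef int (mailFolder::*folderByteSource)();  // pointer to member: needs an object

// byteSource bad = &mailFolder::nextByte;      // does not compile: unrelated types

// A call through a pointer to member must also be given the object,
// which is why the parser ends up holding a mailFolder pointer.
int readViaMember(mailFolder *mf, folderByteSource src) {
    return (mf->*src)();
}
```

Since the object has to travel with the member pointer in any case, storing the mailFolder pointer in the parser and calling nextByte on it directly is the simpler arrangement.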

〈 Class definitions 10 〉 +≡
class applicationStringParser {
protected:
    bool error, eof;                  /* Error and end of file indicators */
    mailFolder *mf;                   /* Folder which supplies decoded bytes */

    virtual unsigned char get8(void);
    virtual void get8n(unsigned char *buf, const int n)
    {                                 /* Store next n bytes into buf */
        for (int i = 0; (!eof) && (i < n); i++) {
            buf[i] = get8();
        }
    }

public:
    applicationStringParser(mailFolder *f = NULL) : error(false), eof(false), mf(NULL)
    {
        setMailFolder(f);
    }

    virtual ~applicationStringParser()
    { }

    virtual string name(void) const = 0;

    void setMailFolder(mailFolder *f)
    {
        mf = f;
    }

    virtual bool nextString(string &s) = 0;

    virtual void close(void)
    {
        error = eof = false;
    }

    bool isError(void) const
    {
        return error;
    }

    bool isEOF(void) const
    {
        return eof;
    }

    bool isOK(void) const
    {
        return (!isEOF()) && (!isError());
    }
};
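To show how a derived class might plug into this interface, here is a minimal sketch of a parser for an invented toy format in which each string is stored as a one-byte length followed by that many bytes. Nothing in it comes from the program itself: the mailFolder stub, the body given for get8, the assumed convention that nextByte returns -1 at end of stream, and the lengthPrefixedStringParser class are all assumptions made for illustration, and the base class is a condensed copy of the definition above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
using std::string;

// Hypothetical stand-in for mailFolder (assumed interface: nextByte
// returns the next decoded byte, or -1 when the stream is exhausted).
class mailFolder {
    string data;
    std::size_t pos;
public:
    explicit mailFolder(const string &d) : data(d), pos(0) {}
    int nextByte() {
        return (pos < data.size()) ? static_cast<unsigned char>(data[pos++]) : -1;
    }
};

// Condensed copy of the base class; the body of get8 is an assumption.
class applicationStringParser {
protected:
    bool error, eof;
    mailFolder *mf;
    virtual unsigned char get8() {
        int c = (mf != NULL) ? mf->nextByte() : -1;
        if (c < 0) { eof = true; return 0; }
        return static_cast<unsigned char>(c);
    }
public:
    applicationStringParser(mailFolder *f = NULL) : error(false), eof(false), mf(f) {}
    virtual ~applicationStringParser() {}
    virtual string name() const = 0;
    virtual bool nextString(string &s) = 0;
    bool isEOF() const { return eof; }
    bool isError() const { return error; }
    bool isOK() const { return !isEOF() && !isError(); }
};

// Hypothetical toy format: one length byte, then that many string bytes.
class lengthPrefixedStringParser : public applicationStringParser {
public:
    lengthPrefixedStringParser(mailFolder *f = NULL) : applicationStringParser(f) {}
    string name() const { return "LengthPrefixed"; }
    bool nextString(string &s) {
        s.clear();
        unsigned char len = get8();
        if (!isOK()) return false;                    // clean end of file
        for (unsigned i = 0; i < len; i++) {
            unsigned char c = get8();
            if (eof) { error = true; return false; }  // string was truncated
            s += static_cast<char>(c);
        }
        return true;
    }
};
```

A caller would loop on nextString until it returns false, then use isError to tell a truncated file from a normal end of file. The parser needs no heuristics because the length byte says exactly where each string ends.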
