12.06.2015

The Annoyance Filter.pdf - Fourmilab


148 TOKEN PARSER ANNOYANCE-FILTER §174

174. The nextToken method stores the next token from the input source into its dictionary word argument and returns true if a token was found, or false if the end of the input source was encountered whilst scanning for the next token.
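The calling convention just described, with the token written through a reference argument and a bool result that becomes false once the input is exhausted, lends itself to a simple driver loop. The sketch below illustrates only that convention, under our own assumptions: WhitespaceTokenSource and collectTokens are hypothetical names, std::string stands in for dictionaryWord, and splitting on whitespace replaces the character-class scan that nextToken actually performs.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token source that merely splits its input on whitespace.
// It mirrors only the calling convention of nextToken: the token is
// stored into the reference argument, and the return value reports
// whether a token was found before the input ran out.
class WhitespaceTokenSource {
public:
    explicit WhitespaceTokenSource(const std::string &text) : in(text) {}

    bool nextToken(std::string &word) {
        if (!(in >> word)) {
            return false;           // end of input reached while scanning
        }
        assert(!word.empty());      // a successful extraction always yields characters
        return true;
    }

private:
    std::istringstream in;
};

// Typical caller: keep asking for tokens until the source reports the end.
std::vector<std::string> collectTokens(const std::string &text) {
    WhitespaceTokenSource source(text);
    std::vector<std::string> tokens;
    std::string word;
    while (source.nextToken(word)) {
        tokens.push_back(word);
    }
    return tokens;
}
```

For example, collectTokens("  spam  and\teggs\n") returns {"spam", "and", "eggs"}, while an empty or all-blank input yields no tokens at all.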

```cpp
#define ChIx(c) (static_cast<unsigned int>((c)) & 0xFF)
```
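The mask in ChIx guards against the fact that plain char may be signed: a byte such as 0xE9 (é in ISO 8859-1) then converts to a negative value that cannot serve as an array index. Converting to unsigned int and keeping the low eight bits maps every char into 0..255. The sketch below shows the effect with a 256-entry lookup table; makeLetterTable and isLetter are hypothetical, and the idea that the td->isTokenMember family consults per-byte tables of this kind is our assumption rather than something this section states.

```cpp
#include <array>
#include <cassert>

// Same expression as the document's ChIx macro: convert a char, which may
// be signed, to unsigned and keep the low eight bits, giving 0..255.
#define ChIx(c) (static_cast<unsigned int>((c)) & 0xFF)

// Hypothetical 256-entry class table with one flag per byte value.
// Only ASCII letters are marked here, purely for illustration.
std::array<bool, 256> makeLetterTable() {
    std::array<bool, 256> table{};          // value-initialised: all false
    for (int c = 'A'; c <= 'Z'; c++) table[c] = true;
    for (int c = 'a'; c <= 'z'; c++) table[c] = true;
    return table;
}

// Look up a character; indexing with a negative char directly would
// reach far outside the table.
bool isLetter(const std::array<bool, 256> &table, char ch) {
    unsigned int ix = ChIx(ch);
    assert(ix < table.size());              // guaranteed by the 0xFF mask
    return table[ix];
}
```

With this table, isLetter(t, static_cast<char>(0xE9)) safely returns false whether char is signed or unsigned on the platform.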

```cpp
〈 Class implementations 11 〉 +≡
bool tokenParser::nextToken(dictionaryWord &d)
{
    string token;

    while (!atEnd) {
        〈 Check for assembled phrases in queue and return next if so 175 〉;
        token = "";
        string::size_type necount = 0;

        if (source->isByteStream()) {
            〈 Parse plausible tokens from byte stream 178 〉;
        }

        /* Ignore non-token characters until start of next token */
        while ((clp < cl.length()) && (inHTMLcomment || (!(td->isTokenMember(ChIx(cl[clp])))))) {
            〈 Check for HTML comments and ignore them 176 〉;
            〈 Check for within HTML content 177 〉;
            clp++;
        }

        /* If end of line encountered before token start, advance to next line */
        if (clp >= cl.length()) {
            nextLine();
            continue;
        }

        /* Check for characters we don't accept as the start of a token */
        if (td->isTokenNotAtEnd(ChIx(cl[clp]))) {
            clp++;
            continue;
        }

        /* First character of token recognised; store and scan balance */
        if (td->isTokenNotExclusively(ChIx(cl[clp]))) {
            necount++;
        }
        token += cl[clp++];
        while ((clp < cl.length())) {
            if ((!inHTMLcomment) && (td->isTokenMember(ChIx(cl[clp])))) {
                if (td->isTokenNotExclusively(ChIx(cl[clp]))) {
                    necount++;
                }
                token += cl[clp++];
            } else {
                〈 Check for HTML comments and ignore them 176 〉;
                if (inHTMLcomment) {
                    clp++;
                    continue;
                }
                break;
            }
        }

        /* Prune characters we don't accept at the end of a token */
        while ((token.length() > 0) && td->isTokenNotAtEnd(ChIx(token[token.length() - 1]))) {
            token.erase(token.length() - 1);
        }

        /* Verify that the token meets our minimum and maximum length constraints */
```
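The scanning steps above can be condensed into a standalone sketch that works on a single line. It is a simplification, not the filter's code: CharClasses, charIndex, scanToken and sampleClasses are hypothetical names; HTML comment handling, assembled phrases, byte-stream sources and advancing to the next line are omitted; and the class assignments in sampleClasses (digits counted in necount, hyphens and apostrophes barred from the start and end of a token) are illustrative guesses rather than the filter's actual tables.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical character-class tables standing in for the td object.
struct CharClasses {
    std::array<bool, 256> member{};          // may appear within a token
    std::array<bool, 256> notAtEnd{};        // may not begin or end a token
    std::array<bool, 256> notExclusively{};  // counted in necount
};

// Inline-function equivalent of the ChIx macro.
inline unsigned int charIndex(char c) {
    return static_cast<unsigned int>(c) & 0xFF;
}

// Simplified single-line version of the scan in nextToken. Starting at pos,
// find the next token, store it in token, count its "not exclusively"
// characters in necount and return true; return false at end of line.
bool scanToken(const CharClasses &cc, const std::string &line,
               std::string::size_type &pos, std::string &token,
               std::string::size_type &necount) {
    while (pos < line.length()) {
        token.clear();
        necount = 0;
        // Ignore non-token characters until the start of the next token
        while (pos < line.length() && !cc.member[charIndex(line[pos])]) {
            pos++;
        }
        if (pos >= line.length()) {
            return false;                    // nextToken would read the next line here
        }
        // Skip a character that may not begin a token and rescan
        if (cc.notAtEnd[charIndex(line[pos])]) {
            pos++;
            continue;
        }
        // Store the first character and the run of member characters after it
        while (pos < line.length() && cc.member[charIndex(line[pos])]) {
            if (cc.notExclusively[charIndex(line[pos])]) {
                necount++;
            }
            token += line[pos++];
        }
        // Prune characters that may not end a token
        while (!token.empty() && cc.notAtEnd[charIndex(token.back())]) {
            token.pop_back();
        }
        assert(!token.empty());              // the first character was accepted
        return true;                         // length checks would follow here
    }
    return false;
}

// Illustrative classes: letters and digits are members, digits are
// "not exclusively" characters, and '-' and '\'' may not begin or end a token.
CharClasses sampleClasses() {
    CharClasses cc;
    for (int c = 'a'; c <= 'z'; c++) cc.member[c] = true;
    for (int c = 'A'; c <= 'Z'; c++) cc.member[c] = true;
    for (int c = '0'; c <= '9'; c++) {
        cc.member[c] = true;
        cc.notExclusively[c] = true;
    }
    cc.member['-'] = cc.notAtEnd['-'] = true;
    cc.member['\''] = cc.notAtEnd['\''] = true;
    return cc;
}
```

Called repeatedly on "--free-- offer4u 2015!", scanToken yields free (the hyphens are dropped from both ends), then offer4u with necount 1, then 2015 with necount equal to its length, and finally returns false. What nextToken does with necount is not shown in this section; comparing it with the token length to reject tokens made only of such characters is an assumption of this sketch.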
