The Annoyance Filter.pdf - Fourmilab
148 TOKEN PARSER ANNOYANCE-FILTER §174<br />
<strong>174.</strong> The nextToken method stores the next token from the input source into its dictionaryWord<br />
argument and returns true if a token was found, or false if the end of the input source was encountered<br />
whilst scanning for the next token.<br />
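
The calling convention described here (fill an output argument, return false at end of input) can be sketched in isolation. The class below is a hypothetical stand-in, not the program's tokenParser: the names miniParser and allTokens are invented for illustration, the input is a single in-memory string rather than a mail stream, and only alphanumeric characters are assumed to be token members.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for tokenParser: scans one in-memory string
// and treats only alphanumeric characters as token members.
class miniParser {
public:
    explicit miniParser(const std::string &text) : cl(text), clp(0) {}

    // Same contract as nextToken: store the next token in `token` and
    // return true, or return false once the input is exhausted.
    bool nextToken(std::string &token) {
        token.clear();
        // Skip characters that cannot be part of a token
        while (clp < cl.length() && !isMember(cl[clp])) {
            clp++;
        }
        if (clp >= cl.length()) {
            return false;           // End of input, no token found
        }
        // Accumulate the run of token characters
        while (clp < cl.length() && isMember(cl[clp])) {
            token += cl[clp++];
        }
        return true;
    }

private:
    static bool isMember(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    }
    std::string cl;                 // Text being scanned
    std::string::size_type clp;     // Scan position within cl
};

// Drain a parser with the usual "while a token was found" loop
std::vector<std::string> allTokens(const std::string &text) {
    miniParser p(text);
    std::vector<std::string> out;
    std::string t;
    while (p.nextToken(t)) {
        out.push_back(t);
    }
    return out;
}
```

Returning a bool and filling a reference argument lets the caller write the scan as a plain while loop, with no sentinel token needed to mark the end of the input.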
#define ChIx(c) (static_cast&lt;unsigned int&gt;((c)) & 0xFF)<br />
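
The cast and mask in ChIx presumably guard against plain char being a signed type: on such platforms a byte of 0x80 or above is negative, and using it directly as an index would land outside a 256-entry table. The sketch below illustrates the idea under that assumption. The byteTable type is hypothetical and only stands in for whatever table the token definition object actually keeps.

```cpp
#define ChIx(c) (static_cast<unsigned int>((c)) & 0xFF)

// Hypothetical 256-entry membership table indexed by byte value.
struct byteTable {
    bool member[256];

    byteTable() {
        for (int i = 0; i < 256; i++) {
            member[i] = false;
        }
    }

    // ChIx maps any char, signed or unsigned, into the range 0..255,
    // so the subscript below always stays inside the array.
    void add(char c) { member[ChIx(c)] = true; }
    bool contains(char c) const { return member[ChIx(c)]; }
};
```

With this mapping, a byte such as '\xE9' indexes entry 0xE9 whether the compiler treats char as signed or unsigned.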
〈 Class implementations 11 〉 +≡<br />
bool tokenParser::nextToken(dictionaryWord &d)<br />
{<br />
    string token;<br />
    while (!atEnd) {<br />
        〈 Check for assembled phrases in queue and return next if so 175 〉;<br />
        token = "";<br />
        string::size_type necount = 0;<br />
        if (source->isByteStream()) {<br />
            〈 Parse plausible tokens from byte stream 178 〉;<br />
        }<br />
        /* Ignore non-token characters until start of next token */<br />
        while ((clp < cl.length()) && (inHTMLcomment || (!(td->isTokenMember(ChIx(cl[clp])))))) {<br />
            〈 Check for HTML comments and ignore them 176 〉;<br />
            〈 Check for within HTML content 177 〉;<br />
            clp++;<br />
        }<br />
        /* If end of line encountered before token start, advance to next line */<br />
        if (clp >= cl.length()) {<br />
            nextLine();<br />
            continue;<br />
        }<br />
        /* Check for characters we don’t accept as the start of a token */<br />
        if (td->isTokenNotAtEnd(ChIx(cl[clp]))) {<br />
            clp++;<br />
            continue;<br />
        }<br />
        /* First character of token recognised; store and scan balance */<br />
        if (td->isTokenNotExclusively(ChIx(cl[clp]))) {<br />
            necount++;<br />
        }<br />
        token += cl[clp++];<br />
        while (clp < cl.length()) {<br />
            if ((!inHTMLcomment) && (td->isTokenMember(ChIx(cl[clp])))) {<br />
                if (td->isTokenNotExclusively(ChIx(cl[clp]))) {<br />
                    necount++;<br />
                }<br />
                token += cl[clp++];<br />
            }<br />
            else {<br />
                〈 Check for HTML comments and ignore them 176 〉;<br />
                if (inHTMLcomment) {<br />
                    clp++;<br />
                    continue;<br />
                }<br />
                break;<br />
            }<br />
        }<br />
        /* Prune characters we don’t accept at the end of a token */<br />
        while ((token.length() > 0) && td->isTokenNotAtEnd(ChIx(token[token.length() - 1]))) {<br />
            token.erase(token.length() - 1);<br />
        }<br />
        /* Verify that the token meets our minimum and maximum length constraints */