The Annoyance Filter.pdf - Fourmilab


§96 ANNOYANCE-FILTER UNICODE INTERPRETER CLASS 79

96. Unicode Interpreter class.

This interpreter class converts Unicode characters into a form the parser can comprehend.

〈 Class definitions 10 〉 +≡

class Unicode_MBCSinterpreter : public MBCSinterpreter {
public:
    Unicode_MBCSinterpreter()
    {
        setPrefixSuffix(" ", " ");      /* Set ideographs off with a space on each side */
    }

    virtual string name(void)
    {
        return "Unicode";
    }

    string getNextDecodedChar(void);
};

97. Our getNextDecodedChar implementation attempts to represent Unicode characters in the form that best enables the parser to classify them. Characters in the first 256 code positions, which are identical to ISO 8859-1, are output as ISO characters. Other codes are represented as “UCS-nnnn”, where nnnn is the Unicode code value in upper-case hexadecimal. Codes for ideographs (the code treats U+3200 through U+D7FF and U+F900 through U+FAFE as ideographic) are output with a space on each side, so each ideograph stands apart from its neighbours. Codes for alphabetic characters are not space-separated, so the letters of a word remain together.
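As a sketch of the mapping just described, assuming the single-space prefix and suffix that the Unicode interpreter sets, the rule reduces to a pure function of the code value. The name `unicodeToToken` and its parameters are hypothetical. In the program this logic lives inside `getNextDecodedChar`, which reads the code from the decoder `dp` and takes the separators from the interpreter object.

```cpp
#include <cassert>
#include <sstream>
#include <string>

using namespace std;

// Hypothetical free-function version of the per-character mapping.
// prefix/suffix default to the single spaces the Unicode interpreter uses.
string unicodeToToken(int dc, const string &prefix = " ", const string &suffix = " ")
{
    if (dc < 0) {
        return "";                                  // End of input stream
    }
    if (dc <= 0xFF) {
        return string(1, static_cast<char>(dc));    // ISO 8859-1 code position: emit as-is
    }
    ostringstream os;
    os.setf(ios::uppercase);                        // Upper-case hex digits (A-F)
    if (((dc >= 0x3200) && (dc < 0xD800)) || ((dc >= 0xF900) && (dc < 0xFAFF))) {
        // Ideographic range: surround the code with prefix and suffix
        os << prefix << "UCS-" << hex << dc << dec << suffix;
    } else {
        // Alphabetic: no separators, so the letters of a word stay together
        os << "UCS-" << hex << dc << dec;
    }
    return os.str();
}
```

With these defaults, U+4E2D (中) maps to " UCS-4E2D ", U+0416 (Ж) maps to "UCS-416", and U+0041 maps to "A". Because the hexadecimal value is not zero-padded, codes below U+1000 produce fewer than four digits.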

〈 Class implementations 11 〉 +≡

string Unicode_MBCSinterpreter::getNextDecodedChar(void)
{
    assert(dp != NULL);

    int dc = dp->getNextDecodedChar();

    if (dc < 0) {
        return "";                                  /* End of input stream */
    }

    if (dc <= 0xFF) {
        string r(1, static_cast<char>(dc));         /* ISO 8859-1 character */
        return r;
    }

    ostringstream os;
    os.setf(ios::uppercase);
    if (((dc >= 0x3200) && (dc < 0xD800)) || ((dc >= 0xF900) && (dc < 0xFAFF))) {
        os << prefix << "UCS-" << hex << dc << dec << suffix;  /* Ideographic: space on both sides */
    } else {
        os << "UCS-" << hex << dc << dec;      /* Alphabetic: don't space around characters */
    }
    return os.str();
}
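The following self-contained sketch wires the method to a stand-in decoder and a whitespace tokenizer to show the effect of the spacing rule. `VectorDecoder`, `setDecoder`, `tokenize` and the cut-down `MBCSinterpreter` are hypothetical, invented for illustration. The program's real base class, decoders and parser are defined elsewhere and may differ. The assumption that the parser splits tokens at white space is likewise only an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Hypothetical decoder: returns one code point per call, -1 at end of input.
class VectorDecoder {
    vector<int> codes;
    size_t pos;
public:
    explicit VectorDecoder(const vector<int> &c) : codes(c), pos(0) {}
    int getNextDecodedChar(void) { return pos < codes.size() ? codes[pos++] : -1; }
};

// Hypothetical stand-in for MBCSinterpreter: only the members used below.
class MBCSinterpreter {
protected:
    VectorDecoder *dp;
    string prefix, suffix;
public:
    MBCSinterpreter() : dp(nullptr) {}
    virtual ~MBCSinterpreter() {}
    void setDecoder(VectorDecoder *d) { dp = d; }      // hypothetical
    void setPrefixSuffix(const string &p, const string &s) { prefix = p; suffix = s; }
    virtual string name(void) = 0;
    virtual string getNextDecodedChar(void) = 0;
};

class Unicode_MBCSinterpreter : public MBCSinterpreter {
public:
    Unicode_MBCSinterpreter() { setPrefixSuffix(" ", " "); }
    virtual string name(void) { return "Unicode"; }
    string getNextDecodedChar(void);
};

string Unicode_MBCSinterpreter::getNextDecodedChar(void)
{
    assert(dp != nullptr);
    int dc = dp->getNextDecodedChar();
    if (dc < 0) {
        return "";                                  // End of input stream
    }
    if (dc <= 0xFF) {
        return string(1, static_cast<char>(dc));    // ISO 8859-1 character
    }
    ostringstream os;
    os.setf(ios::uppercase);
    if (((dc >= 0x3200) && (dc < 0xD800)) || ((dc >= 0xF900) && (dc < 0xFAFF))) {
        os << prefix << "UCS-" << hex << dc << dec << suffix;  // Ideographic
    } else {
        os << "UCS-" << hex << dc << dec;                      // Alphabetic
    }
    return os.str();
}

// Hypothetical tokenizer: drain the interpreter, then split on white space.
vector<string> tokenize(MBCSinterpreter &interp)
{
    string text, s;
    while (!(s = interp.getNextDecodedChar()).empty()) {
        text += s;
    }
    istringstream in(text);
    vector<string> tokens;
    string tok;
    while (in >> tok) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Feeding it the code points for "Hi ", the two ideographs U+4E2D U+6587, a space and the Cyrillic word U+0414 U+0430 yields four tokens: Hi, UCS-4E2D, UCS-6587 and UCS-414UCS-430. Each ideograph becomes a token of its own, while the Cyrillic word stays a single token.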
