
The Annoyance Filter.pdf - Fourmilab


78 SHIFT-JIS INTERPRETER CLASS ANNOYANCE-FILTER §94

94. Our getNextDecodedChar implementation is a bit more complicated than the default provided by the parent class. In addition to handling ASCII and two-byte character codes, we also wish to interpret single-byte (half-width) Katakana characters. These are emitted without spaces between them, while two-byte characters are set off on both sides.

〈 Class implementations 11 〉 +≡

string Shift_JIS_MBCSinterpreter::getNextDecodedChar(void)
{
    assert(dp != NULL);

    int dc = dp->getNextDecodedChar();
    if (dc < 0) {
        return "";                                  /* End of input stream */
    }

    if (dc < 0xA1) {
        string r(1, static_cast<char>(dc));         /* ASCII character */
        return r;
    }

    ostringstream os;
    os.setf(ios::uppercase);
    if ((dc >= 0xA1) && (dc <= 0xDF)) {
        os << "SJIS-K" << hex << dc << dec;         /* Katakana: don't space around characters */
    } else {
        os << prefix << "SJIS-" << hex << dc << dec << suffix;   /* Kanji: space on both sides */
    }
    return os.str();
}

95. Korean Interpreter class.

This interpreter class parses Korean characters into tokens that downstream parsers can process. This encoding (usually declared as a charset of euc-kr) is uncommon, but we handle it to illustrate an interpreter for an alphabetic non-Western language.

〈 Class definitions 10 〉 +≡

class KR_MBCSinterpreter : public MBCSinterpreter {
public:
    virtual string name(void)
    {
        return "KR";
    }
};
