The Annoyance Filter.pdf - Fourmilab
75. Shift-JIS decoder.<br />
Shift-JIS is used to encode Japanese characters on MS-DOS, Windows, and the Macintosh (which adds<br />
four additional one-byte characters, which we support here). The encoding uses code points #21–#7E<br />
for ASCII/JIS-Roman single-byte characters, code points #A1–#DF for single-byte half-width katakana,<br />
plus two-byte characters introduced by first bytes in the ranges #81–#9F, #E0–#EF, and, for user-defined<br />
characters, #F0–#FC. The second byte of a valid two-byte character will always be in one of the ranges<br />
#40–#7E and #80–#FC.<br />
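The byte ranges above can be written out as simple range tests. The following is a minimal sketch and not part of the program itself: the names `SjisByteKind`, `classifyFirstByte`, `isValidSecondByte`, and `checkShiftJisExamples` are hypothetical, and the `#` hexadecimal values from the text are written as `0x..` constants.

```cpp
#include <cassert>

// Hypothetical names for illustration; the ranges are those given in the text.
enum class SjisByteKind {
    SingleAscii,      // 0x21-0x7E: ASCII/JIS-Roman single-byte character
    SingleKatakana,   // 0xA1-0xDF: single-byte half-width katakana
    LeadByte,         // 0x81-0x9F, 0xE0-0xEF: first byte of a two-byte character
    UserDefinedLead,  // 0xF0-0xFC: first byte of a user-defined two-byte character
    Other             // anything outside the ranges listed in the text
};

// Classify a byte seen in first-byte position.
inline SjisByteKind classifyFirstByte(unsigned char b)
{
    if (b >= 0x21 && b <= 0x7E) return SjisByteKind::SingleAscii;
    if (b >= 0xA1 && b <= 0xDF) return SjisByteKind::SingleKatakana;
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF)) return SjisByteKind::LeadByte;
    if (b >= 0xF0 && b <= 0xFC) return SjisByteKind::UserDefinedLead;
    return SjisByteKind::Other;
}

// Test whether a byte is acceptable as the second byte of a two-byte character.
inline bool isValidSecondByte(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// A few spot checks: 'A', half-width katakana 0xB1, and the pair 0x82 0xA0.
inline void checkShiftJisExamples()
{
    assert(classifyFirstByte(0x41) == SjisByteKind::SingleAscii);
    assert(classifyFirstByte(0xB1) == SjisByteKind::SingleKatakana);
    assert(classifyFirstByte(0x82) == SjisByteKind::LeadByte);
    assert(isValidSecondByte(0xA0));
    assert(!isValidSecondByte(0x7F));
}
```

Note that the second-byte ranges overlap the single-byte ranges (0x41 is both the letter A and a valid second byte), so a byte can only be interpreted once the decoder knows whether a first byte is outstanding.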
〈 Class definitions 10 〉 +≡<br />
class Shift_JIS_MBCSdecoder : public MBCSdecoder {<br />
protected:<br />
    string pending;<br />
public:<br />
    Shift_JIS_MBCSdecoder()<br />
        : pending("") { }<br />
    virtual ~Shift_JIS_MBCSdecoder()<br />
    { }<br />
    virtual string name(void)<br />
    {<br />
        return "Shift_JIS";<br />
    }<br />
    virtual int getNextDecodedChar(void); /* Get next decoded byte */<br />
};<br />
76. Decode the next logical character. We return −1 when the end of the encoded line is encountered.<br />
An invalid second byte of a two-byte character terminates processing of the line, as the remainder is likely to be<br />
gibberish from then on.<br />
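The behaviour described here can be illustrated with a stand-alone sketch. It is not the class from this program: `SjisLineSketch` and `demoSjisLineSketch` are hypothetical names, the input is a plain `std::string` rather than the decoder's encoded-byte source, and returning a two-byte character as `(first << 8) | second` is an assumption made only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-alone sketch; not the Shift_JIS_MBCSdecoder class itself.
class SjisLineSketch {
public:
    explicit SjisLineSketch(const std::string &line) : src(line), pos(0) { }

    // Return the next logical character, or -1 at the end of the line.
    // ASSUMPTION: a two-byte character is returned as (first << 8) | second.
    int next()
    {
        if (pos >= src.size()) {
            return -1;                      // end of the encoded line
        }
        int c1 = static_cast<unsigned char>(src[pos++]);
        if (isFirstByte(c1)) {
            if (pos >= src.size()) {
                return -1;                  // line ended inside a two-byte character
            }
            int c2 = static_cast<unsigned char>(src[pos++]);
            if (!isSecondByte(c2)) {
                pos = src.size();           // invalid second byte: abandon the line
                return -1;
            }
            return (c1 << 8) | c2;
        }
        return c1;                          // single-byte character passes through
    }

private:
    static bool isFirstByte(int b)
    {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }
    static bool isSecondByte(int b)
    {
        return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
    }

    std::string src;
    std::size_t pos;
};

// Usage: decode the bytes 'A', 0x82, 0xA0 and then reach the end of the line.
inline void demoSjisLineSketch()
{
    std::string line;
    line.push_back('A');
    line.push_back(static_cast<char>(0x82));
    line.push_back(static_cast<char>(0xA0));
    SjisLineSketch d(line);
    assert(d.next() == 0x41);
    assert(d.next() == 0x82A0);
    assert(d.next() == -1);
}
```

Once an invalid second byte is seen, the sketch discards everything after it, so every later call also returns −1; this mirrors the "terminates processing of the line" rule rather than attempting to resynchronise.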
〈 Class implementations 11 〉 +≡<br />
int Shift_JIS_MBCSdecoder::getNextDecodedChar(void)<br />
{<br />
〈 Check for pending characters and return if so 78 〉;<br />
}<br />
int c1 = getNextEncodedByte();<br />
if (c1 >= 0) {<br />
    〈 Check for Shift-JIS two byte character and assemble as required 77 〉;<br />
    〈 Check for Macintosh-specific single byte characters and translate 79 〉;<br />
}<br />
return c1;