
The Annoyance Filter.pdf - Fourmilab


§84 ANNOYANCE-FILTER UTF-8 UNICODE DECODER

        }
    }
    return result;
}

85. UTF-16 Unicode decoder.

The UTF-16 encoding of Unicode encodes logical characters as a sequence of 16-bit codes. Most Unicode characters are encoded in a single 16-bit quantity, but character codes greater than 65535 are encoded as a pair of 16-bit values in the surrogate range. Naturally, this encoding can be either big- or little-endian in byte sequence; we handle either, as set by the setBigEndian method or the constructor.
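The surrogate arithmetic mentioned above can be sketched as follows. This is a minimal illustration of the standard UTF-16 rule, not code from the program itself; the helper names `isHighSurrogate`, `isLowSurrogate`, and `combineSurrogates` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; they do not appear in the program.

// High (leading) surrogates occupy 0xD800..0xDBFF.
constexpr bool isHighSurrogate(std::uint32_t w)
{
    return w >= 0xD800 && w <= 0xDBFF;
}

// Low (trailing) surrogates occupy 0xDC00..0xDFFF.
constexpr bool isLowSurrogate(std::uint32_t w)
{
    return w >= 0xDC00 && w <= 0xDFFF;
}

// Combine a high/low surrogate pair into a character code above 65535.
// Each surrogate carries 10 bits of the value (code - 0x10000).
constexpr std::uint32_t combineSurrogates(std::uint32_t high, std::uint32_t low)
{
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00));
}
```

For instance, the pair 0xD83D, 0xDE00 combines to 0x1F600, and the lowest possible pair 0xD800, 0xDC00 combines to 0x10000, the first character code that does not fit in a single 16-bit quantity.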

〈 Class definitions 10 〉 +≡
class UTF_16_Unicode_MBCSdecoder : public Unicode_MBCSdecoder {
protected:
    bool bigEndian;

    int getNextUTF_16Word(void)
    {
        int c1 = getNextEncodedByte();

        if (c1 < 0) {
            return c1;
        }
        int c2 = getNextEncodedByte();
        if (c2 < 0) {
            ostringstream os;

            os << name() << "_MBCSdecoder: Premature end of line in UTF-16 character.";
            reportDecoderDiagnostic(os);
            return -1;
        }
        if (bigEndian) {
            c1 = (c1 << 8) | c2;
        }
        else {
            c1 |= (c2 << 8);
        }
        return c1;
    }

public:
    UTF_16_Unicode_MBCSdecoder(bool isBigEndian = true)
    {
        setBigEndian(isBigEndian);
    }

    void setBigEndian(bool isBigEndian = true)
    {
        bigEndian = isBigEndian;
    }

    virtual string name(void)
    {
        return "UTF_16_Unicode";
    }

    virtual int getNextDecodedChar(void);   /* Get next decoded byte */
};
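The byte-order handling in the 16-bit word reader can be tried in isolation with a sketch like the one below. It assumes the input is an in-memory byte vector rather than the decoder's input stream, and the function name `readUTF16Words` is hypothetical. It is a stand-alone analogue of the logic above, not part of the class.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone analogue of the word reader above: assembles
// 16-bit words from consecutive byte pairs in the requested byte order.
// Returns false if the input ends in the middle of a word (the case the
// class reports as a premature end); words read before that point are kept.
bool readUTF16Words(const std::vector<unsigned char> &bytes, bool bigEndian,
                    std::vector<std::uint16_t> &words)
{
    words.clear();
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        if (i + 1 >= bytes.size()) {
            return false;               // dangling first byte, no partner
        }
        unsigned c1 = bytes[i];
        unsigned c2 = bytes[i + 1];
        // Big-endian: first byte is the high half; little-endian: the low half.
        unsigned w = bigEndian ? ((c1 << 8) | c2) : (c1 | (c2 << 8));
        words.push_back(static_cast<std::uint16_t>(w));
    }
    return true;
}
```

Given the bytes 00 41 D8 3D with bigEndian set, or 41 00 3D D8 with it cleared, this yields the same two words, 0x0041 and 0xD83D.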
