12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


81. UCS-2 Unicode decoder.

The UCS-2 encoding of Unicode is simply a sequence of 16-bit quantities, which may be stored in either little-endian or big-endian order. The order is usually identified by a Unicode Byte Order Mark at the start of the file. Here we do not attempt to auto-sense byte order; it must be specified, either through the constructor argument (big-endian by default) or by calling the setBigEndian method, before the decoder is used.

〈 Class definitions 10 〉 +≡
    class UCS_2_Unicode_MBCSdecoder : public Unicode_MBCSdecoder {
    protected:
        bool bigEndian;             /* True if the high-order byte comes first */

    public:
        UCS_2_Unicode_MBCSdecoder(bool isBigEndian = true)
        {
            setBigEndian(isBigEndian);
        }

        void setBigEndian(bool isBigEndian = true)
        {
            bigEndian = isBigEndian;
        }

        virtual string name(void)
        {
            return "UCS_2_Unicode";
        }

        virtual int getNextDecodedChar(void);   /* Get next decoded character */
    };
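Because the decoder leaves byte-order sensing to its caller, something upstream has to look at the Byte Order Mark. The following is a minimal sketch of how that might be done, not code from the program itself: the `Ucs2ByteOrder` enumeration and the `detectUcs2ByteOrder` function are hypothetical names introduced only for illustration. It relies on the fact that the mark, U+FEFF, appears as the bytes FE FF when stored big-endian and FF FE when stored little-endian.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of the original program.
enum class Ucs2ByteOrder { BigEndian, LittleEndian, Unknown };

// Inspect the first two bytes of a buffer for the Byte Order Mark U+FEFF.
// Returns Unknown when the buffer is too short or carries no mark.
Ucs2ByteOrder detectUcs2ByteOrder(const std::string &data)
{
    if (data.size() < 2) {
        return Ucs2ByteOrder::Unknown;
    }
    const unsigned char b0 = static_cast<unsigned char>(data[0]);
    const unsigned char b1 = static_cast<unsigned char>(data[1]);
    if (b0 == 0xFE && b1 == 0xFF) {
        return Ucs2ByteOrder::BigEndian;      // Mark stored high byte first
    }
    if (b0 == 0xFF && b1 == 0xFE) {
        return Ucs2ByteOrder::LittleEndian;   // Mark stored low byte first
    }
    return Ucs2ByteOrder::Unknown;
}
```

A caller could pass the result to setBigEndian (for instance, `decoder.setBigEndian(order == Ucs2ByteOrder::BigEndian)`) and skip the two mark bytes before decoding. What to do in the Unknown case is a policy decision this sketch leaves open.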

82. Decode the next logical character. We return -1 when the end of the encoded line is encountered.

〈 Class implementations 11 〉 +≡
    int UCS_2_Unicode_MBCSdecoder::getNextDecodedChar(void)
    {
        int c1 = getNextEncodedByte();
        if (c1 == -1) {
            return -1;              /* Clean end of line: no character pending */
        }

        int c2 = getNextEncodedByte();
        if (c2 == -1) {
            /* First byte present but second missing: truncated character */
            ostringstream os;

            os << name() << "_MBCSdecoder: Premature end of line in two byte character.";
            reportDecoderDiagnostic(os);
            return -1;
        }

        if (bigEndian) {
            c1 = (c1 << 8) | c2;    /* High-order byte arrived first */
        }
        else {
            c1 |= (c2 << 8);        /* Low-order byte arrived first */
        }
        return c1;
    }
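To make the byte-combining step concrete, here is a stand-alone sketch that applies the same shift-and-or logic to a whole buffer instead of pulling bytes from the decoder's input stream. The function name `decodeUcs2` is hypothetical and not part of the program. Unlike the method above, it reports no diagnostic and silently ignores a trailing odd byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-alone helper: decode a buffer of UCS-2 bytes into
// 16-bit code values, two bytes at a time, in the requested byte order.
std::vector<int> decodeUcs2(const std::string &bytes, bool bigEndian)
{
    std::vector<int> out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Convert through unsigned char so bytes >= 0x80 do not sign-extend.
        const int c1 = static_cast<unsigned char>(bytes[i]);
        const int c2 = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(bigEndian ? ((c1 << 8) | c2)     // first byte is high-order
                                : (c1 | (c2 << 8)));   // first byte is low-order
    }
    return out;
}
```

The bytes 00 41 03 A9 decode to 0x0041 and 0x03A9 ("A" and "Ω") when read big-endian, but to 0x4100 and 0xA903 when read little-endian, which is why the byte order has to be set correctly before the decoder is used.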
