12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab



71. EUC decoder.

This decoder extracts logical characters from byte streams encoded in EUC encoding. In EUC, if a byte in the input stream is in the range 0xA1–0xFE and the subsequent byte in the range 0x80–0xFF, then the variant fields encoded in the two bytes define the character code. A byte not within the range of the first byte of a two-byte character is interpreted as a single-byte character with ASCII/ISO-8859 semantics.
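The byte ranges above can be restated as small predicates. The following is a minimal sketch only: the helper names `isEucLeadByte`, `isEucTrailByte`, and `combineEucPair` are hypothetical and do not appear in the program. The way the pair is combined (first byte in the high eight bits) follows the decoder implementation in section 72.

```cpp
// Hypothetical helpers, not part of the original program. They restate the
// byte ranges given in the text for EUC two-byte characters.

// True if c can be the first byte of a two-byte character (0xA1..0xFE).
constexpr bool isEucLeadByte(int c) { return c >= 0xA1 && c <= 0xFE; }

// True if c is acceptable as the second byte of a two-byte character (0x80..0xFF).
constexpr bool isEucTrailByte(int c) { return c >= 0x80 && c <= 0xFF; }

// Combine the two bytes into one character code, first byte in the high
// eight bits, as the decoder in section 72 does.
constexpr int combineEucPair(int lead, int trail) { return (lead << 8) | trail; }
```

For instance, under these assumptions the byte pair 0xA4, 0xA2 yields the character code 0xA4A2, while 0x41 is not a lead byte and would be treated as a single-byte character.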

〈Class definitions 10〉 +≡
class EUC_MBCSdecoder : public MBCSdecoder {
public:
    virtual string name(void)
    {
        return "EUC";
    }

    virtual int getNextDecodedChar(void);    /* Get next decoded byte */
};

72. Bytes are parsed from the input stream as follows. Any bytes with values within the range 0xA1–0xFE denote the first byte of a two-byte character, whose second byte must be within the range 0x80–0xFF. Any violation of the constraints on the second byte indicates an invalid sequence. Characters outside the range of initial characters are considered single-byte codes. We return -1 when the end of the encoded line is encountered.

〈Class implementations 11〉 +≡
int EUC_MBCSdecoder::getNextDecodedChar(void)
{
    int c1 = getNextEncodedByte();

    if ((c1 >= 0xA1) && (c1 <= 0xFE)) {
        int c2 = getNextEncodedByte();

        if ((c2 >= 0x80) && (c2 <= 0xFF)) {
            return (c1 << 8) | c2;
        }
        if (c2 == -1) {
            ostringstream os;

            os << name() << "_MBCSdecoder: Premature end of line in two byte character.";
            reportDecoderDiagnostic(os);
            return -1;
        }

        /* Odds are that once we've encountered an invalid second byte, the balance of the
           encoded line will be screwed up as well. To avoid such blithering, discard the line
           after such an error. */
        discardLine();
        ostringstream os;

        os << name() << "_MBCSdecoder: Invalid second byte in two byte character: " "0x" <<
            setiosflags(ios::uppercase) << hex << c1 << " " << "0x" << c2 << ".";
        reportDecoderDiagnostic(os);
        return c1;
    }
    return c1;
}
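
To try the parsing rules in isolation, the same logic can be run over an in-memory line. This is a self-contained sketch, not the program's own code: `ByteLine` and `decodeEucLine` are hypothetical stand-ins for the `MBCSdecoder` base-class machinery (`getNextEncodedByte`, `discardLine`), and diagnostics are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the decoder's input line: yields byte values
// 0..255 in order, then -1 once the end of the line is reached.
class ByteLine {
public:
    explicit ByteLine(const std::string &s) : data(s), pos(0) {}

    int next() {
        assert(pos <= data.size());             // cursor never passes the end
        if (pos == data.size()) return -1;
        return static_cast<unsigned char>(data[pos++]);
    }

    void discard() { pos = data.size(); }       // drop the rest of the line

private:
    std::string data;
    std::size_t pos;
};

// Decode one line using the rules described in section 72 and collect the
// character codes that successive calls to the decoder would return.
std::vector<int> decodeEucLine(const std::string &line) {
    ByteLine in(line);
    std::vector<int> out;

    for (;;) {
        int c1 = in.next();
        if (c1 == -1) break;                    // end of line
        if (c1 >= 0xA1 && c1 <= 0xFE) {         // first byte of a two-byte character
            int c2 = in.next();
            if (c2 >= 0x80 && c2 <= 0xFF) {
                out.push_back((c1 << 8) | c2);  // valid pair
                continue;
            }
            if (c2 == -1) break;                // premature end of line
            in.discard();                       // invalid second byte: discard the line
        }
        out.push_back(c1);                      // single byte, or lead byte of a bad pair
    }
    return out;
}
```

Under these assumptions, a line holding the bytes 0x41, 0xA4, 0xA2, 0x42 decodes to the codes 0x41, 0xA4A2, 0x42. A lead byte followed by an invalid second byte yields just the lead byte, and the remainder of the line is dropped, matching the behaviour of the implementation above.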
