The Annoyance Filter.pdf - Fourmilab
71. EUC decoder.
This decoder extracts logical characters from byte streams encoded in EUC. In EUC, if a byte
in the input stream is in the range 0xA1–0xFE and the subsequent byte is in the range 0x80–0xFF, then the
variant fields encoded in the two bytes define the character code. A byte outside the range of the first
byte of a two-byte character is interpreted as a single-byte character with ASCII/ISO-8859 semantics.
〈 Class definitions 10 〉 +≡
class EUC_MBCSdecoder : public MBCSdecoder {
public:
    virtual string name(void)
    {
        return "EUC";
    }
    virtual int getNextDecodedChar(void);    /* Get next decoded byte */
};
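The byte ranges described above can be restated as three small predicates. This is only a sketch: the names isEucLeadByte, isEucTrailByte and combineEuc are hypothetical and do not appear in the program itself. The worked example uses the byte pair 0xA4 0xA2, which is the EUC-JP encoding of the hiragana "あ".

```cpp
// Hypothetical helpers (not part of the original program) restating the
// byte ranges given in the text.

// True if b can start a two-byte character.
constexpr bool isEucLeadByte(int b) { return b >= 0xA1 && b <= 0xFE; }

// True if b is acceptable as the second byte of a two-byte character.
constexpr bool isEucTrailByte(int b) { return b >= 0x80 && b <= 0xFF; }

// Pack the two bytes into one character code, first byte in the high octet.
constexpr int combineEuc(int lead, int trail) { return (lead << 8) | trail; }

// Compile-time checks of the worked example.
static_assert(isEucLeadByte(0xA4), "0xA4 starts a two-byte character");
static_assert(isEucTrailByte(0xA2), "0xA2 is a valid second byte");
static_assert(combineEuc(0xA4, 0xA2) == 0xA4A2, "bytes pack into 0xA4A2");
static_assert(!isEucLeadByte(0x41), "ASCII 'A' is a single-byte character");
```

Because the first byte occupies the high octet, every two-byte character code is greater than 0xFF, so it cannot collide with a single-byte code.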
72. Bytes are parsed from the input stream as follows. Any byte with a value within the range
0xA1–0xFE denotes the first byte of a two-byte character, whose second byte must be within the range
0x80–0xFF. Any violation of the constraint on the second byte indicates an invalid sequence. Bytes
outside the range of initial bytes are considered single-byte codes. We return −1 when the end of
the encoded line is encountered.
〈 Class implementations 11 〉 +≡
int EUC_MBCSdecoder::getNextDecodedChar(void)
{
    int c1 = getNextEncodedByte();
    if ((c1 >= 0xA1) && (c1 <= 0xFE)) {
        int c2 = getNextEncodedByte();
        if ((c2 >= 0x80) && (c2 <= 0xFF)) {
            return (c1 << 8) | c2;
        }
        if (c2 == -1) {
            ostringstream os;
            os << name() << "_MBCSdecoder: Premature end of line in two byte character.";
            reportDecoderDiagnostic(os);
            return -1;
        }
        /* Odds are that once we've encountered an invalid second byte, the balance of the
           encoded line will be screwed up as well. To avoid such blithering, discard the line after
           such an error. */
        discardLine();
        ostringstream os;
        os << name() << "_MBCSdecoder: Invalid second byte in two byte character: 0x" <<
            setiosflags(ios::uppercase) << hex << c1 << " " << "0x" << c2 << ".";
        reportDecoderDiagnostic(os);
        return c1;
    }
    return c1;
}
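To try the three outcomes (valid pair, premature end of line, invalid second byte) in isolation, the same decision sequence can be run against a string buffer. The sketch below is not the program's own class: LineSource, nextEucChar and decodeLine are hypothetical stand-ins, and diagnostics are collected in a vector rather than passed to reportDecoderDiagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the decoder's input side: one encoded line
// held in a std::string and read a byte at a time.
struct LineSource {
    std::string line;
    std::size_t pos = 0;
    std::vector<std::string> diagnostics;    // collected instead of reported

    // Return the next byte as 0..255, or -1 at end of line.
    int getNextEncodedByte() {
        if (pos >= line.size()) return -1;
        return static_cast<unsigned char>(line[pos++]);
    }

    // Skip whatever remains of the current line.
    void discardLine() { pos = line.size(); }
};

// Same decision sequence as getNextDecodedChar in the text.
int nextEucChar(LineSource &src) {
    int c1 = src.getNextEncodedByte();
    assert(c1 >= -1 && c1 <= 0xFF);          // the stand-in yields nothing else
    if (c1 >= 0xA1 && c1 <= 0xFE) {
        int c2 = src.getNextEncodedByte();
        if (c2 >= 0x80 && c2 <= 0xFF) return (c1 << 8) | c2;
        if (c2 == -1) {
            src.diagnostics.push_back("premature end of line");
            return -1;
        }
        src.discardLine();                   // rest of the line is suspect
        src.diagnostics.push_back("invalid second byte");
    }
    return c1;
}

// Decode one line into a vector of character codes, stopping at -1.
std::vector<int> decodeLine(const std::string &bytes) {
    LineSource src;
    src.line = bytes;
    std::vector<int> out;
    for (int c = nextEucChar(src); c != -1; c = nextEucChar(src)) out.push_back(c);
    return out;
}
```

Note that after an invalid second byte the first byte is still returned as if it were a single-byte character, and the following call returns −1 because the line has been discarded.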