12.06.2015 Views

The Annoyance Filter.pdf - Fourmilab


§79 ANNOYANCE-FILTER SHIFT-JIS DECODER

79. The four additional characters added by the Macintosh are 0x80 (backslash), 0xFD (copyright symbol), 0xFE (trademark symbol), and 0xFF (ellipsis). We check for them and translate them into plausible ISO 8859 replacements, expanding as necessary into multiple-character sequences via the pending string mechanism.

〈 Check for Macintosh-specific single byte characters and translate 79 〉 ≡
    switch (c1) {
        case 0x80:
            c1 = '\\'; /* Macintosh backslash */
            break;
        case 0xFD:
            c1 = 0xA9; /* ISO 8859 © symbol */
            break;
        case 0xFE:
            c1 = 'T'; /* Trademark (TM) symbol */
            pending = "M";
            break;
        case 0xFF: /* Ellipsis ("...") */
            c1 = '.';
            pending = "..";
            break;
    }

This code is used in section 76.
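
The pending string mechanism referred to above can be illustrated with a small self-contained sketch. It is not the program's actual decoder: the class `PendingDemo` and the helper `decodeAll` are hypothetical names invented for this example, and the input is assumed to consist only of single-byte characters. The idea it shows is that a decoder which returns one character per call can still expand one input byte into several output characters, by returning the first immediately and queuing the rest for later calls.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature decoder (not the program's real class): it returns
// one character per call and queues any overflow in `pending`.
class PendingDemo {
public:
    explicit PendingDemo(const std::string &input) : src(input), pos(0) {}

    // Returns the next output character, or -1 at end of input.
    int getNextDecodedChar() {
        if (!pending.empty()) {            // drain queued characters first
            int c = static_cast<unsigned char>(pending[0]);
            pending.erase(0, 1);
            return c;
        }
        if (pos >= src.size()) {
            return -1;
        }
        int c1 = static_cast<unsigned char>(src[pos++]);
        assert(pending.empty());           // new input is read only when drained
        switch (c1) {
            case 0x80: c1 = '\\'; break;                 // Macintosh backslash
            case 0xFD: c1 = 0xA9; break;                 // ISO 8859 copyright sign
            case 0xFE: c1 = 'T'; pending = "M";  break;  // trademark becomes "TM"
            case 0xFF: c1 = '.'; pending = ".."; break;  // ellipsis becomes "..."
        }
        return c1;
    }

private:
    std::string src;               // assumed input: single-byte characters only
    std::string::size_type pos;    // index of the next unread input byte
    std::string pending;           // characters owed to the caller
};

// Hypothetical convenience helper: run the decoder until input is exhausted.
std::string decodeAll(const std::string &input) {
    PendingDemo d(input);
    std::string out;
    for (int c; (c = d.getNextDecodedChar()) >= 0; ) {
        out += static_cast<char>(c);
    }
    return out;
}
```

Calling `decodeAll` on a string that contains the byte 0xFE yields the two characters `TM` at that position. The caller never needs to know that an expansion took place, which is the point of keeping the queue inside the decoder.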

80. Unicode decoders.<br />

The Unicode character set (itself a subset of the 32-bit ISO 10646 character set) uses a variety of encoding schemes. The Unicode_MBCSdecoder class is the parent class for all specific Unicode decoders and provides common services for them.

〈 Class definitions 10 〉 +≡
    class Unicode_MBCSdecoder : public MBCSdecoder {
    public:
        virtual string name(void)
        {
            return "Unicode";
        }

        virtual int getNextDecodedChar(void) = 0; /* Get next decoded byte */
    };
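
To show how a specific decoder might derive from this parent class, here is a minimal sketch for one encoding scheme, UTF-8. Everything beyond the two member functions shown above is an assumption made for illustration: the `MBCSdecoder` base is reduced to a bare stand-in, `DemoUTF8decoder` is a hypothetical name, and the input comes from an in-memory string rather than the program's own input machinery. The sketch performs only basic validation and does not reject overlong forms or surrogate code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal stand-in for the program's MBCSdecoder base class.
class MBCSdecoder {
public:
    virtual ~MBCSdecoder() {}
    virtual std::string name() = 0;
    virtual int getNextDecodedChar() = 0;
};

// Parent class as defined in the section above.
class Unicode_MBCSdecoder : public MBCSdecoder {
public:
    virtual std::string name() { return "Unicode"; }
    virtual int getNextDecodedChar() = 0;
};

// Hypothetical concrete decoder: reads UTF-8 from an in-memory string.
class DemoUTF8decoder : public Unicode_MBCSdecoder {
public:
    explicit DemoUTF8decoder(const std::string &s) : src(s), pos(0) {}

    virtual std::string name() {
        return "Demo UTF-8 " + Unicode_MBCSdecoder::name();
    }

    // Returns the next code point, or -1 at end of input or on malformed data.
    virtual int getNextDecodedChar() {
        assert(pos <= src.size());
        if (pos >= src.size()) {
            return -1;
        }
        int c = static_cast<unsigned char>(src[pos++]);
        int extra;                          // continuation bytes still expected
        if (c < 0x80) {
            return c;                       // single-byte (ASCII) character
        } else if ((c & 0xE0) == 0xC0) {
            c &= 0x1F; extra = 1;           // two-byte sequence
        } else if ((c & 0xF0) == 0xE0) {
            c &= 0x0F; extra = 2;           // three-byte sequence
        } else if ((c & 0xF8) == 0xF0) {
            c &= 0x07; extra = 3;           // four-byte sequence
        } else {
            return -1;                      // stray continuation or invalid lead byte
        }
        while (extra-- > 0) {
            if (pos >= src.size()) {
                return -1;                  // truncated sequence
            }
            int t = static_cast<unsigned char>(src[pos++]);
            if ((t & 0xC0) != 0x80) {
                return -1;                  // not a continuation byte
            }
            c = (c << 6) | (t & 0x3F);      // append six payload bits
        }
        return c;
    }

private:
    std::string src;
    std::string::size_type pos;
};
```

Because `getNextDecodedChar` is pure virtual in the parent, code that holds a pointer to `Unicode_MBCSdecoder` can pull characters without knowing which encoding scheme is in use. A subclass can also build its own name on top of the parent's `name`, as the sketch does.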
