General Computer Science 320201 GenCS I & II Lecture ... - Kwarc
Now, checking whether a code is a prefix code can be a tedious undertaking: the naive algorithm<br />
for this needs to check all pairs of codewords. Therefore we will look at a couple of properties of<br />
character codes that guarantee the prefix property and thus decodability.<br />
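The naive pairwise check mentioned above can be sketched as follows. This is a minimal illustration, not part of the original notes: it reads ⊳ as "is a proper prefix of", and the function names and the two example codes are hypothetical.

```python
from itertools import permutations


def is_proper_prefix(v: str, w: str) -> bool:
    """True if v is a proper prefix of w (assumed reading of v ⊳ w)."""
    return len(v) < len(w) and w.startswith(v)


def is_prefix_code(code: dict[str, str]) -> bool:
    """Naive check: test every ordered pair of codewords."""
    words = list(code.values())
    return not any(is_proper_prefix(v, w) for v, w in permutations(words, 2))


# Hypothetical example codes over B = {0, 1}
good = {"a": "0", "b": "10", "c": "11"}
bad = {"a": "0", "b": "01", "c": "11"}  # "0" is a proper prefix of "01"

print(is_prefix_code(good))  # True
print(is_prefix_code(bad))   # False
```

For n codewords this inspects n·(n−1) ordered pairs, which is why conditions that guarantee the prefix property without any checking are attractive.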
Sufficient Conditions for Prefix Codes<br />
Theorem 215 If c is a code with |c(a)| = k for all a ∈ A for some k ∈ N, then c is a prefix<br />
code.<br />
Proof: by contradiction.<br />
P.1 If c is not a prefix code, then there are a, b ∈ A with c(a) ⊳ c(b).<br />
P.2 Then clearly |c(a)| < |c(b)|, which contradicts our assumption that |c(a)| = |c(b)| = k.<br />
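A practical consequence of Theorem 215 is that a fixed-length code can be decoded by cutting the input into blocks of k characters. The sketch below assumes a hypothetical 2-bit code for three letters; the helper name `decode_fixed` is an illustration and does not come from the notes.

```python
def decode_fixed(bits: str, code: dict[str, str]) -> str:
    """Decode a string produced by a code whose codewords all have length k."""
    lengths = {len(w) for w in code.values()}
    if len(lengths) != 1:
        raise ValueError("not a fixed-length code")
    k = lengths.pop()
    inverse = {w: a for a, w in code.items()}
    # Cut the input into consecutive blocks of k characters and look each up.
    return "".join(inverse[bits[i:i + k]] for i in range(0, len(bits), k))


# Hypothetical code with |c(a)| = 2 for every character
fixed = {"a": "00", "b": "01", "c": "10"}
encoded = "".join(fixed[ch] for ch in "cab")

print(encoded)                       # 100001
print(decode_fixed(encoded, fixed))  # cab
```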
Theorem 216 Let c: A → B⁺ be a code and ∗ ∉ B be a character, then there is a prefix<br />
code c∗: A → (B ∪ {∗})⁺, such that c(a) ⊳ c∗(a) for all a ∈ A.<br />
Proof: Let c∗(a) := c(a) + "∗" for all a ∈ A.<br />
P.1 Obviously, c(a) ⊳ c∗(a).<br />
P.2 If c∗ is not a prefix code, then there are a, b ∈ A with c∗(a) ⊳ c∗(b).<br />
P.3 So, c∗(b) contains the character ∗ not only at the end but also somewhere in the middle.<br />
P.4 This contradicts our construction c∗(b) = c(b) + "∗", where c(b) ∈ B⁺.<br />
©: Michael Kohlhase 124<br />
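The construction in the proof of Theorem 216 can be tried out directly. The sketch below is an illustration under stated assumptions: the starting code `c` is a made-up example that is not a prefix code, `"*"` plays the role of the fresh character ∗, and both function names are hypothetical.

```python
def add_terminator(code: dict[str, str], star: str = "*") -> dict[str, str]:
    """Build c* from c by appending a fresh character to every codeword."""
    if any(star in w for w in code.values()):
        raise ValueError("the terminator must not occur in any codeword")
    return {a: w + star for a, w in code.items()}


def is_prefix_code(code: dict[str, str]) -> bool:
    """Naive pairwise check that no codeword is a proper prefix of another."""
    words = list(code.values())
    return not any(v != w and w.startswith(v) for v in words for w in words)


# Hypothetical code over B = {0, 1} that is NOT a prefix code
c = {"a": "0", "b": "01", "c": "011"}
c_star = add_terminator(c)

print(is_prefix_code(c))       # False
print(c_star)                  # {'a': '0*', 'b': '01*', 'c': '011*'}
print(is_prefix_code(c_star))  # True
```

The guard in `add_terminator` mirrors the requirement that the added character lies outside B. Without it, the argument in steps P.3 and P.4 would not go through.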
2.4.3 Character Codes in the Real World<br />
We will now turn to a class of codes that are extremely important in information technology:<br />
character encodings. The idea here is that for IT systems we need to encode characters from<br />
our alphabets as bit strings (sequences of binary digits 0 and 1) for representation in computers.<br />
Indeed the Morse code we have seen above can be seen as a very simple example of a character<br />
encoding that is geared towards the manual transmission of natural languages over telegraph lines.<br />
For the encoding of written texts we need more extensive codes that can e.g. distinguish upper<br />
and lowercase letters.<br />
The ASCII code we will introduce here is one of the first standardized and widely used character<br />
encodings for a complete alphabet. It is still widely used today. The code tries to strike a balance<br />
between being able to encode a large set of characters and the representational capabilities<br />
in the time of punch cards (cardboard cards that represented sequences of binary numbers by<br />
rectangular arrays of dots).⁶<br />
The ASCII Character Code<br />
Definition 217 The American Standard Code for Information Interchange (ASCII) code<br />
assigns characters to the numbers 0-127.<br />
Code ···0 ···1 ···2 ···3 ···4 ···5 ···6 ···7 ···8 ···9 ···A ···B ···C ···D ···E ···F<br />
0··· NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI<br />
1··· DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US<br />
2··· SP ! " # $ % & ' ( ) * + , - . /<br />
3··· 0 1 2 3 4 5 6 7 8 9 : ; < = > ?<br />
4··· @ A B C D E F G H I J K L M N O<br />
5··· P Q R S T U V W X Y Z [ \ ] ^ _<br />
6··· ` a b c d e f g h i j k l m n o<br />
7··· p q r s t u v w x y z { | } ~ DEL<br />
6 EdNote: is the 7-bit grouping really motivated by the cognitive limit?<br />
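To connect the table with the bit-string view of character encodings, the following sketch looks up ASCII numbers with Python's built-in `ord` and `str.encode`, then prints each as a 7-bit string. The helper names `table_position` and `ascii_bits` are hypothetical. The row and column are read off as the two hexadecimal digits of the code number, as in the table above.

```python
def table_position(ch: str) -> tuple[str, str]:
    """Return (row, column) of a character in the 8 x 16 ASCII table."""
    n = ord(ch)
    if n > 127:
        raise ValueError("not an ASCII character")
    # High three bits select the row, low four bits select the column.
    return format(n >> 4, "X"), format(n & 0xF, "X")


def ascii_bits(text: str) -> list[str]:
    """Encode each character as a 7-bit string (fixed length k = 7)."""
    codes = text.encode("ascii")  # raises UnicodeEncodeError outside 0-127
    return [format(n, "07b") for n in codes]


print(ord("A"))             # 65
print(table_position("A"))  # ('4', '1')
print(ascii_bits("Hi"))     # ['1001000', '1101001']
```

Since every codeword has length 7, Theorem 215 applies and ASCII is a prefix code.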