General Computer Science 320201 GenCS I & II Lecture ... - Kwarc
Idea: Unicode supports multiple encodings (but not character sets) for efficiency
Definition 226 (Unicode Transformation Format)
UTF-8, 8-bit, variable-width encoding, which maximizes compatibility with ASCII.
UTF-16, 16-bit, variable-width encoding (popular in Asia)
UTF-32, a 32-bit, fixed-width encoding (for safety)
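To make the difference between variable-width and fixed-width encodings concrete, the following sketch counts the bytes each format needs for a few characters. It relies on Python's built-in codecs; the helper name `encoded_lengths` is made up for this illustration, and the `-le` codec variants are chosen only so that no byte order mark is prepended.

```python
# Count how many bytes each Unicode Transformation Format needs for one character.
# The "-le" (little-endian) codecs are used so that Python adds no byte order mark.
CODECS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_lengths(ch):
    """Map each codec name to the number of bytes it uses for the character ch."""
    return {codec: len(ch.encode(codec)) for codec in CODECS}


if __name__ == "__main__":
    # "$" is ASCII, the euro sign lies in the 16-bit range, U+1D11E lies beyond it.
    for ch in ("$", "\u20ac", "\U0001D11E"):
        print(f"U+{ord(ch):06X}", encoded_lengths(ch))
```

UTF-32 always uses four bytes, whereas UTF-8 and UTF-16 grow with the code point.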
Definition 227 The UTF-8 encoding uses the following encoding scheme:
Unicode               Byte1     Byte2     Byte3     Byte4
U+000000 − U+00007F   0xxxxxxx
U+000080 − U+0007FF   110xxxxx  10xxxxxx
U+000800 − U+00FFFF   1110xxxx  10xxxxxx  10xxxxxx
U+010000 − U+10FFFF   11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
Example 228 $ = U+0024 is encoded as 00100100 (1 byte)
¢ = U+00A2 is encoded as 11000010,10100010 (two bytes)
€ = U+20AC is encoded as 11100010,10000010,10101100 (three bytes)
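The encoding scheme of Definition 227 can be sketched as a small function that picks the row by code point range and distributes the payload bits over the bytes. The names `utf8_encode` and `bits` are chosen for this illustration only, and the sketch assumes that any integer from 0 to 0x10FFFF is acceptable input (it does not treat surrogate code points specially).

```python
def utf8_encode(code_point):
    """Encode one code point following the four rows of the UTF-8 scheme."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:            # row 1: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:           # row 2: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:          # row 3: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # row 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


def bits(data):
    """Render bytes as comma-separated 8-bit binary strings."""
    return ",".join(f"{byte:08b}" for byte in data)


if __name__ == "__main__":
    for cp in (0x24, 0xA2, 0x20AC):
        print(f"U+{cp:04X}", bits(utf8_encode(cp)))
```

Run on the three code points of Example 228, this reproduces the bit patterns given there.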
©: Michael Kohlhase 129
Note how the fixed bit prefixes in the encoding are engineered to determine which of the four cases
applies, so that UTF-8 encoded documents can be safely decoded.
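As an illustration of how the prefixes drive decoding, here is a minimal sketch that reads the first byte of each sequence, derives the sequence length from its prefix, and then collects the payload bits. The names `sequence_length` and `utf8_decode` are hypothetical, and the sketch deliberately omits checks a production decoder needs (for instance rejecting overlong encodings).

```python
def sequence_length(lead):
    """Determine from the prefix of the first byte how many bytes the sequence has."""
    if lead >> 7 == 0b0:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("byte cannot start a UTF-8 sequence")


def utf8_decode(data):
    """Decode UTF-8 bytes into a list of code points (simplified sketch)."""
    code_points = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence")
        # Keep only the payload bits (the x positions) of the first byte.
        cp = data[i] if n == 1 else data[i] & (0xFF >> (n + 1))
        for byte in data[i + 1:i + n]:
            if byte >> 6 != 0b10:
                raise ValueError("continuation byte must start with 10")
            cp = (cp << 6) | (byte & 0b111111)
        code_points.append(cp)
        i += n
    return code_points


if __name__ == "__main__":
    sample = bytes([0b00100100,
                    0b11000010, 0b10100010,
                    0b11100010, 0b10000010, 0b10101100])
    print([f"U+{cp:04X}" for cp in utf8_decode(sample)])
```

Because a continuation byte always starts with 10 and no first byte does, a decoder that lands in the middle of a sequence can detect this rather than misread the data.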
2.4.4 Formal Languages and Meaning<br />
After we have studied the elementary theory of codes for strings, we will come to string representations<br />
of structured objects like terms. For these we will need more refined methods.<br />
As we started the course with unary natural numbers and later added the arithmetical
operations to the mix, we will use unary arithmetics as our running example and
object of study.
A formal Language for Unary Arithmetics<br />
Idea: Start with something very simple: Unary Arithmetics<br />
(i.e. N with addition, multiplication, subtraction, and integer division)<br />
$E_{un}$ is based on the alphabet $\Sigma_{un} := C_{un} \cup V \cup F^2_{un} \cup B$, where
$C_{un} := \{/\}^*$ is a set of constant names,
$V := \{x\} \times \{1, \ldots, 9\} \times \{0, \ldots, 9\}^*$ is a set of variable names,
$F^2_{un} := \{add, sub, mul, div, mod\}$ is a set of (binary) function names, and
$B := \{(, )\} \cup \{,\}$ is a set of structural characters (the characters ",", "(", and ")").
Define strings in stages: $E_{un} := \bigcup_{i \in \mathbb{N}} E^i_{un}$, where
$E^1_{un} := C_{un} \cup V$
$E^{i+1}_{un} := \{a, add(a,b), sub(a,b), mul(a,b), div(a,b), mod(a,b) \mid a, b \in E^i_{un}\}$
We call a string in $E_{un}$ an expression of unary arithmetics.
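The staged definition can be read as a recipe for a recognizer: a string is an expression if it is a constant, a variable, or a function name applied to two expressions. The following is a minimal sketch under that reading. The names `parse_expression`, `expect`, and `is_expression` are made up for this illustration. Since the set of constant names is defined as all strings over "/", it contains the empty string, so the sketch accepts the empty string as a constant too.

```python
import re

FUNCTIONS = ("add", "sub", "mul", "div", "mod")   # binary function names
VARIABLE = re.compile(r"x[1-9][0-9]*")            # x, one digit 1-9, then digits 0-9
CONSTANT = re.compile(r"/*")                      # any number of slashes, possibly none


def expect(s, pos, char):
    """Check that the structural character char occurs at pos; return the next position."""
    if not s.startswith(char, pos):
        raise ValueError(f"expected {char!r} at position {pos}")
    return pos + 1


def parse_expression(s, pos=0):
    """Return the position just after the expression that starts at pos."""
    for name in FUNCTIONS:
        if s.startswith(name + "(", pos):
            pos = parse_expression(s, pos + len(name) + 1)  # first argument
            pos = expect(s, pos, ",")
            pos = parse_expression(s, pos)                  # second argument
            return expect(s, pos, ")")
    match = VARIABLE.match(s, pos)
    if match:
        return match.end()
    return CONSTANT.match(s, pos).end()   # always matches, possibly the empty constant


def is_expression(s):
    """Decide whether the whole string s is an expression of unary arithmetics."""
    try:
        return parse_expression(s) == len(s)
    except ValueError:
        return False


if __name__ == "__main__":
    for text in ("//", "x12", "add(//,x1)", "mul(add(/,x12),//)", "x0", "add(//)"):
        print(repr(text), is_expression(text))
```

The recursion depth of `parse_expression` on an accepted string corresponds to the stage at which that string first appears in the definition.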
The first thing we notice is that the alphabet is not just a flat set any more; we have characters
with different roles in the alphabet. These roles have to do with the symbols used in the complex<br />