15.08.2013 Views

General Computer Science 320201 GenCS I & II Lecture ... - Kwarc


Idea: Unicode supports multiple encodings (but not character sets) for efficiency

Definition 226 (Unicode Transformation Format)

UTF-8, an 8-bit, variable-width encoding, which maximizes compatibility with ASCII.

UTF-16, a 16-bit, variable-width encoding (popular in Asia)

UTF-32, a 32-bit, fixed-width encoding (for safety)
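The difference between variable-width and fixed-width encodings can be observed directly. The following is a minimal sketch that relies on Python's built-in codecs; the helper name `encoded_sizes` is hypothetical, and the `-be` codec variants are chosen only so that no byte-order mark is added to the output.

```python
def encoded_sizes(character):
    """Return the number of bytes a single character occupies in each UTF."""
    # The "-be" (big-endian) codecs do not prepend a byte-order mark,
    # so the lengths below count only the bytes of the character itself.
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(character.encode(name)) for name in encodings}


if __name__ == "__main__":
    for character in ("$", "\u20ac", "\U0001F600"):
        print(f"U+{ord(character):06X}", encoded_sizes(character))
```

For `$` (U+0024) the sizes are 1, 2 and 4 bytes; for U+20AC they are 3, 2 and 4 bytes; for a character outside the first 16-bit range such as U+1F600, all three encodings need 4 bytes.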

Definition 227 The UTF-8 encoding uses the following encoding scheme:

Unicode range          Byte 1    Byte 2    Byte 3    Byte 4

U+000000 − U+00007F    0xxxxxxx

U+000080 − U+0007FF    110xxxxx  10xxxxxx

U+000800 − U+00FFFF    1110xxxx  10xxxxxx  10xxxxxx

U+010000 − U+10FFFF    11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

Example 228 $ = U+0024 is encoded as 00100100 (one byte)

¢ = U+00A2 is encoded as 11000010, 10100010 (two bytes)

€ = U+20AC is encoded as 11100010, 10000010, 10101100 (three bytes)
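The encoding scheme of Definition 227 can be sketched as a small function that distributes the bits of a code point over the x positions of the matching row. This is an illustrative sketch, not a full UTF-8 implementation: the names `utf8_encode` and `as_bits` are hypothetical, and the sketch follows the four ranges of the table literally, so it does not reject the surrogate range U+D800 to U+DFFF, which real UTF-8 encoders exclude.

```python
def utf8_encode(code_point):
    """Encode one code point according to the four rows of the table."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+000000 .. U+10FFFF")
    if code_point <= 0x7F:        # row 1: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # row 2: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:      # row 3: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # row 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


def as_bits(data):
    """Render bytes as space-separated 8-bit binary strings."""
    return " ".join(f"{byte:08b}" for byte in data)


if __name__ == "__main__":
    for code_point in (0x24, 0xA2, 0x20AC):
        print(f"U+{code_point:04X}", as_bits(utf8_encode(code_point)))
```

Applied to U+0024, U+00A2 and U+20AC, the sketch reproduces the three bit patterns of Example 228.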

©: Michael Kohlhase 129

Note how the fixed bit prefixes in the encoding are engineered to determine which of the four cases applies, so that UTF-8 encoded documents can be safely decoded.
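To illustrate how the prefixes drive decoding, here is a minimal sketch in which the first byte of each sequence alone determines how many bytes belong to the character. The names `sequence_length` and `utf8_decode` are hypothetical, and the sketch only checks the prefixes; it does not reject overlong encodings or other ill-formed input that a production decoder would have to handle.

```python
def sequence_length(first_byte):
    """Read the fixed prefix of a lead byte to find the sequence length."""
    if first_byte >> 7 == 0b0:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    if first_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a valid lead byte")


def utf8_decode(data):
    """Decode UTF-8 bytes into a list of code points."""
    code_points = []
    position = 0
    while position < len(data):
        length = sequence_length(data[position])
        if position + length > len(data):
            raise ValueError("truncated sequence")
        # Keep only the payload (x) bits of the lead byte.
        mask = 0x7F if length == 1 else 0xFF >> (length + 1)
        code_point = data[position] & mask
        for byte in data[position + 1:position + length]:
            if byte >> 6 != 0b10:
                raise ValueError("expected a continuation byte 10xxxxxx")
            code_point = (code_point << 6) | (byte & 0b111111)
        code_points.append(code_point)
        position += length
    return code_points
```

Because a continuation byte always starts with 10 and a lead byte never does, a decoder that starts in the middle of a character can detect this and fail or resynchronize rather than silently misread the text.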

2.4.4 Formal Languages and Meaning

After we have studied the elementary theory of codes for strings, we will come to string representations of structured objects like terms. For these we will need more refined methods.

As we have started out the course with unary natural numbers and added the arithmetical operations to the mix later, we will use unary arithmetics as our running example and object of study.

A Formal Language for Unary Arithmetics

Idea: Start with something very simple: unary arithmetics (i.e. ℕ with addition, multiplication, subtraction, and integer division)

$E_{un}$ is based on the alphabet $\Sigma_{un} := C_{un} \cup V \cup F^2_{un} \cup B$, where

$C_{un} := \{/\}^*$ is a set of constant names,

$V := \{x\} \times \{1, \ldots, 9\} \times \{0, \ldots, 9\}^*$ is a set of variable names,

$F^2_{un} := \{\mathrm{add}, \mathrm{sub}, \mathrm{mul}, \mathrm{div}, \mathrm{mod}\}$ is a set of (binary) function names, and

$B := \{(, )\} \cup \{,\}$ is a set of structural characters (the characters “,”, “(” and “)”).
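The four components of the alphabet can be told apart mechanically. The following sketch classifies a single symbol; the function name `classify` and the returned labels are hypothetical, and variable names are read as strings (an x, a digit from 1 to 9, then any digits) rather than as triples. Read literally, $\{/\}^*$ also contains the empty string, so the sketch accepts it as a constant name.

```python
import re

FUNCTION_NAMES = {"add", "sub", "mul", "div", "mod"}
STRUCTURAL = {"(", ")", ","}
VARIABLE = re.compile(r"x[1-9][0-9]*")   # x, one digit 1-9, then any digits
CONSTANT = re.compile(r"/*")             # any number of slashes, including none


def classify(symbol):
    """Return the role of a symbol in the alphabet, or None if it has none."""
    if symbol in STRUCTURAL:
        return "structural"
    if symbol in FUNCTION_NAMES:
        return "function"
    if VARIABLE.fullmatch(symbol):
        return "variable"
    if CONSTANT.fullmatch(symbol):
        return "constant"
    return None
```

For instance, `classify("///")` gives `"constant"` and `classify("x12")` gives `"variable"`, while `classify("x0")` gives `None` because the first digit of a variable name must not be 0.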

Define strings in stages: $E_{un} := \bigcup_{i \in \mathbb{N}} E^i_{un}$, where

$E^1_{un} := C_{un} \cup V$

$E^{i+1}_{un} := \{a, \mathrm{add}(a,b), \mathrm{sub}(a,b), \mathrm{mul}(a,b), \mathrm{div}(a,b), \mathrm{mod}(a,b) \mid a, b \in E^i_{un}\}$

We call a string in $E_{un}$ an expression of unary arithmetics.
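The staged definition translates into a recursive membership test. The sketch below is one possible recognizer, not part of the original notes: the names `parse` and `stage` are hypothetical, and `stage` returns the smallest i for which the string lies in the i-th stage, or `None` if the string is not an expression. As in the definition, constants are runs of slashes (the empty run included) and no whitespace is allowed.

```python
import re

FUNCTION_NAMES = ("add", "sub", "mul", "div", "mod")
VARIABLE = re.compile(r"x[1-9][0-9]*")
CONSTANT = re.compile(r"/*")   # always matches, possibly the empty string


def parse(text, position=0):
    """Parse one expression at position; return (end, stage) or None."""
    for name in FUNCTION_NAMES:
        if text.startswith(name + "(", position):
            left = parse(text, position + len(name) + 1)
            if left is None or not text.startswith(",", left[0]):
                return None
            right = parse(text, left[0] + 1)
            if right is None or not text.startswith(")", right[0]):
                return None
            # An application sits one stage above its deeper argument.
            return right[0] + 1, 1 + max(left[1], right[1])
    # Base case: a variable name, otherwise a (possibly empty) constant.
    match = VARIABLE.match(text, position) or CONSTANT.match(text, position)
    return match.end(), 1


def stage(text):
    """Smallest stage containing text, or None if it is not an expression."""
    result = parse(text)
    if result is None or result[0] != len(text):
        return None
    return result[1]
```

For example, `stage("//")` is 1, `stage("add(//,x1)")` is 2 and `stage("add(//,mul(x1,///))")` is 3, whereas `stage("add(//,x1")` is `None` because the closing parenthesis is missing.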


The first thing we notice is that the alphabet is not just a flat set any more; we have characters with different roles in the alphabet. These roles have to do with the symbols used in the complex
