General Computer Science 320201 GenCS I & II Lecture ... - Kwarc


2.4 Encoding Programs as Strings

With the abstract data types we looked at last, we studied term structures, i.e. complex mathematical objects that are built up from constructors, variables and parameters. The motivation for this was that we wanted to understand SML programs. And indeed we have seen that there is a close connection between SML programs on the one side and abstract data types and procedures on the other side. However, this analysis only holds on a very high level: SML programs are not terms per se, but sequences of characters we type on the keyboard or load from files. We only interpret them as terms in the analysis of programs.

To drive our understanding of programs further, we will first have to understand more about sequences of characters (strings) and the interpretation process that derives structured mathematical objects (like terms) from them. Of course, not every sequence of characters will be interpretable, so we will need a notion of a (legal) well-formed sequence.

2.4.1 Formal Languages

We will now formally define the concept of strings and (building on that) formal languages.

The Mathematics of Strings

Definition 184 An alphabet $A$ is a finite set; we call each element $a \in A$ a character, and an $n$-tuple $s \in A^n$ a string (of length $n$ over $A$).

Definition 185 Note that $A^0 = \{\langle\rangle\}$, where $\langle\rangle$ is the (unique) 0-tuple. With the definition above we consider $\langle\rangle$ as the string of length 0, call it the empty string, and denote it by $\epsilon$.

Note: Sets $\neq$ Strings, e.g. $\{1, 2, 3\} = \{3, 2, 1\}$, but $\langle 1, 2, 3\rangle \neq \langle 3, 2, 1\rangle$.

Notation 186 We will often write a string $\langle c_1, \ldots, c_n\rangle$ as "$c_1 \ldots c_n$", for instance "abc" for $\langle a, b, c\rangle$.

Example 187 Take $A = \{h, 1, /\}$ as an alphabet. Each of the symbols $h$, $1$, and $/$ is a character. The vector $\langle /, /, 1, h, 1\rangle$ is a string of length 5 over $A$.

Definition 188 (String Length) Given a string $s$ we denote its length by $|s|$.
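As a small illustration of Definitions 184 to 188, strings over an alphabet can be modelled as tuples of characters. The following is a minimal sketch in Python (not the SML used in this course); the names `A`, `EPSILON` and `is_string_over` are chosen here for illustration and do not appear in the notes.

```python
# Alphabet from Example 187: a finite set of characters.
A = frozenset({"h", "1", "/"})

# The empty string is modelled as the unique 0-tuple.
EPSILON = ()


def is_string_over(s, alphabet):
    """Return True if the tuple s is a string over alphabet,
    i.e. every component of s is a character of the alphabet."""
    return isinstance(s, tuple) and all(c in alphabet for c in s)


# The string of length 5 from Example 187.
s = ("/", "/", "1", "h", "1")

print(is_string_over(s, A))   # is s a string over A?
print(len(s))                 # the length |s|
print(len(EPSILON))           # the length of the empty string

# Sets ignore the order of their elements, tuples (strings) do not.
print({1, 2, 3} == {3, 2, 1}, (1, 2, 3) == (3, 2, 1))
```

Python's built-in `str` type would serve equally well; tuples are used here only to stay close to the tuple-based definition.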

Definition 189 The concatenation $\mathrm{conc}(s, t)$ of two strings $s = \langle s_1, \ldots, s_n\rangle \in A^n$ and $t = \langle t_1, \ldots, t_m\rangle \in A^m$ is defined as $\langle s_1, \ldots, s_n, t_1, \ldots, t_m\rangle \in A^{n+m}$.

We will often write $\mathrm{conc}(s, t)$ as $s + t$ or simply $st$ (e.g. $\mathrm{conc}(\text{"text"}, \text{"book"}) = \text{"text"} + \text{"book"} = \text{"textbook"}$).
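Concatenation as in Definition 189 can be tried out directly on tuples. This is a sketch in Python under the assumption that strings are modelled as tuples of characters; `conc` mirrors the name used in the definition, while `show` is a hypothetical helper that renders a tuple in the compact quoted notation.

```python
def conc(s, t):
    """Concatenate two strings given as tuples of characters."""
    return s + t  # tuple concatenation: components of s followed by those of t


def show(s):
    """Render a tuple of characters in the compact notation without commas."""
    return "".join(s)


text = tuple("text")   # ('t', 'e', 'x', 't')
book = tuple("book")   # ('b', 'o', 'o', 'k')

st = conc(text, book)
print(show(st))
# The length of the result is the sum of the lengths, matching A^(n+m).
print(len(st) == len(text) + len(book))
# The empty string is neutral for concatenation.
print(conc((), text) == text and conc(text, ()) == text)
```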

©: Michael Kohlhase 116

We have multiple notations for concatenation because it is such a basic and frequently used operation that we need very short notations for it, trusting that the reader can disambiguate based on the context.

Now that we have defined the concept of a string as a sequence of characters, we can go on to give ourselves a way to distinguish between good strings (e.g. programs in a given programming language) and bad strings (e.g. those with syntax errors). The way to do this is via the concept of a formal language, which we are about to define.

Formal Languages

Definition 190 Let $A$ be an alphabet, then we define the sets $A^+ := \bigcup_{i \in \mathbb{N}^+} A^i$ of nonempty strings and $A^* := A^+ \cup \{\epsilon\}$ of strings.
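For a nonempty alphabet, both sets from Definition 190 are infinite, so a program can only enumerate them up to a length bound. The following Python sketch does this, assuming strings are modelled as tuples; the function names and the bound `max_len` are illustrative choices, not part of the notes.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """All n-tuples over the alphabet, i.e. the set A^n, as a list."""
    return [tuple(p) for p in product(sorted(alphabet), repeat=n)]


def a_plus_up_to(alphabet, max_len):
    """The nonempty strings over the alphabet of length at most max_len."""
    result = []
    for i in range(1, max_len + 1):
        result.extend(strings_of_length(alphabet, i))
    return result


def a_star_up_to(alphabet, max_len):
    """All strings of length at most max_len: the empty string plus the nonempty ones."""
    return [()] + a_plus_up_to(alphabet, max_len)


B = {"a", "b"}
print(strings_of_length(B, 0))   # A^0 contains only the empty tuple
print(a_plus_up_to(B, 2))
print(len(a_star_up_to(B, 2)))
```

The bounded lists contain $|A|^i$ strings for each length $i$, so they grow exponentially in the bound.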
