General Computer Science 320201 GenCS I & II Lecture ... - Kwarc
2.4 Encoding Programs as Strings
With the abstract data types we looked at last, we studied term structures, i.e. complex mathematical
objects that were built up from constructors, variables and parameters. The motivation
for this is that we wanted to understand SML programs. And indeed we have seen that there is a
close connection between SML programs on the one side and abstract data types and procedures
on the other side. However, this analysis only holds on a very high level: SML programs are not
terms per se, but sequences of characters that we type on the keyboard or load from files. We only
interpret them as terms in the analysis of programs.
To drive our understanding of programs further, we will first have to understand more about sequences
of characters (strings) and the interpretation process that derives structured mathematical
objects (like terms) from them. Of course, not every sequence of characters will be interpretable,
so we will need a notion of a (legal) well-formed sequence.
2.4.1 Formal Languages<br />
We will now formally define the concept of strings and (building on that) formal languages.<br />
The Mathematics of Strings<br />
Definition 184 An alphabet A is a finite set; we call each element a ∈ A a character, and
an n-tuple s ∈ A^n a string (of length n over A).
Definition 185 Note that A^0 = {〈〉}, where 〈〉 is the (unique) 0-tuple. With the definition
above we consider 〈〉 as the string of length 0, call it the empty string, and denote it by ɛ.
Note: Sets ≠ Strings, e.g. {1, 2, 3} = {3, 2, 1}, but 〈1, 2, 3〉 ≠ 〈3, 2, 1〉.
Notation 186 We will often write a string 〈c_1, ..., c_n〉 as "c_1 ... c_n", for instance "abc"
for 〈a, b, c〉.
Example 187 Take A = {h, 1, /} as an alphabet. Each of the symbols h, 1, and / is a<br />
character. The vector 〈/, /, 1, h, 1〉 is a string of length 5 over A.<br />
Definition 188 (String Length) Given a string s we denote its length with |s|.<br />
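To make the definitions so far concrete, here is a minimal sketch in Python (not the SML used in the course) that models strings as tuples over the alphabet from Example 187. The names `A`, `is_string_over`, `s`, and `empty` are illustrative assumptions and do not appear in the notes.

```python
# Hypothetical names throughout; Python tuples play the role of strings over A.
A = frozenset({"h", "1", "/"})   # the alphabet from Example 187

def is_string_over(s, alphabet):
    """Return True if the tuple s is a string over alphabet,
    i.e. every component of s is a character of the alphabet."""
    return isinstance(s, tuple) and all(c in alphabet for c in s)

s = ("/", "/", "1", "h", "1")    # the string from Example 187
empty = ()                       # the unique 0-tuple, i.e. the empty string

print(is_string_over(s, A))      # is s a string over A?
print(len(s), len(empty))        # the lengths |s| and |empty|
print({1, 2, 3} == {3, 2, 1})    # sets ignore the order of elements
print((1, 2, 3) == (3, 2, 1))    # tuples (strings) do not
```

Choosing tuples rather than Python's built-in `str` keeps the sketch close to the definition of a string as an n-tuple.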
Definition 189 The concatenation conc(s, t) of two strings s = 〈s_1, ..., s_n〉 ∈ A^n and
t = 〈t_1, ..., t_m〉 ∈ A^m is defined as 〈s_1, ..., s_n, t_1, ..., t_m〉 ∈ A^{n+m}.
We will often write conc(s, t) as s + t or simply st
(e.g. conc("text", "book") = "text" + "book" = "textbook").
©: Michael Kohlhase 116
We have multiple notations for concatenation because it is such a basic and frequently used
operation that we need very short notations for it, trusting that the reader can disambiguate
based on the context.
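As an illustration of Definition 189, the following Python sketch implements concatenation on strings modeled as tuples. The function name `conc` follows the notes; the variable names and the tuple representation are assumptions made for this example only.

```python
def conc(s, t):
    """Concatenate two strings given as tuples:
    (s1, ..., sn) and (t1, ..., tm) become (s1, ..., sn, t1, ..., tm)."""
    return s + t  # tuple concatenation appends the components of t to those of s

text = tuple("text")   # the 4-tuple of characters t, e, x, t
book = tuple("book")   # the 4-tuple of characters b, o, o, k
textbook = conc(text, book)

print("".join(textbook))                       # show the result in the short notation
print(len(textbook) == len(text) + len(book))  # the result lies in A^(n+m)
print(conc(text, ()) == text)                  # concatenating the empty string changes nothing
```

The last line checks a direct consequence of the definition: a 0-tuple contributes no components, so the empty string is neutral for concatenation.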
Now that we have defined the concept of a string as a sequence of characters, we can go on to
give ourselves a way to distinguish between good strings (e.g. programs in a given programming
language) and bad strings (e.g. those with syntax errors). The way to do this is by the concept of a
formal language, which we are about to define.
Formal Languages<br />
Definition 190 Let A be an alphabet, then we define the sets A^+ := ⋃_{i ∈ N^+} A^i of nonempty
strings and A^* := A^+ ∪ {ɛ} of strings.
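Since A^+ and A^* are infinite for any nonempty alphabet, they can only be enumerated up to a length bound. The Python sketch below does this for a two-letter alphabet; the helper names `strings_of_length`, `plus_up_to`, and `star_up_to` and the bound `max_len` are assumptions introduced for illustration.

```python
from itertools import product

def strings_of_length(alphabet, n):
    """All strings of length n over alphabet, i.e. the set alphabet^n (as tuples)."""
    return set(product(sorted(alphabet), repeat=n))

def plus_up_to(alphabet, max_len):
    """Finite approximation of A^+: the union of A^i for 1 <= i <= max_len."""
    result = set()
    for i in range(1, max_len + 1):
        result |= strings_of_length(alphabet, i)
    return result

def star_up_to(alphabet, max_len):
    """Finite approximation of A^*: the bounded A^+ together with the empty string."""
    return plus_up_to(alphabet, max_len) | {()}

A = {"a", "b"}
print(strings_of_length(A, 0))   # A^0 contains only the empty tuple
print(len(plus_up_to(A, 3)))     # 2 + 4 + 8 nonempty strings of length at most 3
print(len(star_up_to(A, 3)))     # one more, for the empty string
```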