N Regular Languages and Finite Automata - The Computer Science ...

NLecture Notes onRegular Languagesand Finite Automatafor Part IA of the Computer Science TriposProf. Andrew M. PittsCambridge University Computer Laboratoryc A. M. Pitts, 1998-2002

First Edition 1998.Revised 1999, 2000, 2001, 2002.

ContentsLearning Guideii1 Regular Expressions 11.1 Alphabets, strings, and languages . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Pattern matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Some questions about languages . . . . . . . . . . . . . . . . . . . . . . . . 61.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8¡2 Finite State Machines 112.1 Finite automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2 Determinism, non-determinism, and -transitions . . . . . . . . . . . . . . . 142.3 A subset construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 Regular Languages, I 213.1 Finite automata from regular expressions . . . . . . . . . . . . . . . . . . . . 213.2 Decidability of matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284 Regular Languages, II 294.1 Regular expressions from finite automata . . . . . . . . . . . . . . . . . . . 294.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.3 Complement and intersection of regular languages . . . . . . . . . . . . . . . 324.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 The Pumping Lemma 375.1 Proving the Pumping Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . 385.2 Using the Pumping Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.3 Decidability of language equivalence . . . . . . . . . . . . . . . . . . . . . . 425.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436 Grammars 456.1 Context-free grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.2 Backus-Naur Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476.3 Regular grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Lectures Appraisal Form 53

11 Regular ExpressionsDoubtless you have used pattern matching in the command-line shells of various operatingsystems (Slide 1) and in the search facilities of text editors. Another important example ofthe same kind is the ‘lexical analysis’ phase in a compiler during which the text of a programis divided up into the allowed tokens of the programming language. The algorithms whichimplement such pattern-matching operations make use of the notion of a finite automaton(which is Greeklish for finite state machine). This course reveals (some of!) the beautifultheory of finite automata (yes, that is the plural of ‘automaton’) and their use for recognisingwhen a particular string matches a particular pattern.Pattern matchingWhat happens if, at a Unix/Linux shell prompt, you type£¥¤§¦and press return?£Suppose the current directory contains ¨©files called, and (strangely)¨©£¥ , ¨©££ , ¨© . What happens if you type£¥!© ,"£¤#¦§"and press return?Slide 11.1 Alphabets, strings, and languagesThe purpose of Section 1 is to introduce a particular language for patterns, called regularexpressions, and to formulate some important problems to do with pattern-matching whichwill be solved in the subsequent sections. But first, here is some notation and terminology todo with character strings that we will be using throughout the course.

Y%2D2 1 REGULAR EXPRESSIONSAlphabetsAn alphabet is specified by giving a finite set, $ , whose elementsare called symbols. For us, any set qualifies as a possiblealphabet, so long as it is finite.%'&)(+*-,.0/1.32.54.768.39.5:.5;.7=?Examples:%BAC(+*-D.5E-.5FG.0H0H0H7.>IJ.7K.5L?—characters of the English language.of decimal digits.%ONP(+*GQSRGQUTS%'&V?Non-example:(+*-,.0/1.32.54.0H0H0HZ?—an alphabet, because it is infinite.&XW2M:—/@,-element set of decimal digits.-element set of lower-case-element set of all subsets of the alphabet— set of all non-negative whole numbers is notSlide 2Strings over an alphabetA string of [ (\^] length ) over an $ alphabet-tuple of elements of $ , written without punctuation.[is just an orderedExample: if%_(+*-D.5EG.5F@?, then,DE,DDF, andof lengths one, two, three and four respectively.are strings overÈ EaDF$cbed-fhg iset of all strings over $ of any finite length.N.B. there is a unique string of length zero over $ , called the nullstring (or empty string) and denoted j (no matter which $ weare talking about).Slide 3

¡.s.w(q(¡%%%%((¡?1.1 Alphabets, strings, and languages 3Concatenation of stringsThe concatenation of two kml`npoS$ b strings is the kJn stringobtained by joining the strings end-to-end.Examples: q IfwPs and(rDÈ DE, s(utvD(rF`DxztMD.and w, then s"q,(rDE(rF`Dx(ytMDDEThis generalises to the concatenation of three or more strings.qqE.g. k{n|}k{n i~€-"~‚-~ƒ~¥€@"~.Slide 4Slides 2 and 3 define the notions of an alphabet, and the setof finite strings over an%…„alphabet. The length of a q string will be denoted †ˆ‡a‰Šv‹ ŒŽ q¥ by . Slide 4 defines theD’‘“%operationof concatenation of strings. We make no notational distinction between a symbol %c„ andthe corresponding string of length one over : so can be regarded as a subset of . % „Notethat is never empty—it always contains the ¡ null string, , the unique string of length zero.Note also that q for any‘“%P„q¡¡-q and Ž qshwqs"wq”Ž s"w•and †ˆ‡a‰Šv‹ ŒŽ qs ( †ˆ‡`‰Šv‹ ŒŽ q¥–r†ˆ‡a‰Šv‹ ŒŽ s .Example 1.1.1. Examples of% „for different:(i) If%_(+*-D?, then%P„contains.5D.5DD.5DDD.5DDDD.0H`HVH(ii) If%_(+*-D.5E@?, then%P„contains.5D.5EG.5DD.5DE-.5È D.5E3EG.5D"DD.5D"DEG.>DÈ D.7DÈ EG.>È D"D.7EaDE@.5EaEaD.7EaÈ E@.0HVHVH(iii) If (the empty set — the unique set with no elements), thencontaining the null string.%—(˜%•„š(›*, the set just

£„œœttRegular expressions over an alphabet $œif and Ÿ are regular expressions, then so is¡ Ÿz¢binds more tightly than £¤£ , binds more tightly than £means Ž˜Ž¡%˜, not ŽŽ˜%%, or Ž>ŽR%(%4 1 REGULAR EXPRESSIONS1.2 Pattern matchingSlide 5 defines the patterns, or regular expressions, over an alphabet that we will use.Each such regular expression, , represents a whole set (possibly an infinite set) of strings%P„in that match . The precise definition of this matching relation is given on Slide 6. Itmight seem odd to include a regular expression that is matched by no strings at all—but itis technically convenient to do so. Note that the regular ¡ expression is in fact equivalent to, in the sense that a q string ¡ matches).˜ „˜ „iff it matches ¡ (iff qœeach symbol ~ o_$ is a regular expressionis a regular expressionjœž is a regular expressionif and Ÿ are regular expressions, then so is Ÿœif is a regular expression, then so ¢ b isEvery regular expression is built up inductively, by finitely manyapplications of the above rules.(N.B. we assume ¡ ,, Ž , ,R „, andare not symbols in.)Slide 5Remark 1.2.1 (Binding precedence in regular expressions). In the definition on Slide 5we assume implicitly that the alphabet does not contain the six symbolsR „Then, concretely speaking, the regular expressions over form a certain set of strings overthe alphabet obtained by adding these six symbols to . However it makes things morereadable if we adopt a slightly more abstract syntax, dropping as many brackets as possibleand using the convention thatSo, for example,tR¦¥V§5„tR¦¥§ „tR¦¥§ „tR¨¥0§> „ , etc.£ .VŽ

%q&qœœœœœœt?two strings, k i n| , with n matching and | matching Ÿ,(¡Žtt„(1.2 Pattern matching 5Matching strings to regular expressionsk matches ~ oS$ iff k i—~no string matchesk matches j iff k i jk matches ¡ Ÿ iff k matches eitheror Ÿk matchesŸ iff it can be expressed as the concatenation ofk matches b iff either k i j , or k matches , or k can beexpressed as the concatenation of two or more strings, eachof which matchesSlide 6The definition of ‘q matchest „’ on Slide 6 is equivalent to sayingfor ©«ª A¬H0H0H someq , can be expressed as a concatenation © of q strings,, where each q® matches .q©qqThe case just means thatthat matches (so any string matching also matchesand , then the strings matching aret…(rDE(r,t „(so always t „¡ matches¡t„); and the © case). For example, ifjust means(¯/%e(°*-D.5EG.5F@?.5DEG.5DEaDE-.5DÈ DÈ DEG.HetcNote that we didn’t include a regular expression ‘± for the ’ occurring in the %²( UNIXexamples on Slide 1. However, once we know which alphabet we are referring to,say, we can get the ± effect of using the regular expression*-D&`.5DAG.0H0H0H3.5DD&vR¨DA"R5H0H0H@R¨Dwhich is indeed matched by any string inin ).%•„(becauseD&vR¨DA"R5H0H0H@R¨D is matched by any symbol8

t¾œœœœœExamples of matching, with $ í ³ ]lvµœž µ ¡ ] is just matched by ]ttt,–t¥tW&(¡tŽ,Htt¾6 1 REGULAR EXPRESSIONS] ¡ µ is matched by each symbol in $µ8 ·] ¡ µ"¢7b is matched by any string in $¸b that starts with a ‘µ ’a X] ¡ µ"¢0 ·] ¡ µ"¢3¢ b is matched by any string of even length in $ b·] ¡ µ"¢ b X] ¡ µ"¢ b is matched by any string in $ b¡ ]¢0 Xj ¡ µ"¢ ¡ µ8µ is matched by just the strings j , ] , µ , ]µ , andXjµ8µSlide 7Notation 1.2.2. The notation is quite often used for what we write as .The notation , ©+ª for , is an abbreviation for the regular expression obtained by© concatenating copies of . Thus:tR¦¥ò »½¼¹ t(¿tò »½¼q Thus q matches iff matches ©pª for some tvt"„ .We use as an abbreviation for q . Thus matchesconcatenation of one or more strings, each one matching .t „iff it can be expressed as thez¾1.3 Some questions about languagesSlide 8 defines the notion of a formal language over an alphabet. We take a very extensionalview of language: a formal language is completely determined by the ‘words in thedictionary’, rather than by any grammatical rules. Slide 9 gives some important questionsabout languages, regular expressions, and the matching relation between strings and regularexpressions.

d-fhg i ³ kÄoÅ$ b ¡ k matches 1.3 Some questions about languages 7LanguagesA (formal) language À over an alphabet $ is just a set of stringsin $cb .Thus any subset À^Á$Âb determines a language over $ .The language determined by a regular expression over $ isÀÃ ¢Two regular expressionsand Ÿ (over the same alphabet) areequivalent iff ÀÃ ¢ and ÀÃ ZŸz¢ are equal sets (i.e. have exactly thesame members).Slide 8Some questions(a) Is there an algorithm which, given a k string and a regularexpression (over the same alphabet), computes whether ork not matches ?(b) In formulating the definition of regular expressions, have wemissed out some practically useful notions of pattern?(c) Is there an algorithm which, given two regular expressionsŸ and (over the same alphabet), computes whether or notthey are equivalent? (Cf. Slide 8.)(d) Is every language of the form À¸ ¢ ?Slide 9

*qR%%t%t/ERttt%8 1 REGULAR EXPRESSIONSThe answer to question (a) on Slide 9 is ‘yes’. Algorithms for deciding such patternmatchingquestions make use of finite automata. We will see this during the next few sections.If you have used the UNIX utility ÆÇÈMÉ , or a text editor with good facilities for regularexpression based search, like È@ÊËÌÍ , you will know that the answer to question (b) on Slide 9is also ‘yes’—the regular expressions defined on Slide 5 leave out some forms of patternthat one sees in such applications. However, the answer to the question is also ‘no’, in thesense that (for a fixed alphabet) these extra forms of regular expression are definable, upto equivalence, from the basic forms given on Slide 5. For example, if the symbols of thealphabet are ordered in some standard way, it is common to provide a form of pattern fornaming ranges of symbols—for example ÎÏË £ÑÐ"Ò might denote a pattern matching any lowercaseletter. It is not hard to see how to define a regular expression (albeit a rather long one)which achieves the same effect. However, some other commonly occurring kinds of patternare much harder to describe using the rather minimalist syntax of Slide 5. The principleexample is Ó}Žcomplementation, :iffq does not match.ÓÔŽ £%•„%•„Õ'ŽIt will be a corollary of the work we do on finite automata (and a good measure of its power)that every pattern making use of the complementation operation can be replaced byan equivalent regular expression just making use of the operations on Slide 5. But why dowe stick to the minimalist syntax of regular expressions on that slide? The answer is that itreduces the amount of work we will have to do to show that, in principle, matching stringsagainst patterns can be decided via the use of finite automata.The answer to question (c) on Slide 9 is ‘yes’ and once again this will be a corollary ofthe work we do on finite automata. (See Section 5.3.)Finally, the answer to question (d) is easily seen to be ‘no’, provided the alphabetcontains at least one symbol. For in that case is countably infinite; and hence the number oflanguages over , i.e. the number of subsets of is uncountable. (Recall Cantor’s diagonalargument.) But since is a finite set, there are only countably many regular expressionsover . (Why?) So the answer to (d) is ‘no’ for cardinality reasons. However, even amongstthe countably many languages that are ‘finitely describable’ (an intuitive notion that we willnot formulate precisely) many are not of the form for any regular expression . Forexample, in Section 5.2 we will use the ‘Pumping Lemma’ to see thatq matches ÓÔŽ*-D,?is not of this form.1.4 Exercises©ÖªExercise 1.4.1. Write down an ML data type declaration for a type ×>ËyÇÈzÆzØÙzÉconstructorwhose values correspond to the regular expressions over an ×>Ë alphabet .Exercise 1.4.2. Find regular expressions over*-,.0/M?that determine the following languages:(a)q contains an even number of?’s

*qR*¡?%,¡?%%1.4 Exercises 9(b)q contains an odd number of?’sExercise 1.4.3. For which alphabetsalphabet?is the set% „of all finite strings overitself anExercise 1.4.4. If an alphabet contains only one symbol, what does look %Å(Ú*like? %…„ (Answer:see Example 1.1.1(i).) So if is the set consisting of just the null string, what is ?(The answer is not .) Moral: the punctuation-free notation we use for the concatenation oftwo strings can sometimes be confused with the punctuation-free notation we use to denotestrings of individual symbols.Tripos questions 1999.2.1(s) 1997.2.1(q) 1996.2.1(i) 1993.5.12%…„

10 1 REGULAR EXPRESSIONS

%ÞÝÞÞWÝ&AÝNÝÞ%112 Finite State MachinesWe will be making use of mathematical models of physical systems called finite automata,or finite state machines to recognise whether or not a string is in a particular language.This section introduces this idea and gives the precise definition of what constitutes a finiteautomaton. We look at several variations on the definition (to do with the concept ofdeterminism) and see that they are equivalent for the purpose of recognising whether or nota string is in a given language.2.1 Finite automataExample of a finite automatonÛvÜÛßÛMàÛváStates: ÛvÜ , Ûß , ÛMà , Ûvá .Input symbols: ~ , € .Transitions: as indicated above.Start state: .ÛvÜAccepting state(s): Û á .Slide 10The key features of this abstract notion of ‘machine’ are listed below and are illustratedby the example on Slide 10.example there are four states, labelled â, â, â, and â.¢There are only finitely many different states that a finite automaton can be in. In theWe do not care at all about the internal structure of machine states. All we care about¢is which transitions the machine can make between the states. A symbol from somefixed alphabet is associated with each transition: we think of the elements ofas input symbols. Thus all the possible transitions of the finite automaton can bespecified by giving a finite graph whose vertices are the states and whose edges have

âW&âD&âWN%çââNAâ(WEN12 2 FINITE STATE MACHINESboth a direction and a label (drawn frompossible transitions from â state are). In the example%ã(^*-D.5E-?and the only&åäæ£and&åç£æAvHIn other words, in â state the machine can either input the symbol and enter state, or it can input the symbol and enter â state . (Note that transitions from a stateback to the same state are allowed: â e.g. in the example.)£æ. In the graphicalrepresentation of a finite automaton, the start state is usually indicated by means ofa unlabelled arrow.¢There is a distinguished start state. 1 In the example it is âThe states are partitioned into two kinds: accepting states 2 and non-accepting states.¢In the graphical representation of a finite automaton, the accepting states are indicatedby double circles round the name of each such state, and the non-accepting states areindicated using single circles. In the example there is only one accepting â state, ; theother three states are non-accepting. (The two extreme possibilities that all states areaccepting, or that no states are accepting, are allowed; it is also allowed for the startstate to be accepting.)The reason for the partitioning of the states of a finite automaton into ‘accepting’ and‘non-accepting’ has to do with the use to which one puts finite automata—namely to recognisewhether or not a q string is in a particular language ( subset of ). q Given we„% „‘e%begin in the start state of the automaton and traverse its graph of transitions, using up thesymbols q in in the correct order reading the string from left to right. If we can use up all thesymbols q in in this way and reach an accepting state, q then is in the language ‘accepted’(or ‘recognised’) by this particular automaton; q otherwise is not in that language. This issummed up on Slide 11.1 The term initial state is a common synonym for ‘start state’.2 The term final state is a common synonym for ‘accepting state’.

âââcase ©WAl„„„âNâSó £ æç :£æD„„((RââNAŽ(ŽDt2.1 Finite automata 13·èé¢ , language accepted by a finite automaton èÀÃconsists of all k strings over its alphabet of input symbolssatisfying b with the start state and some acceptingÛ ÛvÜ Û ëì ÛvÜyêstate. HereÛ Üëì b Û êmeans, if ki—~ ß ~ à Gv ~ísay, that for some statestransitions of the forml Û í iÛ(not necessarily all distinct) there areÛ ßGvl Û àÛ Üë7ì Û ß Ý@ï ë7ì Û à Ý@ð ë5ìÝMîÝ@ò ëzì Û í i Û ñGñvñN.B.case ©: ââvô .(r,âvô iff ââvôç£æ(¯/âvô iff âSlide 11Example 2.1.1. õ Let be the finite automaton pictured on Slide 10. Using the notationintroduced on Slide 11 we have:Wžçaç`ç`äæ £·£8£(soDDDEC‘ÕPŽ õ^ )çaäöç`çæ £·£8£(soDD“÷ ‘ DÈÕPŽ õ^ )äöçaç`çæ £·£8£(no conclusion about Õ'Ž õ^ ).â iff âIn fact in this caseâ iff âq contains three consecutive?zH’sâV® (ø (For ) corresponds to the state in the process of reading a string in which theø last symbols read were alldetermined bythe regular expressionõ^ (+* q ÕPŽ(ù,.0/1.32’s.) So ÕPŽ õ^ coincides with the language Õ'Žt…(D{R E „ DDDD{RË „(cf. Slide 8).

œœœDââi.e. in õâD©, no state can be reached from â&&DWE‘14 2 FINITE STATE MACHINESA non-deterministic finite automaton (NFA), è ,is specified byœa finite set úJû>üûZý-þMÿ (of states)a finite set $ ÿ(the alphabet of input symbols)for each Û oSúJû>üûZý-þ ÿ and each ~ o_$ ÿ , a subsetÿÛ l ~ ¢'ÁùúJû>üûZý-þ ÿ (the set of states that can bereached from Û with a single transition labelled ~ )an element ŸMÿ oãúJûZüûZý-þMÿ (the start state)œa ¡£¢¤¢Vý¦¥”û ÿ Áùú!ûZüûZý-þMÿ subset (of accepting states)Slide 122.2 Determinism, non-determinism, and § -transitionsSlide 12 gives the formal definition of the notion of finite automaton.¨©Note ç that the functiongives a precise way of specifying õthe allowed â £æâ ô transitions â ô¨©of .5D, via: iffâ . ŽThe reason for the qualification ‘non-deterministic’ on Slide 12 is ‘because D“‘ž% in©general,‹v‹·‡for each state and each input symbol , we allow the possibilities thatâthere are no, one, or many states that can be reached in a â single transition labelled from ,has no, one, or many õelements. For example, ifcorresponding to the cases that ¨© Ž âis the NFA pictured on Slide 13, then.5D&0.5E(r˜with a transition labelled;¨©Ž â¨©Ž âlabelled&0.5D (+*;AG?i.e. in õ, precisely one state can be reached from âwith a transition¨©Ž âi.e. õ intransition labelled .WM.5D (+*WM.&`?, precisely two states can be reached from âwith a

ÝÞ³ kUo ³"~ l € b ¡ k contains three consecutive ~ ’sâHÝÞ2.2 Determinism, non-determinism, and ¡ -transitions 15Example of a non-deterministic finite automatonInput alphabet: ³"~ l € .States, transitions, start state, and accepting states as shown:Û ÜÝ Û ß Ý Û à Ý Û áThe language accepted by this automaton is the same as for theautomaton on Slide 10, namelySlide 13When each subset has exactly one element we say that õ is deterministic.This is a particularly important case and is singled out for definition on Slide 14.¨©Ž â.5DThe finite automaton pictured on Slide 10 is deterministic. But note that if we *-D.5EG.5F-? took thesame graph of transitions but insisted that the alphabet of input symbols WM.5F was (#˜ say,then we have specified an NFA not a DFA, since Ž âfor example . The moral ofthis is: when specifying an NFA, as well as giving the graph of state transitions, it is importantto say what is the alphabet of input symbols (because some input symbols may not ¨©appear inthe graph at all).When constructing machines for matching strings with regular expressions (as we willdo in Section 3) it is useful to consider finite state machines exhibiting an ‘internal’ formof non-determinism in which the machine is allowed to change state without consuming anyinput symbol. One calls such transitions ¡ -transitions and writes them asâ ôó £ æThis leads to the definition on Slide 15. Note NFAó that õ in an , , we ¡always % assume©thatis not an element of the alphabet of input symbols.

ÝÞÛëìÛ Ý Û i Mÿ Û l ~ ¢iffÛÞÞÝÞ16 2 FINITE STATE MACHINESA deterministic finite automaton (DFA)is an NFA è with the property that for each Û oSúJû>üûZý-þ ÿ and~ o_$ ÿ , the finite set ÿÛ l ~ ¢ contains exactly oneelement—call it ÿÛ l ~ ¢ .Thus in this case transitions è in are essentially specified by anext-state 1ÿ function, , mapping each (state, input symbol)-pairl ~ ¢ to the unique state ÿ Û l ~ ¢ which can be reached from ÛÛby a transition labelled :~Slide 14An NFA j with (NFA -transitions )is specified by an è NFA together with a binary relation, calledthe j -transition relation, on the set ú!ûZüûZý-þzÿ. We writeÛto indicate that the pair of statesÛ l Û ¢ is in this relation.ë ì Example (with input alphabet = ³"~ l € ):Ûß Ý ÛMà Ý ÛváÛ ÜÛÛÛÛSlide 15

EÛÛÛQ ê iff or there is a sequence Û ë ì Ûmore -transitions Û in from to i Û Û è j ÛÝÞ Û Ý(for l € oÅ$…ÿ ) iff Û ~Q©W#!QWñÞDôŽ©2.3 A subset construction 17·èé¢ , language accepted NFA èÀÃ by anconsists k of all strings $ ÿover the alphabet of input symbolswith the initial state and some acceptingÛ ÛvÜ Ûsatisfying ÛvÜstate. Here ñ£ñ is defined by:ñGñvñ Û of one orÛ Û (for ~ oÅ$…ÿ ) iff Û Ý ëìñ ññ ìëand similarly for longer stringsÝ ëìñ ñÛ When using an NFAó õSlide 16to accept a string qof input symbols, we are interested in„ ‘“%sequences of transitions in which the symbols q in occur in the correct order, but with zeroor ¡ more -transitions before or after each one. We write#â ô â"!to indicate that such a sequence exists from â state to âzô state in NFAó the . Then, by definitionis accepted by the NFAó õ iff â â holds for â the start state and â some accepting state:qsee Slide 16. For example, for NFAó the on Slide 15, it is not too hard to see that the languageaccepted consists of all strings which either contain two consecutive ’s or contain twoconsecutive ’s, i.e. the language determined by the regular Ž expression .DRË „DD{R È ED{R E „`Ž2.3 A subset constructionNote that every DFA is an NFA (whose transition relation is deterministic) and that everyNFA is an NFAó (whose ¡ -transition relation is empty). It might seem that non-determinismand ¡ -transitions allow a greater range of languages to be characterised as recognisable by afinite automaton, but this is not so. We can use a construction, called the subset construction,to convert an NFAó õ into a DFA $cõ accepting the same language (at the expense ofincreasing the number of states, possibly exponentially). Slide 17 gives an example of thisconstruction. The name ‘subset construction’ refers to the fact that there is one state of $Âõfor each subset of the setis a transitionç£æ‹%v‹·‡of states õ ofjust in case. Given two subsets, thereâMô -states reachable fromQ”.3QT&‹%v‹X‡'ô in $Âõô consists of all the õ

Qç#ÝÞ*ââÝâDÛvÜ ³ ÛGÜ l Ûß l ÛMà ³ ÛMà ³Ûß ³ Ûß ³ÛMà ³ ÛMà ³ÛvÜ l Û"ß ³ ÛGÜ l Ûß l ÛMà ³ ÛMà ³Û ß l Û à ³ Û ß ³ Û à ³W*âââW£ æóââA*ââ©AâQâ*â©18 2 FINITE STATE MACHINESâ states in via ( thevia finitely ¡ many -transitions followed by an-transitions. ¡â ô in õ-transition followed by finitely many( relation defined on Slide 16, i.e. such that we can get from â toExample of the subset construction*mÿ ) ~ €è )Û ßÛvÜl Û à ³ Û Ül Û ßl Û à ³ Û à³ Û ÜÛ àÛ ßl Û à ³ Û Ül Û ßl Û à ³ Û àl³ Û ÜSlide 17By definition, the start state of $cõis the ‹%v‹X‡'subset of whose elements are QrT+ the; and ‹%v‹·‡a subsetis an. Thus in the exampleand¡ õ$Âõ õ WM. &V. AG?states reachable by -transitions from the start state ofaccepting state of iff some accepting state of is an element ofon Slide 17 the start state isç£æäæ£DEP‘Õ'Ž õÚ because in õ : âDEP‘Õ'Ž,$cõ^ because in $Âõ :WM.&0.A@?ç£æWM.&V.AG?_äæ£AG?zHIndeed, in this case Õ'Ž õ^ (language in this case is no accident, as the Theorem on Slide 18 shows. That slide also givesthe definition of the subset construction in general.D „ E „ (Õ'ŽÕ'Ž,$cõ^ . The fact that õ and $Âõ accept the same

õœœœëd@fZg i ³0/Ú¡1/ Á°úJûZüûZý-þ ÿ d@fZg i ³ Û ¡ Ÿvÿ Û ¡£¢¤¢Vý¦¥”û *mÿ d@fZg iœoãúJûZüûZý-þ *mÿ ¡02 Û o / Û o3¡£¢¤¢Vý¦¥”û ÿ ¢a³0/Ž©¥ ©¥E; ©so âso âVââ‘GF¤; © &© A•‘GF¤;‘GF¤; ©ŽŽŽQŽQ‘©H‘#ó2.3 A subset construction 19Theorem. For each NFABè there is a DFA -šè with thesame alphabet of input symbols and accepting exactly the samestrings è as , i.e. ÀÃ .-šè¿¢ i ÀÃ ·èé¢withDefinition -šè of (refer to Slide 12):úJû>üûZý-þ *mÿ$ *mÿ d-fhg i$ ÿœ / Ýì / in -šè iff / i *mÿ_ / l ~ ¢ , where/ l ~ ¢)d@fZg i ³ Û ¡12 Û o / Û Ý *mÿ_Û in è¿¢`Ÿ* ÿSlide 18To prove the theorem on Slide 18, given any NFAó õ we have to show that Õ'Ž õ^ (Proof õ^ Õ'Ž4$Âõ^ that . Consider the case ¡ of first: ¡‘658797if ¥D , hence ‡:‹Õ'Ž4$Âõ^ . We split the proof into two halves.¥ ©, â then Õ'Ž õ^ for. Now given any non-Õ'Ž,$cõ^â somenull q stringof the formand thus ¡ , if q is accepted by õ then there is a sequence of transitions in(1)ç?>#çA@ &#çAB#‘C5D77Since it is deterministic, feedingD&7DA H0H0HZD to $cõ results in the sequence of transitions(?(?(âV‡:‹(2)ç?>æ £çA@æ £ç¤Bæ £ˆ£Q &where(1) we deduceQ &O(&F¤; ©¥E; © .5D&,QAP(&F¤; ©Q”&0.5DA, etc. By definition ofF?; ©(?(?((Slide 18), from¥E; © .5D&(uQ”&Q”&0.5DA(uQ{AH0H0H&@.5D (uQ IH

Õ(QQQ‘&QŽ© >and âA&õAŽ©©©&©&Q‘A%#óQââ&QŽâA©&20 2 FINITE STATE MACHINESand hence$cõ by .‘J58797‡:¥‹; ©(because âV‘J58797‡%:‹). Therefore (2) shows that q is acceptedProof T © ‘ ¥;that . Consider the case ¡ of first: ¡ ÕPŽ,$ÂõÚ if , 58797then; ÕPŽ,$ÂõÚ ‘e¥IHsince©âVIH#âVOHwith .IHIH OHIH âVIH‘J58797‡%:‹2.4 SummaryThe important concepts in Section 2 are those of a deterministic finite automaton (DFA) andthe language of strings that it accepts. Note that if we know that a Õ language is of the formõÚ for some DFA õ , then we have a method for deciding whether or not any givenÕ'Žq string (over the alphabet Õ of ) is Õ in or not: begin in the start state õ of and carry outthe sequence of transitions given by q reading from left to right (at each step the next state isuniquely determined õ because is deterministic); if the final state reached is accepting, thenis in Õ , otherwise it is not.qWe also introduced other kinds of finite automata (with non-determinism ¡ and -transitions) and proved that they determine exactly the same class of languages as DFAs.2.5 ExercisesExercise 2.5.1. For each of the two languages mentioned in Exercise 1.4.2 find a DFA thataccepts exactly that set of strings.Exercise 2.5.2. The example of the subset construction given on Slide 17 constructs a D„aE@„ DFAwith eight states whose language of accepted strings happens Õ'Ž to be . Give a DFAwith the same language of accepted strings, but fewer states. Give an NFA with even fewerstates that does the same job.Exercise 2.5.3. Given õ a DFA , construct a õ^ô new DFA with the same alphabet of % ©‘Ú%'„inputsymbol and with the property that q©for q all , is õ¯ô accepted q by iff is notõ accepted by .Exercise 2.5.4. õGiven two DFAs with the same alphabet of ‘ù% input „symbols,construct a õthird such DFA with qthe property that õis accepted by iff it isõaccepted õby both and of&V.. [Hint: take the states of õ to be ordered pairs Ž â&0.states with â© @.]&C‘SA'‘TTripos questions 2001.2.1(d) 2000.2.1(b) 1998.2.1(s) 1995.2.19 1992.4.9(a)‹%v‹X‡'‹v‹·‡

õ(%(tt%t%(%(%ttt213 Regular Languages, ISlide 19 defines the notion of a regular language, which is a set of strings of the form ÕPŽ õ^for some õ DFA (cf. Slides 11 and 14). The slide also gives the statement of Kleene’sTheorem, which connects regular languages with the notion of matching strings to regularexpressions introduced in Section 1: the collection of regular languages coincides with thecollection of languages determined by matching strings with regular expressions. The aim ofthis section is to prove part (a) of Kleene’s Theorem. We will tackle part (b) in Section 4.DefinitionA language is regular iff it is the set of strings accepted by somedeterministic finite automaton.Kleene’s Theorem(a) For any regular expression(cf. Slide 8)., À¸ ¢ is a regular language(b) Conversely, every regular language is the form ÀÃ ¢ forsome regular expression .Slide 193.1 Finite automata from regular expressionsõ‘“%…„q q q t (õ Õ'Ž ÕPŽ õ^NFAóVUÕPŽ U Õ'ŽU$ U ÕPŽ õ^ (ÕPŽ,$ U ÕPŽ U Õ'Ž¡NFAóGiven a regular expression , over an alphabet say, we wish to construct a DFA withalphabet of input symbols and with the property that for each , matches iff isaccepted by —so that .Note that by the Theorem on Slide 18 it is enough to construct an with theproperty . For then we can apply the subset construction to to obtain a DFAwith . Working with finite automata thatare non-deterministic and have -transitions simplifies the construction of a suitable finiteautomaton from .Let us fix on a particular alphabet and from now on only consider finite automatawhose set of input symbols is . The construction of an for each regular expressionover proceeds by recursion on the syntactic structure of the regular expression, as follows.

Thus Õ'ŽThus Õ'Žfor all ©åªõ,7&&7õAõAA&AqõõAA&qRRqq‘D,&&or qand qand each q®&&‘ãnd Õ'Ž7and ÕPŽ‘ÕPŽ õ^ ?zH, and for all regular expressions of size `Å© , there exists an NFAóR?˜„õõAAAAt22 3 REGULAR LANGUAGES, IDU‘y%(i) For each atomic form of regular expression, ¡ ( ), , andaccepting just the strings matching that regular expression., we give an NFAó(ii) Given NFAó any õ spropertyand õ, we construct a new NFAó , W!‰YX[ZG‰”Ž õ&V.with the&V.A ?zHÕPŽW!‰YX[Zv‰!Ž õÕPŽ õ> (+* q.Õ'Ž õtGA (t1&vR t-A (&V.t1& (ÕPŽ õÕ'ŽW!‰YX[ZG‰”Ž õ> when Õ'ŽÕ'Ž õ(iii) Given NFAó any õ spropertyand õ, we construct a new NFAó , \]ZG‰v‹VŽ õ&V.with the& ‘A ?zHA'‘&0.AÃRÕPŽ¤\^ZG‰v‹3Ž õÕ'Ž õ> (Ú* q.Õ'Ž õt-A (t1&>t-A (&V.tz& (> when Õ'ŽÕPŽ¤\^Zv‰ÕPŽ õv‹`Ž õÕ'Ž õ(iv) Given any NFAó õ , we construct a new NFAó ,‹_Ž õ^ with the propertyA”H0H0HqÕ'Ž‹_Ž õÚ> (+* q©¤ª‹%_Ž õ^> when Õ'ŽÕPŽThus Õ'Žtz„ (( tõÚ . Õ'ŽThus starting with step (i) and applying the constructions in steps (ii)–(iv) over and overagain, we eventually NFAó build s with the required property for every regular expression .Put more formally, one can prove the statementsuch that ÕPŽt (ÕPŽ õ^by mathematical induction © on , using step (i) for the base case and steps (ii)–(iv) for theinduction steps. Here we can take the size of a regular expression to be the number ofoccurrences of (£ union ) in it.£ ), concatenation (£Ñ£ ), or star (£Step (i) Slide 20 gives NFAs whose languages of accepted strings are Õ'Žrespectively(any, Õ'Ž and .*-D?D’‘“%), Õ'Ž ¡M (¯* ¡(r˜D (

õ&W£ æóA‘õõjust accepts the one-symbol string ~just accepts the null string, jAõA‘&A#WWâ&&‘W&AõAAõ‘A&Rq&W#!‘#!Aâ&&&&Aor qõõAAA‘‘Aõ&WAõAAA3.1 Finite automata from regular expressions 23NFAs for atomic regular expressionsÛvÜ Ý ÛßÛvÜÛvÜaccepts no stringsSlide 20NFAó õ õW!‰YX[ZG‰”Ž © õ© > ‹%v‹·‡ ‹%v‹·‡ @ &V.õ W!‰YX[ZG‰”Ž õ &V.õWJ‰aX[ZG‰”Ž ââõõ&V.õõ¡ õõ õâStep (ii) Given s and , the construction of is pictured onSlide 21. First, renaming states if necessary, we assume that and aredisjoint. Then the states of are all the states in either or , togetherwith a new state, called say. The start state of is this and its acceptingstates are all the states that are accepting in either or . Finally, the transitions ofare given by all those in either or , together with two new -transitions out of to the start states of and .W!‰YX[ZG‰”Ž õq ÕPŽ âõ‡:‹© >¥ © &V.!ÕPŽW!‰YX[ZG‰”Ž â > qõõ&V.‘J58797ÕPŽ Õ'Ž q ‘J5D77õ> õâ‡:‹©‡%:‹@ ©â> â â¡â&0.õ> qq qThus if , i.e. if we havefor some , then weget showing that . Similarly for . Socontains the union of and . Conversely if is accepted by, there is a transition sequence with or .Clearly, in either case this transition sequence has to begin with one or other of the -transitions from , and thereafter we get a transition sequence entirely in one or other ofor finishing in an acceptable state for that one. So if , theneither or . So we do indeed haveõ &`. ÕPŽW!‰YX[ZG‰”Žõ W!‰YXZG‰!Ž¥ © >&V.& ‘b58797ÕPŽW!‰YX[Zv‰JŽ õÕPŽ õÕPŽ õ&V.A ?zHÕPŽW!‰YX[ZG‰”Ž õ> (+* qÕ'Ž õÕ'Ž õ

77&õ&AA7õA&AA&©@ ¥#! @AõA7A Ÿvÿ ï èâ¥ >© #! >&AAõ&AAâqA‘A&Aq&©@ ¥#! @&qA7âàA&õ77and qAõ7õA&(Aq7#&õAq>AâA&&õ&A&&24 3 REGULAR LANGUAGES, Ic^dfe%gId·è ß l`è à ¢Ÿ ÿ î è ßÛ ÜSet of accepting states is union of ¡£¢A¢Vý¦¥”û ÿ î and ¡£¢¤¢Vý¦¥”û ÿ ï .Slide 21Step NFAó (iii)õGiven õ s and , the \^ZG‰ M‹`Ž õconstruction ofSlide 22. First, renaming states if necessary, we ‹%v‹·‡assume &V.thatdisjoint. Then the \^Zv‰ states of are all the states õ in eitheris the start õ state of&`.or õ© >andis pictured on. The start‹%v‹·‡© @arestate of \^ZG‰. The accepting states of \^Zv‰õ\^Zv‰ v‹VŽ õõ õõqqare the accepting states of . Finally, the transitions ofthose in either orthe start state of (only one such new transition is shown in the picture).Thus if and , there are transition sequencesv‹VŽ õ&c‘&V.are given by allto, together with ¡ new -transitions from each accepting state õ of¥ © AÃ‘! >v‹VŽ õin õ&V.v‹`Ž õ&V.with â© >, andin õwith â© @. These combine to yield&C‘J5D77Õ'Ž õÕPŽ õA…‘J5D77‡:‹‡:‹ó £ æ&`.&V.õ in witnessing the fact q that is accepted \]ZG‰ v‹`Ž õ by &`.\^ZG‰ v‹aŽÕPŽ¤\^ZG‰ v‹aŽ õtransition sequence witnessing s the fact that is accepted starts out in õthe states of. Con-is of this form. For any>but. At some point in the sequence one of the newversely, it is not hard to see that every sfinishes in the disjoint set of states of õto õand thus we can split s as swith qaccepted by õand qaccepted by õ¡ -transitions occurs to get from õ. So we do indeed have& ‘A ?zHA•‘&V.A¸RÕ'Ž¤\^ZG‰Õ'Ž õM‹aŽ õ> (+* qÕ'Ž õ

WWèß Ÿ ÿ ï è&qWR,and each q®à‘WWÕ'Ž õÚ ?zH3.1 Finite automata from regular expressions 25h gOd¢VüûG ·è ß l`è à ¢Ÿ ÿ îSet of accepting states is ¡£¢A¢Vý¦¥”û ÿ ï .Slide 22NFAó õ ‹%_Ž õÚõ^ õâ‹_Žõ^ ‹_Ž õÚ ‹%_Ž âõ¡ âõÚ õ‹%_Žõ âStep (iv) Given an , the construction of is pictured on Slide 23. Thestates of are all those of together with a new state, called say. The start stateof is and this is also the only accepting state of . Finally, the transitionsof are all those of together with new -transitions from to the start state ofand from each accepting state of to (only one of this latter kind of transition is shownin the picture).Clearly,one or more strings accepted õ byâ of in a transition sequence witnessing this fact allow us to s split into the concatenation ofzero or more strings, each of which is accepted õ by . So we do indeed haveõ^ accepts ¡ (since its start state is accepting) and any concatenation of‹%_Ž. Conversely, s if is accepted ‹%_Ž õÚ by , the occurrencesA”H0H0HÕPŽ‹%_Ž õ^> (+* qq©pª

t‘ ÷Û Ü Ÿvÿ èttt‘ttt26 3 REGULAR LANGUAGES, IúJû>üOi hè¿¢The only accepting úJû>üOi hè¿¢ state of is .Ü ÛSlide 23DRË „3DŽNFAó õ Õ'Ž õ^ ( D{R E „3DÕ'ŽZŽ¡This completes the proof of part (a) of Kleene’s Theorem (Slide 19). Figure 1 shows howthe step-by-step construction applies in the case of the regular expression to producean satisfying . Of course an automaton with fewer states and-transitions doing the same job can be crafted by hand. The point of the construction is thatit provides an automatic way of producing automata for any given regular expression.3.2 Decidability of matchingThe proof of part (a) of Kleene’s Theorem provides us with a positive answer to question (a)on Slide 9. In other words, it provides a method that, given any q string and regular expression, decides whether or q not matches . The method is:¢construct a DFA õ satisfying Õ'Ž õÚ ( Õ'Ž;¢ õ õâ õ õ qbeginning in ’s start state, carry out the sequence of transitions in correspondingto the string , reaching some state of (because is deterministic, there is aunique such transition sequence);otherwise q¢check whether â is accepting or not: if it is, then qÕ'Ž õÚ (, so q does not match.ÕPŽ õ^ (ÕPŽ, so q matches;Õ'ŽNote. The subset construction used to convert the NFAó resulting from steps (i)–(iv) ofSection 3.1 to a DFA produces an exponential blow-up of the number of states. ($Âõ has

Step of type (iv): ŽStep of type (iii): ŽDEçóóFigure 1: Steps in constructing an NFAó for Žóóóóóóóçäçäóóóóçäçä3.2 Decidability of matching 27Step of type (i):Step of type (i):Step of type (ii):D{R ED{RË „D{R E „3DDRË „aD

228 3 REGULAR LANGUAGES, Istates õ if © has .) This makes the method described above very inefficient. (Much moreefficient algorithms exist.)3.3 ExercisesExercise 3.3.1. Why can’t the automatonconstructed simply by õ taking and adding ¡ new -transitions back from each accepting stateto its start state?‹%_Ž õÚ required in step (iv) of Section 3.1 beExercise 3.3.2. Work through the steps in Section 3.1 to construct NFAó õ anõ^ ( R E „ DDE „. Do the same for some other regular expressions.Õ'ŽsatisfyingExercise 3.3.3. Show that any finite set of strings is a regular language.Õ'Ž>Ž ¡Tripos question 1992.4.9(b) [needs Slide 26 from Section 4]

Lemma Given an NFA è(In case t¥tR¥!˜„t, for each subset uéÁùú!ûZüûZý-þ ÿ˜©294 Regular Languages, IIThe aim of this section is to prove part (b) of Kleene’s Theorem (Slide 19).4.1 Regular expressions from finite automataõ t (õ Õ'Ž Õ'Ž õÚ tjk¤l k9mâ ol k9m qGiven any DFA , we have to find a regular expression (over the alphabet of input symbolsof ) satisfying . In fact we do something more general than this, as describedin the Lemma on Slide 24. 1 Note that if we can find such regular expressions for anyto be the whole ofand to be the start state, say, then by definition of , a string matches this regularchoice n of â , , â ô and , then the problem is solved. For n tj takingexpression iff there is £æa &0.0H0H0H3. transition sequenceâmany t accepting jstates,In other words o9lqk > (?(?(the (Ú,regular expressionof Kleene’s tTheorem. (In caseempty and so we can use the regular expression. As âvô ranges over the finitely‹M‹X‡in õ âGôsay, then we match exactly all the strings accepted by õ .R t jâ?pksr has the property we want for part (b)ol, i.e. there are no accepting states õ in , Õ'Ž õÚ then is.)and each pair of states Û l Û oãúJûZüû>ý-þ ÿexpression w?xqwy satisfyingIv, there is a regularv w?xqw y ¢ í ³ kžoy >$ ÿ ¢ b ¡ Û+ê ëì b Û in èÀÃwith all intermediatestates of the sequencein u .ÀÃ hè¿¢ i ÀÃ ¢ Hence , whereß ¡ ñGñvñ ¡ zand{ ši number of accepting states,ii v }xqw¦~ with u i úJûZüûZý-þ ÿ ,|start state,i Ÿ| i€th accepting state.Û, taketo be the regular expression.)(r,Slide 24tjcan be constructed by induc-k¤lqk9mProof of the Lemma on Slide 24. The regular expressiontion on the number of elements in the n subset .1 The lemma works just as well whether is deterministic or non-deterministic; it also works forNFA‚ s, provided we replace ƒ „…‡† by ƒ ˆ(cf. Slide 16).

â.„Ž©(¿t jŽ' k%’‘*qR(„*qR„»½¼(ét j“'ò k%9‘k¤lqk.tW/*tâtŽâŽpp‘n/((»½¼( t j“'ò k.’‘k%Alqk%R(((.âW.DW‘Wò »½¼(¿t j“' k.’‘ t?”m k%Alqk/H/n30 4 REGULAR LANGUAGES, IIBase n case, is empty. In this case, for each pair of â statesexpression to describe the set of stringsâ ô , we are looking for a regular?zHwith no intermediate statesô ââ ! £æSo each element of this set is either a single input symbol¡ possibly , in â casecan simply take(if âholds in õ âGô. If there are no input symbols that take us from â to â1ô in õâMôif â‹Šç£æ) or, wet‰m k¤lqkâvôô . â»½¼(ò¹ Õn the other hand, if there are some such input symbols,D&0.0H0H0H3.5Dp say, we can take¡ if ât‰k¤lqk9mif â‹ŠRif â ¡âvô. âvô¹ D&vRR¨D(ºa»½¼D&vRR¨D(?(?(Induction step.Suppose we have defined the required regular expressions for all subsets(?(?(is a subset ©Â– with elements, choose some âWv?’(°*Wv?element. Then for any pair of states, by inductive hypothesis we have already constructed the regular expressionsof states © with elements. n Ifand consider © the -element n&Œ‘SsetâTŠW…‘âGôand‹%v‹·‡t1&ò »½¼. t-A. t@NConsider the regular expressionk¤lqk m(¿t1&MR t-Aºa»½¼t@N „ t?”1HClearly every string matchingis in the set?zHqt"&âb! £æâ ô âq ÕPŽt1&t’ªtP–t-AÕ'Žâ t t-N£ âât?” Õ'Žq t-A t@N â„3t?” ÕPŽâvôÕ'Žq Õ'Žk¤lqk9mConversely, if is in this set, consider the number of times the sequence of transitionspasses through state . If this number is zero then (by definition of). Otherwise this number is and the sequence splits into pieces: the first pieceis in (as the sequence goes from to the first occurrence of ), the next piecesare in (as the sequence goes from one occurrence of to the next), and the last pieceis in (as the sequence goes from the last occurrence of to ). So in this case is in. So in either case is in . So to complete the induction step we can defineto be this regular expression.tjt…(yt"&MR t-At@N „3t?”â ô with all intermediate states in this sequence in nâ ! £æ4.2 An examplePerhaps an example will help to understand the rather clever argument in Section 4.1. Theexample will also demonstrate that we do not have to pursue the inductive construction of (r˜theregular expression to the bitter end (the n base case ): often it is possible to find some ofthe k¤lqk m regular expressions one needs by ad hoc arguments.t•j

tW A‘ & & llttW| x ] µ]j ~µ– ~~ b €¡ ~~ b € jW&lW A‘ & l‘tW &lA‘ W lltW A‘ & llWlA‘ & tttltW A‘ & llW A‘ & llWW A‘ W ll]ÝtWWtt–&WWl& ‘l‘‘ÝÞW& WlW& A ‘Wl& ‘„„ŽŽÞ–µ| x ] µ–/W&‘ t &l Ž2t AlWŽt AŽWA ‘‘‘tWÝWll&AWA & ‘WlA ‘‘WH‘H–,4.2 An example 31Note also that at the inductive steps in the construction of a regular expression õ forwe are free to choose which â state to remove from the current state n set . A good rule ofthumb is: choose a state that disconnects the automaton as much as possible.ExampleDirect inspection yields:Ü?˜Ü x à?˜0—1—~ b ~ b € ]µSlide 25As an example, consider the NFA shown on Slide 25. Since the start state is and thisis also the only accepting state, the language of accepted strings is that determined byAthelregular expression . Choosing to remove state from the state set, we havelWlAlAlAlAW (R t„ t(3) Õ'ŽlWlWDirect inspection shows ÕPŽ Õ'Žthat ÕPŽ and, we choose to remove state :, and ÕPŽlWÕ'ŽW (D8„& (ÕPŽD„aE. To calculateÕ'Ž„ t& (& R t„ tÕ'ŽÕ'ŽlAlAlWW (W R tThese regular expressions can all be determined by inspection, as shown on Slide 25. ThusÕ'ŽÕ'Ž& (R¨DE"R DD „ Eand it’s not hard to see that this is equal to Õ'Ž ¡R DD„aE; andÕPŽÕPŽ ¡Ž ¡M>W (˜R¨DDD „ÕPŽÕPŽŽ ¡v>

œœœœ%tW &lA‘ W lltttransitions of šg û- hè¿¢ = transitions of èstart state of šg û@ hè¿¢ = start state of èqtt32 4 REGULAR LANGUAGES, IIwhich is equal to ÕPŽDDD „. Substituting all these values into (3), we getW (D „ R D „ ER DD „ E „ DDD „ HÕ'ŽÕPŽŽ ¡Ž ¡ Sois a regular expression whose matching strings comprise thelanguage accepted by the NFA on Slide 25. (Clearly, one could simplify this to a smaller, butequivalent regular expression (in the sense of Slide 8), but we do not bother to do so.)D„zR¨D8„aER DD„aE „aDDD„4.3 Complement and intersection of regular languagesWe saw in Section 3.2 that part (a) of Kleene’s Theorem allows us to answer question (a)on Slide 9. Now that we have proved the other half of the theorem, we can say more aboutquestion (b) on that slide.Complementation Recall that on page 8 we mentioned that for each regular expressionover an alphabet , we can find a regular Ó}Žexpression that determines the complementof the language determined by :‘ % „ R‘ ÷t ?zHÕPŽ½Ó}Ž> (¯* qÕ'ŽAs we now show, this is a consequence of Kleene’s Theorem.šgûG ·èé¢úJû>üûZý-þ› œž Ÿ ÿG¡ d@fZg iúJû>üûZý-þMÿ$…ÿ$¢› œž Ÿ ÿG¡ d@fZg iProvided èis a deterministic finite automaton, then k is¡£¢¤¢Vý¦¥”û ›£œ'ž.Ÿ ÿG¡ í ³ Û oãúJûZüûZý-þ1ÿ ¡ Û¥¤ o¡£¢A¢Vý¦¥”û ÿ .œaccepted by šg û@ hè¿¢ iff it is not accepted by è :šgÀ¸·èé¢3¢ í ³ kÄoÅ$ b ¡ k ¤ ožÀÃ ·èé¢a .ûGSlide 26

q‘ ÷this Ó}Ž5(Õ?tõA&&ÕAt5ÕAAt5qõqA‘ ÷õtA&q‘Õ&Aand q%*q‘Õ*qqtÕ(q&‘ ÷q5&‘ ÷%ÕA*?qAõA4.3 Complement and intersection of regular languages 33Lemma 4.3.1. Õ If is a regular language over alphabetis also regular., then its complement‘ % „ RProof. Õ Since is regular, by definition there is a õ DFAof strings accepted ¦VZG‹`Ž õ^ by and hence is regular.be the DFA constructed from õsuch Õ thatas indicated on Slide 26. Thenis the setõ^ . Let ¦VZG‹VŽ õ^‘p%…„ÃR Õ'Žthat ÕPŽGiven a regular expression, by part (a) of Kleene’s Theorem there is a DFA õ such( tõÚ . Then by part (b) of the theorem applied to the DFA ¦VZG‹VŽ õ^ , we canÕ'Žfind a regular Ó Žexpression so Õ'Ž½Ó}Ž > ( Õ'Ž%¦‡Zv‹aŽ õÚ> that . Since‘“% „ R‘“% „ R‘ ÷t ?z.is the regular expression we need for the complement of.Õ'Ž%¦‡Zv‹aŽ õÚ> (+* qÕ'Ž õ^ ?C(+* qÕPŽNote. The construction given on Slide 26 can be õapplied ‘ to % a „ finite R automatonnot ÕPŽ%¦VZG‹`Ž õ^> it is deterministic. However, for to equalto be deterministic. See Example 4.4.2.õÕ'Ž õ^ ?whether orwe needAs another example of the power of Kleene’s Theorem, given regular expres-we can show the existence of a regular Ž expression with the property:Intersectionsions andt1&t-At&9§•t-Atz&’§•t-Aiff q matchestz&and q matchestGA.q matches ŽThis can be deduced from the following lemma.Lemma 4.3.2. Õ Ifintersectionand Õare a regular languages over an alphabet, then their&©¨‘“% „ RAG?»½¼( *òis also regular.Õ Õõ( /1.32&V.õ® Õ'Ž õž® Õ õõ õ ‘“%P„(ø‰«ªŽ &V.&`.&«¨‰«ª¥Ž AB(õ&`. qõ ÕõProof. Since and are regular languages, there are DFA and such that). Letbe the DFA constructed from and as onSlide 27. It is not hard to see thathas the property that any is acceptedbyiff it is accepted by both and . Thusis a regular language.‰«ª¥Ž õÕ'Ž‰«ªŽ õ>

õ*q&5q‘ ÷œà ßoSúJû>üûZý-þMÿ î and ÛMà oãúJûZüûZý-þMÿ ïÛßand èalphabet of ¡ œàœœœAand Û à¢ë ì Ýstart state of ¡õAtÝ®(%à¡ßà&ß2 / /2 / 2ßß,àà¢àA5õAßßß%234 4 REGULAR LANGUAGES, IId¬hèlVèd“¬¢ are all ordered pairsÛ ßl Û àstates of ¡¢ with·èlVèd“¬hèlVè¢ is the common alphabet of èÛ ßëì Û ß l Û àin èl Û àà ¢ in ¡iff Û ß Ý ë ì ¢Û ßin è·èlVèd“¬Û d“¬·è ß l`è à ¢ is ZŸ ÿ î l0Ÿ ÿ ï ¢d“¬Û ßl Û à¢ accepting in ¡hèlVè¢ iff Û ß accepting in èand Û à accepting in èà.Thus given regular expressions andDFA õ and Õ'Ž ÕPŽ õž® (øwith tz&’§•t-Aõfind a regular expressionÕPŽt1&9§•t-Aso &V.thatiff, as required.t-A‰«ªŽ õt"&accepts q , iff both õSlide 27, by part (a) of Kleene’s Theorem we can find). Then by part (b) of the theorem we cant-A(«/1.32and õtz&¦§…t-A (&V.õ > . Thus q matchesÕPŽtz&‰®ª¥Žaccept , q iff matches bothq and4.4 ExercisesExercise 4.4.1. Use the construction in Section 4.1 to find a regular expression for the *-,.0/1.32?DFAwhose state set is , whose start state is , whose only accepting state is , whosealphabet of input symbols is , and whose next-state function is given by the followingF table. D E ©°¯/ 2 ,*-D.5E-?Exercise 4.4.2. The õ ± construction ¦VZG‹VŽ õÚ given on Slide 26 applies to both DFA andNFA; but for æ õÚ> to be the complement of Õ'Ž õÚ we need õÕ'Ž%¦‡Zv‹aŽGive an example of an alphabet‘“%C„¸RÕPŽ õ^ ?and a NFA õis not the same set as ÕPŽ%¦VZG‹`Ž õ^> .to be deterministic.with set of input symbols , such that

q‘ ÷ŽŽt%tt‘4.4 Exercises 35Exercise 4.4.3. Let. Find a complement for over the %r( *-D.5E-?alphabet, i.e. a regular ÓÔŽexpressions over the alphabet ÕPŽ½ÓÔŽsatisfying. Do the same for the alphabet .% „ Rt ?Õ'Žt´(D{R E „ DED{R E „*-D.5EG.5F@?Tripos questions 1995.2.20 1994.3.3 1988.2.3 2000.2.7> ( * q

36 4 REGULAR LANGUAGES, II

œœí€í¡ [Å\^]DDEDE375 The Pumping LemmaIn the context of programming languages, a typical example of a regular language (Slide 19)is the set of all strings of characters which are well-formed tokens (basic keywords, identifiers,etc) in a particular programming language, Java say. By contrast, the set of all strings whichrepresent well-formed Java programs is a typical example of a language that is not regular.Slide 28 gives some simpler examples of non-regular languages. For example, there is noway to use a search based on matching a regular expression to find all the palindromes in apiece of text (although of course there are other kinds of algorithm for doing this).Examples of non-regular languagesparentheses ‘ ’ and ‘¢ ’ occur well-nested.GvThe set of strings over ³ 5l@¢Vl ~ l € llA²¥ in which thei.e. which read the same backwards as forwards.GvThe set of strings over ³"~ l € ll'² which are palindromes,œ ³"~Slide 28The intuitive reason why the languages listed on Slide 28 are not regular is that a machinefor recognising whether or not any given string is in the language would need infinitely manydifferent states (whereas a characteristic feature of the machines we have been using is thatthey have only finitely many states). For example, to recognise that a string is of the formone would need to remember how many s had been seen before the first is encountered,requiring countably many states of the form ‘just © seen s’. This section make this intuitiveargument rigorous and describes a useful way of showing that languages such as these arenot regular.The fact that a finite automaton does only have finitely many states means that as we lookat longer and longer strings that it accepts, we see a certain kind of repetition—the pumpinglemma property given on Slide 29.

wqõ&sqA&sqall |óUÀœœœ&s(with ´ ýfor [Å\^] all k & ,qq&qÕßßqn && q‘íkà,©/ßà&ß,&(sqAà&38 5 THE PUMPING LEMMAThe Pumping LemmaFor every regular language À , there is a number ³ \´µ satisfyingthe pumping lemma property:concatenation of three strings, | i k, where k, n and kdµsatisfy:û·¬ X| ¢•\K³ can be expressed as ankdaµ´ ý·n¥¢•\´µ û·¬ş Š ¡ (i.e. )daµû·¬ ·k(i.e. qÕ ,Õ [but we knew that anyway],n¥¢º¹K³´ ýA'‘A•‘oUÀA'‘s"qs"s"qÕ ,Õ ,etc).A'‘ss"sqSlide 295.1 Proving the Pumping LemmaÕ Since is regular, it is equal to the Õ'Ž õ^ set of strings accepted by some õ DFAcan take the » number mentioned on Slide 29 to be the number of states õ(ÚD&5DA”H0H0HZDin with ©Uª¼» . If w. Then we. For supposews (½¸`L»q©Äªthe top of Slide 30. Then can be split into three pieces as shown on that slide. Note that. So it just remainsto check that for all . As shown on the lower half of Slide 30, the stringby choice ø of ½ and †ˆ‡`‰Šv‹ ŒŽ s ( ½ £ øPªA}‘ ,and †ˆ‡`‰Šv‹ ŒŽ qÕPŽ õ^ , then there is a transition sequence as shown ats õ â0® â¾ © ¥ â0®© (â?¿ âV® © â@® ‘658797‡%:‹ âV A'‘ â0®takes the machine from state back to the same state (since ). So for any ,takes us from the initial state to , then times round the loop from toitself, and then from to is accepted by, i.e. qÕ .. Therefore for any ©Uª, qNote. In the above construction it is perfectly possible ø that¡ null-string, .(«,, in which case qis the

klßÊà‘Õ,5.2 Using the Pumping Lemma 39If [Å\K³ inumber of states of è , then instatesŸvÿ i Û Ü ÝMîÝ@ò ëzì Û í o¡£¢A¢Vý¦¥”û ÿñvñGñë7ì Û ß Ý@ïÛ à ñGñvñ Ý¤Àë5ìëXìÛEÁÂ ÃÄ ÅÁYÆ ßÛ for someÛ ÜGv]G¹Û Ácan’t all be distinct states. So Û | il¹K³ . So the above transition sequence looks likeÈÇ¼ÉÛ |ÿ i Û Ürê î ëVì b Ÿi Û ê ï ëVì b Û í o¡£¢A¢Vý¦¥”û ÿbwhered@fZg i ~ ß Gv ~a| n d-fhg i ~a| Æ ß vG ~ kd-fhg i~ Æ ß vv ~ í Slide 30Remark 5.1.1. One consequence of the pumping lemma property of Õ and » is that if thereis any string w in Õ of length ªL» , then Õ contains arbitrarily long strings. (We just ‘pumpup’ w by increasing © .)If you did Exercise 3.3.3, you will know that Õ if is a finite set of strings then it is regular.In this case, what is the » number with the property on Slide 29? The answer is that we cantake » any strictly greater than the length of any string in the finite Õ set . Then the Pumping†ˆ‡`‰Šv‹ ŒŽ w• ª€» with forLemma property is trivially satisfied because there are w nowhich we have to check the condition!5.2 Using the Pumping LemmaThe Pumping Lemma (Slide 5.1) says that every regular language has a certain property—namely that there exists a » number with the pumping lemma property. So to show thata Õ language is not regular, it suffices to show that » ª no possesses the pumpinglemma property for the Õ language . Because the pumping lemma property involves quite acomplicated alternation of quantifiers, it will help to spell out explicitly what is its negation.This is done on Slide 31. Slide 32 gives some examples.

ÍÍÌÎÏÍÍÀ ß d@fZg i ³"~(i)Õßthere is some [Å\^] for which k ß nNí€í///ÕÕ&&í240 5 THE PUMPING LEMMAHow to use the Pumping Lemma to provethat a language Àis not regularFor each ³’\´µ , find some |«oUÀof length \K³ so that(Ë )no matter how | is split into three, | i k ß nk à,with ´ ýû·¬ ·n¥¢•\´µ ,daµdaµû·¬ ·kn¥¢8¹K³ and ´ ýk àis not in À .Slide 31Examples[For »•ª eachon Slide 31.],is of length ª6» and has property (Ñ )¡ [Å\^] is not regular.DÐ>EÐB‘[For each »•ª(Ñ ).],is of length ª6» and has propertyd@fZg(ii) À ài ³ |ó ³"~ l € b ¡ | a palindrome is not regular.DÐ>È DÐB‘À á d@fZg i(iii)[For each »•ªD× ‘, we can find a Ô prime ÔÖÕ withhas ª6» length and has (Ñ property ).]³"~OÒ“¡?Óprime is not regular.» and thenSlide 32

w¥2¥Õ–&2o//&s/Õ/&qq&&(because » £s×‘¥ÕqNtq£&s¥W(rD Ú D oÛ× A/q¥¥(Hq/Ú&q2¥/HÚ&s¥Žq–&AH/Út×Õ(@¾/×¥&`/t–&&q&s&×q¥A(D(q&&(&/Aq&A5.2 Using the Pumping Lemma 41Proof of the examples on Slide 32. We use the method on Slide 31.»ÑªwÕªØ» ( DIÐ7E9Ð(Ñ ww s"q& (²DÚwq s q© qsÕq(i) For any , consider the string . It is in and has length . We showthat property ) holds for this . For suppose is split as with. Then must consist entirely of s, soand say, and hence . Then the case of is not in sinceŒŽ q sÙǸ» and †ˆ‡a‰Šv‹ ŒŽ s ª(rDA (rDÐ†ˆ‡a‰ŠM‹( DOÐ7E9ÐEÐ(r,H oA (A (rD ÚD Ð(rD ÐH o E ÐH o E ÐandDÐ‘ EÐÂ÷Š(» (since¥'().†ˆ‡`‰Šv‹ Œ¥Ž s)ªH o(ii) The argument is very similar to that for example (i), but ( DÐ>È DÐstarting (²,with A the (²DÐ palindrome©. qOnce again, the qH oEaDÐof yields a string which is). »not a palindrome (because » £Š((iii) Given » ª, since there are infinitely many Ô primes , we can certainly find one satisfyingis split w ashas property (Ñ ). For suppose w» . I wÔ¥Õ claim thatq s8`¼» with †ˆ‡a‰Šv‹ ŒŽ sPªand A †ˆ‡a‰Šv‹ ( ŒŽq that , we ŒŽ †ˆ‡a‰Šv‹ have. Letting†ˆ‡a‰Šv‹ ŒŽ qandsqŒŽ s , so †ˆ‡a‰Šv‹(—D ×(D ×ò »½¼ò »½¼Ô £H o (rD ÛÝo ¾H osÜ HH oH o Ü D ×ÜÛH o (rD oH oNow ŽTherefore qÔ ££ VŽÞÔ£ » A’÷ »» ªÔßÕ »©is not prime, because(since by choice, andwhen .(since( ¥( ¥ŒŽ s¤ª †ˆ‡a‰Šv‹ŒŽ q †ˆ‡a‰Šv‹) ands`à» ).Ô £Remark 5.2.1. Unfortunately, the method on Slide 31 can’t cope with every non-regularlanguage. This is because the pumping lemma property is a necessary, but not a sufficientcondition for a language to be regular. In other words there do exist Õ languages for which a»}ª number can be found satisfying the pumping lemma property on Slide 29, but whichnonetheless, are not regular. Slide 33 gives an example of such Õ an .

TAqT‘%ãíí/€&íqTAq(T&42 5 THE PUMPING LEMMAExample of a non-regular languagethat satisfies the ‘pumping lemma property’³"‚

ERDEŽDE5.4 Exercises 43Lemma If a è DFA accepts any string at all, it accepts onewhose length is less than the number of states è in .Proof. è Supposeempty, then we can find an element of it of shortest length,has ³ states (so ³Ô\´µ ). If ÀÃ ·èé¢ is notß ~ à Gv ~ísay (where [Å\^] ). Thus there is a transition~sequenceÿ i Û Ü ÝMî ë5ì Û ß Ý@ï ë5ìŸà ñvñvñ Ý@ò ë1ì Û í o¡£¢¤¢Vý¦¥”û ÿ .ÛIf [Å\K³ , then not all the [åäµ states in this sequence can bedistinct and we can shorten it as on Slide 30. But then we wouldobtain a strictly shorter string in À¸ hè¿¢ contradicting the choice of~ ß ~ à Gv ~í. So we must have [ Ç ³ .Slide 345.4 ExercisesExercise 5.4.1. Show that the first language mentioned on Slide 28 is not regular.Exercise 5.4.2. Show that there is no õ DFA for Õ'Ž õ^ which is the language on Slide 33.[Hint: argue by contradiction. If there were such õ an , consider the õ ô DFA with the samestates õ asthose õ of, with alphabet of input symbols just consisting ofwhich are labelled by or , with start stateand , with transitions all(where is the start© © ¥ F ¥ .5F ©*-D õ ,?õ state of ), and with the same accepting states as . Show that the language accepted byõ¯ô has to beand deduce that no such õcan exist.]Exercise 5.4.3. Check the claim made on Slide 33 that the language mentioned there satisfiesthe pumping lemma property of Slide 29 » with .(¯/©¤ªTripos questions 2001.2.7 1999.2.7 1998.2.7 1996.2.1(j) 1996.2.8 1995.2.271993.6.12 1991.4.6 1989.6.12

44 5 THE PUMPING LEMMA

.ñçç.æçññË.ççç.çÆØØçç.æææææææææöññ.çËöÆöÆÈÆçÍØ..öÍ..æç.IØØæÈ.I.ëH456 GrammarsWe have seen that regular languages can be specified in terms of finite automata that acceptor reject strings, and equivalently, in terms of patterns, or regular expressions, which stringsare to match. This section briefly introduces an alternative, ‘generative’ way of specifyinglanguages.6.1 Context-free grammarsSome production rules for ‘English’ sentencesæê0ëaìëðëìØOçè"ØOçIé1ØØ0éèîí"ØïØ0éèØ0éèïIèaòéó"ØççôOõïæêIëìðêñ æØ0éèïIèaòéó"ØççôOõïðëìðêñ æïOèòéó"Øæ÷öøïOèòéó"ØçôõOïðêñ æðêçôõOïØIéèaò

æ,æÈÈÈÈÈÈÈÈÈöööööööÈööööööööÍÍ%ÍñöÍöÍÆçØçØ46 6 GRAMMARSThe idea is that we begin with the start symbol æ ØIçè"ØOçOézØ and use the productions tocontinually replace non-terminal symbols by strings. At successive stages in this process wehave a string which may contain both terminals and non-terminals. We choose one of thenon-terminals in the string and a production which has that non-terminal as its left-hand side.Replacing the non-terminal by the right-hand side of the production we obtain the next stringin the sequence, or derivation as it is called. The derivation stops when we obtain a stringcontaining only terminals. The set of strings over that may be obtained in this way fromthe start symbol is by definition the language generated the context-free grammar.A derivationæê0ëaìëð•ëaìØOçOèzØOçIé1ØØIéè í"ØOïØ0éèæ÷ñðêñ æëð•ëaìïIèaòé•ó"Ø ççôOõïØí"ØOïØIéèÈ‡ççôõïØÿíØïØIéèæPöøðêñ æë ðëaìÈ‡ççôõïØ ÈËØIéèæPöøðêñ æðëaìØ0éèaò

Ëæ¢Æö6.2 Backus-Naur Form 47book, for example) has productions of the q formterminals and non-terminals. For example a production of the forms where q and s are arbitrary strings ofñù ìö æý1þwould allow occurrences of ‘ìØIéèaòEíØ ’ that occur between ‘Ë ’ and ‘ÌMË ’ to be replacedby ‘ ý1þ ñù ’, deleting the surrounding symbols at the same time. This kind of production is notÆpermitted in a context-free grammar.ØIéèaò

Éþþ–æ£ æÙô ûý(5) Ù…–rŽ Ù ôÏô(6) Ù…–rŽ Ù ô ôÉÉþNon-terminal: ¢Start symbol: ¢Productions: ¢íææææ€í±þ¡ [Å\^]¢€48 6 GRAMMARSnamely:ûý}æûý}æûýÈ1ÙzÉÈ1ÙzÉÈzÙ1ÉÉ È1Ù1ÉŽ½ÈzÙ1É8The language generated by this grammar is supposed to represent certain arithmetic expressions.For exampleÈ1ÙzÉis in the language, butis not. (See Exercise 6.4.2.)A context-free grammar for the language³"~Terminals:~ €) ) i j ¡"~Slide 38

¥ © æset of terminals = $ ÿâ&æEset of non-terminals = úJûZüûZý-þ ÿR%ââ&Aææ%©H6.3 Regular grammars 496.3 Regular grammarsA language over an alphabet is Õcontext-free iff is the set of strings generated Õbysome context-free grammar (with set of terminals ). The context-free grammar on *-DSlide ,?38©Úªgenerates the language. We saw in Section 5.2 that this is not a regularlanguage. So the class of context-free languages is not the same as the class of regularlanguages. Nevertheless, as Slide 39 points out, every regular language is context-free. Forthe grammar defined on that slide clearly has the property that derivations from % the „startsymbol to a string in must be of the form of a finite number of productions of the firstkind followed by a single production of the second kind, i.e.D&D&5DAD&5DA”H0H0H>DD&5DA!H0H0H>D(?(?(where in õthe following transition sequence holds‘J58797© çE> ¥æ £çA@æ £çABæ ££(?(?(âVæâV‡%:‹Thus a string is in the language generated by the grammar iff it is accepted by õ .Every regular language is context-freeGiven a DFA è , the set À¸ hè¿¢ of strings accepted by è can begenerated by the following context-free grammar:start symbol = start state è ofproductions of two kinds:ì ~ Ûwhenever Û Ý ëìÛin èÛì j whenever Û o3¡£¢¤¢Vý¦¥”û ÿÛSlide 39

ââ&ææâââ£ ì k¥¤Nâç§¦æ £&&/H(50 6 GRAMMARSDefinition A context-free grammar is regular iff all itsproductions are of the formorì k £k where is a string of terminals and £and ¤ are non-terminals.Theorem(a) Every language generated by a regular grammar is aregular language (i.e. is the set of strings accepted by someDFA).(b) Every regular language can be generated by a regulargrammar.Slide 40It is possible to single out context-free grammars of a special form, called regular (orright linear), which do generate regular languages. The definition is on Slide 40. Indeed, asthe theorem on that slide states, this type of grammar generates all possible regular languages.Proof of the Theorem on Slide 40. First note that part (b) of the theorem has already beenproved, because the context-free grammar generating ÕPŽ õ^ on Slide 39 is a regular grammar(of a special kind).To prove part (a), given a regular grammar we have to construct a õ DFA whose setof accepted strings coincides with the strings generated by the grammar. By the SubsetConstruction (Theorem on Slide 18), it is enough to construct NFAó an with this property.This makes the task much easier. We take the states õ of to be the non-terminals, augmentedby some extra states described below. Of course the alphabet of input symbols õ of shouldbe the set of terminal symbols of the grammar. The start state is the start symbol. Finally, thetransitions and the accepting states õ of are defined as follows.(i) For each production of the â formfresh â states, we add © £&0.with †ˆ‡a‰ŠM‹ ŒŽ q’ªAG. NM.0H0H0H3. qâMô, q say withto the automaton and transitions(¿D&7DA”H0H0H>D©ÖªâVIHçE>æ £çA@æ £çABæ ££(?(?(hâVIHâ ô(ii) For each production of the â form-transition ¡¡ , we add anqâ ô with †ˆ‡a‰Šv‹ ŒŽ q (°, , i.e. with qHó £ æâ ô

(iii) For each production of the form ââââæâæ&ââââWWW&âAææææç¨¦æ £¡FââW& âNW/#!H(©/6.4 Exercises 51we add © fresh states â, say q(rD¥&7DA”H0H0H>D†ˆ‡`‰Šv‹ ŒŽ q¥ ªq withto the automaton and transitions ầ, with ©pª&0.A-.NM.0H0H0Ha.ç¤Bæ £ˆ£ç?>æ £ç'@Moreover we make the state â@ accepting.(iv) For each production of the â formin any new states or transitions, but we do â make an accepting state.q with †ˆ‡`‰Šv‹ Œ¥Ž q (+, , i.e. with q¡ , we do not addIf we have a transition sequence õ in of the â formâ ‡:‹with , wecan divide it up into pieces according to where non-terminals occur and then convert eachpiece into a use of one of the production rules, thereby forming a derivation q of in thegrammar. Reversing this process, every derivation of a string of terminals can be convertedinto a transition sequence in the automaton from the start state to an accepting state. ThusNFAó this does indeed accept exactly the set of strings generated by the given regulargrammar.¥ ©‘ 5D77£ æ(?(?(âV6.4 ExercisesExercise 6.4.1. Why is the string (4) not in the language generated by the context-freegrammar in Section 6.1?Exercise 6.4.2. Give a derivation showing that (5) is in the language generated by thecontext-free grammar on Slide 37. Prove that (6) is not in that language. [Hint: show thatif q is a string of terminals and non-terminals occurring in a derivation of this grammar andthat ‘ô ’ occurs in q , then it does so in a substring of the form s8ô , or sô ô , or sô ôÏô , etc., where s iseither Ù or ûý .]Exercise 6.4.3. Give a context-free grammar generating all the palindromes over the *-D.5E@?alphabet(cf. Slide 28).Exercise 6.4.4. Give a context-free grammar generating all the regular expressions over thealphabet .*-D.5E-?Exercise 6.4.5. Using the construction given in the proof of part (a) of the Theorem onSlide 40, convert the regular grammar with start â symbol and productionsDEDEinto an NFAó whose language is that generated by the grammar.Exercise 6.4.6. Is the language generated by the context-free grammar on Slide 35 a regularlanguage? What about the one on Slide 37?Tripos questions 1997.2.7 1996.2.1(k) 1994.4.3 1991.3.8 1990.6.10

52 6 GRAMMARS

1. Name of Lecturer: ÇùæöòñôÍ©ÈñöLectures Appraisal FormIf lecturing standards are to be maintained where they are high, and improved where theyare not, it is important for the lecturers to receive feedback about their lectures.Consequently, we would be grateful if you would complete this questionnaire, and eitherreturn it to the lecturer in question, or to the Student Administrator, Computer Laboratory,William Gates Building. Thank you.2. Title of Course: éË3. How many lectures have you attended in this series so far? . . . . . . . . . . . . . . . . . . . . . . .è¼ôËzÇïÈzÆ1üË1Ç&ó8ËDo you intend to go to the rest of them?Æ8ËzÆÈ8ÍrË©0ýYû©û

N Regular Languages and Finite Automata - The Computer Science ...

Create successful ePaper yourself

Delete template?

Save as template?