10.07.2015 Views

N Regular Languages and Finite Automata - The Computer Science ...

N Regular Languages and Finite Automata - The Computer Science ...

N Regular Languages and Finite Automata - The Computer Science ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

NLecture Notes on<strong>Regular</strong> <strong>Languages</strong><strong>and</strong> <strong>Finite</strong> <strong>Automata</strong>for Part IA of the <strong>Computer</strong> <strong>Science</strong> TriposProf. Andrew M. PittsCambridge University <strong>Computer</strong> Laboratoryc A. M. Pitts, 1998-2002


First Edition 1998.Revised 1999, 2000, 2001, 2002.


ContentsLearning Guideii1 <strong>Regular</strong> Expressions 11.1 Alphabets, strings, <strong>and</strong> languages . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Pattern matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Some questions about languages . . . . . . . . . . . . . . . . . . . . . . . . 61.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8¡2 <strong>Finite</strong> State Machines 112.1 <strong>Finite</strong> automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2 Determinism, non-determinism, <strong>and</strong> -transitions . . . . . . . . . . . . . . . 142.3 A subset construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 <strong>Regular</strong> <strong>Languages</strong>, I 213.1 <strong>Finite</strong> automata from regular expressions . . . . . . . . . . . . . . . . . . . . 213.2 Decidability of matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284 <strong>Regular</strong> <strong>Languages</strong>, II 294.1 <strong>Regular</strong> expressions from finite automata . . . . . . . . . . . . . . . . . . . 294.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.3 Complement <strong>and</strong> intersection of regular languages . . . . . . . . . . . . . . . 324.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 <strong>The</strong> Pumping Lemma 375.1 Proving the Pumping Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . 385.2 Using the Pumping Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.3 Decidability of language equivalence . . . . . . . . . . . . . . . . . . . . . . 425.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436 Grammars 456.1 Context-free grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.2 Backus-Naur Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476.3 <strong>Regular</strong> grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Lectures Appraisal Form 53


11 <strong>Regular</strong> ExpressionsDoubtless you have used pattern matching in the comm<strong>and</strong>-line shells of various operatingsystems (Slide 1) <strong>and</strong> in the search facilities of text editors. Another important example ofthe same kind is the ‘lexical analysis’ phase in a compiler during which the text of a programis divided up into the allowed tokens of the programming language. <strong>The</strong> algorithms whichimplement such pattern-matching operations make use of the notion of a finite automaton(which is Greeklish for finite state machine). This course reveals (some of!) the beautifultheory of finite automata (yes, that is the plural of ‘automaton’) <strong>and</strong> their use for recognisingwhen a particular string matches a particular pattern.Pattern matchingWhat happens if, at a Unix/Linux shell prompt, you type£¥¤§¦<strong>and</strong> press return?£Suppose the current directory contains ¨©files called, <strong>and</strong> (strangely)¨©£¥ , ¨©££ , ¨© . What happens if you type£¥!© ,"£¤#¦§"<strong>and</strong> press return?Slide 11.1 Alphabets, strings, <strong>and</strong> languages<strong>The</strong> purpose of Section 1 is to introduce a particular language for patterns, called regularexpressions, <strong>and</strong> to formulate some important problems to do with pattern-matching whichwill be solved in the subsequent sections. But first, here is some notation <strong>and</strong> terminology todo with character strings that we will be using throughout the course.


Y%2D2 1 REGULAR EXPRESSIONSAlphabetsAn alphabet is specified by giving a finite set, $ , whose elementsare called symbols. For us, any set qualifies as a possiblealphabet, so long as it is finite.%'&)(+*-,.0/1.32.54.768.39.5:.5;.7=?Examples:%BAC(+*-D.5E-.5FG.0H0H0H7.>IJ.7K.5L?—characters of the English language.of decimal digits.%ONP(+*GQSRGQUTS%'&V?Non-example:(+*-,.0/1.32.54.0H0H0HZ?—an alphabet, because it is infinite.&XW2M:—/@,-element set of decimal digits.-element set of lower-case-element set of all subsets of the alphabet— set of all non-negative whole numbers is notSlide 2Strings over an alphabetA string of [ (\^] length ) over an $ alphabet-tuple of elements of $ , written without punctuation.[is just an orderedExample: if%_(+*-D.5EG.5F@?, then,DE,DDF, <strong>and</strong>of lengths one, two, three <strong>and</strong> four respectively.are strings overÈ EaDF$cbed-fhg iset of all strings over $ of any finite length.N.B. there is a unique string of length zero over $ , called the nullstring (or empty string) <strong>and</strong> denoted j (no matter which $ weare talking about).Slide 3


¡.s.w(q(¡%%%%((¡?1.1 Alphabets, strings, <strong>and</strong> languages 3Concatenation of strings<strong>The</strong> concatenation of two kml`npoS$ b strings is the kJn stringobtained by joining the strings end-to-end.Examples: q IfwPs <strong>and</strong>(rDÈ DE, s(utvD(rF`DxztMD.<strong>and</strong> w, then s"q,(rDE(rF`Dx(ytMDDEThis generalises to the concatenation of three or more strings.qqE.g. k{n|}k{n i~€-"~‚-~ƒ~¥€@"~.Slide 4Slides 2 <strong>and</strong> 3 define the notions of an alphabet, <strong>and</strong> the setof finite strings over an%…„alphabet. <strong>The</strong> length of a q string will be denoted †ˆ‡a‰Šv‹ ŒŽ q¥ by . Slide 4 defines theD’‘“%operationof concatenation of strings. We make no notational distinction between a symbol %c„ <strong>and</strong>the corresponding string of length one over : so can be regarded as a subset of . % „Notethat is never empty—it always contains the ¡ null string, , the unique string of length zero.Note also that q for any‘“%P„q¡¡-q <strong>and</strong> Ž qshwqs"wq”Ž s"w•<strong>and</strong> †ˆ‡a‰Šv‹ ŒŽ qs ( †ˆ‡`‰Šv‹ ŒŽ q¥–r†ˆ‡a‰Šv‹ ŒŽ s .Example 1.1.1. Examples of% „for different:(i) If%_(+*-D?, then%P„contains.5D.5DD.5DDD.5DDDD.0H`HVH(ii) If%_(+*-D.5E@?, then%P„contains.5D.5EG.5DD.5DE-.5È D.5E3EG.5D"DD.5D"DEG.>DÈ D.7DÈ EG.>È D"D.7EaDE@.5EaEaD.7EaÈ E@.0HVHVH(iii) If (the empty set — the unique set with no elements), thencontaining the null string.%—(˜%•„š(›*, the set just


£„œœtt<strong>Regular</strong> expressions over an alphabet $œif <strong>and</strong> Ÿ are regular expressions, then so is¡ Ÿz¢binds more tightly than £¤£ , binds more tightly than £means Ž˜Ž¡%˜, not ŽŽ˜%%, or Ž>ŽR%(%4 1 REGULAR EXPRESSIONS1.2 Pattern matchingSlide 5 defines the patterns, or regular expressions, over an alphabet that we will use.Each such regular expression, , represents a whole set (possibly an infinite set) of strings%P„in that match . <strong>The</strong> precise definition of this matching relation is given on Slide 6. Itmight seem odd to include a regular expression that is matched by no strings at all—but itis technically convenient to do so. Note that the regular ¡ expression is in fact equivalent to, in the sense that a q string ¡ matches).˜ „˜ „iff it matches ¡ (iff qœeach symbol ~ o_$ is a regular expressionis a regular expressionjœž is a regular expressionif <strong>and</strong> Ÿ are regular expressions, then so is Ÿœif is a regular expression, then so ¢ b isEvery regular expression is built up inductively, by finitely manyapplications of the above rules.(N.B. we assume ¡ ,, Ž , ,R „, <strong>and</strong>are not symbols in.)Slide 5Remark 1.2.1 (Binding precedence in regular expressions). In the definition on Slide 5we assume implicitly that the alphabet does not contain the six symbolsR „<strong>The</strong>n, concretely speaking, the regular expressions over form a certain set of strings overthe alphabet obtained by adding these six symbols to . However it makes things morereadable if we adopt a slightly more abstract syntax, dropping as many brackets as possible<strong>and</strong> using the convention thatSo, for example,tR¦¥V§5„tR¦¥§ „tR¦¥§ „tR¨¥0§> „ , etc.£ .VŽ


%q&q­œœœœœœt?two strings, k i n| , with n matching <strong>and</strong> | matching Ÿ,(¡Žtt„(1.2 Pattern matching 5Matching strings to regular expressionsk matches ~ oS$ iff k i—~no string matchesk matches j iff k i jk matches ¡ Ÿ iff k matches eitheror Ÿk matchesŸ iff it can be expressed as the concatenation ofk matches b iff either k i j , or k matches , or k can beexpressed as the concatenation of two or more strings, eachof which matchesSlide 6<strong>The</strong> definition of ‘q matchest „’ on Slide 6 is equivalent to sayingfor ©«ª A¬H0H0H someq , can be expressed as a concatenation © of q strings,, where each q® matches .q­©qq<strong>The</strong> case just means thatthat matches (so any string matching also matches<strong>and</strong> , then the strings matching aret…(rDE(r,t „(so always t „¡ matches¡t„); <strong>and</strong> the © case). For example, ifjust means(¯/%e(°*-D.5EG.5F@?.5DEG.5DEaDE-.5DÈ DÈ DEG.HetcNote that we didn’t include a regular expression ‘± for the ’ occurring in the %²( UNIXexamples on Slide 1. However, once we know which alphabet we are referring to,say, we can get the ± effect of using the regular expression*-D&`.5DAG.0H0H0H3.5DD&vR¨DA"R5H0H0H@R¨Dwhich is indeed matched by any string inin ).%•„(becauseD&vR¨DA"R5H0H0H@R¨D­ is matched by any symbol­8


t¾œœœœœExamples of matching, with $ í ³ ]lvµœž µ ¡ ] is just matched by ]t­tt,–t¥­tW&(¡t­Ž,Htt¾6 1 REGULAR EXPRESSIONS] ¡ µ is matched by each symbol in $µ8 ·] ¡ µ"¢7b is matched by any string in $¸b that starts with a ‘µ ’a X] ¡ µ"¢0 ·] ¡ µ"¢3¢ b is matched by any string of even length in $ b·] ¡ µ"¢ b X] ¡ µ"¢ b is matched by any string in $ b¡ ]¢0 Xj ¡ µ"¢ ¡ µ8µ is matched by just the strings j , ] , µ , ]µ , <strong>and</strong>Xjµ8µSlide 7Notation 1.2.2. <strong>The</strong> notation is quite often used for what we write as .<strong>The</strong> notation , ©+ª for , is an abbreviation for the regular expression obtained by© concatenating copies of . Thus:tR¦¥ò »½¼¹ t(¿tò »½¼q Thus q matches iff matches ©pª for some tvt"„ .We use as an abbreviation for q . Thus matchesconcatenation of one or more strings, each one matching .t „iff it can be expressed as the­z¾1.3 Some questions about languagesSlide 8 defines the notion of a formal language over an alphabet. We take a very extensionalview of language: a formal language is completely determined by the ‘words in thedictionary’, rather than by any grammatical rules. Slide 9 gives some important questionsabout languages, regular expressions, <strong>and</strong> the matching relation between strings <strong>and</strong> regularexpressions.


d-fhg i ³ kÄoÅ$ b ¡ k matches 1.3 Some questions about languages 7<strong>Languages</strong>A (formal) language À over an alphabet $ is just a set of stringsin $cb .Thus any subset À^Á$Âb determines a language over $ .<strong>The</strong> language determined by a regular expression over $ isÀà ¢Two regular expressions<strong>and</strong> Ÿ (over the same alphabet) areequivalent iff Àà ¢ <strong>and</strong> Àà ZŸz¢ are equal sets (i.e. have exactly thesame members).Slide 8Some questions(a) Is there an algorithm which, given a k string <strong>and</strong> a regularexpression (over the same alphabet), computes whether ork not matches ?(b) In formulating the definition of regular expressions, have wemissed out some practically useful notions of pattern?(c) Is there an algorithm which, given two regular expressionsŸ <strong>and</strong> (over the same alphabet), computes whether or notthey are equivalent? (Cf. Slide 8.)(d) Is every language of the form À¸ ¢ ?Slide 9


*qR%%t%t/­E­Rttt%8 1 REGULAR EXPRESSIONS<strong>The</strong> answer to question (a) on Slide 9 is ‘yes’. Algorithms for deciding such patternmatchingquestions make use of finite automata. We will see this during the next few sections.If you have used the UNIX utility ÆÇÈMÉ , or a text editor with good facilities for regularexpression based search, like È@ÊËÌÍ , you will know that the answer to question (b) on Slide 9is also ‘yes’—the regular expressions defined on Slide 5 leave out some forms of patternthat one sees in such applications. However, the answer to the question is also ‘no’, in thesense that (for a fixed alphabet) these extra forms of regular expression are definable, upto equivalence, from the basic forms given on Slide 5. For example, if the symbols of thealphabet are ordered in some st<strong>and</strong>ard way, it is common to provide a form of pattern fornaming ranges of symbols—for example ÎÏË £ÑÐ"Ò might denote a pattern matching any lowercaseletter. It is not hard to see how to define a regular expression (albeit a rather long one)which achieves the same effect. However, some other commonly occurring kinds of patternare much harder to describe using the rather minimalist syntax of Slide 5. <strong>The</strong> principleexample is Ó}Žcomplementation, :iffq does not match.ÓÔŽ £%•„%•„Õ'ŽIt will be a corollary of the work we do on finite automata (<strong>and</strong> a good measure of its power)that every pattern making use of the complementation operation can be replaced byan equivalent regular expression just making use of the operations on Slide 5. But why dowe stick to the minimalist syntax of regular expressions on that slide? <strong>The</strong> answer is that itreduces the amount of work we will have to do to show that, in principle, matching stringsagainst patterns can be decided via the use of finite automata.<strong>The</strong> answer to question (c) on Slide 9 is ‘yes’ <strong>and</strong> once again this will be a corollary ofthe work we do on finite automata. (See Section 5.3.)Finally, the answer to question (d) is easily seen to be ‘no’, provided the alphabetcontains at least one symbol. For in that case is countably infinite; <strong>and</strong> hence the number oflanguages over , i.e. the number of subsets of is uncountable. (Recall Cantor’s diagonalargument.) But since is a finite set, there are only countably many regular expressionsover . (Why?) So the answer to (d) is ‘no’ for cardinality reasons. However, even amongstthe countably many languages that are ‘finitely describable’ (an intuitive notion that we willnot formulate precisely) many are not of the form for any regular expression . Forexample, in Section 5.2 we will use the ‘Pumping Lemma’ to see thatq matches ÓÔŽ*-D,?is not of this form.1.4 Exercises©ÖªExercise 1.4.1. Write down an ML data type declaration for a type ×>ËyÇÈzÆzØÙzÉconstructorwhose values correspond to the regular expressions over an ×>Ë alphabet .Exercise 1.4.2. Find regular expressions over*-,.0/M?that determine the following languages:(a)q contains an even number of?’s


*qR*¡?%,¡?%%1.4 Exercises 9(b)q contains an odd number of?’sExercise 1.4.3. For which alphabetsalphabet?is the set% „of all finite strings overitself anExercise 1.4.4. If an alphabet contains only one symbol, what does look %Å(Ú*like? %…„ (Answer:see Example 1.1.1(i).) So if is the set consisting of just the null string, what is ?(<strong>The</strong> answer is not .) Moral: the punctuation-free notation we use for the concatenation oftwo strings can sometimes be confused with the punctuation-free notation we use to denotestrings of individual symbols.Tripos questions 1999.2.1(s) 1997.2.1(q) 1996.2.1(i) 1993.5.12%…„


10 1 REGULAR EXPRESSIONS


%ÞÝÞÞWÝ&AÝNÝÞ%112 <strong>Finite</strong> State MachinesWe will be making use of mathematical models of physical systems called finite automata,or finite state machines to recognise whether or not a string is in a particular language.This section introduces this idea <strong>and</strong> gives the precise definition of what constitutes a finiteautomaton. We look at several variations on the definition (to do with the concept ofdeterminism) <strong>and</strong> see that they are equivalent for the purpose of recognising whether or nota string is in a given language.2.1 <strong>Finite</strong> automataExample of a finite automatonÛvÜÛßÛMàÛváStates: ÛvÜ , Ûß , ÛMà , Ûvá .Input symbols: ~ , € .Transitions: as indicated above.Start state: .ÛvÜAccepting state(s): Û á .Slide 10<strong>The</strong> key features of this abstract notion of ‘machine’ are listed below <strong>and</strong> are illustratedby the example on Slide 10.example there are four states, labelled â, â, â, <strong>and</strong> â.¢<strong>The</strong>re are only finitely many different states that a finite automaton can be in. In theWe do not care at all about the internal structure of machine states. All we care about¢is which transitions the machine can make between the states. A symbol from somefixed alphabet is associated with each transition: we think of the elements ofas input symbols. Thus all the possible transitions of the finite automaton can bespecified by giving a finite graph whose vertices are the states <strong>and</strong> whose edges have


âW&âD&âWN%çââNAâ(WEN12 2 FINITE STATE MACHINESboth a direction <strong>and</strong> a label (drawn frompossible transitions from â state are). In the example%ã(^*-D.5E-?<strong>and</strong> the only&åäæ£<strong>and</strong>&åç£æAvHIn other words, in â state the machine can either input the symbol <strong>and</strong> enter state, or it can input the symbol <strong>and</strong> enter â state . (Note that transitions from a stateback to the same state are allowed: â e.g. in the example.)£æ. In the graphicalrepresentation of a finite automaton, the start state is usually indicated by means ofa unlabelled arrow.¢<strong>The</strong>re is a distinguished start state. 1 In the example it is â<strong>The</strong> states are partitioned into two kinds: accepting states 2 <strong>and</strong> non-accepting states.¢In the graphical representation of a finite automaton, the accepting states are indicatedby double circles round the name of each such state, <strong>and</strong> the non-accepting states areindicated using single circles. In the example there is only one accepting â state, ; theother three states are non-accepting. (<strong>The</strong> two extreme possibilities that all states areaccepting, or that no states are accepting, are allowed; it is also allowed for the startstate to be accepting.)<strong>The</strong> reason for the partitioning of the states of a finite automaton into ‘accepting’ <strong>and</strong>‘non-accepting’ has to do with the use to which one puts finite automata—namely to recognisewhether or not a q string is in a particular language ( subset of ). q Given we„% „‘e%begin in the start state of the automaton <strong>and</strong> traverse its graph of transitions, using up thesymbols q in in the correct order reading the string from left to right. If we can use up all thesymbols q in in this way <strong>and</strong> reach an accepting state, q then is in the language ‘accepted’(or ‘recognised’) by this particular automaton; q otherwise is not in that language. This issummed up on Slide 11.1 <strong>The</strong> term initial state is a common synonym for ‘start state’.2 <strong>The</strong> term final state is a common synonym for ‘accepting state’.


âââcase ©WAl„„„âNâSó £ æç :£æD„„((RââNAŽ(ŽDt2.1 <strong>Finite</strong> automata 13·èé¢ , language accepted by a finite automaton èÀÃconsists of all k strings over its alphabet of input symbolssatisfying b with the start state <strong>and</strong> some acceptingÛ ÛvÜ Û ëì ÛvÜyêstate. HereÛ Üëì b Û êmeans, if ki—~ ß ~ à Gv ~ísay, that for some statestransitions of the forml Û í iÛ(not necessarily all distinct) there areÛ ßGvl Û àÛ Üë7ì Û ß Ý@ï ë7ì Û à Ý@ð ë5ìÝMîÝ@ò ëzì Û í i Û ñGñvñN.B.case ©: ââvô .(r,âvô iff ââvôç£æ(¯/âvô iff âSlide 11Example 2.1.1. õ Let be the finite automaton pictured on Slide 10. Using the notationintroduced on Slide 11 we have:Wžçaç`ç`äæ £·£8£(soDDDEC‘ÕPŽ õ^ )çaäöç`çæ £·£8£(soDD“÷ ‘ DÈÕPŽ õ^ )äöçaç`çæ £·£8£(no conclusion about Õ'Ž õ^ ).â iff âIn fact in this caseâ iff âq contains three consecutive?zH’sâV® (ø (For ) corresponds to the state in the process of reading a string in which theø last symbols read were alldetermined bythe regular expressionõ^ (+* q ÕPŽ(ù,.0/1.32’s.) So ÕPŽ õ^ coincides with the language Õ'Žt…(D{R E „ DDDD{R¨E „(cf. Slide 8).


œœœDââi.e. in õâD©, no state can be reached from â&&DWE‘14 2 FINITE STATE MACHINESA non-deterministic finite automaton (NFA), è ,is specified byœa finite set úJû>üûZý-þMÿ (of states)a finite set $ ÿ(the alphabet of input symbols)for each Û oSúJû>üûZý-þ ÿ <strong>and</strong> each ~ o_$ ÿ , a subsetÿÛ l ~ ¢'ÁùúJû>üûZý-þ ÿ (the set of states that can bereached from Û with a single transition labelled ~ )an element ŸMÿ oãúJûZüûZý-þMÿ (the start state)œa ¡£¢¤¢Vý¦¥”û ÿ Áùú!ûZüûZý-þMÿ subset (of accepting states)Slide 122.2 Determinism, non-determinism, <strong>and</strong> § -transitionsSlide 12 gives the formal definition of the notion of finite automaton.¨©Note ç that the functiongives a precise way of specifying õthe allowed â £æâ ô transitions â ô¨©of .5D, via: iffâ . Ž<strong>The</strong> reason for the qualification ‘non-deterministic’ on Slide 12 is ‘because D“‘ž% in©general,‹v‹·‡for each state <strong>and</strong> each input symbol , we allow the possibilities thatâthere are no, one, or many states that can be reached in a â single transition labelled from ,has no, one, or many õelements. For example, ifcorresponding to the cases that ¨© Ž âis the NFA pictured on Slide 13, then.5D&0.5E(r˜with a transition labelled;¨©Ž ⨩Ž âlabelled&0.5D (+*;AG?i.e. in õ, precisely one state can be reached from âwith a transition¨©Ž âi.e. õ intransition labelled .WM.5D (+*WM.&`?, precisely two states can be reached from âwith a


ÝÞ³ kUo ³"~ l € b ¡ k contains three consecutive ~ ’sâHÝÞ2.2 Determinism, non-determinism, <strong>and</strong> ¡ -transitions 15Example of a non-deterministic finite automatonInput alphabet: ³"~ l € .States, transitions, start state, <strong>and</strong> accepting states as shown:Û ÜÝ Û ß Ý Û à Ý Û á<strong>The</strong> language accepted by this automaton is the same as for theautomaton on Slide 10, namelySlide 13When each subset has exactly one element we say that õ is deterministic.This is a particularly important case <strong>and</strong> is singled out for definition on Slide 14.¨©Ž â.5D<strong>The</strong> finite automaton pictured on Slide 10 is deterministic. But note that if we *-D.5EG.5F-? took thesame graph of transitions but insisted that the alphabet of input symbols WM.5F was (#˜ say,then we have specified an NFA not a DFA, since Ž âfor example . <strong>The</strong> moral ofthis is: when specifying an NFA, as well as giving the graph of state transitions, it is importantto say what is the alphabet of input symbols (because some input symbols may not ¨©appear inthe graph at all).When constructing machines for matching strings with regular expressions (as we willdo in Section 3) it is useful to consider finite state machines exhibiting an ‘internal’ formof non-determinism in which the machine is allowed to change state without consuming anyinput symbol. One calls such transitions ¡ -transitions <strong>and</strong> writes them asâ ôó £ æThis leads to the definition on Slide 15. Note NFAó that õ in an , , we ¡always % assume©thatis not an element of the alphabet of input symbols.


ÝÞÛëìÛ Ý Û i Mÿ Û l ~ ¢iffÛÞÞÝÞ16 2 FINITE STATE MACHINESA deterministic finite automaton (DFA)is an NFA è with the property that for each Û oSúJû>üûZý-þ ÿ <strong>and</strong>~ o_$ ÿ , the finite set ÿÛ l ~ ¢ contains exactly oneelement—call it ÿÛ l ~ ¢ .Thus in this case transitions è in are essentially specified by anext-state 1ÿ function, , mapping each (state, input symbol)-pairl ~ ¢ to the unique state ÿ Û l ~ ¢ which can be reached from ÛÛby a transition labelled :~Slide 14An NFA j with (NFA -transitions )is specified by an è NFA together with a binary relation, calledthe j -transition relation, on the set ú!ûZüûZý-þzÿ. We writeÛto indicate that the pair of statesÛ l Û ¢ is in this relation.ë ì Example (with input alphabet = ³"~ l € ):Ûß Ý ÛMà Ý ÛváÛ ÜÛÛÛÛSlide 15


EÛÛÛQ ê iff or there is a sequence Û ë ì Ûmore -transitions Û in from to i Û Û è j ÛÝÞ Û Ý(for l € oÅ$…ÿ ) iff Û ~Q©W#!QWñÞDôŽ©2.3 A subset construction 17·èé¢ , language accepted NFA èÀà by anconsists k of all strings $ ÿover the alphabet of input symbolswith the initial state <strong>and</strong> some acceptingÛ ÛvÜ Ûsatisfying ÛvÜstate. Here ñ£ñ is defined by:ñGñvñ Û of one orÛ Û (for ~ oÅ$…ÿ ) iff Û Ý ëìñ ññ ìë<strong>and</strong> similarly for longer stringsÝ ëìñ ñÛ When using an NFAó õSlide 16to accept a string qof input symbols, we are interested in„ ‘“%sequences of transitions in which the symbols q in occur in the correct order, but with zeroor ¡ more -transitions before or after each one. We write#â ô â"!to indicate that such a sequence exists from â state to âzô state in NFAó the . <strong>The</strong>n, by definitionis accepted by the NFAó õ iff â â holds for â the start state <strong>and</strong> â some accepting state:qsee Slide 16. For example, for NFAó the on Slide 15, it is not too hard to see that the languageaccepted consists of all strings which either contain two consecutive ’s or contain twoconsecutive ’s, i.e. the language determined by the regular Ž expression .DR¨E „DD{R È ED{R E „`Ž2.3 A subset constructionNote that every DFA is an NFA (whose transition relation is deterministic) <strong>and</strong> that everyNFA is an NFAó (whose ¡ -transition relation is empty). It might seem that non-determinism<strong>and</strong> ¡ -transitions allow a greater range of languages to be characterised as recognisable by afinite automaton, but this is not so. We can use a construction, called the subset construction,to convert an NFAó õ into a DFA $cõ accepting the same language (at the expense ofincreasing the number of states, possibly exponentially). Slide 17 gives an example of thisconstruction. <strong>The</strong> name ‘subset construction’ refers to the fact that there is one state of $Âõfor each subset of the setis a transitionç£æ‹%v‹·‡of states õ ofjust in case. Given two subsets, thereâMô -states reachable fromQ”.3QT&‹%v‹X‡'ô in $Âõô consists of all the õ


Qç#ÝÞ*ââÝâDÛvÜ ³ ÛGÜ l Ûß l ÛMà ³ ÛMà ³Ûß ³ Ûß ³ÛMà ³ ÛMà ³ÛvÜ l Û"ß ³ ÛGÜ l Ûß l ÛMà ³ ÛMà ³Û ß l Û à ³ Û ß ³ Û à ³W*âââW£ æóââA*ââ©AâQâ*â©18 2 FINITE STATE MACHINESâ states in via ( thevia finitely ¡ many -transitions followed by an-transitions. ¡â ô in õ-transition followed by finitely many( relation defined on Slide 16, i.e. such that we can get from â toExample of the subset construction*mÿ ) ~ €è )Û ßÛvÜl Û à ³ Û Ül Û ßl Û à ³ Û à³ Û ÜÛ àÛ ßl Û à ³ Û Ül Û ßl Û à ³ Û àl³ Û ÜSlide 17By definition, the start state of $cõis the ‹%v‹X‡'subset of whose elements are QrT+ the; <strong>and</strong> ‹%v‹·‡a subsetis an. Thus in the example<strong>and</strong>¡ õ$Âõ õ WM. &V. AG?states reachable by -transitions from the start state ofaccepting state of iff some accepting state of is an element ofon Slide 17 the start state isç£æäæ£DEP‘Õ'Ž õÚ because in õ : âDEP‘Õ'Ž,$cõ^ because in $Âõ :WM.&0.A@?ç£æWM.&V.AG?_äæ£AG?zHIndeed, in this case Õ'Ž õ^ (language in this case is no accident, as the <strong>The</strong>orem on Slide 18 shows. That slide also givesthe definition of the subset construction in general.D „ E „ (Õ'ŽÕ'Ž,$cõ^ . <strong>The</strong> fact that õ <strong>and</strong> $Âõ accept the same


õœœœëd@fZg i ³0/Ú¡1/ Á°úJûZüûZý-þ ÿ d@fZg i ³ Û ¡ Ÿvÿ Û ¡£¢¤¢Vý¦¥”û *mÿ d@fZg iœoãúJûZüûZý-þ *mÿ ¡02 Û o / Û o3¡£¢¤¢Vý¦¥”û ÿ ¢a³0/Ž©¥ ©¥E; ©so âso âV­ââ‘GF¤; © &© A•‘GF¤;‘GF¤; ©ŽŽŽQŽQ­‘©H‘#ó2.3 A subset construction 19<strong>The</strong>orem. For each NFABè there is a DFA -šè with thesame alphabet of input symbols <strong>and</strong> accepting exactly the samestrings è as , i.e. Àà .-šè¿¢ i Àà ·èé¢withDefinition -šè of (refer to Slide 12):úJû>üûZý-þ *mÿ$ *mÿ d-fhg i$ ÿœ / Ýì / in -šè iff / i *mÿ_ / l ~ ¢ , where/ l ~ ¢)d@fZg i ³ Û ¡12 Û o / Û Ý *mÿ_Û in è¿¢`Ÿ* ÿSlide 18To prove the theorem on Slide 18, given any NFAó õ we have to show that Õ'Ž õ^ (Proof õ^ Õ'Ž4$Âõ^ that . Consider the case ¡ of first: ¡‘658797if ¥D , hence ‡:‹Õ'Ž4$Âõ^ . We split the proof into two halves.¥ ©, â then Õ'Ž õ^ for. Now given any non-Õ'Ž,$cõ^â somenull q stringof the form<strong>and</strong> thus ¡­ , if q is accepted by õ then there is a sequence of transitions in(1)ç?>#çA@ &#çAB#‘C5D77Since it is deterministic, feedingD&7DA H0H0HZD­ to $cõ results in the sequence of transitions(?(?(âV­‡:‹(2)ç?>æ £çA@æ £ç¤Bæ £ˆ£Q &where(1) we deduceQ &O(&F¤; ©¥E; © .5D&,QAP(&F¤; ©Q”&0.5DA, etc. By definition ofF?; ©(?(?((Slide 18), from¥E; © .5D&(uQ”&Q”&0.5DA(uQ{AH0H0H&@.5D­ (uQ ­­IH


Õ(QQQ‘­&QŽ© ><strong>and</strong> âA&õAŽ©©©&©&Q‘A­%#óQ­ââ&QŽâA©&20 2 FINITE STATE MACHINES<strong>and</strong> hence$cõ by .‘J58797‡:¥‹; ©(because âV­‘J58797‡%:‹). <strong>The</strong>refore (2) shows that q is acceptedProof T © ‘ ¥;that . Consider the case ¡ of first: ¡ ÕPŽ,$ÂõÚ if , 58797then; ÕPŽ,$ÂõÚ ‘e¥­IHsince©âV­IH#âV­OHwith .­IH­IH ­OH­IH âV­IH‘J58797‡%:‹2.4 Summary<strong>The</strong> important concepts in Section 2 are those of a deterministic finite automaton (DFA) <strong>and</strong>the language of strings that it accepts. Note that if we know that a Õ language is of the formõÚ for some DFA õ , then we have a method for deciding whether or not any givenÕ'Žq string (over the alphabet Õ of ) is Õ in or not: begin in the start state õ of <strong>and</strong> carry outthe sequence of transitions given by q reading from left to right (at each step the next state isuniquely determined õ because is deterministic); if the final state reached is accepting, thenis in Õ , otherwise it is not.qWe also introduced other kinds of finite automata (with non-determinism ¡ <strong>and</strong> -transitions) <strong>and</strong> proved that they determine exactly the same class of languages as DFAs.2.5 ExercisesExercise 2.5.1. For each of the two languages mentioned in Exercise 1.4.2 find a DFA thataccepts exactly that set of strings.Exercise 2.5.2. <strong>The</strong> example of the subset construction given on Slide 17 constructs a D„aE@„ DFAwith eight states whose language of accepted strings happens Õ'Ž to be . Give a DFAwith the same language of accepted strings, but fewer states. Give an NFA with even fewerstates that does the same job.Exercise 2.5.3. Given õ a DFA , construct a õ^ô new DFA with the same alphabet of % ©‘Ú%'„inputsymbol <strong>and</strong> with the property that q©for q all , is õ¯ô accepted q by iff is notõ accepted by .Exercise 2.5.4. õGiven two DFAs with the same alphabet of ‘ù% input „symbols,construct a õthird such DFA with qthe property that õis accepted by iff it isõaccepted õby both <strong>and</strong> of&V.. [Hint: take the states of õ to be ordered pairs Ž â&0.states with â© @.]&C‘SA'‘TTripos questions 2001.2.1(d) 2000.2.1(b) 1998.2.1(s) 1995.2.19 1992.4.9(a)‹%v‹X‡'‹v‹·‡


õ(%(tt%t%(%(%ttt213 <strong>Regular</strong> <strong>Languages</strong>, ISlide 19 defines the notion of a regular language, which is a set of strings of the form ÕPŽ õ^for some õ DFA (cf. Slides 11 <strong>and</strong> 14). <strong>The</strong> slide also gives the statement of Kleene’s<strong>The</strong>orem, which connects regular languages with the notion of matching strings to regularexpressions introduced in Section 1: the collection of regular languages coincides with thecollection of languages determined by matching strings with regular expressions. <strong>The</strong> aim ofthis section is to prove part (a) of Kleene’s <strong>The</strong>orem. We will tackle part (b) in Section 4.DefinitionA language is regular iff it is the set of strings accepted by somedeterministic finite automaton.Kleene’s <strong>The</strong>orem(a) For any regular expression(cf. Slide 8)., À¸ ¢ is a regular language(b) Conversely, every regular language is the form Àà ¢ forsome regular expression .Slide 193.1 <strong>Finite</strong> automata from regular expressionsõ‘“%…„q q q t (õ Õ'Ž ÕPŽ õ^NFAóVUÕPŽ U Õ'ŽU$ U ÕPŽ õ^ (ÕPŽ,$ U ÕPŽ U Õ'Ž¡NFAóGiven a regular expression , over an alphabet say, we wish to construct a DFA withalphabet of input symbols <strong>and</strong> with the property that for each , matches iff isaccepted by —so that .Note that by the <strong>The</strong>orem on Slide 18 it is enough to construct an with theproperty . For then we can apply the subset construction to to obtain a DFAwith . Working with finite automata thatare non-deterministic <strong>and</strong> have -transitions simplifies the construction of a suitable finiteautomaton from .Let us fix on a particular alphabet <strong>and</strong> from now on only consider finite automatawhose set of input symbols is . <strong>The</strong> construction of an for each regular expressionover proceeds by recursion on the syntactic structure of the regular expression, as follows.


Thus Õ'ŽThus Õ'Žfor all ©åªõ,7&&7õAõAA&AqõõAA&qRRqq‘D,&&or q<strong>and</strong> q<strong>and</strong> each q®&&‘˜<strong>and</strong> Õ'Ž7<strong>and</strong> ÕPŽ‘ÕPŽ õ^ ?zH, <strong>and</strong> for all regular expressions of size `Å© , there exists an NFAóR?˜„õõAAAAt22 3 REGULAR LANGUAGES, IDU‘y%(i) For each atomic form of regular expression, ¡ ( ), , <strong>and</strong>accepting just the strings matching that regular expression., we give an NFAó(ii) Given NFAó any õ sproperty<strong>and</strong> õ, we construct a new NFAó , W!‰YX[ZG‰”Ž õ&V.with the&V.A ?zHÕPŽW!‰YX[Zv‰!Ž õÕPŽ õ> (+* q.Õ'Ž õtGA (t1&vR t-A (&V.t1& (ÕPŽ õÕ'ŽW!‰YX[ZG‰”Ž õ> when Õ'ŽÕ'Ž õ(iii) Given NFAó any õ sproperty<strong>and</strong> õ, we construct a new NFAó , \]ZG‰v‹VŽ õ&V.with the& ‘A ?zHA'‘&0.AÃRÕPŽ¤\^ZG‰v‹3Ž õÕ'Ž õ> (Ú* q.Õ'Ž õt-A (t1&>t-A (&V.tz& (> when Õ'ŽÕPŽ¤\^Zv‰ÕPŽ õv‹`Ž õÕ'Ž õ(iv) Given any NFAó õ , we construct a new NFAó ,‹_Ž õ^ with the propertyA”H0H0Hq­Õ'Ž‹_Ž õÚ> (+* q©¤ª‹%_Ž õ^> when Õ'ŽÕPŽThus Õ'Žtz„ (( tõÚ . Õ'ŽThus starting with step (i) <strong>and</strong> applying the constructions in steps (ii)–(iv) over <strong>and</strong> overagain, we eventually NFAó build s with the required property for every regular expression .Put more formally, one can prove the statementsuch that ÕPŽt (ÕPŽ õ^by mathematical induction © on , using step (i) for the base case <strong>and</strong> steps (ii)–(iv) for theinduction steps. Here we can take the size of a regular expression to be the number ofoccurrences of (£ union ) in it.£ ), concatenation (£Ñ£ ), or star (£Step (i) Slide 20 gives NFAs whose languages of accepted strings are Õ'Žrespectively(any, Õ'Ž <strong>and</strong> .*-D?D’‘“%), Õ'Ž ¡M (¯* ¡(r˜D (


õ&W£ æóA‘õõjust accepts the one-symbol string ~just accepts the null string, jAõA‘&A#WWâ&&‘W&AõAAõ‘A&Rq&W#!‘#!Aâ&&&&Aor qõõAAA‘‘Aõ&WAõAAA3.1 <strong>Finite</strong> automata from regular expressions 23NFAs for atomic regular expressionsÛvÜ Ý ÛßÛvÜÛvÜaccepts no stringsSlide 20NFAó õ õW!‰YX[ZG‰”Ž © õ© > ‹%v‹·‡ ‹%v‹·‡ @ &V.õ W!‰YX[ZG‰”Ž õ &V.õWJ‰aX[ZG‰”Ž ââõõ&V.õõ¡ õõ õâStep (ii) Given s <strong>and</strong> , the construction of is pictured onSlide 21. First, renaming states if necessary, we assume that <strong>and</strong> aredisjoint. <strong>The</strong>n the states of are all the states in either or , togetherwith a new state, called say. <strong>The</strong> start state of is this <strong>and</strong> its acceptingstates are all the states that are accepting in either or . Finally, the transitions ofare given by all those in either or , together with two new -transitions out of to the start states of <strong>and</strong> .W!‰YX[ZG‰”Ž õq ÕPŽ âõ‡:‹© >¥ © &V.!ÕPŽW!‰YX[ZG‰”Ž â > qõõ&V.‘J58797ÕPŽ Õ'Ž q ‘J5D77õ> õâ‡:‹©‡%:‹@ ©â> â â¡â&0.õ> qq qThus if , i.e. if we havefor some , then weget showing that . Similarly for . Socontains the union of <strong>and</strong> . Conversely if is accepted by, there is a transition sequence with or .Clearly, in either case this transition sequence has to begin with one or other of the -transitions from , <strong>and</strong> thereafter we get a transition sequence entirely in one or other ofor finishing in an acceptable state for that one. So if , theneither or . So we do indeed haveõ &`. ÕPŽW!‰YX[ZG‰”Žõ W!‰YXZG‰!Ž¥ © >&V.& ‘b58797ÕPŽW!‰YX[Zv‰JŽ õÕPŽ õÕPŽ õ&V.A ?zHÕPŽW!‰YX[ZG‰”Ž õ> (+* qÕ'Ž õÕ'Ž õ


77&õ&AA7õA&AA&©@ ¥#! @AõA7A Ÿvÿ ï è⥠>© #! >&AAõ&AAâqA‘A&Aq&©@ ¥#! @&qA7âàA&õ77<strong>and</strong> qAõ7õA&(Aq7#&õAq>AâA&&õ&A&&24 3 REGULAR LANGUAGES, Ic^dfe%gId·è ß l`è à ¢Ÿ ÿ î è ßÛ ÜSet of accepting states is union of ¡£¢A¢Vý¦¥”û ÿ î <strong>and</strong> ¡£¢¤¢Vý¦¥”û ÿ ï .Slide 21Step NFAó (iii)õGiven õ s <strong>and</strong> , the \^ZG‰ M‹`Ž õconstruction ofSlide 22. First, renaming states if necessary, we ‹%v‹·‡assume &V.thatdisjoint. <strong>The</strong>n the \^Zv‰ states of are all the states õ in eitheris the start õ state of&`.or õ© ><strong>and</strong>is pictured on. <strong>The</strong> start‹%v‹·‡© @arestate of \^ZG‰. <strong>The</strong> accepting states of \^Zv‰õ\^Zv‰ v‹VŽ õõ õõqqare the accepting states of . Finally, the transitions ofthose in either orthe start state of (only one such new transition is shown in the picture).Thus if <strong>and</strong> , there are transition sequencesv‹VŽ õ&c‘&V.are given by allto, together with ¡ new -transitions from each accepting state õ of¥ © AÑ! >v‹VŽ õin õ&V.v‹`Ž õ&V.with â© >, <strong>and</strong>in õwith â© @. <strong>The</strong>se combine to yield&C‘J5D77Õ'Ž õÕPŽ õA…‘J5D77‡:‹‡:‹ó £ æ&`.&V.õ in witnessing the fact q that is accepted \]ZG‰ v‹`Ž õ by &`.\^ZG‰ v‹aŽÕPŽ¤\^ZG‰ v‹aŽ õtransition sequence witnessing s the fact that is accepted starts out in õthe states of. Con-is of this form. For any>but. At some point in the sequence one of the newversely, it is not hard to see that every sfinishes in the disjoint set of states of õto õ<strong>and</strong> thus we can split s as swith qaccepted by õ<strong>and</strong> qaccepted by õ¡ -transitions occurs to get from õ. So we do indeed have& ‘A ?zHA•‘&V.A¸RÕ'Ž¤\^ZG‰Õ'Ž õM‹aŽ õ> (+* qÕ'Ž õ


WWèß Ÿ ÿ ï è&qWR,<strong>and</strong> each q®à‘WWÕ'Ž õÚ ?zH3.1 <strong>Finite</strong> automata from regular expressions 25h gOd¢VüûG ·è ß l`è à ¢Ÿ ÿ îSet of accepting states is ¡£¢A¢Vý¦¥”û ÿ ï .Slide 22NFAó õ ‹%_Ž õÚõ^ õâ‹_Žõ^ ‹_Ž õÚ ‹%_Ž âõ¡ âõÚ õ‹%_Žõ âStep (iv) Given an , the construction of is pictured on Slide 23. <strong>The</strong>states of are all those of together with a new state, called say. <strong>The</strong> start stateof is <strong>and</strong> this is also the only accepting state of . Finally, the transitionsof are all those of together with new -transitions from to the start state of<strong>and</strong> from each accepting state of to (only one of this latter kind of transition is shownin the picture).Clearly,one or more strings accepted õ byâ of in a transition sequence witnessing this fact allow us to s split into the concatenation ofzero or more strings, each of which is accepted õ by . So we do indeed haveõ^ accepts ¡ (since its start state is accepting) <strong>and</strong> any concatenation of‹%_Ž. Conversely, s if is accepted ‹%_Ž õÚ by , the occurrencesA”H0H0HÕPŽ‹%_Ž õ^> (+* qq­©pª


t‘ ÷Û Ü Ÿvÿ èttt‘ttt26 3 REGULAR LANGUAGES, IúJû>üOi hè¿¢<strong>The</strong> only accepting úJû>üOi hè¿¢ state of is .Ü ÛSlide 23DR¨E „3DŽNFAó õ Õ'Ž õ^ ( D{R E „3DÕ'ŽZŽ¡This completes the proof of part (a) of Kleene’s <strong>The</strong>orem (Slide 19). Figure 1 shows howthe step-by-step construction applies in the case of the regular expression to producean satisfying . Of course an automaton with fewer states <strong>and</strong>-transitions doing the same job can be crafted by h<strong>and</strong>. <strong>The</strong> point of the construction is thatit provides an automatic way of producing automata for any given regular expression.3.2 Decidability of matching<strong>The</strong> proof of part (a) of Kleene’s <strong>The</strong>orem provides us with a positive answer to question (a)on Slide 9. In other words, it provides a method that, given any q string <strong>and</strong> regular expression, decides whether or q not matches . <strong>The</strong> method is:¢construct a DFA õ satisfying Õ'Ž õÚ ( Õ'Ž;¢ õ õâ õ õ qbeginning in ’s start state, carry out the sequence of transitions in correspondingto the string , reaching some state of (because is deterministic, there is aunique such transition sequence);otherwise q¢check whether â is accepting or not: if it is, then qÕ'Ž õÚ (, so q does not match.ÕPŽ õ^ (ÕPŽ, so q matches;Õ'ŽNote. <strong>The</strong> subset construction used to convert the NFAó resulting from steps (i)–(iv) ofSection 3.1 to a DFA produces an exponential blow-up of the number of states. ($Âõ has


Step of type (iv): ŽStep of type (iii): ŽDEçóóFigure 1: Steps in constructing an NFAó for Žóóóóóóóçäçäóóóóçäçä3.2 Decidability of matching 27Step of type (i):Step of type (i):Step of type (ii):D{R ED{R¨E „D{R E „3DDR¨E „aD


2­28 3 REGULAR LANGUAGES, Istates õ if © has .) This makes the method described above very inefficient. (Much moreefficient algorithms exist.)3.3 ExercisesExercise 3.3.1. Why can’t the automatonconstructed simply by õ taking <strong>and</strong> adding ¡ new -transitions back from each accepting stateto its start state?‹%_Ž õÚ required in step (iv) of Section 3.1 beExercise 3.3.2. Work through the steps in Section 3.1 to construct NFAó õ anõ^ ( R E „ DDE „. Do the same for some other regular expressions.Õ'ŽsatisfyingExercise 3.3.3. Show that any finite set of strings is a regular language.Õ'Ž>Ž ¡Tripos question 1992.4.9(b) [needs Slide 26 from Section 4]


Lemma Given an NFA è(In case t¥tR¥!˜„t, for each subset uéÁùú!ûZüûZý-þ ÿ˜©294 <strong>Regular</strong> <strong>Languages</strong>, II<strong>The</strong> aim of this section is to prove part (b) of Kleene’s <strong>The</strong>orem (Slide 19).4.1 <strong>Regular</strong> expressions from finite automataõ t (õ Õ'Ž Õ'Ž õÚ tjk¤l k9mâ ol k9m qGiven any DFA , we have to find a regular expression (over the alphabet of input symbolsof ) satisfying . In fact we do something more general than this, as describedin the Lemma on Slide 24. 1 Note that if we can find such regular expressions for anyto be the whole of<strong>and</strong> to be the start state, say, then by definition of , a string matches this regularchoice n of â , , â ô <strong>and</strong> , then the problem is solved. For n tj takingexpression iff there is £æa &0.0H0H0H3. transition sequenceâmany t accepting jstates,In other words o9lqk > (?(?(the (Ú,regular expressionof Kleene’s t<strong>The</strong>orem. (In caseempty <strong>and</strong> so we can use the regular expression. As âvô ranges over the finitely‹M‹X‡in õ âGôsay, then we match exactly all the strings accepted by õ .R t jâ?pksr has the property we want for part (b)ol, i.e. there are no accepting states õ in , Õ'Ž õÚ then is.)<strong>and</strong> each pair of states Û l Û oãúJûZüû>ý-þ ÿexpression w?xqwy satisfyingIv, there is a regularv w?xqw y ¢ í ³ kžoy >$ ÿ ¢ b ¡ Û+ê ëì b Û in èÀÃwith all intermediatestates of the sequencein u .Àà hè¿¢ i Àà ¢ Hence , whereß ¡ ñGñvñ ¡ z<strong>and</strong>{ ši number of accepting states,ii v }xqw¦~ with u i úJûZüûZý-þ ÿ ,|start state,i Ÿ| i€th accepting state.Û, taketo be the regular expression.)(r,Slide 24tjcan be constructed by induc-k¤lqk9mProof of the Lemma on Slide 24. <strong>The</strong> regular expressiontion on the number of elements in the n subset .1 <strong>The</strong> lemma works just as well whether is deterministic or non-deterministic; it also works forNFA‚ s, provided we replace ƒ „…‡† by ƒ ˆ(cf. Slide 16).


â.„Ž©(¿t jŽ' k%’‘*qR(„*qR„»½¼(ét j“'ò k%9‘k¤lqk.tW/*tâtŽâŽpp‘n/((»½¼( t j“'ò k.’‘k%Alqk%R(((.âW.DW‘Wò »½¼(¿t j“' k.’‘ t?”m k%Alqk/H/n30 4 REGULAR LANGUAGES, IIBase n case, is empty. In this case, for each pair of â statesexpression to describe the set of stringsâ ô , we are looking for a regular?zHwith no intermediate statesô ââ ! £æSo each element of this set is either a single input symbol¡ possibly , in â casecan simply take(if âholds in õ âGô. If there are no input symbols that take us from â to â1ô in õâMôif â‹Šç£æ) or, wet‰m k¤lqkâvôô . ⻽¼(ò¹ ˜On the other h<strong>and</strong>, if there are some such input symbols,D&0.0H0H0H3.5Dp say, we can take¡ if ât‰k¤lqk9mif â‹ŠRif â ¡âvô. âvô¹ D&vRR¨D(ºa»½¼D&vRR¨D(?(?(Induction step.Suppose we have defined the required regular expressions for all subsets(?(?(is a subset ©Â– with elements, choose some âWv?’(°*Wv?element. <strong>The</strong>n for any pair of states, by inductive hypothesis we have already constructed the regular expressionsof states © with elements. n If<strong>and</strong> consider © the -element n&Œ‘SsetâTŠW…‘âGô<strong>and</strong>‹%v‹·‡t1&ò »½¼. t-A. t@NConsider the regular expressionk¤lqk m(¿t1&MR t-Aºa»½¼t@N „ t?”1HClearly every string matchingis in the set?zHqt"&âb! £æâ ô âq ÕPŽt1&t’ªtP–t-AÕ'Žâ t t-N£ âât?” Õ'Žq t-A t@N â„3t?” ÕPŽâvôÕ'Žq Õ'Žk¤lqk9mConversely, if is in this set, consider the number of times the sequence of transitionspasses through state . If this number is zero then (by definition of). Otherwise this number is <strong>and</strong> the sequence splits into pieces: the first pieceis in (as the sequence goes from to the first occurrence of ), the next piecesare in (as the sequence goes from one occurrence of to the next), <strong>and</strong> the last pieceis in (as the sequence goes from the last occurrence of to ). So in this case is in. So in either case is in . So to complete the induction step we can defineto be this regular expression.tjt…(yt"&MR t-At@N „3t?”â ô with all intermediate states in this sequence in nâ ! £æ4.2 An examplePerhaps an example will help to underst<strong>and</strong> the rather clever argument in Section 4.1. <strong>The</strong>example will also demonstrate that we do not have to pursue the inductive construction of (r˜theregular expression to the bitter end (the n base case ): often it is possible to find some ofthe k¤lqk m regular expressions one needs by ad hoc arguments.t•j


tW A‘ & & llttW| x ] µ]j ~µ– ~~ b €¡ ~~ b € jW&lW A‘ & l‘tW &lA‘ W lltW A‘ & llWlA‘ & tttltW A‘ & llW A‘ & llWW A‘ W ll]ÝtWWtt–&WWl& ‘l‘‘ÝÞW& WlW& A ‘Wl& ‘„„ŽŽÞ–µ| x ] µ–/W&‘ t &l Ž2t AlWŽt AŽWA ‘‘‘tWÝWll&AWA & ‘WlA ‘‘WH‘H–,4.2 An example 31Note also that at the inductive steps in the construction of a regular expression õ forwe are free to choose which â state to remove from the current state n set . A good rule ofthumb is: choose a state that disconnects the automaton as much as possible.ExampleDirect inspection yields:Ü?˜Ü x à?˜0—1—~ b ~ b € ]µSlide 25As an example, consider the NFA shown on Slide 25. Since the start state is <strong>and</strong> thisis also the only accepting state, the language of accepted strings is that determined byAthelregular expression . Choosing to remove state from the state set, we havelWlAlAlAlAW (R t„ t(3) Õ'ŽlWlWDirect inspection shows ÕPŽ Õ'Žthat ÕPŽ <strong>and</strong>, we choose to remove state :, <strong>and</strong> ÕPŽlWÕ'ŽW (D8„& (ÕPŽD„aE. To calculateÕ'Ž„ t& (& R t„ tÕ'ŽÕ'ŽlAlAlWW (W R t<strong>The</strong>se regular expressions can all be determined by inspection, as shown on Slide 25. ThusÕ'ŽÕ'Ž& (R¨DE"R DD „ E<strong>and</strong> it’s not hard to see that this is equal to Õ'Ž ¡R DD„aE; <strong>and</strong>ÕPŽÕPŽ ¡Ž ¡M>W (˜R¨DDD „ÕPŽÕPŽŽ ¡v>


œœœœ%tW &lA‘ W lltttransitions of šg û- hè¿¢ = transitions of èstart state of šg û@ hè¿¢ = start state of èqtt32 4 REGULAR LANGUAGES, IIwhich is equal to ÕPŽDDD „. Substituting all these values into (3), we getW (D „ R D „ ER DD „ E „ DDD „ HÕ'ŽÕPŽŽ ¡Ž ¡ Sois a regular expression whose matching strings comprise thelanguage accepted by the NFA on Slide 25. (Clearly, one could simplify this to a smaller, butequivalent regular expression (in the sense of Slide 8), but we do not bother to do so.)D„zR¨D8„aER DD„aE „aDDD„4.3 Complement <strong>and</strong> intersection of regular languagesWe saw in Section 3.2 that part (a) of Kleene’s <strong>The</strong>orem allows us to answer question (a)on Slide 9. Now that we have proved the other half of the theorem, we can say more aboutquestion (b) on that slide.Complementation Recall that on page 8 we mentioned that for each regular expressionover an alphabet , we can find a regular Ó}Žexpression that determines the complementof the language determined by :‘ % „ R‘ ÷t ?zHÕPŽ½Ó}Ž> (¯* qÕ'ŽAs we now show, this is a consequence of Kleene’s <strong>The</strong>orem.šgûG ·èé¢úJû>üûZý-þ› œž Ÿ ÿG¡ d@fZg iúJû>üûZý-þMÿ$…ÿ$¢› œž Ÿ ÿG¡ d@fZg iProvided èis a deterministic finite automaton, then k is¡£¢¤¢Vý¦¥”û ›£œ'ž.Ÿ ÿG¡ í ³ Û oãúJûZüûZý-þ1ÿ ¡ Û¥¤ o¡£¢A¢Vý¦¥”û ÿ .œaccepted by šg û@ hè¿¢ iff it is not accepted by è :šgÀ¸·èé¢3¢ í ³ kÄoÅ$ b ¡ k ¤ ožÀà ·èé¢a .ûGSlide 26


q‘ ÷this Ó}Ž5(Õ?tõA&&ÕAt5ÕAAt5qõqA‘ ÷õtA&q‘Õ&A<strong>and</strong> q%*q‘Õ*qqtÕ(q&‘ ÷q5&‘ ÷%ÕA*?qAõA4.3 Complement <strong>and</strong> intersection of regular languages 33Lemma 4.3.1. Õ If is a regular language over alphabetis also regular., then its complement‘ % „ RProof. Õ Since is regular, by definition there is a õ DFAof strings accepted ¦VZG‹`Ž õ^ by <strong>and</strong> hence is regular.be the DFA constructed from õsuch Õ thatas indicated on Slide 26. <strong>The</strong>nis the setõ^ . Let ¦VZG‹VŽ õ^‘p%…„ÃR Õ'Žthat ÕPŽGiven a regular expression, by part (a) of Kleene’s <strong>The</strong>orem there is a DFA õ such( tõÚ . <strong>The</strong>n by part (b) of the theorem applied to the DFA ¦VZG‹VŽ õ^ , we canÕ'Žfind a regular Ó Žexpression so Õ'Ž½Ó}Ž > ( Õ'Ž%¦‡Zv‹aŽ õÚ> that . Since‘“% „ R‘“% „ R‘ ÷t ?z.is the regular expression we need for the complement of.Õ'Ž%¦‡Zv‹aŽ õÚ> (+* qÕ'Ž õ^ ?C(+* qÕPŽNote. <strong>The</strong> construction given on Slide 26 can be õapplied ‘ to % a „ finite R automatonnot ÕPŽ%¦VZG‹`Ž õ^> it is deterministic. However, for to equalto be deterministic. See Example 4.4.2.õÕ'Ž õ^ ?whether orwe needAs another example of the power of Kleene’s <strong>The</strong>orem, given regular expres-we can show the existence of a regular Ž expression with the property:Intersectionsions <strong>and</strong>t1&t-At&9§•t-Atz&’§•t-Aiff q matchestz&<strong>and</strong> q matchestGA.q matches ŽThis can be deduced from the following lemma.Lemma 4.3.2. Õ Ifintersection<strong>and</strong> Õare a regular languages over an alphabet, then their&©¨‘“% „ RAG?»½¼( *òis also regular.Õ Õõ( /1.32&V.õ® Õ'Ž õž® Õ õõ õ ‘“%P„(ø‰«ªŽ &V.&`.&«¨‰«ª¥Ž AB(õ&`. qõ ÕõProof. Since <strong>and</strong> are regular languages, there are DFA <strong>and</strong> such that). Letbe the DFA constructed from <strong>and</strong> as onSlide 27. It is not hard to see thathas the property that any is acceptedbyiff it is accepted by both <strong>and</strong> . Thusis a regular language.‰«ª¥Ž õÕ'Ž‰«ªŽ õ>


õ*q&5q‘ ÷œà ßoSúJû>üûZý-þMÿ î <strong>and</strong> ÛMà oãúJûZüûZý-þMÿ ïÛß<strong>and</strong> èalphabet of ¡ œàœœœA<strong>and</strong> Û à¢ë ì Ýstart state of ¡õAtÝ®(%à¡ßà&ß2 / /2 / 2ßß,àà¢àA5õAßßß%234 4 REGULAR LANGUAGES, IId­¬hèlVèd“¬¢ are all ordered pairsÛ ßl Û àstates of ¡¢ with·èlVèd“¬hèlVè¢ is the common alphabet of èÛ ßëì Û ß l Û àin èl Û àà ¢ in ¡iff Û ß Ý ë ì ¢Û ßin è·èlVèd“¬Û d“¬·è ß l`è à ¢ is ZŸ ÿ î l0Ÿ ÿ ï ¢d“¬Û ßl Û à¢ accepting in ¡hèlVè¢ iff Û ß accepting in è<strong>and</strong> Û à accepting in èà.Thus given regular expressions <strong>and</strong>DFA õ <strong>and</strong> Õ'Ž ÕPŽ õž® (øwith tz&’§•t-Aõfind a regular expressionÕPŽt1&9§•t-Aso &V.thatiff, as required.t-A‰«ªŽ õt"&accepts q , iff both õSlide 27, by part (a) of Kleene’s <strong>The</strong>orem we can find). <strong>The</strong>n by part (b) of the theorem we cant-A(«/1.32<strong>and</strong> õtz&¦§…t-A (&V.õ > . Thus q matchesÕPŽtz&‰®ª¥Žaccept , q iff matches bothq <strong>and</strong>4.4 ExercisesExercise 4.4.1. Use the construction in Section 4.1 to find a regular expression for the *-,.0/1.32?DFAwhose state set is , whose start state is , whose only accepting state is , whosealphabet of input symbols is , <strong>and</strong> whose next-state function is given by the followingF table. D E ©°¯/ 2 ,*-D.5E-?Exercise 4.4.2. <strong>The</strong> õ ± construction ¦VZG‹VŽ õÚ given on Slide 26 applies to both DFA <strong>and</strong>NFA; but for æ õÚ> to be the complement of Õ'Ž õÚ we need õÕ'Ž%¦‡Zv‹aŽGive an example of an alphabet‘“%C„¸RÕPŽ õ^ ?<strong>and</strong> a NFA õis not the same set as ÕPŽ%¦VZG‹`Ž õ^> .to be deterministic.with set of input symbols , such that


q‘ ÷ŽŽt%tt‘4.4 Exercises 35Exercise 4.4.3. Let. Find a complement for over the %r( *-D.5E-?alphabet, i.e. a regular ÓÔŽexpressions over the alphabet ÕPŽ½ÓÔŽsatisfying. Do the same for the alphabet .% „ Rt ?Õ'Žt´(D{R E „ DED{R E „*-D.5EG.5F@?Tripos questions 1995.2.20 1994.3.3 1988.2.3 2000.2.7> ( * q


36 4 REGULAR LANGUAGES, II


œœí€í¡ [Å\^]DDED­E­375 <strong>The</strong> Pumping LemmaIn the context of programming languages, a typical example of a regular language (Slide 19)is the set of all strings of characters which are well-formed tokens (basic keywords, identifiers,etc) in a particular programming language, Java say. By contrast, the set of all strings whichrepresent well-formed Java programs is a typical example of a language that is not regular.Slide 28 gives some simpler examples of non-regular languages. For example, there is noway to use a search based on matching a regular expression to find all the palindromes in apiece of text (although of course there are other kinds of algorithm for doing this).Examples of non-regular languagesparentheses ‘ ’ <strong>and</strong> ‘¢ ’ occur well-nested.Gv<strong>The</strong> set of strings over ³ 5l@¢Vl ~ l € llA²¥ in which thei.e. which read the same backwards as forwards.Gv<strong>The</strong> set of strings over ³"~ l € ll'² which are palindromes,œ ³"~Slide 28<strong>The</strong> intuitive reason why the languages listed on Slide 28 are not regular is that a machinefor recognising whether or not any given string is in the language would need infinitely manydifferent states (whereas a characteristic feature of the machines we have been using is thatthey have only finitely many states). For example, to recognise that a string is of the formone would need to remember how many s had been seen before the first is encountered,requiring countably many states of the form ‘just © seen s’. This section make this intuitiveargument rigorous <strong>and</strong> describes a useful way of showing that languages such as these arenot regular.<strong>The</strong> fact that a finite automaton does only have finitely many states means that as we lookat longer <strong>and</strong> longer strings that it accepts, we see a certain kind of repetition—the pumpinglemma property given on Slide 29.


wqõ&s­qA&s­qall |´oUÀœœœ&s(with ´ ýfor [Å\^] all k & ,q­q&qÕßßqn && q‘íkà,©/ßà&ß,&(s­qAà&38 5 THE PUMPING LEMMA<strong>The</strong> Pumping LemmaFor every regular language À , there is a number ³ \´µ satisfyingthe pumping lemma property:concatenation of three strings, | i k, where k, n <strong>and</strong> kdµsatisfy:û·¬ X| ¢•\K³ can be expressed as ankdaµ´ ý·n¥¢•\´µ û·¬ş Š ¡ (i.e. )daµû·¬ ·k(i.e. qÕ ,Õ [but we knew that anyway],n¥¢º¹K³´ ýA'‘A•‘oUÀA'‘s"qs"s"qÕ ,Õ ,etc).A'‘ss"sqSlide 295.1 Proving the Pumping LemmaÕ Since is regular, it is equal to the Õ'Ž õ^ set of strings accepted by some õ DFAcan take the » number mentioned on Slide 29 to be the number of states õ(ÚD&5DA”H0H0HZDin­ with ©Uª¼» . If w. <strong>The</strong>n we. For supposews (½¸`L»q©Äªthe top of Slide 30. <strong>The</strong>n can be split into three pieces as shown on that slide. Note that. So it just remainsto check that for all . As shown on the lower half of Slide 30, the stringby choice ø of ½ <strong>and</strong> †ˆ‡`‰Šv‹ ŒŽ s ( ½ £ øPªA}‘ ,<strong>and</strong> †ˆ‡`‰Šv‹ ŒŽ qÕPŽ õ^ , then there is a transition sequence as shown ats õ â0® â¾ © ¥ â0®© (â?¿ âV® © â@® ‘658797‡%:‹ âV­ A'‘ â0®takes the machine from state back to the same state (since ). So for any ,takes us from the initial state to , then times round the loop from toitself, <strong>and</strong> then from to is accepted by, i.e. qÕ .. <strong>The</strong>refore for any ©Uª, qNote. In the above construction it is perfectly possible ø that¡ null-string, .(«,, in which case qis the


klßÊà‘Õ,5.2 Using the Pumping Lemma 39If [Å\K³ inumber of states of è , then instatesŸvÿ i Û Ü ÝMîÝ@ò ëzì Û í o¡£¢A¢Vý¦¥”û ÿñvñGñë7ì Û ß Ý@ïÛ à ñGñvñ ݤÀë5ìëXìÛEÁ ÃÄ ÅÁYÆ ßÛ for someÛ ÜGv]G¹Û Ácan’t all be distinct states. So Û | il¹K³ . So the above transition sequence looks likeÈǼÉÛ |ÿ i Û Ürê î ëVì b Ÿi Û ê ï ëVì b Û í o¡£¢A¢Vý¦¥”û ÿbwhered@fZg i ~ ß Gv ~a| n d-fhg i ~a| Æ ß vG ~ kd-fhg i~ Æ ß vv ~ í Slide 30Remark 5.1.1. One consequence of the pumping lemma property of Õ <strong>and</strong> » is that if thereis any string w in Õ of length ªL» , then Õ contains arbitrarily long strings. (We just ‘pumpup’ w by increasing © .)If you did Exercise 3.3.3, you will know that Õ if is a finite set of strings then it is regular.In this case, what is the » number with the property on Slide 29? <strong>The</strong> answer is that we cantake » any strictly greater than the length of any string in the finite Õ set . <strong>The</strong>n the Pumping†ˆ‡`‰Šv‹ ŒŽ w• ª€» with forLemma property is trivially satisfied because there are w nowhich we have to check the condition!5.2 Using the Pumping Lemma<strong>The</strong> Pumping Lemma (Slide 5.1) says that every regular language has a certain property—namely that there exists a » number with the pumping lemma property. So to show thata Õ language is not regular, it suffices to show that » ª no possesses the pumpinglemma property for the Õ language . Because the pumping lemma property involves quite acomplicated alternation of quantifiers, it will help to spell out explicitly what is its negation.This is done on Slide 31. Slide 32 gives some examples.


ÍÍÌÎÏÍÍÀ ß d@fZg i ³"~(i)Õßthere is some [Å\^] for which k ß nNí€í///ÕÕ&&í240 5 THE PUMPING LEMMAHow to use the Pumping Lemma to provethat a language Àis not regularFor each ³’\´µ , find some |«oUÀof length \K³ so that(Ë )no matter how | is split into three, | i k ß nk à,with ´ ýû·¬ ·n¥¢•\´µ ,daµdaµû·¬ ·kn¥¢8¹K³ <strong>and</strong> ´ ýk àis not in À .Slide 31Examples[For »•ª eachon Slide 31.],is of length ª6» <strong>and</strong> has property (Ñ )¡ [Å\^] is not regular.DÐ>EÐB‘[For each »•ª(Ñ ).],is of length ª6» <strong>and</strong> has propertyd@fZg(ii) À ài ³ |´o ³"~ l € b ¡ | a palindrome is not regular.DÐ>È DÐB‘À á d@fZg i(iii)[For each »•ªD× ‘, we can find a Ô prime ÔÖÕ withhas ª6» length <strong>and</strong> has (Ñ property ).]³"~OÒ“¡?Óprime is not regular.» <strong>and</strong> thenSlide 32


w¥2¥Õ–&2o//&s/Õ/­&qq&&(because » £sב¥ÕqNtq£&s¥W(rD Ú D oÛ× A/q¥¥(Hq/Ú&q2¥/HÚ&s­¥Žq–&AH/Út×Õ(@¾/×¥&`/t–&&q&s&­×q¥A(D(q&&(&/Aq&A5.2 Using the Pumping Lemma 41Proof of the examples on Slide 32. We use the method on Slide 31.»Ñªwժػ ( DIÐ7E9Ð(Ñ ww s"q& (²DÚwq s q© qsÕq(i) For any , consider the string . It is in <strong>and</strong> has length . We showthat property ) holds for this . For suppose is split as with. <strong>The</strong>n must consist entirely of s, so<strong>and</strong> say, <strong>and</strong> hence . <strong>The</strong>n the case of is not in sinceŒŽ q sÙǸ» <strong>and</strong> †ˆ‡a‰Šv‹ ŒŽ s ª(rDA (rDІˆ‡a‰ŠM‹( DOÐ7E9ÐEÐ(r,H oA (A (rD ÚD Ð(rD ÐH o E ÐH o E Ð<strong>and</strong>DБ EÐÂ÷Š(» (since¥'().†ˆ‡`‰Šv‹ Œ¥Ž s)ªH o(ii) <strong>The</strong> argument is very similar to that for example (i), but ( DÐ>È DÐstarting (²,with A the (²DÐ palindrome©. qOnce again, the qH oEaDÐof yields a string which is). »not a palindrome (because » £Š((iii) Given » ª, since there are infinitely many Ô primes , we can certainly find one satisfyingis split w ashas property (Ñ ). For suppose w» . I wÔ¥Õ claim thatq s8`¼» with †ˆ‡a‰Šv‹ ŒŽ sPª<strong>and</strong> A †ˆ‡a‰Šv‹ ( ŒŽq that , we ŒŽ †ˆ‡a‰Šv‹ have. Letting†ˆ‡a‰Šv‹ ŒŽ q<strong>and</strong>sqŒŽ s , so †ˆ‡a‰Šv‹(—D ×(D ×ò »½¼ò »½¼Ô £H o (rD ÛÝo ¾H osÜ HH oH o Ü D ×ÜÛH o (rD oH oNow Ž<strong>The</strong>refore qÔ ££ VŽÞÔ£ » A’÷ »» ªÔßÕ »©is not prime, because(since by choice, <strong>and</strong>when .(since( ¥( ¥ŒŽ s¤ª †ˆ‡a‰Šv‹ŒŽ q †ˆ‡a‰Šv‹) <strong>and</strong>s`à» ).Ô £Remark 5.2.1. Unfortunately, the method on Slide 31 can’t cope with every non-regularlanguage. This is because the pumping lemma property is a necessary, but not a sufficientcondition for a language to be regular. In other words there do exist Õ languages for which a»}ª number can be found satisfying the pumping lemma property on Slide 29, but whichnonetheless, are not regular. Slide 33 gives an example of such Õ an .


TAqT‘%ãíí/€&íqTAq(T&42 5 THE PUMPING LEMMAExample of a non-regular languagethat satisfies the ‘pumping lemma property’³"‚


­E­RDEŽDE5.4 Exercises 43Lemma If a è DFA accepts any string at all, it accepts onewhose length is less than the number of states è in .Proof. è Supposeempty, then we can find an element of it of shortest length,has ³ states (so ³Ô\´µ ). If Àà ·èé¢ is notß ~ à Gv ~ísay (where [Å\^] ). Thus there is a transition~sequenceÿ i Û Ü ÝMî ë5ì Û ß Ý@ï ë5ìŸà ñvñvñ Ý@ò ë1ì Û í o¡£¢¤¢Vý¦¥”û ÿ .ÛIf [Å\K³ , then not all the [åäµ states in this sequence can bedistinct <strong>and</strong> we can shorten it as on Slide 30. But then we wouldobtain a strictly shorter string in À¸ hè¿¢ contradicting the choice of~ ß ~ à Gv ~í. So we must have [ Ç ³ .Slide 345.4 ExercisesExercise 5.4.1. Show that the first language mentioned on Slide 28 is not regular.Exercise 5.4.2. Show that there is no õ DFA for Õ'Ž õ^ which is the language on Slide 33.[Hint: argue by contradiction. If there were such õ an , consider the õ ô DFA with the samestates õ asthose õ of, with alphabet of input symbols just consisting ofwhich are labelled by or , with start state<strong>and</strong> , with transitions all(where is the start© © ¥ F ¥ .5F ©*-D õ ,?õ state of ), <strong>and</strong> with the same accepting states as . Show that the language accepted byõ¯ô has to be<strong>and</strong> deduce that no such õcan exist.]Exercise 5.4.3. Check the claim made on Slide 33 that the language mentioned there satisfiesthe pumping lemma property of Slide 29 » with .(¯/©¤ªTripos questions 2001.2.7 1999.2.7 1998.2.7 1996.2.1(j) 1996.2.8 1995.2.271993.6.12 1991.4.6 1989.6.12


44 5 THE PUMPING LEMMA


.ñçç.æçññË.ççç.çÆØØçç.æææææææææöññ.çËöÆöÆÈÆçÍØ..öÍ..æç.IØØæÈ.I.ëH456 GrammarsWe have seen that regular languages can be specified in terms of finite automata that acceptor reject strings, <strong>and</strong> equivalently, in terms of patterns, or regular expressions, which stringsare to match. This section briefly introduces an alternative, ‘generative’ way of specifyinglanguages.6.1 Context-free grammarsSome production rules for ‘English’ sentencesæê0ëaìëðëìØOçè"ØOçIé1ØØ0éèîí"ØïØ0éèØ0éèïIèaòéó"ØççôOõïæêIëìðêñ æØ0éèïIèaòéó"ØççôOõïðëìðêñ æïOèòéó"Øæ÷öøïOèòéó"ØçôõOïðêñ æðêçôõOïØIéèaò


æ,æÈÈÈÈÈÈÈÈÈöööööööÈööööööööÍÍ%ÍñöÍöÍÆçØçØ46 6 GRAMMARS<strong>The</strong> idea is that we begin with the start symbol æ ØIçè"ØOçOézØ <strong>and</strong> use the productions tocontinually replace non-terminal symbols by strings. At successive stages in this process wehave a string which may contain both terminals <strong>and</strong> non-terminals. We choose one of thenon-terminals in the string <strong>and</strong> a production which has that non-terminal as its left-h<strong>and</strong> side.Replacing the non-terminal by the right-h<strong>and</strong> side of the production we obtain the next stringin the sequence, or derivation as it is called. <strong>The</strong> derivation stops when we obtain a stringcontaining only terminals. <strong>The</strong> set of strings over that may be obtained in this way fromthe start symbol is by definition the language generated the context-free grammar.A derivationæê0ëaìëð•ëaìØOçOèzØOçIé1ØØIéè í"ØOïØ0éèæ÷ñðêñ æëð•ëaìïIèaòé•ó"Ø ççôOõïØí"ØOïØIéèȇççôõïØÿíØïØIéèæPöøðêñ æë ðëaìȇççôõïØ ÈËØIéèæPöøðêñ æðëaìØ0éèaò


Ëæ¢Æö6.2 Backus-Naur Form 47book, for example) has productions of the q formterminals <strong>and</strong> non-terminals. For example a production of the forms where q <strong>and</strong> s are arbitrary strings ofñù ìö æý1þwould allow occurrences of ‘ìØIéèaòEíØ ’ that occur between ‘Ë ’ <strong>and</strong> ‘ÌMË ’ to be replacedby ‘ ý1þ ñù ’, deleting the surrounding symbols at the same time. This kind of production is notÆpermitted in a context-free grammar.ØIéèaò


Éþþ–æ£ æÙô ûý(5) Ù…–rŽ Ù ôÏô(6) Ù…–rŽ Ù ô ôÉÉþNon-terminal: ¢Start symbol: ¢Productions: ¢íææææ€í±þ¡ [Å\^]¢€48 6 GRAMMARSnamely:ûý}æûý}æûýÈ1ÙzÉÈ1ÙzÉÈzÙ1ÉÉ È1Ù1ÉŽ½ÈzÙ1É8<strong>The</strong> language generated by this grammar is supposed to represent certain arithmetic expressions.For exampleÈ1ÙzÉis in the language, butis not. (See Exercise 6.4.2.)A context-free grammar for the language³"~Terminals:~ €) ) i j ¡"~Slide 38


¥ © æset of terminals = $ ÿâ&­æE­set of non-terminals = úJûZüûZý-þ ÿR%ââ&Aææ%©H­6.3 <strong>Regular</strong> grammars 496.3 <strong>Regular</strong> grammarsA language over an alphabet is Õcontext-free iff is the set of strings generated Õbysome context-free grammar (with set of terminals ). <strong>The</strong> context-free grammar on *-DSlide ,?38©Úªgenerates the language. We saw in Section 5.2 that this is not a regularlanguage. So the class of context-free languages is not the same as the class of regularlanguages. Nevertheless, as Slide 39 points out, every regular language is context-free. Forthe grammar defined on that slide clearly has the property that derivations from % the „startsymbol to a string in must be of the form of a finite number of productions of the firstkind followed by a single production of the second kind, i.e.D&D&5DAD&5DA”H0H0H>DD&5DA!H0H0H>D(?(?(where in õthe following transition sequence holds‘J58797© çE> ¥æ £çA@æ £çABæ ££(?(?(­âVæâV­‡%:‹Thus a string is in the language generated by the grammar iff it is accepted by õ .Every regular language is context-freeGiven a DFA è , the set À¸ hè¿¢ of strings accepted by è can begenerated by the following context-free grammar:start symbol = start state è ofproductions of two kinds:ì ~ Ûwhenever Û Ý ëìÛin èÛì j whenever Û o3¡£¢¤¢Vý¦¥”û ÿÛSlide 39


ââ&ææââ⣠ì k¥¤Nâç§¦æ £&&/H(50 6 GRAMMARSDefinition A context-free grammar is regular iff all itsproductions are of the formorì k £k where is a string of terminals <strong>and</strong> £<strong>and</strong> ¤ are non-terminals.<strong>The</strong>orem(a) Every language generated by a regular grammar is aregular language (i.e. is the set of strings accepted by someDFA).(b) Every regular language can be generated by a regulargrammar.Slide 40It is possible to single out context-free grammars of a special form, called regular (orright linear), which do generate regular languages. <strong>The</strong> definition is on Slide 40. Indeed, asthe theorem on that slide states, this type of grammar generates all possible regular languages.Proof of the <strong>The</strong>orem on Slide 40. First note that part (b) of the theorem has already beenproved, because the context-free grammar generating ÕPŽ õ^ on Slide 39 is a regular grammar(of a special kind).To prove part (a), given a regular grammar we have to construct a õ DFA whose setof accepted strings coincides with the strings generated by the grammar. By the SubsetConstruction (<strong>The</strong>orem on Slide 18), it is enough to construct NFAó an with this property.This makes the task much easier. We take the states õ of to be the non-terminals, augmentedby some extra states described below. Of course the alphabet of input symbols õ of shouldbe the set of terminal symbols of the grammar. <strong>The</strong> start state is the start symbol. Finally, thetransitions <strong>and</strong> the accepting states õ of are defined as follows.(i) For each production of the â formfresh â states, we add © £&0.with †ˆ‡a‰ŠM‹ ŒŽ q’ªAG. NM.0H0H0H3. qâMô, q­ say withto the automaton <strong>and</strong> transitions(¿D&7DA”H0H0H>D©ÖªâV­IHçE>æ £çA@æ £çABæ ££(?(?(hâV­IHâ ô(ii) For each production of the â form-transition ¡¡ , we add anqâ ô with †ˆ‡a‰Šv‹ ŒŽ q (°, , i.e. with qHó £ æâ ô


(iii) For each production of the form ââââæâæ&ââââWWW&âAææææç¨¦æ £¡FââW& âNW/#!H(©/6.4 Exercises 51we add © fresh states â, say q(rD¥&7DA”H0H0H>D†ˆ‡`‰Šv‹ ŒŽ q¥ ªq withto the automaton <strong>and</strong> transitions­ ầ,­ with ©pª&0.A-.NM.0H0H0Ha.ç¤Bæ £ˆ£ç?>æ £ç'@Moreover we make the state â@­ accepting.(iv) For each production of the â formin any new states or transitions, but we do â make an accepting state.q with †ˆ‡`‰Šv‹ Œ¥Ž q (+, , i.e. with q¡ , we do not addIf we have a transition sequence õ in of the â formâ ‡:‹with , wecan divide it up into pieces according to where non-terminals occur <strong>and</strong> then convert eachpiece into a use of one of the production rules, thereby forming a derivation q of in thegrammar. Reversing this process, every derivation of a string of terminals can be convertedinto a transition sequence in the automaton from the start state to an accepting state. ThusNFAó this does indeed accept exactly the set of strings generated by the given regulargrammar.¥ ©‘ 5D77£ æ(?(?(âV­6.4 ExercisesExercise 6.4.1. Why is the string (4) not in the language generated by the context-freegrammar in Section 6.1?Exercise 6.4.2. Give a derivation showing that (5) is in the language generated by thecontext-free grammar on Slide 37. Prove that (6) is not in that language. [Hint: show thatif q is a string of terminals <strong>and</strong> non-terminals occurring in a derivation of this grammar <strong>and</strong>that ‘ô ’ occurs in q , then it does so in a substring of the form s8ô , or sô ô , or sô ôÏô , etc., where s iseither Ù or ûý .]Exercise 6.4.3. Give a context-free grammar generating all the palindromes over the *-D.5E@?alphabet(cf. Slide 28).Exercise 6.4.4. Give a context-free grammar generating all the regular expressions over thealphabet .*-D.5E-?Exercise 6.4.5. Using the construction given in the proof of part (a) of the <strong>The</strong>orem onSlide 40, convert the regular grammar with start â symbol <strong>and</strong> productionsDEDEinto an NFAó whose language is that generated by the grammar.Exercise 6.4.6. Is the language generated by the context-free grammar on Slide 35 a regularlanguage? What about the one on Slide 37?Tripos questions 1997.2.7 1996.2.1(k) 1994.4.3 1991.3.8 1990.6.10


52 6 GRAMMARS


1. Name of Lecturer: ÇùæöòñôÍ©ÈñöLectures Appraisal FormIf lecturing st<strong>and</strong>ards are to be maintained where they are high, <strong>and</strong> improved where theyare not, it is important for the lecturers to receive feedback about their lectures.Consequently, we would be grateful if you would complete this questionnaire, <strong>and</strong> eitherreturn it to the lecturer in question, or to the Student Administrator, <strong>Computer</strong> Laboratory,William Gates Building. Thank you.2. Title of Course: éË3. How many lectures have you attended in this series so far? . . . . . . . . . . . . . . . . . . . . . . .è¼ôËzÇïÈzÆ1üË1Ç&ó8ËDo you intend to go to the rest of them?Æ8ËzÆÈ8ÍrË©0ýYû©û

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!