Pumping Lemma, closure and decision properties of regular ...

Extended RE's
• UNIX pioneered the use of additional operators and notation for RE's:
– E? = 0 or 1 occurrences of E = ε + E
– E+ = 1 or more occurrences of E = EE*
– Character classes: [a-zGX] is the union of all (ASCII) characters from a to z, plus the characters G and X, for example.

Algebraic Laws for RE's
• If two expressions E and F have no variables, then E = F means that L(E) = L(F) (not that E and F are identical expressions).
– Example: 1+ = 11*
• If E and F are RE's with variables, then E = F (E is equivalent to F) means that whatever languages we substitute for the variables (provided we substitute the same language everywhere the same variable appears), the resulting expressions denote the same language.
– Example: R+ = RR*
• With two notable exceptions, we can think of union (+) as if it were addition, with Φ in place of the identity 0, and of concatenation as if it were multiplication, with ε in place of the identity 1.
– + and concatenation are both associative.
– + is commutative.
– Laws of the identities hold for both.
– Φ is the annihilator for concatenation.
– The exceptions:
1. Concatenation is not commutative: ab ≠ ba.
2. + is idempotent: E + E = E for any expression E.

Checking a Law
• Suppose we are told that the law (R + S)* = (R*S*)* holds for RE's. How would we check that this claim is true?
– Think of R and S as if they were single symbols, rather than placeholders for languages, i.e., R = {0} and S = {1}.
– Then the left side is clearly "any sequence of 0's and 1's".
– The right side also denotes any string of 0's and 1's, since 0 and 1 are each in L(0*1*).
• That test is necessary (i.e., if the test fails, then the law does not hold).
– We have particular languages that serve as a counterexample.
• But is it sufficient (if the test succeeds, the law holds)?

Proof of Sufficiency
• The book has a fairly simple argument for why, when the "concretized" expressions denote the same language, then the languages we get by substituting any languages for the variables are also the same.
• But if you think that's obvious, the book also has an example of "RE's with intersection" where the same statement is false.
– Check it out.

Closure Properties
• Not every language is a regular language.
• However, there are some rules that say "if these languages are regular, so is this one derived from them".
• There is also a powerful technique -- the pumping lemma -- that helps us prove a language not to be regular.
• Key tool: Since we know RE's, DFA's, NFA's, NFA-ε's all define exactly the regular languages, we can use whichever representation suits us when proving something about a regular language.
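The concretization test from "Checking a Law" can be tried mechanically. The sketch below is only a bounded sanity check, not a proof: it assumes R = 0 and S = 1, uses Python's `re` syntax (where `|` plays the role of +), and compares the two sides on all strings up to a small length. The helper names `strings_up_to` and `agree_up_to` are made up for this illustration.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)

def agree_up_to(regex_a, regex_b, alphabet="01", max_len=8):
    """Return the first string on which the two regexes disagree, or None."""
    pat_a, pat_b = re.compile(regex_a), re.compile(regex_b)
    for s in strings_up_to(alphabet, max_len):
        if bool(pat_a.fullmatch(s)) != bool(pat_b.fullmatch(s)):
            return s
    return None

# The claimed law (R + S)* = (R*S*)*, concretized with R = 0 and S = 1.
law = agree_up_to(r"(?:0|1)*", r"(?:0*1*)*")

# A non-law for contrast: (R + S)* versus R*S*.
non_law = agree_up_to(r"(?:0|1)*", r"0*1*")

print(law)      # None: no disagreement found up to length 8
print(non_law)  # 10
```

A disagreement is a genuine counterexample to a proposed law; the absence of one up to a fixed length only suggests that the concretized expressions agree.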

Pumping Lemma
• If L is a regular language, then there exists a constant n such that every string w in L, of length n or more, can be written as w = xyz, where:
– 0 < |y|.
– |xy| ≤ n.
– For all i ≥ 0, xy^i z is also in L.
• Note y^i = y repeated i times; y^0 = ε.

Intuitive Explanation
[Figure: an automaton with n states and no loops]
The automaton above has n states and no loops. Expressed in terms of n, what is the longest string this automaton can accept?
Generally, in an automaton (graph) with n states (vertices), any "walk" of length n or greater must repeat some state (vertex) -- that is, contain a cycle.

Proof of Pumping Lemma
• Since we claim L is regular, there must be a DFA A such that L = L(A).
• Let A have n states; choose this n for the pumping lemma.
• Let w be a string of length ≥ n in L, say w = a_1 a_2 … a_m, where m ≥ n.
• Let q_i be the state A is in after reading the first i symbols of w.
– q_0 = start state, q_1 = δ(q_0, a_1), q_2 = δ̂(q_0, a_1 a_2), etc.
• Since there are only n different states, two of q_0, q_1, …, q_n must be the same; say q_i = q_j, where 0 ≤ i < j ≤ n.
• Let x = a_1 … a_i; y = a_(i+1) … a_j; z = a_(j+1) … a_m.
• Then by repeating the loop from q_i to q_j with label a_(i+1) … a_j zero times, once, or more, we can show that xy^i z is accepted by A.
[Figure: a path labeled x = a_1 … a_i from the start state to q_i, a loop on q_i = q_j labeled y = a_(i+1) … a_j, and a path labeled z = a_(j+1) … a_m leaving it]
• The PL gets its name because the repeated string is "pumped".
– Note that because of the nature of FA's, we cannot control the number of times it is pumped.
– So a regular language that contains a string of length ≥ n is always infinite.
• The PL is only interesting for infinite languages
– but it works for finite languages, which are always regular -- in this case n is larger than the longest string, so nothing can be pumped.
• The PL is an application of the "pigeonhole principle".

PL Use
• We use the PL to show a language L is not regular.
– Start by assuming L is regular.
– Then there must be some n that serves as the PL constant.
• We may not know what n is, but we can work the rest of the "game" with n as a parameter.
– We choose some w that is known to be in L.
• Typically, w depends on n.

• Applying the PL, we know w can be broken into xyz, satisfying the PL properties.
• Again, we may not know how to break w, so we use x, y, z as parameters.
• We derive a contradiction by picking i (which might depend on n, x, y, and/or z) such that xy^i z is not in L.

Example
• Consider the set of strings of 0's whose length is a perfect square; formally L = {0^i | i is a square}.
– We claim L is not regular.
– Suppose L is regular. Then there is a constant n satisfying the PL conditions.
– Consider w = 0^(n^2), which is surely in L.
– Then w = xyz, where |xy| ≤ n and y ≠ ε.
– By PL, xyyz is in L. But the length of xyyz is greater than n^2 and no greater than n^2 + n. (Why? Because 1 ≤ |y| ≤ |xy| ≤ n.)
– However, the next perfect square after n^2 is (n+1)^2 = n^2 + 2n + 1.
– Thus, xyyz is not of square length and is not in L.
– Since we have derived a contradiction, the only unproved assumption -- that L is regular -- must be at fault, and we have a "proof by contradiction" that L is not regular.

The PL "game"
• Goal: win the PL game against our opponent by establishing a contradiction of the PL, while the opponent tries to foil us.
• Four steps:
1. The opponent picks n, the number of states in the automaton. Note that we don't have to know what n is, since we use the variable to define our string.
2. Given n, we pick a string w in L of length equal to or greater than n.
• We are free to choose any w, subject to w ∈ L and |w| ≥ n.
• We usually define the string in terms of n.
3. Our opponent chooses the decomposition xyz, subject to |xy| ≤ n, |y| ≥ 1.
4. We try to pick i (the power factor in xy^i z) in such a way that the pumped string w_i = xy^i z is not in L. If we can do so, we win the game.

Example 1
• Σ = {a,b}; consider L = {ww^R | w ∈ Σ*}.
– Whatever n the opponent chooses in step 1, we can always choose the string a^n b^n b^n a^n.
[Figure: the string a^n b^n b^n a^n drawn as four blocks of length n, with x and y lying inside the first block of a's and z covering the rest]
– Because of this choice and the requirement that |xy| ≤ n, the opponent is restricted in step 3 to choosing a y that consists entirely of a's.
– In step 4, we use i = 2. The string xy^2 z has more a's on the left than on the right, so it cannot be of the form ww^R. So L is not regular.
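Where does the opponent's decomposition in step 3 come from? The proof of the pumping lemma builds it from a DFA by stopping at the first repeated state. The sketch below follows that proof; the three-state DFA (binary strings whose number of 1's is a multiple of 3) and the names `pump_split` and `accepts` are assumptions made for this illustration, not part of the slides.

```python
def pump_split(delta, start, w):
    """Split w into (x, y, z) at the first repeated state, as in the proof."""
    seen = {start: 0}            # state -> number of symbols read on first arrival
    state = start
    for pos, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:        # q_i == q_j with i = seen[state], j = pos
            i, j = seen[state], pos
            return w[:i], w[i:j], w[j:]
        seen[state] = pos
    return None                  # only possible when |w| < number of states

def accepts(delta, start, accepting, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# Hypothetical DFA with n = 3 states: state k means "number of 1's so far is k mod 3".
delta = {(q, "0"): q for q in range(3)}
delta.update({(q, "1"): (q + 1) % 3 for q in range(3)})

w = "1101"                       # three 1's, so w is accepted; |w| >= 3
x, y, z = pump_split(delta, 0, w)
print((x, y, z))                 # ('11', '0', '1')
print(all(accepts(delta, 0, {0}, x + y * i + z) for i in range(6)))  # True
```

Because the repeat must occur within the first n transitions, the split automatically satisfies |xy| ≤ n and |y| ≥ 1, which is exactly the restriction the game places on the opponent.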

Example 2
• Consider L = {a^i b^i | i ≥ 0}.
• Given n, we choose the string a^n b^n for our argument. If the language is regular, our opponent can break this string into xyz such that xy^j z is in L for every j ≥ 0.
• Because |xy| ≤ n and |y| > 0, the string y has to consist of a's only. So pumping y adds to the number of a's, and hence xyyz has more a's than b's and is not in L.

Example 3
• Consider L = {w | w has an equal number of 1's and 0's}.
• Given n, we choose the string (01)^n.
• We need to show splitting this string into xyz where xy^i z is in L is impossible…
• But it is possible!
– If x = ε, y = 01, and z = (01)^(n-1), xy^i z is in L for every value of i.
• Are we out of luck?

First law of PL use:
If your string does not succeed, try another!

Not this time…
• Let's try 1^n 0^n.
• Again, we need to show splitting this string into xyz where xy^i z is in L is impossible…
• But it is possible!
– If x and z are the empty string and y is 1^n 0^n, then xy^i z always has an equal number of 0's and 1's.
• Are we still in trouble?
• … the PL says that our string has to be divided so that |xy| ≤ n and |y| > 0.
• If |xy| ≤ n then y must consist only of 1's, so xyyz ∉ L.
• Contradiction! We win.

Example 4
• Consider L = {ww | w ∈ Σ*}.
• We choose the string a^n b a^n b, where n is the number of states in the FA. We now show that there is no decomposition of this string into xyz such that xy^j z is in L for every j ≥ 0.
• Again, it is crucial that the PL insists that |xy| ≤ n, because without it we could pump the string if we let x and z be the empty string.
• With this condition, it's easy to show that the PL conclusion fails: y must consist only of a's, so xyyz is not in L.
• Here, as in the previous example, the choice of string is critical: had we chosen a^n a^n (which is a member of L) instead of a^n b a^n b, the argument would fail, because that string can be pumped (for instance with y = aa, when n ≥ 2) and stay in L.

MORAL
Choose your strings wisely.
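The moral of Examples 3 and 4 can be seen by brute force for a small, fixed n. The sketch below lists the decompositions the opponent could still use, i.e., those that stay in the language for every tested i. It only illustrates the game: the real n is unknown, the bound `max_i` is an arbitrary cutoff, and `surviving_splits` and `balanced` are names invented for this example.

```python
def surviving_splits(w, n, in_lang, max_i=4):
    """List splits w = xyz (|xy| <= n, |y| >= 1) with x y^i z in the
    language for every i in 0..max_i. An empty list means we win."""
    survivors = []
    for xy_len in range(1, min(n, len(w)) + 1):
        for x_len in range(xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                survivors.append((x, y, z))
    return survivors

def balanced(s):
    """Membership test for L = {w | w has equally many 0's and 1's}."""
    return s.count("0") == s.count("1")

n = 3
bad_choice = surviving_splits("01" * n, n, balanced)
good_choice = surviving_splits("1" * n + "0" * n, n, balanced)

print(bad_choice)   # [('', '01', '0101'), ('0', '10', '101')]
print(good_choice)  # []
```

With (01)^n the opponent has a safe decomposition, so that string proves nothing; with 1^n 0^n every legal decomposition fails, which is the contradiction the argument needs.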

Example 5
"Pumping down"
• L = {0^i 1^j | i > j}
• Given n, choose s = 0^(n+1) 1^n.
• Split into xyz… etc.
• Because |xy| ≤ n by the PL, y consists only of 0's.
• Is xyyz in L? (Yes: pumping up only adds 0's, so it gives no contradiction.)
• The PL states that xy^i z is in L even when i = 0.
• So, consider the string xy^0 z:
– Removing string y decreases the number of 0's in s.
– s has only one more 0 than 1.
– Therefore, xz cannot have more 0's than 1's, and is not a member of L.
• Contradiction!

The Pumping Lemma Poem
Any regular language L has a magic number p
And any long-enough word in L has the following property:
Amongst its first p symbols is a segment you can find
Whose repetition or omission leaves x amongst its kind.

So if you find a language L which fails this acid test,
And some long word you pump becomes distinct from all the rest,
By contradiction you have shown that language L is not
A regular guy, resilient to the damage you have wrought.

But if, upon the other hand, x stays within its L,
Then either L is regular, or else you chose not well.
For w is xyz, and y cannot be null,
And y must come before p symbols have been read in full.

As mathematical postscript, an addendum to the wise:
The basic proof we outlined here does certainly generalize.
So there is a pumping lemma for all languages context-free,
Although we do not have the same for those that are r.e.

Remember
• You need to find only one string for which the PL does not hold to prove a language is not regular.
• But you must show that the PL fails for every decomposition of that string into xyz.
– This sometimes means considering several different cases.

Example
• L = {a^3 b^i c^(i-3) | i > 3}
• Assume L is regular, so the PL holds.
• Choose w = a^3 b^n c^(n-3) with n as the PL constant (we may assume n > 3, so that w is in L, since any larger constant also satisfies the PL).
• Three ways to partition w into xyz:
1. y contains only a's
2. y contains only b's
3. y contains a's and b's
• We have to show that each of these partitions leads to a contradiction, i.e., that there is no possible way to divide w into xyz so that the PL holds.

Case 1: y contains only a's
• Then x contains 0 to 2 a's, y contains 1 to 3 a's, and z contains 0 to 2 a's concatenated onto the rest of the string b^n c^(n-3), such that there are exactly 3 a's. So the partition is
x = a^k   y = a^j   z = a^(3-k-j) b^n c^(n-3)
where k ≥ 0, j > 0, and k+j ≤ 3.
• It should be true that xy^i z ∈ L for all i ≥ 0.
• xy^2 z = (x)(y)(y)(z) = (a^k)(a^j)(a^j)(a^(3-j-k) b^n c^(n-3)) = a^(3+j) b^n c^(n-3) ∉ L: since j > 0, there are too many a's.
• CONTRADICTION!

Case 2: y contains only b's
• Then x contains 3 a's followed by 0 or more b's, y contains 1 to n-3 b's, and z contains 3 to n-1 b's concatenated onto the rest of the string c^(n-3). So the partition is
x = a^3 b^k   y = b^j   z = b^(n-k-j) c^(n-3)
where k ≥ 0, j > 0, and k+j ≤ n-3.
• It should be true that xy^i z ∈ L for all i ≥ 0.
• xy^0 z = a^3 b^(n-j) c^(n-3) ∉ L: since j > 0, there are too few b's.
• CONTRADICTION!

Case 3: y contains a's and b's
• Then x contains 0 to 2 a's, y contains 1 to 3 a's and 1 to n-3 b's, and z contains 3 to n-1 b's concatenated onto the rest of the string c^(n-3). So the partition is
x = a^(3-k)   y = a^k b^j   z = b^(n-j) c^(n-3)
where 3 ≥ k > 0, and n-3 ≥ j > 0.
• It should be true that xy^i z ∈ L for all i ≥ 0.
• xy^2 z = a^3 b^j a^k b^n c^(n-3) ∉ L: since j, k > 0, there are b's before a's.
• No partition of w satisfies the PL.
• L is not regular!

Closure Properties
• Certain operations on regular languages are guaranteed to produce regular languages.
– Example: the union of regular languages is regular; start with RE's, and apply + to get an RE for the union.

Substitution
• Take a regular language L over some alphabet Σ.
• For each a in Σ, let L_a be a regular language.
• Let s be the substitution defined by s(a) = L_a for each a.
– Extend s to strings by s(a_1 a_2 … a_n) = s(a_1)s(a_2) … s(a_n); i.e., concatenate the languages L_(a_1) L_(a_2) … L_(a_n).
– Extend s to languages by s(M) = ∪_(w in M) s(w).
• Then s(L) is regular.

Proof That Substitution of Regular Languages Into a Regular Language is Regular
• Let R be a regular expression for language L.
• Let R_a be a regular expression for language s(a) = L_a, for all symbols a in Σ.
• Construct an RE E for s(L) by starting with R and replacing each symbol a by the RE R_a.
• Proof that L(E) = s(L) is an induction on the height of (the expression tree for) RE R.
• Basis: R is a single symbol, a. Then E = R_a, L = {a}, and s(L) = s({a}) = L(R_a).
• Cases where R is ε or Φ are easy.
• Induction: There are three cases, depending on whether R = R_1 + R_2, R = R_1 R_2, or R = R_1*.
• We'll do only R = R_1 R_2.
• L = L_1 L_2, where L_1 = L(R_1) and L_2 = L(R_2).
• Let E_1 be R_1, with each a replaced by R_a; same for E_2.
• By the IH, L(E_1) = s(L_1) and L(E_2) = s(L_2).
• Thus, since E = E_1 E_2, L(E) = L(E_1)L(E_2) = s(L_1)s(L_2) = s(L).
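The construction in the substitution proof (replace each symbol a of R by R_a) is a plain recursion on the expression tree. A minimal sketch, assuming a made-up tuple encoding of RE's with only symbols, union, concatenation, and star (ε and Φ are left out for brevity), and rendering the result in Python `re` syntax so it can be tested:

```python
import re

# Hypothetical mini-AST for RE's:
#   ("sym", a) | ("union", r1, r2) | ("cat", r1, r2) | ("star", r1)

def substitute(r, table):
    """Return r with every symbol a replaced by the expression table[a]."""
    kind = r[0]
    if kind == "sym":
        return table[r[1]]                      # basis: E = R_a
    if kind == "star":
        return ("star", substitute(r[1], table))
    # union and concatenation: substitute in both operands
    return (kind, substitute(r[1], table), substitute(r[2], table))

def to_python(r):
    """Render the AST as a Python regular expression string."""
    kind = r[0]
    if kind == "sym":
        return re.escape(r[1])
    if kind == "star":
        return f"(?:{to_python(r[1])})*"
    op = "|" if kind == "union" else ""
    return f"(?:{to_python(r[1])}{op}{to_python(r[2])})"

# R = ab, so L = {ab}; s(a) = L(0*), s(b) = L(11).
R = ("cat", ("sym", "a"), ("sym", "b"))
table = {
    "a": ("star", ("sym", "0")),
    "b": ("cat", ("sym", "1"), ("sym", "1")),
}
E = substitute(R, table)            # an RE for s(L) = L(0*) L(11)
pattern = re.compile(to_python(E))

print(bool(pattern.fullmatch("0011")))  # True
print(bool(pattern.fullmatch("01")))    # False
```

Substituting into {ab} in this way yields the concatenation of the two languages, which is the first closure result that follows from the substitution theorem.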

Applications of the Substitution Theorem
• If L_1 and L_2 are regular, so is L_1 L_2.
– Let s(a) = L_1 and s(b) = L_2. Substitute into the regular language {ab}.
• So is L_1 ∪ L_2.
– Substitute into {a, b}.
• Ditto L*.
– Substitute into L(a*).
• Closure under homomorphism = substitution of one string for each symbol.
– Special case of a substitution.
• Example: Homomorphism
– Let L = L(0*1*), and let h be a homomorphism defined by h(0) = aa and h(1) = ε.
• Then h(L) = L((aa)*), all strings with an even number of a's.
• Closure Under Inverse Homomorphism
– h^(-1)(L) = {w | h(w) is in L}
– See the argument in the book. Briefly:
• Given homomorphism h and regular language L, start with a DFA A for L.
• Construct DFA B for h^(-1)(L) by having B go from state q to state p on input a if δ̂_A(q, h(a)) = p.

Closure Under Reversal
• The reverse of a string w = a_1 a_2 … a_n is a_n … a_2 a_1.
– Denoted w^R.
– Note ε^R = ε.
• The reverse of a language L is the set containing the reverse of each string in L.
• If L is regular, so is L^R.
– Proof: use RE's, recursive reversal as in the book.

Decision Properties of Regular Languages
• Given a representation (e.g., RE, FA) of a regular language L, what can we tell about L?
– Since there are algorithms to convert between any two representations, we can choose the representation that makes the test easiest.
• Membership
– Is string w in regular language L?
• Choose DFA representation for L.
• Simulate the DFA on input w.
• Emptiness
– Is L = Φ?
• Use DFA representation.
• Use a graph-reachability algorithm to test if at least one accepting state is reachable from the start state.

Finiteness
• Is L a finite language?
– Note every finite language is regular (why?), but a regular language is not necessarily finite.
• DFA method:
– Given a DFA for L, eliminate all states that are not reachable from the start state and all states that do not reach an accepting state.
– Test if there are any cycles in the remaining DFA; if so, L is infinite, if not, then L is finite.

RE method:
• We can almost just look for a * in the RE and say its language is infinite if there is one, finite if not. However, there are exceptions, e.g. 0ε*1 or 0*Φ. Thus:
1. Find sub-expressions equivalent to Φ by:
• (Basis) Φ is; ε and a are not.
• (Induction) E+F is iff both E and F are; EF is if either E or F is; E* never is.
2. Eliminate sub-expressions equivalent to Φ by:
• Replace E + F or F + E by F whenever E is and F isn't.
• Replace E* by ε whenever E is equivalent to Φ.

3. Now, find sub-expressions that are equivalent to ε by:
– (Basis) ε is; a isn't.
– (Induction) E+F is iff both E and F are; ditto EF; E* is iff E is.
4. Now, we can tell if L(R) is infinite by looking for a sub-expression E* such that E is not equivalent to ε.

Example
• Consider (0 + 1Φ)* + 1Φ*.
– Step 1: Φ (twice) and 1Φ are sub-expressions equivalent to Φ.
– Step 2: Remove these sub-expressions.
• Replace (0 + 1Φ) by 0.
• Replace Φ* by ε.
• 0* + 1ε remains.
– Step 3: only sub-expression ε is equivalent to ε; left with 0* + 1.
– Since 0 is starred, language is infinite.

Minimization of States
• Real goal is testing equivalence of (representations of) two regular languages.
• Interesting fact: DFA's have unique (up to state names) minimum-state equivalents.

Distinguishable States
• Key idea: find states p and q that are distinguishable because there is some input w that takes exactly one of p and q to an accepting state.
• Basis: any non-accepting state is distinguishable from any accepting state (w = ε).
• Induction: p and q are distinguishable if there is some input symbol a such that δ(p, a) is distinguishable from δ(q, a).
– All other pairs of states are indistinguishable, and can be merged into one state.

Example (very simple)
Consider:
[Figure: a DFA with states p, q, and r; q and r are accepting and p is not; on input 0 both q and r go to p]
Can we distinguish q from r?
• No string beginning with 0 works.
• Both states go to p, and therefore any string of the form 0x takes q and r to the same state.
• No string beginning with 1 works.
• Starting in either q or r, as long as we have input 1, we are in one of the accepting states. When a 0 is read, we go to the same state (p) and then, regardless of input, the same states forever after.
• p is distinguishable from q and r by the basis.
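The basis and induction for distinguishable states translate into a marking procedure that repeats until nothing changes. The sketch below is one straightforward way to code it. The transition table is an assumption: the slides' figure is not available, so the moves of q and r on 1 and the moves out of p are hypothetical, chosen only to agree with what the text says about the example.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, accepting):
    """Return the set of distinguishable pairs, each as a frozenset {p, q}."""
    # Basis: an accepting and a non-accepting state are distinguished by ε.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:                       # Induction: repeat until stable
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)     # successors on a are distinguishable
                    changed = True
                    break
    return marked

# Hypothetical completion of the three-state example.
states = ["p", "q", "r"]
accepting = {"q", "r"}
delta = {
    ("q", "0"): "p", ("r", "0"): "p",    # stated in the text
    ("q", "1"): "r", ("r", "1"): "q",    # assumed: stay within {q, r} on 1
    ("p", "0"): "p", ("p", "1"): "q",    # assumed
}

marked = distinguishable_pairs(states, "01", delta, accepting)
print(frozenset(("q", "r")) in marked)   # False: q and r can be merged
```

Pairs left unmarked at the end are indistinguishable, so here q and r form one group and p forms another.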

Constructing the Minimum-State DFA
• For each group of indistinguishable states, pick a "representative."
– Note a group can be large, e.g., q_1, q_2, …, q_k, if all pairs are indistinguishable.
– Indistinguishability is transitive (why?) so indistinguishability partitions states.
• If p is a representative, and δ(p, a) = q, in the minimum-state DFA the transition from p on a is to the representative of q's group (or to q itself, if q is either alone in a group or a representative).
• Start state is the representative of the original start state.
• Accepting states are representatives of groups of accepting states.
– Notice we could not have a "mixed" (accepting + non-accepting) group (why?).
• Delete any state that is not reachable from the start state.

Example
• For the DFA given earlier, p is in a group by itself; {q, r} is the other group.
[Figure: the original three-state DFA next to the minimized two-state DFA, in which q and r are merged into the single state q]

Why Above Minimization Can't Be Beaten
• Suppose we have a DFA A, and we minimize it to construct a DFA M. But there is another DFA N that accepts the same language as A and M, yet has fewer states than M.
• Proof by contradiction that this can't happen:
– Run the state-distinguishability process on the states of M and N together.
– Start states of M and N are indistinguishable because L(M) = L(N).
– If {p, q} are indistinguishable, then their successors on any one input symbol are also indistinguishable.
– Thus, since neither M nor N could have an inaccessible state, every state of M is indistinguishable from at least one state of N.
– Since N has fewer states than M, there are two states of M that are indistinguishable from the same state of N, and therefore indistinguishable from each other.
– But M was designed so that all its states are distinguishable from each other.
– We have a contradiction, so the assumption that N exists is wrong, and M in fact has as few states as any equivalent DFA for A.
– In fact (stronger), there must be a 1-1 correspondence between the states of any other minimum-state N and the DFA M, showing that the minimum-state DFA for A is unique up to renaming of the states.

End of Part I
• So ends the first part of the course: Regular Languages and Finite Automata
• Exam I (Oct. 8, 7pm in this room) covers material up to here
• Coming next week: Context-Free Languages and Push-Down Automata
(stay tuned for more exciting adventures…)
