12.07.2015 Views

Pumping Lemma, closure and decision properties of regular ...


Extended RE's
• UNIX pioneered the use of additional operators and notation for RE's:
  – E? = 0 or 1 occurrences of E = ε + E
  – E+ = 1 or more occurrences of E = EE*
  – Character classes: [a-zGX] is the union of all (ASCII) characters from a to z, plus the characters G and X, for example.

Algebraic Laws for RE's
• If two expressions E and F have no variables, then E = F means that L(E) = L(F) (not that E and F are identical expressions).
  – Example: 1+ = 11*
• If E and F are RE's with variables, then E = F (E is equivalent to F) means that whatever languages we substitute for the variables (provided we substitute the same language everywhere the same variable appears), the resulting expressions denote the same language.
  – Example: R+ = RR*
• With two notable exceptions, we can think of union (+) as if it were addition, with Φ in place of the identity 0, and of concatenation as if it were multiplication, with ε in place of the identity 1.
  – + and concatenation are both associative.
  – + is commutative.
  – Laws of the identities hold for both.
  – Φ is the annihilator for concatenation.
  – The exceptions:
    1. Concatenation is not commutative: ab ≠ ba.
    2. + is idempotent: E + E = E for any expression E.

Checking a Law
• Suppose we are told that the law (R + S)* = (R*S*)* holds for RE's. How would we check that this claim is true?
  – Think of R and S as if they were single symbols, rather than placeholders for languages, i.e., R = {0} and S = {1}.
  – Then the left side is clearly "any sequence of 0's and 1's".
  – The right side also denotes any string of 0's and 1's, since 0 and 1 are each in L(0*1*).
• That test is necessary (i.e., if the test fails, then the law does not hold).
  – We have particular languages that serve as a counterexample.
• But is it sufficient (if the test succeeds, the law holds)?

Proof of Sufficiency
• The book has a fairly simple argument for why, when the "concretized" expressions denote the same language, the languages we get by substituting any languages for the variables are also the same.
• But if you think that's obvious, the book also has an example of "RE's with intersection" where the same statement is false.
  – Check it out.

Closure Properties
• Not every language is a regular language.
• However, there are some rules that say "if these languages are regular, so is this one derived from them".
• There is also a powerful technique, the pumping lemma, that helps us prove a language not to be regular.
• Key tool: Since we know RE's, DFA's, NFA's, and NFA-ε's all define exactly the regular languages, we can use whichever representation suits us when proving something about a regular language.
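The "concretized" test from the Checking a Law slide can be tried mechanically. The sketch below is only an illustration, not the book's argument: it uses Python's `re` syntax (where union is written `|` rather than `+`), the helper names `strings_up_to` and `agree_up_to` are made up for this example, and comparing strings up to a fixed length can expose a counterexample but cannot by itself prove a law.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

def agree_up_to(pattern_a, pattern_b, alphabet="01", max_len=8):
    """Return the first string on which the two patterns disagree, or None."""
    a = re.compile(pattern_a)
    b = re.compile(pattern_b)
    for s in strings_up_to(alphabet, max_len):
        if bool(a.fullmatch(s)) != bool(b.fullmatch(s)):
            return s  # a concrete counterexample to the claimed law
    return None

# (R + S)* versus (R*S*)* with R = 0 and S = 1: no disagreement is found.
print(agree_up_to(r"(?:0|1)*", r"(?:0*1*)*"))  # None

# A claimed law that fails the test: (R + S)* versus R*S*.
print(agree_up_to(r"(?:0|1)*", r"0*1*"))       # 10
```

The second call shows the "necessary" direction at work: the string 10 is in L((0+1)*) but not in L(0*1*), so (R + S)* = R*S* cannot be a law.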


Pumping Lemma
• If L is a regular language, then there exists a constant n such that every string w in L, of length n or more, can be written as w = xyz, where:
  – 0 < |y|.
  – |xy| ≤ n.
  – For all i ≥ 0, xy^i z is also in L.
• Note y^i = y repeated i times; y^0 = ε.

Intuitive Explanation
The automaton below has n states and no loops.
[Figure: an automaton with n states and no loops]
Expressed in terms of n, what is the longest string this automaton can accept?
Generally, in an automaton (graph) with n states (vertices), any "walk" of length n or greater must repeat some state (vertex); that is, it must contain a cycle.

Proof of Pumping Lemma
• Since we claim L is regular, there must be a DFA A such that L = L(A).
• Let A have n states; choose this n for the pumping lemma.
• Let w be a string of length ≥ n in L, say w = a_1 a_2 … a_m, where m ≥ n.
• Let q_i be the state A is in after reading the first i symbols of w.
  – q_0 = start state, q_1 = δ(q_0, a_1), q_2 = δ̂(q_0, a_1 a_2), etc.
• Since there are only n different states, two of the n + 1 states q_0, q_1, …, q_n must be the same; say q_i = q_j, where 0 ≤ i < j ≤ n.
• Let x = a_1 … a_i; y = a_{i+1} … a_j; z = a_{j+1} … a_m.
• Then by repeating the loop from q_i to q_j with label a_{i+1} … a_j zero times, once, or more, we can show that xy^i z is accepted by A.
[Figure: a path labeled x = a_1 … a_i from the start state to q_i, a loop at q_i = q_j labeled y = a_{i+1} … a_j, and a path labeled z = a_{j+1} … a_m leaving q_j]
• The PL gets its name because the repeated string is "pumped".
  – Note that because of the nature of FA's, we cannot control the number of times it is pumped.
  – So a regular language with a string of length ≥ n is always infinite.
• The PL is only interesting for infinite languages,
  – but it works for finite languages, which are always regular. In this case n is larger than the longest string, so nothing can be pumped.
• The PL is an application of the "pigeonhole principle".

PL Use
• We use the PL to show a language L is not regular.
  – Start by assuming L is regular.
  – Then there must be some n that serves as the PL constant.
    • We may not know what n is, but we can work the rest of the "game" with n as a parameter.
  – We choose some w that is known to be in L.
    • Typically, w depends on n.


• Applying the PL, we know w can be broken into xyz, satisfying the PL properties.
• Again, we may not know how to break w, so we use x, y, z as parameters.
• We derive a contradiction by picking i (which might depend on n, x, y, and/or z) such that xy^i z is not in L.

Example
• Consider the set of strings of 0's whose length is a perfect square; formally L = {0^i | i is a square}.
  – We claim L is not regular.
  – Suppose L is regular. Then there is a constant n satisfying the PL conditions.
  – Consider w = 0^{n^2}, which is surely in L.
  – Then w = xyz, where |xy| ≤ n and y ≠ ε.
  – By the PL, xyyz is in L. But the length of xyyz is greater than n^2 and no greater than n^2 + n. (Why? |xyyz| = n^2 + |y|, and 1 ≤ |y| ≤ |xy| ≤ n.)
  – However, the next perfect square after n^2 is (n+1)^2 = n^2 + 2n + 1.
  – Thus, xyyz is not of square length and is not in L.
  – Since we have derived a contradiction, the only unproved assumption, that L is regular, must be at fault, and we have a "proof by contradiction" that L is not regular.

The PL "game"
• Goal: win the PL game against our opponent by establishing a contradiction of the PL, while the opponent tries to foil us.
• Four steps:
  1. The opponent picks n, the number of states in the automaton. Note that we don't have to know what n is, since we use the variable to define our string.
  2. Given n, we pick a string w in L of length equal to or greater than n.
     • We are free to choose any w, subject to w ∈ L and |w| ≥ n.
     • We usually define the string in terms of n.
  3. Our opponent chooses the decomposition xyz, subject to |xy| ≤ n, |y| ≥ 1.
  4. We try to pick i (the power factor in xy^i z) in such a way that the pumped string w_i = xy^i z is not in L. If we can do so, we win the game.

Example 1
• Σ = {a,b}; consider L = {ww^R | w ∈ Σ*}.
  – Whatever n the opponent chooses in step 1, we can always choose w = a^n b^n b^n a^n.
[Figure: w drawn as four blocks of length n (a…a, b…b, b…b, a…a), with x and y inside the first block of a's and z covering the rest]
  – Because of this choice and the requirement that |xy| ≤ n, the opponent is restricted in step 3 to choosing a y that consists entirely of a's.
  – In step 4, we use i = 2. The string xy^2 z has more a's on the left than on the right, so it cannot be of the form ww^R. So L is not regular.
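The four steps of the game can be played out by brute force for small n. The following is a minimal sketch under stated assumptions: the function names (`decompositions`, `we_win`, and the two membership tests) are invented for this illustration, only exponents i from 0 to a small bound are tried, and success for a handful of values of n merely illustrates the argument; the proofs above are what cover every n.

```python
import math

def decompositions(w, n):
    """Step 3: yield every split w = xyz the opponent may choose (|xy| <= n, |y| >= 1)."""
    for xy_len in range(1, min(n, len(w)) + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]

def we_win(w, n, in_language, max_i=3):
    """Step 4: True if every legal split has some i in 0..max_i with xy^i z outside L."""
    if not in_language(w) or len(w) < n:
        raise ValueError("step 2 requires w in L with |w| >= n")
    return all(
        any(not in_language(x + y * i + z) for i in range(max_i + 1))
        for x, y, z in decompositions(w, n)
    )

def is_ww_reversed(s):
    """Membership in {w w^R}: exactly the even-length palindromes."""
    return len(s) % 2 == 0 and s == s[::-1]

def is_square_length(s):
    """Membership in {0^i | i is a perfect square}."""
    root = math.isqrt(len(s))
    return set(s) <= {"0"} and root * root == len(s)

for n in range(1, 6):
    # Example 1: w = a^n b^n b^n a^n
    assert we_win("a" * n + "b" * (2 * n) + "a" * n, n, is_ww_reversed)
    # Perfect-square example: w = 0^(n^2)
    assert we_win("0" * (n * n), n, is_square_length)
print("we win for n = 1..5 with both choices of w")
```

For a regular language the opponent can always answer: for instance, `we_win("0000", 2, lambda s: set(s) <= {"0"})` is False, because every pumped string stays in L(0*).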


Example 2
• Consider L = {a^i b^i | i ≥ 0}.
• Given n, we choose the string a^n b^n for our argument. If the language is regular, our opponent can break this string into xyz where for every j ≥ 0, xy^j z is in L.
• Because |xy| ≤ n and |y| > 0, the string y has to consist of a's only. So pumping y adds to the number of a's, and hence there are more a's than b's, so the pumped string is not in L.

Example 3
• Consider L = {w | w has an equal number of 1's and 0's}.
• Given n, we choose the string (01)^n.
• We need to show splitting this string into xyz where xy^i z is in L is impossible…
• But it is possible!
  – If x = ε, y = 01, and z = (01)^{n-1}, xy^i z is in L for every value of i.
• Are we out of luck?

First law of PL use:
If your string does not succeed, try another!

Not this time…
• Let's try 1^n 0^n.
• Again, we need to show splitting this string into xyz where xy^i z is in L is impossible…
• But it is possible!
  – If x and z are the empty string and y is 1^n 0^n, then xy^i z always has an equal number of 0's and 1's.
• Are we still in trouble?
• … the PL says that our string has to be divided so that |xy| ≤ n and |y| > 0.
• If |xy| ≤ n then y must consist only of 1's, so xyyz ∉ L.
• Contradiction! We win.

Example 4
• Consider L = {ww | w ∈ Σ*}.
• We choose the string a^n b a^n b, where n is the number of states in the FA. We now show that there is no decomposition of this string into xyz where for every j ≥ 0, xy^j z is in L.
• Again, it is crucial that the PL insists that |xy| ≤ n, because without it we could pump the string if we let x and z be the empty string.
• With this condition, it's easy to show that the PL won't apply, because y must consist only of a's, so xyyz is not in L.
• Here, as in the previous example, the choice of string is critical: had we chosen a^n a^n (which is a member of L) instead of a^n b a^n b, it wouldn't work, because it can be pumped and still satisfy the PL.

MORAL
Choose your strings wisely.
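The difference between a badly chosen and a well-chosen string in Examples 3 and 4 can be seen by listing the decompositions that survive pumping. This is a small sketch with invented names (`surviving_splits`, `equal_zeros_ones`, `is_double`); it fixes n = 4 and tries only i = 0..4, so it illustrates the moral rather than proving anything.

```python
def surviving_splits(w, n, in_language, max_i=4):
    """List splits (x, y, z) with |xy| <= n and |y| >= 1 for which
    xy^i z stays in the language for every i in 0..max_i."""
    found = []
    for xy_len in range(1, min(n, len(w)) + 1):
        for x_len in range(xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_language(x + y * i + z) for i in range(max_i + 1)):
                found.append((x, y, z))
    return found

def equal_zeros_ones(s):
    """Example 3: equal number of 0's and 1's."""
    return s.count("0") == s.count("1")

def is_double(s):
    """Example 4: strings of the form ww."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]

n = 4
# Bad choices: the opponent has splits that can be pumped forever.
print(surviving_splits("01" * n, n, equal_zeros_ones)[0])  # ('', '01', '010101')
print(len(surviving_splits("a" * (2 * n), n, is_double)) > 0)  # True

# Good choices: no legal split survives, so we win.
print(surviving_splits("1" * n + "0" * n, n, equal_zeros_ones))      # []
print(surviving_splits("a" * n + "b" + "a" * n + "b", n, is_double))  # []
```

An empty list means every decomposition the opponent could pick is defeated by some i, which is exactly what the proofs for 1^n 0^n and a^n b a^n b establish for general n.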


Example 5
"Pumping down"
• L = {0^i 1^j | i > j}
• Given n, choose s = 0^{n+1} 1^n.
• Split into xyz… etc.
• Because by the PL |xy| ≤ n, y consists only of 0's.
• Is xyyz in L? (Yes: pumping up only adds 0's, so it gives no contradiction.)
• The PL states that xy^i z is in L even when i = 0.
• So, consider the string xy^0 z = xz.
  – Removing string y decreases the number of 0's in s.
  – s has only one more 0 than 1.
  – Therefore, xz cannot have more 0's than 1's, and is not a member of L.
• Contradiction!

The Pumping Lemma Poem
Any regular language L has a magic number p
And any long-enough word in L has the following property:
Amongst its first p symbols is a segment you can find
Whose repetition or omission leaves x amongst its kind.

So if you find a language L which fails this acid test,
And some long word you pump becomes distinct from all the rest,
By contradiction you have shown that language L is not
A regular guy, resilient to the damage you have wrought.

But if, upon the other hand, x stays within its L,
Then either L is regular, or else you chose not well.
For w is xyz, and y cannot be null,
And y must come before p symbols have been read in full.

As mathematical postscript, an addendum to the wise:
The basic proof we outlined here does certainly generalize.
So there is a pumping lemma for all languages context-free,
Although we do not have the same for those that are r.e.

Remember
• You need to find only one string for which the PL does not hold to prove a language is not regular.
• But you must show that the PL fails for every decomposition of that string into xyz.
  – This sometimes means considering several different cases.

Example
• L = {a^3 b^i c^{i-3} | i > 3}
• Assume L is regular, so the PL holds.
• Choose w = a^3 b^n c^{n-3} with n as the PL constant (we may take n > 3, since any larger constant also serves as a PL constant).
• Three ways to partition w into xyz:
  1. y contains only a's
  2. y contains only b's
  3. y contains a's and b's
• We have to show that each of these partitions leads to a contradiction, i.e., that there is no possible way to divide w into xyz so that the PL holds.

Case 1: y contains only a's
• Then x contains 0 to 2 a's, y contains 1 to 3 a's, and z contains 0 to 2 a's concatenated onto the rest of the string b^n c^{n-3}, such that there are exactly 3 a's. So the partition is
  x = a^k, y = a^j, z = a^{3-k-j} b^n c^{n-3}
  where k ≥ 0, j > 0, and k+j ≤ 3.
• It should be true that xy^i z ∈ L for all i ≥ 0.
• xy^2 z = (x)(y)(y)(z) = (a^k)(a^j)(a^j)(a^{3-k-j} b^n c^{n-3}) = a^{3+j} b^n c^{n-3} ∉ L; since j > 0, there are too many a's.
• CONTRADICTION!


Case 2: y contains only b's
• Then x contains 3 a's followed by 0 or more b's, y contains 1 to n-3 b's, and z contains 3 to n-1 b's concatenated onto the rest of the string c^{n-3}. So the partition is
  x = a^3 b^k, y = b^j, z = b^{n-k-j} c^{n-3}
  where k ≥ 0, j > 0, and k+j ≤ n-3.
• It should be true that xy^i z ∈ L for all i ≥ 0.
• xy^0 z = a^3 b^{n-j} c^{n-3} ∉ L; since j > 0, there are too few b's.
• CONTRADICTION!

Case 3: y contains a's and b's
• Then x contains 0 to 2 a's, y contains 1 to 3 a's and 1 to n-3 b's, and z contains 3 to n-1 b's concatenated onto the rest of the string c^{n-3}. So the partition is
  x = a^{3-k}, y = a^k b^j, z = b^{n-j} c^{n-3}
  where 3 ≥ k > 0 and n-3 ≥ j > 0.
• It should be true that xy^i z ∈ L for all i ≥ 0.
• xy^2 z = a^3 b^j a^k b^n c^{n-3} ∉ L; since j, k > 0, there are b's before a's.
• CONTRADICTION!
• No partition of w satisfies the PL.
• L is not regular!

Closure Properties
• Certain operations on regular languages are guaranteed to produce regular languages.
  – Example: the union of regular languages is regular; start with RE's, and apply + to get an RE for the union.

Substitution
• Take a regular language L over some alphabet Σ.
• For each a in Σ, let L_a be a regular language.
• Let s be the substitution defined by s(a) = L_a for each a.
  – Extend s to strings by s(a_1 a_2 … a_n) = s(a_1)s(a_2) … s(a_n); i.e., concatenate the languages L_{a_1} L_{a_2} … L_{a_n}.
  – Extend s to languages by s(M) = ∪_{w in M} s(w).
• Then s(L) is regular.

Proof That Substitution of Regular Languages Into a Regular Language is Regular
• Let R be a regular expression for language L.
• Let R_a be a regular expression for language s(a) = L_a, for all symbols a in Σ.
• Construct a RE E for s(L) by starting with R and replacing each symbol a by the RE R_a.
• Proof that L(E) = s(L) is an induction on the height of (the expression tree for) RE R.
• Basis: R is a single symbol, a. Then E = R_a, L = {a}, and s(L) = s({a}) = L(R_a).
• Cases where R is ε or Φ are easy.
• Induction: There are three cases, depending on whether R = R_1 + R_2, R = R_1 R_2, or R = R_1*.
• We'll do only R = R_1 R_2.
• L = L_1 L_2, where L_1 = L(R_1) and L_2 = L(R_2).
• Let E_1 be R_1, with each a replaced by R_a; same for E_2. Then E = E_1 E_2.
• By the IH, L(E_1) = s(L_1) and L(E_2) = s(L_2).
• Thus, L(E) = L(E_1)L(E_2) = s(L_1)s(L_2) = s(L).
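The construction in the proof (replace each symbol a of R by R_a) can be sketched on a small instance. Everything concrete here is an assumption made for illustration: the expression R = 0*1, the finite languages s(0) = {a, bb} and s(1) = {c}, and the helper names. The textual replacement only works because the symbols 0 and 1 are not regex operators, and the comparison is bounded by length, so it is a sanity check and not a proof.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

def substitute_regex(r, regex_for):
    """Build E from R: replace each alphabet symbol a by (?:R_a), keep operators."""
    return "".join(f"(?:{regex_for[c]})" if c in regex_for else c for c in r)

def substitute_words(words, language_for):
    """s(M) for a finite set M of words, where every s(a) is a finite set."""
    image = set()
    for w in words:
        # s(a_1 ... a_n) = s(a_1) s(a_2) ... s(a_n): pick one string per symbol.
        for parts in product(*(language_for[a] for a in w)):
            image.add("".join(parts))
    return image

R = "0*1"                                    # hypothetical example expression
language_for = {"0": {"a", "bb"}, "1": {"c"}}  # s(0) and s(1) as finite sets
regex_for = {"0": "a|bb", "1": "c"}            # R_0 and R_1
E = substitute_regex(R, regex_for)
print(E)  # (?:a|bb)*(?:c)

bound = 5
words_of_L = [w for w in strings_up_to("01", bound) if re.fullmatch(R, w)]
image = {v for v in substitute_words(words_of_L, language_for) if len(v) <= bound}
matched = {v for v in strings_up_to("abc", bound) if re.fullmatch(E, v)}
print(image == matched)  # True
```

Up to the length bound, the strings obtained by applying s word by word coincide with the strings matched by E, which is what L(E) = s(L) predicts.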


Applications of the Substitution Theorem
• If L_1 and L_2 are regular, so is L_1 L_2.
  – Let s(a) = L_1 and s(b) = L_2. Substitute into the regular language {ab}.
• So is L_1 ∪ L_2.
  – Substitute into {a, b}.
• Ditto L*.
  – Let s(a) = L and substitute into L(a*).
• Closure under homomorphism = substitution of one string for each symbol.
  – Special case of a substitution.
• Example: Homomorphism
  – Let L = L(0*1*), and let h be a homomorphism defined by h(0) = aa and h(1) = ε.
    • Then h(L) = L((aa)*), all strings with an even number of a's.
• Closure Under Inverse Homomorphism
  – h^{-1}(L) = {w | h(w) is in L}
  – See the argument in the book. Briefly:
    • Given homomorphism h and regular language L, start with a DFA A for L.
    • Construct DFA B for h^{-1}(L) by having B go from state q to state p on input a if δ̂_A(q, h(a)) = p.

Closure Under Reversal
• The reverse of a string w = a_1 a_2 … a_n is a_n … a_2 a_1.
  – Denoted w^R.
  – Note ε^R = ε.
• The reverse of a language L is the set containing the reverse of each string in L.
• If L is regular, so is L^R.
  – Proof: use RE's, recursive reversal as in the book.

Decision Properties of Regular Languages
• Given a representation (e.g., RE, FA) of a regular language L, what can we tell about L?
  – Since there are algorithms to convert between any two representations, we can choose the representation that makes the test easiest.
• Membership
  – Is string w in regular language L?
    • Choose DFA representation for L.
    • Simulate the DFA on input w.
• Emptiness
  – Is L = Φ?
    • Use DFA representation.
    • Use a graph-reachability algorithm to test if at least one accepting state is reachable from the start state.

Finiteness
• Is L a finite language?
  – Note every finite language is regular (why?), but a regular language is not necessarily finite.
• DFA method:
  – Given a DFA for L, eliminate all states that are not reachable from the start state and all states that do not reach an accepting state.
  – Test if there are any cycles in the remaining DFA; if so, L is infinite; if not, L is finite.
• RE method:
  – As a first approximation, we can look for a * in the RE and say its language is infinite if there is one, finite if not. However, there are exceptions, e.g., 0ε*1 or 0*Φ. Thus:
  1. Find sub-expressions equivalent to Φ by:
     • (Basis) Φ is; ε and a are not.
     • (Induction) E+F is iff both E and F are; EF is if either E or F is; E* never is.
  2. Eliminate sub-expressions equivalent to Φ by:
     • Replace E + F or F + E by F whenever E is and F isn't.
     • Replace E* by ε whenever E is equivalent to Φ.


  3. Now, find sub-expressions that are equivalent to ε by:
     – (Basis) ε is; a isn't.
     – (Induction) E+F is iff both E and F are; ditto EF; E* is iff E is.
  4. Now, we can tell if L(R) is infinite by looking for a sub-expression E* such that E is not equivalent to ε.

Example
• Consider (0 + 1Φ)* + 1Φ*.
  – Step 1: Φ (twice) and 1Φ are sub-expressions equivalent to Φ.
  – Step 2: Remove these sub-expressions.
    • Replace (0 + 1Φ) by 0.
    • Replace Φ* by ε.
    • 0* + 1ε remains.
  – Step 3: the only sub-expression equivalent to ε is ε itself; we are left with 0* + 1.
  – Step 4: since 0, which is not equivalent to ε, is starred, the language is infinite.

Minimization of States
• Real goal is testing equivalence of (representations of) two regular languages.
• Interesting fact: DFA's have unique (up to state names) minimum-state equivalents.

Distinguishable States
• Key idea: find states p and q that are distinguishable because there is some input w that takes exactly one of p and q to an accepting state.
• Basis: any non-accepting state is distinguishable from any accepting state (w = ε).
• Induction: p and q are distinguishable if there is some input symbol a such that δ(p, a) is distinguishable from δ(q, a).
  – All other pairs of states are indistinguishable, and can be merged into one state.

Example (very simple)
Consider:
[Figure: a DFA with states p, q, and r over the inputs 0 and 1]
Can we distinguish q from r?
• No string beginning with 0 works.
  • Both states go to p, and therefore any string of the form 0x takes q and r to the same state.
• No string beginning with 1 works.
  • Starting in either q or r, as long as we have input 1, we are in one of the accepting states. When a 0 is read, we go to the same state (p) and then, regardless of input, the same states forever after.
• p is distinguishable from q and r by the basis.
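The basis and induction for distinguishable states amount to a table-filling loop. The sketch below is one way to write it; the function name is invented, and the three-state DFA is a guess, since the original figure did not survive. It assumes q and r are accepting, both go to p on 0 and to r on 1, and p is a non-accepting state that loops on both inputs, which is consistent with the description of the example but is not taken from it.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, accepting):
    """Table-filling: return the set of distinguishable pairs, each as a frozenset."""
    # Basis: an accepting and a non-accepting state are distinguished by w = ε.
    marked = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in accepting) != (q in accepting)
    }
    # Induction: mark {p, q} if some symbol leads to an already marked pair.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successors = frozenset((delta[p, a], delta[q, a]))
                if len(successors) == 2 and successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Hypothetical DFA consistent with the slide's description (assumed, not the original figure).
states = ["p", "q", "r"]
accepting = {"q", "r"}
delta = {
    ("p", "0"): "p", ("p", "1"): "p",
    ("q", "0"): "p", ("q", "1"): "r",
    ("r", "0"): "p", ("r", "1"): "r",
}
marked = distinguishable_pairs(states, "01", delta, accepting)
print(sorted(tuple(sorted(pair)) for pair in marked))  # [('p', 'q'), ('p', 'r')]
print(frozenset(("q", "r")) in marked)                 # False: q and r can be merged
```

Pairs that are never marked are indistinguishable, so under these assumptions q and r end up unmarked, matching the conclusion of the example.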


Constructing the Minimum-State DFA
• For each group of indistinguishable states, pick a "representative."
  – Note a group can be large, e.g., q_1, q_2, …, q_k, if all pairs are indistinguishable.
  – Indistinguishability is transitive (why?), so indistinguishability partitions states.
• If p is a representative, and δ(p, a) = q, then in the minimum-state DFA the transition from p on a is to the representative of q's group (or to q itself, if q is either alone in a group or a representative).
• Start state is the representative of the original start state's group.
• Accepting states are representatives of groups of accepting states.
  – Notice we could not have a "mixed" (accepting + non-accepting) group (why?).
• Delete any state that is not reachable from the start state.

Example
• For the DFA given earlier, p is in a group by itself; {q, r} is the other group.
[Figure: the original three-state DFA with states p, q, r, and the minimized two-state DFA with states p and q]

Why Above Minimization Can't Be Beaten
• Suppose we have a DFA A, and we minimize it to construct a DFA M. But there is another DFA N that accepts the same language as A and M, yet has fewer states than M.
• Proof by contradiction that this can't happen:
  – Run the state-distinguishability process on the states of M and N together.
  – Start states of M and N are indistinguishable because L(M) = L(N).
  – If {p, q} are indistinguishable, then their successors on any one input symbol are also indistinguishable.
  – Thus, since neither M nor N could have an inaccessible state, every state of M is indistinguishable from at least one state of N.
  – Since N has fewer states than M, there are two states of M that are indistinguishable from the same state of N, and therefore indistinguishable from each other.
  – But M was designed so that all its states are distinguishable from each other.
  – We have a contradiction, so the assumption that N exists is wrong, and M in fact has as few states as any DFA equivalent to A.
  – In fact (stronger), there must be a 1-1 correspondence between the states of any other minimum-state N and the DFA M, showing that the minimum-state DFA for A is unique up to renaming of the states.

End of Part I
• So ends the first part of the course: Regular Languages and Finite Automata.
• Exam I (Oct. 8, 7pm in this room) covers material up to here.
• Coming next week: Context-Free Languages and Push-Down Automata
(stay tuned for more exciting adventures…)
