Introduction to Computational Linguistics

The trees used in linguistic analysis are often ordered. Here the ordering is represented implicitly in the string. Let $C = \langle \vec\gamma_1, \vec\gamma_2 \rangle$ be an occurrence of $\vec\sigma$ and $D = \langle \vec\eta_1, \vec\eta_2 \rangle$ an occurrence of $\vec\tau$. We write $C \sqsubset D$ and say that $C$ precedes $D$ if $\vec\gamma_1\vec\sigma$ is a prefix of $\vec\eta_1$ (the prefix need not be proper). If one underlines $C$ and $D$, this definition amounts to saying that the underline of $C$ ends before the underline of $D$ starts.

(190) abccddx

Here $C = \langle a, cddx \rangle$ and $D = \langle abcc, x \rangle$, so $\vec\sigma = bc$ and $\vec\tau = dd$. Since $\vec\gamma_1\vec\sigma = abc$ is a prefix of $\vec\eta_1 = abcc$, we have $C \sqsubset D$.

19 Parsing and Recognition

Given a grammar $G$ and a string $\vec x$, we ask the following questions:

• (Recognition:) Is $\vec x \in L(G)$?
• (Parsing:) What derivation(s) does $\vec x$ have?

Obviously, since the derivations carry information about the meaning associated with an expression, the recognition problem by itself is generally not of interest. Still, it is sometimes useful to solve the recognition task first and the parsing task afterwards: if the string is not in the language, there is no need to look for derivations. For context free languages, moreover, the parsing problem as stated is not quite the one we are interested in: what we really want to know is which constituent structures are associated with a given string. This vastly reduces the problem, but the remaining problem may still be very complex. Let us see how.

In general a given string can have any number of derivations, even infinitely many. Consider by way of example the grammar

(191) $A \to A \mid a$

Here the string $a$ has infinitely many derivations, since the rule $A \to A$ may be applied any number of times before $A \to a$. It can be shown that if the grammar has no unary rules and no rules of the form $X \to \varepsilon$, then a given string $\vec x$ has at most exponentially many derivations. We shall show that it is possible to eliminate these rules (this reduction is not semantically innocent!). Given a rule $X \to \varepsilon$ and a rule that contains $X$ on the right, say
$Z \to UXVX$, we eliminate the first rule ($X \to \varepsilon$); furthermore, we add all rules obtained by replacing any number of occurrences of $X$ on the right by the empty string. Thus, we add the rules $Z \to UVX$, $Z \to UXV$ and $Z \to UV$. (Since other rules may have $X$ on the left, a given occurrence of $X$ may still be expanded to a nonempty string, so it is not advisable to delete all occurrences of $X$ uniformly.) We do this for all such rules. The resulting grammar generates the same set of strings, with the same set of constituents, excluding occurrences of the empty string.

Now we are still left with unary rules, for example the rule $X \to Y$. Let $\rho$ be a rule having $Y$ on the left. We add the rule obtained by replacing $Y$ on the left by $X$. For example, let $Y \to UVX$ be a rule. Then we add the rule $X \to UVX$. We do this for all rules with $Y$ on the left. Then we remove $X \to Y$. These two steps remove the rules that do not increase the length of a string, other than those that replace a nonterminal by a terminal.

We can express this formally as follows. If $\rho = X \to \vec\alpha$ is a rule, we call $|\vec\alpha| - 1$ the productivity of $\rho$ and denote it by $p(\rho)$. Clearly, $p(\rho) \geq -1$. If $p(\rho) = -1$ then $\vec\alpha = \varepsilon$, and if $p(\rho) = 0$ then $\rho$ has the form $X \to Y$ or $X \to a$, with $a$ a terminal. In all other cases, $p(\rho) > 0$ and we call $\rho$ productive. Now, if $\vec\eta$ is obtained in one step from $\vec\gamma$ by use of $\rho$, then $|\vec\eta| = |\vec\gamma| + p(\rho)$. Hence $|\vec\eta| > |\vec\gamma|$ if $p(\rho) > 0$, that is, if $\rho$ is productive. So, if the grammar contains only productive rules and rules of the form $X \to a$, each step in a derivation increases the length of the string, unless it replaces a nonterminal by a terminal. It follows that a string of length $n$ has derivations of length at most $2n - 1$: starting from a single symbol, at most $n - 1$ steps can increase the length, and at most $n$ steps can introduce a terminal by a rule $X \to a$.

Here is now a very simple-minded strategy to find out whether a string is in the language of the grammar (and to find a derivation if it is). Let $\vec x$ of length $n$ be given. Enumerate all derivations of length $< 2n$ and look at the last member of each derivation. If $\vec x$ is the last member of at least one of them, it is in the language; otherwise it is not. It is not hard to see that this algorithm takes exponential time.
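The ε-elimination step and the simple-minded recognition strategy can be sketched in OCaml roughly as follows. This is a minimal sketch under our own assumptions: the types `symbol` and `rule` and the functions `productivity`, `variants`, `one_step` and `recognize` are hypothetical names chosen for illustration, and `recognize` assumes the grammar already contains no rules $X \to \varepsilon$ and no unary rules $X \to Y$. `variants` computes, for a single rule, the right-hand sides obtained by deleting any subset of the occurrences of a nonterminal. `recognize` explores every sentential form reachable from the start symbol in at most $2n - 1$ steps and drops forms longer than $n$, which is safe because no remaining rule shortens a string.

```ocaml
(* A minimal sketch, assuming a grammar without X -> ε rules and without
   unary rules X -> Y. All type and function names are hypothetical. *)

type symbol = T of char | N of string   (* terminal or nonterminal *)
type rule = { lhs : string; rhs : symbol list }

(* Productivity p(rho) = |alpha| - 1 of a rule X -> alpha. *)
let productivity r = List.length r.rhs - 1

(* ε-elimination for one rule: all right-hand sides obtained by deleting
   any subset of the occurrences of the nonterminal x (including none). *)
let rec variants x = function
  | [] -> [ [] ]
  | (N y as s) :: rest when y = x ->
      let vs = variants x rest in
      List.map (fun v -> s :: v) vs @ vs
  | s :: rest -> List.map (fun v -> s :: v) (variants x rest)

(* All sentential forms obtained from [form] in one step, by rewriting
   exactly one occurrence of a nonterminal with one matching rule. *)
let one_step rules form =
  let rec go prefix = function
    | [] -> []
    | (N x as s) :: rest ->
        let here =
          List.filter_map
            (fun r ->
              if r.lhs = x then Some (List.rev_append prefix (r.rhs @ rest))
              else None)
            rules
        in
        here @ go (s :: prefix) rest
    | s :: rest -> go (s :: prefix) rest
  in
  go [] form

(* Brute-force recognition: explore every sentential form reachable from
   the start symbol in at most 2n - 1 steps. Forms longer than n are
   dropped, since without ε-rules no step makes a form shorter. *)
let recognize rules start x =
  let n = String.length x in
  let target = List.init n (fun i -> T x.[i]) in
  let rec search steps frontier =
    if List.mem target frontier then true
    else if steps = 0 || frontier = [] then false
    else
      let next =
        List.concat_map (one_step rules) frontier
        |> List.filter (fun f -> List.length f <= n)
      in
      search (steps - 1) next
  in
  n > 0 && search (2 * n - 1) [ [ N start ] ]
```

For instance, `variants "X" [N "U"; N "X"; N "V"; N "X"]` returns the right-hand sides UXVX, UXV, UVX and UV, that is, the original rule $Z \to UXVX$ together with the three rules added above. With the rules $S \to aSb$ and $S \to ab$, `recognize` accepts `"aabb"` and rejects `"aab"`. Since the frontier keeps every partial derivation, its size can grow exponentially with $n$, in line with the remark that this strategy is exponential.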
We shall see later that there are far better algorithms, which are polynomial of order 3 (cubic in the length of the string). Before we turn to them, however, let us note that there are strings which have exponentially many different constituent structures, so that the task of enumerating the derivations is exponential. Nevertheless, we can represent them in a very concise way, and this again takes only polynomial time. The idea behind the algorithm is surprisingly simple. Start with the string $\vec x$. Scan the string for a substring $\vec y$ which occurs on the right-hand side of a rule $\rho = X \to \vec y$. Then write down all occurrences $C = \langle \vec u, \vec v \rangle$ of $\vec y$ (which we now represent by pairs of positions, see above) and declare them constituents of category $X$. There is an