13.11.2014 Views

Introduction to Computational Linguistics


19. Parsing and Recognition

Z → UXVX, we eliminate the first rule (X → ε); furthermore, we add all rules
obtained by replacing any number of occurrences of X on the right by the empty
string. Thus, we add the rules Z → UVX, Z → UXV and Z → UV. (Since other
rules may have X on the left, it is not advisable to replace all occurrences of X
uniformly.) We do this for all such rules. The resulting grammar generates the
same set of strings, with the same set of constituents, excluding occurrences of
the empty string. Now we are still left with unary rules, for example, the rule
X → Y. Let ρ be a rule having Y on the left. We add the rule obtained by replacing
Y on the left by X. For example, let Y → UVX be a rule. Then we add the rule
X → UVX. We do this for all rules of the grammar that have Y on the left. Then we
remove X → Y. These two steps remove the rules that do not expand the length of a
string. We can express this formally as follows. If ρ = X → α⃗ is a rule, we call
|α⃗| − 1 the productivity of ρ, and denote it by p(ρ). Clearly, p(ρ) ≥ −1. If
p(ρ) = −1 then α⃗ = ε, and if p(ρ) = 0 then we have a rule of the form X → Y. In
all other cases, p(ρ) > 0 and we call ρ productive.
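The two elimination steps and the productivity measure can be sketched in Python as follows. This is only an illustration under stated assumptions, not the text's own implementation: a rule is represented as a pair of a left-hand symbol and a tuple of right-hand symbols, and the function names `eliminate_epsilon`, `eliminate_unary` and `productivity` are hypothetical. Right-hand sides that would become empty are skipped, so the sketch is faithful only when no rule consists solely of occurrences of X.

```python
from itertools import product


def eliminate_epsilon(rules, x):
    """Drop x -> ε and add every rule obtained by deleting any number
    of occurrences of x from a right-hand side."""
    result = set()
    for lhs, rhs in rules:
        if lhs == x and rhs == ():
            continue  # the rule X -> ε itself is eliminated
        positions = [i for i, symbol in enumerate(rhs) if symbol == x]
        # Each occurrence of x is independently kept (True) or deleted (False).
        for keep in product([True, False], repeat=len(positions)):
            dropped = {p for p, k in zip(positions, keep) if not k}
            new_rhs = tuple(s for i, s in enumerate(rhs) if i not in dropped)
            if new_rhs:  # assumption: never introduce a fresh empty rule
                result.add((lhs, new_rhs))
    return result


def eliminate_unary(rules, x, y):
    """Copy every rule y -> α to x -> α, then remove x -> y."""
    result = set(rules)
    result.discard((x, (y,)))
    for lhs, rhs in rules:
        if lhs == y and rhs != (y,):
            result.add((x, rhs))
    return result


def productivity(rule):
    """p(ρ) = |α| - 1 for a rule ρ = X -> α."""
    _, rhs = rule
    return len(rhs) - 1


# The rules used as examples in the text.
rules = {
    ("Z", ("U", "X", "V", "X")),
    ("X", ()),
    ("X", ("Y",)),
    ("Y", ("U", "V", "X")),
}
step1 = eliminate_epsilon(rules, "X")
step2 = eliminate_unary(step1, "X", "Y")
```

Note that the first step also shortens Y → UVX to Y → UV, since the deletion is applied to every rule with X on the right; the second step then copies both Y-rules to X.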

Now, if η⃗ is obtained in one step from γ⃗ by use of ρ, then |η⃗| = |γ⃗| + p(ρ).
Hence |η⃗| > |γ⃗| if p(ρ) > 0, that is, if ρ is productive. So, if the grammar only
contains productive rules, each step in a derivation increases the length of the
string, unless it replaces a nonterminal by a terminal. It follows that a string of
length n has derivations of length 2n − 1 at most: at most n − 1 steps can lengthen
the string from length 1 to length n, and at most n steps replace a nonterminal by a
terminal. Here is now a very simple-minded strategy to find out whether a string is
in the language of the grammar (and to find a derivation if it is): let x⃗ be given,
of length n. Enumerate all derivations of length < 2n and look at the last member of
each derivation. If x⃗ is found at least once, it is in the language; otherwise it is
not. It is not hard to see that this algorithm is exponential. We shall see later
that there are far better algorithms, which are polynomial of order 3. Before we do
so, let us note, however, that there are strings which have exponentially many
different constituent structures, so that the task of enumerating the derivations is
exponential. However, it is still the case that we can represent them in a very
concise way, and this again takes only polynomial time.
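The simple-minded strategy can be sketched in Python as below. This is a minimal illustration under assumptions that are not in the text: rules are pairs of a left-hand symbol and a tuple of right-hand symbols, the grammar contains only productive rules plus rules rewriting a nonterminal to a single terminal, and the function name `derivable` and the example grammar for aⁿbⁿ are hypothetical. For brevity the sketch collects the strings reachable after each step rather than listing every derivation separately, and it discards strings longer than n, which is safe because no rule shortens a string.

```python
def derivable(rules, start, target):
    """Brute-force recognition: explore all derivations with fewer than
    2n steps and report whether the target string is ever reached."""
    target = tuple(target)
    n = len(target)
    frontier = {(start,)}
    for _ in range(2 * n - 1):  # derivations have at most 2n - 1 steps
        next_frontier = set()
        for form in frontier:
            for i, symbol in enumerate(form):
                for lhs, rhs in rules:
                    if lhs != symbol:
                        continue
                    # Rewrite the occurrence of lhs at position i.
                    new_form = form[:i] + rhs + form[i + 1:]
                    if new_form == target:
                        return True
                    if len(new_form) <= n:  # longer strings cannot shrink back
                        next_frontier.add(new_form)
        frontier = next_frontier
    return False


# Hypothetical grammar for the language a^n b^n (n >= 1).
anbn = {
    ("S", ("A", "B")),
    ("S", ("A", "T")),
    ("T", ("S", "B")),
    ("A", ("a",)),
    ("B", ("b",)),
}
in_language = derivable(anbn, "S", "aabb")
not_in_language = derivable(anbn, "S", "aab")
```

Even with duplicate strings merged, the number of strings explored grows quickly with n, which is why this serves only as a baseline for the cubic algorithms mentioned above.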

The idea behind the algorithm is surprisingly simple. Start with the string x⃗. Scan
the string for a substring y⃗ which occurs as the right-hand side of a rule
ρ = X → y⃗. Then write down all occurrences C = 〈u⃗, v⃗〉 (which we now represent by
pairs of positions, see above) of y⃗ and declare them constituents of category X.
There is an
