Introduction to Computational Linguistics
19. Parsing and Recognition
number $\nu$ of nonterminals to the right, this takes $O(n^{\nu-1})$ steps. There are $O(n^2)$
substrings to treat in this way. Thus, in total we have $O(n^{\nu+1})$ many steps to compute, each of which
consumes time proportional to the size of the grammar. The best we can hope for
is to have $\nu = 2$, in which case this algorithm needs time $O(n^3)$. In fact, this can
always be achieved. Here is how. Replace the rule $X \to AbuXV$ by the following
set of rules.
(194) $X \to AbuY, \quad Y \to XV.$
Here, $Y$ is assumed not to occur in the set of nonterminals of the original grammar. We remark without
proof that the new grammar generates the same language. In general, a rule of the
form $X \to Y_1 Y_2 \dots Y_n$ is replaced by
(195) $X \to Y_1 Z_1, \quad Z_1 \to Y_2 Z_2, \quad \dots, \quad Z_{n-2} \to Y_{n-1} Y_n$
So, given this we can recognize the language in $O(n^3)$ time!
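The replacement scheme in (195) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rules are represented as pairs of a left-hand symbol and a tuple of right-hand symbols, the function name `binarize` is hypothetical, and the fresh nonterminals are named `_Z1`, `_Z2`, ... on the assumption that these names do not already occur in the grammar.

```python
from itertools import count

def binarize(rules):
    """Replace every rule with more than two right-hand symbols by binary rules.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Fresh nonterminals are named "_Z1", "_Z2", ...; this naming is an
    assumption of the sketch and must not clash with existing symbols.
    """
    fresh = count(1)
    result = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        # Peel off the leftmost symbol until only two symbols remain.
        while len(rhs) > 2:
            z = f"_Z{next(fresh)}"
            result.append((lhs, (rhs[0], z)))
            lhs, rhs = z, rhs[1:]
        result.append((lhs, rhs))
    return result

# Hypothetical example: one rule with four symbols on the right.
rules = [("X", ("A", "B", "C", "D")), ("A", ("a",))]
for lhs, rhs in binarize(rules):
    print(lhs, "->", " ".join(rhs))
```

For a rule with $n$ symbols on the right, the sketch introduces $n - 2$ new nonterminals, as in (195). Rules that already have at most two symbols on the right are passed through unchanged.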
Now, given this algorithm, can we actually find a derivation or a constituent
structure for the string in question? This can be done: start with $m(0, n)$. It contains
an $S \in \Sigma$. Pick one of them. Now choose $i$ such that $0 < i < n$ and look up the
entries $m(0, i)$ and $m(i, n - i)$. If they contain $A$ and $B$, respectively, and if $S \to AB$
is a rule, then begin the derivation as follows:
(196) $S, \ AB$
Now underline one of $A$ or $B$ and continue with them in the same way. This is
a downward search in the matrix. Each step requires linear time, since we only
have to choose a point to split up the constituent. The length of the derivation is linear in the
length of the string. Hence the overall time is quadratic! Thus, surprisingly, the
recognition consumes most of the work. Once that is done, parsing is easy.
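The downward search can be illustrated with a small Python sketch. Several things here are assumptions rather than part of the text: the entry $m(i, k)$ is read as the set of nonterminals deriving the $k$ symbols that start at position $i$ (inferred from the entries $m(0, n)$, $m(0, i)$ and $m(i, n - i)$ used above), the grammar is taken to be binary with terminals introduced only by rules of the form $X \to a$, and the function names and the sample grammar are hypothetical.

```python
def build_chart(words, unary, binary):
    """Fill m[(i, k)]: the nonterminals deriving the k symbols starting at i."""
    n = len(words)
    m = {}
    for i, w in enumerate(words):
        m[(i, 1)] = {lhs for lhs, terminal in unary if terminal == w}
    for k in range(2, n + 1):              # length of the span
        for i in range(0, n - k + 1):      # start of the span
            cell = set()
            for j in range(1, k):          # length of the left part
                for lhs, (a, b) in binary:
                    if a in m[(i, j)] and b in m[(i + j, k - j)]:
                        cell.add(lhs)
            m[(i, k)] = cell
    return m

def extract(m, words, binary, cat, i, k):
    """Return one tree for category `cat` over the span (i, k).

    Assumes `cat` is in m[(i, k)], so the search below always succeeds.
    """
    if k == 1:
        return (cat, words[i])
    for j in range(1, k):                  # linear search for a split point
        for lhs, (a, b) in binary:
            if lhs == cat and a in m[(i, j)] and b in m[(i + j, k - j)]:
                left = extract(m, words, binary, a, i, j)
                right = extract(m, words, binary, b, i + j, k - j)
                return (cat, left, right)
    raise ValueError("category not in chart cell")

# Hypothetical grammar: S -> A B | A T, T -> S B, A -> a, B -> b
unary = [("A", "a"), ("B", "b")]
binary = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B"))]
words = list("aabb")

m = build_chart(words, unary, binary)
if "S" in m[(0, len(words))]:
    tree = extract(m, words, binary, "S", 0, len(words))
    print(tree)
```

Filling the chart uses three nested loops over positions, which is where the cubic cost arises. Each call of `extract` only scans the possible split points of one constituent, consulting entries that are already in the chart.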
Notice that the grammar transformation adds new constituents. In the case
of the rule above we have added a new nonterminal $Y$ and added the rules (194).
However, it is easy to return to the original grammar: just remove all constituents
of category $Y$. It is perhaps worth examining why adding the constituents saves
time in parsing. The reason is simply that the task of identifying constituents
occurs over and over again. The fact that a sequence of two constituents has been
identified is knowledge that can save time later on when we have to find such a
sequence. But in the original grammar there is no way of remembering that we
did have it. Instead, the new grammar provides us with a constituent to handle the
sequence.
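Removing the added constituents from a finished analysis can be sketched as follows. The tree representation (a leaf is a string, an inner node is a tuple of a label followed by its daughters), the function name `remove_aux`, and the `_Z` naming of the added nonterminals are all assumptions made for this illustration.

```python
def remove_aux(tree, is_aux):
    """Splice the daughters of auxiliary nodes into their mothers.

    A tree is a string (leaf) or a tuple (label, child, child, ...).
    The root itself is never removed.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    new_children = []
    for child in children:
        child = remove_aux(child, is_aux)
        if not isinstance(child, str) and is_aux(child[0]):
            new_children.extend(child[1:])   # drop the node, keep its daughters
        else:
            new_children.append(child)
    return (label, *new_children)

# Hypothetical tree over a binarized grammar with added nonterminals _Z1, _Z2.
binary_tree = ("X", ("A", "a"),
               ("_Z1", ("B", "b"),
                ("_Z2", ("C", "c"), ("D", "d"))))

flat_tree = remove_aux(binary_tree, lambda label: label.startswith("_Z"))
print(flat_tree)
```

The result is a tree in which $X$ directly dominates all four daughters, which is the constituent structure that the original rule would have assigned.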