13.11.2014 Views

Introduction to Computational Linguistics


18. Context Free Grammars

which is of category B. In the last step we get that the entire string is of category A under this derivation. Thus, each step in a derivation adds a constituent to the analysis of the final string. The structure that we get is as follows. There is a string α⃗ and a set Γ of occurrences of substrings of α⃗ together with a map f that assigns to each member of Γ a nonterminal (that is, an element of N). We call the members of Γ constituents and f(C) the category of C, C ∈ Γ.

By comparison, the derivation (??) imposes a different constituent structure on the string. The difference is that it is not (??) that is a constituent but rather

(187) bcbbca<br />

It is, however, not true that the constituents identify the derivation uniquely. For example, the following is a derivation that gives the same constituents as (??).

(188) A, BA, BBA, BBBA, BBBa, BBbca, bBbca, bcbbca<br />

The differences between (??) and (??) are regarded as inessential, however. Basically, only the constituent structure is really important, because it may give rise to different meanings, while different derivations that yield the same structure will give the same meaning.

The constituent structure is displayed by means of trees. Recall the definition of a tree. It is a pair 〈T, <〉 of a set T of nodes and a relation < on T [...]. We say that x dominates y if x > y; and that it immediately dominates y if it dominates y but there is no z such that x dominates z and z dominates y.
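The two notions can be illustrated with a minimal Python sketch. It assumes the relation > is given in full as a set of ordered pairs (including the pairs that follow by transitivity); the node names `r`, `m`, `l` and the function name `immediately_dominates` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def immediately_dominates(x, y, nodes, greater):
    """True if x dominates y (x > y) and no z satisfies x > z and z > y."""
    if (x, y) not in greater:
        return False
    return not any((x, z) in greater and (z, y) in greater for z in nodes)


# Hypothetical three-node chain r > m > l. The relation is listed in full,
# so it also contains the transitive pair (r, l).
nodes = {"r", "m", "l"}
greater = {("r", "m"), ("m", "l"), ("r", "l")}

immediate = {
    (x, y)
    for x, y in product(nodes, repeat=2)
    if immediately_dominates(x, y, nodes, greater)
}
print(sorted(immediate))  # [('m', 'l'), ('r', 'm')]
```

In this example r dominates l, but not immediately, because m lies between them.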

Now, let us return to the constituent structure. Let C = 〈γ⃗₁, γ⃗₂〉 and D = 〈η⃗₁, η⃗₂〉 be occurrences of substrings. We say that C is dominated by D, in symbols C ≺ D, if C ≠ D and (1) η⃗₁ is a prefix of γ⃗₁ and (2) η⃗₂ is a suffix of γ⃗₂. (It may happen that η⃗₁ = γ⃗₁ or that η⃗₂ = γ⃗₂, but not both.) Visually, what the definition amounts to is that if one underlines the substring of C and the substring of D then the latter line includes everything that the former underlines. For example, let C = 〈abc, dx〉 and D = 〈a, x〉. Then C ≺ D, as they are different and (1) and (2) are satisfied. Visually this is what we get.

(189) abccddx<br />
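The example can be checked mechanically. The following Python sketch assumes that an occurrence is stored as a pair (left context, right context), so that the substring it picks out is whatever remains of the string between the two. The function names `substring_of` and `is_dominated_by` are hypothetical.

```python
def substring_of(alpha, occ):
    """Return the substring of alpha picked out by the occurrence occ."""
    left, right = occ
    if not (alpha.startswith(left) and alpha.endswith(right)
            and len(left) + len(right) <= len(alpha)):
        raise ValueError("not an occurrence in alpha")
    return alpha[len(left):len(alpha) - len(right)]


def is_dominated_by(c, d):
    """C ≺ D: C and D differ, D's left context is a prefix of C's left
    context, and D's right context is a suffix of C's right context."""
    (gamma1, gamma2), (eta1, eta2) = c, d
    return c != d and gamma1.startswith(eta1) and gamma2.endswith(eta2)


alpha = "abccddx"
C = ("abc", "dx")
D = ("a", "x")

print(substring_of(alpha, C))  # cd
print(substring_of(alpha, D))  # bccdd
print(is_dominated_by(C, D))   # True
print(is_dominated_by(D, C))   # False
```

The substring cd of C lies inside the substring bccdd of D, which is what underlining both substrings in (189) would show.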

Now, let the tree be defined as follows. The set of nodes is the set Γ. The relation < is defined by ≺.
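As a rough sketch of this construction, the following Python code takes a set Γ of occurrences, again stored as (left context, right context) pairs, and finds for each node the node that immediately dominates it. The set `gamma` below is a hypothetical constituent set for the string abccddx, made up for illustration, and `dominated_by` and `parent_of` are hypothetical names.

```python
def dominated_by(c, d):
    """C ≺ D for occurrences given as (left context, right context)."""
    (gamma1, gamma2), (eta1, eta2) = c, d
    return c != d and gamma1.startswith(eta1) and gamma2.endswith(eta2)


def parent_of(c, gamma):
    """Return the node of gamma that immediately dominates c (None for the root)."""
    above = [d for d in gamma if dominated_by(c, d)]
    for d in above:
        # d is the immediate dominator if no other node above c lies below d.
        if not any(dominated_by(z, d) for z in above):
            return d
    return None


# Hypothetical constituent set: the whole string, bccdd, and cd.
gamma = {("", ""), ("a", "x"), ("abc", "dx")}

for c in sorted(gamma):
    print(c, "->", parent_of(c, gamma))
```

Here the whole string 〈ε, ε〉 is the root, 〈a, x〉 is its daughter, and 〈abc, dx〉 is the daughter of 〈a, x〉. The sketch assumes that the constituents are properly nested, as they are when they come from a derivation.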
