Introduction to Computational Linguistics
Introduction to Computational Linguistics
Introduction to Computational Linguistics
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
18. Context Free Grammars 69<br />
symbols is very useful. There are several basic types of sentences (assertions,<br />
questions, commands, and so on). These represent different sets of strings. It real<br />
grammars (GPSG, HPSG, for example), one does not start from a single symbol<br />
but rather from different symbols, one for each class of saturated utterance. This<br />
is useful also for other reasons. It is noted that in answers <strong>to</strong> questions almost<br />
any constituent can function as a complete utterance. To account for that, one<br />
would need <strong>to</strong> enter a rule of the kind above for each constituent. But there is<br />
a sense in which these constituents are not proper sentences, they are just parts<br />
of constituents. Rather than having <strong>to</strong> write new rules <strong>to</strong> encompass the use of<br />
these constituents, we can just place the burden on the start set. So, in certain<br />
circumstances what shifts is the set of start symbols, the rule set however remains<br />
constant.<br />
A derivation is a full specification of the way in which a string has been generated.<br />
This may be given as follows. It is a sequence of strings, where each<br />
subsequent string is obtained by applying a rule <strong>to</strong> some occurrence of a nonterminal<br />
symbol. We need <strong>to</strong> specify in each step which occurrence of a nonterminal<br />
is targeted. We can do so by underlining it. For example, here is a grammar.<br />
(180) A → BA | AA, B → AB | BB, A → a, B → b | bc | cb<br />
Here is a derivation:<br />
(181) A, BA, BBA, BBBA, BBBa, BcbBa, bcbBa, bcbbca<br />
Notice that without underlining the symbol which is being replaced some information<br />
concerning the derivation is lost. In the second step we could have applied<br />
the rule A → BA instead, yielding the same result. Thus, the following also is a<br />
derivation:<br />
(182) A, BA, BBA, BBBA, BBBa, BcbBa, bcbBa, bcbbca<br />
(I leave it <strong>to</strong> you <strong>to</strong> figure out how one can identify the rule that has been applied.)<br />
This type of information is essential, however. To see this, let us talk about the<br />
second approach <strong>to</strong> CFGs, the analytic one. Inductively, on the basis of a derivation,<br />
we assign constituents <strong>to</strong> the strings. First, however, we need <strong>to</strong> fix some<br />
notation.<br />
Definition 23 Let ⃗x and ⃗y be strings. An occurrence of ⃗y in ⃗x is a pair 〈⃗u,⃗v〉 of<br />
strings such that ⃗x = ⃗u⃗y⃗v. We call ⃗u the left context of ⃗y in ⃗x and ⃗v the right<br />
context.