13.11.2014 Views

Introduction to Computational Linguistics

Introduction to Computational Linguistics

Introduction to Computational Linguistics

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

18. Context Free Grammars 69<br />

symbols is very useful. There are several basic types of sentences (assertions,<br />

questions, commands, and so on). These represent different sets of strings. It real<br />

grammars (GPSG, HPSG, for example), one does not start from a single symbol<br />

but rather from different symbols, one for each class of saturated utterance. This<br />

is useful also for other reasons. It is noted that in answers <strong>to</strong> questions almost<br />

any constituent can function as a complete utterance. To account for that, one<br />

would need <strong>to</strong> enter a rule of the kind above for each constituent. But there is<br />

a sense in which these constituents are not proper sentences, they are just parts<br />

of constituents. Rather than having <strong>to</strong> write new rules <strong>to</strong> encompass the use of<br />

these constituents, we can just place the burden on the start set. So, in certain<br />

circumstances what shifts is the set of start symbols, the rule set however remains<br />

constant.<br />

A derivation is a full specification of the way in which a string has been generated.<br />

This may be given as follows. It is a sequence of strings, where each<br />

subsequent string is obtained by applying a rule <strong>to</strong> some occurrence of a nonterminal<br />

symbol. We need <strong>to</strong> specify in each step which occurrence of a nonterminal<br />

is targeted. We can do so by underlining it. For example, here is a grammar.<br />

(180) A → BA | AA, B → AB | BB, A → a, B → b | bc | cb<br />

Here is a derivation:<br />

(181) A, BA, BBA, BBBA, BBBa, BcbBa, bcbBa, bcbbca<br />

Notice that without underlining the symbol which is being replaced some information<br />

concerning the derivation is lost. In the second step we could have applied<br />

the rule A → BA instead, yielding the same result. Thus, the following also is a<br />

derivation:<br />

(182) A, BA, BBA, BBBA, BBBa, BcbBa, bcbBa, bcbbca<br />

(I leave it <strong>to</strong> you <strong>to</strong> figure out how one can identify the rule that has been applied.)<br />

This type of information is essential, however. To see this, let us talk about the<br />

second approach <strong>to</strong> CFGs, the analytic one. Inductively, on the basis of a derivation,<br />

we assign constituents <strong>to</strong> the strings. First, however, we need <strong>to</strong> fix some<br />

notation.<br />

Definition 23 Let ⃗x and ⃗y be strings. An occurrence of ⃗y in ⃗x is a pair 〈⃗u,⃗v〉 of<br />

strings such that ⃗x = ⃗u⃗y⃗v. We call ⃗u the left context of ⃗y in ⃗x and ⃗v the right<br />

context.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!