13.11.2014 Views

Introduction to Computational Linguistics


18. Context Free Grammars

A context free grammar can be seen either as a statement about sentential structure or as a device to generate sentences. We begin with the second viewpoint. We write $\vec{\alpha} \Rightarrow \vec{\gamma}$ and say that $\vec{\gamma}$ is derived from $\vec{\alpha}$ in one step if there is a rule $X \to \vec{\eta}$ and a single occurrence of $X$ in $\vec{\alpha}$ such that $\vec{\gamma}$ is obtained by replacing that occurrence of $X$ in $\vec{\alpha}$ by $\vec{\eta}$:

(176) $\vec{\alpha} = \vec{\sigma}^{\frown} X^{\frown} \vec{\tau}, \quad \vec{\gamma} = \vec{\sigma}^{\frown} \vec{\eta}^{\frown} \vec{\tau}$
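The one-step relation in (176) can be sketched in Python. This is only an illustration under assumptions that are not in the text: a string is represented as a tuple of symbols, a rule $X \to \vec{\eta}$ as a pair `(X, eta)`, and the name `one_step` is hypothetical.

```python
# Assumed representation (not from the text): a string is a tuple of symbols,
# and a rule X -> eta is a pair (X, eta) where eta is a tuple of symbols.
def one_step(alpha, rules):
    """Return every gamma with alpha => gamma in one step."""
    results = set()
    for lhs, rhs in rules:
        for i, symbol in enumerate(alpha):
            if symbol == lhs:
                # alpha = sigma + (X,) + tau  and  gamma = sigma + eta + tau
                sigma, tau = alpha[:i], alpha[i + 1:]
                results.add(sigma + rhs + tau)
    return results


rules = [("<number>", ("<number>", "<digit>"))]
alpha = ("1", "<digit>", "<number>", "0")
successors = one_step(alpha, rules)
print(successors)
```

Each element of the returned set corresponds to one choice of rule and one choice of occurrence, which is why a string with several occurrences of $X$ can have several one-step successors.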

For example, if the rule is <number> → <number><digit>, then replacing the occurrence of <number> in the string 1 <digit><number> 0 will give 1 <digit><number><digit> 0. Now write $\vec{\alpha} \Rightarrow^{n+1} \vec{\gamma}$ if there is a $\vec{\rho}$ such that $\vec{\alpha} \Rightarrow^{n} \vec{\rho}$ and $\vec{\rho} \Rightarrow \vec{\gamma}$. We say that $\vec{\gamma}$ is derivable from $\vec{\alpha}$ in $n + 1$ steps. Using the above grammar we have:

(177) <number> $\Rightarrow^{4}$ <digit><digit><digit><number>
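Derivability in $n$ steps can be sketched by iterating the one-step relation. The rule set below is an assumption, since the grammar itself is not reproduced in this section: every digit is a <digit>, and a <number> is either a <digit> or a <number> followed by a <digit>. The sketch also treats zero steps as the identity, which is a choice made here because the text only states the step from $n$ to $n + 1$. The names `one_step` and `derive_n` are hypothetical.

```python
DIGITS = "0123456789"

# Assumed rule set (the section refers to "the grammar above" without listing it).
RULES = [("<digit>", (d,)) for d in DIGITS] + [
    ("<number>", ("<digit>",)),
    ("<number>", ("<number>", "<digit>")),
]


def one_step(alpha, rules):
    """All strings obtained from alpha by rewriting one occurrence of one symbol."""
    return {
        alpha[:i] + rhs + alpha[i + 1:]
        for lhs, rhs in rules
        for i, symbol in enumerate(alpha)
        if symbol == lhs
    }


def derive_n(alpha, rules, n):
    """All strings derivable from alpha in exactly n steps (n = 0 gives alpha itself)."""
    current = {alpha}
    for _ in range(n):
        # alpha =>(n+1) gamma iff alpha =>(n) rho and rho => gamma for some rho
        current = {gamma for rho in current for gamma in one_step(rho, rules)}
    return current


four_steps = derive_n(("<number>",), RULES, 4)
print(("<digit>",) * 4 in four_steps)
```

Under these assumed rules, four <digit> symbols in a row are reachable from <number> in four steps: three applications of the recursive rule followed by one application of <number> → <digit>.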

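Finally, $\vec{\alpha} \Rightarrow^{*} \vec{\gamma}$ ($\vec{\gamma}$ is derivable from $\vec{\alpha}$) if there is an $n$ such that $\vec{\alpha} \Rightarrow^{n} \vec{\gamma}$. Now, if $X \Rightarrow^{*} \vec{\gamma}$ we say that $\vec{\gamma}$ is a string of category $X$. The language generated by $G$ consists of all terminal strings that have category $S$ for some start symbol $S \in \Sigma$. Equivalently,

(178) $L(G) := \{\vec{x} \in A^{*} : \text{there is } S \in \Sigma \text{ such that } S \Rightarrow^{*} \vec{x}\}$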

You may verify that the grammar above generates all strings of digits from the symbol <number>. If you take this as the start symbol, the language will consist of all strings of digits. If, however, you take the symbol <digit> as the start symbol, then the language is just the set of all strings of length one. This is because, even though the symbol <number> is present in the nonterminals and the rules, there is no way to generate it by applying the rules in succession from <digit>. If, on the other hand, $\Sigma$ contains both symbols, then you again get the set of all strings of digits, since a digit is also a number. (A fourth possibility is $\Sigma = \emptyset$, in which case the language is empty.)
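The effect of the choice of start symbols can be checked on a bounded fragment of the language. The following is a sketch under stated assumptions: the rule set is the assumed digit grammar (each digit is a <digit>, and a <number> is a <digit> or a <number> followed by a <digit>), strings are tuples of symbols, and `language` is a hypothetical helper that only collects terminal strings up to a given length.

```python
DIGITS = "0123456789"
NONTERMINALS = {"<digit>", "<number>"}

# Assumed rule set; the section does not list the grammar explicitly.
RULES = [("<digit>", (d,)) for d in DIGITS] + [
    ("<number>", ("<digit>",)),
    ("<number>", ("<number>", "<digit>")),
]


def language(rules, start_symbols, nonterminals, max_len):
    """Terminal strings of length <= max_len derivable from some start symbol.

    Pruning by length is sound here only because no assumed rule shortens a string.
    """
    seen, result = set(), set()
    frontier = [(s,) for s in start_symbols]
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        if all(symbol not in nonterminals for symbol in form):
            result.add("".join(form))
            continue
        for i, symbol in enumerate(form):
            for lhs, rhs in rules:
                if symbol == lhs:
                    frontier.append(form[:i] + rhs + form[i + 1:])
    return result


from_number = language(RULES, {"<number>"}, NONTERMINALS, 2)
from_digit = language(RULES, {"<digit>"}, NONTERMINALS, 2)
from_both = language(RULES, {"<number>", "<digit>"}, NONTERMINALS, 2)
from_none = language(RULES, set(), NONTERMINALS, 2)
print(len(from_number), len(from_digit), len(from_both), len(from_none))
```

With a length bound of two, the four choices of $\Sigma$ give the digit strings of length one or two, the ten single digits, the same set as for <number> alone, and the empty set, matching the four cases described above.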

There is a trick to reduce the set of start symbols to just one. Introduce a new symbol $S^{*}$ together with the rules

(179) $S^{*} \to S$

for every $S \in \Sigma$, and take $S^{*}$ as the only start symbol. This is a different grammar but it derives the same strings.
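The construction in (179) is small enough to sketch and test on a bounded fragment. As before, the digit grammar and the tuple representation are assumptions made for illustration, and the names `with_single_start` and `language` are hypothetical.

```python
DIGITS = "0123456789"
NONTERMINALS = {"<digit>", "<number>"}

# Assumed rule set; the section does not list the grammar explicitly.
RULES = [("<digit>", (d,)) for d in DIGITS] + [
    ("<number>", ("<digit>",)),
    ("<number>", ("<number>", "<digit>")),
]


def with_single_start(rules, start_symbols, new_start="S*"):
    """Add the rule new_start -> S for every S in start_symbols, as in (179)."""
    extra = [(new_start, (s,)) for s in sorted(start_symbols)]
    return rules + extra, new_start


def language(rules, start_symbols, nonterminals, max_len):
    """Terminal strings of length <= max_len derivable from some start symbol."""
    seen, result = set(), set()
    frontier = [(s,) for s in start_symbols]
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue  # safe to prune: no assumed rule shortens a string
        seen.add(form)
        if all(symbol not in nonterminals for symbol in form):
            result.add("".join(form))
            continue
        for i, symbol in enumerate(form):
            for lhs, rhs in rules:
                if symbol == lhs:
                    frontier.append(form[:i] + rhs + form[i + 1:])
    return result


starts = {"<digit>", "<number>"}
new_rules, new_start = with_single_start(RULES, starts)
old_language = language(RULES, starts, NONTERMINALS, 2)
new_language = language(new_rules, {new_start}, NONTERMINALS | {new_start}, 2)
print(old_language == new_language)
```

The new grammar needs exactly one extra derivation step per string, namely the initial rewrite of the new symbol into one of the old start symbols.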

Notice that actual linguistics is different. In natural language having a set of start
