Introduction to Computational Linguistics
18 Context Free Grammars
Definition 22 Let A be, as usual, an alphabet. Let N be a set of symbols disjoint
from A, and let Σ ⊆ N. A context free rule over N as the set of nonterminal symbols
and A as the set of terminal symbols is a pair 〈X, ⃗α〉, where X ∈ N and ⃗α is a string
over N ∪ A. We write X → ⃗α for the rule. A context free grammar is a quadruple
〈A, N, Σ, R〉, where R is a finite set of context free rules. The set Σ is called the set
of start symbols.
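As an illustration only, the quadruple 〈A, N, Σ, R〉 of Definition 22 can be written down as a small data structure. The following is a minimal sketch; the names Rule, Grammar and is_well_formed are hypothetical and not part of the text, and a string over N ∪ A is assumed to be stored as a tuple with one symbol per entry.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Rule(NamedTuple):
    lhs: str              # X, a nonterminal
    rhs: Tuple[str, ...]  # the string over N ∪ A, one symbol per entry


class Grammar(NamedTuple):
    terminals: FrozenSet[str]     # A
    nonterminals: FrozenSet[str]  # N
    start: FrozenSet[str]         # Σ
    rules: FrozenSet[Rule]        # R


def is_well_formed(g: Grammar) -> bool:
    """Check the side conditions stated in Definition 22."""
    symbols = g.terminals | g.nonterminals
    return (
        g.terminals.isdisjoint(g.nonterminals)  # N is disjoint from A
        and g.start <= g.nonterminals           # Σ ⊆ N
        and all(
            r.lhs in g.nonterminals and all(s in symbols for s in r.rhs)
            for r in g.rules
        )
    )


# A tiny example grammar with two start symbols (chosen for illustration).
g = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S", "T"}),
    start=frozenset({"S", "T"}),
    rules=frozenset({Rule("S", ("a", "T")), Rule("T", ("b",))}),
)
```

Since Python sets are finite, the requirement that R be finite holds automatically in this representation.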
Often it is assumed that a context free grammar has just one start symbol, but in
actual practice this is mostly not observed. Notice that the right hand side of a
context free rule may contain nonterminals and terminals alike. The implicit
convention we use is as follows. A string of terminals is denoted by a Roman letter
with a vector arrow; a mixed string containing both terminals and nonterminals is
denoted by a Greek letter with a vector arrow. (The vector is generally used to
denote strings.) If the difference is not important, we shall use Roman letters.
The following are examples of context free rules. The conventions that apply<br />
here are as follows. Symbols in print are terminal symbols; they are denoted using<br />
typewriter font. Nonterminal symbols are produced by enclosing an arbitrary<br />
string in angle brackets: < · · · >.
(174)   <digit> → 0
        <digit> → 1
        <number> → <digit>
        <number> → <number><digit>
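To see what these four rules describe, one can repeatedly replace a nonterminal by the right hand side of one of its rules until only terminals remain. The following is a rough sketch of that idea, not a definition from the text: the name derivable_strings is hypothetical, <number> is assumed to be the start symbol, and the length bound relies on the fact that no rule in (174) has an empty right hand side.

```python
from collections import deque

# The four rules of (174), each as a pair (left hand side, right hand side).
RULES = [
    ("<digit>", ("0",)),
    ("<digit>", ("1",)),
    ("<number>", ("<digit>",)),
    ("<number>", ("<number>", "<digit>")),
]
NONTERMINALS = {"<digit>", "<number>"}


def derivable_strings(start, max_len):
    """Collect all terminal strings of length <= max_len reachable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    results = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if there is none.
        idx = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if idx is None:
            results.add("".join(form))
            continue
        for lhs, rhs in RULES:
            if lhs == form[idx]:
                new_form = form[:idx] + rhs + form[idx + 1:]
                # No rule shrinks a string, so longer forms can be dropped.
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return results


short_numbers = derivable_strings("<number>", 2)
```

With the bound 2, the result consists of the strings 0, 1, 00, 01, 10 and 11, that is, all nonempty binary strings of length at most two.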
A few conventions allow rules to be displayed in a more compact form. The
vertical slash ‘|’ is used to merge two rules with the same left hand
symbol. So, when X → ⃗α and X → ⃗γ are rules, you can write X → ⃗α | ⃗γ. Notice
that one speaks of one rule in the latter case, but technically we still have
two rules. This allows us to write as follows.
(175)   <digit> → 0 | 1 | 2 | . . . | 9
And this stands for ten rules. Concatenation is not written. It is understood that
symbols that follow each other are concatenated in the order in which they are written down.
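The ‘|’ shorthand can be undone mechanically. The following is a small sketch under the assumption that neither ‘→’ nor ‘|’ occurs as a terminal symbol; the function name expand_alternatives is hypothetical, and each right hand side is kept as a plain string rather than split into symbols.

```python
def expand_alternatives(line):
    """Split 'X → a | b | ...' into one (left, right) pair per alternative."""
    lhs, rhs = line.split("→")
    return [(lhs.strip(), alt.strip()) for alt in rhs.split("|")]


# The digit rule with all ten alternatives written out in full.
digit_line = "<digit> → " + " | ".join(str(d) for d in range(10))
digit_rules = expand_alternatives(digit_line)
```

Here digit_rules contains ten pairs, one for each of the ten rules that the compact notation stands for.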