13.11.2014 Views

Introduction to Computational Linguistics

18 Context Free Grammars

Definition 22 Let A be as usual an alphabet. Let N be a set of symbols disjoint from A, and let Σ ⊆ N. A context free rule over N as the set of nonterminal symbols and A as the set of terminal symbols is a pair ⟨X, α⃗⟩, where X ∈ N and α⃗ is a string over N ∪ A. We write X → α⃗ for the rule. A context free grammar is a quadruple ⟨A, N, Σ, R⟩, where R is a finite set of context free rules. The set Σ is called the set of start symbols.
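To make the definition concrete, here is a minimal Python sketch of the quadruple and its side conditions. The class name `CFG`, the field names, the method `is_well_formed`, and the tiny grammar `g` are all illustrative assumptions and do not come from the text; a right hand side is modelled as a tuple of symbols.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A rule <X, alpha>: a nonterminal X and a string alpha over N ∪ A,
# modelled here as a tuple of symbols (an assumption of this sketch).
Rule = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class CFG:
    terminals: FrozenSet[str]     # A
    nonterminals: FrozenSet[str]  # N
    start: FrozenSet[str]         # Σ, the set of start symbols
    rules: FrozenSet[Rule]        # R

    def is_well_formed(self) -> bool:
        """Check the side conditions stated in the definition."""
        if self.terminals & self.nonterminals:   # N must be disjoint from A
            return False
        if not self.start <= self.nonterminals:  # Σ ⊆ N
            return False
        symbols = self.terminals | self.nonterminals
        # Every left hand side is in N; every right hand side is over N ∪ A.
        return all(
            lhs in self.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in self.rules
        )


# Hypothetical example grammar, used only to exercise the check.
g = CFG(
    terminals=frozenset({"0", "1"}),
    nonterminals=frozenset({"S"}),
    start=frozenset({"S"}),
    rules=frozenset({("S", ("0",)), ("S", ("S", "1"))}),
)
```

Since R is a Python `frozenset`, finiteness of the rule set comes for free; the check only has to cover disjointness, Σ ⊆ N, and the shape of each rule.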

Often it is assumed that a context free grammar has just one start symbol, but in actual practice this is mostly not observed. Notice that the right hand side of a context free rule may contain nonterminals and terminals alike. The implicit convention we use is as follows. A string of terminals is denoted by a Roman letter with a vector arrow; a mixed string containing both terminals and nonterminals is denoted by a Greek letter with a vector arrow. (The vector is generally used to denote strings.) If the difference is not important, we shall use Roman letters.

The following are examples of context free rules. The conventions that apply here are as follows. Symbols in print are terminal symbols; they are denoted using typewriter font. Nonterminal symbols are produced by enclosing an arbitrary string in brackets: < · · · >.

(174)
< digit > → 0
< digit > → 1
< number > → < digit >
< number > → < number >< digit >
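As an illustration of what the four rules in (174) describe, the following Python sketch encodes them as a dictionary and tests whether a symbol can be rewritten into a given string of terminals. The names `RULES`, `splits` and `derives` are assumptions of this sketch, not notation from the text. The method is a plain exhaustive search over ways of cutting the string into nonempty pieces, so it assumes that no rule has an empty right hand side and that no nonterminal can be rewritten back into itself through single-symbol rules alone.

```python
from functools import lru_cache

# The four rules of (174); each right hand side is a tuple of symbols.
RULES = {
    "<digit>": [("0",), ("1",)],
    "<number>": [("<digit>",), ("<number>", "<digit>")],
}


def splits(s, k):
    """Yield all ways to cut s into k nonempty consecutive pieces."""
    if k == 1:
        if s:
            yield (s,)
        return
    for i in range(1, len(s) - k + 2):
        for rest in splits(s[i:], k - 1):
            yield (s[:i],) + rest


@lru_cache(maxsize=None)
def derives(symbol, s):
    """Return True if `symbol` can be rewritten into the terminal string s."""
    if symbol not in RULES:  # a terminal only matches itself
        return symbol == s
    # Try every rule for the symbol and every way of distributing s
    # over the symbols of its right hand side.
    return any(
        all(derives(x, piece) for x, piece in zip(rhs, pieces))
        for rhs in RULES[symbol]
        for pieces in splits(s, len(rhs))
    )
```

With these rules, `derives("<number>", "10")` holds, while `derives("<number>", "12")` does not, because (174) only lists 0 and 1 as digits.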

Some conventions allow the rules to be displayed in a more compact form. The vertical slash ‘|’ is used to merge two rules with the same left hand symbol. So, when X → α⃗ and X → γ⃗ are rules, you can write X → α⃗ | γ⃗. Notice that one speaks of one rule in the latter case, but technically we have two rules now. This allows us to write as follows.

(175)
< digit > → 0 | 1 | 2 | . . . | 9

And this stands for ten rules. Concatenation is not written. It is understood that symbols that follow each other are concatenated the way they are written down.
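The two conventions, the vertical slash and unwritten concatenation, can be sketched as a small Python function that expands one written line into the individual rules it stands for. The function name `expand`, the ASCII arrow `->` standing in for →, and the tokenisation (a bracketed name counts as one nonterminal, any other non-space character as one terminal) are assumptions of this sketch. As written, it cannot express a terminal that is itself `|`, `<` or `>`.

```python
import re


def expand(line):
    """Expand 'X -> a | b | ...' into a list of (lhs, rhs) rules."""
    lhs, rhs = line.split("->")
    lhs = lhs.strip()
    rules = []
    for alternative in rhs.split("|"):
        # Concatenation is implicit: the symbols of one alternative,
        # in the order written, form the right hand side.
        symbols = tuple(re.findall(r"<[^<>]*>|[^\s<>]", alternative))
        rules.append((lhs, symbols))
    return rules


# The ten digit rules of (175), spelled out without the ellipsis.
digit_rules = expand("<digit> -> " + " | ".join("0123456789"))

# Two alternatives, the second a concatenation of two nonterminals.
number_rules = expand("<number> -> <digit> | <number><digit>")
```

Here `digit_rules` contains ten separate rules, one per digit, and `number_rules` contains two, the second with the two-symbol right hand side `("<number>", "<digit>")`.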
