
The lexicon in syntax - Computational Linguistics and Spoken ...


2008-11-17
How to write a grammar
The lexicon in syntax
Thorsten Trippel
Fakultät für Linguistik und Literaturwissenschaft
Universität Bielefeld
thorsten.trippel@uni-bielefeld.de
http://www.spectrum.uni-bielefeld.de/~ttrippel/htwg/
How to write a grammar WS 2008-2009


Agenda

● Heads and dependents
● Adjuncts and complements
● Some simple classes of grammars


Heads and dependents

• Phrase ≔ part of a clause containing a head
• Head ≔ most important word in a phrase
  • The word class of the head determines the word class of the phrase
  • Bears crucial semantic information
  • Has the same distribution as the entire phrase
  • Can (usually) not be omitted
  • Sometimes requires one or more complements
• Dependent ≔ the parts of the phrase that are not the head


Generalization

• Subject ≔ the subject of a clause is a phrase of one or more words that is headed by a noun (NP)
• Predicate ≔ normally a VP; the VP can contain dependents
• Heads influence their dependents:
  • They determine the dependents (which word classes are possible)
  • Agreement of dependents (gender, number, …)
  • Case determination


Adjuncts vs. complements

Adjuncts ≔ optional additions to a head providing extra information
Complements ≔ often obligatory phrases selected by the head


Adjunct or complement? (see Tallerman, p. 104)

• Optional vs. obligatory phrase
  • Adjuncts: always optional
  • Complements: often obligatory, esp. with prepositions and verbs; close relation with the head; selected by the head
• Limited vs. unlimited number of dependent phrases
  • Possibly unlimited number of adjuncts
  • Strictly limited number of complements
• Properties of PP dependents
  • Adjuncts: wide range of head prepositions
  • Complements: specific head prepositions
• The decision cannot be made on the basis of part of speech (POS) alone


Head-initial or head-final languages

• Typological distinction
• English: head-initial
  • In VP: NP complements after the V
  • In PP: NP after the P
  • How about NPs?
• Head-final languages:
  • Turkish
  • Japanese
  • ...


Ways of writing a grammar

• “I've been waiting for this for ages...”
• “Sure, how do you write a book?”
• “Is there more to grammar than 'It is correct to never split an infinitive'?”


What a grammar is supposed to provide:

A full set of items describing all possible correct constructs of a language. (completeness)
All other constructs should be classified as incorrect by the grammar. (soundness)
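
The two requirements can be made concrete on a finite sample. The following is only a sketch: the function `evaluate`, the recognizer `toy_accepts` and the sample sentences are assumptions made for illustration, and a sample can only expose failures, never prove completeness or soundness for a whole language.

```python
def evaluate(accepts, correct, incorrect):
    """Check a recognizer against sample sentences.

    Returns the correct sentences it misses (completeness failures)
    and the incorrect sentences it lets through (soundness failures).
    """
    missed = [s for s in correct if not accepts(s)]
    wrongly_accepted = [s for s in incorrect if accepts(s)]
    return missed, wrongly_accepted


# Hypothetical toy recognizer: accepts anything starting with "Some teachers".
def toy_accepts(sentence):
    return sentence.startswith("Some teachers")


correct = [
    "Some teachers try to get rid of more students.",
    "Language loses lots of interesting structures.",
]
incorrect = ["Some teachers students more of rid."]

missed, wrongly_accepted = evaluate(toy_accepts, correct, incorrect)
print(missed)            # correct sentences the recognizer fails to describe
print(wrongly_accepted)  # incorrect sentences the recognizer fails to reject
```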


The simplest possible grammar: lists

● No formalism
● Easy to create
● List all acceptable sentences:
  ● Language loses lots of interesting structures.
  ● Language loses lots and lots of interesting structures.
  ● Language loses lots and lots and lots of interesting structures.
  ● Some teachers try to get rid of more students.
  ● Some teachers try to get rid of more and more students.
  ● Some teachers try to get rid of more and more and more students.
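
A list grammar can be sketched as a plain set of sentences. This is a minimal illustration and not part of the original slides; the names `LIST_GRAMMAR` and `is_grammatical` are made up for the sketch.

```python
# A "list grammar": the language is exactly the set of listed sentences.
LIST_GRAMMAR = {
    "Language loses lots of interesting structures.",
    "Language loses lots and lots of interesting structures.",
    "Language loses lots and lots and lots of interesting structures.",
    "Some teachers try to get rid of more students.",
    "Some teachers try to get rid of more and more students.",
    "Some teachers try to get rid of more and more and more students.",
}


def is_grammatical(sentence: str) -> bool:
    """Accept a sentence only if it appears verbatim in the list."""
    return sentence in LIST_GRAMMAR


# A listed sentence is accepted.
print(is_grammatical("Some teachers try to get rid of more and more students."))
# One more "and more" is not in the list, so the list grammar rejects it.
print(is_grammatical(
    "Some teachers try to get rid of more and more and more and more students."
))
```

The second call shows the weakness: every further repetition needs its own list entry.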


Imagine....

● ... a language of the "teacher" kind
● ... in which "some" and "teachers" are reversed in every other sentence, i.e.:
  ● Some teachers try to get rid of more students.
  ● Teachers some try to get rid of more and more students.
  ● Some teachers try to get rid of more and more and more students.
  ● Teachers some try to get rid of more and more and more and more students.
  ● *Some teachers try to get rid of more and more and more and more students.


What we know...

● There are rules
● Rules are not expressed in the lists (!)
● Lists do not "capture a linguistically significant generalization"
● ...luckily there is no natural language where the structure depends on the number of words


Regular expressions

● Let's pretend:
  ● There are parts of speech/grammatical categories
  ● Verbs, nouns, articles ...
● Proposal: English grammar is
  ● A list of patterns:
    ● ARTICLE NOUN VERB
    ● ARTICLE NOUN VERB ARTICLE NOUN
  ● A lexicon:
    ● ARTICLE: a, the
    ● NOUN: cat, dog
    ● VERB: attacked, scratched


Possible sentences:

The cat attacked.
The cat scratched.
A cat attacked.
A cat scratched.
The dog attacked.
The dog scratched.
A dog attacked.
A dog scratched.
A dog scratched a cat.
A dog scratched the cat.
A dog scratched a dog.
A dog scratched the dog.
....

= 40 combinations (2*2*2 + 2*2*2*2*2 = 40)

Not possible to describe:
Teachers some try to get rid of more and more students.
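
The count of 40 can be checked by enumerating every way of filling the two patterns from the lexicon. A minimal sketch: the patterns and lexicon are taken from the slides, while the names `PATTERNS`, `LEXICON` and `generate` are assumptions of this example.

```python
from itertools import product

# Lexicon and patterns copied from the slides.
LEXICON = {
    "ARTICLE": ["a", "the"],
    "NOUN": ["cat", "dog"],
    "VERB": ["attacked", "scratched"],
}
PATTERNS = [
    ["ARTICLE", "NOUN", "VERB"],
    ["ARTICLE", "NOUN", "VERB", "ARTICLE", "NOUN"],
]


def generate(patterns, lexicon):
    """Return every sentence obtained by filling each pattern slot with a word."""
    sentences = []
    for pattern in patterns:
        choices = [lexicon[category] for category in pattern]
        for words in product(*choices):
            sentences.append(" ".join(words).capitalize() + ".")
    return sentences


sentences = generate(PATTERNS, LEXICON)
print(len(sentences))  # 2*2*2 + 2*2*2*2*2 = 40
print(sentences[0])
```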


Solution: Kleene star and plus

Notation
  Named after Stephen Kleene
  Notation: * or + (superscripts)

Meaning
  *: 0 or arbitrarily many (finite)
  +: 1 or arbitrarily many (finite)

Example:
  Some teachers get rid of more [and more]* students.
  Some teachers get rid of more [and more]+ students.
  ARTICLE NOUN VERB (ARTICLE NOUN)|ADJECTIVE*
  | denotes alternatives ("or")
  () means optional
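
The star and plus examples can be tried out with Python's `re` module. This is a sketch under one notational assumption: the square brackets of the slides are written as round brackets, because in Python regular expressions `(...)` groups a sequence while `[...]` denotes a character class. The names `STAR`, `PLUS` and `accepts` are made up for the example.

```python
import re

# Kleene star: "more" followed by zero or more repetitions of " and more".
STAR = re.compile(r"Some teachers get rid of more( and more)* students\.")
# Kleene plus: at least one repetition of " and more" is required.
PLUS = re.compile(r"Some teachers get rid of more( and more)+ students\.")


def accepts(pattern: re.Pattern, sentence: str) -> bool:
    """True if the whole sentence matches the pattern."""
    return pattern.fullmatch(sentence) is not None


zero = "Some teachers get rid of more students."
three = "Some teachers get rid of more and more and more and more students."

# The star allows zero repetitions, the plus does not.
print(accepts(STAR, zero), accepts(PLUS, zero))
# Both allow three repetitions.
print(accepts(STAR, three), accepts(PLUS, three))
```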


A closer look at regular grammars

Patterns making use of
  Kleene star (0 or more)
  Kleene plus (1 or more)
  parentheses (optionality)

NOUN (and NOUN)+ chase a cat.
NOUN (and NOUN)+ VERB a cat.
NOUN (and NOUN)+ VERB a NOUN (and NOUN)*.
DET ADJ NOUN (KONJ NOUN) VERB DET ADJ NOUN ((DET) KONJ ADJ NOUN)
....
(KONJ = conjunction)

Problem: how do you indicate that DET ADJ NOUN is one “unit”?
These units are called phrases or (syntactic) constituents
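
A pattern over categories can be checked by first replacing each word with its category and then matching the resulting tag string. The sketch below does this for the pattern NOUN (and NOUN)+ VERB a NOUN (and NOUN)*, reading the parentheses as grouping. The lexicon entries and the names `tag` and `matches` are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon; the words are assumptions made for this sketch.
LEXICON = {
    "dogs": "NOUN", "cats": "NOUN", "birds": "NOUN", "cat": "NOUN",
    "chase": "VERB", "watch": "VERB",
}

# NOUN (and NOUN)+ VERB a NOUN (and NOUN)*, written over a string of tags.
PATTERN = re.compile(r"NOUN( and NOUN)+ VERB a NOUN( and NOUN)*")


def tag(sentence: str) -> str:
    """Replace every word found in the lexicon by its category; keep other words."""
    words = sentence.rstrip(".").split()
    return " ".join(LEXICON.get(word.lower(), word.lower()) for word in words)


def matches(sentence: str) -> bool:
    """True if the tagged sentence fits the pattern as a whole."""
    return PATTERN.fullmatch(tag(sentence)) is not None


print(tag("Dogs and cats chase a cat."))      # NOUN and NOUN VERB a NOUN
print(matches("Dogs and cats chase a cat."))  # True
print(matches("Dogs chase a cat."))           # False: the plus needs one "and NOUN"
```

The match only says yes or no. Nothing in the flat pattern records that a stretch such as "dogs and cats" forms one unit.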


Context Free (Phrase Structure) Grammar

● Contains rules: A → φ
  ● A: a nonlexical category; φ: a regular expression over lexical and/or nonlexical categories
  ● “→” means “can consist of”
  ● The rules are called phrase structure rules
● Includes phrases
● Grammatical categories =
  ● Parts of speech ('lexical category')
  ● Types of phrase ('nonlexical category' or 'phrasal category')
● Contains a lexicon:
  ● Words


Example grammar

Rules
  S → NP VP
  NP → (D) A* N PP*
  VP → V (NP) (PP)
  PP → P NP

Lexicon
  D: the, some
  A: big, brown, old
  N: birds, fleas, dog, hunter
  V: attack, ate, watched
  P: for, beside, with

What do sentences look like?
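
One way to see what sentences look like is to let the rules generate some at random. The sketch below follows the four rules and the lexicon of the example grammar. The probabilities, the caps on A* and PP*, the `depth` limit and all function names are assumptions added so that generation always terminates.

```python
import random

LEXICON = {
    "D": ["the", "some"],
    "A": ["big", "brown", "old"],
    "N": ["birds", "fleas", "dog", "hunter"],
    "V": ["attack", "ate", "watched"],
    "P": ["for", "beside", "with"],
}


def word(category, rng):
    """Pick a random word of the given lexical category."""
    return rng.choice(LEXICON[category])


def np(rng, depth):
    """NP → (D) A* N PP*"""
    words = []
    if rng.random() < 0.5:                 # (D): optional determiner
        words.append(word("D", rng))
    for _ in range(rng.randint(0, 2)):     # A*: capped at two adjectives here
        words.append(word("A", rng))
    words.append(word("N", rng))
    if depth > 0 and rng.random() < 0.3:   # PP*: capped at one PP, limited depth
        words += pp(rng, depth - 1)
    return words


def pp(rng, depth):
    """PP → P NP"""
    return [word("P", rng)] + np(rng, depth)


def vp(rng, depth):
    """VP → V (NP) (PP)"""
    words = [word("V", rng)]
    if rng.random() < 0.5:                 # (NP): optional object
        words += np(rng, depth)
    if rng.random() < 0.5:                 # (PP): optional prepositional phrase
        words += pp(rng, depth)
    return words


def sentence(rng, depth=2):
    """S → NP VP"""
    return " ".join(np(rng, depth) + vp(rng, depth))


rng = random.Random(0)  # fixed seed so that runs are repeatable
for _ in range(5):
    print(sentence(rng))
```

Because the rules say nothing about agreement or verb valency, the generator can combine any noun with any verb form.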


CFG as a tree

[S
  [NP [D The] [A big] [A brown] [N dog]
    [PP [P with] [NP [N fleas]]]]
  [VP [V watched]
    [NP [D the] [N birds]]
    [PP [P beside] [NP [D the] [N hunter]]]]]

The big brown dog with fleas watched the birds beside the hunter
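
The tree for the example sentence can also be written as a nested data structure, which makes it possible to read off the sentence and its constituents. A minimal sketch: the tuple encoding and the names `TREE`, `leaves` and `constituents` are assumptions of this example, and the N node above "fleas" is filled in from the rule NP → (D) A* N PP*.

```python
# A tree is (label, child, ...) for phrases and (label, word) for lexical categories.
TREE = (
    "S",
    ("NP",
     ("D", "The"), ("A", "big"), ("A", "brown"), ("N", "dog"),
     ("PP", ("P", "with"), ("NP", ("N", "fleas")))),
    ("VP",
     ("V", "watched"),
     ("NP", ("D", "the"), ("N", "birds")),
     ("PP", ("P", "beside"), ("NP", ("D", "the"), ("N", "hunter")))),
)


def leaves(tree):
    """Collect the words at the leaves from left to right (the yield of the tree)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def constituents(tree, label):
    """Return the word strings of all subtrees carrying the given label."""
    found = []
    node_label, *children = tree
    if node_label == label:
        found.append(" ".join(leaves(tree)))
    for child in children:
        if not isinstance(child, str):
            found.extend(constituents(child, label))
    return found


print(" ".join(leaves(TREE)))
print(constituents(TREE, "NP"))
```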


Oh wow, we have seen some of that before...

• Branching into phrases
• Each phrase has a head
• Some have adjuncts
• Some have complements
• Some have arguments


Definition of grammar

● A grammar G = ⟨Φ, Σ, R, S⟩ consists of
  ● an alphabet Φ of nonterminal symbols (or variables)
  ● an alphabet Σ of terminal symbols, where Φ ∩ Σ = ∅
  ● a set R ⊆ Γ* × Γ* of replacement rules (mappings) ⟨α, β⟩, where Γ is the complete alphabet Φ ∪ Σ and α ∉ Σ*
  ● a start symbol S ∈ Φ
● Rules are pairs ⟨α, β⟩, usually written α → β
● Unrestricted (general) grammar (Type 0 grammar)
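
The definition translates fairly directly into a data structure. The following is a sketch, not a reference implementation: the class `Grammar`, its field names, the encoding of rules as pairs of symbol tuples and the toy grammar are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # Φ
    terminals: frozenset      # Σ
    rules: frozenset          # R, a set of pairs (α, β) of symbol tuples
    start: str                # S

    def validate(self) -> None:
        """Raise ValueError if a condition of the definition is violated."""
        if self.nonterminals & self.terminals:
            raise ValueError("Φ and Σ must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be in Φ")
        gamma = self.nonterminals | self.terminals
        for alpha, beta in self.rules:
            if not set(alpha) | set(beta) <= gamma:
                raise ValueError(f"unknown symbol in rule {alpha} → {beta}")
            # α ∉ Σ*: the left-hand side needs at least one nonterminal.
            if all(symbol in self.terminals for symbol in alpha):
                raise ValueError(f"left-hand side {alpha} has no nonterminal")


# Toy grammar for illustration (not from the slides): S → a S b | a b
toy = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=frozenset({(("S",), ("a", "S", "b")), (("S",), ("a", "b"))}),
    start="S",
)
toy.validate()
print("toy grammar is well formed")
```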


Example grammar (rep.)

Rules
  S → NP VP
  NP → (D) A* N PP*
  VP → V (NP) (PP)
  PP → P NP

Lexicon
  D: the, some
  A: big, brown, old
  N: birds, fleas, dog, hunter
  V: attack, ate, watched
  P: for, beside, with

Terminals: words in the lexicon
Non-terminals: NP, VP, PP, D, A, V, P, N...
Rules
Start symbol: S


Definition of language

● Let G = ⟨Φ, Σ, R, S⟩ be a grammar. Then
● L(G) = {w ∈ Σ* | S ⇒* w} is called the language generated by G.
● Two grammars are equivalent if they generate the same language
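
For small grammars, the strings of L(G) up to a given length can be enumerated by applying rules breadth-first from the start symbol. This is only a sketch: the function `language`, the two toy grammars and the length bound are assumptions, and the pruning step assumes that no rule shortens a string.

```python
from collections import deque


def language(rules, start, terminals, max_len):
    """Enumerate all w in L(G) with at most max_len symbols by breadth-first derivation.

    Assumes no rule shortens a string, so longer forms can be dropped safely.
    """
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            words.add(" ".join(form))      # S ⇒* w with w ∈ Σ*
            continue
        for alpha, beta in rules:
            n = len(alpha)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == alpha:
                    new = form[:i] + beta + form[i + n:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return words


TERMINALS = {"a", "b"}
# Two toy grammars (assumptions for this sketch) for the strings a^n b^n, n >= 1.
G1 = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
G2 = [(("S",), ("a", "T")), (("T",), ("S", "b")), (("T",), ("b",))]

l1 = language(G1, "S", TERMINALS, max_len=6)
l2 = language(G2, "S", TERMINALS, max_len=6)
print(sorted(l1, key=len))
print(l1 == l2)  # same strings up to the bound
```

Agreement up to a bound can reveal that two grammars differ, but it does not prove that they are equivalent.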


The Chomsky Hierarchy

● Type 0: unrestricted grammar (see also deterministic and non-deterministic Turing machines)
● Type 1: context sensitive grammars
● Type 2: context free grammars
● Type 3: regular language/regular grammars
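
The types differ in the shape of rule they allow. The slides do not spell out the restrictions, so the sketch below assumes one common textbook formulation: right-linear rules (A → a or A → a B) for Type 3, a single nonterminal on the left for Type 2, and non-contracting rules for Type 1. Rules that rewrite to the empty string are not treated specially. The names `rule_type` and `grammar_type` and the toy grammars are made up.

```python
def rule_type(alpha, beta, nonterminals):
    """Return the most restrictive Chomsky type (3, 2, 1 or 0) a rule satisfies."""
    single_nonterminal = len(alpha) == 1 and alpha[0] in nonterminals
    if single_nonterminal:
        # Right-linear: A → a  or  A → a B
        terminal_first = len(beta) >= 1 and beta[0] not in nonterminals
        if terminal_first and (
            len(beta) == 1 or (len(beta) == 2 and beta[1] in nonterminals)
        ):
            return 3
        return 2
    if len(alpha) <= len(beta):
        return 1  # non-contracting, used here as the test for context sensitivity
    return 0


def grammar_type(rules, nonterminals):
    """A grammar is only as restricted as its least restricted rule."""
    return min(rule_type(alpha, beta, nonterminals) for alpha, beta in rules)


N = {"S", "A", "B"}
regular = [(("S",), ("a", "S")), (("S",), ("b",))]
context_free = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
context_sensitive = context_free + [(("a", "B"), ("a", "b"))]

print(grammar_type(regular, N), grammar_type(context_free, N),
      grammar_type(context_sensitive, N))  # 3 2 1
```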


Homework

• Take the sentences from your last homework
  • Use the simple sentences/main clauses, relative clauses only
• List all terminal symbols!
• Determine all phrases!
• Where is your start symbol?
• Can you draw a syntax tree for them?
