
Proceedings of the LFG 02 Conference National Technical - CSLI ...


LFG 02 – Kuhn: Corpus-based Learning in Stochastic OT-LFG

input) for the learning instances. This still leaves open which of the syntactic analyses for the observed string is the right target winner.
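As a rough illustration of this narrowing step (cf. (17)), the Python sketch below filters a toy candidate set by the observed string and, optionally, by a known input meaning. The `Candidate` class, the `interpretive_choices` function and the candidate set are hypothetical; they do not reproduce the paper's implementation or the exact configuration of (17). The point is only that more than one analysis can survive the filter, so the target winner is not yet determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate: an input meaning paired with an analysis and its output string."""
    name: str
    meaning: str
    string: str


def interpretive_choices(candidates, observed_string, known_meaning=None):
    """Keep the candidates compatible with the observed string (and meaning, if known).

    Every surviving candidate is a possible target winner; the filter itself
    does not decide among them.
    """
    return [
        c for c in candidates
        if c.string == observed_string
        and (known_meaning is None or c.meaning == known_meaning)
    ]


# Toy candidate set (hypothetical, not the actual configuration of (17)).
CANDIDATES = [
    Candidate("cand1", "meaning1", "string1"),
    Candidate("cand2", "meaning1", "string2"),
    Candidate("cand3", "meaning1", "string2"),
    Candidate("cand4", "meaning2", "string2"),
    Candidate("cand5", "meaning2", "string1"),
]

print([c.name for c in interpretive_choices(CANDIDATES, "string2")])
# ['cand2', 'cand3', 'cand4']
print([c.name for c in interpretive_choices(CANDIDATES, "string2", "meaning1")])
# ['cand2', 'cand3']  (still two analyses: the target winner remains open)
```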

(17) Narrowed down set of choices in interpretive optimization

[Figure: candidate analyses cand1 to cand5, linking the meanings meaning1, meaning2 to the strings string1, string2]

5.2 Experimental set-up

The training data were extracted from the TIGER treebank, a syntactically annotated newspaper corpus of German (cf. Brants et al. (2002), Zinsmeister et al. (2002)). The treebank includes full categorial and functional annotations, but this information was of course only partially exploited in constructing the training data (only as far as justified by the non-syntactic information available to the human learner).

The data were split up into single clauses, i.e., either matrix clauses or embedded clauses, each presented as a separate training instance. Since the focus was on the learning of clausal syntax, embedded argument and modifier phrases (NPs, PPs, etc.) were prebracketed, and their grammatical functions were provided. No syntactic information was provided about verbal constituents, i.e., verbs and auxiliaries were left as separate, unconnected units.

For example, sentence (18) would give rise to the two training instances in (19): one for the matrix clause, including a single “chunk” for the embedded complement clause, and one for the internal structure of the complement clause.

(18) Der Vorstand der Firma hat gefordert, daß der Geschäftsführer entlassen wird.
     the board of the company has demanded that the managing director laid off is

(19) a. [Der Vorstand der Firma] hat gefordert, [daß . . . ]
     b. daß [der Geschäftsführer] entlassen wird
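The clause-splitting scheme can be sketched in Python as follows, assuming a hypothetical nested representation (`Phrase`, `Clause`) rather than TIGER's actual export format. Prebracketed phrases keep their grammatical function, verbs and auxiliaries stay bare tokens, and each embedded clause becomes a single chunk in its host clause before being emitted as an instance of its own. The function labels (ROOT, SUBJ, COMP) are illustrative assumptions, and punctuation is dropped.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """A prebracketed argument/modifier phrase with its grammatical function (hypothetical format)."""
    function: str
    words: list


@dataclass
class Clause:
    """A clause; items are bare words (str), Phrase objects or embedded Clause objects."""
    function: str
    items: list


def split_clauses(clause):
    """Return one training instance per clause, matrix clause first."""
    instance, embedded = [], []
    for item in clause.items:
        if isinstance(item, str):
            # Verbs, auxiliaries and complementizers stay separate, unconnected units.
            instance.append(item)
        elif isinstance(item, Phrase):
            # Prebracketed phrase, annotated with its grammatical function.
            instance.append(f"[{' '.join(item.words)}]:{item.function}")
        else:
            # Embedded clause: a single chunk here; its internal structure
            # becomes a separate training instance.
            first = item.items[0] if isinstance(item.items[0], str) else "..."
            instance.append(f"[{first} ...]:{item.function}")
            embedded.append(item)
    instances = [instance]
    for sub in embedded:
        instances.extend(split_clauses(sub))
    return instances


# Sentence (18), encoded by hand in the hypothetical format above.
sentence_18 = Clause("ROOT", [
    Phrase("SUBJ", ["Der", "Vorstand", "der", "Firma"]),
    "hat", "gefordert",
    Clause("COMP", ["daß", Phrase("SUBJ", ["der", "Geschäftsführer"]), "entlassen", "wird"]),
])

for inst in split_clauses(sentence_18):
    print(" ".join(inst))
# [Der Vorstand der Firma]:SUBJ hat gefordert [daß ...]:COMP
# daß [der Geschäftsführer]:SUBJ entlassen wird
```

The two printed instances correspond to (19a) and (19b); deeper embeddings would simply produce further instances in depth-first order.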

The candidate analyses. The set of candidates was generated by a highly underrestricted LFG grammar, approximating the OT hypothesis that all universally possible structures should be included in this set. Reflecting inviolable principles, an extended X-bar scheme is encoded in the LFG grammar; the scheme is very general, however: all positions are optional, functional projections (IP, CP) can be freely filled


. . .
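To see why making every position optional yields a large candidate set, the Python sketch below enumerates the order-preserving ways of assigning the units of a clause to the positions of a skeleton in which any position may stay empty. The position inventory `POSITIONS` and the one-unit-per-position simplification are hypothetical; the sketch ignores category restrictions and the f-structure side that the actual LFG grammar encodes, so it only illustrates how quickly the combinations multiply.

```python
from itertools import combinations
from math import comb

# Hypothetical left-to-right position inventory of an extended X-bar skeleton;
# the grammar's actual positions are not listed in this excerpt.
POSITIONS = ["Spec-CP", "C", "Spec-IP", "I", "Spec-VP", "V", "Compl-V"]


def placements(units, positions=POSITIONS):
    """Yield every order-preserving assignment of units to distinct positions.

    Because every position is optional, any subset of positions of the right
    size, kept in linear order, can host the units; the others stay empty.
    """
    for chosen in combinations(positions, len(units)):
        yield list(zip(chosen, units))


units_19b = ["daß", "[der Geschäftsführer]", "entlassen", "wird"]
skeletons = list(placements(units_19b))
print(len(skeletons), comb(len(POSITIONS), len(units_19b)))  # 35 35
```

With 7 positions and 4 units there are C(7, 4) = 35 skeletons, even before categorial or functional alternatives are considered.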
