Projectivity

◮ Most theoretical frameworks do not assume projectivity.
◮ Non-projective structures are needed to account for
  ◮ long-distance dependencies,
  ◮ free word order.

[Figure: non-projective dependency graph for "What did economic news have little effect on ?" with arcs sbj, obj, vg, pc, p, and three nmod arcs]

Introduction

Where we're going

◮ Dependency parsing:
  ◮ Input: sentence x = w1, ..., wn
  ◮ Output: dependency graph G
◮ Focus today:
  ◮ Computational methods for dependency parsing
  ◮ Resources for dependency parsing (parsers, treebanks)

Parsing Methods

◮ Three main traditions:
  ◮ Deterministic parsing (specifically: transition-based parsing)
  ◮ Dynamic programming (specifically: graph-based parsing)
  ◮ Constraint satisfaction (not covered today)
◮ Special issue:
  ◮ Non-projective dependency parsing

Deterministic Parsing

◮ Basic idea:
  ◮ Derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions
  ◮ Sometimes combined with backtracking or repair
◮ Motivation:
  ◮ Psycholinguistic modeling
  ◮ Efficiency
  ◮ Simplicity

Covington's Incremental Algorithm

◮ Deterministic incremental parsing in O(n²) time by trying to link each new word to each preceding one [Covington(2001)]:

  PARSE(x = (w1, ..., wn))
  1  for i = 1 up to n
  2    for j = i - 1 down to 1
  3      LINK(wi, wj)

  LINK(wi, wj) =
    E ← E ∪ {(i, j)}   if wj is a dependent of wi
    E ← E ∪ {(j, i)}   if wi is a dependent of wj
    E ← E              otherwise

◮ Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation.

Transition-Based Parsing

◮ Data structures:
  ◮ Stack [..., wi]S of partially processed tokens
  ◮ Queue [wj, ...]Q of remaining input tokens
◮ Parsing actions built from atomic actions:
  ◮ Adding arcs (wi → wj, wi ← wj)
  ◮ Stack and queue operations
◮ Left-to-right parsing in O(n) time
◮ Restricted to projective dependency graphs
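Covington's LINK scheme above is easy to render as running code. The sketch below is a minimal Python version; the `is_dependent(head, dep)` test stands in for the grammar/oracle decision, which the algorithm itself leaves abstract, so it is a caller-supplied function here.

```python
def covington_parse(words, is_dependent):
    """Covington-style incremental parsing: try to link each new word
    to every preceding word, giving O(n^2) LINK operations overall.

    `is_dependent(h, d)` is a hypothetical oracle returning True if
    word d should be a dependent of word h (indices into `words`).
    """
    edges = set()  # E: set of (head_index, dependent_index) pairs
    for i in range(1, len(words)):          # each new word w_i ...
        for j in range(i - 1, -1, -1):      # ... scans leftward over w_j
            if is_dependent(i, j):          # w_j is a dependent of w_i
                edges.add((i, j))
            elif is_dependent(j, i):        # w_i is a dependent of w_j
                edges.add((j, i))
            # otherwise: E is left unchanged
    return edges
```

Conditions such as Single-Head or Projectivity would be extra guards inside the loop before an edge is added; they are omitted here to keep the sketch close to the basic algorithm.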
Yamada's Algorithm

◮ Three parsing actions:

  Shift:  [...]S [wi, ...]Q  ⇒  [..., wi]S [...]Q
  Left:   [..., wi, wj]S [...]Q  ⇒  [..., wi]S [...]Q   (wi → wj)
  Right:  [..., wi, wj]S [...]Q  ⇒  [..., wj]S [...]Q   (wi ← wj)

◮ Algorithm variants:
  ◮ Originally developed for Japanese (strictly head-final) with only the Shift and Right actions [Kudo and Matsumoto(2002)]
  ◮ Adapted for English (with mixed headedness) by adding the Left action [Yamada and Matsumoto(2003)]
  ◮ Multiple passes over the input give time complexity O(n²)

Nivre's Algorithm

◮ Four parsing actions:

  Shift:        [...]S [wi, ...]Q  ⇒  [..., wi]S [...]Q
  Reduce:       [..., wi]S [...]Q  ⇒  [...]S [...]Q                 (only if ∃wk : wk → wi)
  Left-Arc(r):  [..., wi]S [wj, ...]Q  ⇒  [...]S [wj, ...]Q        adds wi ←r wj  (only if ¬∃wk : wk → wi)
  Right-Arc(r): [..., wi]S [wj, ...]Q  ⇒  [..., wi, wj]S [...]Q    adds wi →r wj  (only if ¬∃wk : wk → wj)

◮ Characteristics:
  ◮ Integrated labeled dependency parsing
  ◮ Arc-eager processing of right-dependents
  ◮ Single pass over the input gives time complexity O(n)

Example

[Figure: step-by-step arc-eager parse of "Economic news had little effect on financial markets ." with arc labels pred, sbj, obj, nmod, pc, p]

Shift, Left-Arc(nmod), Shift, Left-Arc(sbj), Right-Arc(pred), Shift, Left-Arc(nmod), Right-Arc(obj), Right-Arc(nmod), Shift, Left-Arc(nmod), Right-Arc(pc), Reduce, Reduce, Reduce, Reduce, Right-Arc(p)

Classifier-Based Parsing

◮ Data-driven deterministic parsing:
  ◮ Deterministic parsing requires an oracle.
  ◮ An oracle can be approximated by a classifier.
  ◮ A classifier can be trained using treebank data.
◮ Learning methods:
  ◮ Support vector machines (SVM) [Kudo and Matsumoto(2002), Yamada and Matsumoto(2003), Isozaki et al.(2004), Cheng et al.(2004), Nivre et al.(2006)]
  ◮ Memory-based learning (MBL) [Nivre et al.(2004), Nivre and Scholz(2004)]
  ◮ Maximum entropy modeling (MaxEnt) [Cheng et al.(2005)]

Feature Models

◮ Learning problem:
  ◮ Approximate a function from parser states (represented by feature vectors) to parser actions, given a training set of gold-standard derivations.
◮ Typical features:
  ◮ Tokens:
    ◮ Target tokens
    ◮ Linear context (neighbors in S and Q)
    ◮ Structural context (parents, children, siblings in G)
  ◮ Attributes:
    ◮ Word form (and lemma)
    ◮ Part-of-speech (and morpho-syntactic features)
    ◮ Dependency type (if labeled)
    ◮ Distance (between target tokens)

Comparing Algorithms

◮ Parsing algorithm:
  ◮ Nivre's algorithm gives higher accuracy than Yamada's algorithm for parsing the Chinese CKIP treebank [Cheng et al.(2004)].
◮ Learning algorithm:
  ◮ SVM gives higher accuracy than MaxEnt for parsing the Chinese CKIP treebank [Cheng et al.(2004)].
  ◮ SVM gives higher accuracy than MBL with lexicalized feature models for three languages [Hall et al.(2006)]:
    ◮ Chinese (Penn)
    ◮ English (Penn)
    ◮ Swedish (Talbanken)
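The arc-eager transition system described above (Nivre's algorithm) can be sketched compactly in Python. This is an unlabeled-core illustration with a label field added, not a full parser: it simply applies a given transition sequence to a configuration, with the same preconditions as the slide (Reduce and Left-Arc check whether the stack top already has a head). The action names ("SH", "RE", "LA", "RA") and tuple format are my own conventions for the sketch.

```python
from collections import deque

def arc_eager(tokens, transitions):
    """Apply an arc-eager transition sequence to a sentence.

    `tokens` is the list of input words; index 0 is an artificial root.
    `transitions` is a list of tuples: ("SH",), ("RE",), ("LA", label),
    ("RA", label). Returns the set of (head, dependent, label) arcs.
    """
    stack = [0]                                   # artificial root node
    queue = deque(range(1, len(tokens) + 1))      # remaining input tokens
    arcs = set()
    head = {}                                     # dependent -> head
    for action, *label in transitions:
        if action == "SH":                        # Shift
            stack.append(queue.popleft())
        elif action == "RE":                      # Reduce: top must have a head
            assert stack[-1] in head
            stack.pop()
        elif action == "LA":                      # Left-Arc(r): wi <-r wj
            assert stack[-1] not in head          # top must not have a head
            dep = stack.pop()
            arcs.add((queue[0], dep, label[0]))
            head[dep] = queue[0]
        elif action == "RA":                      # Right-Arc(r): wi ->r wj
            dep = queue.popleft()
            arcs.add((stack[-1], dep, label[0]))
            head[dep] = stack[-1]
            stack.append(dep)                     # arc-eager: dep goes on stack
    return arcs
```

Replaying the transition sequence from the example slide on "Economic news had little effect on financial markets ." yields one head per token in a single left-to-right pass, which is the O(n) behavior the slide claims.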
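To make the feature models above concrete, here is a small sketch of how a parser configuration might be mapped to a feature vector for a classifier. The feature templates (stack-top and queue-front form/POS, dependency type, distance) follow the categories listed in the slide, but the exact template set and string encoding are illustrative choices, not any particular parser's model.

```python
def extract_features(stack, queue, forms, tags, deprel):
    """Map a (stack, queue) configuration to a list of feature strings.

    `stack`/`queue` hold token indices; `forms` and `tags` give the word
    form and part-of-speech per index; `deprel` maps an index to the label
    of its incoming arc, if one has been assigned so far.
    """
    feats = []
    if stack:
        s0 = stack[-1]                                  # target token on stack
        feats.append(f"s0.form={forms[s0]}")            # word form
        feats.append(f"s0.pos={tags[s0]}")              # part-of-speech
        feats.append(f"s0.deprel={deprel.get(s0, 'NONE')}")  # dependency type
    if queue:
        q0 = queue[0]                                   # target token in queue
        feats.append(f"q0.form={forms[q0]}")
        feats.append(f"q0.pos={tags[q0]}")
        if len(queue) > 1:                              # linear context in Q
            feats.append(f"q1.pos={tags[queue[1]]}")
    if stack and queue:
        feats.append(f"dist={queue[0] - stack[-1]}")    # distance feature
    return feats
```

A trained classifier (SVM, MBL, MaxEnt) would then map such feature vectors to parser actions, approximating the oracle as described in the classifier-based parsing slide.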