Dynamic Programming Algorithms in Semiring and Hypergraph Frameworks

(University of Pennsylvania) 

 

Fall 2007

Guest Lectures: Outline 

• I plan to teach 3 topics on Algorithms in NLP/MT
  • Dynamic Programming: An Advanced Survey
    • Semiring and Hypergraph Frameworks
    • Viterbi and Dijkstra/Knuth Algorithms
  • k-best Algorithms
    • k-best parsing and MT decoding
    • other k-best problems and algorithms
  • Perceptron-like Algorithms

Liang Huang 


Dynamic Programming 


• Dynamic Programming is everywhere in NLP
  • CKY Algorithm for Context-free Parsing
  • Viterbi Algorithm for Hidden Markov Models
  • Forward-Backward and Inside-Outside Algorithms
  • A* Parsing

• Also everywhere in Machine Translation
  • Decoding: Phrase-based, Syntax-based, etc.
  • Alignment: IBM models, HMM model, etc.
• Focus on Optimization Problems


Two Dimensional Survey

                              search space
traversing order       graphs        hypergraphs
topological            Viterbi       Generalized Viterbi
best-first             Dijkstra      Knuth


Graphs in NLP 


Motivations for Semirings

• in a weighted graph, we need two operators:
  • extension (multiplicative) and summary (additive)
  • the weight of a path is the product of edge weights
  • the weight of a vertex is the summary of path weights

[Figure: a path π_1 made of edges e_1, e_2, e_3, and several paths π_i ending in vertex t]

$$w(\pi_1) = \bigotimes_{e_i \in \pi_1} w(e_i) = w(e_1) \otimes w(e_2) \otimes w(e_3)$$

$$d(t) = \bigoplus_{\pi_i} w(\pi_i) = w(\pi_1) \oplus w(\pi_2) \oplus \cdots$$


Definitions

A monoid is a triple (A, ⊗, 1) where

1. ⊗ is a closed associative binary operator on the set A,
2. 1 is the identity element for ⊗, i.e., for all a ∈ A, a ⊗ 1 = 1 ⊗ a = a.

A monoid is commutative if ⊗ is commutative.

A semiring is a 5-tuple R = (A, ⊕, ⊗, 0, 1) such that

1. (A, ⊕, 0) is a commutative monoid.
2. (A, ⊗, 1) is a monoid.
3. ⊗ distributes over ⊕: for all a, b, c in A,
   (a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c),
   c ⊗ (a ⊕ b) = (c ⊗ a) ⊕ (c ⊗ b).
4. 0 is an annihilator for ⊗: for all a in A, 0 ⊗ a = a ⊗ 0 = 0.

Examples: ([0, 1], ×, 1) and ([0, 1], max, 0) are commutative monoids; ([0, 1], +, 0) is not a monoid, because [0, 1] is not closed under +.

Examples: ([0, 1], max, ×, 0, 1) is a semiring; ([0, 1], +, ×, 0, 1) is not, because [0, 1] is not closed under +.
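The definition can be spot-checked in code. The sketch below is an illustration only, not part of the original slides: the `Semiring` class, its field names, and `satisfies_axioms` are names assumed here, and the check runs over a finite sample, so it can refute the axioms but never prove them.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # ⊕ (summary)
    times: Callable[[Any, Any], Any]  # ⊗ (extension)
    zero: Any                         # identity for ⊕, annihilator for ⊗
    one: Any                          # identity for ⊗


def satisfies_axioms(K, samples):
    """Spot-check the semiring axioms on a finite list of sample elements."""
    for a, b, c in product(samples, repeat=3):
        if not (
            K.plus(a, b) == K.plus(b, a)                                  # ⊕ commutative
            and K.plus(K.plus(a, b), c) == K.plus(a, K.plus(b, c))        # ⊕ associative
            and K.times(K.times(a, b), c) == K.times(a, K.times(b, c))    # ⊗ associative
            and K.times(K.plus(a, b), c) == K.plus(K.times(a, c), K.times(b, c))
            and K.times(c, K.plus(a, b)) == K.plus(K.times(c, a), K.times(c, b))
        ):
            return False
    return all(
        K.plus(a, K.zero) == a
        and K.times(a, K.one) == a and K.times(K.one, a) == a
        and K.times(a, K.zero) == K.zero and K.times(K.zero, a) == K.zero
        for a in samples
    )


VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)
```

The sample values should be exactly representable (for instance powers of two), since floating-point rounding can otherwise break associativity of multiplication.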



Examples

Semiring   Set          ⊕     ⊗   0     1   intuition/application
Boolean    {0, 1}       ∨     ∧   0     1   logical deduction, recognition
Viterbi    [0, 1]       max   ×   0     1   prob. of the best derivation
Inside     R+ ∪ {+∞}    +     ×   0     1   prob. of a string
Real       R ∪ {+∞}     min   +   +∞    0   shortest-distance
Tropical   R+ ∪ {+∞}    min   +   +∞    0   with non-negative weights
Counting   N            +     ×   0     1   number of paths

how about ? 


Ordering 

• idempotent

A semiring (A, ⊕, ⊗, 0, 1) is idempotent if for all a in A, a ⊕ a = a. 

• comparison

(a ≤ b) ⇔ (a ⊕ b = a) defines a partial ordering.

• examples: boolean, viterbi, tropical, real, ...

({0, 1}, ∨, ∧, 0, 1)      (R+ ∪ {+∞}, min, +, +∞, 0)
([0, 1], max, ×, 0, 1)    (R ∪ {+∞}, min, +, +∞, 0)


• total-order for optimization problems 

A semiring is totally-ordered if ⊕ defines a total ordering. 

• examples: all of the above
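The ordering induced by ⊕ can be written down directly. This is a small sketch under assumed names (`leq`, `is_idempotent`, `is_totally_ordered` are not from the slides), and the two checks only inspect the samples they are given.

```python
def leq(plus, a, b):
    """a ≤ b in the ordering induced by ⊕, i.e. a ⊕ b = a (a is at least as good as b)."""
    return plus(a, b) == a


def is_idempotent(plus, samples):
    """a ⊕ a = a for every sample."""
    return all(plus(a, a) == a for a in samples)


def is_totally_ordered(plus, samples):
    """The order is total on the samples when a ⊕ b always returns one of its arguments."""
    return all(plus(a, b) in (a, b) for a in samples for b in samples)
```

With `max` as ⊕ (Viterbi), `leq(max, 0.9, 0.3)` holds: the higher probability is the better one. With `min` as ⊕ (tropical), the smaller cost is the better one. Ordinary addition, as in the Inside and Counting semirings, is not idempotent, so it induces no such ordering.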


Monotonicity 

• monotonicity

Let K = (A, ⊕, ⊗, 0, 1) be a semiring, and ≤ a partial ordering over A.
We say K is monotonic if for all a, b, c ∈ A

(a ≤ b) ⇒ (a ⊗ c ≤ b ⊗ c)
(a ≤ b) ⇒ (c ⊗ a ≤ c ⊗ b)

• optimal substructure in dynamic programming 

[Figure: two derivations of item A, with weights b ⊗ c and b' ⊗ c]
• [lemma] idempotent => monotone 
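A short derivation of the lemma, added here as a worked step. It uses only the induced ordering (a ≤ b) ⇔ (a ⊕ b = a) and distributivity. Suppose a ≤ b, that is, a ⊕ b = a. Then

```latex
\begin{align*}
(a \otimes c) \oplus (b \otimes c)
  &= (a \oplus b) \otimes c && \text{(distributivity)} \\
  &= a \otimes c            && \text{(since } a \oplus b = a\text{)}
\end{align*}
```

so a ⊗ c ≤ b ⊗ c. The case c ⊗ a ≤ c ⊗ b follows in the same way from left distributivity. Idempotence is what makes ≤ reflexive (a ⊕ a = a), so the relation is an ordering in the first place.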

• our focus, totally-ordered semirings, are monotone (free lunch!)


DP on Graphs

• optimization problems on graphs => generic shortest-path problem
• weighted directed graph G = (V, E) with a function w that assigns each edge a weight from a semiring
• compute the best weight of the target vertex t
• generic update along edge (u, v):
  d(v) ⊕= d(u) ⊗ w(u, v), i.e., d(v) ← d(v) ⊕ (d(u) ⊗ w(u, v))
• how to avoid cyclic updates?
  • only update when d(u) is fixed


Viterbi Algorithm for DAGs

1. topological sort
2. visit each vertex v in sorted order and do updates
  • for each incoming edge (u, v) in E
  • use d(u) to update d(v): d(v) ⊕= d(u) ⊗ w(u, v)
  • key observation: d(u) is fixed to optimal at this time
• time complexity: O(V + E)
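The two steps above can be sketched as follows. This is an illustration under assumptions, not the lecture's own code: the function names, the `(u, v, weight)` edge triples, and the use of Kahn's algorithm for the topological sort are choices made here. The semiring is passed in as plain functions and constants.

```python
from collections import defaultdict


def topological_order(vertices, edges):
    """Kahn's algorithm; edges is a list of (u, v, weight) triples."""
    indegree = {v: 0 for v in vertices}
    out = defaultdict(list)
    for u, v, _ in edges:
        indegree[v] += 1
        out[u].append(v)
    order, ready = [], [v for v in vertices if indegree[v] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in out[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order


def viterbi_dag(vertices, edges, source, plus, times, zero, one):
    """Visit vertices in topological order and pull weights along incoming edges."""
    incoming = defaultdict(list)
    for u, v, w in edges:
        incoming[v].append((u, w))
    d = {v: zero for v in vertices}
    d[source] = one
    for v in topological_order(vertices, edges):
        for u, w in incoming[v]:
            d[v] = plus(d[v], times(d[u], w))  # d(v) ⊕= d(u) ⊗ w(u, v)
    return d


# Viterbi semiring ([0, 1], max, ×, 0, 1) on a small diamond-shaped DAG.
EDGES = [("s", "a", 0.5), ("s", "b", 0.4), ("a", "t", 0.5), ("b", "t", 1.0)]
best = viterbi_dag(["s", "a", "b", "t"], EDGES, "s",
                   max, lambda x, y: x * y, 0.0, 1.0)
```

Each vertex and each edge is touched a constant number of times, which matches the O(V + E) bound on the slide.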


Variant 1: forward-update

1. topological sort
2. visit each vertex v in sorted order and do updates
  • for each outgoing edge (v, u) in E
  • use d(v) to update d(u): d(u) ⊕= d(v) ⊗ w(v, u)
  • key observation: d(v) is fixed to optimal at this time
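A minimal sketch of the forward-update variant, assuming the topological order is already available. The name `viterbi_forward` and the `out_edges` adjacency dictionary are assumptions made for this example.

```python
def viterbi_forward(order, out_edges, source, plus, times, zero, one):
    """order: vertices in topological order; out_edges[v]: list of (u, w(v, u))."""
    d = {v: zero for v in order}
    d[source] = one
    for v in order:                              # d(v) is already final here
        for u, w in out_edges.get(v, []):
            d[u] = plus(d[u], times(d[v], w))    # d(u) ⊕= d(v) ⊗ w(v, u)
    return d


# Tropical semiring (min, +, +∞, 0): shortest distances from s.
OUT = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 1)]}
dist = viterbi_forward(["s", "a", "b", "t"], OUT, "s",
                       min, lambda x, y: x + y, float("inf"), 0)
```

The result is the same as with backward updates; the difference is only which end of the edge drives the loop, which matters later when incoming edges cannot be enumerated.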



Examples 

• [Number of Paths in a DAG] 

  • just use the counting semiring (N, +, ×, 0, 1)
  • note: this is not an optimization problem!
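As a concrete instance, the same DAG traversal with the counting semiring counts paths. This is a sketch: `count_paths` and its arguments are names assumed here, and the topological order is supplied by the caller.

```python
def count_paths(order, edges, source, target):
    """Number of distinct source-to-target paths; order is a topological order."""
    incoming = {}
    for u, v in edges:
        incoming.setdefault(v, []).append(u)
    d = {v: 0 for v in order}   # semiring 0
    d[source] = 1               # semiring 1
    for v in order:
        for u in incoming.get(v, []):
            d[v] = d[v] + d[u] * 1   # ⊕ is +, ⊗ is ×, every edge has weight 1
    return d[target]


PATH_EDGES = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
n_paths = count_paths(["s", "a", "b", "t"], PATH_EDGES, "s", "t")
```

Here the three paths are s-a-t, s-b-t, and s-a-b-t. Since + is not idempotent, there is no "best" path involved, only a sum over all of them.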

• [Longest Path in a DAG]
  • just use the semiring (R ∪ {−∞}, max, +, −∞, 0)
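Swapping in this semiring gives the longest path with the same traversal. The sketch below assumes a caller-supplied topological order; `longest_path` is a name introduced for illustration.

```python
NEG_INF = float("-inf")


def longest_path(order, edges, source, target):
    """Longest source-to-target path weight in a DAG; edges are (u, v, weight)."""
    incoming = {}
    for u, v, w in edges:
        incoming.setdefault(v, []).append((u, w))
    d = {v: NEG_INF for v in order}   # semiring 0 is −∞
    d[source] = 0                     # semiring 1 is 0
    for v in order:
        for u, w in incoming.get(v, []):
            d[v] = max(d[v], d[u] + w)   # ⊕ is max, ⊗ is +
    return d[target]


LONG_EDGES = [("s", "a", 1), ("s", "b", 4), ("a", "b", 2), ("a", "t", 6), ("b", "t", 1)]
longest = longest_path(["s", "a", "b", "t"], LONG_EDGES, "s", "t")
```

Unreachable vertices keep the weight −∞, because −∞ + w is still −∞ and loses every `max`.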

• [Part-of-Speech Tagging with a Hidden Markov Model]

Example: Word-based Decoding 


Example: Word Alignment 


Dijkstra Algorithm 

• Dijkstra does not require acyclicity
  • instead of topological order, we use best-first order
  • but this requires superiority of the semiring

We say a semiring K is superior if for all a, b ∈ A

a ≤ a ⊗ b, b ≤ a ⊗ b.

• basically, combination always gets worse
• or, no negative edge in a graph

[Figure: an edge e leaving vertex u, labeled with d(u), w(e), and d(u) ⊗ w(e)]

• superior: ({0, 1}, ∨, ∧, 0, 1), ([0, 1], max, ×, 0, 1), (R+ ∪ {+∞}, min, +, +∞, 0)
• not superior: (R ∪ {+∞}, min, +, +∞, 0), since it allows negative weights


• keep a cut (S : V - S) where S vertices are fixed
• maintain a priority queue Q of V - S vertices
• each iteration choose the best vertex v from Q
  • move v to S, and use d(v) to forward-update others: d(u) ⊕= d(v) ⊗ w(v, u)
• time complexity:
  • O((V + E) lg V) (binary heap)
  • O(V lg V + E) (Fibonacci heap)
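The loop above can be sketched with a binary heap. Several things here are assumptions of this example rather than part of the slides: the function signature, the `key` argument (which maps a semiring weight to a number where smaller means better, since `heapq` is a min-heap), and the lazy deletion of stale queue entries in place of a decrease-key operation.

```python
import heapq


def dijkstra(out_edges, source, plus, times, zero, one, key=lambda x: x):
    """Best-first search; assumes a superior, totally-ordered semiring.

    out_edges[v] is a list of (u, w(v, u)) pairs.
    """
    d = {source: one}
    fixed = set()                               # the S side of the cut
    queue = [(key(one), source)]
    while queue:
        _, v = heapq.heappop(queue)
        if v in fixed:
            continue                            # stale entry, v was fixed earlier
        fixed.add(v)
        for u, w in out_edges.get(v, []):
            new = plus(d.get(u, zero), times(d[v], w))   # d(u) ⊕= d(v) ⊗ w(v, u)
            if u not in fixed and new != d.get(u, zero):
                d[u] = new
                heapq.heappush(queue, (key(new), u))
    return d


# Tropical semiring on a graph with cycles (s -> a -> s and a -> b -> t -> a).
CYCLIC = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("s", 1)],
          "b": [("t", 1)], "t": [("a", 1)]}
shortest = dijkstra(CYCLIC, "s", min, lambda x, y: x + y, float("inf"), 0)

# Viterbi semiring: larger probabilities are better, so the key negates them.
PROBS = {"s": [("a", 0.5), ("t", 0.125)], "a": [("t", 0.5)]}
likeliest = dijkstra(PROBS, "s", max, lambda x, y: x * y, 0.0, 1.0,
                     key=lambda p: -p)
```

With lazy deletion the heap can hold up to one entry per edge, so this version runs in O(E lg E), which is the same order as the binary-heap bound quoted above.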


Viterbi vs. Dijkstra

• structural vs. algebraic constraints
• Dijkstra only applicable to optimization problems

[Figure: diagram of the problems Viterbi and Dijkstra can each solve; labels: monotonic optimization problems, many NLP problems, forward-backward (Inside semiring), max-margin models, cyclic FSMs/grammars]

What if both fail?

• generalized Bellman-Ford (CLR, 1990; Mohri, 2002)
• or, first do strongly-connected components (SCC), which gives a DAG; use Viterbi globally on this SCC-DAG; use Bellman-Ford locally within each SCC
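For orientation, here is the textbook Bellman-Ford relaxation loop with the semiring operators passed in. It is a simplified sketch, not the generalized algorithm of Mohri (2002): it assumes an idempotent semiring and a graph with no cycle that keeps improving a weight, and the function name and argument layout are chosen here.

```python
def bellman_ford(vertices, edges, source, plus, times, zero, one):
    """Relax every edge for up to |V| - 1 rounds; edges are (u, v, weight)."""
    d = {v: zero for v in vertices}
    d[source] = one
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            new = plus(d[v], times(d[u], w))   # d(v) ⊕= d(u) ⊗ w(u, v)
            if new != d[v]:
                d[v], changed = new, True
        if not changed:
            break                              # fixed point reached early
    return d


# Real semiring (min, +): a negative edge and a cycle a -> t -> b -> a of total weight 3.
NEG_EDGES = [("s", "a", 4), ("s", "b", 2), ("b", "a", -3), ("a", "t", 1), ("t", "b", 5)]
real_dist = bellman_ford(["s", "a", "b", "t"], NEG_EDGES, "s",
                         min, lambda x, y: x + y, float("inf"), 0)
```

This graph defeats both earlier algorithms: it is cyclic, so there is no topological order, and the negative edge breaks superiority.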


What if both work?

• full Dijkstra is slower than Viterbi: O((V + E) lg V) vs. O(V + E)
• but it can finish as soon as the target vertex is popped: α (V + E) lg V vs. V + E
• Q: how to (magically) reduce α?


A* Search

• d(v): the distance from source s to v
• h(v): the distance from v to target t
• ĥ(v) must be an optimistic estimate of h(v): ĥ(v) ≤ h(v)
• now, prioritize the queue by d(v) ⊗ ĥ(v)
• Dijkstra is a special case where ĥ(v) = 1 (the identity for ⊗)
• also requires d(v) ⊗ ĥ(v) never improves (superior)
• hope: d(t) ⊗ ĥ(t) = d(t) can be popped sooner
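A sketch of A* in the tropical semiring, where ⊗ is + and smaller is better. The function `a_star`, the `h_hat` dictionary, and the returned pop count are assumptions of this example; the heuristic is additionally assumed to be consistent, so the first pop of the target is optimal.

```python
import heapq


def a_star(out_edges, source, target, h_hat):
    """Return (cost of the best path, number of vertices popped before stopping).

    h_hat[v] must never exceed the true remaining cost h(v).
    """
    d = {source: 0}
    fixed = set()
    queue = [(h_hat[source], source)]            # priority is d(v) ⊗ ĥ(v) = d(v) + ĥ(v)
    while queue:
        _, v = heapq.heappop(queue)
        if v == target:
            return d[v], len(fixed) + 1          # stop as soon as the target is popped
        if v in fixed:
            continue
        fixed.add(v)
        for u, w in out_edges.get(v, []):
            new = d[v] + w
            if u not in fixed and new < d.get(u, float("inf")):
                d[u] = new
                heapq.heappush(queue, (new + h_hat[u], u))
    return float("inf"), len(fixed)


GRAPH = {"s": [("a", 1), ("b", 1)], "a": [("t", 1)], "b": [("c", 1)], "c": [("t", 5)]}
NO_HEURISTIC = {v: 0 for v in "sabct"}           # ĥ = tropical one: plain Dijkstra
EXACT_HEURISTIC = {"s": 2, "a": 1, "b": 6, "c": 5, "t": 0}

cost_plain, popped_plain = a_star(GRAPH, "s", "t", NO_HEURISTIC)
cost_guided, popped_guided = a_star(GRAPH, "s", "t", EXACT_HEURISTIC)
```

Both runs return the same cost; the guided run never expands the dead-end branch through b and c, which is the reduction in popped vertices the slide hopes for.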


Background: CFG and Parsing 


(Directed) Hypergraphs

• a generalization of graphs
  • edge => hyperedge: several vertices to one vertex
  • e = (T(e), h(e), f_e). arity |e| = |T(e)|
• a totally-ordered weight set R
  • we borrow the ⊕ operator to be the comparison
• weight function f_e : R^|e| → R
  • generalizes the ⊗ operator in semirings

[Figure: a hyperedge with weight function f_e from tails u_1, u_2 to head v]

d(v) ⊕= f_e(d(u_1), d(u_2))

Hypergraphs and Deduction (Nederhof, 2003)

• hypergraph view: tails u_1 : a and u_2 : b; head v : f_e(a, b)
• deduction view: antecedents with weights a and b; consequent with weight f_e(a, b)

  (B, i, k) : a    (C, k, j) : b
  --------------------------------  A → B C
  (A, i, j) : a × b × Pr(A → B C)

Weight Functions and Semirings 

• graph edge e leaving u: d(u) ⊗ w(e)
• as a weight function: f_e(d(u)), with f_e(a) = a ⊗ w(e)
• hyperedge e with tails u_1, ..., u_k: f_e(a_1, ..., a_k) = a_1 ⊗ ... ⊗ a_k ⊗ w(e)
• such weight functions are semiring-composed
• also extend monotonicity and superiority to weight functions


Viterbi Algorithm for DAHs

1. topological sort
2. visit each vertex v in sorted order and do updates
  • for each incoming hyperedge e = ((u_1, ..., u_|e|), v, f_e)
  • use d(u_i)'s to update d(v): d(v) ⊕= f_e(d(u_1), ..., d(u_|e|))
  • key observation: d(u_i)'s are fixed to optimal at this time
• time complexity: O(V + E) (assuming constant arity)
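The generalized update can be sketched as below. The `Hyperedge` dataclass, the `axioms` dictionary for source vertices, and the function name are assumptions made for this illustration; the topological order (every tail before its head) is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Tuple


@dataclass(frozen=True)
class Hyperedge:
    tails: Tuple[Hashable, ...]   # T(e)
    head: Hashable                # h(e)
    f: Callable[..., float]       # weight function f_e, one argument per tail


def viterbi_dah(order, hyperedges, axioms, plus, zero):
    """order: topological order of vertices; axioms: weights of the source vertices."""
    incoming = {}
    for e in hyperedges:
        incoming.setdefault(e.head, []).append(e)
    d = {v: zero for v in order}
    d.update(axioms)
    for v in order:
        for e in incoming.get(v, []):
            # d(v) ⊕= f_e(d(u_1), ..., d(u_|e|))
            d[v] = plus(d[v], e.f(*(d[u] for u in e.tails)))
    return d


# Semiring-composed weight functions over the Viterbi semiring.
HYPEREDGES = [
    Hyperedge(("x", "y"), "z", lambda a, b: a * b * 0.5),
    Hyperedge(("x",), "z", lambda a: a * 0.5),
    Hyperedge(("z", "y"), "goal", lambda a, b: a * b * 1.0),
]
weights = viterbi_dah(["x", "y", "z", "goal"], HYPEREDGES,
                      {"x": 0.5, "y": 0.5}, max, 0.0)
```

Only ⊕ is taken from the semiring here; the role of ⊗ is played by each hyperedge's own weight function, which is the generalization introduced on the hypergraph slides.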


Example: CKY Parsing

• parsing with CFGs in Chomsky Normal Form (CNF)
• typical instance of the generalized Viterbi for DAHs
• many variants of CKY ~ various topological orderings
• time complexity: O(n^3 |P|) (cf. CKY pseudo-code)

[Figure: three CKY charts with goal item (S, 1, n), filled bottom-up, left-to-right, and right-to-left]
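A compact sketch of the bottom-up variant with the Viterbi semiring. The dictionary encodings of the grammar (`lexical`, `binary`) and the toy grammar are assumptions for illustration, and spans are 0-based and half-open here, so the goal item (S, 1, n) of the slide corresponds to `(start, 0, n)`.

```python
def cky_viterbi(words, lexical, binary, start="S"):
    """Probability of the best parse under a PCFG in Chomsky Normal Form.

    lexical[(A, word)] = Pr(A -> word); binary[(A, B, C)] = Pr(A -> B C).
    """
    n = len(words)
    best = {}                                     # best[(A, i, j)] for the span words[i:j]
    for i, word in enumerate(words):
        for (A, w), p in lexical.items():
            if w == word:
                best[A, i, i + 1] = max(best.get((A, i, i + 1), 0.0), p)
    for width in range(2, n + 1):                 # bottom-up: shorter spans first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # split point
                for (A, B, C), p in binary.items():
                    # hyperedge ((B, i, k), (C, k, j)) -> (A, i, j) with f_e(a, b) = a × b × p
                    cand = best.get((B, i, k), 0.0) * best.get((C, k, j), 0.0) * p
                    if cand > best.get((A, i, j), 0.0):
                        best[A, i, j] = cand
    return best.get((start, 0, n), 0.0)


LEXICAL = {("NP", "they"): 0.5, ("NP", "fish"): 0.25, ("V", "eat"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
p_best = cky_viterbi(["they", "eat", "fish"], LEXICAL, BINARY)
```

The three nested span loops contribute n^3 and the innermost loop over productions contributes |P|, which gives the O(n^3 |P|) bound.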

Forward Variant for DAHs

1. topological sort
2. visit each vertex v in sorted order and do updates
  • for each outgoing hyperedge e = ((u_1, ..., u_|e|), h(e), f_e) with v = u_i
  • if d(u_i)'s have all been fixed to optimal
  • use d(u_i)'s to update d(h(e))
• Q: how to avoid repeated checking?
  • maintain a counter r[e] for each e: how many tails are yet to be fixed?
  • fire this hyperedge only if r[e] = 0
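The counter idea can be sketched as follows. The tuple encoding `(tails, head, f)` of a hyperedge, the `axioms` dictionary, and the function name are assumptions of this example; `remaining` plays the role of r[e].

```python
def forward_dah(order, hyperedges, axioms, plus, zero):
    """order: topological order; hyperedges: list of (tails, head, f) triples."""
    remaining = [len(set(tails)) for tails, _, _ in hyperedges]   # r[e]
    outgoing = {}
    for idx, (tails, _, _) in enumerate(hyperedges):
        for u in set(tails):
            outgoing.setdefault(u, []).append(idx)
    d = {v: zero for v in order}
    d.update(axioms)
    for v in order:                                # d(v) is final when v is visited
        for idx in outgoing.get(v, []):
            remaining[idx] -= 1                    # one more tail of e is fixed
            if remaining[idx] == 0:                # all tails fixed: fire e once
                tails, head, f = hyperedges[idx]
                d[head] = plus(d[head], f(*(d[u] for u in tails)))
    return d


FORWARD_EDGES = [
    (("x", "y"), "z", lambda a, b: a * b * 0.5),
    (("x",), "z", lambda a: a * 0.5),
    (("z", "y"), "goal", lambda a, b: a * b * 1.0),
]
fwd = forward_dah(["x", "y", "z", "goal"], FORWARD_EDGES,
                  {"x": 0.5, "y": 0.5}, max, 0.0)
```

Each hyperedge is decremented once per distinct tail and fired exactly once, so no hyperedge is ever re-checked.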

Example: Treebank Parsers

• State-of-the-art statistical parsers (Collins, 1999; Charniak, 2000)
  • no fixed grammar (every production is possible)
• can't do backward updates
  • don't know how to decompose a big item
• forward update from vertex (X, i, j)
  • check all vertices like (Y, j, k) or (Y, k, i) in the chart (fixed)
  • try to combine them to form a bigger item (Z, i, k) or (Z, k, j)


Knuth (1977) Algorithm 




[Figure: the cut (S : V - S) as in Dijkstra, with a hyperedge f_e crossing from S to V - S]
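Knuth's algorithm can be read as Dijkstra's best-first order combined with forward updates on hyperedges. The sketch below is one such reading under stated assumptions: ⊕ is `min` (smaller is better), every weight function is monotonic and superior, hyperedges are `(tails, head, f)` triples, and stale heap entries are skipped lazily. None of these names come from the slides.

```python
import heapq


def knuth(hyperedges, axioms):
    """Best-first search on a (possibly cyclic) hypergraph with min as ⊕."""
    remaining = [len(set(tails)) for tails, _, _ in hyperedges]   # tails not yet in S
    outgoing = {}
    for idx, (tails, _, _) in enumerate(hyperedges):
        for u in set(tails):
            outgoing.setdefault(u, []).append(idx)
    d = dict(axioms)
    fixed = set()                                  # the S side of the cut
    queue = [(w, v) for v, w in axioms.items()]
    heapq.heapify(queue)
    while queue:
        _, v = heapq.heappop(queue)
        if v in fixed:
            continue                               # stale entry
        fixed.add(v)
        for idx in outgoing.get(v, []):
            remaining[idx] -= 1
            if remaining[idx] == 0:                # every tail is fixed: fire
                tails, head, f = hyperedges[idx]
                new = f(*(d[u] for u in tails))
                if head not in fixed and new < d.get(head, float("inf")):
                    d[head] = new
                    heapq.heappush(queue, (new, head))
    return d


# Additive costs; the hyperedge z -> x makes the hypergraph cyclic.
COST_EDGES = [
    (("x", "y"), "z", lambda a, b: a + b + 1),
    (("x",), "z", lambda a: a + 5),
    (("z",), "x", lambda a: a + 1),
    (("z", "y"), "goal", lambda a, b: a + b),
]
costs = knuth(COST_EDGES, {"x": 1, "y": 2})
```

Superiority guarantees that a fired hyperedge can never produce a head weight better than any of its tails, which is why a popped vertex is safe to fix even though the hypergraph has a cycle.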

Example: A* Parsing

• Use A* search on top of the Knuth Algorithm
• Showed significant speed up with carefully designed heuristic functions (Klein and Manning, 2003)
• [open problem] can you still define heuristic functions if weight functions are not semiring-composed?


Same Picture Again 


[Figure: the same diagram for hypergraphs; labels: many NLP problems, PCFG parsing with CNF, Inside-Outside Alg., max-margin parsing, cyclic grammars, generalized Bellman-Ford (open)]

Conclusion

• surveyed two general frameworks and two important algorithms in DP
• adapted the theory of semirings to weight functions
• demonstrated advantages of modularity
• focused on optimization problems
• covered typical NLP applications
• a better understanding of these theories helps NLP
• a generic DP language for NLP: Dyna (Eisner et al.)
• suggested some open problems
