
Machine translation

A Tree to String Transducer

K. Yamada, K. Knight: A Syntax-Based Statistical Translation Model. ACL 2001.

• The input sentence is preprocessed by a syntactic parser.

• The channel performs operations on each node of the parse tree (sketched below):

– reordering child nodes

– inserting extra words at each node

– translating leaf words

• The output of the model is a string.

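As a rough illustration of the three channel operations, here is a minimal Python sketch. The tuple-based tree encoding and the toy `r_table`/`n_table`/`t_table` layouts are illustrative assumptions, not the paper's actual data structures (in particular, the real n-table conditions on the parent/node pair and all choices are probabilistic, not deterministic lookups).

```python
# Minimal sketch of the three channel operations on a toy parse tree.
# Trees are ("LABEL", [children]); leaves are plain strings.

def reorder(node, r_table):
    """Recursively reorder the children of each node per the r-table choice."""
    if isinstance(node, str):                      # leaf word
        return node
    label, children = node
    labels = tuple(c[0] if isinstance(c, tuple) else "LEX" for c in children)
    order = r_table.get(labels, range(len(children)))   # identity by default
    return (label, [reorder(children[i], r_table) for i in order])

def insert_words(node, n_table):
    """Flatten to a word list, inserting an extra word left/right of a node
    when the (deterministic, toy) n-table says so."""
    if isinstance(node, str):
        return [node]
    label, children = node
    words = [w for c in children for w in insert_words(c, n_table)]
    side, extra = n_table.get(label, ("none", None))
    if side == "left":
        return [extra] + words
    if side == "right":
        return words + [extra]
    return words

def translate(words, t_table):
    """Translate leaf words with the t-table; None models a NULL translation."""
    return [t_table.get(w, w) for w in words if t_table.get(w, w) is not None]
```

Chaining reorder → insert_words → translate, then reading off the leaves, mirrors the four-step example on the next slide.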


An Example ∗

[Figure: the English parse tree for "He adores listening to music" — VB(PRP VB1 VB2), with VB2 → VB TO and TO → TO NN — is transformed in four steps (Reorder, Insert, Translate, Take Leaves) into the Japanese sentence "Kare ha ongaku wo kiku no ga daisuki desu".]

∗ Source: http://www.isi.edu/natural-language/people/cs562-8-22-06.pdf


⇒ The reordering is decided according to the r-table.

[Figure: the tree VB(PRP[He] VB1[adores] VB2(VB[listening] TO(TO[to] NN[music]))) is reordered into VB(PRP[He] VB2(TO(NN[music] TO[to]) VB[listening]) VB1[adores]).]

original order    reordering     P(reorder)
PRP VB1 VB2       PRP VB1 VB2    0.074
                  PRP VB2 VB1    0.723
                  VB1 PRP VB2    0.061
· · ·             · · ·          · · ·
VB TO             VB TO          0.252
                  TO VB          0.749
TO NN             TO NN          0.107
                  NN TO          0.893
· · ·             · · ·          · · ·

Reordering probability: 0.723 · 0.749 · 0.893 = 0.484
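As a quick check of that product, a minimal sketch (the nested-dict r-table layout is an illustrative assumption; the numbers are the table entries above):

```python
# r-table entries from the slide: P(reordered order | original order).
r_table = {
    ("PRP", "VB1", "VB2"): {("PRP", "VB2", "VB1"): 0.723},
    ("VB", "TO"):          {("TO", "VB"): 0.749},
    ("TO", "NN"):          {("NN", "TO"): 0.893},
}

def reordering_prob(choices, r_table):
    """Multiply the r-table probability of the reordering chosen at each node."""
    p = 1.0
    for original, reordered in choices:
        p *= r_table[original][reordered]
    return p

choices = [(("PRP", "VB1", "VB2"), ("PRP", "VB2", "VB1")),
           (("VB", "TO"), ("TO", "VB")),
           (("TO", "NN"), ("NN", "TO"))]
print(round(reordering_prob(choices, r_table), 3))   # 0.484, as on the slide
```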


⇒ The insertion of a new node is decided according to the n-table.

[Figure: in the reordered tree from the previous slide, the words "ha", "no", "ga" and "desu" are inserted, giving the leaf sequence "He ha music to listening no ga adores desu".]

parent     TOP     VB      VB      VB      TO      TO      · · ·
node       VB      VB      PRP     TO      TO      NN      · · ·
P(None)    0.735   0.687   0.344   0.709   0.900   0.800   · · ·
P(Left)    0.004   0.061   0.004   0.030   0.003   0.096   · · ·
P(Right)   0.260   0.252   0.652   0.261   0.007   0.104   · · ·

w       P(ins-w)
ha      0.219
ta      0.131
wo      0.099
no      0.094
ni      0.080
te      0.078
ga      0.062
· · ·   · · ·
desu    0.0007

Insertion probability:
(0.652 · 0.219) · (0.252 · 0.094) · (0.252 · 0.062) · (0.252 · 0.0007) · 0.735 · 0.709 · 0.900 · 0.800 = 3.498e−9
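The same product in code (the list encoding is just for illustration; the factors are the n-table entries used in the worked example):

```python
# Four inserted words: P(position | parent, node) x P(word), plus the
# P(None) factors for the nodes where nothing is inserted.
inserted  = [(0.652, 0.219),    # "ha" to the right of PRP
             (0.252, 0.094),    # "no"
             (0.252, 0.062),    # "ga"
             (0.252, 0.0007)]   # "desu"
no_insert = [0.735, 0.709, 0.900, 0.800]

p = 1.0
for p_pos, p_word in inserted:
    p *= p_pos * p_word
for p_none in no_insert:
    p *= p_none
print(f"{p:.3e}")   # 3.498e-09, matching the slide
```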


⇒ The translation is decided according to the t-table.

adores           he             listening      music           to             · · ·
daisuki  1.000   kare   0.952   kiku   0.333   ongaku  0.900   ni     0.216   · · ·
                 NULL   0.016   kii    0.333   naru    0.100   NULL   0.204
                 nani   0.005   mi     0.333                   to     0.133

[Figure: translating the leaves turns "He ha music to listening no ga adores desu" into "kare ha ongaku wo kiku no ga daisuki desu".]

Translation probability: 0.952 · 0.900 · 0.333 · 0.038 · 1.000 = 0.0108



Formal description

• Goal: transform an English parse tree E into a French sentence f

• Definitions

- E consists of nodes ε1, ε2, ..., εn

- f consists of words f1, f2, ..., fm

- θi = (νi, ρi, τi) is the set of values of the random variables (insertion, reordering, translation) associated with εi

- θ = θ1, θ2, ..., θn is the set of all random variables associated with a parse tree E = ε1, ε2, ..., εn

- Str(θ(E)) is the string of leaf words obtained by applying the operations θ to the tree E

P(f|E) = Σ_{θ : Str(θ(E)) = f} P(θ|E)

P(θ|E) = P(θ1, θ2, ..., θn | ε1, ε2, ..., εn)
       = ∏_{i=1}^{n} P(θi | θ1, θ2, ..., θ_{i−1}, ε1, ε2, ..., εn)
       ≈ ∏_{i=1}^{n} P(θi | εi)


Formal description

where

P(θi|εi) = P(νi, ρi, τi | εi) ≈ P(νi|εi) P(ρi|εi) P(τi|εi)
         = P(νi|N(εi)) P(ρi|R(εi)) P(τi|T(εi))
         = n(νi|N(εi)) r(ρi|R(εi)) t(τi|T(εi))

n(ν|N(ε)) ≡ n(ν|N), r(ρ|R(ε)) ≡ r(ρ|R), t(τ|T(ε)) ≡ t(τ|T)

are the parameters of the model.

For example:

• n(ν|N) = P(right, ha | VB−PRP)

• r(ρ|R) = P(PRP−VB2−VB1 | PRP−VB1−VB2)

P(f|E) = Σ_{θ : Str(θ(E)) = f} ∏_{i=1}^{n} n(νi|N(εi)) r(ρi|R(εi)) t(τi|T(εi))
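Plugging in the per-operation probabilities computed on the example slides, the probability of that particular θ factors as P(θ|E) ≈ 0.484 · 3.498e−9 · 0.0108 ≈ 1.8e−11; P(f|E) additionally sums such products over every other θ whose Str(θ(E)) yields the same Japanese string.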


Estimation of the parameters

1. Initialize all probability tables: n(ν|N), r(ρ|R) and t(τ|T)

2. Reset all counters: c(ν, N), c(ρ, R) and c(τ, T)

3. For each pair ⟨E, f⟩ in the training corpus,
   for all θ such that f = Str(θ(E)):

   - Let cnt = P(θ|E) / Σ_{θ : Str(θ(E)) = f} P(θ|E)

   - For i = 1 ... n:
     c(νi, N(εi)) += cnt
     c(ρi, R(εi)) += cnt
     c(τi, T(εi)) += cnt

4. For each ⟨ν, N⟩, ⟨ρ, R⟩, and ⟨τ, T⟩:

   n(ν|N) = c(ν, N) / Σ_ν c(ν, N)
   r(ρ|R) = c(ρ, R) / Σ_ρ c(ρ, R)
   t(τ|T) = c(τ, T) / Σ_τ c(τ, T)

5. Repeat steps 2–4 for several iterations (the sketch after this list mirrors one such iteration)
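A minimal Python sketch of one EM iteration, assuming a helper `candidate_thetas(E, f)` that enumerates the θ with Str(θ(E)) = f and a `prob(theta, E, tables)` that multiplies the n, r, t entries over the nodes; these helpers, the `E.nodes` attribute and the per-node contexts `eps.N`/`eps.R`/`eps.T` are all hypothetical stand-ins, not the paper's actual implementation.

```python
from collections import defaultdict

def em_iteration(corpus, tables, candidate_thetas, prob):
    """One EM pass over pairs <E, f> (steps 2-4 of the slide).

    tables: {'n': {...}, 'r': {...}, 't': {...}} mapping (value, context) -> prob.
    """
    counts = {k: defaultdict(float) for k in ("n", "r", "t")}

    for E, f in corpus:
        thetas = list(candidate_thetas(E, f))
        total = sum(prob(th, E, tables) for th in thetas)
        for th in thetas:
            cnt = prob(th, E, tables) / total          # posterior weight of theta
            for (nu, rho, tau), eps in zip(th, E.nodes):
                counts["n"][nu,  eps.N] += cnt         # c(nu,  N(eps)) += cnt
                counts["r"][rho, eps.R] += cnt         # c(rho, R(eps)) += cnt
                counts["t"][tau, eps.T] += cnt         # c(tau, T(eps)) += cnt

    # Normalize: n(nu|N) = c(nu, N) / sum over nu' of c(nu', N), and likewise r, t.
    for k in ("n", "r", "t"):
        norm = defaultdict(float)
        for (v, ctx), c in counts[k].items():
            norm[ctx] += c
        tables[k] = {(v, ctx): c / norm[ctx] for (v, ctx), c in counts[k].items()}
    return tables
```

Run for several iterations, re-deriving the counts from the updated tables each time, exactly as step 5 prescribes.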


Efficient EM training

The EM algorithm uses a graph structure for a pair ⟨E, f⟩:

• A major-node v(εi, f_k^l) shows a pairing of a subtree of E rooted at εi and a substring f_k^l of f

• Each major-node connects to several ν-subnodes v(ν; εi, f_k^l), showing which value of ν is selected. The arc has weight P(ν|εi)

• A ν-subnode v(ν; εi, f_k^l) connects to a final-node with weight P(τ|εi) if εi is a terminal node

• A ν-subnode connects to several ρ-subnodes v(ρ; ν, εi, f_k^l) with weight P(ρ|εi)

• A ρ-subnode is connected to π-subnodes v(π; ρ, ν, εi, f_k^l) with weight 1.0. The variable π shows a particular way of partitioning f_k^l

• A π-subnode is connected to the major-nodes corresponding to the children of εi with weight 1.0. A major-node can be connected from different π-subnodes.

[Figure: a fragment of the graph — a major-node fans out via P(ν|ε) arcs to ν-subnodes, then via P(ρ|ε) arcs to ρ-subnodes, then to π-subnodes, which connect to the major-nodes of the children.]
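A rough sketch of this node hierarchy as Python types; the concrete representation is purely illustrative, the slides only define the graph abstractly.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]                 # (k, l): the substring f_k^l of f

@dataclass(frozen=True)
class MajorNode:                       # v(eps_i, f_k^l)
    eps: int                           # index of the parse-tree node
    span: Span

@dataclass(frozen=True)
class NuSubnode:                       # v(nu; eps_i, f_k^l), arc weight P(nu|eps_i)
    nu: str
    parent: MajorNode

@dataclass(frozen=True)
class RhoSubnode:                      # v(rho; nu, eps_i, f_k^l), arc weight P(rho|eps_i)
    rho: Tuple[str, ...]
    parent: NuSubnode

@dataclass(frozen=True)
class PiSubnode:                       # v(pi; rho, nu, eps_i, f_k^l), weight 1.0
    pi: Tuple[Span, ...]               # how f_k^l is split among the children
    parent: RhoSubnode
```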


Efficient EM training

• A trace starting from the graph root, selecting one of the arcs out of each major-node, ν-subnode and ρ-subnode, and all of the arcs out of each π-subnode, corresponds to a particular θ

• The product of the arc weights along the trace corresponds to P(θ|E)

• An estimation algorithm similar to the inside-outside algorithm can be defined

• The time complexity is O(n³ |ν| |ρ| |π|)


Decoder description

K. Yamada, K. Knight: A Decoder for Syntax-based Statistical MT. ACL 2002.

Modifications to the original model for phrasal translations:

• Fertility µ is used to allow 1-to-N mapping:

t(τ|T) = t(f1 f2 ... fl | e) = µ(l|e) ∏_{i=1}^{l} t(fi|e)

• Direct translation φ of an English phrase e1 e2 ... em to a foreign phrase f1 f2 ... fl at non-terminal tree nodes:

ph(φ|Φ) = t(f1 f2 ... fl | e1 e2 ... em) = µ(l | e1 e2 ... em) ∏_{i=1}^{l} t(fi | e1 e2 ... em)

• Linear mix (if εi is a non-terminal):

P(θi|εi) = λ_Φi ph(φi|Φi) + (1 − λ_Φi) r(ρi|Ri) n(νi|Ni)
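A small sketch of the fertility-based phrase score; the table accessors `mu` and `t` are hypothetical stand-ins for lookups into the trained tables.

```python
def phrase_trans_prob(f_words, e_words, mu, t):
    """t(f1..fl | e1..em) = mu(l | e1..em) * prod over i of t(fi | e1..em).

    mu(l, e_words): fertility probability of producing l foreign words.
    t(f, e_words):  translation probability of one foreign word given the phrase.
    """
    p = mu(len(f_words), e_words)
    for f in f_words:
        p *= t(f, e_words)
    return p
```

With a single English word e, this reduces to the first formula above; with a phrase e1 ... em, it is the direct-translation score ph(φ|Φ).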


Decoder description

• Given a French sentence, the decoder finds the most plausible English parse tree

• Idea: a mechanism similar to normal parsing is used

• Steps:

1. Start from an English context-free grammar and incorporate the channel operations into it

2. For each non-lexical rule (such as "VP → VB NP PP"), supplement the grammar with reordered rules, with probabilities taken from the r-table (sketched after this list)

3. Add rules such as "VP → VP X" and "X → word", with probabilities taken from the n-table

4. For each lexical rule in the English grammar, add rules such as "englishWord → foreignWord"

5. Parse the string of foreign words

6. Undo the reordering operations and remove the leaf nodes with foreign words

7. Among all possible trees, pick the one in which the product of the LM probability and the TM probability is highest
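A toy sketch of step 2 — supplementing a CFG rule with its reordered variants weighted by the r-table. The rule and table encodings are illustrative assumptions; the numbers are the r-table entries from the reordering slide.

```python
from itertools import permutations

def reordered_rules(lhs, rhs, r_table):
    """Yield (lhs, reordered_rhs, prob) for every reordering the r-table scores.

    r_table maps an original child sequence to {reordered sequence: probability}.
    """
    for new_rhs in permutations(rhs):
        p = r_table.get(tuple(rhs), {}).get(tuple(new_rhs), 0.0)
        if p > 0.0:
            yield lhs, list(new_rhs), p

r_table = {("PRP", "VB1", "VB2"): {("PRP", "VB1", "VB2"): 0.074,
                                   ("PRP", "VB2", "VB1"): 0.723,
                                   ("VB1", "PRP", "VB2"): 0.061}}
for rule in reordered_rules("VB", ["PRP", "VB1", "VB2"], r_table):
    print(rule)   # three weighted variants of VB -> PRP VB1 VB2
```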
