17.01.2015 Views

Prosodic Boundary Prediction based on Maximum Entropy Model ...

Prosodic Boundary Prediction based on Maximum Entropy Model ...

Prosodic Boundary Prediction based on Maximum Entropy Model ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

p ( f ) = ∑ p( x) p( y<br />

i<br />

) ( )<br />

( )<br />

| x fi<br />

x , y , (3)<br />

x,<br />

y<br />

where p( x ) is the empirical distributi<strong>on</strong> of x in the training sample. We require the<br />

model to accord with the observed statistics by c<strong>on</strong>straining this value to be the same<br />

as the expected value of f in the training set. That is, for each f i<br />

p( f ) = p( f ) . (4)<br />

i i<br />

Requirement (4) is called a c<strong>on</strong>straint equati<strong>on</strong> or simply a c<strong>on</strong>straint.<br />

Combining (1), (3) and (4) we have:<br />

( ) ( ) ( )<br />

( ) ( )<br />

∑ | , , ,<br />

( , )<br />

p x p y x f i x y = ∑<br />

x y<br />

( x,<br />

y)<br />

p x y f i x y .<br />

Suppose we have n features, then all the probability distributi<strong>on</strong> that satisfy the<br />

c<strong>on</strong>straints exerted by these features c<strong>on</strong>stitute a set C:<br />

{ ( | ) | ( ) ( ) {1, 2,..., }<br />

C ≡ p y x p f<br />

i<br />

= p f<br />

i<br />

for i ∈ n .<br />

Am<strong>on</strong>g all the models p in C, the maximum entropy philosophy dictates that we<br />

select the <strong>on</strong>e with maximum c<strong>on</strong>diti<strong>on</strong>al entropy [9]<br />

( ) ( ) ( | ) log ( | )<br />

H p ≡ − ∑ p x p y x p y x ,<br />

x,<br />

y<br />

(5)<br />

(6)<br />

(7)<br />

and<br />

p* arg max H ( p)<br />

= . (8)<br />

p∈C<br />

It is a c<strong>on</strong>strained optimizati<strong>on</strong> problem to find p*. The target maximum entropy<br />

model has the following form[9]:<br />

1<br />

p * ( y | x) = exp( ∑ λi f<br />

i<br />

( x, y))<br />

.<br />

Z ( x)<br />

i<br />

λ<br />

where Z λ (x) is a normalizing c<strong>on</strong>stant and λ i is a Lagrange multiplier which is<br />

comm<strong>on</strong>ly computed from the training set using GIS algorithm. Detailed steps are<br />

omitted here.<br />

(9)<br />

2.2 Feature Selecti<strong>on</strong> Strategy<br />

The principle of <strong>Maximum</strong> <strong>Entropy</strong> <strong>Model</strong> is to agree with all that is known and<br />

assume nothing about what is unknown. Yet it poses another important questi<strong>on</strong>: how<br />

to find appropriate facts that are most relevant to the task in hand Put another way,

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!