HMM Parameter Tying - University of Birmingham

THE UNIVERSITY 

OF BIRMINGHAM 

SCHOOL OF 

ELECTRONIC & 

ELECTRICAL 

ENGINEERING 

HMM Parameter Tying 

Version 1 February 2002 

Digital Systems 

& 

Vision Processing 

Tying 

17-Feb-01 

SLIDE 1



The Problem 


• Good recognition accuracy requires context-sensitive phone-level models
• If there are 50 phones, the maximum number of triphone HMMs is 50^3 = 125,000
• Most are ruled out by phonological constraints
  – most phone triples never occur in speech
• But many are legal



Model Parameters 


• Each model has 3 emitting states
• Each state is modelled as, say, a 10-component Gaussian mixture
• Each feature vector is 40-dimensional
• Hence the number of parameters per model is:

  3 × (10 × (40 + 40 + 1) + 9) = 2,457



Terms in the formula: 3 = number of states; 10 = number of mixture components; 40 = mean vector; 40 = variance vector; 1 = mixture weight; 9 = transition probabilities
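The parameter count can be checked with a few lines of Python. This is only a sketch: the function name and keyword arguments are illustrative, and the transition term is placed inside the per-state bracket exactly as in the slide's formula.

```python
def params_per_model(n_states=3, n_mix=10, dim=40, transition_term=9):
    """Count parameters for one HMM, mirroring the slide's formula."""
    # Each mixture component has a mean vector, a variance vector and a weight
    per_component = dim + dim + 1
    # The transition term is added per state, as written in the formula
    per_state = n_mix * per_component + transition_term
    return n_states * per_state


print(params_per_model())  # 2457
```

Changing `n_mix` or `dim` shows how quickly the count grows with the mixture size and the feature dimension.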



Acoustic model parameters 


• So, even if we only have 1,000 acoustic models (instead of 125,000), the total number of acoustic model parameters will be 2,457,000
• Too many to estimate with a practical quantity of data
• The most common solution is HMM parameter tying
• Some different HMMs share the same parameters



Tied variance 


• Variances are more costly to estimate than means
• Simple solution – divide the set of all HMMs into classes, so that within a class all HMM state PDFs have the same variance
• This is tied variance
• If all HMM state PDFs share the same variance, the variance is referred to as a grand variance
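To see roughly how much variance tying saves, the count of 2,457,000 parameters for 1,000 models can be recomputed with the variance vectors moved out of the models. The sketch below assumes one diagonal variance vector per class, shared by every mixture component of every state in that class; the function and argument names are illustrative.

```python
def total_params(n_models, n_classes=None, n_states=3, n_mix=10, dim=40,
                 transition_term=9):
    """Total acoustic model parameters, optionally with tied variances.

    n_classes=None means no tying; n_classes=1 corresponds to a grand variance.
    """
    if n_classes is None:
        per_component = dim + dim + 1   # mean + variance + mixture weight
        shared = 0
    else:
        per_component = dim + 1         # mean + mixture weight only
        shared = n_classes * dim        # one variance vector per class (assumed)
    per_model = n_states * (n_mix * per_component + transition_term)
    return n_models * per_model + shared


print(total_params(1000))                # 2457000 (no tying)
print(total_params(1000, n_classes=50))  # 1259000 (50 variance classes)
print(total_params(1000, n_classes=1))   # 1257040 (grand variance)
```

Under these assumptions tying the variances removes nearly half of the parameters, almost regardless of the number of classes.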



Tied Mixtures 


• Another common method is tied (or shared) mixtures
• In a normal Gaussian mixture HMM system, each state is associated with a Gaussian mixture PDF of the form:

  $$ b(y) = \sum_{m=1}^{M} w_m \, p_m(y) $$

• In a tied mixture system, all of the p_m are chosen from a fixed, finite set of unimodal Gaussian PDFs



Tied Mixtures 


• By controlling the number of shared mixture components, the total number of acoustic model parameters is controlled
• The set of shared Gaussian PDFs is like a vector quantiser codebook
• Tied mixture HMMs are also known as semi-continuous HMMs
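A minimal sketch of a tied mixture system follows. It assumes diagonal-covariance Gaussians and a toy three-entry codebook with made-up values; `CODEBOOK`, `gaussian_pdf` and `state_output_prob` are illustrative names. Every state evaluates the same shared Gaussians, and only the mixture weights are state-specific.

```python
import math


def gaussian_pdf(y, mean, var):
    """Diagonal-covariance Gaussian density evaluated at vector y."""
    log_p = 0.0
    for yi, mi, vi in zip(y, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (yi - mi) ** 2 / vi)
    return math.exp(log_p)


# Shared codebook of (mean, variance) pairs; the values are made up
CODEBOOK = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([2.0, 2.0], [1.0, 1.0]),
    ([-2.0, 1.0], [0.5, 0.5]),
]


def state_output_prob(y, weights, codebook=CODEBOOK):
    """b(y) = sum_m w_m p_m(y), with every p_m taken from the shared codebook."""
    densities = [gaussian_pdf(y, mean, var) for mean, var in codebook]
    return sum(w * p for w, p in zip(weights, densities))


# Two states differ only in their weights over the same codebook
state_a = [0.7, 0.2, 0.1]
state_b = [0.1, 0.1, 0.8]
y = [0.0, 0.0]
print(state_output_prob(y, state_a), state_output_prob(y, state_b))
```

Because the codebook densities do not depend on the state, they only need to be evaluated once per feature vector, which is where the similarity to a vector quantiser codebook comes from.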



Generalised Triphones 


• Previous techniques share (or tie) particular model parameters
• An alternative is to share whole HMMs
• In other words, assume that some contexts induce the same effects on the acoustic realisation of a phone, and model their triphones using the same HMM
• These equivalence classes of triphone HMMs are called generalised triphones



Clustered triphones 


• Suppose M and N are phone-level HMMs, both with the same number of states
• Can define a distance d(M,N) between M and N in several ways
• E.g. define d_1(M,N) to be the difference between state 1 of M and state 1 of N, i.e. between the state output PDFs b_{1,M} and b_{1,N}
• Define d(M,N) = d_1(M,N) + d_2(M,N) + d_3(M,N)
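The slide leaves the state-level difference d_i unspecified. As one possible choice, the sketch below assumes each state is a single diagonal Gaussian given as a (mean, variance) pair and uses the symmetric Kullback-Leibler divergence between the two Gaussians; both the representation and the choice of divergence are assumptions made for illustration.

```python
def state_distance(state_m, state_n):
    """Symmetric KL divergence between two diagonal Gaussians (assumed d_i)."""
    (mean_m, var_m), (mean_n, var_n) = state_m, state_n
    total = 0.0
    for mm, vm, mn, vn in zip(mean_m, var_m, mean_n, var_n):
        total += 0.5 * (vm / vn + vn / vm - 2.0
                        + (mm - mn) ** 2 * (1.0 / vm + 1.0 / vn))
    return total


def hmm_distance(hmm_m, hmm_n):
    """d(M,N) = d_1(M,N) + d_2(M,N) + d_3(M,N): sum of state-wise distances."""
    if len(hmm_m) != len(hmm_n):
        raise ValueError("HMMs must have the same number of states")
    return sum(state_distance(sm, sn) for sm, sn in zip(hmm_m, hmm_n))


# Toy 3-state HMMs with one-dimensional features (values are made up)
M = [([0.0], [1.0]), ([0.0], [1.0]), ([0.0], [1.0])]
N = [([1.0], [1.0]), ([0.0], [1.0]), ([0.0], [2.0])]
print(hmm_distance(M, N))  # 1.0 + 0.0 + 0.25 = 1.25
```

Any other state-level measure could be substituted for `state_distance` without changing `hmm_distance`.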



Clustered triphones 


• Given a distance measure, we can treat HMMs as points in a high-dimensional space and cluster them together
• Each cluster can then be represented by a single triphone HMM
• Can control the number of parameters by controlling the number of clusters
• For medium vocabulary tasks (500-1,000 words) 300-500 clusters is sufficient
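The slides do not say which clustering algorithm is used. One simple possibility is agglomerative clustering with average linkage, sketched below; plain numbers stand in for HMMs and the absolute difference stands in for d(M,N), so the data and names are purely illustrative.

```python
def cluster(items, distance, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[item] for item in items]

    def linkage(a, b):
        # Average pairwise distance between the members of clusters a and b
        return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Stand-in data: numbers instead of HMMs, absolute difference instead of d(M,N)
points = [0.0, 0.1, 5.0, 5.2, 9.9]
print(cluster(points, lambda a, b: abs(a - b), 3))
# [[0.0, 0.1], [5.0, 5.2], [9.9]]
```

The `n_clusters` argument plays the role described on the slide: fewer clusters means fewer distinct HMMs and therefore fewer parameters.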



Two problems 


• In the clustered triphone method we decide to combine triphone HMMs based on the similarity of their state output PDFs
• But the whole point is that we don’t have accurate estimates of these PDFs
• Suppose we want to model a new word, which contains a triphone which was not in the training set
• Which generalised triphone should we use?



Phone decision trees 


• The most common approach to HMM tying, which addresses both problems, is decision tree clustering
• Decision tree clustering can be applied to individual states or to whole HMMs – we’ll consider whole HMMs
• The basic idea is to supplement data-driven methods (distances between PDFs) with knowledge about which phones are likely to induce similar contextual effects



Phonetic knowledge 



• For example, we know that /f/ and /s/ are both unvoiced fricatives, produced in a similar manner
• Therefore we might hypothesise that, for example, an utterance of the vowel /e/ preceded by /f/ might be similar to one preceded by /s/
• This is the basic idea behind decision tree clustering



A phone decision tree for /e/ 


• A phone decision tree is just a binary tree, where each node of the tree is associated with:
  – A set of phones
  – A position (L or R)
• The root node of the tree corresponds to /e/
• The terminal nodes correspond to significantly different contextual variants of /e/



A decision tree node 



[Figure: node (a) carries the question {/p/, /t/, /k/}, L; its “Y” branch leads to node (b) and its “N” branch to node (c)]

• Want to choose a model for /e/ in a particular context
• At node (a), ask the question: is the Left context one of the set {/p/, /t/, /k/}?
• If “yes” go to node (b), otherwise go to node (c)
• Continue until a terminal node is reached
• Choose the associated HMM
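The traversal just described can be sketched as follows. The `Node` class, the second-level question and the HMM names are hypothetical; only the root question {/p/, /t/, /k/}, L is taken from the slide.

```python
class Node:
    """A tree node: either a question (phones, position) or a terminal HMM."""

    def __init__(self, phones=None, position=None, yes=None, no=None, hmm=None):
        self.phones = phones      # set of phones the question asks about
        self.position = position  # "L" (left context) or "R" (right context)
        self.yes = yes            # successor when the answer is "yes"
        self.no = no              # successor when the answer is "no"
        self.hmm = hmm            # set only on terminal nodes


def choose_hmm(node, left, right):
    """Walk from the root to a terminal node for the context (left, right)."""
    while node.hmm is None:
        context = left if node.position == "L" else right
        node = node.yes if context in node.phones else node.no
    return node.hmm


# Hypothetical tree for /e/; HMM names are placeholders
tree = Node({"p", "t", "k"}, "L",
            yes=Node(hmm="e_after_stop"),
            no=Node({"s", "f"}, "R",
                    yes=Node(hmm="e_before_fricative"),
                    no=Node(hmm="e_other")))

print(choose_hmm(tree, "t", "n"))  # e_after_stop
print(choose_hmm(tree, "m", "s"))  # e_before_fricative
```

Because every context reaches some terminal node, a triphone that never occurred in training still gets a model.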



Building a phone decision tree for /e/

• First choose a set of questions
  – These can be chosen using phonetic knowledge about sets of phones which are likely to induce similar contextual effects
  – …plus pragmatics!
• Also need the set E of acoustic patterns corresponding to /e/ in the training data
• Each question partitions E into two subsets
  – E_Y – the set of instances of /e/ for which the answer to the question is “Yes”
  – E_N – the set of instances of /e/ for which the answer to the question is “No”



• For each question Q, we can define a “quality measure” g(Q)
• g(Q) is a measure of how well the sets E_Y and E_N can be modelled by separate HMMs
• Intuitively, g(Q) is a measure of how compact or ‘homogeneous’ the sets E_Y and E_N are
• Choose the question Q for which g(Q) is biggest
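The slides do not give a formula for g(Q). The sketch below uses a stand-in: each instance of /e/ is reduced to a single number, and g(Q) is the negated within-set scatter of E_Y and E_N, so that more compact subsets score higher. The instance format (left phone, right phone, value), the data and the scoring rule are all assumptions for illustration.

```python
def scatter(values):
    """Sum of squared deviations from the mean; 0 for an empty set."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split(instances, question):
    """Partition instances into (E_Y, E_N) according to the question."""
    phones, position = question
    index = 0 if position == "L" else 1
    e_yes = [inst for inst in instances if inst[index] in phones]
    e_no = [inst for inst in instances if inst[index] not in phones]
    return e_yes, e_no


def quality(instances, question):
    """Stand-in g(Q): larger (less negative) means more compact subsets."""
    e_yes, e_no = split(instances, question)
    return -(scatter([v for _, _, v in e_yes]) + scatter([v for _, _, v in e_no]))


def best_question(instances, questions):
    """Return the question Q for which g(Q) is biggest."""
    return max(questions, key=lambda q: quality(instances, q))


# Made-up instances of /e/: (left phone, right phone, one-number summary)
E = [("p", "n", 1.0), ("t", "n", 1.2), ("k", "s", 0.9),
     ("s", "n", 3.0), ("f", "t", 3.1), ("m", "s", 2.9)]
questions = [(frozenset({"p", "t", "k"}), "L"), (frozenset({"s", "f"}), "R")]
print(best_question(E, questions))
```

With this toy data the left-context question about {/p/, /t/, /k/} is chosen, because it separates the low values from the high ones.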



• Training patterns in E_Y (respectively E_N) are assigned to the “Y” (respectively “N”) successor nodes
• The whole process is repeated for each successor node
• The process stops when, for example, the number of samples associated with a node reaches a minimum
• An HMM is built for each terminal node
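Putting the steps together, the recursive procedure might look like the sketch below. It reuses the same stand-in assumptions (instances as (left phone, right phone, value) tuples and negated scatter as g(Q)) and interprets the stopping rule as refusing any split that would leave a successor node with fewer than `min_samples` instances. The nested-dictionary tree format is illustrative.

```python
def scatter(values):
    """Sum of squared deviations from the mean; 0 for an empty set."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(instances, questions, min_samples=2):
    """Recursively split the instances; return the tree as nested dicts."""
    if min_samples < 1:
        raise ValueError("min_samples must be at least 1")
    best = None
    for phones, position in questions:
        index = 0 if position == "L" else 1
        e_yes = [i for i in instances if i[index] in phones]
        e_no = [i for i in instances if i[index] not in phones]
        # Stopping rule (assumed): both successors need enough samples
        if len(e_yes) < min_samples or len(e_no) < min_samples:
            continue
        g = -(scatter([i[2] for i in e_yes]) + scatter([i[2] for i in e_no]))
        if best is None or g > best[0]:
            best = (g, (phones, position), e_yes, e_no)
    if best is None:
        # Terminal node: one HMM would be trained from these instances
        return {"leaf": instances}
    _, question, e_yes, e_no = best
    return {"question": question,
            "yes": build_tree(e_yes, questions, min_samples),
            "no": build_tree(e_no, questions, min_samples)}


# Made-up instances of /e/ and candidate questions
E = [("p", "n", 1.0), ("t", "n", 1.2), ("k", "s", 0.9),
     ("s", "n", 3.0), ("f", "t", 3.1), ("m", "s", 2.9)]
questions = [(frozenset({"p", "t", "k"}), "L"), (frozenset({"s", "f"}), "R")]
tree = build_tree(E, questions, min_samples=2)
print(tree["question"], len(tree["yes"]["leaf"]), len(tree["no"]["leaf"]))
```

In this toy run the root is split once and both successors become terminal nodes, since any further split would leave fewer than two instances on one side.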



Phone Decision Tree 

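[Figure: phone decision tree for /e/. The root is /e/; the node questions shown are {/p/, /t/, /k/; L}, {/e/, /i/, /A/; L}, {/s/, /f/; R}, {/s/, /f/; L}, {/#/; R} and {/e/, /i/; R}, with branches labelled Y and N]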



PDT Concluding Remarks 


• Phone decision trees can be applied at the state level, to construct a set of triphones with tied states
• State-level phone decision trees are supported by HTK

