with (see Fig. 3.4 on the facing page):

• $A = (a_{i,j})$, the transition matrix, whose element $a_{i,j}$ denotes the probability of switching from hidden state $i$ to state $j$,

• $B = (b_i)$, a vector of probability density functions over the observation space, and

• $\pi = (\pi_1, \dots, \pi_N)^T$, a stochastic vector describing the initial state distribution, $\pi_i = P(q(0) = i)$.

With this at hand, there are three basic problems associated with HMMs [42], [29]:

1. Determine $P(O \mid \lambda)$ for some observation sequence $O = \{o(0), o(1), o(2), \dots, o(T)\}$, i.e., the probability of observing a certain output sequence $O$ given the parameter set $\lambda$ (see the sketch at the end of this section).

2. Given $O$ and $\lambda$, determine the most probable hidden state sequence (Viterbi path) $Q = \{q(0), q(1), q(2), \dots, q(T)\}$. The Viterbi algorithm solves this problem.

3. Determine $\lambda^* = \operatorname{argmax}_{\lambda} P(O \mid \lambda)$. The Baum-Welch algorithm solves this problem (Section 3.7).

In the next two sections, the maximum likelihood principle and the expectation-maximization (EM) algorithm are introduced. Afterwards we present the Baum-Welch algorithm, which is the EM algorithm for HMMs.

3.4 Maximum Likelihood Principle

Consider a density function $P(O \mid \lambda)$ governed by the set of parameters $\lambda$; if, for example, $P$ were a mixture of Gaussians, $\lambda$ would comprise the means and covariances. Now introduce a data set of size $N$,

\[
O = \{o_1, o_2, \dots, o_N\}, \qquad (3.25)
\]

assumed to be generated by this distribution, and assume further that these data vectors are independent and identically distributed (i.i.d.) with distribution $P$. The resulting density of the samples can then be written as

\[
P(O \mid \lambda) = \prod_{i=1}^{N} P(o_i \mid \lambda) = \mathcal{L}(\lambda \mid O). \qquad (3.26)
\]

The function $\mathcal{L}(\lambda \mid O)$ is called the likelihood of the parameters given the data, or simply the likelihood function. The likelihood is a function of the parameters $\lambda$ with the data $O$ held fixed. In the "Maximum Likelihood Problem", our goal is to find the parameter set $\lambda$ that maximizes $\mathcal{L}$ [5]; that is, we have to find the parameter estimate

\[
\lambda^* = \operatorname{argmax}_{\lambda} \mathcal{L}(\lambda \mid O).
\]
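To make the maximum likelihood principle concrete, the following minimal Python sketch (not from the thesis; the function and variable names are illustrative) evaluates the logarithm of Eq. (3.26) for i.i.d. univariate Gaussian data with $\lambda = (\mu, \sigma^2)$. In this special case $\lambda^*$ has a closed form: the sample mean and the (biased) sample variance.

```python
import numpy as np

def gaussian_log_likelihood(data, mu, sigma2):
    """Logarithm of Eq. (3.26) for i.i.d. samples from N(mu, sigma2)."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma2)
                  - (data - mu) ** 2 / (2.0 * sigma2))

# Illustrative data set O = {o_1, ..., o_N}, drawn from a known Gaussian
rng = np.random.default_rng(0)
O = rng.normal(loc=2.0, scale=1.5, size=1000)

# Closed-form maximum likelihood estimates in the Gaussian case
mu_star = O.mean()
sigma2_star = O.var()  # the MLE uses 1/N, not the unbiased 1/(N - 1)

print(gaussian_log_likelihood(O, mu_star, sigma2_star))
print(gaussian_log_likelihood(O, 0.0, 1.0))  # any other lambda scores lower
```

In practice one maximizes the log-likelihood rather than $\mathcal{L}$ itself: the product in Eq. (3.26) underflows for large $N$, while its logarithm is a numerically stable sum with the same maximizer.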
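Problem 1 of the HMM list above can be illustrated in the same spirit. The sketch below is a toy version under simplifying assumptions (a discrete emission matrix in place of the continuous densities $b_i$, and made-up example parameters); it evaluates $P(O \mid \lambda)$ with the standard forward recursion.

```python
import numpy as np

def forward_probability(obs, A, B, pi):
    """Evaluate P(O | lambda) for a discrete-emission HMM.

    obs : observation symbol indices o(0), ..., o(T)
    A   : (N, N) transition matrix, A[i, j] = a_{i,j}
    B   : (N, M) emission matrix, B[i, k] = P(o = k | q = i)
    pi  : (N,) initial state distribution
    """
    alpha = pi * B[:, obs[0]]          # alpha_0(i) = pi_i * b_i(o(0))
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_t = (alpha_{t-1} A) * b(o(t))
    return alpha.sum()                 # P(O | lambda) = sum_i alpha_T(i)

# Two hidden states, three observation symbols (illustrative values only)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])

print(forward_probability([0, 1, 2, 1], A, B, pi))
```

The recursion runs in $O(N^2 T)$ time, avoiding the exponential enumeration of all hidden state sequences; for long sequences one would rescale $\alpha$ at each step (or work in log space) to prevent underflow.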