
Foundations of Data Science


[Figure: a two-state Markov model with states q and p; each edge is labeled with its transition probability, and each state is labeled with its probabilities of outputting heads (h) or tails (t).]

The initial distribution is α(q) = 1 and α(p) = 0. At each step a change of state occurs, followed by the output of heads or tails with probability determined by the new state.

We consider three problems in increasing order of difficulty. First, given an HMM, what is the probability of a given output sequence? Second, given an HMM and an output sequence, what is the most likely sequence of states? And third, knowing that the HMM has at most n states and given an output sequence, what is the most likely HMM? Only the third problem concerns a "hidden" Markov model. In the other two problems, the model is known and the questions can be answered in polynomial time using dynamic programming. There is no known polynomial time algorithm for the third question.
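For the second problem, the dynamic program keeps, for each state, the most probable state path explaining the observations so far (the Viterbi-style recursion). A minimal Python sketch, with assumed parameter names (alpha for the initial distribution, A for transitions, B for output probabilities) that are illustrative, not taken from the text:

```python
def most_likely_states(alpha, A, B, obs):
    """Most probable state sequence for the observations `obs`.

    alpha[i] : initial probability of starting in state i
    A[j][i]  : probability of a transition from state j to state i
    B[i][o]  : probability of outputting symbol o from state i
    """
    n = len(alpha)
    # best[i] = probability of the best path ending in state i so far.
    best = [alpha[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t][i] = best predecessor of state i at step t
    for o in obs[1:]:
        prev = best
        step = [max(range(n), key=lambda j: prev[j] * A[j][i])
                for i in range(n)]
        best = [prev[step[i]] * A[step[i]][i] * B[i][o] for i in range(n)]
        back.append(step)
    # Trace back from the most probable final state.
    state = max(range(n), key=lambda i: best[i])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return list(reversed(path))
```

Like the forward computation, this takes O(n²) work per observation and, apart from the back pointers needed to recover the path, O(n) space.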

How probable is an output sequence<br />

Given an HMM, how probable is the output sequence O_0 O_1 O_2 ··· O_T of length T + 1? To determine this, calculate for each state i and each initial segment O_0 O_1 O_2 ··· O_t of the sequence of observations, of length t + 1, the probability of observing O_0 O_1 O_2 ··· O_t and ending in state i. This is done by a dynamic programming algorithm starting with t = 0 and increasing t. For t = 0 there have been no transitions. Thus, the probability of observing O_0 and ending in state i is the initial probability of starting in state i times the probability of observing O_0 in state i. For t > 0, the probability of observing O_0 O_1 O_2 ··· O_t and ending in state i is the sum over all states j of the probability of observing O_0 O_1 O_2 ··· O_{t−1} and ending in state j, times the probability of going from state j to state i, times the probability of observing O_t in state i. The time to compute the probability of a sequence of length T + 1 when there are n states is O(n²T). The factor n² comes from the calculation, for each time unit, of the contribution from each possible previous state to the probability of each possible current state. The space complexity is O(n), since one only needs to remember the probability of reaching each state for the most recent value of t.
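The dynamic program just described can be sketched in Python. The parameter names alpha, A, and B and the example probabilities below are illustrative assumptions, not the values from the text's figure:

```python
def sequence_probability(alpha, A, B, obs):
    """Probability of the output sequence `obs` under an HMM.

    alpha[i] : initial probability of starting in state i
    A[j][i]  : probability of a transition from state j to state i
    B[i][o]  : probability of outputting symbol o from state i
               (the output at each step is determined by the new state)
    """
    n = len(alpha)
    # t = 0: no transitions yet; start in state i and observe obs[0] there.
    prob = [alpha[i] * B[i][obs[0]] for i in range(n)]
    # Each further step does O(n^2) work; only the current column of
    # probabilities is kept, so the space used is O(n).
    for o in obs[1:]:
        prev = prob
        prob = [sum(prev[j] * A[j][i] for j in range(n)) * B[i][o]
                for i in range(n)]
    return sum(prob)

# Illustrative two-state coin HMM with alpha(q) = 1, alpha(p) = 0.
alpha = [1.0, 0.0]                  # states: 0 = q, 1 = p
A = [[0.75, 0.25], [0.25, 0.75]]    # assumed transition probabilities
B = [{'h': 0.5, 't': 0.5},          # assumed output probabilities
     {'h': 0.25, 't': 0.75}]
p = sequence_probability(alpha, A, B, ['h', 't', 'h'])
```

Summing `prob` over all states at the end gives the total probability of the observation sequence, regardless of the final state.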

Algorithm to calculate the probability of the output sequence

The probability Prob(O_0 O_1 ··· O_T, i) of the output sequence O_0 O_1 ··· O_T ending in state i is given by

Prob(O_0, i) = α(i) p(O_0, i)

