MADS BAGGESEN - 20042326
Abstract
In this master’s thesis I implement and evaluate an alternative representation of hidden Markov models (HMMs), focusing on sparse HMMs. The reason for working with sparse HMMs is that the classic algorithms all assume that the model is completely connected and have a running time of O(TN²), where T is the length of the observed sequence and N is the number of states. The representation that I use gives another bound, namely O(TK), where K is the number of transitions with a positive transition probability. This bound is no different from the N² bound for completely connected HMMs, but when the model is sparse, O(TK) is better – and for some models even linear in the size of the model. Such very sparse models, with only a linear number of transitions, do occur in practice and are used for sequence alignment, among other applications. Durbin et al. (2006) use a family of models called profile HMMs, in which each state is connected to only three other states. These profile models serve as a general example throughout the thesis.
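The O(TK) bound follows from iterating over a list of the K positive-probability transitions instead of over all N² entries of the transition matrix. As a rough illustration only (this is not the thesis’s actual code; the data layout, function name, and the tiny model in the usage example are my own assumptions), a sparse forward recursion in C might look like this:

```c
#include <stdio.h>

/* One entry per transition with positive probability. */
typedef struct { int from, to; double prob; } Transition;

/* Sparse forward algorithm: fills the T x N table alpha, where
 * alpha[t*N + i] is the probability of emitting obs[0..t] and
 * ending in state i. The inner work per time step is O(K), not
 * O(N^2), because only listed transitions are touched. */
void forward_sparse(int N, int M, int T, const int *obs,
                    const double *init,          /* N initial probs     */
                    const double *emit,          /* N x M emission probs */
                    const Transition *trans, int K,
                    double *alpha)               /* T x N output table  */
{
    /* Base case: initial distribution times first emission. */
    for (int i = 0; i < N; i++)
        alpha[i] = init[i] * emit[i * M + obs[0]];

    for (int t = 1; t < T; t++) {
        double *cur  = alpha + t * N;
        double *prev = alpha + (t - 1) * N;
        for (int i = 0; i < N; i++)
            cur[i] = 0.0;
        /* Only the K positive transitions contribute. */
        for (int k = 0; k < K; k++)
            cur[trans[k].to] += prev[trans[k].from] * trans[k].prob;
        for (int i = 0; i < N; i++)
            cur[i] *= emit[i * M + obs[t]];
    }
}
```

A real implementation would also need scaling or log-space arithmetic to avoid underflow on long sequences; that is omitted here for brevity.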
The structure of the thesis follows the working process: I first present the general theory of the main algorithms for working with HMMs, and based on that I present different implementations of the forward algorithm. I begin with a naive implementation, which I use as a baseline when comparing my other implementations. With the naive implementation running, I change the memory layout to get more sequential reads, and thereby achieve significant improvements in running time without changing the memory usage. Based on the improved memory layout I then discuss and evaluate different strategies for splitting the problem into smaller chunks suitable for processing in separate threads. My final implementation is a threaded implementation using pthreads and the improved memory layout. This turns out to work very well, but the gain in efficiency drops off quickly when I use four or more threads.
The conclusion is that the new algorithm gives a large performance boost on sparse models and can even compete on complete models. I suggest continuing with this strategy, especially for sparse models.