15.12.2012 Views

Bioinformatics, Volume I Data, Sequence Analysis and Evolution


2.3. Modeling the Process at a Single Site in Two Sequences

Phylogenetic Model Evaluation

where $p_{ij}(t)$ is the probability that the nucleotide is $j$ at time $t$, given that it was $i$ at time 0. Assuming a homogeneous Markov process, let $r_{ij}$ be the instantaneous rate of change from nucleotide $i$ to nucleotide $j$, and let $R$ be the matrix of these rates of change. Then, representing $p_{ij}(t)$ in matrix notation as $P(t)$, we can write equation [1] as:

$$P(t) = I + Rt + \frac{(Rt)^2}{2!} + \frac{(Rt)^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{(Rt)^k}{k!} = e^{Rt} \qquad [2]$$

where $R$ is a time-independent rate matrix satisfying three conditions:

1. $r_{ij} > 0$ for $i \neq j$;
2. $r_{ii} = -\sum_{j \neq i} r_{ij}$, implying that $R\mathbf{1} = \mathbf{0}$, where $\mathbf{1}^T = (1, 1, 1, 1)$ and $\mathbf{0}^T = (0, 0, 0, 0)$; this condition is needed to ensure that $P(t)$ is a valid transition matrix for $t \geq 0$;
3. $\pi^T R = \mathbf{0}^T$, where $\pi^T = (\pi_1, \pi_2, \pi_3, \pi_4)$ is the stationary distribution, $0 < \pi_j < 1$, and $\sum_{j=1}^{4} \pi_j = 1$.
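The three conditions and the series expansion of P(t) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a simple illustrative rate matrix in which the rate of change to nucleotide j equals its stationary frequency (an F81-style choice made here only for the example), and it approximates the exponential by truncating the power series after a fixed number of terms. All function names, the frequencies in `pi`, and the value of `t` are hypothetical.

```python
import math

def f81_rate_matrix(pi):
    """Illustrative rate matrix (assumption): r_ij = pi_j for i != j."""
    n = len(pi)
    R = [[pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        R[i][i] = -sum(R[i])  # condition 2: every row sums to zero
    return R

def satisfies_conditions(R, pi, tol=1e-12):
    """Check conditions 1-3 for a rate matrix R and a candidate stationary pi."""
    n = len(R)
    positive = all(R[i][j] > 0 for i in range(n) for j in range(n) if i != j)
    rows_zero = all(math.isclose(sum(row), 0.0, abs_tol=tol) for row in R)
    pi_valid = all(0 < p < 1 for p in pi) and math.isclose(sum(pi), 1.0, abs_tol=tol)
    pi_R_zero = all(
        math.isclose(sum(pi[i] * R[i][j] for i in range(n)), 0.0, abs_tol=tol)
        for j in range(n)
    )
    return positive and rows_zero and pi_valid and pi_R_zero

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(R, t, terms=60):
    """Approximate P(t) = sum_k (Rt)^k / k! by a truncated power series."""
    n = len(R)
    Rt = [[r * t for r in row] for row in R]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0: I
    term = [row[:] for row in P]
    for k in range(1, terms):
        # (Rt)^k / k! is obtained from the previous term times Rt / k
        term = [[x / k for x in row] for row in mat_mul(term, Rt)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return P

pi = [0.1, 0.2, 0.3, 0.4]        # hypothetical stationary frequencies
R = f81_rate_matrix(pi)
P = transition_matrix(R, t=0.5)  # each row of P is a probability distribution
```

The truncated series is adequate here because the entries of Rt are small; for larger t or faster rates, a dedicated matrix-exponential routine would be the more robust choice.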

In addition, if $f_{0j}$ denotes the frequency of the $j$th nucleotide in the ancestral sequence, then the Markov process governing the evolution of a site along a single edge is:

1. Stationary, if $\Pr(X(t) = j) = f_{0j} = \pi_j$, for $j = 1, 2, 3, 4$, where $\pi$ is the stationary distribution, and
2. Reversible, if the balance equation $\pi_i r_{ij} = \pi_j r_{ji}$ is met for $1 \leq i, j \leq 4$, where $\pi$ is the stationary distribution.
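The two definitions can be turned into simple numerical checks. The sketch below is illustrative only: it tests stationarity by verifying that the ancestral frequencies are left unchanged by the rate matrix (which, assuming the stationary distribution is unique, is equivalent to the ancestral frequencies being equal to it), and it tests reversibility through the balance equation. The two example rate matrices, the helper names, and the tolerance are assumptions introduced for the example.

```python
def is_stationary(f0, R, tol=1e-12):
    """True if the ancestral frequencies f0 satisfy f0^T R = 0."""
    n = len(R)
    return all(abs(sum(f0[i] * R[i][j] for i in range(n))) < tol
               for j in range(n))

def is_reversible(pi, R, tol=1e-12):
    """True if the balance equation pi_i r_ij = pi_j r_ji holds for all i, j."""
    n = len(R)
    return all(abs(pi[i] * R[i][j] - pi[j] * R[j][i]) < tol
               for i in range(n) for j in range(i + 1, n))

def with_diagonal(off_diagonal):
    """Fill in r_ii so that every row of the rate matrix sums to zero."""
    R = [row[:] for row in off_diagonal]
    for i in range(len(R)):
        R[i][i] = -sum(R[i][j] for j in range(len(R)) if j != i)
    return R

# Hypothetical reversible process: r_ij = pi_j for i != j.
pi = [0.1, 0.2, 0.3, 0.4]
R_rev = with_diagonal([[pi[j] for j in range(4)] for _ in range(4)])

# Hypothetical non-reversible process: a circulant matrix in which change to
# the "next" nucleotide is three times faster than any other change. Its
# columns also sum to zero, so the uniform distribution is stationary.
uniform = [0.25, 0.25, 0.25, 0.25]
R_cyc = with_diagonal([[3.0 if j == (i + 1) % 4 else 1.0 for j in range(4)]
                       for i in range(4)])

checks = {
    "R_rev stationary from pi": is_stationary(pi, R_rev),
    "R_rev reversible": is_reversible(pi, R_rev),
    "R_cyc stationary from uniform": is_stationary(uniform, R_cyc),
    "R_cyc reversible": is_reversible(uniform, R_cyc),
    "R_rev stationary from uniform": is_stationary(uniform, R_rev),
}
```

The circulant example shows that a process can be stationary without being reversible, and the last check shows that the same rate matrix gives a non-stationary process when the ancestral frequencies differ from the stationary distribution.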

R is an essential component of the Markov models used to describe the accumulation of point mutations at a single site of a nucleotide sequence. Each element of R helps determine which state the site will be in at time t, so it is useful to understand the implications of changing the values of the elements of R. For this reason, it is unwise to treat the rate of evolution as a single variable when it is, in fact, a matrix of rates.
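A small numerical comparison illustrates why a single rate is not enough. The sketch below, offered under stated assumptions, builds two rate matrices whose total rate of leaving each nucleotide is the same, but which distribute that rate differently among the possible changes: one with all changes equally fast, and one in which transitions (A to G, C to T and the reverse) are faster than transversions. The nucleotide order A, C, G, T, the parameter values, and all function names are hypothetical choices for the example.

```python
import math

def two_rate_matrix(alpha, beta):
    """Rate matrix with transition rate alpha and transversion rate beta.

    Assumed nucleotide order: A, C, G, T (transitions are A<->G and C<->T).
    """
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    R = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                R[i][j] = alpha if (i, j) in transitions else beta
        R[i][i] = -math.fsum(R[i])  # rows sum to zero
    return R

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(R, t, terms=60):
    """Approximate P(t) = e^{Rt} by a truncated power series."""
    n = len(R)
    Rt = [[r * t for r in row] for row in R]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, Rt)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return P

t = 0.5
R_equal = two_rate_matrix(1 / 3, 1 / 3)  # every change equally fast
R_biased = two_rate_matrix(0.8, 0.1)     # transitions eight times faster

# Both matrices have the same total rate of leaving a nucleotide (1.0) ...
rate_equal = -R_equal[0][0]
rate_biased = -R_biased[0][0]

# ... but the probabilities of the individual changes differ.
P_equal = transition_matrix(R_equal, t)
P_biased = transition_matrix(R_biased, t)
a_to_g = (P_equal[0][2], P_biased[0][2])  # a transition
a_to_c = (P_equal[0][1], P_biased[0][1])  # a transversion
```

Under the biased matrix, the probability of observing G at a site that started as A is higher, and the probability of observing C is lower, than under the equal-rates matrix, even though the overall rate of change is identical.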

Consider a site in a pair of nucleotide sequences that evolve from a common ancestor by independent Markov processes. Let $X$ and $Y$ denote the Markov processes operating at the site, one along each edge, and let $P_X(t)$ and $P_Y(t)$ be the transition functions that describe the Markov processes $X(t)$ and $Y(t)$. The joint probability that the sequences contain nucleotides $i$ and $j$, respectively, is then given by:

$$f_{ij}(t) = \Pr[X(t) = i, Y(t) = j \mid X(0) = Y(0)], \qquad [3]$$
