15.12.2012 Views

Bioinformatics, Volume I Data, Sequence Analysis and Evolution


262 Liò and Bishop

4. Codon Models

data substantially better than a series of independent pairwise nucleotide distributions (a zero-order Markov chain). They also used a Markov model on a phylogenetic tree, parameterized by a di-nucleotide rate matrix and an independent-site equilibrium sequence distribution, and estimated substitution parameters using an expectation maximization (EM) procedure (23).
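To make the term "zero-order Markov chain" concrete, here is a minimal sketch comparing the maximized log-likelihood of a zero-order chain (each base independent of its neighbors) with that of a first-order chain (each base conditioned on the preceding base) along a single sequence. The function names and the toy sequence are illustrative assumptions, not taken from the cited work, and the sketch ignores the phylogenetic tree entirely. Because the first-order likelihood is conditional on the first base, the zero-order model is evaluated on the same positions (`seq[1:]`).

```python
import math
from collections import Counter


def zero_order_loglik(seq):
    """Maximized log-likelihood when every base is drawn independently."""
    counts = Counter(seq)
    n = len(seq)
    return sum(c * math.log(c / n) for c in counts.values())


def first_order_loglik(seq):
    """Maximized log-likelihood of bases 2..n, each conditioned on its predecessor."""
    pair_counts = Counter(zip(seq, seq[1:]))  # counts of (previous, current)
    prev_counts = Counter(seq[:-1])           # how often each base is a predecessor
    return sum(
        c * math.log(c / prev_counts[prev])
        for (prev, _cur), c in pair_counts.items()
    )


if __name__ == "__main__":
    toy = "ACACACACGTGTGTGT"  # hypothetical sequence with strong neighbor dependence
    print("zero-order :", zero_order_loglik(toy[1:]))
    print("first-order:", first_order_loglik(toy))
```

On the same positions the first-order value can never be lower than the zero-order value, so the size of the gap, weighed against the extra parameters, is what indicates whether neighbor dependence is worth modeling.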

Whelan and Goldman (24) developed the singlet-doublet-triplet (SDT) model, which incorporates events that change one, two, or three adjacent nucleotides. This model allows for neighbor- or context-dependent models of base substitution, which consider the N bases preceding each base and are capable of capturing the dependence of substitution patterns on neighboring bases. They found that including doublet and triplet mutations in the model gives statistically significant improvements in the fit of the model to the data, indicating that larger-scale mutation events do occur. There are indications that higher-order states, autocorrelated rates, and multiple functional categories all improve the fit of the model and that the improvements are roughly additive. The effect of higher-order states (context dependence) is particularly pronounced.
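One common way to judge whether extra parameters (here, doublet and triplet rates) give a statistically significant improvement in fit is a likelihood-ratio test between nested models. The statement below is the generic form of that test, offered as an assumption about how such a comparison can be made rather than as the procedure reported in (24). The symbols $\hat{L}_0$ and $\hat{L}_1$ (maximized likelihoods of the simpler and richer models) and $p_0$ and $p_1$ (their numbers of free parameters) are introduced here for illustration only.

$$
2\Delta = 2\left(\ln \hat{L}_1 - \ln \hat{L}_0\right), \qquad 2\Delta \ \text{approximately} \sim \chi^2_{\,p_1 - p_0} \ \text{under the simpler model}
$$

The $\chi^2$ approximation relies on standard regularity conditions. When the simpler model is obtained by fixing parameters at a boundary of the parameter space (for example, rates set to zero), the null distribution can differ from a plain $\chi^2$ and is often a mixture of $\chi^2$ distributions.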

In an attempt to introduce greater biological reality through knowledge of the genetic code and the consequent effect of nucleotide substitutions in protein coding sequences on the encoded amino acid sequences, Goldman and Yang (25, 26) described a codon mutation model. They considered the 61 sense codons $i$ consisting of nucleotides $i_1 i_2 i_3$. The rate matrix $Q$ consisted of elements $Q_{ij}$ describing the rate of change of codon $i = i_1 i_2 i_3$ to $j = j_1 j_2 j_3$ ($i \neq j$) depending on the number and type of differences between $i_1$ and $j_1$, $i_2$ and $j_2$, and $i_3$ and $j_3$ as follows:

$$
Q_{ij} =
\begin{cases}
0 & \text{if 2 or 3 of the pairs } (i_k, j_k) \text{ are different} \\
\mu \, \pi_j \, e^{-d_{aa_i, aa_j}/V} & \text{if one pair differs by a transversion} \\
\mu \, \kappa \, \pi_j \, e^{-d_{aa_i, aa_j}/V} & \text{if one pair differs by a transition}
\end{cases}
$$

where $d_{aa_i, aa_j}$ is the distance between the amino acid coded by the codon $i$ ($aa_i$) and the amino acid coded by the codon $j$ ($aa_j$) as calculated by Grantham (27) on the basis of the physicochemical properties of the amino acids. This model takes account of codon frequencies (through $\pi_j$), transition/transversion bias (through $\kappa$), differences in amino acid properties between different codons
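
The rate rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The standard genetic code is used to obtain the 61 sense codons. `toy_distance` is a hypothetical placeholder and is NOT the Grantham distance (a real analysis would substitute the published Grantham table). The default values for `mu`, `kappa`, and `V` are arbitrary. Setting each diagonal entry so that its row sums to zero is the usual rate-matrix convention and is an addition here, since the text defines only the off-diagonal entries.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SENSE = [codon for codon, aa in CODE.items() if aa != "*"]  # the 61 sense codons

PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def toy_distance(aa_i, aa_j):
    """Hypothetical placeholder distance; NOT the Grantham matrix."""
    return 0.0 if aa_i == aa_j else 100.0


def rate(i, j, pi, mu=1.0, kappa=2.0, V=50.0, dist=toy_distance):
    """Off-diagonal rate Q_ij from sense codon i to sense codon j (i != j)."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0  # 2 or 3 of the nucleotide pairs differ
    a, b = diffs[0]
    base = mu * pi[j] * math.exp(-dist(CODE[i], CODE[j]) / V)
    return kappa * base if is_transition(a, b) else base


def build_rate_matrix(pi, **params):
    """61 x 61 matrix as nested lists; diagonal chosen so each row sums to zero."""
    n = len(SENSE)
    Q = [[0.0] * n for _ in range(n)]
    for r, i in enumerate(SENSE):
        for c, j in enumerate(SENSE):
            if r != c:
                Q[r][c] = rate(i, j, pi, **params)
        Q[r][r] = -sum(Q[r])  # diagonal is still 0.0 when the sum is taken
    return Q
```

A call such as `build_rate_matrix({c: 1 / 61 for c in SENSE}, kappa=2.0)` builds the matrix for uniform codon frequencies. Passing a different `dist` function is how a real amino acid distance table would be plugged in.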
