17.11.2012 Views

Codon Evolution Mechanisms and Models


tRNA availability. In short: (1) purine two-codon amino acids prefer A-ending codons over G-ending; (2) pyrimidine two-codon amino acids prefer C-ending codons over U-ending; (3) if a tRNA carries inosine at the wobble position, U- and C-ending codons are preferred over A-ending ones; (4) codons with higher tRNA abundance are preferred; and (5) codons that are decoded by more than one tRNA isoacceptor are preferred. tRNA abundance is probably the most important of these constraints (Ikemura, 1985). Therefore, a convenient way to define translationally optimal codons is as those codons that are cognate to the most abundant tRNA isoacceptor in each codon family. tRNA abundances can be inferred from tRNA gene copy numbers in genome data. Since tRNA abundance and codon usage are highly correlated, optimal codons can alternatively be defined as those that are the most common.
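The frequency-based definition can be sketched in a few lines of Python. This is a minimal illustration rather than the chapter's procedure: it assumes the standard genetic code, in-frame sequences, and a user-supplied reference set (for example, highly expressed genes). The names `CODE` and `most_common_codons` are hypothetical, and breaking ties alphabetically is an arbitrary choice.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases in TCAG order (assumed; other codes differ).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def most_common_codons(seqs):
    """Return {amino acid: most frequent codon} over in-frame sequences.

    Stop codons and single-codon amino acids (Met, Trp) are skipped,
    since they carry no information about codon preference.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    redundancy = Counter(CODE.values())  # number of codons per amino acid
    best = {}
    for codon in sorted(counts):  # sorted: ties go to the first codon alphabetically
        aa = CODE.get(codon)
        if aa is None or aa == "*" or redundancy[aa] == 1:
            continue
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


# Toy reference set (illustrative data only): Met, 3 Lys codons, 3 Glu codons.
reference = ["ATGAAGAAGAAAGAAGAAGAG"]
print(most_common_codons(reference))  # {'K': 'AAG', 'E': 'GAA'}
```

With a real genome, the reference set matters: the most common codons across all genes need not coincide with those favoured in highly expressed genes.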

The frequency of optimal codons is the ratio of the number of optimal codons to the total number of codons:

\[ F_{op} = \frac{o_{opt}}{o_{tot}}. \tag{13.6} \]

The number of optimal codons is:

\[ o_{opt} = \sum_{c \in C_{opt}} o_c. \tag{13.7} \]

Here $o_c$ is the number of occurrences of codon $c$ in the sequence. The subset of optimal codons, $C_{opt}$, is chosen according to the above criteria from the set $C$ of all codons included in the analysis. Amino acids with one codon do not contribute any information and are omitted. Amino acids with one isoacceptor are often excluded when the optimal codon cannot be determined. The total number of codons, $o_{tot}$, counts only the codons included in the analysis.
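As a worked illustration of Eqs 13.6 and 13.7, the following Python sketch computes Fop for a single in-frame sequence. The function names, the toy sequence, and the sets `INCLUDED` and `OPTIMAL` are assumptions made for the example; in practice the optimal codons would come from tRNA data or a reference gene set.

```python
from collections import Counter


def count_codons(seq):
    """Split an in-frame coding sequence into codons and count them."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def fop(seq, optimal, included):
    """Frequency of optimal codons, o_opt / o_tot (Eq. 13.6).

    `included` is the set C of codons analysed; `optimal` is C_opt.
    Codons outside `included` (e.g. Met, Trp, stops) are ignored.
    """
    counts = count_codons(seq)
    o_tot = sum(n for c, n in counts.items() if c in included)
    # Eq. 13.7: sum the counts o_c over the optimal codons only.
    o_opt = sum(n for c, n in counts.items() if c in included and c in optimal)
    if o_tot == 0:
        raise ValueError("sequence contains no codons included in the analysis")
    return o_opt / o_tot


# Toy setting (hypothetical): analyse only Lys (AAA, AAG) and Glu (GAA, GAG).
INCLUDED = {"AAA", "AAG", "GAA", "GAG"}
OPTIMAL = {"AAG", "GAA"}  # assumed optimal codons, for illustration only

# Codons: ATG AAG AAG AAA GAA GAG TGG -> 5 analysed, 3 of them optimal.
print(fop("ATGAAGAAGAAAGAAGAGTGG", OPTIMAL, INCLUDED))  # 0.6
```

Passing the analysed set explicitly keeps the exclusion rules (single-codon amino acids, undetermined optima) visible at the call site rather than buried in the function.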

13.5.2.2 Codon bias index (CBI)

The codon bias index also measures the extent to which preferred codons are used in a gene (Bennetzen and Hall, 1982). The preferred codons are defined as codons frequent in highly expressed genes and codons cognate to the major tRNA species. It is similar to Fop, but uses the expected usage as a scaling factor and thus is normalized between −1 and 1. A value of 1 means only preferred codons are used, zero means random choice, and less than zero implies greater use of nonpreferred codons:

\[ \mathrm{CBI} = \frac{o_{opt} - e_{rand}}{o_{tot} - e_{rand}}, \tag{13.8} \]

where $o_{opt}$ is the number of optimal (preferred) codons, $o_{tot}$ is the total number of codons, and $e_{rand}$ is the expected number of optimal codons if random codon assignments were made for each amino acid. $e_{rand}$ accounts for the random effect of codon usage and is computed as follows:

\[ e_{rand} = \sum_{a \in A} o_a \frac{n^{opt}_a}{k_a}, \tag{13.9} \]

where $o_a$ is the number of occurrences of amino acid $a$ in the sequence, $n^{opt}_a$ is the number of optimal codons for amino acid $a$, and $k_a$ is the codon redundancy of $a$ (its number of synonymous codons).

Amino acids with only one codon are excluded from the analysis, as, occasionally, are amino acids that show little preference towards a single codon (e.g. Asp in yeast).
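Eqs 13.8 and 13.9 can be combined into a short Python sketch. It assumes the standard genetic code and, as a simplification not stated in the text, analyses only amino acids that have more than one codon and at least one codon in the supplied optimal set. The names `CODE` and `cbi` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases in TCAG order (assumed; other codes differ).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def cbi(seq, optimal):
    """Codon bias index (Eq. 13.8) of one in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

    k = Counter(CODE.values())                 # k_a: codon redundancy
    n_opt = Counter(CODE[c] for c in optimal)  # n_a^opt: optimal codons per amino acid
    # Assumption: analyse only degenerate amino acids with a defined optimum.
    analysed = {a for a in n_opt if a != "*" and k[a] > 1}

    used = [c for c in codons if CODE.get(c) in analysed]
    o_tot = len(used)
    o_opt = sum(c in optimal for c in used)
    o_a = Counter(CODE[c] for c in used)       # o_a: occurrences of amino acid a

    # Eq. 13.9: expected number of optimal codons under random codon choice.
    e_rand = sum(o_a[a] * n_opt[a] / k[a] for a in analysed)
    if o_tot == e_rand:
        raise ValueError("CBI undefined: o_tot equals e_rand")
    return (o_opt - e_rand) / (o_tot - e_rand)


OPTIMAL = {"AAG", "GAA"}  # assumed optimal codons for Lys and Glu (illustrative)

# Codons: ATG AAG AAG AAA GAA GAG TGG -> o_opt = 3, o_tot = 5, e_rand = 2.5
print(round(cbi("ATGAAGAAGAAAGAAGAGTGG", OPTIMAL), 3))  # 0.2
```

In this toy setting a sequence using only AAG and GAA gives 1, and one using only AAA and GAG gives −1, because each amino acid has two codons of which one is optimal.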

13.5.2.3 Codon usage bias (B)

The codon usage bias (B) assesses the codon bias of a test set (a gene or group of genes) relative to a second, reference set of genes (Karlin and Mrázek, 1996; Karlin et al., 1998). The reference set, composed of a gene class, an entire genome, or a single gene, is used as a standard to which other genes or groups of genes can be compared. This metric is defined as the amino acid frequency weighted sum of distances between the relative codon usage frequencies of the two sets, $f$ and $f^{ref}$:

\[ B = \sum_{a \in A} F_a \, d(f_a, f^{ref}_a), \tag{13.10} \]

where $F_a$ is the frequency of amino acid $a$ in the test set, the vectors $f_a$ and $f^{ref}_a$ hold the codon frequencies for amino acid $a$ in the test and reference sets respectively, and $d$ is the 1-norm distance between the codon vectors of amino acid $a$:

\[ d(f_a, f^{ref}_a) = \sum_{c \in C_a} \left| f_{ac} - f^{ref}_{ac} \right| \tag{13.11} \]
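A minimal Python sketch of Eqs 13.10 and 13.11 follows. It assumes the standard genetic code and in-frame sequences, normalises codon frequencies within each amino acid family, and treats an amino acid that is absent from the reference set as having an all-zero reference vector (an arbitrary choice for the example). The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases in TCAG order (assumed; other codes differ).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def family_frequencies(seqs):
    """Return (f, o): codon frequencies within each amino acid family,
    and amino acid counts, over a set of in-frame sequences."""
    codon_n, aa_n = Counter(), Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            aa = CODE.get(codon)
            if aa is not None and aa != "*":  # skip stops and ambiguous codons
                codon_n[codon] += 1
                aa_n[aa] += 1
    f = {c: n / aa_n[CODE[c]] for c, n in codon_n.items()}
    return f, aa_n


def codon_usage_bias(test_seqs, ref_seqs):
    """B of the test set relative to the reference set (Eq. 13.10)."""
    f, aa_n = family_frequencies(test_seqs)
    f_ref, _ = family_frequencies(ref_seqs)
    total = sum(aa_n.values())
    if total == 0:
        raise ValueError("test set contains no sense codons")
    b = 0.0
    for codon, aa in CODE.items():
        if aa_n[aa] == 0:  # F_a = 0 (also skips stop codons)
            continue
        # F_a times one term of the 1-norm distance (Eq. 13.11)
        b += aa_n[aa] / total * abs(f.get(codon, 0.0) - f_ref.get(codon, 0.0))
    return b


test_set = ["AAGAAGAAAGAAGAG"]  # Lys: AAG 2/3, AAA 1/3; Glu: GAA 1/2, GAG 1/2
ref_set = ["AAGAAAGAAGAA"]      # Lys: AAG 1/2, AAA 1/2; Glu: GAA 1, GAG 0
print(round(codon_usage_bias(test_set, ref_set), 3))  # 0.6
```

Here the Lys family contributes 0.6 × 1/3 and the Glu family 0.4 × 1, giving B = 0.6; comparing a set with itself gives 0.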
