01.04.2015 Views

Sequence Comparison.pdf


8 Scoring Matrices<br />

[Figure 8.1: PAM distance (vertical axis, 0 to 350) plotted against observed percent difference (horizontal axis, 0 to 85).]<br />

Fig. 8.1 The correspondence between the PAM evolutionary distances and the dissimilarity levels.<br />

There is no general theory on constructing scoring matrices for local alignments<br />

with gaps. One issue in gapped alignment is selecting appropriate gap costs. In Section<br />

8.7, we briefly discuss the affine gap cost model and related issues in gapped<br />

alignment.<br />

We conclude the chapter with the bibliographic notes in Section 8.8.<br />

8.1 The PAM Scoring Matrices<br />

The PAM matrices are the first amino acid substitution matrices for protein sequence<br />

comparison. These matrices were first constructed by Dayhoff and coworkers based<br />

on a Markov chain model of evolution. A point accepted mutation in a protein is<br />

a substitution of one amino acid by another that is “accepted” by natural selection.<br />

For a mutation to be accepted, the resulting amino acid must have the same function<br />

as the original one. A PAM unit is an evolutionary time period over which 1% of<br />

the amino acids in a sequence are expected to undergo accepted mutations. Because<br />

a mutation might occur several times at a position, two protein sequences that are<br />

100 PAM diverged are not necessarily different in every position; instead, they are<br />

expected to be different in about 52% of positions. Similarly, two protein sequences<br />

that are 250 PAM diverged have only roughly 80% dissimilarity. The correspondence<br />

between the PAM evolutionary distance and the dissimilarity level is shown<br />

in Figure 8.1.<br />
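The saturation effect described above can be illustrated with a deliberately simplified sketch. It assumes a toy model, not Dayhoff's actual matrix: all 20 amino acids are equally frequent and equally mutable, and each PAM step changes a position with probability 0.01, picking uniformly among the other 19 residues. The function name `expected_percent_difference` and its parameters are illustrative assumptions.

```python
def expected_percent_difference(pam_distance, rate=0.01, alphabet_size=20):
    """Expected percent of differing positions after `pam_distance` PAM steps.

    Toy model (an assumption, not Dayhoff's data): every residue mutates
    with the same probability `rate` per step and is replaced by any of
    the other residues with equal probability.
    """
    p_same = 1.0  # probability that a position shows its original residue
    back = rate / (alphabet_size - 1)  # chance of mutating back to the original
    for _ in range(pam_distance):
        # Either the position was unchanged and stays put, or it was
        # changed and a new mutation happens to restore the original.
        p_same = p_same * (1.0 - rate) + (1.0 - p_same) * back
    return 100.0 * (1.0 - p_same)


for distance in (1, 50, 100, 250):
    print(distance, round(expected_percent_difference(distance), 1))
```

Under this uniform assumption the curve gives roughly 62% at 100 PAM and 88% at 250 PAM, and it levels off below 95%. These values are higher than the 52% and 80% quoted above, which is expected since the real PAM model uses unequal amino acid frequencies and mutabilities. What carries over is the qualitative shape: the observed difference grows more slowly than the PAM distance.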

Dayhoff and her coworkers first constructed the amino acid substitution matrix<br />

for one PAM time unit, and then extrapolated it to other PAM distances. The<br />

construction started with 71 blocks of aligned protein sequences. In each of these<br />

blocks, no sequence differs from any other sequence by more than 15%. The high<br />

within-block similarity was imposed to minimize the number of observed differences that<br />

may have resulted from multiple substitutions at the same position.
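Under a Markov chain model, extrapolating a one-step matrix to n steps amounts to raising it to the n-th power. The sketch below shows this on a hypothetical 3-letter alphabet. The matrix `TOY_PAM1`, its values, the row-to-column convention, and the helper names are all invented for illustration and are not Dayhoff's data.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(matrix, n):
    """Raise a square matrix to the non-negative integer power n."""
    size = len(matrix)
    # Start from the identity matrix (zero steps of evolution).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = matrix
    while n > 0:  # exponentiation by repeated squaring
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Hypothetical one-step matrix: entry [i][j] is the probability that
# letter i is replaced by letter j in one PAM unit. Each row sums to 1
# and each diagonal entry is 0.99, so 1% of positions change per step.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.007, 0.990, 0.003],
    [0.002, 0.008, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

Repeated squaring keeps the number of matrix multiplications logarithmic in n. For a 20 by 20 matrix and n of a few hundred, plain repeated multiplication would work just as well.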
