Chapter 8
Scoring Matrices

With the introduction of the dynamic programming algorithm for comparing protein sequences in the 1970s, a need arose for scoring amino acid substitutions. Since then, the construction of scoring matrices has become a key issue in sequence comparison. A variety of considerations, such as physicochemical properties and three-dimensional structure, have been used to derive amino acid scoring (or substitution) matrices.

The chapter is divided into eight sections. The PAM matrices are introduced in Section 8.1. We first define the PAM evolutionary distance. We then describe Dayhoff's method of constructing the PAM matrices. Frequently used PAM matrices are listed at the end of this section.
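
The core idea behind Dayhoff's construction is to extrapolate a 1-PAM mutation probability matrix M (about 1% of residues changed) to larger distances by taking matrix powers, so that the n-PAM matrix is M^n, and then to score each pair by the log of M^n[i][j] relative to the background frequency f[j]. The sketch below illustrates only that extrapolation step. It assumes a hypothetical three-letter alphabet with made-up frequencies (not real amino acid statistics), and the names `pam1_matrix` and `pam_scores` are illustrative. Scores are expressed in bits here; the scale is a free choice.

```python
import math

# Hypothetical toy data (NOT real amino acid statistics): a 3-letter
# alphabet, its background frequencies f, and symmetric joint frequencies
# of accepted changes i <-> j over 1 PAM. The off-diagonal joint mass is
# 2 * (0.003 + 0.001 + 0.001) = 0.01, i.e. 1% of residues change.
ALPHABET = "ABC"
FREQS = [0.5, 0.3, 0.2]
JOINT_CHANGE = {(0, 1): 0.003, (0, 2): 0.001, (1, 2): 0.001}


def pam1_matrix(freqs, joint):
    """1-PAM mutation probability matrix: m[i][j] = P(i replaced by j)."""
    n = len(freqs)
    m = [[0.0] * n for _ in range(n)]
    for (i, j), v in joint.items():
        m[i][j] = v / freqs[i]
        m[j][i] = v / freqs[j]
    for i in range(n):  # each row must sum to 1
        m[i][i] = 1.0 - sum(m[i][j] for j in range(n) if j != i)
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, power):
    """m**power by repeated squaring (power >= 0)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    while power > 0:
        if power & 1:
            result = matmul(result, m)
        m = matmul(m, m)
        power >>= 1
    return result


def pam_scores(n_pam, freqs=FREQS, joint=JOINT_CHANGE):
    """Log-odds scores in bits: log2(M^n[i][j] / f[j])."""
    mn = matrix_power(pam1_matrix(freqs, joint), n_pam)
    size = len(freqs)
    return [[math.log2(mn[i][j] / freqs[j]) for j in range(size)]
            for i in range(size)]


for row in pam_scores(250):
    print(["%.2f" % s for s in row])
```

Because the toy joint frequencies are symmetric (f[i]·M[i][j] = f[j]·M[j][i]), every power of M keeps that property and the resulting score matrix is symmetric. As n grows, the rows of M^n approach the background frequencies and all scores drift toward zero.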

In Section 8.2, after briefly introducing the BLOCKS database, we describe Henikoff and Henikoff's method of constructing the BLOSUM matrices. Frequently used BLOSUM matrices are also listed.
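
As commonly described, Henikoff and Henikoff's construction counts residue pairs within each column of ungapped blocks, converts the counts into observed pair frequencies q_ij, derives background frequencies p_i = q_ii + Σ_{j≠i} q_ij / 2, and scores each pair as a log-odds ratio against its expected frequency e_ij (p_i² for identical residues, 2·p_i·p_j otherwise). The sketch below is simplified: it omits the step that clusters sequences above a percent-identity threshold, returns unrounded half-bit scores (2·log2), and scores only observed pairs. The function `blosum_scores` and the toy block are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(blocks):
    """Half-bit log-odds scores 2*log2(q_ij / e_ij) from ungapped blocks.

    Simplified: every sequence gets equal weight (no percent-identity
    clustering), and only observed residue pairs receive a score.
    """
    pair_counts = Counter()
    for block in blocks:
        for column in zip(*block):  # one aligned column at a time
            for a, b in combinations(column, 2):
                pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}

    # p_i = q_ii + sum_{j != i} q_ij / 2
    p = Counter()
    for (a, b), v in q.items():
        p[a] += v / 2
        p[b] += v / 2

    scores = {}
    for (a, b), v in q.items():
        expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * math.log2(v / expected)
    return scores


toy_blocks = [["AA", "AB", "AB"]]  # hypothetical block: 3 sequences, 2 columns
print(blosum_scores(toy_blocks))
```

For this toy block, identical pairs (A-A, B-B) score positive and the A-B pair scores negative, since A-B is observed less often than the background frequencies predict.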

In Section 8.3, we show that in seeking local alignments without gaps, any amino acid scoring matrix essentially takes a log-odds form. There is a one-to-one correspondence between the so-called valid scoring matrices and the sets of target and background frequencies. Moreover, given a valid scoring matrix, its implicit target and background frequencies can be retrieved efficiently.
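
The log-odds form can be illustrated in a simpler setting than the full retrieval: if the background frequencies p_i are assumed known, the scale λ is the unique positive solution of Σ_ij p_i p_j e^{λ s_ij} = 1 (provided the expected score is negative and some score is positive), and the implicit target frequencies are q_ij = p_i p_j e^{λ s_ij}. The sketch below assumes p is given; it does not recover the background frequencies from the matrix alone, which the full procedure does. The functions `solve_lambda` and `implicit_targets`, and the toy frequencies, are hypothetical.

```python
import math


def solve_lambda(scores, freqs, tol=1e-12):
    """Unique positive root of sum_ij p_i p_j exp(lam * s_ij) = 1.

    Requires a negative expected score and at least one positive score;
    the left-hand side is then convex in lam, equals 1 at lam = 0, dips
    below 1, and grows without bound, so bisection finds the root.
    """
    n = len(freqs)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    expected = sum(freqs[i] * freqs[j] * scores[i][j] for i, j in pairs)
    if expected >= 0 or max(scores[i][j] for i, j in pairs) <= 0:
        raise ValueError("need negative expected score and a positive score")

    def f(lam):
        return sum(freqs[i] * freqs[j] * math.exp(lam * scores[i][j])
                   for i, j in pairs) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:  # widen the bracket until f(hi) > 0
        hi *= 2.0
    while hi - lo > tol:  # f < 0 on (0, root) and f > 0 beyond it
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def implicit_targets(scores, freqs):
    """Return lam and the implied targets q_ij = p_i p_j exp(lam * s_ij)."""
    lam = solve_lambda(scores, freqs)
    n = len(freqs)
    return lam, [[freqs[i] * freqs[j] * math.exp(lam * scores[i][j])
                  for j in range(n)] for i in range(n)]


p = [0.5, 0.5]                    # hypothetical background frequencies
q = [[0.35, 0.15], [0.15, 0.35]]  # hypothetical target frequencies
s = [[math.log2(q[i][j] / (p[i] * p[j])) for j in range(2)] for i in range(2)]
lam, q_back = implicit_targets(s, p)  # lam is ln 2 (about 0.6931): s is in bits
```

Because the toy scores are unrounded log-odds values in bits, the round trip recovers λ = ln 2 and the original targets. With integer-rounded matrices, the recovered λ and targets are only approximately those used to build the matrix.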

The log-odds form of scoring matrices suggests that the quality of database search results depends on the proper choice of scoring matrix. Section 8.4 describes how to select a scoring matrix for database search using an information-theoretic method.
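
One quantity commonly used in this information-theoretic view is the relative entropy of the target frequencies with respect to the background, H = Σ_ij q_ij log2(q_ij / (p_i p_j)), the average information (in bits) per aligned pair. The sketch below computes H either directly from target and background frequencies or from a scoring matrix with a known scale λ. The function names and toy frequencies are hypothetical.

```python
import math


def relative_entropy_bits(targets, freqs):
    """H = sum_ij q_ij log2(q_ij / (p_i p_j)): average bits per aligned pair."""
    n = len(freqs)
    return sum(targets[i][j] * math.log2(targets[i][j] / (freqs[i] * freqs[j]))
               for i in range(n) for j in range(n) if targets[i][j] > 0)


def relative_entropy_from_scores(scores, freqs, lam):
    """Same H for a matrix with scale lam, using q_ij = p_i p_j e^(lam s_ij):
    H = (lam / ln 2) * sum_ij q_ij s_ij."""
    n = len(freqs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            q_ij = freqs[i] * freqs[j] * math.exp(lam * scores[i][j])
            total += q_ij * scores[i][j]
    return lam * total / math.log(2)


p = [0.5, 0.5]                    # hypothetical background frequencies
q = [[0.35, 0.15], [0.15, 0.35]]  # hypothetical target frequencies
h = relative_entropy_bits(q, p)   # about 0.119 bits per aligned pair
```

The second form is convenient for integer-valued matrices, where the scores are not directly in bits and λ must first be determined.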

When comparing protein sequences with biased amino acid compositions, standard scoring matrices are no longer optimal. Section 8.5 introduces a general procedure for converting a standard scoring matrix into one suitable for comparing two sequences with biased compositions.

For certain applications of DNA sequence comparison, a nontrivial scoring matrix is critical. In Section 8.6, we discuss a variant of Dayhoff's method for constructing nucleotide substitution matrices. In addition, we address why comparing protein sequences is often more effective than comparing the corresponding coding DNA sequences.