01.04.2015 Views

Sequence Comparison.pdf


Chapter 8

Scoring Matrices

With the introduction of the dynamic programming algorithm for comparing protein
sequences in the 1970s, a need arose for scoring amino acid substitutions. Since then,
the construction of scoring matrices has become a key issue in sequence comparison.
A variety of considerations, such as physicochemical properties and three-dimensional
structure, have been used to derive amino acid scoring (or substitution) matrices.

The chapter is divided into eight sections. The PAM matrices are introduced in
Section 8.1. We first define the PAM evolutionary distance. We then describe
Dayhoff's method of constructing the PAM matrices. Frequently used PAM matrices
are listed at the end of that section.

In Section 8.2, after briefly introducing the BLOCKS database, we describe
Henikoff and Henikoff's method of constructing the BLOSUM matrices. In addition,
frequently used BLOSUM matrices are listed.

In Section 8.3, we show that, in seeking local alignments without gaps, any amino
acid scoring matrix essentially takes a log-odds form. There is a one-to-one correspondence
between the so-called valid scoring matrices and the sets of target and
background frequencies. Moreover, given a valid scoring matrix, its implicit target
and background frequencies can be retrieved efficiently.
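The log-odds form can be illustrated with a minimal sketch. It assumes the standard formulation, in which a score is s_ij = ln(q_ij / (p_i p_j)) / λ for target frequencies q_ij and background frequencies p_i, and in which the scale λ implicit in a matrix is the positive root of Σ p_i p_j exp(λ s_ij) = 1. The two-letter alphabet, the frequencies, and the function names `log_odds_matrix` and `implicit_lambda` are made up for illustration and are not taken from the chapter.

```python
import math

def log_odds_matrix(target, background, lam=1.0):
    """Build scores s[i][j] = ln(q_ij / (p_i * p_j)) / lam."""
    n = len(background)
    return [[math.log(target[i][j] / (background[i] * background[j])) / lam
             for j in range(n)] for i in range(n)]

def implicit_lambda(scores, background, tol=1e-12):
    """Positive root of sum_ij p_i p_j exp(lam * s_ij) = 1, found by bisection.

    Assumes the expected score is negative and some score is positive,
    which guarantees exactly one positive root.
    """
    n = len(background)

    def f(lam):
        return sum(background[i] * background[j] * math.exp(lam * scores[i][j])
                   for i in range(n) for j in range(n)) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:          # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Toy data (hypothetical): the target matrix is symmetric and its row sums
# equal the background frequencies.
p = [0.6, 0.4]
q = [[0.5, 0.1],
     [0.1, 0.3]]

scores = log_odds_matrix(q, p, lam=0.5)
lam = implicit_lambda(scores, p)

# Recover the implicit target frequencies from the scores alone.
q_recovered = [[p[i] * p[j] * math.exp(lam * scores[i][j]) for j in range(2)]
               for i in range(2)]
print(lam, q_recovered)
```

Bisection is used here only because it is simple and robust; any one-dimensional root finder would serve the same purpose.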

The log-odds form of the scoring matrices suggests that the quality of database
search results relies on the proper choice of scoring matrix. Section 8.4 describes
how to select a scoring matrix for database search with an information-theoretic
method.
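One quantity commonly used in information-theoretic comparisons of scoring matrices is the relative entropy of the target distribution with respect to the background, H = Σ q_ij log2(q_ij / (p_i p_j)), which is the average information per aligned pair in bits. The sketch below computes it for a toy two-letter alphabet. The frequencies and the function name `relative_entropy_bits` are hypothetical, and this is not necessarily the exact procedure of Section 8.4.

```python
import math

def relative_entropy_bits(target, background):
    """H = sum_ij q_ij * log2(q_ij / (p_i * p_j)), in bits per aligned pair."""
    n = len(background)
    return sum(target[i][j] * math.log2(target[i][j] / (background[i] * background[j]))
               for i in range(n) for j in range(n))

# Toy data (hypothetical), not taken from any real substitution matrix.
p = [0.6, 0.4]
q = [[0.5, 0.1],
     [0.1, 0.3]]

H = relative_entropy_bits(q, p)

# If the target frequencies equal the product of the background frequencies,
# aligned pairs carry no information and H is zero.
q_independent = [[p[i] * p[j] for j in range(2)] for i in range(2)]
H_zero = relative_entropy_bits(q_independent, p)
print(H, H_zero)
```

A matrix with a larger H carries more information per position, so shorter alignments can stand out from chance; a smaller H suits longer, weaker similarities.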

When comparing protein sequences with biased amino acid compositions, standard
scoring matrices are no longer optimal. Section 8.5 introduces a general procedure
for converting a standard scoring matrix into one suitable for comparing
two sequences with biased compositions.

For certain applications of DNA sequence comparison, a nontrivial scoring matrix
is critical. In Section 8.6, we discuss a variant of Dayhoff's method for constructing
nucleotide substitution matrices. In addition, we address why comparing
protein sequences is often more effective than comparing the DNA sequences that encode them.

