4 Multiple Sequence Alignment 4.1 Multiple sequence alignment


38 Grundlagen der Bioinformatik, SS’09, D. Huson, May 10, 2009

4.3.4 Scoring along a tree

Assume $T$ is a phylogenetic tree whose leaves are labeled by the sequences to be aligned. Instead of comparing all pairs of residues in a column of an MSA, one may determine an optimal labeling of the internal nodes of the tree by symbols for a given column (here, column (3)) and then sum the scores over all edges of the tree (here, 7 edges):

[Figure: tree for column (3), with leaves and internal nodes labeled N or C]

Such an optimal, most parsimonious labeling of the internal nodes can be computed in polynomial time using the Fitch algorithm (discussed later). Based on this tree, the score for column (3) is $4 \times 6 + 1 \times (-3) + 2 \times 9 = 39$.

4.3.5 Scoring along a star

In a third alternative, one sequence is treated as the ancestor of all others, in a so-called star phylogeny:

[Figure: star phylogenies for columns (1), (2) and (3), with sequence 1 at the center]

Based on this star phylogeny, assuming that sequence 1 is at the center of the star, the scores for columns (1), (2) and (3) are $4 \times 6 = 24$, $3 \times 6 - 3 = 15$ and $2 \times 6 - 2 \times 3 = 6$, respectively.

At present, there is no conclusive argument that gives any one scoring scheme more justification than the others. The sum-of-pairs score is the most widely used, but it is problematic, as we have seen earlier.

4.4 Dynamic program for an MSA

Although local alignments are often biologically more relevant, global MSA is easier to discuss. Dynamic programs developed for pairwise alignment can be extended to multiple alignments. We discuss how to compute a global MSA of three sequences in the case of a linear gap penalty. Assume we are given:

$$
A = \begin{cases}
A_1 = (a_{11}, a_{12}, \ldots, a_{1n_1}),\\
A_2 = (a_{21}, a_{22}, \ldots, a_{2n_2}),\\
A_3 = (a_{31}, a_{32}, \ldots, a_{3n_3}).
\end{cases}
$$

We proceed by computing the entries of an $(n_1 + 1) \times (n_2 + 1) \times (n_3 + 1)$ matrix $F(i, j, k)$ recursively. After the matrix has been filled, the cell $F(n_1, n_2, n_3)$ contains the best score $\alpha$ of a global alignment $A^*$. Traceback recovers an optimal alignment.
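A minimal Python sketch of this dynamic program, under stated assumptions: it takes the maximum over the seven column types of the recursion given next, uses a sum-of-pairs column score with hypothetical values (match +1, mismatch −1, residue against gap −2, gap against gap 0), and handles the boundary cells, which the recursion does not spell out, by skipping any step that would leave the matrix. The names `align3`, `s_sp` and `pair_score` are illustrative only.

```python
from itertools import product

# Hypothetical scores (not from the notes): match, mismatch, and a linear
# gap penalty charged once per residue-gap pair within a column.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score of one pair of symbols; '-' denotes a gap."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally free in SP scoring
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def s_sp(a: str, b: str, c: str) -> int:
    """Sum-of-pairs score of the column (a, b, c)."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3(x: str, y: str, z: str) -> tuple[int, list[str]]:
    """Global alignment of three sequences: fill F(i, j, k), then trace back."""
    n1, n2, n3 = len(x), len(y), len(z)
    F = [[[float("-inf")] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    back = {}  # (i, j, k) -> (step taken, column emitted)
    F[0][0][0] = 0
    # The 2^3 - 1 = 7 column types: which sequences contribute a residue.
    steps = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for di, dj, dk in steps:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if min(pi, pj, pk) < 0:
                        continue  # step would leave the matrix (boundary cells)
                    col = (x[i - 1] if di else "-",
                           y[j - 1] if dj else "-",
                           z[k - 1] if dk else "-")
                    cand = F[pi][pj][pk] + s_sp(*col)
                    if cand > F[i][j][k]:
                        F[i][j][k] = cand
                        back[(i, j, k)] = ((di, dj, dk), col)
    # Traceback from F(n1, n2, n3) to F(0, 0, 0), collecting columns in reverse.
    rows = ([], [], [])
    i, j, k = n1, n2, n3
    while (i, j, k) != (0, 0, 0):
        (di, dj, dk), col = back[(i, j, k)]
        for row, symbol in zip(rows, col):
            row.append(symbol)
        i, j, k = i - di, j - dj, k - dk
    return F[n1][n2][n3], ["".join(reversed(row)) for row in rows]


score, alignment = align3("ABDE", "ACBE", "ADCEE")
print(score)
print("\n".join(alignment))
```

Run on the example sequences ABDE, ACBE and ADCEE, it prints the optimal score and one optimal alignment; because the scores are assumptions, that alignment need not coincide with the one shown in the example. The triple loop makes the $O(n_1 n_2 n_3)$ cost of the three-sequence case explicit.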
The main recursion is as follows (recall that there are $2^r - 1 = 8 - 1 = 7$ types of columns in this case):

$$
F(i, j, k) = \max \begin{cases}
F(i-1, j-1, k-1) + s(a_{1i}, a_{2j}, a_{3k}),\\
F(i-1, j-1, k) + s(a_{1i}, a_{2j}, -),\\
F(i-1, j, k-1) + s(a_{1i}, -, a_{3k}),\\
F(i, j-1, k-1) + s(-, a_{2j}, a_{3k}),\\
F(i-1, j, k) + s(a_{1i}, -, -),\\
F(i, j-1, k) + s(-, a_{2j}, -),\\
F(i, j, k-1) + s(-, -, a_{3k}),
\end{cases}
$$

for $1 \le i \le n_1$, $1 \le j \le n_2$, $1 \le k \le n_3$, where $s(a, b, c)$ returns a score for a given column of symbols $a, b, c$; for example, $s = s_{\mathrm{SP}}$, the sum-of-pairs score.

Example:

$$
A = \begin{cases}
A_1 = \texttt{ABDE}\\
A_2 = \texttt{ACBE}\\
A_3 = \texttt{ADCEE}
\end{cases}
\quad\Longrightarrow\quad
A^* = \begin{cases}
A^*_1 = \texttt{A-BD-E-}\\
A^*_2 = \texttt{ACB--E-}\\
A^*_3 = \texttt{A--DCEE}
\end{cases}
$$

[Figure: the matrix F for this example]

Clearly, this algorithm generalizes to $r$ sequences. It has space complexity $O(n^r)$, where $n$ is the sequence length (assuming all $r$ sequences have the same length). Hence, it is only practical for small $r$ and small $n$.

And what about time complexity? That depends on the scoring function. For the SP score it is $O(r^2 \cdot n^r \cdot 2^r)$: each of the $O(n^r)$ cells has at most $2^r - 1$ predecessors, and evaluating $s_{\mathrm{SP}}$ on one column of $r$ symbols takes $O(r^2)$ time.

Theorem 4.4.1 Computing an MSA with optimal SP-score is NP-complete.

4.5 Compatible multiple alignments

As we usually cannot compute an optimal MSA in reasonable time, we will consider methods that approximate the optimal solution. The key idea is to compute an MSA by successive pairwise alignments. For this we need the following definition:

Definition 4.5.1 (Compatible alignments) Let $A = \{A_1, \ldots, A_r\}$ be a set of sequences and let $B = \{A_{i_1}, \ldots, A_{i_k}\}$ be a subset of $A$. Let $A^* = \{A^*_1, \ldots, A^*_r\}$ be a multiple alignment of $A$ and $B^* = \{A^*_{i_1}, \ldots, A^*_{i_k}\}$ be a multiple alignment of $B$. The alignment $A^*$ is compatible with the alignment $B^*$ if $A^*$ restricted to $B$ is equal to $B^*$, ignoring all columns that consist only of gaps.
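The definition can be checked mechanically. The following Python sketch assumes that alignments are lists of equal-length strings with `-` for gaps, and that the rows of $B^*$ are listed in the same order as the chosen indices $i_1, \ldots, i_k$ (0-based here); `restrict` and `is_compatible` are hypothetical helper names. It uses $A^*$ from the three-sequence example in Section 4.4.

```python
def restrict(alignment: list[str], indices: list[int]) -> list[str]:
    """Rows of `alignment` listed in `indices`, with all-gap columns removed."""
    rows = [alignment[i] for i in indices]
    kept = [col for col in zip(*rows) if any(ch != "-" for ch in col)]
    return ["".join(col[r] for col in kept) for r in range(len(rows))]


def is_compatible(a_star: list[str], indices: list[int], b_star: list[str]) -> bool:
    """True if a_star restricted to the rows `indices` equals b_star."""
    return restrict(a_star, indices) == list(b_star)


# A* from the three-sequence example in Section 4.4 (0-based row indices).
a_star = ["A-BD-E-", "ACB--E-", "A--DCEE"]
print(restrict(a_star, [0, 1]))                           # ['A-BDE', 'ACB-E']
print(is_compatible(a_star, [0, 1], ["A-BDE", "ACB-E"]))  # True
print(is_compatible(a_star, [0, 1], ["AB-DE", "ACB-E"]))  # False
```

Restricting $A^*$ to its first two rows drops two all-gap columns, which is why `A-BDE` / `ACB-E` is compatible with $A^*$ while the equally valid alignment `AB-DE` / `ACB-E` of the same two sequences is not.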