
6. MSA using DCA and branch-and-bound - Algorithms in ...


Comp. Sequence Analysis WS'04, ZBIT, D. Huson (script by K. Reinert), December 17, 2004

Additional-cost matrices (entry [i, j] is row i, column j; row and column 0 stand for the empty prefix):

    C_{s_1,s_2}        A  G  T
                    0  1  2  3
             0      0  0  1  3
          C  1      1  0  0  2
          T  2      3  2  1  0

    C_{s_1,s_3}        G
                    0  1
             0      0  1
          C  1      0  0
          T  2      1  0

    C_{s_2,s_3}        G
                    0  1
             0      0  2
          A  1      0  1
          G  2      1  0
          T  3      2  0

Set all pairwise weights to 1 and assume $\hat{c}_1 = 1$. It is not difficult to verify that $(c_2, c_3) = (1, 0)$ is C-optimal with respect to $\hat{c}_1$, since

$C(1, 1, 0) = C_{s_1,s_2}[1, 1] + C_{s_1,s_3}[1, 0] + C_{s_2,s_3}[1, 0] = 0.$

Given this cut, the best possible alignment has cost 7. The unique optimal alignment has cost 6 and can be achieved with the cut $(c_2, c_3) = (2, 1)$, which is also C-optimal with respect to $\hat{c}_1$.

    cut (1,1,0), cost 7        cut (1,2,1), cost 6

    C ++ -T = C-T              -C ++ T = -CT
    A ++ GT = AGT              AG ++ T = AGT
    - ++ G- = -G-              -G ++ - = -G-

Assume we have a multiple sequence alignment program MSA that we can use to solve small instances of the problem of aligning k sequences. The DCA algorithm can be summarized as follows:

    Algorithm DCA(s_1, s_2, ..., s_k, L)
    Input: sequences s_1, ..., s_k and cut-off L
    Output: alignment of s_1, ..., s_k
    begin
      if max{n_1, ..., n_k} ≤ L then
        return MSA(s_1, s_2, ..., s_k)
      else
        Set ĉ_1 := ⌈n_1 / 2⌉
        Set (c_2, ..., c_k) := C-opt((s_1, ..., s_k), ĉ_1)
        return DCA(α_{s_1}^{ĉ_1}, α_{s_2}^{c_2}, ..., α_{s_k}^{c_k}, L) ++
               DCA(σ_{s_1}^{ĉ_1}, σ_{s_2}^{c_2}, ..., σ_{s_k}^{c_k}, L)
    end

The function C-opt can naively be implemented as nested loops that run over all possible values $0, \dots, n_i$ for all $i$ and return the cut position with the minimal multiple additional cost.

The computation of C-opt requires at most $O(N k^2 n^2)$ steps, with $n = \max_i \{|s_i|\}$ and $N = \prod_i |s_i|$.

A direct improvement of this procedure is the combination of the loops with a simple branch-and-bound procedure that cuts off combinations of $c_2, \dots, c_k$ as soon as a partial sum of the additional cost term exceeds the minimal cost found so far.

6.2 Naïve dynamic programming

The straightforward extension of the Needleman-Wunsch algorithm to compute an optimal (W)SOP-cost alignment $A^*$ for k sequences $s_1, \dots, s_k$ using dynamic programming takes time $O(n^k)$, with $n = \min_i |s_i|$. Obviously this is only practical for very few, short sequences (k = 3, 4).

Our goal now is to develop a branch-and-bound approach.

In the following we will use the edit graph representation of a (multiple) alignment. In this representation, finding an optimal alignment corresponds to finding a shortest path in a directed acyclic graph (DAG).
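
The naive dynamic program over the edit graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script's implementation: the function names `sp_column_cost` and `naive_msa_cost` are hypothetical, and the scoring is assumed to be unit-cost sum-of-pairs (mismatch 1, letter against gap 1, gap against gap 0), which appears consistent with the costs 6 and 7 quoted in the three-sequence example of the DCA discussion. Each node of the edit graph is a tuple of prefix lengths, and each edge corresponds to one alignment column.

```python
from itertools import product


def sp_column_cost(column):
    """Sum-of-pairs cost of one alignment column under the assumed unit costs.

    Any two differing symbols cost 1 (mismatch or letter vs. gap);
    equal symbols, including gap vs. gap, cost 0.
    """
    cost = 0
    for a in range(len(column)):
        for b in range(a + 1, len(column)):
            if column[a] != column[b]:
                cost += 1
    return cost


def naive_msa_cost(seqs):
    """Optimal SP cost via a shortest path in the k-dimensional edit graph."""
    k = len(seqs)
    lengths = [len(s) for s in seqs]
    # Every edge consumes a letter from a non-empty subset of the sequences.
    steps = [d for d in product((0, 1), repeat=k) if any(d)]
    dist = {}
    # product() yields nodes in lexicographic order, which is a topological
    # order of the edit graph: every predecessor is visited before its node.
    for node in product(*(range(n + 1) for n in lengths)):
        if not any(node):
            dist[node] = 0  # source node: all prefixes empty
            continue
        best = None
        for d in steps:
            prev = tuple(p - e for p, e in zip(node, d))
            if min(prev) < 0:
                continue  # this edge would start outside the graph
            column = [seqs[i][node[i] - 1] if d[i] else "-" for i in range(k)]
            cand = dist[prev] + sp_column_cost(column)
            if best is None or cand < best:
                best = cand
        dist[node] = best
    return dist[tuple(lengths)]


if __name__ == "__main__":
    # Sequences read off the example alignments: CT, AGT, G.
    print(naive_msa_cost(["CT", "AGT", "G"]))  # 6
    # Cut (1,1,0): align prefixes and suffixes separately, then add.
    print(naive_msa_cost(["C", "A", ""]) + naive_msa_cost(["T", "GT", "G"]))  # 7
    # Cut (1,2,1).
    print(naive_msa_cost(["C", "AG", "G"]) + naive_msa_cost(["T", "T", ""]))  # 6
```

The table holds one entry per node, and each node examines up to $2^k - 1$ incoming edges, so the sketch is usable only for a handful of short sequences. That limitation is what motivates both the divide-and-conquer cuts and the branch-and-bound search.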
