Global Sequence Alignment: the Needleman-Wunsch Algorithm

Tony Capra 

BMIF 310 

Sept. 6, 2013

Which sequence is more “similar” to s1?

s1 = ASLVNDK

s2 = ALVNKDK OR

s3 = AFPSTW

What is the intuition behind s2 seeming more similar?

We can write s1 and s2 in a way that juxtaposes more identical characters than s1 and s3:

s1 = ASLVN-DK
s2 = A-LVNKDK
     A LVN DK   (identical characters)

s1 = ASLVNDK
s3 = AFPSTW-
     A          (identical characters)

These juxtapositions are called alignments.

Overview 

• Why align sequences of characters?

• Formalize our intuitive notion of an optimal alignment.

• Describe an efficient algorithm for finding optimal global alignments of two sequences.

Why align sequences? 

• Alignments provide a way to consistently evaluate the similarity of pairs of sequences.

• We focus on aligning protein and DNA sequences.

• The algorithms we will discuss today work on any sequences of characters.


Biological sequence similarity suggests functional similarity and common evolutionary origins (homology).

Alignments quantify the similarity between different sequences within and between species.

[Figure: beta globin evolutionary trajectory, starting from an ancestral globin.]

http://www.nature.com/scitable/topicpage/dna-deletion-and-duplication-and-the-associated-331


Alignments can highlight functionally relevant regions of a DNA/protein sequence:

Functionally important regions often have better alignments due to evolutionary conservation.

Adapted from: http://www.pnas.org/content/104/25/10388/suppl/DC1#F5

Global vs. Local Alignment 

[Figure: s1 and s2 under a global (Needleman-Wunsch) alignment and under a local (Smith-Waterman) alignment.]

Today we will learn an efficient algorithm for finding optimal global alignments.

There are many possible alignments. 

s1 = ASLVNDK
s2 = ALVNKDK

ASLVNDK--
A-LVN-KDK

ASLVN-DK
A-LVNKDK

A----SLVNDK
ALVNK----DK

ASLVNDK
ALVNKDK

Which is the best alignment? 


ASLVNDK--
A-LVN-KDK
A LVN K     (identical characters: 5)

ASLVN-DK
A-LVNKDK
A LVN DK    (identical characters: 6)

A----SLVNDK
ALVNK----DK
A        DK (identical characters: 3)

ASLVNDK
ALVNKDK
A    DK     (identical characters: 3)

Implicit Column Scoring Function

• 1, match

• 0, mismatch (mutation)

• 0, gap (insertion, deletion)

ASLVN-DK
A-LVNKDK
10111011 = 6

• Sum the score over all columns of the alignment.

Deriving more complex scoring schemes will be a main topic of my next lecture.
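The column scoring function can be sketched in a few lines of Python. This is only an illustration: the function name `score_alignment` and the use of "-" as the gap character are assumptions made for this example.

```python
def score_alignment(row1: str, row2: str) -> int:
    """Sum the column scores: 1 for a match, 0 for a mismatch or a gap."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        # A column scores 1 only when both rows hold the same character.
        if a == b and a != "-":
            score += 1
    return score


print(score_alignment("ASLVN-DK", "A-LVNKDK"))  # 6
```

The same function gives 3 for the ungapped alignment of ASLVNDK with ALVNKDK, which is why the gapped alignment is preferred under this scheme.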

How can we find an optimal alignment?

Idea:

– Enumerate all possible alignments of s1 and s2.

– Calculate score using the column scoring function.

– Pick the alignment(s) with the highest score.

Why is this a terrible idea?
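For very short sequences the enumerate-and-score idea can actually be run. The sketch below is a hypothetical illustration (the names `all_alignments`, `matches`, and `brute_force_best` are made up here). It assumes every column is a character pair, a gap in s1, or a gap in s2, and it treats alignments that differ only in the order of adjacent gaps as distinct.

```python
from typing import Iterator, Tuple


def all_alignments(s1: str, s2: str) -> Iterator[Tuple[str, str]]:
    """Yield every global alignment of s1 and s2 as a pair of gapped rows."""
    if not s1 and not s2:
        yield "", ""
        return
    if s1 and s2:
        # First column pairs s1[0] with s2[0] (match or mismatch).
        for r1, r2 in all_alignments(s1[1:], s2[1:]):
            yield s1[0] + r1, s2[0] + r2
    if s1:
        # First column pairs s1[0] with a gap.
        for r1, r2 in all_alignments(s1[1:], s2):
            yield s1[0] + r1, "-" + r2
    if s2:
        # First column pairs a gap with s2[0].
        for r1, r2 in all_alignments(s1, s2[1:]):
            yield "-" + r1, s2[0] + r2


def matches(r1: str, r2: str) -> int:
    """Column scoring function: 1 per matching column, 0 otherwise."""
    return sum(1 for a, b in zip(r1, r2) if a == b and a != "-")


def brute_force_best(s1: str, s2: str) -> Tuple[str, str]:
    """Score every alignment and return one with the highest score."""
    return max(all_alignments(s1, s2), key=lambda rows: matches(*rows))


print(matches(*brute_force_best("ASLV", "ALV")))  # 3
```

Even at length 7 this already enumerates tens of thousands of alignments, which hints at why the approach does not scale.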

The # of alignments of two sequences is an exponential function of their lengths.

If n is the length of s1 and m is the length of s2, then the number of possible alignments is roughly:

For sequences of plausible lengths (~150) this quickly outgrows the number of atoms in the observable universe.
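The growth can be checked numerically. The sketch below does not use the slide's own formula; it is one standard way to count, under the assumption that every column is a character pair, a gap in s1, or a gap in s2, and that alignments differing only in the order of adjacent gaps are distinct. The function name `count_alignments` is hypothetical, and 10^82 is used as a generous stand-in for the commonly quoted atom-count estimates.

```python
def count_alignments(n: int, m: int) -> int:
    """Count global alignments of two sequences of lengths n and m."""
    # counts[i][j] = number of alignments of an i-long prefix with a j-long prefix.
    # A prefix aligned against the empty prefix has exactly one alignment (all gaps).
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The last column is a gap in s2, a gap in s1, or a character pair.
            counts[i][j] = counts[i - 1][j] + counts[i][j - 1] + counts[i - 1][j - 1]
    return counts[n][m]


print(count_alignments(7, 7))                # 48639
print(count_alignments(150, 150) > 10**82)   # True
```

Note that the counting itself is done with a table of partial answers, the same trick the rest of the lecture applies to scoring.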

So what can we do?

Dynamic Programming 

• Is not a lively way of writing computer code.

• Is a way to exploit substructure within a problem by keeping good records of partial answers.

• Is a very common and powerful algorithmic technique.

Some Notation

• s1[1..i] = the first i characters of s1 (i in 0 to n)

• s2[1..j] = the first j characters of s2 (j in 0 to m)

• M(i,j) = score of optimal alignment of s1[1..i] with s2[1..j]

– in other words, M(i,j) = the maximum # of matches in an optimal alignment of s1[1..i] with s2[1..j]

M(i,j) Examples 


M(2,1) = 1

s1: AS
s2: A-
1+0=1

M(3,3) = ?

s1: ASL-
s2: A-LV
1+0+1+0=2

The Key Insight 

• Think about the last column of an optimal alignment.

• Four possibilities (where X, Y are characters; the top character comes from s1, the bottom from s2):

X
-   ⇒ the rest is an optimal alignment of s1[1..n-1] and s2[1..m], with score M(n-1, m)

-
X   ⇒ the rest is an optimal alignment of s1[1..n] and s2[1..m-1], with score M(n, m-1)

X
Y   ⇒ the rest is an optimal alignment of s1[1..n-1] and s2[1..m-1], with score M(n-1, m-1)

X
X   ⇒ the rest is an optimal alignment of s1[1..n-1] and s2[1..m-1], with score M(n-1, m-1)

The Key Insight II 

M(i,j) = max(
    M(i, j-1) + 0,           (last column pairs a gap with s2[j])
    M(i-1, j) + 0,           (last column pairs s1[i] with a gap)
    M(i-1, j-1) + t(i,j))    (last column pairs s1[i] with s2[j])

where t(i,j) = 1 if s1[i] == s2[j] and 0 if not.

So, if you were given M(i, j-1), M(i-1, j), and M(i-1, j-1), you could compute M(i,j) in a constant number of steps.
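The recurrence can be written almost verbatim as a recursive function with remembered answers. This is a minimal sketch, not the table-based formulation used in the rest of the lecture: the name `optimal_score` is hypothetical, and the base case M(i,0) = M(0,j) = 0 is assumed because a prefix aligned only against gaps has no matches.

```python
from functools import lru_cache


def optimal_score(s1: str, s2: str) -> int:
    """Return M(n, m), the maximum number of matching columns."""

    def t(i: int, j: int) -> int:
        # Positions are 1-based in the notation, 0-based in Python strings.
        return 1 if s1[i - 1] == s2[j - 1] else 0

    @lru_cache(maxsize=None)  # keep a record of each partial answer
    def M(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 0  # only gap columns are possible, and gaps score 0
        return max(
            M(i, j - 1) + 0,            # gap paired with s2[j]
            M(i - 1, j) + 0,            # s1[i] paired with a gap
            M(i - 1, j - 1) + t(i, j),  # s1[i] paired with s2[j]
        )

    return M(len(s1), len(s2))


print(optimal_score("ASLVNDK", "ALVNKDK"))  # 6
```

Without the cache, the same subproblems would be recomputed exponentially many times; with it, each (i, j) pair is solved once.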

Dynamic Programming Approach 

• Use this recursive relationship to build up the score of an optimal alignment.

• Start with M(0, j) and M(i, 0).

• Keep track of intermediate scores in a table.
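The three steps above can be sketched as a bottom-up table fill. The function name `fill_table` and the storage layout (`table[j][i]` holds M(i,j), so rows follow s2 and columns follow s1) are choices made for this illustration.

```python
from typing import List


def fill_table(s1: str, s2: str) -> List[List[int]]:
    """Return a table where table[j][i] = M(i, j)."""
    n, m = len(s1), len(s2)
    # Base conditions: M(i, 0) = 0 and M(0, j) = 0, so start from all zeros.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):          # row by row
        for i in range(1, n + 1):
            t = 1 if s1[i - 1] == s2[j - 1] else 0
            table[j][i] = max(
                table[j - 1][i],          # M(i, j-1) + 0
                table[j][i - 1],          # M(i-1, j) + 0
                table[j - 1][i - 1] + t,  # M(i-1, j-1) + t(i, j)
            )
    return table


for row in fill_table("ASLVNDK", "ALVNKDK"):
    print(*row)
```

The bottom-right entry, `table[m][n]`, is the score of an optimal alignment of the full sequences.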

Needleman-Wunsch Example

M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |
    A  j=1 |
    L  j=2 |
    V  j=3 |
    N  j=4 |
    K  j=5 |
    D  j=6 |
    K  j=7 |

Start with base conditions

M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  ?  ?  ?  ?  ?  ?  ?
    A  j=1 |  ?
    L  j=2 |  ?
    V  j=3 |  ?
    N  j=4 |  ?
    K  j=5 |  ?
    D  j=6 |  ?
    K  j=7 |  ?

What is M(1, 0), the score of the optimal alignment of s1[1..1] with a gap?

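Start with base conditions

M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  0  ?  ?  ?  ?  ?  ?
    A  j=1 |  ?
    L  j=2 |  ?
    V  j=3 |  ?
    N  j=4 |  ?
    K  j=5 |  ?
    D  j=6 |  ?
    K  j=7 |  ?

How about the rest of this row?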

How about the rest of this row?

Now compute M(i,j) row by row. 

M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  0  0  0  0  0  0  0
    A  j=1 |  0  ?
    L  j=2 |  0
    V  j=3 |  0
    N  j=4 |  0
    K  j=5 |  0
    D  j=6 |  0
    K  j=7 |  0

M(1,1) = max(M(0,1) + 0, M(1,0) + 0, M(0,0) + t(1,1))

M(1,1) = max(0 + 0, 0 + 0, 0 + 1) = 1


M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  0  0  0  0  0  0  0
    A  j=1 |  0  1
    L  j=2 |  0
    V  j=3 |  0
    N  j=4 |  0
    K  j=5 |  0
    D  j=6 |  0
    K  j=7 |  0


M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  0  0  0  0  0  0  0
    A  j=1 |  0  1  1  1  1  1  1  1
    L  j=2 |  0  ?
    V  j=3 |  0
    N  j=4 |  0
    K  j=5 |  0
    D  j=6 |  0
    K  j=7 |  0

M(i,j) = max(M(i-1, j) + 0, M(i, j-1) + 0, M(i-1, j-1) + t(i, j))


M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  0  0  0  0  0  0  0
    A  j=1 |  0  1  1  1  1  1  1  1
    L  j=2 |  0  1  1  2  2  2  2  2
    V  j=3 |  0  1  1  2  3  3  3  3
    N  j=4 |  0  1  1  2  3  4  4  4
    K  j=5 |  0  1  1  2  3  4  4  5
    D  j=6 |  0  1  1  2  3  4  5  5
    K  j=7 |  0  1  1  2  3  4  5  6

M(7,7) = 6, so the optimal match count is six!

How do we construct an optimal alignment from this table?

The Traceback 

M(i,j)     |  -  A  S  L  V  N  D  K   (s1)
           |  0  1  2  3  4  5  6  7   (i)
-----------+------------------------
s2  -  j=0 |  0  0  0  0  0  0  0  0
    A  j=1 |  0  1  1  1  1  1  1  1
    L  j=2 |  0  1  1  2  2  2  2  2
    V  j=3 |  0  1  1  2  3  3  3  3
    N  j=4 |  0  1  1  2  3  4  4  4
    K  j=5 |  0  1  1  2  3  4  4  5
    D  j=6 |  0  1  1  2  3  4  5  5
    K  j=7 |  0  1  1  2  3  4  5  6

Trace the path followed to get M(n,m), starting from the bottom-right cell and working back to M(0,0).

Diagonal step: align s1[i] and s2[j]. Vertical step: align gap and s2[j]. Horizontal step: align gap and s1[i].

There may be more than one path/optimal alignment.
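The traceback can be sketched as follows. This is an illustrative implementation under the 1/0/0 scoring used in this lecture; the name `needleman_wunsch` and the tie-breaking order (diagonal first, then horizontal, then vertical) are assumptions of this sketch, and a different order may return a different, equally optimal alignment.

```python
from typing import Tuple


def needleman_wunsch(s1: str, s2: str) -> Tuple[int, str, str]:
    """Return (score, aligned s1, aligned s2) for one optimal global alignment."""
    n, m = len(s1), len(s2)
    # Fill the table: table[j][i] = M(i, j), with zeros as base conditions.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            t = 1 if s1[i - 1] == s2[j - 1] else 0
            table[j][i] = max(table[j - 1][i], table[j][i - 1], table[j - 1][i - 1] + t)

    # Traceback: walk from M(n, m) to M(0, 0), collecting columns in reverse.
    row1, row2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            t = 1 if s1[i - 1] == s2[j - 1] else 0
            if table[j][i] == table[j - 1][i - 1] + t:
                # Diagonal step: align s1[i] and s2[j].
                row1.append(s1[i - 1])
                row2.append(s2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and table[j][i] == table[j][i - 1]:
            # Horizontal step: align s1[i] with a gap.
            row1.append(s1[i - 1])
            row2.append("-")
            i -= 1
        else:
            # Vertical step: align a gap with s2[j].
            row1.append("-")
            row2.append(s2[j - 1])
            j -= 1
    return table[m][n], "".join(reversed(row1)), "".join(reversed(row2))


score, a1, a2 = needleman_wunsch("ASLVNDK", "ALVNKDK")
print(score)  # 6
print(a1)     # ASLVN-DK
print(a2)     # A-LVNKDK
```

With this tie-breaking order the traceback recovers the alignment shown at the start of the lecture.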

Is this algorithm feasible? 

• It takes roughly nm + (n + m) operations to fill in the table and do the traceback.

• Quadratic run time is much better than exponential!

Global vs. Local Alignment 

[Figure: s1 and s2 under a global (Needleman-Wunsch) alignment, and s1 and s2 under a local (Smith-Waterman) alignment.]

In my next lecture, we will learn an efficient algorithm for finding optimal local alignments.

When would you want a local alignment?

• Proteins are composed of modular functional “domains.”

• These domains are often shuffled into different combinations.

• Thus, parts of proteins may align well, while others do not, due to their different evolutionary histories.

Summary 

• Sequence alignment helps us discover and compare relationships between sequences.

• Dynamic programming enables the efficient identification of optimal alignments.
