09.10.2023 Views

Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)


Chapter 4

Unsupervised Learning: Clustering

Needleman–Wunsch Algorithm

The Needleman–Wunsch algorithm is used in bioinformatics to align protein or nucleotide sequences. It was one of the first applications of dynamic programming to the comparison of biological sequences. First it creates a matrix whose rows are indexed by the characters of one sequence and whose columns are indexed by the characters of the other. Each cell holds the best similarity score for aligning the two prefixes that end at that row and column. Every step into a cell contributes one of three kinds of score: a match, a mismatch, or a gap (an insertion or deletion). Once the matrix is filled, the algorithm traces back from the bottom-right cell to the top-left cell, moving at each step to the neighboring cell from which the current score was derived. The step scores collected along this backtracing path add up to the Needleman–Wunsch distance, or alignment score, for the two strings.
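The matrix fill and the backtracing step described above can be sketched in plain Python. This is a minimal illustration, not the book's or PyOPA's implementation: the function name `needleman_wunsch` is hypothetical, and the scoring values (match = 1, mismatch = -1, linear gap = -1) are assumptions chosen for readability. The sketch maximizes a similarity score rather than minimizing a distance.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not PyOPA defaults.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    # Fill the matrix: each cell takes the best of three moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap     # gap in b (deletion)
            left = score[i][j - 1] + gap   # gap in a (insertion)
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to the top-left cell,
    # stepping to the neighbor the current score was derived from.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


if __name__ == '__main__':
    print(needleman_wunsch('AAA', 'TTT'))
    print(needleman_wunsch('AC', 'A'))
```

The value in the bottom-right cell equals the sum of the step scores along the traceback path, which is why the function can return `score[n][m]` directly. Real aligners usually replace the single `gap` penalty with separate gap-open and gap-extension costs and use a substitution matrix instead of a fixed match/mismatch pair.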

PyOPA is a Python module that provides a ready-made implementation for computing the Needleman–Wunsch distance between two strings.

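import pyopa

# Toy alignment environment: a 1x1 score matrix that only covers the letter 'A'.
data = {'gap_open': -20.56,
        'gap_ext': -3.37,
        'pam_distance': 150.87,
        'scores': [[10.0]],
        'column_order': 'A',
        'threshold': 50.0}
env = pyopa.create_environment(**data)

s1 = pyopa.Sequence('AAA')
s2 = pyopa.Sequence('TTT')  # 'T' is not covered by the toy matrix, so s2 is unused below

# Optimal alignment score of s1 against itself.
print(pyopa.align_double(s1, s1, env))

# aligned_strings was never defined in the original listing. This assumes
# pyopa.align_strings, which returns the two aligned sequences.
aligned_strings = pyopa.align_strings(s1, s1, env)
# NOTE: estimate_pam may belong to pyopa.MutipleAlEnv rather than to a single
# environment; check the PyOPA documentation for your installed version.
print(env.estimate_pam(aligned_strings[0], aligned_strings[1]))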
