I have written a python code for Needleman Wunsch algorithm. My code gives only one possible global alignment out of many possible global alignments between two DNA sequences. The code is:
import numpy as np
x = np.array(['T','C','G','T','C','C','T','T','C','A','T','T'])
y = np.array(['G','T','C','T','C','A','T','G'])
Match = 1
Mismatch = 0
Gap = -1
m = len(x)
n = len(y)
def S(a, b):
if (a == b):
return(Match)
else:
return(Mismatch)
F = np.zeros((m + 1, n + 1))
for i in range(0, (m + 1)):
F[i, 0] = Gap * i
for j in range(0, (n + 1)):
F[0, j] = Gap * j
for i in range(1, (m + 1)):
for j in range(1, (n + 1)):
D = F[i - 1, j - 1] + S(x[i - 1], y[j - 1])
V = F[i - 1, j] + Gap
H = F[i, j - 1] + Gap
F[i, j] = max(np.array([D, V, H]))
F
S1 = []
S2 = []
i = m
j = n
while (i > 0 or j > 0):
if (i > 0 and j > 0 and F[i, j] == F[i - 1, j - 1] + S(x[i - 1], y[j - 1])):
S1 = S1 + [x[i - 1]]
S2 = S2 + [y[j - 1]]
i = i - 1
j = j - 1
elif (i > 0 and F[i, j] == F[i - 1, j] + Gap):
S1 = S1 + [x[i - 1]]
S2 = S2 + ['-']
i = i - 1
else:
S1 = S1 + ['-']
S2 = S2 + [y[j - 1]]
j = j - 1
print(np.array([S1, S2]).reshape(2, len(S1)))
Now how to modify this code so as to get all possible alignments? Thanks in advance.
Well, the keywords should be dynamic programming, python.
In order to get all optimal solutions you have to maintain in each step $(i,j)$ a list $L(i,j)$ which shows where the maximum has been attained. There are three such possibilities: $(i-1,j), (i,j-1), (i-1,j-1)$ or simply north (n), west (w), and north-west (nw). Then you need a backtracking algorithm in order to traverse all possible paths in the list from $(m,n)$ to $(0,0)$.