Sequence Alignment | Needleman Wunsch in Python
Key Takeaways
This video teaches the implementation of the Needleman Wunsch algorithm in Python for sequence alignment
Full Transcript
welcome back to a project of making a sequence alignment so we want to in this video we want to actually program it and implement it in Python so we have let's see we have one function defined alignment which takes as input X and an input Y it wants to transform X into y by the way if this is the first video watch I highly recommend that you watch my last video which is a detailed and thorough I think explanation of how this algorithm works because I will just show how to implement it here not really explain too much of why so let's see that we have the same example as in the last video do DNA I guess DNA sequences and we want to send this to alignment and we want to get returned at first we want to get returned just the optimal number of mm I guess the minimum number of edit steps to transform X into y but also in the end we would like to know exactly which edits to do so we'll define another function define solution that takes as input opt and I guess M&N if you remember from last video M is the length of our x and y n is the length of Y but let's start with define alignment so here we first defined that n is the length of Y M is the length of Y X and then remember that we need to define the opt matrix and let's just define each element to be zero to start with and remember that we want to have the Cena we want to have n one columns because we want to have one for the to have one for the initialization of op 0 0 then we want to do this for J in range of M plus one so actually for lining opt print line you know that looks like right alignment should look like this where we would have one two three four five six seven eight nine ten and I think wrecked length of this is one two three four five six seven eight nine and should be one more so that makes sense similarly for the the column I guess we would have just one more from this yeah and that makes sense it seems that we initialize opt correctly here then what we want to do is we want to do the initialization still so that is want to do that opt I comma zero is I 0 comma J is J we skip zero comma zero here because it's already zero from this one for each of the first this is the first column and then would you do the first row right so this is the initialization part let's just check if it's correct yeah zero here and then just increment by one each step payment for their first column all right so now we want to go through each part right we want to start here we just want to go and find the answer to each of these and then next row find solution to each of these and similarly so we go first we iterate by row and then for each row we solve for all of the columns and we start at one I is the row J is the column and now we want to do they apply a recursive formula just use that opt IJ is the minimum and this is exactly like we saw in last video I minus 1 J minus 1 then remember we need to have some Delta which checks if X X I equals Y J so let's define a Delta at are at the top o Delta and we can use a lambda for anonymous function here if they are not equal then it's one else zero just one thing here is that remember that python is zero indexed so we need to remember to subtract one here from this from this index to actually get the eighth one and the jate one when we use Delta here Delta we input XY let's input I minus 1 and J minus one that's for the alignment right this is a line now for the I guess delete we use opt opt I minus one comma J plus 1 and then for the for the insertion opt i J minus 1 plus 1 aligned delete and searched and then in the end we want to return up I guess M and N let's see to say opt edit steps and then let's print it it should be from three in this case we get three all right so this part is correct right this is the initialization part this is just applying the recursive formula that we saw so now that we've found the optimal number of edit edit steps we actually want to find the steps on how to generate the optimal solution so then we go to find solution and we we use something called backtracking which is that we will we will essentially look at all of the three this choices that we have either inserting a line or delete and look whichever was the smallest then that was the item that was added so we're going to call this Oh we're going to call this recursively we have the base case if M is equal to 0 or n is equal to 0 we'll just return otherwise we can check the cost for each insert or a line or delete oh this was the inserting or the the last last one right this is this one then a line right pretty much the same as we had down here and then we also had delete know what we can do is we can check which was the best choice well it's the minimum of those right and then if there's a tie we can say that we insert if there's a tie for example so if best choice was insert right either let's say that in the minimum of insert underline are the same then we let's just say that we take insert and solution dot append and we just like right insert just to know what we did plus string of y and minus one right remember that this is just one in the left cuz pattern is zero indexed let's see we need to define solution here and if this was the best choice then we need to also return version of that subproblem so in this case the X has not changed right we inserted so we just take all up to all of the they're either the rose but we go just to the end komm so we remember python does not include n in this case so we just skip the last one because we insert it and then you send the indices m and n minus one the best choice was to align then we just need to append that to align and we aligned with the last one I guess else or else--if best choice was delete just to make I guess the code a little bit clearer but you can also write else here I guess solution that depend and then remove and remove the X and we remove the last X remember the minus ones are just here because PI zero indexed all right we need to also return find solution of in this case when we align we need to go and remove one row and remove one column and subtracting this is by one this case me to return find solution of so we go just to em and we delete one row but we keep all of the columns and I think that's it here that we we run fine solution here and then we want to return solution but remember that this this entire thing is going to start backwards we actually want to start from the beginning so we can just reverse I guess we reverse the list and then we will also get back to solution what we can do then is and print just we want to transform that's why so we want to transform X to Y you and as you may guess this make sense to print in the beginning but we want to transform X Y minimum other steps is the updated steps and their way to do it is bring up steps so moment of truth let's see if the code know approximately ten hours later so I think I found the solution the problem is here we want to similarly we want to include all up to including the nth one so let's rerun it and we still get in there so string of steps all right step a string steps I don't know why I wrote steps solution right now this makes sense let's just clean it up a little bit to like this we want to transform this into this and the minimum amount of other steps are three we do that by aligning T insert C aligned G aligned a aligned C aligned g90 removed G aligned C and insert a which is exactly the same solution that we found in the last video so thank you so much for watching this video and I hope to see you in the next one
Original Description
In this video we go through how to implement a dynamic algorithm for solving the sequence alignment or edit distance problem. This is also referred to as the Needleman Wunsch algorithm, it seems as if this algorithm has quite many names :)
Code repository:
https://github.com/AladdinPerzon/Algorithms-Collection-Python
I recommend watching the explanation video before watching this implementation:
https://youtu.be/bQ7kRW6zo9Y
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from Aladdin Persson · Aladdin Persson · 25 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
▶
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
computeCost.m Linear Regression Cost Function - Machine Learning
Aladdin Persson
gradientDescent.m Gradient Descent Implementation - Machine Learning
Aladdin Persson
Neural Network from scratch - Part 1 (Standard Notation)
Aladdin Persson
Neural Network from scratch - Part 2 (Forward Propagation)
Aladdin Persson
Neural Network from scratch - Part 3 (Backward Propagation)
Aladdin Persson
Neural Network from scratch - Part 4 (With Python)
Aladdin Persson
sigmoid.m - Programming Assignment 2 Machine Learning
Aladdin Persson
costFunction.m - Programming Assignment 2 Machine Learning
Aladdin Persson
predict.m - Programming Assignment 2 Machine Learning
Aladdin Persson
costFunctionReg.m - Programming Assignment 2 Machine Learning
Aladdin Persson
lrCostFunction.m - Programming Assignment 3 Machine Learning
Aladdin Persson
oneVsAll.m - Programming Assignment 3 Machine Learning
Aladdin Persson
predictOneVsAll.m - Programming Assignment 3 Machine Learning
Aladdin Persson
predict.m - Programming Assignment 3 Machine Learning
Aladdin Persson
Caesar Cipher Encryption and Decryption with example
Aladdin Persson
Cryptography: Caesar Cipher Python
Aladdin Persson
Vigenere Cipher Explained (with Example)
Aladdin Persson
Cryptography: Vigenere Cipher Python
Aladdin Persson
Hill Cipher Explained (with Example)
Aladdin Persson
Cryptography: Hill Cipher Python
Aladdin Persson
Interval Scheduling Greedy Algorithm: Python
Aladdin Persson
Weighted Interval Scheduling Algorithm Explained
Aladdin Persson
Weighted Interval Scheduling Python Code
Aladdin Persson
Sequence Alignment | Needleman Wunsch Algorithm
Aladdin Persson
Sequence Alignment | Needleman Wunsch in Python
Aladdin Persson
Codility BinaryGap Python
Aladdin Persson
Codility CyclicRotation Python
Aladdin Persson
Derivation Linear Regression with Gradient Descent
Aladdin Persson
Linear Regression Gradient Descent From Scratch in Python
Aladdin Persson
Pytorch Neural Network example
Aladdin Persson
Pytorch CNN example (Convolutional Neural Network)
Aladdin Persson
Pytorch LeNet implementation from scratch
Aladdin Persson
Pytorch VGG implementation from scratch
Aladdin Persson
Pytorch GoogLeNet / InceptionNet implementation from scratch
Aladdin Persson
How to save and load models in Pytorch
Aladdin Persson
How to build custom Datasets for Images in Pytorch
Aladdin Persson
Pytorch Transfer Learning and Fine Tuning Tutorial
Aladdin Persson
Pytorch Data Augmentation using Torchvision
Aladdin Persson
Pytorch Quick Tip: Weight Initialization
Aladdin Persson
Pytorch Quick Tip: Using a Learning Rate Scheduler
Aladdin Persson
Pytorch ResNet implementation from Scratch
Aladdin Persson
Pytorch TensorBoard Tutorial
Aladdin Persson
Pytorch DCGAN Tutorial (See description for updated video)
Aladdin Persson
Naive Bayes from Scratch - Machine Learning Python
Aladdin Persson
Spam Classifier using Naive Bayes in Python
Aladdin Persson
K-Nearest Neighbor from scratch - Machine Learning Python
Aladdin Persson
Linear Regression Normal Equation Python
Aladdin Persson
SVM from Scratch - Machine Learning Python (Support Vector Machine)
Aladdin Persson
Neural Network from Scratch - Machine Learning Python
Aladdin Persson
Pytorch RNN example (Recurrent Neural Network)
Aladdin Persson
Pytorch Bidirectional LSTM example
Aladdin Persson
Pytorch Text Generator with character level LSTM
Aladdin Persson
Logistic Regression from Scratch - Machine Learning Python
Aladdin Persson
K-Means Clustering from Scratch - Machine Learning Python
Aladdin Persson
Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files
Aladdin Persson
Pytorch Torchtext Tutorial 2: Built in Datasets with Example
Aladdin Persson
Pytorch Torchtext Tutorial 3: From Textfiles to Dataset
Aladdin Persson
Paper Review: Sequence to Sequence Learning with Neural Networks
Aladdin Persson
Pytorch Seq2Seq Tutorial for Machine Translation
Aladdin Persson
Pytorch Seq2Seq with Attention for Machine Translation
Aladdin Persson
Related AI Lessons
⚡
⚡
⚡
⚡
AI Security Isn't a Product. It's an Engineering Discipline.
Dev.to AI
Why Solving Legal AI's Context Problem Is Harder Than You Think
Forbes Innovation
How Can We Truly Protect Information Privacy in the Age of Artificial Intelligence?
Medium · Machine Learning
The AI Validation Gap: The $2.5 Trillion Blind Spot In Enterprise AI
Forbes Innovation
🎓
Tutor Explanation
DeepCamp AI