Sequence Alignment | Needleman Wunsch in Python

Aladdin Persson · Intermediate ·🛡️ AI Safety & Ethics ·6y ago

Key Takeaways

This video teaches the implementation of the Needleman Wunsch algorithm in Python for sequence alignment

Full Transcript

welcome back to a project of making a sequence alignment so we want to in this video we want to actually program it and implement it in Python so we have let's see we have one function defined alignment which takes as input X and an input Y it wants to transform X into y by the way if this is the first video watch I highly recommend that you watch my last video which is a detailed and thorough I think explanation of how this algorithm works because I will just show how to implement it here not really explain too much of why so let's see that we have the same example as in the last video do DNA I guess DNA sequences and we want to send this to alignment and we want to get returned at first we want to get returned just the optimal number of mm I guess the minimum number of edit steps to transform X into y but also in the end we would like to know exactly which edits to do so we'll define another function define solution that takes as input opt and I guess M&N if you remember from last video M is the length of our x and y n is the length of Y but let's start with define alignment so here we first defined that n is the length of Y M is the length of Y X and then remember that we need to define the opt matrix and let's just define each element to be zero to start with and remember that we want to have the Cena we want to have n one columns because we want to have one for the to have one for the initialization of op 0 0 then we want to do this for J in range of M plus one so actually for lining opt print line you know that looks like right alignment should look like this where we would have one two three four five six seven eight nine ten and I think wrecked length of this is one two three four five six seven eight nine and should be one more so that makes sense similarly for the the column I guess we would have just one more from this yeah and that makes sense it seems that we initialize opt correctly here then what we want to do is we want to do the initialization still so that is want to do that opt I comma zero is I 0 comma J is J we skip zero comma zero here because it's already zero from this one for each of the first this is the first column and then would you do the first row right so this is the initialization part let's just check if it's correct yeah zero here and then just increment by one each step payment for their first column all right so now we want to go through each part right we want to start here we just want to go and find the answer to each of these and then next row find solution to each of these and similarly so we go first we iterate by row and then for each row we solve for all of the columns and we start at one I is the row J is the column and now we want to do they apply a recursive formula just use that opt IJ is the minimum and this is exactly like we saw in last video I minus 1 J minus 1 then remember we need to have some Delta which checks if X X I equals Y J so let's define a Delta at are at the top o Delta and we can use a lambda for anonymous function here if they are not equal then it's one else zero just one thing here is that remember that python is zero indexed so we need to remember to subtract one here from this from this index to actually get the eighth one and the jate one when we use Delta here Delta we input XY let's input I minus 1 and J minus one that's for the alignment right this is a line now for the I guess delete we use opt opt I minus one comma J plus 1 and then for the for the insertion opt i J minus 1 plus 1 aligned delete and searched and then in the end we want to return up I guess M and N let's see to say opt edit steps and then let's print it it should be from three in this case we get three all right so this part is correct right this is the initialization part this is just applying the recursive formula that we saw so now that we've found the optimal number of edit edit steps we actually want to find the steps on how to generate the optimal solution so then we go to find solution and we we use something called backtracking which is that we will we will essentially look at all of the three this choices that we have either inserting a line or delete and look whichever was the smallest then that was the item that was added so we're going to call this Oh we're going to call this recursively we have the base case if M is equal to 0 or n is equal to 0 we'll just return otherwise we can check the cost for each insert or a line or delete oh this was the inserting or the the last last one right this is this one then a line right pretty much the same as we had down here and then we also had delete know what we can do is we can check which was the best choice well it's the minimum of those right and then if there's a tie we can say that we insert if there's a tie for example so if best choice was insert right either let's say that in the minimum of insert underline are the same then we let's just say that we take insert and solution dot append and we just like right insert just to know what we did plus string of y and minus one right remember that this is just one in the left cuz pattern is zero indexed let's see we need to define solution here and if this was the best choice then we need to also return version of that subproblem so in this case the X has not changed right we inserted so we just take all up to all of the they're either the rose but we go just to the end komm so we remember python does not include n in this case so we just skip the last one because we insert it and then you send the indices m and n minus one the best choice was to align then we just need to append that to align and we aligned with the last one I guess else or else--if best choice was delete just to make I guess the code a little bit clearer but you can also write else here I guess solution that depend and then remove and remove the X and we remove the last X remember the minus ones are just here because PI zero indexed all right we need to also return find solution of in this case when we align we need to go and remove one row and remove one column and subtracting this is by one this case me to return find solution of so we go just to em and we delete one row but we keep all of the columns and I think that's it here that we we run fine solution here and then we want to return solution but remember that this this entire thing is going to start backwards we actually want to start from the beginning so we can just reverse I guess we reverse the list and then we will also get back to solution what we can do then is and print just we want to transform that's why so we want to transform X to Y you and as you may guess this make sense to print in the beginning but we want to transform X Y minimum other steps is the updated steps and their way to do it is bring up steps so moment of truth let's see if the code know approximately ten hours later so I think I found the solution the problem is here we want to similarly we want to include all up to including the nth one so let's rerun it and we still get in there so string of steps all right step a string steps I don't know why I wrote steps solution right now this makes sense let's just clean it up a little bit to like this we want to transform this into this and the minimum amount of other steps are three we do that by aligning T insert C aligned G aligned a aligned C aligned g90 removed G aligned C and insert a which is exactly the same solution that we found in the last video so thank you so much for watching this video and I hope to see you in the next one

Original Description

In this video we go through how to implement a dynamic algorithm for solving the sequence alignment or edit distance problem. This is also referred to as the Needleman Wunsch algorithm, it seems as if this algorithm has quite many names :) Code repository: https://github.com/AladdinPerzon/Algorithms-Collection-Python I recommend watching the explanation video before watching this implementation: https://youtu.be/bQ7kRW6zo9Y
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Aladdin Persson · Aladdin Persson · 25 of 60

1 computeCost.m Linear Regression Cost Function - Machine Learning
computeCost.m Linear Regression Cost Function - Machine Learning
Aladdin Persson
2 gradientDescent.m Gradient Descent Implementation -  Machine Learning
gradientDescent.m Gradient Descent Implementation - Machine Learning
Aladdin Persson
3 Neural Network from scratch - Part 1 (Standard Notation)
Neural Network from scratch - Part 1 (Standard Notation)
Aladdin Persson
4 Neural Network from scratch - Part 2 (Forward Propagation)
Neural Network from scratch - Part 2 (Forward Propagation)
Aladdin Persson
5 Neural Network from scratch - Part 3 (Backward Propagation)
Neural Network from scratch - Part 3 (Backward Propagation)
Aladdin Persson
6 Neural Network from scratch - Part 4 (With Python)
Neural Network from scratch - Part 4 (With Python)
Aladdin Persson
7 sigmoid.m - Programming Assignment 2 Machine Learning
sigmoid.m - Programming Assignment 2 Machine Learning
Aladdin Persson
8 costFunction.m - Programming Assignment 2 Machine Learning
costFunction.m - Programming Assignment 2 Machine Learning
Aladdin Persson
9 predict.m - Programming Assignment 2 Machine Learning
predict.m - Programming Assignment 2 Machine Learning
Aladdin Persson
10 costFunctionReg.m - Programming Assignment 2 Machine Learning
costFunctionReg.m - Programming Assignment 2 Machine Learning
Aladdin Persson
11 lrCostFunction.m - Programming Assignment 3 Machine Learning
lrCostFunction.m - Programming Assignment 3 Machine Learning
Aladdin Persson
12 oneVsAll.m - Programming Assignment 3 Machine Learning
oneVsAll.m - Programming Assignment 3 Machine Learning
Aladdin Persson
13 predictOneVsAll.m - Programming Assignment 3 Machine Learning
predictOneVsAll.m - Programming Assignment 3 Machine Learning
Aladdin Persson
14 predict.m - Programming Assignment 3 Machine Learning
predict.m - Programming Assignment 3 Machine Learning
Aladdin Persson
15 Caesar Cipher Encryption and Decryption with example
Caesar Cipher Encryption and Decryption with example
Aladdin Persson
16 Cryptography: Caesar Cipher Python
Cryptography: Caesar Cipher Python
Aladdin Persson
17 Vigenere Cipher Explained (with Example)
Vigenere Cipher Explained (with Example)
Aladdin Persson
18 Cryptography: Vigenere Cipher Python
Cryptography: Vigenere Cipher Python
Aladdin Persson
19 Hill Cipher Explained (with Example)
Hill Cipher Explained (with Example)
Aladdin Persson
20 Cryptography: Hill Cipher Python
Cryptography: Hill Cipher Python
Aladdin Persson
21 Interval Scheduling Greedy Algorithm: Python
Interval Scheduling Greedy Algorithm: Python
Aladdin Persson
22 Weighted Interval Scheduling Algorithm Explained
Weighted Interval Scheduling Algorithm Explained
Aladdin Persson
23 Weighted Interval Scheduling Python Code
Weighted Interval Scheduling Python Code
Aladdin Persson
24 Sequence Alignment | Needleman Wunsch Algorithm
Sequence Alignment | Needleman Wunsch Algorithm
Aladdin Persson
Sequence Alignment | Needleman Wunsch in Python
Sequence Alignment | Needleman Wunsch in Python
Aladdin Persson
26 Codility BinaryGap Python
Codility BinaryGap Python
Aladdin Persson
27 Codility CyclicRotation Python
Codility CyclicRotation Python
Aladdin Persson
28 Derivation Linear Regression with Gradient Descent
Derivation Linear Regression with Gradient Descent
Aladdin Persson
29 Linear Regression Gradient Descent From Scratch in Python
Linear Regression Gradient Descent From Scratch in Python
Aladdin Persson
30 Pytorch Neural Network example
Pytorch Neural Network example
Aladdin Persson
31 Pytorch CNN example (Convolutional Neural Network)
Pytorch CNN example (Convolutional Neural Network)
Aladdin Persson
32 Pytorch LeNet implementation from scratch
Pytorch LeNet implementation from scratch
Aladdin Persson
33 Pytorch VGG implementation from scratch
Pytorch VGG implementation from scratch
Aladdin Persson
34 Pytorch GoogLeNet / InceptionNet implementation from scratch
Pytorch GoogLeNet / InceptionNet implementation from scratch
Aladdin Persson
35 How to save and load models in Pytorch
How to save and load models in Pytorch
Aladdin Persson
36 How to build custom Datasets for Images in Pytorch
How to build custom Datasets for Images in Pytorch
Aladdin Persson
37 Pytorch Transfer Learning and Fine Tuning Tutorial
Pytorch Transfer Learning and Fine Tuning Tutorial
Aladdin Persson
38 Pytorch Data Augmentation using Torchvision
Pytorch Data Augmentation using Torchvision
Aladdin Persson
39 Pytorch Quick Tip: Weight Initialization
Pytorch Quick Tip: Weight Initialization
Aladdin Persson
40 Pytorch Quick Tip: Using a Learning Rate Scheduler
Pytorch Quick Tip: Using a Learning Rate Scheduler
Aladdin Persson
41 Pytorch ResNet implementation from Scratch
Pytorch ResNet implementation from Scratch
Aladdin Persson
42 Pytorch TensorBoard Tutorial
Pytorch TensorBoard Tutorial
Aladdin Persson
43 Pytorch DCGAN Tutorial (See description for updated video)
Pytorch DCGAN Tutorial (See description for updated video)
Aladdin Persson
44 Naive Bayes from Scratch - Machine Learning Python
Naive Bayes from Scratch - Machine Learning Python
Aladdin Persson
45 Spam Classifier using Naive Bayes in Python
Spam Classifier using Naive Bayes in Python
Aladdin Persson
46 K-Nearest Neighbor from scratch - Machine Learning Python
K-Nearest Neighbor from scratch - Machine Learning Python
Aladdin Persson
47 Linear Regression Normal Equation Python
Linear Regression Normal Equation Python
Aladdin Persson
48 SVM from Scratch - Machine Learning Python (Support Vector Machine)
SVM from Scratch - Machine Learning Python (Support Vector Machine)
Aladdin Persson
49 Neural Network from Scratch - Machine Learning Python
Neural Network from Scratch - Machine Learning Python
Aladdin Persson
50 Pytorch RNN example (Recurrent Neural Network)
Pytorch RNN example (Recurrent Neural Network)
Aladdin Persson
51 Pytorch Bidirectional LSTM example
Pytorch Bidirectional LSTM example
Aladdin Persson
52 Pytorch Text Generator with character level LSTM
Pytorch Text Generator with character level LSTM
Aladdin Persson
53 Logistic Regression from Scratch - Machine Learning Python
Logistic Regression from Scratch - Machine Learning Python
Aladdin Persson
54 K-Means Clustering from Scratch - Machine Learning Python
K-Means Clustering from Scratch - Machine Learning Python
Aladdin Persson
55 Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files
Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files
Aladdin Persson
56 Pytorch Torchtext Tutorial 2: Built in Datasets with Example
Pytorch Torchtext Tutorial 2: Built in Datasets with Example
Aladdin Persson
57 Pytorch Torchtext Tutorial 3: From Textfiles to Dataset
Pytorch Torchtext Tutorial 3: From Textfiles to Dataset
Aladdin Persson
58 Paper Review: Sequence to Sequence Learning with Neural Networks
Paper Review: Sequence to Sequence Learning with Neural Networks
Aladdin Persson
59 Pytorch Seq2Seq Tutorial for Machine Translation
Pytorch Seq2Seq Tutorial for Machine Translation
Aladdin Persson
60 Pytorch Seq2Seq with Attention for Machine Translation
Pytorch Seq2Seq with Attention for Machine Translation
Aladdin Persson

Related AI Lessons

AI Security Isn't a Product. It's an Engineering Discipline.
Learn why AI security requires a continuous engineering discipline rather than a one-time product implementation, and how to apply this mindset to your AI development workflow
Dev.to AI
Why Solving Legal AI's Context Problem Is Harder Than You Think
Solving legal AI's context problem requires understanding decision-making processes, not just having large models
Forbes Innovation
How Can We Truly Protect Information Privacy in the Age of Artificial Intelligence?
Learn how to prioritize information privacy in the age of AI and make it a competitive advantage
Medium · Machine Learning
The AI Validation Gap: The $2.5 Trillion Blind Spot In Enterprise AI
The AI validation gap poses a strategic risk to enterprises, costing $2.5 trillion, and requires immediate attention
Forbes Innovation
Up next
Containers Don't Make Your AI Agent Safe
Web Dev Simplified
Watch →