Transformer Explainer- Learn About Transformer With Visualization
Key Takeaways
This video explains the Transformer architecture with visualization, covering token embedding, positional encoding, multi-head self-attention, and the Transformer block. It highlights the importance of understanding Transformers for learning generative AI and NLP.
Full Transcript
Hello all, my name is Krish Naik, and welcome to my YouTube channel. In this video, I want to share an amazing website I found that will help you understand Transformers in depth.

Why are Transformers important? I know everybody is interested in learning generative AI, and I hope you are learning it too. This topic is one of the prerequisites before you start with generative AI. When I talk about prerequisites, I'm not just talking about Transformers but about the whole of NLP in deep learning. You definitely need to know how RNNs work, how LSTM RNNs work, how GRUs work, how the encoder-decoder works, and how attention works. You need to understand the "Attention Is All You Need" research paper and many more things, and finally you get to the Transformer. Most LLMs have the Transformer as their basic architecture.

If you check my generative AI course with LangChain, I have covered all the prerequisites there, both deep learning NLP and machine learning NLP, and then we have implemented multiple solutions. Not only that, I've also started uploading videos on LangGraph, and I'm still designing some amazing projects that can be added there. If you have not checked it out, please go ahead and check it out. And if you want to learn machine learning from the basics, I have my Udemy course, the complete machine learning and NLP bootcamp with MLOps and deployment, where I cover everything from machine learning to advanced NLP techniques in deep learning.

Now, this is just one of the materials that I prepared for my course. Even on my YouTube channel I've explained Transformers, but the materials that you'll be
seeing are completely in depth: around 33 pages in which I've explained each and every working of the Transformer, with multiple examples as well.

In addition to this, I have also found this amazing website, which provides a more thorough explanation through visualization. That is the reason I'm sharing it with you, because many people have asked me this question: "Krish, do you have something that visualizes how the Transformer actually works?" So that is why I'm making this video. It is just to help you prepare well, and all of this explanation will be super important when you attend interviews, when you're learning about LLMs, or when you're developing generative AI applications, because these are the prerequisites.

On this website, the explanation is also given step by step, which is really good, but I'm really excited about the visualization. If you know about Transformers, the first step, whenever we have the words, is token embedding. Then we go ahead with positional encoding. Why is positional encoding used? Because we need to keep track of the order of the words: which word came first, which word came second. That is the reason you can see positions over here.

After this, you go to the multi-head self-attention, wherein you create the Q, K, and V matrices. Q means query, K means key, and V means the value matrix. Here you do this for 12 heads: the architecture shown here has 12 heads, and this is just one of them.
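The steps described up to this point (token embedding, positional encoding, and the Q, K, V projections for one head) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the tiny vocabulary, the sizes `D_MODEL`, `D_HEAD`, `MAX_LEN`, the helper names, the random weights, and the use of a learned position table are all hypothetical stand-ins, not values taken from the website or the video.

```python
import random

random.seed(0)  # reproducible toy weights

# Hypothetical toy sizes; real models use far larger ones.
VOCAB = ["data", "visualization", "empowers", "users", "to"]
D_MODEL = 8   # embedding width
D_HEAD = 4    # width of one attention head
MAX_LEN = 16  # maximum sequence length


def rand_matrix(rows, cols):
    """Random stand-in for a learned weight matrix."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def linear(x, w, b):
    """Row-wise x @ w + b, i.e. out[i][j] = sum_d x[i][d] * w[d][j] + b[j]."""
    return [
        [sum(row[d] * w[d][j] for d in range(len(w))) + b[j] for j in range(len(b))]
        for row in x
    ]


token_table = rand_matrix(len(VOCAB), D_MODEL)   # one row per vocabulary entry
position_table = rand_matrix(MAX_LEN, D_MODEL)   # one row per position (assumed learned)


def embed(tokens):
    """Token embedding plus positional encoding, added element-wise."""
    ids = [VOCAB.index(t) for t in tokens]
    return [
        [tok + pos for tok, pos in zip(token_table[i], position_table[p])]
        for p, i in enumerate(ids)
    ]


# Separate projections for query, key, and value (one head only).
w_q, w_k, w_v = (rand_matrix(D_MODEL, D_HEAD) for _ in range(3))
b_q, b_k, b_v = ([0.0] * D_HEAD for _ in range(3))

x = embed(["data", "visualization", "empowers", "users", "to"])
q, k, v = linear(x, w_q, b_q), linear(x, w_k, b_k), linear(x, w_v, b_v)
print(len(q), len(q[0]))  # number of tokens, width of each query vector
```

Because the position row is added to the token row, the same word at two different positions ends up with two different input vectors, which is how word order reaches the attention step.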
So if you hover over here, you can see how the calculation is done. This is the formula that is applied: the sum over d of E_id × W_dj, plus b_j. With it you calculate the Q, K, and V matrices. This will be very important, because as soon as you search for "data", say I hover over here, two lines get highlighted: one is "data" and the next word is "visualization". For "data" and the next word "visualization", the correlation value I can see is 0.84. Similarly, you'll be able to see the values for all the other pairs. This is how the whole thing works, and this is just one attention head; like that, you have 12 attention heads.

Once this is done, you go to the next step, where you create the multi-layer neural network. Here you have your residuals, and you can see the entire calculation and how things are happening. Then you can see how the probabilities are computed for each and every token. For a sentence like "Data visualization empowers users to", your output "create" is coming up. Other than "create", you can also get "visualize", because its softmax percentage is also very high.

Like this, you will be able to see each and every visualization of how things are happening: how the dot product is computed and how things are calculated inside. See, after getting the dot product, you do scaling and masking, and then you apply softmax plus dropout. So many things are happening, and you'll be able to see them step by step. You will only be able to understand this when you know at least some of the working of Transformers. Again, the main reason for creating this
particular video was to provide some additional material so that you can learn this entire technique well.

One key thing I will tell you: while you're learning, please follow along with this website, and make sure you read all of this text. If you read all of it, trust me, you'll be able to understand each and every step, because everything is explained here as well. See here, "Data visualization empowers users to": the first step is token embedding and positional encoding. You perform tokenization, you get some values, you add the positional encoding, and you get the final embedding. Then you have the Transformer block, where you create the multi-head attention with the query, key, and value matrices, and then you add a bias and finally get this. So each token embedding vector is transformed into query, key, and value vectors. These vectors are derived by multiplying the input embedding matrix with the learned weight matrices for Q, K, and V. It's just like a web search: on YouTube, when we search for something, it takes that query, searches across all the keys, and retrieves the values. So all these things are there. Then you have the masked self-attention and how things happen step by step. Just go ahead and read it step by step, and let me know whether you are able to understand it or not.

Other than that, you know that I have a lot of courses on Udemy, which are very affordable, just 399 rupees. You can check them out in the description. The best order for learning from my courses is to start with mathematics, then machine learning and NLP, then generative AI, and then the data analyst bootcamp. So yes, that was it from my side. I will see you in the next video. Thank you.
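The attention steps named in the transcript (dot product, scaling, masking, softmax, then mixing the values) and the final softmax over candidate next tokens can be sketched as follows. This is a rough single-head illustration, not the website's implementation: the function names, the toy `q`, `k`, `v` vectors, and the candidate logits are all made up for the example, and dropout is left out because it only applies during training.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(q, k, v):
    """Single-head attention: dot product, scale, causal mask, softmax, mix values."""
    d_head = len(q[0])
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        row = []
        for j, k_j in enumerate(k):
            if j > i:
                # Causal mask: a token may not look at later tokens.
                row.append(-math.inf)
            else:
                dot = sum(a * b for a, b in zip(q_i, k_j))
                row.append(dot / math.sqrt(d_head))  # scaling
        probs = softmax(row)
        weights.append(probs)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(p * v_j[c] for p, v_j in zip(probs, v)) for c in range(len(v[0]))]
        )
    return outputs, weights


# Hypothetical toy vectors for three tokens (not taken from the website).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out, attn = masked_self_attention(q, k, v)
for row in attn:
    print([round(p, 2) for p in row])

# Hypothetical logits for three candidate next tokens.
candidates = ["create", "visualize", "the"]
logits = [2.0, 1.5, 0.1]
probs = softmax(logits)
best = candidates[probs.index(max(probs))]
print(best)
```

Each row of `attn` is one token's attention over itself and the tokens before it; the masked positions come out as exactly zero because `exp(-inf)` is zero. The same `softmax` is reused at the end to turn logits into next-token probabilities, which is why a second candidate with a nearby logit can still receive a high share.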
Original Description
https://poloclub.github.io/transformer-explainer/ Transformer is a neural network architecture that has fundamentally changed the ...