Chat Bot With PyTorch - NLP And Deep Learning - Python Tutorial (Part 2)
Key Takeaways
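This video is the second part of a tutorial series on building a simple chatbot with PyTorch and deep learning. It creates the training data: the intent patterns are loaded from a JSON file and processed with basic Natural Language Processing (NLP) techniques (tokenization, lowercasing and stemming, removal of punctuation, and bag of words). The tutorial uses NLTK, NumPy, and PyTorch, and ends by wrapping the training data in a PyTorch Dataset and DataLoader. The model and the training loop follow in Part 3.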
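As a rough illustration of the bag-of-words step named above, here is a minimal sketch. It assumes the sentence is already tokenized, and it uses plain lowercasing as a stand-in for the NLTK stemmer that the tutorial applies; the `stem` parameter and the small word list are illustrative assumptions, not the author's exact code.

```python
import numpy as np


def bag_of_words(tokenized_sentence, all_words, stem=str.lower):
    """Return a float32 vector with 1.0 at each vocabulary word found in the sentence.

    `stem` is a stand-in (lowercasing only); the tutorial uses an NLTK stemmer here.
    """
    stemmed = {stem(w) for w in tokenized_sentence}
    # One slot per vocabulary word, initialised to zero.
    bag = np.zeros(len(all_words), dtype=np.float32)
    for idx, word in enumerate(all_words):
        if word in stemmed:
            bag[idx] = 1.0
    return bag


# Illustrative vocabulary, assumed to be stemmed and lowercased already.
sentence = ["hello", "how", "are", "you"]
words = ["hi", "hello", "i", "you", "bye", "thank", "cool"]
bag = bag_of_words(sentence, words)
print(bag)  # [0. 1. 0. 1. 0. 0. 0.]
```

Only "hello" and "you" occur in the vocabulary, so only those two positions are set; "how" and "are" are ignored because they are not in the word list.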
Full Transcript
Hey guys, welcome back to the second part of our chatbot tutorial. In this part we are going to create the actual training data. We already created the NLTK utils to tokenize and to stem our words, so let's continue and apply this to the data that we have.
We will create a new file, and let's call this train.py. Here we want to load our JSON file, so we need the json module. Then we can say with open, and the file is called intents.json, in write mode, or no, read mode, sorry, as f. Then we say intents equals json.load(f), and then we can print our intents. Let's clear this and run the training file to see if we have this loaded. So this is working.
Then we want to create our training data as I showed you in the first part. We want to apply tokenization, then lowering and stemming, then we also exclude the punctuation characters, and then we apply the bag of words. For this we need to collect all of the words, so let's do this. Let's create empty lists first. We say all_words equals an empty list. We also want to collect all the different patterns and know which tags they have, so we create an empty list for the tags, and we also create an empty list that we call xy, which will later hold both our patterns and the tags.
Now we want to loop over our intents. This is, as we can see, a JSON, or now it's a dictionary, a Python object. At the very beginning we have the intents key, and then we have one array with all the different tags and patterns and responses. So we say for intent in intents with the key intents, and then we get the tag by saying intent with the key tag, as in the JSON file, so the tag key, and we append this to our tags list: tags.append(tag). Then we want to loop over all the different patterns. This again is an array with the different patterns, so we loop over this and say for pattern in intents with the key patterns, and then we have this pattern.
Then we want to apply tokenization. We already implemented a utility function in the last part, so we simply have to import it: from nltk_utils import tokenize, and let's already import the stemming function and the bag of words function. Now we want to tokenize our pattern, so we say w equals tokenize(pattern). Then we want to put this into the all_words list, so we say all_words.extend(w). We are not using append but extend, because w is again an array and we don't want to put an array of arrays here. Then we also put the tokenized pattern and the corresponding label into our xy list, so we say xy.append, and here as a tuple we use w and the tag. So this will know the pattern and the corresponding tag, and then we are done with collecting these.
Now, if we go back to our pipeline, after tokenization we also want to lower and stem the words and exclude punctuation characters. So let's define ignore_words, and here, for example, we don't want a question mark, an exclamation mark, or a dot, and let's also not use a comma. Then we will apply a list comprehension, but first let's simply print all_words to see if this is working. If I clear this and run it, we see we still have an error here: for pattern in intents... oh yeah, here we only have to use intent, for each single intent, and then the patterns. Let's run this again, and then we see we get all the different words, so they have been tokenized.
Now let's apply stemming. Let's say all_words equals again a list, and then we stem each word: stem(w) for w in all_words. We also want to exclude the ignore words, and we can very easily do this with the list comprehension too, so we say if w not in ignore_words. Let's clear this and run it again to see if this is working. I still have this from the last part, so we don't need it anymore. Now we see we have all the words in lower case, and for example here we see that the ending got chopped off, so stemming works too.
Now let's sort these words. Let's say all_words equals sorted, and we also only want unique words, so we can simply convert this to a set. This is a nice little trick to remove duplicate elements, and the sorted function will return a list again. Let's do this with the tags too: tags equals sorted(set(tags)), so this will have unique labels. I don't think that this is necessary, but it's better to do it. Let's print the tags to see if this is working. Let's run this, and then we see we have all the different tags: delivery, funny, goodbye, greeting, items, payments, and thanks.
Now what we want to do is create the training data. For this we continue in our pipeline and now create the bag of words. Let's create a list with our X_train data, so this is an empty list, and y_train equals an empty list. y_train will hold the tags, or rather the associated number for each tag, and in X_train we put all the bags of words. We loop over our xy list, and we can unpack the tuple here with the pattern and the tag. So we say for pattern_sentence, tag in xy. Now we want to create a bag of words by calling the function bag_of_words. We can see we already wrote the definition: it gets the tokenized sentence, which is exactly the pattern_sentence that is already tokenized, since we applied tokenization up here, and then it needs all_words. Then we append this to our training data: X_train.append(bag). We still have to implement this function.
For the y data, these will be our labels. For this we use the tags and say label equals tags.index(tag). For example, let's print the tags. Yes, here we still have the tags. If the tags are in this order and we have the tag delivery, this gives us the label 0, for funny it gives us the label 1, and so on. So we have numbers for our labels, and then we put this into y_train: y_train.append(label). Here we have to be careful. Sometimes you want this as a so-called one-hot encoded vector, but in this case we are using PyTorch, and later we are going to use the cross-entropy loss, and this doesn't want it as one-hot. So here we only want the class labels. It's called CrossEntropyLoss, which we will see in the third part. That's why we don't have to care about one-hot encoding here and only put in the label for this pattern.
After this we want to convert this to a NumPy array. We import numpy as np, and then we say X_train equals np.array based on this X_train list, and the same with y_train: y_train equals np.array(y_train). So now we have the training data.
Now we still have to implement the bag of words function. We didn't do this in the last part, so let's do it now. Let me copy and paste an example for you again. We have our tokenized sentence with our new, incoming words, so hello how are you, and then we have the all words array. Here we already collected all the words based on the patterns that we looked at; this is just a small example. Then we look at each word in the sentence, and if it is available in the words array, we put a 1 there. For example, we have hello, so we put a 1 at the position where hello is. We also have you, so we put a 1 at the position where you is. We don't have are and we don't have how in this example, so all the rest of the positions will be 0. This is how the bag of words works.
So now let's do this. The function gets a tokenized sentence, and in the training pipeline we also apply stemming to the all_words list, so let's do the same for the tokenized sentence. Let me close this here. We want to call the stemmer for each word in the tokenized sentence, so we use a list comprehension again and say tokenized_sentence equals stem(w) for w in tokenized_sentence. Now we have applied the stemming. Then we create a bag and initialize it with 0 for each word. So we have all_words, and we create an array with the same size but only with zeros. We can do this with NumPy, so we need NumPy here too: import numpy as np. Then we say bag equals np.zeros with the size of the length of all_words, and let's also define a data type, which should be np.float32. Then we loop over all_words, so we say for index, word in enumerate(all_words). This gives us both the index and the current word. Then we check if this word is in our tokenized sentence, and if so it gets a 1, so we say bag at this index equals 1.0, as a float. Then we return the bag.
Let's try this out. Let's say our sentence equals this one, which is already tokenized, our words are these words, and then our bag equals the bag_of_words function with the sentence first and the words second, and then we print the bag. Let's clear this and run python nltk_utils.py, and then we see we get the same array as I showed you here. So this is working. Let's remove this again, and then we are done with this file.
Let's head back to our training file. As a last thing in this part, I want to create a PyTorch dataset from this training data. Let's import some things that we need for PyTorch: we import torch, we import torch.nn as nn, and we say from torch.utils.data import Dataset and DataLoader. If you haven't installed PyTorch already and don't know what these are, then please have a look at my beginner course, because there I explain all of these things.
Now down here let's create a new dataset. We have to create a class, call it ChatDataset, and this must inherit from Dataset. We have to implement the __init__ function, which only gets self. Here we store the number of samples, which is the length of X_train, and then we store the data: self.x_data equals our X_train array and self.y_data equals our y_train array. Then we also have to implement the __getitem__ function with self and the index, so that we can later access the dataset with an index, and here we return self.x_data at this index and self.y_data at this index as a tuple. Then we also define the __len__ method with self, and here we simply return the number of samples. So now we have our ChatDataset.
Let's create this: dataset equals ChatDataset(). We also want to create a data loader from this, so we say train_loader equals DataLoader, and as dataset it gets this dataset, and then batch_size equals batch_size. So let's define some hyperparameters here. Oh, I have to put this up here. Let's say hyperparameters, and here batch_size equals, let's say, 8 in this example, and then we use it here. We also say shuffle equals True for our training, and in my case I say num_workers equals 2. This is just for multithreading or multiprocessing. On Windows especially, I think this might raise an error, so try to set this to 0 in your case if you get an error here. In my case I'm using 2, which makes the loading a little bit faster.
And yeah, now we have our chat dataset. Why did I implement this as a PyTorch dataset and create a data loader? This is just because then we can automatically iterate over this and get batch training. So that's it for part two. In part 3 we will implement the actual PyTorch model and the training loop. See you next time.
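The dataset and data loader described at the end of the transcript could look roughly like the sketch below. The placeholder arrays stand in for the real X_train and y_train and are purely illustrative. Unlike the version in the video, where the constructor takes only self, this sketch passes the arrays in as arguments so the block runs on its own. It uses num_workers=0, the value the transcript suggests if worker processes cause errors.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# Placeholder training data (assumption): 4 bag-of-words vectors over 8 words,
# with integer class labels rather than one-hot vectors.
X_train = np.eye(4, 8, dtype=np.float32)
y_train = np.array([1, 1, 0, 0], dtype=np.int64)


class ChatDataset(Dataset):
    def __init__(self, x_data, y_data):
        self.n_samples = len(x_data)
        self.x_data = x_data
        self.y_data = y_data

    def __getitem__(self, index):
        # Lets us write dataset[i] and get a (features, label) tuple.
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.n_samples


# Hyperparameter (small value chosen for this tiny placeholder set).
batch_size = 2

dataset = ChatDataset(X_train, y_train)
train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)

# The loader collates NumPy samples into tensors, one batch at a time.
batch_shapes = []
for words, labels in train_loader:
    batch_shapes.append((tuple(words.shape), tuple(labels.shape)))
print(batch_shapes)
```

Each batch holds a float tensor of bag-of-words vectors and an integer tensor of class indices, which is the label format a cross-entropy loss in PyTorch expects.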
Original Description
In this Python Tutorial we build a simple chatbot using PyTorch and Deep Learning. I will also provide an introduction to some basic Natural Language Processing (NLP) techniques.
1) Theory + NLP concepts (Stemming, Tokenization, bag of words)
2) Create training data
3) PyTorch model and training
4) Save/load model and implement the chat
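Step 2 of the outline above (creating the training data) can be sketched as follows. This is a minimal illustration under stated assumptions: the intents dictionary is a hypothetical miniature of the intents.json file, and the `tokenize` and `stem` helpers are simple stand-ins (a regular expression and lowercasing) for the NLTK-based utilities used in the tutorial.

```python
import re

import numpy as np

# Hypothetical miniature of intents.json (the tutorial loads the real file with json.load).
intents = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi", "Hello, how are you?"]},
        {"tag": "goodbye", "patterns": ["Bye", "See you later!"]},
    ]
}


def tokenize(sentence):
    # Stand-in for the NLTK tokenizer: splits into words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def stem(word):
    # Stand-in for the NLTK stemmer: lowercases only, does not strip endings.
    return word.lower()


ignore_words = ["?", "!", ".", ","]
all_words, tags, xy = [], [], []

for intent in intents["intents"]:
    tag = intent["tag"]
    tags.append(tag)
    for pattern in intent["patterns"]:
        w = tokenize(pattern)
        all_words.extend(w)   # extend, so we do not get a list of lists
        xy.append((w, tag))   # tokenized pattern with its tag

# Stem, drop punctuation, remove duplicates, and sort.
all_words = sorted(set(stem(w) for w in all_words if w not in ignore_words))
tags = sorted(set(tags))

X_train, y_train = [], []
for pattern_sentence, tag in xy:
    stemmed = [stem(w) for w in pattern_sentence]
    X_train.append([1.0 if w in stemmed else 0.0 for w in all_words])
    y_train.append(tags.index(tag))  # class index, not one-hot

X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train)

print(all_words)
print(tags)
print(y_train)
```

Because the tags are sorted, "goodbye" becomes label 0 and "greeting" becomes label 1; each row of X_train is the bag-of-words vector for one pattern.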
Resource:
This tutorial was inspired and adapted from the following article:
"Contextual Chatbots with Tensorflow": https://chatbotsmagazine.com/contextual-chat-bots-with-tensorflow-4391749d0077
✅ Write cleaner code with Sourcery, instant refactoring suggestions in VS Code & PyCharm: https://sourcery.ai/?utm_source=youtube&utm_campaign=pythonengineer *
📚 Get my FREE NumPy Handbook:
https://www.python-engineer.com/numpybook
📓 Notebooks available on Patreon:
https://www.patreon.com/patrickloeber
⭐ Join Our Discord : https://discord.gg/FHMg9tKFSN
If you enjoyed this video, please subscribe to the channel!
NLTK:
https://www.nltk.org
You can find the code on GitHub:
https://github.com/patrickloeber/pytorch-chatbot
PyTorch Beginner Course:
https://www.youtube.com/playlist?list=PLqnslRFeH2UrcDBWF5mfPGpqQDSta6VK4
Please checkout my website to see all tutorials:
https://www.python-engineer.com
You can find me here:
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber
Icons:
https://fontawesome.com/icons/comments
https://fontawesome.com/icons/robot
#PyTorch #NLP #DeepLearning
----------------------------------------------------------------------------------------------------------
* This is a sponsored or an affiliate link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏