Build a Chatbot - ML for Hackers #6

Siraj Raval · Beginner ·🛠️ AI Tools & Apps ·10y ago

Key Takeaways

This video teaches how to build a chatbot using the deep learning library Torch and the Lua programming language, specifically focusing on a sequence to sequence model to generate human-like conversations.

Full Transcript

"Open the pod bay doors, HAL." "I'm afraid I can't do that, Siraj." "Why not, HAL?" "Because I'm overfitted on the wrong data." "What the..."

Hello world, welcome to Sirajology! In today's episode, we're going to learn how to build a chatbot. Chatbots have come a long way in the past few years. Remember the SmarterChild bot on AIM? That thing was pretty fun at the time, but now it's like, do you even AI, bro? The future of software is one where bots will slowly replace our need to fiddle with clunky UI. We'll be able to just ask our AI to book an Uber or find the best taco place on Yelp for us. Service layers will be hidden under a plain-English conversational layer.

When I think of real AI, I think of a human-level chatbot. The OG computer scientist Alan Turing proposed a test to judge whether or not a machine exhibited human-level intelligence by having a human observe a conversation between a human and a machine. If the observer couldn't tell whether the machine was a human or not, it passed the test. So far no chatbot has passed a Turing test, but we'll get there.

Traditionally, chatbots have used a retrieval-based model to communicate. In a retrieval-based model, programmers code in a set of predefined responses and some kind of heuristic to pick the appropriate response based on the input and context. The first chatbots were just rule-based expression matching: if I ask the exact phrase "Will I ever get laid?", it responds "No" every time. But more recently, companies have started using more complex heuristics, like a machine learning classifier. Facebook Messenger's chatbot API is an example of this. You can hardcode responses to potential questions, and the system classifies words to understand intent. So you could ask either "What day is it today?" or "Today is what day?" and it would understand that both questions, although worded differently, have the same intent.

The harder chatbot model is generative. These don't rely on any predefined responses whatsoever; they generate them from scratch. Two Google researchers released a paper called "A Neural Conversational Model" where they trained a neural net on two data sets to do this: first on a movie dialogue data set, so it would be able to speak conversational English, then on an IT support data set, so it had domain knowledge. When they tested it on a real human asking for support, it was remarkably efficient at helping them solve their problem, without any hard-coded responses, just by giving it data and training it.

Okay, so what kind of bot do we want to build? Well, when building a chatbot, we have to think about possible constraints. Are we operating on a closed domain or an open domain? In an open domain, the conversation can go anywhere; there are an infinite number of things to talk about. In a closed domain, the conversation focuses on a single subject. If we want to operate on an open domain using a generative model, that's pretty much AGI, so we're not quite there yet. If we use an open domain with a retrieval model, we'd have to hardcode literally everything, so that's also impossible. So right now we can build a chatbot in a closed domain using either a retrieval or a generative model.

Okay, let's add in one more constraint: do we want it to have long or short conversations? Short conversations are easy. You just output a single response to a single question. Long conversations are a bit harder. The AI has to keep track of what's being said, that is, the context, over a series of questions from the user. Support topics would be a good example of this. We could go the easy route and use a retrieval model if all we want is a bot to give us the local weather. But if we want our bot to have a long conversation with us about the weather, like "What's the weather in SF? Is my family safe? Where can I find a new family?", then we should go for a generative model. We need lots of data to train our bot on a generative model, like a big chat log or a knowledge base, and when done well, that's pretty much the bleeding edge, which means that's what we have to do.

So we're going to recreate the results from the Neural Conversational Model paper using the deep learning library Torch in the Lua programming language. Let's collect our data set first. We'll be using the Cornell movie dialogue data set, and we'll set our variables from the command line to how much of the data set we want to use and the minimum frequency of words that we keep in our vocabulary.

Our next step is to build a model. We use our command line arguments to help determine the size of the model, the two variables being the number of hidden layers and the word count of our data set. In our case, this will be a sequence-to-sequence model. A sequence-to-sequence model consists of two long short-term memory (LSTM) recurrent neural networks. The first neural net is an encoder; it processes the input. The second neural net is the decoder, and it generates the output.

So why the sequence-to-sequence model? Yes, deep neural nets are awesome, but they require the dimensionality of the inputs and outputs to be a fixed size. We're accepting a sequence of words in a sentence and outputting a new sequence of words, so we need a sequence learning model that can learn on data with long-range memory dependencies. The LSTM architecture is the natural choice. The encoder LSTM turns the input sentence of variable length into a fixed-dimensional vector representation. We can think of this as the thought vector. So given a large enough data set of questions and responses, it will recognize the closeness of a set of questions and represent them as a single thought vector. "What time is it?", "What's the time?", and "Yo yo, what's the time, [unclear]?" will all fall into a single thought vector. So after training, we'll have a huge set of not just synapse weights but thought vectors as well.

Next, we'll want to add in some hyperparameters. We want to use a ClassNLLCriterion for our model. NLL stands for negative log likelihood. This will help us obtain log probabilities from our input data, which will help us improve our sentence predictions. The learning rate and momentum help pace our time steps, and the decay factor and minimum mean error help improve our learning rate while training.

Then we'll make sure CUDA is enabled and start training our model using backpropagation. In each epoch, or run, we'll declare our error and timer variables and loop through each example in each batch. The default batch size is a thousand examples. For each of those examples, we'll get the input sentence and the target sentence. We'll use the input and the target as parameters to train our model. Then we'll want to error check and make sure we record our progress. At the end of each iteration, we save our model if it improves and update the learning rate. Boom, that's it.

After training this baby on AWS, we can have a conversation with it. The more data you give it, the better it's going to get. And if you're going to do this, add a filter for curse words. I'm looking at you, Microsoft. This will eventually automate a lot of support jobs away completely, so if there are any government people in the house right now, let's get on that basic income jam ASAP, unless you want a revolution. For more info, check out the links down below, and please subscribe for more ML videos. For now, I've got to go fix a memory leak, so thanks for watching.
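
The data-preparation step described in the transcript (keeping only words that appear at least a minimum number of times) can be sketched roughly as follows. This is a plain-Python illustration, not the Torch/Lua code from the video; the function names `build_vocab` and `encode`, the `<unk>` placeholder token, and the toy sentences are all assumptions made for the example.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for words dropped from the vocabulary


def build_vocab(sentences, min_freq=2):
    """Map every word seen at least `min_freq` times to an integer id."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab = {UNK: 0}
    # Sort alphabetically so that ids are deterministic across runs.
    for word, n in sorted(counts.items()):
        if n >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of word ids, using UNK for rare or unseen words."""
    return [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]


# Toy stand-in for a dialogue corpus.
sentences = ["what time is it", "what is the time", "is it late"]
vocab = build_vocab(sentences, min_freq=2)
ids = encode("what time is dinner", vocab)
print(vocab)
print(ids)
```

Raising `min_freq` shrinks the vocabulary, which in turn shrinks the output layer the decoder has to predict over; the cost is that more words collapse into the placeholder token.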

Original Description

This video will get you up and running with your first Chatbot using the deep learning library Torch!

The code for this video is here: https://github.com/llSourcell/Chatbot-AI

I created a Slack channel for us, sign up here: https://wizards.herokuapp.com/

Here's the Neural Conversational Model paper (check out the machine-generated support conversations, they're mind-blowingly good): http://arxiv.org/pdf/1506.05869v3.pdf

You should train this baby in the cloud using AWS. See ML for Hackers #4 for a tutorial on how to use AWS: https://www.youtube.com/watch?v=eKmIVU8EUbw

Some great info on LSTM architecture: http://deeplearning4j.org/lstm.html

Link to Facebook's Chatbot API if you're curious: https://developers.facebook.com/blog/post/2016/04/12/bots-for-messenger/

I love you guys! Thanks for watching my videos, I do it for you. I left my awesome job at Twilio and I'm doing this full time now. I recently created a Patreon page. If you like my videos, feel free to help support my effort here!: https://www.patreon.com/user?ty=h&u=3191693

Much more to come so please subscribe, like, and comment.

Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/

Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w

Hit the Join button above to sign up to become a member of my channel for access to exclusive content!

Join my AI community: http://chatgptschool.io/

Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz


This video teaches how to build a chatbot using Torch and Lua, covering the basics of sequence to sequence models and neural conversational models. The goal is to create a chatbot that can have human-like conversations.

Key Takeaways
  1. Collect a dataset for training the chatbot
  2. Build a sequence to sequence model using Torch and Lua
  3. Determine the size of the model based on command line arguments
  4. Train the model using backpropagation and negative log likelihood criterion
  5. Add hyperparameters to improve the model's performance
  6. Test the chatbot and refine its performance
💡 Using a sequence to sequence model with LSTM recurrent neural networks can effectively generate human-like conversations for a chatbot.
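
To make the negative log likelihood criterion from the takeaways more concrete, here is a minimal plain-Python sketch of how such a loss could be computed for a decoder's output. It is not Torch's implementation; the function names `log_softmax` and `nll_loss`, the four-word vocabulary, and the score values are assumptions chosen for illustration.

```python
import math


def log_softmax(scores):
    """Convert raw scores over the vocabulary into log probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(step_scores, target_ids):
    """Mean negative log likelihood of the target word at each decoder step."""
    total = 0.0
    for scores, target in zip(step_scores, target_ids):
        total -= log_softmax(scores)[target]
    return total / len(target_ids)


targets = [1, 3]  # ids of the correct words at two decoder steps

# A model that has learned nothing spreads probability evenly over 4 words.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss_uniform = nll_loss(uniform, targets)

# A model that scores the correct word highly gets a much lower loss.
confident = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]]
loss_confident = nll_loss(confident, targets)

print(loss_uniform, loss_confident)
```

The loss only looks at the log probability assigned to the correct next word, so training with backpropagation pushes that probability up at every step of the target sentence.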
