Build a Chatbot - ML for Hackers #6
Key Takeaways
This video teaches how to build a chatbot using the deep learning library Torch and the Lua programming language, focusing on a sequence-to-sequence model that generates human-like conversation.
Full Transcript
"Open the pod bay doors, HAL." "I'm afraid I can't do that, Siraj." "Why not, HAL?" "Because I'm overfitted on the wrong data." What the...

Hello world, welcome to Sirajology. In today's episode we're going to learn how to build a chatbot. Chatbots have come a long way in the past few years. Remember the SmarterChild bot on AIM? That thing was pretty fun at the time, but now it's like, do you even AI, bro? The future of software is one where bots will slowly replace our need to fiddle with clunky UI. We'll be able to just ask our AI to book an Uber or find the best taco place on Yelp for us. Service layers will be hidden under a plain-English conversational layer.

When I think of real AI, I think of a human-level chatbot. The OG computer scientist Alan Turing proposed a test to judge whether or not a machine exhibited human-level intelligence by having a human observe a conversation between a human and a machine. If the observer couldn't tell whether the machine was a human or not, the machine passed the test. So far no chatbot has passed a Turing test, but we'll get there.

Traditionally, chatbots have used a retrieval-based model to communicate. In a retrieval-based model, programmers code in a set of predefined responses and some kind of heuristic to pick the appropriate response based on the input and context. The first chatbots just did rule-based expression matching: if I ask the exact phrase "Will I ever get laid?" it responds "No" every time. More recently, companies have started using more complex heuristics, like a machine learning classifier. Facebook Messenger's chatbot API is an example of this. You can hardcode responses to potential questions, and the system classifies words to understand intent, so you could ask either "What day is it today?" or "Today is what day?" and it would understand that both questions, although worded differently, have the same intent.

The harder chatbot model is generative. These models don't rely on any predefined responses whatsoever; they generate them from scratch. Two Google researchers released a paper called A Neural Conversational Model, where they trained a neural net on two datasets to do this: first on a movie dialogue dataset, so it would be able to speak conversational English, then on an IT support dataset, so it had domain knowledge. When they tested it on a real human asking for support, it was remarkably efficient at helping them solve their problem without any hard-coded responses, just by giving it data and training it.

OK, so what kind of bot do we want to build? When building a chatbot, we have to think about possible constraints. Are we operating on a closed domain or an open domain? In an open domain the conversation can go anywhere; there are an infinite number of things to talk about. In a closed domain the conversation focuses on a single subject. If we want to operate on an open domain using a generative model, that's pretty much AGI, so we're not quite there yet. If we use an open domain with a retrieval model, we'd have to hardcode literally everything, so that's also impossible. So right now we can build a chatbot in a closed domain using either a retrieval or a generative model.

OK, let's add in one more constraint. Do we want it to have long or short conversations? Short conversations are easy: you just output a single response to a single question. Long conversations are a bit harder. The AI has to keep track of what's being said, that is, the context, over a series of questions from the user. Support topics would be a good example of this. We could go the easy route and use a retrieval model if all we want is a bot to give us the local weather. But if we want our bot to have a long conversation with us about the weather, like "What's the weather in SF?", "Is my family safe?", "Where can I find a new family?", then we should go for a generative model. We need lots of data to train a generative model, like a big chat log or a knowledge base, and when done well that's pretty much the bleeding edge, which means that's what we have to do.

So we're going to recreate the results from the Neural Conversational Model paper using the deep learning library Torch in the Lua programming language. Let's collect our dataset first. We'll be using the Cornell movie dialogue dataset, and we'll set our variables from the command line for how much of the dataset we want to use and the minimum frequency of words that we keep in our vocabulary.

Our next step is to build a model. We use our command-line arguments to help determine the size of the model, the two variables being the number of hidden layers and the word count of our dataset. In our case this will be a sequence-to-sequence model. A sequence-to-sequence model consists of two long short-term memory (LSTM) recurrent neural networks. The first neural net is an encoder; it processes the input. The second neural net is the decoder, and it generates the output.

So why the sequence-to-sequence model? Yes, deep neural nets are awesome, but they require the dimensionality of the inputs and outputs to be a fixed size. We're accepting a sequence of words in a sentence and outputting a new sequence of words, so we need a sequence learning model that can learn on data with long-range memory dependencies. The LSTM architecture is the natural choice. The encoder LSTM turns the input sentence of variable length into a fixed-dimensional vector representation. We can think of this as the thought vector. Given a large enough dataset of questions and responses, it will recognize the closeness of a set of questions and represent them as a single thought vector. "What time is it?", "What's the time?", and "Yo yo, what's the time, [unclear]?" will all fall into a single thought vector. So after training we'll have a huge set of not just synapse weights but thought vectors as well.

Next we'll want to add in some hyperparameters. We want to use a ClassNLLCriterion for our model. NLL stands for negative log-likelihood. This will help us obtain log probabilities from our input data, which will help us improve our sentence predictions. The learning rate and momentum help pace our time steps, and the decay factor and minimum mean error help improve our learning rate while training.

Then we'll make sure CUDA is enabled and start training our model using backpropagation. In each epoch, or run, we'll declare our error and timer variables and loop through each example in each batch. The default batch size is a thousand examples. For each of those examples we'll get the input sentence and the target sentence, and we'll use the input and the target as parameters to train our model. Then we'll want to error check and make sure we record our progress. At the end of each iteration we save our model if it improves and update the learning rate. Boom, that's it.

After training this baby on AWS, we can have a conversation with it. The more data you give it, the better it's going to get. And if you're going to do this, add a filter for curse words. I'm looking at you, Microsoft. This will eventually automate a lot of support jobs away completely, so if there are any government people in the house right now, let's get on that basic income jam ASAP, unless you want a revolution.

For more info, check out the links down below, and please subscribe for more ML videos. For now I've got to go fix a memory leak, so thanks for watching.
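Two of the steps described in the transcript, keeping only words above a minimum frequency and scoring a prediction with negative log-likelihood, can be illustrated with a minimal sketch. The video's project is written in Lua with Torch; the Python below is not that code, and every name in it (`build_vocab`, `log_softmax`, `nll_loss`, the `<unk>` token, the toy sentences) is a hypothetical chosen for illustration.

```python
import math
from collections import Counter

UNKNOWN = "<unk>"  # hypothetical placeholder token for rare words


def build_vocab(sentences, min_freq):
    """Map each word that occurs at least min_freq times to an integer index."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    kept = sorted(w for w, c in counts.items() if c >= min_freq)
    # Index 0 is reserved for the unknown-word token.
    return {word: i for i, word in enumerate([UNKNOWN] + kept)}


def log_softmax(scores):
    """Turn raw scores into log probabilities (numerically stable form)."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, target_index):
    """Negative log-likelihood assigned to the correct next word."""
    return -log_probs[target_index]


# Toy data (an assumption, not the Cornell dataset).
sentences = ["what time is it", "what is the time", "is it late"]
vocab = build_vocab(sentences, min_freq=2)

# Pretend the decoder scored "time" higher than every other word.
scores = [0.0] * len(vocab)
scores[vocab["time"]] = 2.0
loss = nll_loss(log_softmax(scores), vocab["time"])
print(vocab)
print(loss)
```

The loss shrinks as the model puts more probability on the correct word, which is why a criterion of this kind is paired with a log-softmax output layer. Words seen fewer than `min_freq` times ("the" and "late" here) are left out of the vocabulary and would be mapped to the unknown token.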
Original Description
This video will get you up and running with your first Chatbot using the deep learning library Torch!
The code for this video is here:
https://github.com/llSourcell/Chatbot-AI
I created a Slack channel for us, sign up here:
https://wizards.herokuapp.com/
Here's the Neural Conversational Model paper (check out the machine-generated support conversations, they're mind-blowingly good):
http://arxiv.org/pdf/1506.05869v3.pdf
You should train this baby in the cloud using AWS. See ML for Hackers #4 for a tutorial on how to use AWS:
https://www.youtube.com/watch?v=eKmIVU8EUbw
Some great info on LSTM architecture:
http://deeplearning4j.org/lstm.html
Link to Facebook's Chatbot API if you're curious:
https://developers.facebook.com/blog/post/2016/04/12/bots-for-messenger/
I love you guys! Thanks for watching my videos, I do it for you. I left my awesome job at Twilio and I'm doing this full time now.
I recently created a Patreon page. If you like my videos, feel free to support my work here:
https://www.patreon.com/user?ty=h&u=3191693
Much more to come so please subscribe, like, and comment.
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI:
https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: http://chatgptschool.io/ Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz