Build an AI Writer - Machine Learning for Hackers #8

Siraj Raval · Beginner ·🧬 Deep Learning ·10y ago

Key Takeaways

This video demonstrates building an AI writer using Python and the Lasagne deep learning library, which can generate a short story based on an input image. The code utilizes pre-trained models, including a convolutional neural network (CNN) for image recognition, a multimodal neural language model for encoding image features into a joint image-sentence embedding, and recurrent neural networks (RNNs) for decoding and style shifting.

Full Transcript

My dear Megan Tang, this is our love story. Human or machine? Human. Why? I found some drops of Soylent on it.
Hello world. Welcome to Sirajology. In today's episode, we're going to build an AI writer, that is, an app that can write a short story about an image just by looking at it. Sorry, Stephen King. You're out of the game. Isn't it weird how just by stringing together an exact combination of words, we can produce something of profound beauty? When we read these stories, our brains are somehow encoding them into thoughts. When we encode a sentence to a thought, the more semantically similar it is to an existing thought, the more we'll be able to relate to it. So, how do we get an AI to write a story if it doesn't have any experience living life in the real world? Well, we're going to build an AI writer in Python using the deep learning library Lasagne. And we've got a lot of code to go over, so I'll explain as we go. Let's get Pythonic.
At the highest level, we could code this app in just three lines of code. It's a little ridiculous. We import the generate class, then call the load all function, which will initialize all of our machine learning models. Then call the story function with the generated models and image location as the parameters. That's it. It'll output a story. But let's dive a little deeper. The load function is just boilerplate initialization. So let's take a closer look at the story function where the real magic happens.
We'll start off by loading an image into memory. This will be the image that we want to tell a story about. We'll use the load image function to load it and have the parameter set to the location of the image on our machine. The load image function uses the scientific computing library numpy to get the byte representation of the image and then resize it so it's smaller while preserving its aspect ratio. Once we've loaded our image, it's time to input the image into a deep convolutional neural network to retrieve its features.
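The resize step described above (shrink the image while preserving its aspect ratio) can be sketched roughly as follows. This is a minimal illustration, not the project's actual load image function: the helper names `scaled_size` and `resize_nearest`, the target short side of 256, and the use of plain Python lists in place of numpy arrays are all assumptions made so the sketch has no dependencies.

```python
def scaled_size(width, height, short_side=256):
    """Return (new_width, new_height) so the shorter side equals short_side.

    Both sides are multiplied by the same factor, so the aspect ratio is kept.
    The default of 256 is an assumed value for illustration.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = short_side / min(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_nearest(pixels, new_width, new_height):
    """Nearest-neighbor resize of a row-major pixel grid (a list of rows)."""
    old_height, old_width = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


# Toy grayscale "image": 2 rows by 4 columns.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
new_w, new_h = scaled_size(4, 2, short_side=1)   # halves both sides
small = resize_nearest(image, new_w, new_h)
print(new_w, new_h, small)
```

A real loader would read pixel data from disk and usually interpolate rather than pick the nearest pixel; the point here is only that one scale factor is applied to both dimensions.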
In a previous episode, we talked about how convolutional neural nets were great for image recognition since they roughly mimic the human visual cortex. This CNN is pre-trained. We initialize it in the build convnet function, which is called in the boilerplate load all method. Once we've specified all the layers, we load up our pre-trained synapse weights file called VGG19. This file was trained on a huge data set of labeled images, so it will be able to recognize the objects in a novel image. Once we input our image into our CNN, it'll return an array of features for us. These features are the highest level features in our neural net, the layer right before the output layer, the most abstract representation of the image, its content.
Once we have our features, we'll want to encode the image features into a multimodal neural language model. So what is this? Well, it's based off a paper called Unifying Visual-Semantic Embeddings. In our code, we're using a pre-trained model that will input a joint image-sentence embedding into a multimodal vector space. It used an LSTM to encode the sentence and a CNN to encode the image. Then a decoder neural language model generates a novel description from the image. Since our model is pre-trained, when we embed our image into this multimodal space, our features are updated to include the weights of the joint space.
Then we compute the nearest neighbors. To do this, first we retrieve the array of scores, that is, a list of all novel sentences generated from the novel image, which we then sort in order of closeness. Then we'll want to print out the nearest captions.
Now that we have a set of captioned sentences, we'll want to compute a set of skip-thought vectors for each sentence. Skip-thought vectors are a vector representation of a sentence. This is another implementation of the encoder-decoder model. The encoder and decoder are both recurrent neural networks.
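The nearest-neighbor step described above (score each caption against the embedded image, sort by closeness, keep the top few) can be sketched as below. This is a hedged illustration only: it assumes the candidate captions are already embedded in the same joint space as the image and that closeness is cosine similarity. The names `cosine` and `nearest_captions`, and the two-dimensional toy vectors, are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_captions(image_vec, caption_vecs, captions, k=2):
    """Return the k captions whose embeddings are closest to the image."""
    scores = [cosine(image_vec, vec) for vec in caption_vecs]
    # Sort caption indices from highest score to lowest.
    ranked = sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)
    return [captions[i] for i in ranked[:k]]


# Toy joint-space embeddings (hypothetical values).
image_vec = [1.0, 0.0]
captions = ["a dog on a beach", "a plate of food", "a dog eating food"]
caption_vecs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]

for caption in nearest_captions(image_vec, caption_vecs, captions, k=2):
    print(caption)
```

In practice the vectors would have hundreds or thousands of dimensions and the scoring would be a single matrix product, but the ranking logic is the same.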
We take an input sentence and encode it into a skip-thought vector by inputting it into the encoding recurrent neural net. Since we are modeling a sequence of words, we use gated recurrent units, or GRUs, at each neuron. GRUs consist of two gates, an update gate and a reset gate. The gating units modulate the flow of data inside the unit. And unlike LSTM cells, there are no separate memory cells. LSTM cells control the amount of memory content that is seen or used by other units in the network. GRU cells don't. They expose their full content without any control. So GRUs have a less complex structure and are thus more computationally efficient. We're starting to see these be used more and more. They're relatively new. So when we feed the sentences into the RNN, it'll create an abstraction, the vector representation or skip-thought vector. Sentences that share semantic and syntactic properties will be mapped to either the same or similar skip-thought vectors.
The function returns these vectors as a numpy array, which we can then modify via the style shift function. We'll take our thought vectors and modify them to match the style of stories using a pre-trained recurrent neural network. The RNN was trained on a data set of romance novels where each passage was mapped to a thought vector. So we're essentially computing a function that looks like this for a style shift. F(x) is a book passage thought vector, x is an image caption, c is a caption style vector, and b is a book style vector. We remove the caption style from the caption and replace it with the book style to create a book passage vector.
Once we have our book passage styled vector, we can generate the story by running the decoder function on it. The decoder is another recurrent neural network that, given a vector representation of a sentence, can predict the previous and the next sentence. We'll run the decoder on our passage vector and that will generate our story based on the image for us.
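The style shift described above amounts to simple vector arithmetic: subtract the caption style and add the book style, which can be written as F(x) = x - c + b. The sketch below illustrates that arithmetic on toy vectors. Averaging several caption vectors into a single x is an assumption of this sketch, and the names `mean_vector` and `style_shift` are hypothetical rather than taken from the project's code.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def style_shift(caption_vecs, caption_style, book_style):
    """Compute F(x) = x - c + b.

    x is taken to be the mean of the caption vectors (an assumption of this
    sketch), c is the caption style vector, and b is the book style vector.
    """
    x = mean_vector(caption_vecs)
    return [xi - ci + bi for xi, ci, bi in zip(x, caption_style, book_style)]


# Toy two-dimensional thought vectors (hypothetical values).
caption_vecs = [[1.0, 2.0], [3.0, 4.0]]
caption_style = [0.5, 0.5]   # c: what caption-like sentences have in common
book_style = [1.0, -1.0]     # b: what book passages have in common

shifted = style_shift(caption_vecs, caption_style, book_style)
print(shifted)
```

The result keeps what is specific to the image (x minus the generic caption style) and moves it into the region of the vector space where book passages live, which is what the decoder is then run on.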
Let's take a look at what it says about this picture. Let's read the first few sentences. "She was taking the man out of her mouth and she gave him a gentle shake of her head. Oh my god, I can't wait to see what happened in the past 24 hours. I had never met a woman before." This thing is a pro.
For a small chunk of code, there's a lot of machine learning going on here. We use a convolutional neural net to compute image features, an LSTM recurrent neural net to encode our image into joint space and retrieve the sentence captions, a GRU recurrent neural net to calculate the skip-thought vectors of those sentences, and after style shifting, an RNN to decode our passage vector to a story. That's four neural nets. You can run this on your local machine since the necessary models are pre-trained.
For more info, check out the links below. And I just signed up for Patreon. So, if you guys find my videos useful, I'd really appreciate your support to help me continue doing this full-time. Please subscribe for more ML videos. And for now, I've got to go fix a null pointer exception. So thanks for watching.
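As a companion to the GRU description in the transcript (an update gate, a reset gate, and no separate memory cell), here is a minimal sketch of one GRU step for a single scalar unit. It is an illustration under stated assumptions, not the encoder used in the video: biases are omitted, the weight names in `params` are hypothetical, and the final interpolation follows one common convention (some papers swap the roles of z and 1 - z).

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h_prev, p):
    """One GRU update for a scalar input x and scalar hidden state h_prev."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev)   # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev)   # reset gate
    # Candidate state: the reset gate scales how much of the old state is used.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev))
    # The hidden state is the only state; it is exposed in full, with no
    # output gate and no separate memory cell as in an LSTM.
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical weights and a short input sequence.
params = {"w_z": 0.5, "u_z": -0.3, "w_r": 0.8, "u_r": 0.1, "w_h": 1.2, "u_h": 0.7}
h = 0.0
for x in [0.5, -1.0, 0.25]:
    h = gru_step(x, h, params)
print(h)
```

In a real encoder, x and h are vectors, the weights are matrices, and the final hidden state after the last word serves as the sentence vector.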

Original Description

This video will get you up and running with your first AI Writer able to write a short story based on an image that you input.
The code for this video is here: https://github.com/llSourcell/AI_Writer
I created a Slack channel for us, sign up here: https://wizards.herokuapp.com/
Great write-up on recurrent neural nets (LSTMs and GRUs): http://deeplearning4j.org/lstm.html
Paper on skip thought vectors: http://arxiv.org/pdf/1506.06726v1
Paper on Unifying Visual Semantic Embeddings: https://arxiv.org/pdf/1411.2539v1.pdf
You can test this code out at this site! It's really cool, they have a bunch of deep learning models in the cloud, you just have to upload an input and it gives you an output: http://www.somatic.io/models/2n6g7RZQ
If you're interested in NLP, check out Michael Collins' course. This guy is such a G (it's free and open source!): https://www.coursera.org/course/nlangp
And check out this guy's free deep learning course on Udacity: https://www.udacity.com/course/deep-learning--ud730
I love you guys! Thanks for watching my videos, I do it for you. I left my awesome job at Twilio and I'm doing this full time now. I recently created a Patreon page. If you like my videos, feel free to help support my effort here!: https://www.patreon.com/user?ty=h&u=3191693
Much more to come so please subscribe, like, and comment.
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz


This video teaches you how to build an AI writer that can generate short stories based on input images using Python and pre-trained deep learning models. You'll learn how to utilize convolutional neural networks, multimodal neural language models, and recurrent neural networks to achieve this task.

Key Takeaways
  1. Import necessary libraries and load pre-trained models
  2. Load an image into memory and resize it
  3. Input the image into a CNN to retrieve its features
  4. Encode the image features into a multimodal neural language model
  5. Compute the nearest neighbors and retrieve captioned sentences
  6. Calculate skip thought vectors for each sentence
  7. Style shift the thought vectors to match a specific style
  8. Decode the styled vector to generate a story
💡 The key insight of this video is that you can use pre-trained models to build a complex AI application, such as an AI writer, with relatively little code and without requiring a large amount of training data.
