Generative Modeling Applications

Siraj Raval · Beginner ·🧬 Deep Learning ·6y ago

Key Takeaways

The video discusses generative modeling applications, including image, music, and 3D object generation, using tools like Magenta, TensorFlow, and Lucid, and techniques such as auto-encoders, adversarial networks, and sequence models.

Full Transcript

What if you could design an avatar? What would be their story? Maybe you could generate a theme song for them, or create a 3D model of them to explore a virtual world. This is the power of generative modeling: if we can learn a probability function, we can use it to generate images, music, and even 3D objects.

Hello world, it's Siraj, and machine learning in 2019 has gotten really good at generating data. Some of these techniques are truly mind-blowing. Nvidia, for example, was able to generate an entire 3D map for a video game automatically. In this episode we'll learn the basics of generative modeling, then use that knowledge to build three apps that generate music, images, and 3D objects respectively. Anyone can get these up and running, since the code lives in the cloud via Google Colab; you don't even need to install anything on your local machine. Links to each demo will be in the video description.

There are a lot of different ways that we can categorize the machine learning process. Does it have a label? Does it not have a label? Does it apply to simulated environments? Does it blend? These are great frames; they help us understand the techniques we should use given the type of data we're dealing with. But a more application-centric frame is: does it generate, or does it discriminate?

To paint a better picture here, let's say we have a labelled data set of integers: X is the input, Y is the output. A discriminative model would want to learn the probability of Y given X. In probability theory we define this as a conditional relationship, since one variable depends on another. Independence is overrated AF. So if we iterate through each data point and check, given that x equals one, what's the chance that y equals zero, it's 100 percent for the first data point and 100 percent for the second, so the total is a hundred percent. Now how about when we're given x equals two? If we iterate through those data points where that's a given, we'll find that only half of them have y equals zero; thus the
probability is one-half.

Now, a generative model is one that will instead learn the joint probability distribution of a data set, the probability of x and y, and this will output different values than the previous probability distribution. If we take the case of x equals one and y equals zero, we'll find that these both occur together twice out of the whole data set; thus the joint probability is 50%. In general, given a data set, discriminative models learn the boundary between classes, whereas generative models model the distribution of individual classes across the entire data set. They accept no boundaries. Boss status. A generative model models both the features and the class. If we model the probability of x and y, we can use this distribution to generate data points. All algorithms that can model the probability of x and y are generative, and we'll talk about a few of those examples in a second.

If we had a data set of song audio files and their associated genre labels, we could build a discriminative model to learn the conditional distribution of the data, so that in the future, given a song, we could predict its genre. A generative model, more interestingly, could, given the genre, generate a related song.

There are three main types of generative models these days: auto-encoders, adversarial networks, and sequence models. Auto-encoders attempt to generate an output that's the same as the input. They compress the input into a lower-dimensional representation called the hidden space. It's called hidden because it's compressed data, not because of some ancient secret. Then they attempt to reconstruct the output from this representation. We use two neural networks here: an encoder network that learns the probability of the hidden space given the input, and a decoder model that learns the probability of the input given the hidden space, which will reconstruct the input as the output.

Adversarial networks involve a generator model that learns the probability of x given the learned hidden space H, where
X is the input, and a discriminative model, which learns the probability of Y given X and tries to associate an input instance X with a yes/no binary answer Y about whether the generated sample was a genuine sample from the data set we were training on or not. The counterfeiter versus the detective, and both improve over time. The counterfeiter is what will help us generate the data set we need. How bittersweet.

And let's not forget sequence models. These models learn the probability of the form y given a specific location in the sequence and a given input sample. As an example, we can consider each word as a series of characters, each sentence as a series of words, and each paragraph as a series of sentences. The output Y could be the sentiment of the sentence. Using a similar trick from auto-encoders, we can replace Y with the next item in the sequence, namely y equals x at step t plus 1, allowing the model to learn.

Let's now go through how these models work in practice. First, I found this really cool Google Colab for generating piano songs. Colab, if you don't know, is a way to run Python code in your browser. You can easily train models on a cloud GPU without having to install anything. Python beginners, rejoice. In this Colab we can play with a pre-trained generative model to make it do several things. We can generate an entire piano performance; we just need to select which key we'd like it to be in, and given some initial notes, we can generate the rest ourselves. We can even generate an accompaniment for a given song. You can imagine how we could download this pre-trained model, then use it inside of a web or mobile app to serve people. We could create a music-making tool for artists, or a music collaboration social network, or we could just generate our own unique theme song without needing to hire a professional.

The team at Google made this possible by using a type of sequence model that's very popular lately called the transformer model. Their whole process was pretty clever,
starting with how they generated the data set to learn from. They started with a collection of YouTube videos that all had a license allowing for their use. Then they used a model to classify those videos that only contained piano music, nothing else. Their classifier was trained on AudioSet, an audio data set that contains over 600 event classes and a collection of over 2 million 10-second sound clips drawn from YouTube videos. Hopefully not ASMR videos, though. Super useful data set, by the way. They used its piano data to learn the sound of piano music in general, no matter the order of the notes, and collected hundreds of thousands of piano music videos as a result. They then converted those audio waves into symbolic MIDI format, which represents musical notes digitally. With 10K hours of symbolic piano music, there is more than enough for a machine to learn how to play the piano.

The transformer network has an encoder and a decoder. It's a sequence model: input a sequence of notes and it outputs a sequence of generated notes. In terms of what that collection of specific matrix operations in both the encoder and decoder do, that's way too much to fit into this video, but they are just math operations. Different combinations have been tested by AI researchers before coming to this conclusive ordering, because this is the one that gave them the highest accuracy scores. They built the model using Magenta, using its various operations to build these matrix operations step by step. Linear algebra makes life worth living. Magenta is a collection of musical tools built on top of TensorFlow. It's got pre-trained models that make it really easy to generate music. It's learning the probability distribution of a music data set over time; this is the learning process.

Now, if we wanted to, instead of generating music, generate an image, say of different types of faces, how are we gonna learn that probability model? We'll need to give a machine a set of facial features and see if it can generate a unique
face in that style. I found a Colab that lets you do just that, no need to type in any code at all. Give the image a starting face and have it transform into another over time, or give it some initial starting features and have it generate faces. It's better than drugs. It's machine learning.

It's using a ProGAN, or progressive growing generative adversarial network. Nvidia released one of the coolest papers ever when they announced it. It takes the idea of a GAN but introduces a unique training technique for it: they progressively grow the size of both the discriminator and generator networks. They are mirror images of each other and grow in synchrony. New layers are added to the networks over time while it's training on an image data set. They trained it on the CelebA image data set, a collection of celebrity facial images. As the resolution increases, in the code you can see that it adds more convolutional layers in a loop nested inside the function used to build each network. That's a pretty clever trick. Convolutional layers are matrix operations that tend to perform well with image data. Thanks, Yann. This reduces the training time; it's two to six times faster depending on the final output resolution.

We could imagine that we could use this kind of technology to create a talking head video that can give a weekly educational video lecture on YouTube, generating mouth movements frame by frame. Wait, maybe I should just automate my own job. Nah, I love doing this too much.

Clearly, transformer networks are pretty good at learning a probability distribution for musical data, and ProGANs are pretty good at learning a probability distribution for images. But what if we wanted to generate 3D objects? Well, I found a really cool Colab that allows for style transfer for 3D objects. Basically, you give the network a 3D object and a related image, and it'll generate a new 3D object that's in the shape of the input in the style of the related image. What a cute little paper demon bunny. How it does
this is by using a convolutional network to extract both the style and the content into separate representations. The style represents the textures and colors, whereas the content is the 3D coordinate collection. Through the optimization process, at every step an output 3D model is rendered from a random angle from two different locations, the original one and the learned one. The content loss optimizes the approximate positions of the pixels, whereas the style loss approximates the visual patterns without regard to the positions. Together, through gradient descent, it achieves a generated object. This probability model, where we learned the probability of x and y, where X is the 3D object and Y is the style image, outputs some gorgeous visualizations that we can play with inside of the Colab. I think I'm in love.

The way that they implemented this is by using a really pretty library called Lucid, by the creators of Distill. It allows for some cool transformations. With Lucid you can visualize all the learned features inside of a neural network and implement various style transfer applications. One of the best documented GitHub repositories I've ever seen; definitely check it out. As for 3D data sets, we could use ObjectNet3D by Stanford University, which is pretty good.

We could generate new types of characters for 3D video games as a service for developers or small companies. We could also generate all sorts of fashion accessories, giving the customer the utmost personalization for their own specific style, or have it automate interior design work. I've added the link to all three of these web demos in the video description for you to play with and find some inspiration from.

There are three things to remember from this episode: you can generate data if you learn the joint probability distribution of a data set; generative models are capable of doing this for lots of data sets out there; and some examples of generative models are adversarial networks, transformers, and convolutional networks.
[Music]
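
The conditional-versus-joint comparison from the transcript can be checked with a few lines of Python. This is a minimal sketch, not code from the video: the four (x, y) pairs are an assumed toy data set chosen to reproduce the numbers quoted in the transcript (the table shown on screen is not part of this text), and the helper names `conditional` and `joint` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed toy data set of (x, y) pairs; picked so the results match the
# figures quoted in the transcript, not copied from the video itself.
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

pair_counts = Counter(data)
x_counts = Counter(x for x, _ in data)


def conditional(y, x):
    """P(y | x): the quantity a discriminative model estimates."""
    return Fraction(pair_counts[(x, y)], x_counts[x])


def joint(x, y):
    """P(x, y): the quantity a generative model estimates."""
    return Fraction(pair_counts[(x, y)], len(data))


print(conditional(0, 1))  # 1
print(conditional(0, 2))  # 1/2
print(joint(1, 0))        # 1/2
```

The conditional probability divides by the number of rows that share the given x, while the joint probability divides by the size of the whole data set, which is why the same pair (x = 1, y = 0) scores 100 percent in one case and 50 percent in the other.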

Original Description

Generative modeling technology is changing the face of the Internet as you read this. It's now possible to design automated systems that can write novels, act as talking heads in videos, and compose music. We are in an absolute renaissance for creativity; the tools at your disposal to tell stories and create new realities have never been so powerful. In this episode, I'll explain how generative modeling works by demoing 3 examples that you can try yourself in your web browser. We'll generate music, faces, and 3D objects, all without having to install any dependencies locally! Newcomers, rejoice. I'll explain how it works at a mathematical level (my favorite), enjoy!

Demo 1 (Generating Music): https://colab.research.google.com/notebooks/magenta/piano_transformer/piano_transformer.ipynb
Demo 2 (Generating Faces): https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf_hub_generative_image_module.ipynb
Demo 3 (Generating 3D Objects): https://colab.research.google.com/github/tensorflow/lucid/blob/master/notebooks/differentiable-parameterizations/style_transfer_3d.ipynb

Autoencoders explained: https://www.youtube.com/watch?v=H1AllrJ-_30
Generative Adversarial Networks explained: https://www.youtube.com/watch?v=yz6dNf7X7SA
Sequence Models explained: https://www.youtube.com/watch?v=ElmBrKyMXxs
Generative Modeling explained: https://www.youtube.com/watch?v=PhCM3qoRZHE
Are you a total beginner to machine learning? Watch this: https://www.youtube.com/watch?v=Cr6VqTRO1v0
My AI Startup video: https://www.youtube.com/watch?v=NzmoPqte4V4

Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
INSTAGRAM: https://bit.ly/312pLUb
FACEBOOK: https://bit.ly/2OqOhx1
TWITTER: https://bit.ly/2OHYLbB
WEBSITE: https://bit.ly/2OoVPQF
Hit the Join button above to sign up to become a member of my channel for access to exclusive live streams!
Join us at the School of AI: https://theschool.ai/
Signup for my newsletter for exciting updates

The video teaches the basics of generative modeling and its applications in image, music, and 3D object generation. It also covers the use of various tools and techniques, including auto-encoders, adversarial networks, and sequence models. By watching this video, viewers can gain a better understanding of generative modeling and its potential uses.
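
To make the sequence-model idea concrete, the transcript's trick of using the next item in the sequence as the target can be sketched in plain Python. This is an illustrative count-based model, far simpler than the transformer behind the piano demo; the note sequence and the function names `next_item_pairs` and `fit_bigram` are assumptions made up for the example.

```python
from collections import Counter, defaultdict


def next_item_pairs(sequence):
    """Pair every item with its successor, giving (x_t, x_{t+1}) examples."""
    return list(zip(sequence, sequence[1:]))


def fit_bigram(sequence):
    """Estimate P(next item | current item) by counting successors."""
    counts = defaultdict(Counter)
    for current, following in next_item_pairs(sequence):
        counts[current][following] += 1
    return {
        current: {item: n / sum(followers.values())
                  for item, n in followers.items()}
        for current, followers in counts.items()
    }


notes = ["C", "E", "G", "C", "E", "C"]  # hypothetical note sequence
model = fit_bigram(notes)

print(next_item_pairs(notes)[:2])  # [('C', 'E'), ('E', 'G')]
print(model["C"])                  # {'E': 1.0}
```

No separate labels are needed here: the sequence supplies its own targets, which is what lets a sequence model be trained on raw music or text.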

Key Takeaways

Define the differences between generative and discriminative models
Choose a generative model type (auto-encoders, adversarial networks, sequence models)
Use Magenta for music generation with pre-trained models
Generate images of faces using a progressive growing GAN
Use Lucid for style transfer of 3D objects
Train on CelebA image dataset for face generation
Add convolutional layers in a loop for progressive growth of networks
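
The last takeaway, adding convolutional layers in a loop as the resolution grows, can be sketched without any deep learning library. The following is a rough illustration only, not the code from the Colab or from Nvidia: the function name `build_layer_plan`, the 4x4 starting resolution, and the choice of one upsampling step plus two 3x3 convolutions per doubling are assumptions made for the example.

```python
def build_layer_plan(final_resolution, start_resolution=4):
    """List the layers of a generator that grows by doubling its resolution.

    Hypothetical helper: it only describes the layers as strings so the
    loop structure is visible; it does not build a trainable network.
    """
    plan = [f"conv3x3 @ {start_resolution}x{start_resolution}"]
    resolution = start_resolution
    while resolution < final_resolution:
        resolution *= 2
        # Each growth stage appends a new block at the doubled resolution.
        plan.append(f"upsample to {resolution}x{resolution}")
        plan.append(f"conv3x3 @ {resolution}x{resolution}")
        plan.append(f"conv3x3 @ {resolution}x{resolution}")
    if resolution != final_resolution:
        raise ValueError(
            "final_resolution must be start_resolution times a power of two"
        )
    return plan


for layer in build_layer_plan(16):
    print(layer)
```

A higher target resolution simply means more passes through the loop, so the same function can describe a small 16x16 network or a much deeper high-resolution one.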

💡 Generative models can model both the features and the class of a data set, allowing for a wide range of applications in image, music, and 3D object generation.
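
One way to see what modeling both the features and the class buys you is to sample from a learned joint distribution. The sketch below assumes a made-up data set of (tempo, genre) pairs and uses simple counting in place of a neural network; the data and the function names `sample_pair` and `sample_feature_given_class` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical labelled examples: (feature, class) = (tempo, genre).
data = [("slow", "jazz"), ("slow", "jazz"), ("fast", "rock"), ("slow", "rock")]

# Counting pairs gives an empirical estimate of the joint distribution.
joint_counts = Counter(data)


def sample_pair(rng):
    """Draw a (feature, class) pair from the empirical joint distribution."""
    pairs = list(joint_counts)
    weights = [joint_counts[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def sample_feature_given_class(label, rng):
    """Generate a feature for a requested class, i.e. sample from P(x | y)."""
    pairs = [pair for pair in joint_counts if pair[1] == label]
    if not pairs:
        raise ValueError(f"no examples with class {label!r}")
    weights = [joint_counts[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0][0]


rng = random.Random(0)  # seeded so repeated runs draw the same samples
print(sample_pair(rng))
print(sample_feature_given_class("rock", rng))
```

The second function mirrors the genre example from the transcript: because the joint counts are available, fixing the class and sampling the feature is just a matter of restricting the table to the matching rows.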
