Research to Code - Machine Learning tutorial

Siraj Raval · Beginner ·📐 ML Fundamentals ·7y ago

Key Takeaways

This video tutorial by Siraj Raval covers the process of implementing research papers in code, focusing on neural style transfer using deep learning. It demonstrates tools such as Arxiv Sanity and GitXiv for finding papers and existing code, and builds a prototype with Keras and SciPy.

Full Transcript

That feeling when you read a great paper but there's no code. Hello world, it's Siraj, and the practice of actually implementing a technique from a research paper in code is supremely useful for learning how it all works. In this video we'll implement the model from neural style transfer, a landmark paper that introduced the idea of applying filters in the style of a given artist to any image using deep learning.

If we just want the code for the paper, it's best to first search the web to see if that code already exists. This saves us a lot of time, since implementing it isn't a simple task. We can find a bunch of research papers using the popular tool Arxiv Sanity; it indexes the latest papers submitted to the open journal arXiv. There's also Twitter and Reddit for keeping up to date with the field, but a lot of the time the code isn't linked to the paper in a post. We can use a tool called GitXiv, which links papers with code, to see if the code exists. If it's not there, we can go straight to GitHub and search for a few of the keywords from the paper's title to see if anything promising shows up. If there's no code there, well, it's time to code it ourselves.

So how do you choose which paper to implement? Ask yourself what part of the machine learning research pipeline interests you the most. Are you really into neural networks? How about unsupervised learning, or attention mechanisms, or stochastic models, or evolutionary computing, or self-folding cardboard? You've got to first figure out what makes you excited. For me personally, it's either novel optimization techniques or generative models using probabilistic programming. List them out in your notes, then start searching for important papers in that field. The best paper is the one you actually enjoy reading. There are a lot of papers out there, so be sure to pick one that's well written. Usually these come out of top-tier universities, or research teams in smaller universities that have been tackling the problem for years. I tend to look for papers with an industry focus. A lot of papers from academia are cryptic and lacking in detail, some intentionally so, because their goal is to publish as many papers as possible that look good on the surface. Industry-focused papers have real-life applicability, so they are easier to reproduce.

So, on to our neural style transfer paper. I've got a great video called How to Read a Research Paper that I've linked to in the video description. It all boils down to this: carefully read the paper from start to finish, multiple times as necessary. There will be a lot or a few terms that you don't understand. As you read it, make a note of them; you can look them up later. If we read the paper a few times and still don't understand the gist of it, we can follow the tree of citations at the bottom of the page and read relevant papers. And if there's a paywall, just pirate it, because YOLO. Once we've traversed the whole tree of knowledge, as all papers are built on previous knowledge, we'll be better equipped to interpret this paper.

Before we start building our model, we need to first pay attention to the input data that was used by the authors. If we use a different training set with images that aren't, say, high definition, but the authors used high-definition images, there's a chance our algorithm won't perform as well as it did for the authors. Our main task will be to understand the variables and operators of the model that the authors chose to use. We're essentially translating math equations in the paper into code and data, so before jumping into the code we have to fully understand the equations and processes. In these equations, notations for variables and operators can change from one mathematical convention to another and from one research group to another. We should know what each variable is, whether it's a scalar or a matrix, and what every operator is doing on these variables. A paper is a succession of equations, so we'll need to know how we'll plug the output of equation 1 into the input of equation 2.

Once we've read and understood the paper, it's time to create a prototype. This can be a very time-consuming process the more detail we put into it, so to start off let's use the highest-level library we can to get something working as fast as possible. Keras is a great deep learning library that lets us build neural networks in Python, focused on fast experimentation. Good old Special K. Wait, that's taken.

The paper details a system that generates an image with the same content as a base image but with the style of a different picture. So there are three parts to the workflow: a content extractor, a style extractor, and a merger.

In the first part, the content extractor, they found a way to separate the semantic content of an image. It says they used a convolutional neural network called VGG19. ConvNets are neural networks that are well suited for image classification tasks, and VGG19 was trained on thousands of images and is capable of classifying images right out of the box. It looks like they use the output of one of the hidden layers as a content extractor. That makes sense: the hidden layers of a ConvNet extract high-level features of an image, and the deeper the layer, the more high-level the attributes that the layer identifies will be. Between taking an image as input and outputting a guess as to what it is, a CNN applies transformations to turn the image pixels into an internal understanding of the content of the image. We can use one of the intermediate semantic representations in a ConvNet to compare the contents of two images. If we pass two different images through a ConvNet, after being passed through a few hidden layers their representations will be very close in raw value. If we pass both the final image and the content image and find the distance between the intermediate representations of those images, we have the content loss. The equation is listed as such. This summation notation makes the concept look harder than it really is. We make a list of layers where we want to compute the content loss. We pass both images through the network until it's at a particular layer in the list, take the output of that layer, square the difference between each corresponding value in the output, and sum them all up. We do this for every layer in the list and sum those up. We're also multiplying each of the representations by some value alpha, called the content weight, after finding their differences and squaring.

The second part of the workflow was to extract the style of an image. It looks like they used the same idea as the content extractor, meaning they use the output of a hidden layer, but they added an additional step: they used a correlation estimator based on the Gram matrix of the filters of a given hidden layer. Sounds complicated, but if we read on, it seems like what that does is destroy the semantics of the image but preserve its basic components, making an excellent texture extractor. A Gram matrix results from multiplying a matrix with the transpose of itself, and because every column is multiplied with every row in the matrix, we can think of the spatial information that was contained in the original representations as having been distributed. This Gram matrix contains all sorts of information about the image: the texture, shapes, and style. Once we have that Gram matrix, we can find the distance between the Gram matrices of the intermediate representations of both our image and the style image to find out how similar they are in style. And it's all multiplied by some value beta, known as the style weight.

For the last part, they needed to blend the content of one image with the style of another, and they of course framed it as an optimization problem, as machine learning papers tend to do. In an optimization problem, some cost function is minimized iteratively during training to achieve a goal. Their cost function penalized the synthesized image if its content was not equal to the desired content and its style was not equal to the desired style. Both the content and the style loss were added together to get the cost function. They then performed backpropagation to minimize the cost by getting the gradients of the final image and iteratively changing it to look more and more like the stylized content image. They used an optimization technique that's terribly named, called L-BFGS, which isn't as popular as, say, stochastic gradient descent. If we do a bit of research, it looks like it's a second-order optimization scheme, meaning it uses the derivative of the derivative, which gets closer to the global minimum, but the iteration cost is also bigger. Looks like this will likely be the term we'll need to spend the most time learning about.

But first, let's create some naming conventions. We've got a content image, a style image, and a final synthesized image. We can start coding this model in Keras sequentially, as a list of steps, to help us organize our thoughts here. It looks like Keras doesn't use the L-BFGS optimizer, so we can use SciPy for that part. It's going to be important to document everything here as we code, since there are a lot of moving parts. We'll define some multi-dimensional arrays to help us create image variables, then concatenate them all into a single tensor. They first synthesized a white noise image, then extracted the content and style of it. We can input our tensor into the VGG16 model using Keras. They calculated the distance between the content of the image and the original content image, as well as the distance between the style of the image and the original style image. We can extract data from specific layers using their numbering for both loss functions. Both distances were used to calculate the cost function and thus the gradient. As is the case in machine learning, if the gradient is zero we are done optimizing, but if it's not, we'll run another iteration of optimization that'll generate a new final image that's closer to the content image content-wise and closer to the style image style-wise. And if the preset number of iterations is achieved, finish; otherwise we'll go back to the start.

After a couple of iterations, we can check the result in our local directory, and it seems to work well enough. We can go back and tweak the parameters as necessary to get a result we're comfortable with. Now that we have a prototype version done, if we want, we can write a more detailed, precise version in pure Python or a lower-level deep learning library like TensorFlow.

Do you want to be the very best, like no one ever was? Well, hit the subscribe button and it'll happen. For now, I've got to use PyTorch, so thanks for watching.
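The content loss, Gram matrix, and style loss described in the transcript can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code from the video: the feature maps are random arrays standing in for the output of one hidden layer, the function names are hypothetical, only a single layer is used rather than a list of layers, and any normalisation constants from the paper are left out.

```python
import numpy as np


def content_loss(content_feats, generated_feats):
    """Sum of squared differences between two feature maps of equal shape."""
    return float(np.sum((generated_feats - content_feats) ** 2))


def gram_matrix(feats):
    """Gram matrix of a (height, width, channels) feature map.

    The spatial axes are flattened so each row holds one filter's responses,
    then that matrix is multiplied by its own transpose. The result has shape
    (channels, channels) and no longer records where a response occurred.
    """
    height, width, channels = feats.shape
    flat = feats.reshape(height * width, channels).T  # (channels, pixels)
    return flat @ flat.T


def style_loss(style_feats, generated_feats):
    """Sum of squared differences between the two Gram matrices."""
    diff = gram_matrix(generated_feats) - gram_matrix(style_feats)
    return float(np.sum(diff ** 2))


def total_cost(content_feats, style_feats, generated_feats, alpha=1.0, beta=1.0):
    """Content loss weighted by alpha plus style loss weighted by beta."""
    return (alpha * content_loss(content_feats, generated_feats)
            + beta * style_loss(style_feats, generated_feats))


# Random stand-ins for hidden-layer activations (assumption: 4x4 maps, 8 filters).
rng = np.random.default_rng(0)
content_feats = rng.random((4, 4, 8))
style_feats = rng.random((4, 4, 8))
generated_feats = rng.random((4, 4, 8))

cost = total_cost(content_feats, style_feats, generated_feats, alpha=1.0, beta=1e-3)
print(f"combined cost: {cost:.3f}")
```

Shuffling the spatial positions of a feature map leaves its Gram matrix unchanged, which is one way to see why the style term behaves like a texture descriptor rather than a content descriptor.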

Original Description

A lot of times, research papers don't have an associated codebase that you can browse and run yourself. In cases like that, you'll have to code up the paper yourself. That is easier said than done, and in this video I'll show you how you should read and dissect a research paper so you can quickly implement it programmatically. The paper we'll be implementing in this video is called Neural Style Transfer, which applies artistic filters to an image using 3 loss functions. It's a great starting point; I'll demo it using code, animations, and math. Enjoy!

Code for this video: https://github.com/llSourcell/Research_to_Code

Please subscribe! And like. And comment. That's what keeps me going.

Want more education? Connect with me here:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval
LinkedIn: https://www.linkedin.com/in/sirajraval/

GitHub + code website is: http://www.gitxiv.com/

More learning resources:
https://www.youtube.com/watch?v=-mu3TYZ_udM&t=2s
https://www.youtube.com/watch?v=SHTOI0KtZnU
https://medium.com/artists-and-machine-intelligence/neural-artistic-style-transfer-a-comprehensive-look-f54d8649c199
https://github.com/anishathalye/neural-style

Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
And please support me on Patreon: https://www.patreon.com/user?u=3191693
Sign up for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz

This video tutorial teaches viewers how to implement research papers in code, focusing on neural style transfer using deep learning. It demonstrates tools such as Arxiv Sanity and GitXiv for finding papers and existing code, and builds a prototype with Keras and SciPy. Viewers will learn how to read and dissect research papers, identify key components, and reproduce results. By following this tutorial,

Key Takeaways
  1. Use the highest-level library to get something working as fast as possible
  2. Create a prototype first; the more detail it carries, the longer it takes, so start simple
  3. Extract the semantic content of an image using a content extractor
  4. Extract the style of an image using a style extractor
  5. Merge the content of one image with the style of another
  6. Create naming conventions for content image, style image, and final synthesized image
  7. Define multi-dimensional arrays to create image variables
  8. Concatenate arrays into a single tensor
  9. Synthesize a white noise image
  10. Extract content and style of synthesized image
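Takeaways 6 to 9 in the list above (naming the three images, defining arrays, concatenating them, and starting from white noise) might look roughly like this in NumPy. It is a sketch under assumptions: the image size is hypothetical, the content and style arrays are random stand-ins for images loaded from disk, and the video itself builds these variables with Keras rather than plain NumPy.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 64, 64, 3  # hypothetical image size

rng = np.random.default_rng(0)

# Stand-ins for the loaded images; real code would read and resize image files.
# The leading axis of size 1 is the batch axis.
content_image = rng.random((1, HEIGHT, WIDTH, CHANNELS))
style_image = rng.random((1, HEIGHT, WIDTH, CHANNELS))

# The final synthesized image starts out as white noise.
synthesized_image = rng.uniform(0.0, 1.0, (1, HEIGHT, WIDTH, CHANNELS))

# Stack all three along the batch axis so a single pass through the
# network can produce features for every image at once.
input_tensor = np.concatenate(
    [content_image, style_image, synthesized_image], axis=0
)

print(input_tensor.shape)
```

Keeping a fixed order along the batch axis (content, style, synthesized) makes it easy to slice the features for each image back out of a layer's output later.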
💡 The key insight from this tutorial is that implementing research papers into code can be a powerful way to learn and apply deep learning techniques to real-world problems, and that using the right tools and libraries can make this process much easier.
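The optimization loop from the transcript, where a white-noise image is repeatedly adjusted to lower a combined content and style cost using L-BFGS from SciPy, can be sketched on a toy problem. Everything here is an assumption made for illustration: the "features" are the raw pixels themselves instead of VGG activations, the image is a tiny 8x8 array, the weights and iteration count are arbitrary, and the gradient is written by hand rather than obtained through backpropagation.

```python
import numpy as np
from scipy.optimize import minimize

SHAPE = (8, 8, 3)  # tiny stand-in image: height, width, channels


def gram(feats):
    """Gram matrix over the channel axis of a (height, width, channels) array."""
    flat = feats.reshape(-1, feats.shape[-1])  # (pixels, channels)
    return flat.T @ flat                       # (channels, channels)


def cost_and_grad(x_flat, content, style_gram, alpha, beta):
    """Return the combined cost and its gradient for a flattened image."""
    x = x_flat.reshape(SHAPE)

    # Content term: squared distance to the content image.
    c_diff = x - content
    c_loss = np.sum(c_diff ** 2)
    c_grad = 2.0 * c_diff

    # Style term: squared distance between Gram matrices.
    flat = x.reshape(-1, SHAPE[-1])
    g_diff = gram(x) - style_gram
    s_loss = np.sum(g_diff ** 2)
    s_grad = (4.0 * flat @ g_diff).reshape(SHAPE)  # valid because g_diff is symmetric

    cost = alpha * c_loss + beta * s_loss
    grad = alpha * c_grad + beta * s_grad
    return cost, grad.ravel()


rng = np.random.default_rng(0)
content = rng.random(SHAPE)
style_gram = gram(rng.random(SHAPE))

x0 = rng.random(SHAPE).ravel()            # start from a white-noise image
args = (content, style_gram, 1.0, 1e-3)   # alpha = content weight, beta = style weight
start_cost, _ = cost_and_grad(x0, *args)

# jac=True tells SciPy that cost_and_grad returns (cost, gradient) together.
result = minimize(cost_and_grad, x0, args=args, jac=True,
                  method="L-BFGS-B", options={"maxiter": 50})
final_image = result.x.reshape(SHAPE)

print(f"cost before: {start_cost:.3f}, after: {result.fun:.3f}")
```

In a real prototype the cost and gradient would come from the deep learning library's automatic differentiation through the network, with SciPy only driving the iterations; the structure of the loop stays the same.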
