Research to Code - Machine Learning tutorial
Key Takeaways
This video tutorial by Siraj Raval covers the process of implementing research papers in code, focusing on neural style transfer using deep learning. It mentions tools such as Arxiv Sanity and GitXiv for finding papers and existing code, and prototypes the model with Keras and SciPy.
Full Transcript
That feeling when you read a great paper but there's no code. Hello world, it's Siraj, and the practice of actually implementing a technique from a research paper into code is supremely useful to learn how it all works. In this video we'll implement the model from neural style transfer, a landmark paper that introduced the idea of applying filters in the style of a given artist to any image using deep learning.

If we just want the code for the paper, it's best to first search the web to see if that code already exists. This saves us a lot of time, since implementing it isn't a simple task. We can find a bunch of research papers using the popular tool Arxiv Sanity. It indexes the latest papers submitted to the open journal arXiv. There's also Twitter and Reddit for keeping up to date with the field, but a lot of the time the code isn't linked to the paper in a post. We can use a tool called GitXiv, which links papers with code, to see if the code exists. If it's not there, we can go straight to GitHub and search for a few of the keywords from the paper's title to see if anything promising shows up. If there's no code there, well, it's time to code it ourselves.

So how do you choose which paper to implement? Ask yourself what part of the machine learning research pipeline interests you the most. Are you really into neural networks? How about unsupervised learning, or attention mechanisms, or stochastic models, or evolutionary computing, or self-folding cardboard? You've got to first figure out what makes you excited. For me personally, it's either novel optimization techniques or generative models using probabilistic programming. List them out in your notes, then start searching for important papers in that field. The best paper is the one you actually enjoy reading. There are a lot of papers out there, so be sure to pick one that's well written. Usually these come out of top-tier universities, or research teams in smaller universities that have been tackling the problem for years. I tend to look for papers with an industry focus. A lot of papers from academia are cryptic and lacking in detail, some intentionally so, because their goal is to publish as many papers as possible that look good on the surface. Industry-focused papers have real-life applicability, so they are easier to reproduce.

So, on to our neural style transfer paper. I've got a great video called "How to Read a Research Paper" that I've linked to in the video description. It all boils down to carefully reading the paper from start to finish, multiple times as necessary. There will be a lot or a few terms that you don't understand. As you read it, make a note of them; you can look them up later. If we read the paper a few times and still don't understand the gist of it, we can follow the tree of citations at the bottom of the page and read relevant papers. And if there's a paywall, just pirate it, because YOLO. Once we've traversed the whole tree of knowledge, as all papers are built on previous knowledge, we'll be better equipped to interpret this paper.

Before we start building our model, we want to first pay attention to the input data that was used by the authors. If we use a different training set with images that aren't, say, high definition, but the authors used high-definition images, there's a chance our algorithm won't perform as well as it did for the authors. Our main task will be to understand the variables and operators of the model that the authors chose to use. We're essentially translating math equations in the paper into code and data, so before jumping into the code we have to fully understand the equations and processes. In these equations, notations for variables and operators can change from one mathematical convention to another and from one research group to another. We should know what each variable is, whether it's a scalar or a matrix, and what every operator is doing on these variables. A paper is a succession of equations, so we'll need to know how we'll plug the output of equation 1 into the input of equation 2.
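To make that bookkeeping concrete, here is a sketch of how the losses that come up later in this video could be written down with every symbol's type noted. The notation follows the style transfer paper as best recalled, so treat the exact symbols and normalization constants as assumptions to check against the paper itself.

```latex
% F^l, P^l, S^l : N_l x M_l matrices of layer-l activations for the generated,
%                 content, and style images (N_l filters, M_l spatial positions)
% G^l, A^l      : N_l x N_l Gram matrices of the generated and style images
% alpha, beta, w_l : scalars (content weight, style weight, per-layer weights)
\begin{align}
\mathcal{L}_{\text{content}} &= \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2 \\
G^l_{ij} &= \sum_{k} F^l_{ik} F^l_{jk}, \qquad A^l_{ij} = \sum_{k} S^l_{ik} S^l_{jk} \\
E_l &= \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2 \\
\mathcal{L}_{\text{style}} &= \sum_{l} w_l E_l \\
\mathcal{L}_{\text{total}} &= \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}}
\end{align}
```

Read this way, the output of the second equation (the Gram matrices) is the input of the third, and the first and fourth feed the last.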
Once we've read and understood the paper, it's time to create a prototype. This can be a very time-consuming process the more detail we put into it, so to start off, let's use the highest-level library we can to get something working as fast as possible. Keras is a great deep learning library that lets us build neural networks in Python, focused on fast experimentation. Good old Special K. Wait, that's taken.

The paper details a system that generates an image with the same content as a base image but with the style of a different picture. So there are three parts to the workflow: a content extractor, a style extractor, and a merger.

In the first part, the content extractor, they found a way to separate the semantic content of an image. It says they used a convolutional neural network called VGG19. ConvNets are neural networks that are well suited for image classification tasks, and VGG19 was trained on thousands of images and is capable of classifying images right out of the box. It looks like they use the output of one of the hidden layers as a content extractor. That makes sense: the hidden layers of a ConvNet extract high-level features of an image, and the deeper the layer, the more high-level the attributes will be. In the layers between taking an image as input and outputting a guess as to what it is, a CNN does transformations to turn the image pixels into an internal understanding of the content of the image. We can use one of the intermediate semantic representations in a ConvNet to compare the contents of two images. If we pass two different images through a ConvNet, after being passed through a few hidden layers their representations will be very close in raw value. If we pass both the final image and the content image and find the distance between the intermediate representations of those images, we have the content loss. The equation is listed as such. This summation notation makes the concept look harder than it really is. We make a list of layers where we want to compute the content loss. We pass both images through the network until it's at a particular layer in the list, take the output of that layer, square the difference between each corresponding value in the output, and sum them all up. We do this for every layer in the list and sum those up. We're also multiplying each of the representations by some value alpha, called the content weight, after finding their differences and squaring.

The second part of the workflow was to extract the style of an image. It looks like they used the same idea as the content extractor, meaning they use the output of a hidden layer, but they added an additional step. They used a correlation estimator based on the Gram matrix of the filters of a given hidden layer. Sounds complicated, but if we read on, it seems like what that does is destroy the semantics of the image but preserve its basic components, making an excellent texture extractor. A Gram matrix results from multiplying a matrix with the transpose of itself, and because every column is multiplied with every row in the matrix, we can think of the spatial information that was contained in the original representations to have been distributed. This Gram matrix contains all sorts of information about the image: the texture, shapes, and style. Once we have that Gram matrix, we can find the distance between the Gram matrices of the intermediate representations of both our image and the style image to find out how similar they are in style. And it's all multiplied by some value beta, known as the style weight.

For the last part, they needed to blend the content of one image with the style of another, and they of course framed it as an optimization problem, as machine learning papers tend to do. In an optimization problem, some cost function is minimized iteratively during training to achieve a goal. Their cost function penalized the synthesized image if its content was not equal to the desired content and its style was not equal to the desired style. Both the content and the style loss were added together to get the cost function. They then performed backpropagation to minimize the cost by getting the gradients of the final image and iteratively changing it to look more and more like the stylized content image. They used an optimization technique that's terribly named, called L-BFGS, which isn't as popular as, say, stochastic gradient descent. If we do a bit of research, it looks like it's a second-order optimization scheme, meaning it uses the derivative of the derivative. That gets closer to the global minimum, but the iteration cost is also bigger. Looks like this will likely be the term we'll need to spend the most time learning about.

But first, let's create some naming conventions. We've got a content image, a style image, and a final synthesized image. We can start coding this model in Keras sequentially, as a list of steps, to help us organize our thoughts here. It looks like Keras doesn't use the L-BFGS optimizer, so we can use SciPy for that part. It's going to be important to document everything here as we code, since there are a lot of moving parts. We'll define some multi-dimensional arrays to help us create image variables, then concatenate them all into a single tensor. They first synthesized a white noise image, then extracted the content and style of it. We can input our tensor into the VGG16 model using Keras. They calculated the distance between the content of the image and the original content image, as well as the distance between the style of the image and the original style image. We can extract data from specific layers using their numbering for both loss functions. Both distances were used to calculate the cost function and thus the gradient. As is the case in machine learning, if the gradient is zero, we are done optimizing, but if it's not, we'll run another iteration of optimization. That'll generate a new final image that's closer to the content image content-wise and closer to the style image style-wise. And if the preset number of iterations is achieved, finish; otherwise we'll go back to the start.

After a couple of iterations, we can check the result in our local directory, and it seems to work well enough. We can go back and tweak the parameters as necessary to get a result we're comfortable with. Now that we have a prototype version done, if we want, we can write a more detailed, precise version in pure Python or a lower-level deep learning library like TensorFlow.

Do you want to be the very best, like no one ever was? Well, hit the subscribe button and it'll happen. For now, I've got to use PyTorch, so thanks for watching.
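The loop described in the transcript (content loss, Gram-matrix style loss, a weighted sum, then L-BFGS from a white-noise start) can be sketched without any deep learning library. This is not the video's Keras code. It is a toy under stated assumptions: the hypothetical matrix `W` stands in for one pretrained VGG layer, the sizes `C`, `M`, `N` and the weights `alpha` and `beta` are made up, and the gradient is derived by hand rather than obtained by backpropagation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
C, M, N = 3, 64, 8            # input channels, pixel positions, filters (toy sizes)
W = rng.normal(size=(N, C))   # hypothetical stand-in for one pretrained layer


def features(x):
    """Toy hidden layer: N filter responses at each of the M positions."""
    return W @ x.reshape(C, M)


def gram(F):
    """Gram matrix: the feature matrix times its own transpose (N x N)."""
    return F @ F.T


def content_loss(F, P):
    """Half the sum of squared differences between two feature maps."""
    return 0.5 * np.sum((F - P) ** 2)


def style_loss(F, A):
    """Normalized squared distance between Gram matrices."""
    return np.sum((gram(F) - A) ** 2) / (4.0 * N**2 * M**2)


def total_loss_and_grad(x, P, A, alpha, beta):
    """Weighted sum of both losses and its gradient with respect to x."""
    F = features(x)
    loss = alpha * content_loss(F, P) + beta * style_loss(F, A)
    # Hand-derived gradient with respect to the features F.
    dF = alpha * (F - P) + beta * (gram(F) - A) @ F / (N**2 * M**2)
    # Chain rule through the linear toy layer, flattened back to x's shape.
    return loss, (W.T @ dF).ravel()


# Random arrays stand in for the content and style images.
content_image = rng.normal(size=C * M)
style_image = rng.normal(size=C * M)
P = features(content_image)       # content target
A = gram(features(style_image))   # style target
alpha, beta = 1.0, 1e3            # content weight and style weight (assumed)

x0 = rng.normal(size=C * M)       # start from white noise
initial_loss, _ = total_loss_and_grad(x0, P, A, alpha, beta)

# jac=True tells SciPy that the objective returns (loss, gradient).
result = minimize(
    total_loss_and_grad, x0, args=(P, A, alpha, beta),
    jac=True, method="L-BFGS-B", options={"maxiter": 100},
)
final_loss = result.fun
print("loss before:", initial_loss, "loss after:", final_loss)
```

In a real prototype, `features` would be replaced by activations taken from a pretrained network, and the gradient would come from the framework instead of a hand derivation. The SciPy call would stay much the same.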
Original Description
A lot of times, research papers don't have an associated codebase that you can browse and run yourself. In cases like that, you'll have to code up the paper yourself. That is easier said than done, and in this video I'll show you how you should read and dissect a research paper so you can quickly implement it programmatically. The paper we'll be implementing in this video is called Neural Style Transfer, which applies artistic filters to an image using 3 loss functions. It's a great starting point. I'll demo it using code, animations, and math. Enjoy!
Code for this video:
https://github.com/llSourcell/Research_to_Code
github + code website is:
http://www.gitxiv.com/
More learning resources:
https://www.youtube.com/watch?v=-mu3TYZ_udM&t=2s
https://www.youtube.com/watch?v=SHTOI0KtZnU
https://medium.com/artists-and-machine-intelligence/neural-artistic-style-transfer-a-comprehensive-look-f54d8649c199
https://github.com/anishathalye/neural-style