Research to Code - Machine Learning tutorial

Siraj Raval · Beginner ·📐 ML Fundamentals ·7y ago

Skills: Reading ML Papers 90% · LLM Engineering 90% · Fine-tuning LLMs 90% · Paper Reproduction 85% · Modern CV Models 85%

Key Takeaways

This video tutorial by Siraj Raval covers the process of implementing research papers in code, focusing on neural style transfer using deep learning. It points to tools such as Arxiv Sanity and GitXiv for finding papers and existing code, and prototypes the model with Keras and SciPy.

Full Transcript

That feeling when you read a great paper but there's no code. Hello world, it's Siraj, and the practice of actually implementing a technique from a research paper in code is supremely useful for learning how it all works. In this video we'll implement the model from neural style transfer, a landmark paper that introduced the idea of applying filters in the style of a given artist to any image using deep learning.

If we just want the code for the paper, it's best to first search the web to see if that code already exists. This saves us a lot of time, since implementing it isn't a simple task. We can find a bunch of research papers using the popular tool Arxiv Sanity; it indexes the latest papers submitted to the open journal arXiv. There's also Twitter and Reddit for keeping up to date with the field, but a lot of the time the code isn't linked to the paper in a post. We can use a tool called GitXiv, which links papers with code, to see if the code exists. If it's not there, we can go straight to GitHub and search for a few of the keywords from the paper's title to see if anything promising shows up. If there's no code there, well, it's time to code it ourselves.

So how do you choose which paper to implement? Ask yourself what part of the machine learning research pipeline interests you the most. Are you really into neural networks? How about unsupervised learning, or attention mechanisms, or stochastic models, or evolutionary computing, or self-folding cardboard? You've got to first figure out what makes you excited. For me personally, it's either novel optimization techniques or generative models using probabilistic programming. List them out in your notes, then start searching for important papers in that field. The best paper is the one you actually enjoy reading. There are a lot of papers out there, so be sure to pick one that's well written. Usually these come out of top-tier universities, or research teams in smaller universities that have been tackling the problem for years. I tend to look for
papers with an industry focus. A lot of papers from academia are cryptic and lacking in detail, some intentionally so, because their goal is to publish as many papers as possible that look good on the surface. Industry-focused papers have real-life applicability, so they are easier to reproduce.

So, on to our neural style transfer paper. I've got a great video called How to Read a Research Paper that I've linked to in the video description. It all boils down to this: carefully read the paper from start to finish, multiple times as necessary. There will be a lot or a few terms that you don't understand as you read it. Make a note of them; you can look them up later. If we read the paper a few times and still don't understand the gist of it, we can follow the tree of citations at the bottom of the page and read relevant papers. And if there's a paywall, just pirate it, because YOLO. Once we've traversed the whole tree of knowledge, as all papers are built on previous knowledge, we'll be better equipped to interpret this paper.

Before we start building our model, we need to first pay attention to the input data that was used by the authors. If we use a different training set with images that aren't, say, high definition, but the authors used high-definition images, there's a chance our algorithm won't perform as well as it did for the authors. Our main task will be to understand the variables and operators of the model that the authors chose to use. We're essentially translating math equations in the paper into code and data, so before jumping into the code we have to fully understand the equations and the processes in these equations. Notations for variables and operators can change from one mathematical convention to another and from one research group to another. We should know what each variable is, whether it's a scalar or a matrix, and what every operator is doing on these variables. A paper is a succession of equations, so we'll need to know how we'll plug the output of equation 1 into the input of equation 2.
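
As a minimal sketch of that bookkeeping, assume a hidden layer's output is an array of shape (channels, height, width). The helper names `flatten_features` and `squared_distance` are hypothetical, chosen only for illustration; they are not from the paper or the video. Each function documents whether it returns a matrix or a scalar, and the output of the first is plugged into the second.

```python
import numpy as np


def flatten_features(feature_map):
    """'Equation 1' (illustrative): turn a (channels, height, width) array
    into a matrix with one row per channel and one column per position."""
    channels, height, width = feature_map.shape
    return feature_map.reshape(channels, height * width)


def squared_distance(a, b):
    """'Equation 2' (illustrative): two matrices in, one scalar out."""
    assert a.shape == b.shape, "both inputs must have the same shape"
    return float(np.sum((a - b) ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 5))   # stand-in for one layer's output
y = rng.normal(size=(3, 4, 5))   # stand-in for another image's output

F = flatten_features(x)          # matrix, shape (3, 20)
P = flatten_features(y)          # matrix, shape (3, 20)
d = squared_distance(F, P)       # scalar

print(F.shape, type(d).__name__)
```

Writing the shape of every intermediate value next to the line that produces it makes a notation mismatch show up as a failed assertion rather than as a silently wrong result.
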
Once we've read and understood the paper, it's time to create a prototype. This can be a very time-consuming process the more detail we put into it, so to start off, let's use the highest-level library we can to get something working as fast as possible. Keras is a great deep learning library that lets us build neural networks in Python, focused on fast experimentation. Good old Special K. Wait, that's taken.

The paper details a system that generates an image with the same content as a base image but with the style of a different picture. So there are three parts to the workflow: a content extractor, a style extractor, and a merger.

In the first part, the content extractor, they found a way to separate the semantic content of an image. It says they used a convolutional neural network called VGG19. ConvNets are neural networks that are well suited for image classification tasks, and VGG19 was trained on thousands of images and is capable of classifying images right out of the box. It looks like they use the output of one of the hidden layers as a content extractor. That makes sense: the hidden layers of a convnet extract high-level features of an image, and the deeper the layer, the more high-level the attributes that the layer identifies. Between taking an image as input and outputting a guess as to what it is, a CNN applies transformations to turn the image pixels into an internal understanding of the content of the image. We can use one of the intermediate semantic representations in a convnet to compare the contents of two images. If we pass two different images through a convnet, after being passed through a few hidden layers their representations will be very close in raw value. If we pass both the final image and the content image and find the distance between the intermediate representations of those images, we have the content loss. The equation is listed as such. This summation notation makes the concept look harder than it really is. We make a list of layers where we want to compute
the content loss. We pass both images through the network until it's at a particular layer in the list, take the output of that layer, square the difference between each corresponding value in the output, and sum them all up. We do this for every layer in the list and sum those up. We're also multiplying each of the representations by some value alpha, called the content weight, after finding their differences and squaring.

The second part of the workflow was to extract the style of an image. It looks like they used the same idea as the content extractor, meaning they use the output of a hidden layer, but they added an additional step. It used a correlation estimator based on the Gram matrix of the filters of a given hidden layer. Sounds complicated, but if we read on, it seems like what that does is destroy the semantics of the image but preserve its basic components, making an excellent texture extractor. A Gram matrix results from multiplying a matrix with the transpose of itself, and because every column is multiplied with every row in the matrix, we can think of the spatial information that was contained in the original representations as having been distributed. This Gram matrix contains all sorts of information about the image: the texture, shapes, and style. Once we have that Gram matrix, we can find the distance between the Gram matrices of the intermediate representations of both our image and the style image to find out how similar they are in style. And it's all multiplied by some value beta, known as the style weight.

For the last part, they needed to blend the content of one image with the style of another, and they of course framed it as an optimization problem, as machine learning papers tend to do. In an optimization problem, some cost function is minimized iteratively during training to achieve a goal. Their cost function penalized the synthesized image if its content was not equal to the desired content and its style was not equal to the desired style. Both the content and the style
loss were added together to get the cost function. They then performed backpropagation to minimize the cost by getting the gradients with respect to the final image and iteratively changing it to look more and more like the stylized content image. They use an optimization technique that's terribly named, called L-BFGS, which isn't as popular as, say, stochastic gradient descent. If we do a bit of research, it looks like it's a second-order optimization scheme, meaning it uses the derivative of the derivative, so it gets closer to the global minimum, but the iteration cost is also bigger. Looks like this will likely be the term we'll need to spend the most time learning about.

But first, let's create some naming conventions. We've got a content image, a style image, and a final synthesized image. We can start coding this model in Keras sequentially, as a list of steps to help us organize our thoughts. Here it looks like Keras doesn't have the L-BFGS optimizer, so we can use SciPy for that part. It's going to be important to document everything here as we code, since there are a lot of moving parts. We'll define some multi-dimensional arrays to help us create image variables, then concatenate them all into a single tensor. They first synthesized a white noise image, then extracted the content and style of it. We can input our tensor into the VGG16 model using Keras. They calculated the distance between the content of the image and the original content image, as well as the distance between the style of the image and the original style image. We can extract data from specific layers using their numbering for both loss functions. Both distances were used to calculate the cost function and thus the gradient. As is the case in machine learning, if the gradient is zero, we are done optimizing, but if it's not, we'll run another iteration of optimization that'll generate a new final image that's closer to the content image content-wise and closer to the style image style-wise. And if the preset number of iterations is
achieved, finish; otherwise we'll go back to the start. After a couple of iterations we can check the result in our local directory, and it seems to work well enough. We can go back and tweak the parameters as necessary to get a result we're comfortable with. Now that we have a prototype version done, if we want, we can write a more detailed, precise version in pure Python or a lower-level deep learning library like TensorFlow.

Do you want to be the very best, like no one ever was? Well, hit the subscribe button and it'll happen. For now, I've got to use PyTorch, so thanks for watching.
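
The loop described above (compute the cost and its gradient, let the optimizer update the image, stop after a preset number of iterations) can be sketched with SciPy's `minimize` and its `L-BFGS-B` method. This is a toy under stated assumptions, not the paper's setup: the "features" are the raw pixel values rather than VGG activations, the images are tiny random arrays, and the names `content_weight`, `style_weight`, `gram` and `loss_and_grad` are illustrative. The 0.5 factor and the normalization constant in the style loss follow a common convention and are assumptions here too.

```python
import numpy as np
from scipy.optimize import minimize

CHANNELS, POSITIONS = 3, 16                  # toy sizes, chosen for illustration
content_weight, style_weight = 1.0, 100.0    # alpha and beta; arbitrary values

rng = np.random.default_rng(0)
content = rng.random((CHANNELS, POSITIONS))  # stand-in content features
style = rng.random((CHANNELS, POSITIONS))    # stand-in style features


def gram(features):
    """Gram matrix: the feature matrix multiplied by its own transpose."""
    return features @ features.T


def loss_and_grad(flat_image):
    """Return the total cost and its gradient for a flattened image."""
    F = flat_image.reshape(CHANNELS, POSITIONS)

    # Content loss: squared difference between corresponding values.
    content_diff = F - content
    content_loss = 0.5 * np.sum(content_diff ** 2)

    # Style loss: squared difference between the two Gram matrices.
    gram_diff = gram(F) - gram(style)
    norm = (CHANNELS * POSITIONS) ** 2
    style_loss = np.sum(gram_diff ** 2) / (4.0 * norm)

    loss = content_weight * content_loss + style_weight * style_loss
    # Analytic gradient of the weighted sum with respect to the image.
    grad = content_weight * content_diff + style_weight * (gram_diff @ F) / norm
    return loss, grad.ravel()


start = rng.random(CHANNELS * POSITIONS)     # white-noise starting image
initial_loss, _ = loss_and_grad(start)

# jac=True tells SciPy that loss_and_grad returns (loss, gradient) together.
result = minimize(loss_and_grad, start, jac=True, method="L-BFGS-B",
                  options={"maxiter": 50})

print("loss before:", initial_loss, "loss after:", result.fun)
```

Raising `style_weight` relative to `content_weight` pulls the result toward the style statistics, which is the parameter tweaking the video mentions at the end.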

Original Description

A lot of times, research papers don't have an associated codebase that you can browse and run yourself. In cases like that, you'll have to code up the paper yourself. That is easier said than done, and in this video I'll show you how you should read and dissect a research paper so you can quickly implement it programmatically. The paper we'll be implementing in this video is called Neural Style Transfer, which applies artistic filters to an image using 3 loss functions. It's a great starting point; I'll demo it using code, animations, and math. Enjoy!

Code for this video: https://github.com/llSourcell/Research_to_Code

Please Subscribe! And like. And comment. That's what keeps me going.

Want more education? Connect with me here:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval
LinkedIn: https://www.linkedin.com/in/sirajraval/

GitHub + code website is: http://www.gitxiv.com/

More learning resources:
https://www.youtube.com/watch?v=-mu3TYZ_udM&t=2s
https://www.youtube.com/watch?v=SHTOI0KtZnU
https://medium.com/artists-and-machine-intelligence/neural-artistic-style-transfer-a-comprehensive-look-f54d8649c199
https://github.com/anishathalye/neural-style

Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
And please support me on Patreon: https://www.patreon.com/user?u=3191693
Sign up for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz

This video tutorial teaches viewers how to implement research papers in code, focusing on neural style transfer using deep learning, and mentions tools such as Arxiv Sanity and GitXiv for finding papers and existing code. Viewers will learn how to read and dissect research papers, identify key components, and reproduce results.

Key Takeaways

Use the highest-level library to get something working as fast as possible
Create a quick prototype first, then write a more detailed version only if needed
Extract the semantic content of an image using a content extractor
Extract the style of an image using a style extractor
Merge the content of one image with the style of another
Create naming conventions for content image, style image, and final synthesized image
Define multi-dimensional arrays to create image variables
Concatenate arrays into a single tensor
Synthesize a white noise image
Extract content and style of synthesized image
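
The array-handling takeaways above can be sketched in NumPy. The image size, the random stand-in data, and the variable names are assumptions for illustration; a real run would load actual images and pass the resulting tensor to a pretrained network.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 8, 8, 3    # tiny illustrative size

rng = np.random.default_rng(0)

# Stand-ins for loaded images, each with a leading batch axis of length 1.
content_image = rng.random((1, HEIGHT, WIDTH, CHANNELS))
style_image = rng.random((1, HEIGHT, WIDTH, CHANNELS))

# White-noise starting point for the final synthesized image.
final_image = rng.uniform(0.0, 1.0, size=(1, HEIGHT, WIDTH, CHANNELS))

# One tensor holding all three images, stacked along the batch axis,
# so a network can process them in a single pass.
input_tensor = np.concatenate([content_image, style_image, final_image], axis=0)

print(input_tensor.shape)
```

Keeping a fixed order along the first axis (content, style, final) is what lets later code pick each image's features back out by index.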

💡 The key insight from this tutorial is that implementing research papers into code can be a powerful way to learn and apply deep learning techniques to real-world problems, and that using the right tools and libraries can make this process much easier.
