Image to LaTeX - Applied Deep Learning Fellowship (2019)

Weights & Biases · Beginner ·🏭 MLOps & LLMOps ·6y ago

Key Takeaways

The video demonstrates the application of deep learning techniques for image to LaTeX translation using a pre-existing dataset and TensorFlow, with a focus on encoder-decoder approaches and convolutional neural networks. The presentation covers the development of a model architecture, hyperparameter tuning, and deployment of a web app using Flask and Bootstrap.

Full Transcript

All right, hi guys. We are the Image to LaTeX team. We have been working really hard on this project for the past eight weeks, and we want to share with you what we have done so far.

Real quick, this is our team. Our team has four members: Danny, [names unclear], and myself, Kathy, and here is a picture of our faces.

The objective of our project is, of course, using deep learning to translate an image of a mathematical equation into LaTeX code. Here's an example: this is a picture of a mathematical equation, and here is the corresponding LaTeX code.

I want to talk a little bit about why we wanted to do this project. As data scientists, and for many researchers, we often find ourselves needing LaTeX, whether it's to write a paper, write a Jupyter notebook, or even answer Stack Overflow questions. But LaTeX code is very daunting to start with, and a lot of people just don't want to write it. So we figured, why don't we automate this process and give the time back to people to focus on what really matters to them, which is their work? That led us to this project.

The first thing, of course, is to see whether we have a dataset. Lucky for us, this is a problem that has been worked on previously, and there is a freely available dataset on the internet that you can just download, so that gave us a head start. Here are some examples from the dataset. You can already see that our training images are not the same size, but that's not the only characteristic of our dataset. Another is that the images were heavily preprocessed by the prior research team, which turned out to be a great limitation of our model and will be discussed later in the presentation.

One other thing I want to stress about our dataset is that we're essentially dealing with a very high-dimensionality problem. If you think about it, there are up to 400 different syntax elements in LaTeX code, and our model has to not only pick the
right syntax but also make sure to put it in the correct order. So we're talking about tens of thousands of potential dimensions in the output, and that just adds a great deal of difficulty to our project.

With that being said, we created our base model. What we did is some research online to see if there was a quick and dirty way to create a base model and see where that would take us. Interestingly, we found one of the TensorFlow tutorials, which does a somewhat similar thing, but for image captioning: basically, you give it an image and it makes a summary of it. The reason I say it is similar is that it also uses an encoder and a decoder, which is the approach we wanted to take.

So we followed that tutorial and created our vanilla base model. We rescaled our images to the same size, just for the sake of time, and we created a vanilla CNN encoder whose output is pushed into a decoder constructed from a GRU layer. We also implemented a Bahdanau-style (sorry, I can never pronounce that word) attention in our model. We made sure to overfit one batch to make sure there was no bug, and we got a number that is extremely close to zero, so we said, OK, we can train on the entire dataset. You can see that after 14 epochs of training on our entire training dataset, we got a loss of 0.6 and could not get any lower. At that point we figured, OK, we really need to become creative and figure out a way to create a model that fits our own problem. That leads us to the architecture of the model and the results, which will be talked about by [name unclear].

All right, so our final model architecture basically consists of three main components. The first is still a convolutional neural network that encodes the image. The only difference is that it doesn't have any fully connected layers, so it can handle input images of different sizes, which the dataset does have, so we
don't have to have that strict preprocessing step. The output of the encoder is a feature grid.

The next component of our architecture is the row encoder. What that does is apply a recurrent neural network across each of the rows of that feature grid, and the recurrent neural network uses LSTM cells to do that.

The output of the row encoder then gets fed to the final key component of the model, which is a decoder. That decoder is another recurrent neural network using LSTMs that also applies a Luong-style attention mechanism, and at each time step the output gets fed to a fully connected softmax layer to classify the LaTeX symbol.

For the training experiments, we tried a bunch of different hyperparameters, listed here, and the best configuration that we were able to find is the one highlighted in green. Since it's all mangled up, I'm going to say it: it was stochastic gradient descent with momentum; an adaptive learning rate based on the validation score after each epoch; the (I don't know how to pronounce it) weight initialization for the convolutional layers only; and a large batch size of 32.

OK, so when we wanted to look at how good our model was, the first thing we looked at was the loss, which is essentially the value of the error function that the model optimizes against during training. For this project we used the categorical cross-entropy loss function. For comparison, that plot shows the best loss of our baseline model, the one that Kathy talked about earlier, and, as another comparison, this is the best loss that the state-of-the-art model got. By state of the art, I'm simply referring to a previous study done by an NLP group at Harvard on that same dataset. And this was our loss across different training iterations on the dataset.

Another way to look at the performance of our model was through an evaluation metric, and that metric is the perplexity score. We didn't come up with
that; it's what the previous studies used for the same dataset. In the bottom right corner is a table showing the score for our model and for the SOTA model on both the training set and the test set. The test set is not the validation set; it's another set, and we never tuned the hyperparameters on it.

Once we had a model that we were happy with, we wanted to deploy it so that we could interact with it, and this is our pipeline. We have a web app that uses Flask and Bootstrap, and in the back end we have a server running Flask and TensorFlow 2.0 with our model.

We have a demo for you guys. We're going to send over this equation here, and it's on YouTube. What was that? Stay tuned.

So this is the website. We choose a file, the image that I showed, and convert, and that's the LaTeX code. We put that into a .tex file and render it there, and that's the equation that it predicted. This is the input and output side by side. It did pretty well. It did confuse a theta for a q, and it made the L a subscript, and also the bottom of the sigma, but it's pretty good. We noticed that most of the samples we tested perform like this.

But we wanted to try something more interesting, so we used this equation here. What's interesting is that when we fed this into the model, it only looked at the top part; it only predicted that, and it did it really well. We think that's just because of how rigid the dataset was and how it was preprocessed.

So, just a couple of takeaways and future work. The advantages for us were that we had an existing dataset, which makes life much easier, and we had existing research, well done, which also makes life easier. But there were still challenges. The data still needed to be processed properly, and that takes time, and right now the data processing pipeline is pretty rigid for us. The
second aspect, the workflow and tooling, took a lot of time, at least for us, to figure out how to properly establish the workflow. What worked for us: first, TensorFlow 2.0 has this subclassing API, which allowed us to do multiple experiments pretty quickly. The second one is kind of obvious, but throwing more compute power at it, spinning up multiple VM instances and doing multiple experiments in parallel, definitely helped. The third, which was sort of surprising to us, is that just increasing the batch size helped a lot; and not just that, increasing the batch size together with the initialization helped quickly change the loss.

For future work, what could we extend it to? One would be to change the encoder, which right now is just a CNN, to something more state of the art. We would want to automate the preprocessing: right now it only takes images where the equation is sort of centered, and we would want to make it more generic than that, which I think would be a pretty cool thing to be able to do. Then there's data generation and augmentation: just from the existing dataset, we could make many more equations. And the last part would be Bayesian hyperparameter optimization. Right now we haven't been able to get to that point. We should be [Applause]
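
For reference, the two attention styles named in the talk, Bahdanau (additive) for the baseline and Luong (multiplicative) for the final decoder, differ mainly in how they score each encoder position against the current decoder state. The following is a minimal pure-Python sketch of both scoring functions, not the team's implementation: the function names, the toy 2-dimensional vectors, and the identity weight matrices are assumptions chosen for illustration, and a real model would learn the weights and operate on tensors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def luong_scores(decoder_state, encoder_states, W):
    """Multiplicative ("general") score: s^T W h for each encoder state h."""
    return [dot(decoder_state, matvec(W, h)) for h in encoder_states]


def bahdanau_scores(decoder_state, encoder_states, W_s, W_h, v):
    """Additive score: v^T tanh(W_s s + W_h h) for each encoder state h."""
    projected_state = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_states:
        projected_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(dot(v, hidden))
    return scores


def attend(scores, encoder_states):
    """Return attention weights and the context vector (weighted sum of states)."""
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Hypothetical toy inputs: three encoder positions, one decoder state.
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weight matrices
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

luong_weights, luong_context = attend(
    luong_scores(decoder_state, encoder_states, identity), encoder_states)
bahdanau_weights, bahdanau_context = attend(
    bahdanau_scores(decoder_state, encoder_states, identity, identity, [1.0, 1.0]),
    encoder_states)
```

In both variants the weights sum to 1, and the context vector is the weighted sum of encoder states that the decoder uses when predicting the next symbol.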

Original Description

This presentation was a part of the Applied Deep Learning Fellowship held at the Weights and Biases HQ in the spring of 2019. Applied Deep Learning Fellowship: https://www.wandb.com/applied-deep-learning For more tutorials: https://www.wandb.com/classes To learn more about Weights & Biases: https://www.wandb.com/

This video teaches how to develop a deep learning model for image to LaTeX translation using a pre-existing dataset and TensorFlow, with a focus on encoder-decoder approaches and convolutional neural networks. The presentation covers the development of a model architecture, hyperparameter tuning, and deployment of a web app using Flask and Bootstrap. By following this tutorial, viewers can learn how to build and deploy their own image to LaTeX translation models.

Key Takeaways

Rescale the image to the same size
Create a vanilla encoder and decoder
Implement a Bahdanau-style attention mechanism
Train the model on the entire dataset
Train a model using stochastic gradient descent with momentum and adaptive learning rate
Initialize weights for convolutional layers
Use categorical cross-entropy as the loss function and perplexity as the evaluation metric
Deploy a web app using Flask and Bootstrap for user interaction with the model
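
The training recipe in the list above (stochastic gradient descent with momentum, plus a learning rate adjusted from the validation score after each epoch) can be sketched in plain Python as follows. This is only an illustration under assumptions: the talk does not say how the learning rate was adapted, so the reduce-on-plateau rule, the `factor` and `patience` values, and all names here are hypothetical.

```python
def sgd_momentum_step(params, grads, velocity, learning_rate, momentum=0.9):
    """One SGD-with-momentum update; returns the new params and velocity."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


class ReduceOnPlateau:
    """Shrink the learning rate when the validation score stops improving.

    Assumed rule (not from the talk): tolerate `patience` epochs without
    improvement, then multiply the learning rate by `factor`.
    """

    def __init__(self, learning_rate, factor=0.5, patience=1):
        self.learning_rate = learning_rate
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def update(self, validation_score):
        """Call once per epoch with a lower-is-better score; returns the LR."""
        if validation_score < self.best:
            self.best = validation_score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.learning_rate *= self.factor
                self.bad_epochs = 0
        return self.learning_rate


# Two momentum steps on a single parameter with a constant gradient of 2.0.
params, velocity = [1.0], [0.0]
params, velocity = sgd_momentum_step(params, [2.0], velocity, learning_rate=0.1)
params, velocity = sgd_momentum_step(params, [2.0], velocity, learning_rate=0.1)

# Made-up validation scores: improvement stalls at epochs 3 and 4.
schedule = ReduceOnPlateau(learning_rate=0.1)
lr_history = [schedule.update(score) for score in [1.0, 0.8, 0.9, 0.85, 0.7]]
```

With these made-up scores the learning rate is halved once, after the second consecutive epoch without improvement, and the second momentum step moves the parameter further than the first because the velocity accumulates.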

💡 The use of a pre-existing dataset and TensorFlow's sub-classing API allows for quick experimentation and development of a deep learning model for image to LaTeX translation.
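
As a worked illustration of the evaluation metric mentioned in the takeaways, perplexity is conventionally the exponential of the mean per-token cross-entropy. The sketch below assumes natural-log cross-entropy averaged over tokens; the four-symbol vocabulary, the probabilities, and the function names are made up for the example and are not taken from the project.

```python
import math


def sequence_cross_entropy(predicted_distributions, target_ids):
    """Mean categorical cross-entropy (in nats) over the tokens of one sequence."""
    total = 0.0
    for probs, target in zip(predicted_distributions, target_ids):
        total += -math.log(probs[target])
    return total / len(target_ids)


def perplexity(mean_cross_entropy):
    """Perplexity is exp of the mean per-token cross-entropy."""
    return math.exp(mean_cross_entropy)


# Hypothetical 4-symbol vocabulary and a 3-token target sequence.
vocab = ["\\frac", "{", "}", "x"]
targets = [0, 1, 3]

# A model that guesses uniformly has perplexity equal to the vocabulary size.
uniform = [[0.25, 0.25, 0.25, 0.25] for _ in targets]
uniform_ppl = perplexity(sequence_cross_entropy(uniform, targets))

# A model that puts 0.97 on every correct symbol has perplexity close to 1.
confident = [
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.01, 0.01, 0.01, 0.97],
]
confident_ppl = perplexity(sequence_cross_entropy(confident, targets))

# Assumption: if a loss of 0.6 were a per-token mean in nats, this is the
# perplexity it would correspond to.
assumed_ppl_at_loss_0_6 = perplexity(0.6)
```

A perplexity near 1 means the model is almost certain of each next symbol, while a value near the vocabulary size means it is effectively guessing.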
