Image to LaTeX - Applied Deep Learning Fellowship (2019)

Weights & Biases · Beginner ·🏭 MLOps & LLMOps ·6y ago

Key Takeaways

The video demonstrates the application of deep learning techniques for image to LaTeX translation using a pre-existing dataset and TensorFlow, with a focus on encoder-decoder approaches and convolutional neural networks. The presentation covers the development of a model architecture, hyperparameter tuning, and deployment of a web app using Flask and Bootstrap.

Full Transcript

All right, hi guys. We are the Image to LaTeX team, and we have been working really hard on this project for the past eight weeks, so we want to share with you what we have done so far. Real quick, this is our team. Our team has four members: Danny ll-look and myself, Kathy, and this is a picture of our faces.

Real quickly, what is the objective of our project? It is, of course, using deep learning to translate an image of a mathematical equation into LaTeX code. Here's an example: this is a picture of a mathematical equation, and here is the corresponding LaTeX code.

I want to talk a little bit about why we wanted to do this project. As data scientists, and for many researchers, we often find ourselves needing LaTeX, whether it's to write a paper, write a Jupyter notebook, or even answer Stack Overflow questions. But LaTeX code is very daunting to start with, and a lot of people just don't want to write it. So we figured, why don't we automate this process and give the time back to people to focus on what really matters to them, which is their work? That led us to this project.

First, of course, we needed to see if we had a dataset. Lucky for us, this is a problem that has been worked on previously, and there is a freely available dataset on the internet that you can just download, so that gave us a head start. Here are some examples from the dataset. You can see already that the images in our training dataset are not all the same size, but that's not the only characteristic of our dataset. Another is that the images were heavily preprocessed by the prior research team, which turned out to be a great limitation of our model and will be discussed later in the presentation. One other thing I want to stress about our dataset is that we're essentially dealing with a very high-dimensional problem. If you think about it, there are up to 400 different pieces of syntax in LaTeX code, and our model has to not only pick the right syntax but also put the pieces into the correct order. So we're talking about tens of thousands of potential dimensions in the output, and that added a great deal of difficulty to our project.

With that being said, we created our base model. We did some research online to see if there was a quick and dirty way to create a base model and see where that would take us. Interestingly, we found a TensorFlow tutorial that does a somewhat similar thing, but for image captioning: basically, you give it an image and it produces a summary of it. The reason I say it is similar is that it also uses an encoder and a decoder, which is the approach we wanted to take. So we followed that tutorial and created our vanilla base model. We rescaled our images to the same size, just for the sake of time, and we created a vanilla CNN encoder whose output is pushed into a decoder constructed with a GRU layer. We also implemented Bahdanau-style attention (sorry, I can never pronounce that word) in our model. We made sure to overfit one batch to confirm there was no bug, and we got a loss that was extremely close to zero, so we said, okay, we can train on the entire dataset. You can see that after 14 epochs of training on our entire training dataset, our loss was at 0.6 and we could not get it lower. At that point we figured, okay, we need to get really creative and figure out a model that fits our own problem. That leads us to the architecture of the model and the results, which the next speaker will talk about.

All right, so our final model architecture basically consists of three main components. The first is still a convolutional neural network that encodes the image. The only difference is that it doesn't have any fully connected layers, so it can handle input images of different sizes, which the dataset does have, so we don't need that strict preprocessing step. The output of the encoder is a feature grid. The next component of our architecture is the row encoder, which basically applies a recurrent neural network across each of the rows of that feature grid, and that recurrent neural network uses LSTM cells. The output of the row encoder then gets fed to the final key component of the model, the decoder. The decoder is another recurrent neural network using LSTMs that also applies a Luong-style attention mechanism, and at each time step its output gets fed to a fully connected softmax layer to classify the LaTeX symbol.

For the training experiments, we tried a bunch of different hyperparameters, listed here, and the best configuration we were able to find is the one highlighted in green. Since it's all mangled up, I'm going to say it: stochastic gradient descent with momentum; an adaptive learning rate based on the validation score after each epoch; the (I don't know how to pronounce it) weight initialization, for the convolutional layers only; and a large batch size of 32.

When we wanted to look at how good our model was, the first thing we looked at was the loss, which is essentially the value of the error function that the model optimizes against during training. For this project we used the categorical cross-entropy loss function. For comparison, the plot shows the best loss of our baseline model, the one that Kathy talked about earlier, and also the best loss that the state-of-the-art model got. By state of the art, I'm simply referring to a previous study done by an NLP group at Harvard on the same dataset. And this was our loss across different training iterations on the dataset.

Another way to look at the performance of our model was through an evaluation metric, and that metric is the perplexity score. We didn't come up with that; it's what the previous studies used for the same dataset. In the bottom right corner is a table showing the score for our model and for the SOTA model on both the training set and the test set. The test set is not the validation set; it's another set, and we never tuned the hyperparameters on it.

Once we had a model that we were happy with, we wanted to deploy it so that we could interact with it, and this is our pipeline. We have a web app that uses Flask and Bootstrap, and in the back end we have a server running Flask and TensorFlow 2.0 with our model. We have a demo for you guys. We're going to send over this equation here. And it's on YouTube... what was that? Stay tuned.

So this is the website. We choose a file, and we have the image that I showed. We hit convert, and that's the LaTeX code. We put that into a .tex file and render it, and that's the equation that it predicted. This is the input and output side by side. It did pretty well. It did confuse a theta for a q, it made the L a subscript, and then there's also the bottom of the sigma, but it's pretty good. We noticed that most of the samples we tested performed like this, but we wanted to try something more interesting, so we used this equation here. What's interesting is that when we fed it into the model, it only looked at the top part. It only predicted that, and it did it really well. We think that's just because of how rigid the dataset was and how it was preprocessed.

So, just a couple of takeaways and the future work on this. The advantages for us were that we had an existing dataset, which makes life much easier, and we had existing, well-done research, which also makes life easier. But there were still challenges. The data still needed to be processed properly, and that takes time, and right now our data processing pipeline is pretty rigid.

The second aspect is workflow and tooling. At least for us, it took a lot of time to figure out how to properly establish the workflow. Here is what worked for us. TensorFlow 2.0 has the subclassing API, which allowed us to run multiple experiments pretty quickly. The second one is kind of obvious, but throwing more compute power at it, just spinning up multiple VM instances and running multiple experiments in parallel, definitely helped. The third, which was sort of surprising to us, is that just increasing the batch size helped a lot. And not just that: increasing the batch size together with the initialization helped the loss change quickly.

For future work, what could we extend this to? One would be to change the encoder, which right now is just a CNN, to something more state of the art. We would want to automate the preprocessing: right now it only takes images where the equation is sort of centered, and we would want to make it more generic than that. I think that would be a pretty cool thing to be able to do. Then there's data generation and augmentation: from the existing dataset we could make many more equations. And the last part would be Bayesian hyperparameter optimization. Right now we haven't been able to get to that point. [Applause]
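
The decoder in the talk is described as using Luong-style attention over the encoder output, but the talk does not say which scoring function the team used. As a rough illustration only, the sketch below assumes the simple dot-product score and plain Python lists; the names `dot_attention`, `decoder_state`, and `encoder_states` are hypothetical and do not come from the team's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Dot-product attention (assumed scoring function, see lead-in).

    decoder_state: list of d floats, the current decoder hidden state.
    encoder_states: list of vectors of d floats, one per encoded position.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each encoder position by its dot product with the decoder state.
    scores = [
        sum(h * e for h, e in zip(decoder_state, enc)) for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted average of the encoder states.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoded positions with two features each.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_states)
```

In a real decoder the context vector would be combined with the decoder state before the softmax layer that classifies the next LaTeX symbol; that step is left out here.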

Original Description

This presentation was a part of the Applied Deep Learning Fellowship held at the Weights and Biases HQ in the spring of 2019. Applied Deep Learning Fellowship: https://www.wandb.com/applied-deep-learning For more tutorials: https://www.wandb.com/classes To learn more about Weights & Biases: https://www.wandb.com/


Key Takeaways
  1. Rescale the image to the same size
  2. Create a vanilla encoder and decoder
  3. Implement a Bahdanau-style attention mechanism
  4. Train the model on the entire dataset
  5. Train a model using stochastic gradient descent with momentum and adaptive learning rate
  6. Initialize weights for convolutional layers
  7. Use categorical cross entropy loss function and perplexity score as evaluation metrics
  8. Deploy a web app using Flask and Bootstrap for user interaction with the model
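
Takeaway 7 pairs the categorical cross-entropy loss with a perplexity score. The talk does not spell out how perplexity was computed, so the sketch below assumes the standard definition: the exponential of the mean per-token cross-entropy, measured in nats. The function names and the example probabilities are hypothetical.

```python
import math


def cross_entropy(correct_token_probs):
    """Mean categorical cross-entropy in nats.

    correct_token_probs holds the probability the model assigned to the
    correct token at each time step.
    """
    total = -sum(math.log(p) for p in correct_token_probs)
    return total / len(correct_token_probs)


def perplexity(correct_token_probs):
    """Perplexity as exp(mean cross-entropy); assumed definition, see lead-in."""
    return math.exp(cross_entropy(correct_token_probs))


# Hypothetical probabilities for a four-token LaTeX sequence.
probs = [0.5, 0.25, 0.5, 0.25]
loss = cross_entropy(probs)
ppl = perplexity(probs)
```

Under this definition, a model that is uniformly unsure between four symbols at every step has a perplexity of 4, which is why lower values indicate a more confident and more accurate decoder.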
💡 The use of a pre-existing dataset and TensorFlow's sub-classing API allows for quick experimentation and development of a deep learning model for image to LaTeX translation.
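
Takeaway 5 refers to stochastic gradient descent with momentum and a learning rate adapted from the validation score after each epoch. The talk gives no details of the schedule, so this is only a sketch under assumptions: a lower validation score is better, and the rate is halved whenever the score fails to improve. `PlateauSchedule`, `factor`, and `patience` are hypothetical names and values.

```python
class PlateauSchedule:
    """Shrink the learning rate when the validation score stops improving."""

    def __init__(self, lr=0.1, factor=0.5, patience=1):
        self.lr = lr
        self.factor = factor      # multiplier applied on a plateau (assumed)
        self.patience = patience  # epochs without improvement to tolerate
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_score):
        """Call once per epoch; returns the learning rate for the next epoch."""
        if val_score < self.best:
            self.best = val_score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9):
    """One classical momentum update for a single scalar parameter."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity


# Hypothetical validation scores over five epochs.
schedule = PlateauSchedule(lr=0.1, factor=0.5, patience=1)
rates = [schedule.step(score) for score in [2.0, 1.5, 1.6, 1.4, 1.4]]
new_param, new_velocity = sgd_momentum_step(1.0, 2.0, 0.0, lr=0.1)
```

Here the rate drops after the third and fifth epochs, the two epochs where the score did not improve on the best value seen so far.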
