Image to LaTeX - Applied Deep Learning Fellowship (2019)
Key Takeaways
The video demonstrates deep learning for image-to-LaTeX translation, using a pre-existing dataset and TensorFlow, with a focus on encoder-decoder approaches and convolutional neural networks. The presentation covers the development of the model architecture (a CNN encoder, an LSTM row encoder, and an LSTM decoder with attention), hyperparameter tuning, evaluation against a prior state-of-the-art model, and deployment of a web app using Flask and Bootstrap.
Full Transcript
All right, hi guys. We are the Image to LaTeX team. We have been working really hard on this project for the past eight weeks, and we want to share with you what we have done so far. Real quick, here's our team: we have four members, Danny, [unclear], and myself, Kathy, and those are pictures of our faces. So, real quickly, what is the objective of our project? It is, of course, using deep learning to translate an image of a mathematical equation into LaTeX code. Here's an example: this is a picture of a mathematical equation, and here is the corresponding LaTeX code. I want to talk a little bit about why we wanted to do this project. As data scientists, and as many researchers do, we often find ourselves needing LaTeX, whether to write a paper, write a Jupyter notebook, or even answer Stack Overflow questions. But LaTeX is daunting to start with, and a lot of people just don't want to write it. So we figured, why don't we automate this process and give people their time back to focus on what really matters to them, which is their work. That led us to this project.
To continue: first, of course, we needed to see whether a dataset existed. Lucky for us, this is a problem that has been worked on before, and there is a freely available dataset on the internet that you can just download, so that gave us a head start. Here are some examples from the dataset. You can see already that the training images are not all the same size, but that's not the only characteristic of our dataset. The images were also heavily preprocessed by the prior research team, which turned out to be a great limitation of our model; we'll discuss that later in the presentation. A few other things I want to stress about our dataset: we're essentially dealing with a very high-dimensional problem. If you think about it, there are up to 400 different tokens in the LaTeX code. Our model has to not only pick the right tokens but also put them in the correct order. So we're talking about tens of thousands of potential dimensions in the output, and that adds a great deal of difficulty to our project.
With that being said, we created our baseline model. We did some research online to see whether there was a quick-and-dirty way to create a baseline model and see where that would take us. Interestingly, we found a TensorFlow tutorial that does something a little bit similar, but for image captioning: basically, you give it an image and it generates a summary of it. The reason I say it's similar is that it also uses an encoder and a decoder, which is the approach we wanted to take. So we followed that tutorial and created our vanilla baseline model. We rescaled our images to the same size, just for the sake of time, and created a vanilla CNN encoder whose output is fed into a decoder constructed from GRU layers. We also added Bahdanau-style attention to our model (sorry, I can never pronounce that word). We overfit one batch to make sure there were no bugs, and we got a loss extremely close to zero. So we said, okay, we can train on the entire dataset. You can see that after 14 epochs of training on the entire training set, our loss was stuck at about 0.6 and we could not get it any lower. At that point we figured we really needed to get creative and find a model that fits our own problem. That leads us to the model architecture and the results, which [unclear] will talk about.
All right, so our final model architecture consists of three main components. The first is still a convolutional neural network that encodes the image. The only difference is that it doesn't have any fully connected layers. It can therefore handle input images of different sizes, which the dataset does have, so we don't need that strict preprocessing step. The output of the encoder is a feature grid. The next component of our architecture is the row encoder, which applies a recurrent neural network across each row of that feature grid; the recurrent network uses LSTM cells to do that. The output of the row encoder is then fed to the final key component of the model, the decoder. The decoder is another recurrent neural network using LSTMs, and it also applies a Luong-style attention mechanism. At each time step, the output is fed to a fully connected softmax layer that classifies the LaTeX symbol.
For the training experiments, we tried a bunch of different hyperparameters, listed here. The best configuration we were able to find is highlighted in green. Since the slide is all mangled up, I'll just say it: stochastic gradient descent with momentum; an adaptive learning rate based on the validation score after each epoch; [unclear] weight initialization for the convolutional layers only; and a large batch size of 32.
To assess how good our model was, the first thing we looked at was the loss. That is essentially the value of the error function the model optimizes against during training. For this project we used the categorical cross-entropy loss function. For comparison, this plot shows the best loss of our baseline model, the one Kathy talked about earlier, and also the best loss the state-of-the-art model got. By state-of-the-art I'm simply referring to a previous study done by an NLP group at Harvard on the same dataset. And this was our loss across training iterations on the dataset. Another way to look at the performance of our model was through an evaluation metric, the perplexity score. We didn't come up with that; it's what the previous study used on the same dataset. In the bottom-right corner is a table showing the scores for our model and for the SOTA model on both the training set and the test set. The test set is not the validation set; it's a separate set, and we never tuned the hyperparameters on it.
Once we had a model we were happy with, we wanted to deploy it so that we could interact with it, and this is our pipeline. We have a web app that uses Flask and Bootstrap. On the back end we have a Flask server running TensorFlow 2.0 with our model. We have a demo for you: we're going to send over this equation here, and it's on YouTube. What was that? Stay tuned. So this is the website. We choose a file, here the image I showed, click convert, and that's the LaTeX code. We put that into a .tex file and render it, and that's the equation it predicted. Here are the input and output side by side. It did pretty well. It confused a theta for a q, and it made the L a subscript, and also the bottom of the sigma, but it's pretty good. Most of the samples we tested performed like this. We wanted to try something more interesting, so we used this equation here. What's interesting is that when we fed it into the model, it only looked at the top part; it only predicted that part, and it did it really well. We think that's because of how rigid the dataset was and how it was preprocessed.
So, just a couple of takeaways, and then future work. The advantages for us were that we had an existing dataset, which makes life much easier, and existing research already done, which also makes life easier. But there were still challenges. The data still needed to be processed properly, and that takes time; also, right now our data processing pipeline is pretty rigid. The second aspect is workflow and tooling: at least for us, it took a lot of time to figure out how to properly establish the workflow. What worked for us: first, TensorFlow 2.0 has a subclassing API, which allowed us to run multiple experiments pretty quickly. The second one is kind of obvious: throwing more compute power at it, spinning up multiple VM instances, and running multiple experiments in parallel definitely helped. The third, which was surprising to us, is that just increasing the batch size helped a lot. Increasing the batch size and changing the initializations made the loss come down quickly.
For future work, what could we extend this to? One would be changing the encoder, which right now is just a CNN, to something more state-of-the-art. We would also want to automate the preprocessing. Right now the model only takes images where the equation is more or less centered, and we would want to make it more generic than that; I think that would be a pretty cool thing to be able to do. Then there's data generation and augmentation: from the existing dataset we could make many more equations. The last part would be Bayesian hyperparameter optimization; we haven't been able to get to that point yet. [Applause]
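The talk mentions up to 400 distinct LaTeX tokens that the model must both pick and put in the right order. The sketch below turns formulas into token-id sequences. It assumes a simple regex tokenizer and the special-token names `<pad>`, `<start>`, `<end>` and `<unk>`; both are illustrative choices, not the team's actual preprocessing.

```python
import re
from collections import Counter

# Hypothetical tokenizer: a LaTeX command (backslash + letters), an escaped
# single character such as \{ , or any other single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|[^\s]")

# Hypothetical special tokens, given the lowest ids.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def tokenize(formula: str) -> list[str]:
    """Split a LaTeX string into tokens, dropping whitespace."""
    return TOKEN_RE.findall(formula)


def build_vocab(formulas: list[str]) -> dict[str, int]:
    """Map every token seen in the corpus to an integer id, specials first."""
    counts = Counter(tok for f in formulas for tok in tokenize(f))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, _ in counts.most_common():
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(formula: str, vocab: dict[str, int]) -> list[int]:
    """Turn a formula into the id sequence a decoder would be trained to emit."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(formula)]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]
```

Each decoder step then becomes a classification over `len(vocab)` classes, and a prediction is an ordered list of those ids.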
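The final encoder has no fully connected layers, so the same convolution and pooling weights apply to any image size. Only the size of the output feature grid changes. The layer list below is hypothetical, since the talk does not give the team's configuration. The sketch just tracks how each layer shrinks the height and width.

```python
import math

# Hypothetical encoder layout: ("conv", stride) or ("pool", size).
# With "same" padding, a conv of stride s maps n -> ceil(n / s);
# a non-overlapping max-pool of size k ("valid" padding) maps n -> floor(n / k).
ENCODER_LAYERS = [
    ("conv", 1), ("pool", 2),
    ("conv", 1), ("pool", 2),
    ("conv", 1), ("conv", 1), ("pool", 2),
    ("conv", 1),
]


def feature_grid_shape(height: int, width: int, layers=ENCODER_LAYERS) -> tuple[int, int]:
    """Return (rows, cols) of the feature grid produced for an input image."""
    for kind, k in layers:
        if kind == "conv":
            height, width = math.ceil(height / k), math.ceil(width / k)
        elif kind == "pool":
            height, width = height // k, width // k
        else:
            raise ValueError(f"unknown layer type: {kind}")
    return height, width
```

With this stack, a 64x256 image gives an 8x32 grid and a 50x200 image gives a 6x25 grid. A dense layer applied after flattening would instead require one fixed input length.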
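The row encoder runs a recurrent network across each row of the feature grid. The NumPy sketch below makes several assumptions: one unidirectional LSTM shared by all rows, zero initial states, and random (untrained) weights. The talk does not say whether the team's row encoder was bidirectional or how its initial states were set.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """Minimal single-layer LSTM, forward pass only, with random weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One fused matrix for the input, forget, candidate, and output gates.
        self.W = rng.uniform(-scale, scale, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def run(self, xs: np.ndarray) -> np.ndarray:
        """xs: (steps, input_dim) -> hidden states (steps, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in xs:
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # new hidden state
            outputs.append(h)
        return np.stack(outputs)


def row_encode(grid: np.ndarray, lstm: LSTM) -> np.ndarray:
    """Run one LSTM left to right over every row of a (rows, cols, channels) grid."""
    return np.stack([lstm.run(row) for row in grid])
```

The result keeps the grid layout, (rows, cols, hidden), so each position now carries context from the positions to its left in the same row.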
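The baseline used Bahdanau-style (additive) attention, and the final decoder uses Luong-style attention. The sketch below shows one decoder step of Luong's "general" scoring, with the additive Bahdanau score alongside for comparison. The weight matrices (`W_a`, `W_c`, `W1`, `W2`, `v`) are illustrative and untrained. The encoder memory is assumed to be the row-encoder output flattened to (rows * cols, d).

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def luong_attention(h_t, memory, W_a, W_c):
    """Luong "general" attention for one decoder step.

    h_t:    (d,)    current decoder hidden state
    memory: (n, d)  flattened encoder outputs
    W_a:    (d, d)  score matrix; W_c: (d, 2d) output projection
    """
    scores = memory @ (W_a.T @ h_t)     # score(h_t, h_s) = h_t^T W_a h_s
    alpha = softmax(scores)             # attention weights over the n positions
    context = alpha @ memory            # weighted sum of encoder states
    h_tilde = np.tanh(W_c @ np.concatenate([context, h_t]))
    return h_tilde, alpha


def bahdanau_scores(h_prev, memory, W1, W2, v):
    """Additive (Bahdanau) scores: v^T tanh(W1 h_s + W2 h_prev) for each h_s."""
    return np.tanh(memory @ W1.T + W2 @ h_prev) @ v
```

Luong scores use the current decoder state and a bilinear product. Bahdanau scores use the previous state and a small feed-forward layer. Both produce one weight per grid position, which is what lets the decoder focus on part of the equation image.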
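At each time step, the decoder output goes through a fully connected softmax layer that classifies the next LaTeX symbol. Below is a minimal greedy-decoding loop around such a step. Here `step` is a hypothetical callable standing in for the LSTM, attention, and softmax layer. The talk does not say whether the team decoded greedily or with beam search.

```python
import numpy as np


def greedy_decode(step, start_id: int, end_id: int, init_state, max_len: int = 150):
    """Feed back the most probable token until <end> or max_len is reached.

    `step(prev_id, state) -> (logits, new_state)` is a hypothetical stand-in
    for one decoder time step over the whole vocabulary.
    """
    tokens, prev, state = [], start_id, init_state
    for _ in range(max_len):
        logits, state = step(prev, state)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()          # softmax over the vocabulary
        prev = int(np.argmax(probs))  # same argmax as the raw logits
        if prev == end_id:
            break
        tokens.append(prev)
    return tokens
```

The softmax does not change which token wins. It is kept only to mirror the classification layer described in the talk.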
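The talk evaluates with categorical cross-entropy and perplexity. Assuming the loss is the per-token average in nats, perplexity is its exponential: PPL = exp(-(1/N) * sum_t log p(y_t | y_<t, x)). The small sketch below takes the probability the model assigned to each correct token.

```python
import math


def cross_entropy(probs_of_targets: list[float]) -> float:
    """Mean categorical cross-entropy (nats), given p(correct token) per step."""
    return -sum(math.log(p) for p in probs_of_targets) / len(probs_of_targets)


def perplexity(probs_of_targets: list[float]) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(cross_entropy(probs_of_targets))
```

A model that spreads probability uniformly over a 400-token vocabulary scores a perplexity of 400, and a perfect model scores 1.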
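The best configuration used SGD with momentum and a learning rate adapted from the validation score after each epoch. Below is a plain-Python sketch of both pieces. The decay factor, patience, floor, and momentum value are hypothetical. The scheduler treats higher scores as better, so negate a validation loss or perplexity before passing it in.

```python
class ReduceOnPlateau:
    """Decay the learning rate when the validation score stops improving.

    Hypothetical policy: after `patience` epochs without a new best score,
    multiply the rate by `factor`, never going below `min_lr`.
    """

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 1, min_lr: float = 1e-5):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_score: float) -> float:
        """Call once per epoch with the validation score; returns the new rate."""
        if val_score > self.best:
            self.best, self.bad_epochs = val_score, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


def sgd_momentum_step(w, v, grad, lr, momentum=0.9):
    """Classical momentum update: v <- momentum * v - lr * g; w <- w + v."""
    v = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v
```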
Original Description
This presentation was a part of the Applied Deep Learning Fellowship held at the Weights and Biases HQ in the spring of 2019.
Applied Deep Learning Fellowship: https://www.wandb.com/applied-deep-learning
For more tutorials: https://www.wandb.com/classes
To learn more about Weights & Biases: https://www.wandb.com/
Playlist
Uploads from Weights & Biases · Weights & Biases · 26 of 60
0. What is machine learning?
Weights & Biases
1. Build Your First Machine Learning Model
Weights & Biases
Intro to ML: Course Overview
Weights & Biases
2. Multi-Layer Perceptrons
Weights & Biases
3. Convolutional Neural Networks
Weights & Biases
Weights & Biases at OpenAI
Weights & Biases
Why Experiment Tracking is Crucial to OpenAI
Weights & Biases
4. Autoencoders
Weights & Biases
5. Sentiment Analysis
Weights & Biases
6. Recurrent Neural Networks [RNNs]
Weights & Biases
7. Text Generation using LSTMs and GRUs
Weights & Biases
8. Text Classification Using Convolutional Neural Networks
Weights & Biases
9. Hybrid LSTMs [Long Short-Term Memory]
Weights & Biases
Toyota Research Institute on Experiment Tracking with Weights & Biases
Weights & Biases
Weights and Biases - Developer Tools for Deep Learning
Weights & Biases
Introducing Weights & Biases
Weights & Biases
10. Seq2Seq Models
Weights & Biases
11. Transfer Learning for Domain-Specific Image Classification with Small Datasets
Weights & Biases
12. One-shot learning for teaching neural networks to classify objects never seen before
Weights & Biases
13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow
Weights & Biases
14. Data Augmentation | Keras
Weights & Biases
15. Batch Size and Learning Rate in CNNs
Weights & Biases
Applied Deep Learning Fellowship Overview and Project Selection with Josh Tobin (2019)
Weights & Biases
Grading Rubric for AI Applications with Sergey Karayev (2019)
Weights & Biases
16. Video Frame Prediction using CNNs and LSTMs (2019)
Weights & Biases
Image to LaTeX - Applied Deep Learning Fellowship (2019)
Weights & Biases
17. Build and Deploy an Emotion Classifier (2019)
Weights & Biases
Applied Deep Learning - Data Management with Josh Tobin (2019)
Weights & Biases
Snorkel: Programming Training Data with Paroma Varma of Stanford University (2019)
Weights & Biases
Applied Deep Learning - Troubleshooting and Debugging with Josh Tobin (2019)
Weights & Biases
Troubleshooting and Iterating ML Models with Lee Redden (2019)
Weights & Biases
Designing a Machine Learning Project with Neal Khosla (2019)
Weights & Biases
Lukas Biewald on ML Tools and Experiment Management (2019)
Weights & Biases
Building Machine Learning Teams with Josh Tobin (2019)
Weights & Biases
Pieter Abbeel on Potential Deep Learning Research Directions (2019)
Weights & Biases
Testing and Deployment of Deep Learning Models with Josh Tobin (2019)
Weights & Biases
Five Lessons for Team-Oriented Research with Peter Welder (2019)
Weights & Biases
Applied Deep Learning - Rosanne Liu on AI Research (2019)
Weights & Biases
Making the Mid-career Leap from Urban Design to Deep Learning/Data Science
Weights & Biases
Organizing ML projects — W&B walkthrough (2020)
Weights & Biases
Brandon Rohrer — Machine Learning in Production for Robots
Weights & Biases
Nicolas Koumchatzky — Machine Learning in Production for Self-Driving Cars
Weights & Biases
My experiments with Reinforcement Learning with Jariullah Safi
Weights & Biases
Applications of Machine Learning to COVID-19 Research with Isaac Godfried
Weights & Biases
Testing Machine Learning Models with Eric Schles
Weights & Biases
How Linear Algebra is not like Algebra with Charles Frye
Weights & Biases
Predicting Protein Structures using Deep Learning with Jonathan King
Weights & Biases
Rachael Tatman — Conversational AI and Linguistics
Weights & Biases
Reformer by Han Lee
Weights & Biases
Sequence Models with Pujaa Rajan
Weights & Biases
GitHub Actions & Machine Learning Workflows with Hamel Husain
Weights & Biases
Look Mom, No Indices! Vector Calculus with the Fréchet Derivative by Charles Frye
Weights & Biases
Jack Clark — Building Trustworthy AI Systems
Weights & Biases
Surprising Utility of Surprise: Why ML Uses Negative Log Probabilities - Charles Frye
Weights & Biases
Track your machine learning experiments locally, with W&B Local - Chris Van Pelt
Weights & Biases
Antipatterns in open source research code with Jariullah Safi
Weights & Biases
Attention for time series forecasting & COVID predictions - Isaac Godfried
Weights & Biases
Made with ML - Goku Mohandas
Weights & Biases
Angela & Danielle — Designing ML Models for Millions of Consumer Robots
Weights & Biases
Deep Learning Salon by Weights & Biases
Weights & Biases
Related AI Lessons
DevOps Took 10 Years to Mature.
Medium · DevOps
Praesto: A Kubernetes Operator for Node-Local ML Model Caching with CSI
Medium · DevOps
Beyond `ollama run`: Production-Ready DeepSeek R1 Deployment with vLLM and Nginx
Dev.to · Shannon Dias
MCP Health Check: Building Production Monitoring for Your MCP Server — What I Learned After 84 Production Outages
Dev.to AI