Pytorch TensorBoard Tutorial
Key Takeaways
This video tutorial covers using PyTorch's TensorBoard integration to visualize loss and accuracy, track a hyperparameter search, and visualize images, network weights, and embeddings with TensorBoard's embedding projector. It demonstrates the SummaryWriter methods add_scalar, add_image, add_histogram, add_embedding, and add_hparams.
Full Transcript
Hello and welcome back for another PyTorch video. I've got a pretty cool video for you today: TensorBoard has some pretty cool features that we're going to go through. But first of all, we need to be able to install TensorBoard. I have PyTorch 1.4.0, and I installed the latest preview version of TensorBoard with pip install tb-nightly. If you have any difficulty installing or using TensorBoard, I suggest trying those two, especially if you're on Windows. Now that we've got that covered, let's go to the code. Here we have some pretty basic code; if you don't know how to code a basic CNN or don't understand this code, check out my video on that. We're going to assume you have some working code and build TensorBoard on top of it. We start by importing SummaryWriter from torch.utils.tensorboard; this is what writes data to TensorBoard so that we can visualize it. Regarding the code, I just have a very simple convolutional neural network, built in this case for the MNIST dataset, which we just import. We have a data loader, we initialize some hyperparameters, and then we train the network, so nothing out of the ordinary. The only thing I added at the end is a running training accuracy. First, we define a writer using the SummaryWriter we just imported, with a path such as runs/MNIST/ followed by a name like "trying out TensorBoard". This means it writes to the runs folder, a subfolder of the folder we're currently in, then to an MNIST folder, and then to the trying-out-TensorBoard folder. Okay, so we've just specified
where it's going to write all the data files that TensorBoard can read. Then, at the end of every batch, we call writer.add_scalar. We'll start by plotting the training loss as it decreases for each batch, and then we'll add another graph for the training accuracy. For the first call we pass the tag "Training loss" and the loss, and then we need to add a global step, which we'll just call step: we define step = 0 and do step += 1 after each batch, so that each new data point lands one step further along. The second call is writer.add_scalar with "Training Accuracy". I computed a running training accuracy on the batch: the number of correct predictions in this mini-batch as a float, divided by data.shape[0] as a float, which is just the number of examples in this specific batch. That's the value we track, again with global_step=step. Okay, that's the basics; let's run this, and I'll show you how it looks in TensorBoard. The code has run for a few epochs. I use Anaconda; if you don't, you can just use the command line. Essentially, we move to the folder where the model is training and run tensorboard --logdir runs, because runs is the folder where we store everything related to TensorBoard. When we run this, we get back a URL, and when we open it we see something like this. This is the loss for each batch, which explains why it varies, and then the training accuracy. All right, let's do something more interesting now, because we can also use this to track a hyperparameter search, for example.
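The SummaryWriter and add_scalar steps described above can be sketched roughly as follows. This is a minimal, self-contained sketch rather than the video's exact code: the CNN layer sizes, the function name train_with_scalars, and the use of random tensors in place of MNIST (so that it runs offline) are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter


class CNN(nn.Module):
    """Small CNN for 1x28x28 inputs (layer sizes are illustrative)."""

    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # N x 8 x 14 x 14
        x = self.pool(F.relu(self.conv2(x)))  # N x 16 x 7 x 7
        return self.fc1(x.reshape(x.shape[0], -1))


def train_with_scalars(log_dir, num_epochs=1, batch_size=64, learning_rate=1e-3):
    # Assumption: random tensors stand in for MNIST so the sketch runs offline.
    dataset = TensorDataset(torch.rand(512, 1, 28, 28), torch.randint(0, 10, (512,)))
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    model = CNN()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    writer = SummaryWriter(log_dir)

    step = 0  # global step shared by both scalar plots
    for _ in range(num_epochs):
        for data, targets in loader:
            scores = model(data)
            loss = criterion(scores, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Running accuracy on this mini-batch only.
            num_correct = (scores.argmax(dim=1) == targets).sum()
            running_train_acc = float(num_correct) / float(data.shape[0])

            writer.add_scalar("Training loss", loss.item(), global_step=step)
            writer.add_scalar("Training Accuracy", running_train_acc, global_step=step)
            step += 1  # the next point lands one step further along the x-axis

    writer.close()
    return step
```

Calling, for example, train_with_scalars("runs/MNIST/trying_out_tensorboard") and then running tensorboard --logdir runs from the same folder should show the two curves; the folder name here is only an example.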
Let's say we're interested in knowing which batch sizes and learning rates work well together. We could define a list batch_sizes with, let's say, a very small batch size, 64, 128, and 1024, and try a few different learning rates: 0.1, 0.01, 0.001, and, with one more zero, 0.0001. I just want to say that this is not the best way to do a hyperparameter search; in practice you would sample the values randomly on a log scale. But for our purposes it's good enough, since we just want to learn TensorBoard. So we write for batch_size in batch_sizes and then for learning_rate in learning_rates, and put all of the setup inside those loops so that it runs for each combination. We also need to move the train loader inside, because it depends on the batch size, and move the optimizer, because it depends on the learning rate. Let's move the loss function too; I don't think we need to, but let's do it anyway. Let's also put the writer inside the loops, writing under runs/MNIST/ with a name that contains the mini-batch size and the learning rate. Because all of the loss curves and accuracy curves will be plotted in the same figure, we need names that distinguish the runs. Let's see if there's anything else: an unexpected indent, so one more indentation fix, and that should be enough. One more thing: we should remove the old runs folder so that we don't keep the data from the previous run. Remember that we're no longer using the old single learning rate and batch size hyperparameters, so we could just as well remove them. So let's run this, and I'll come back to you when
it's all done. Actually, let's make a few changes first: let's change the number of epochs to one and remove 128, so we just have three batch sizes, one small, one intermediate, and one large, together with all the different learning rates. One other thing: let's put the network inside the loops as well, so that we reset the parameters for each run, and also call model.train(). We changed the number of epochs just so it doesn't take too long to run. All right, I'll get back when this is done. Here we have the plots of the training accuracy and training loss in TensorBoard. As you can see, there are quite a lot of curves; we could analyze them like this, but to make it a little simpler, let's filter for just a batch size of 2 so we can compare just the four learning rates. We can see that a learning rate of 0.001 and the smallest learning rate are the two best ones, so for a very small batch size we want quite a low learning rate in this case. Let's change the filter to 64. Here we can see that 0.1 is just too large, since it doesn't train at all, while 0.01 seems to be the best and the smallest learning rate is not really good, so for this batch size a fairly large learning rate works best. For 1024, again 0.01 seems to be the best, but 0.1 is not that bad, and 0.001 isn't bad either. So the trend we can see is that a smaller batch size seems to go together with a lower learning rate, which kind of makes sense: with small batches we do many more updates, and when we compute gradients over a larger batch the gradient estimate is more exact, so we can afford a larger learning rate. That's one interpretation, at least. One thing that's a little problematic is that this plot isn't very nice to look at. We can make it better by filtering for specific runs, as we just did, but I'm going to show you another way of presenting these hyperparameters to
make it a little bit easier to look at. We go back to the code: first, I store the accuracies in a list called accuracies, which starts out empty, and call accuracies.append with the running training accuracy for each batch. Then we use something that visualizes the hyperparameters in a perhaps better way: writer.add_hparams. As the first argument we pass a dictionary with the learning rate and the batch size, and as the second another dictionary with the accuracy, which is sum(accuracies) divided by len(accuracies), just an average of the running accuracies. So this is not an exact accuracy; it's just to show how TensorBoard works. In practice you would do this differently: you'd probably have a function that checks the accuracy and compute the exact accuracy after that specific epoch. We also add the loss, which is sum(losses) divided by len(losses). This call should be at the end of the epoch, after we've gone through all of the batches. I don't think there's anything else, so I'll run this again and we'll see how it plays out differently from before. It's finally done training. As we can see, we have the same plots as before, this ugly graph that's hard to read, but HPARAMS is the new tab. First, you have a table of all the accuracies, which might be useful, and you can filter on the metrics here, for example by setting bounds on the accuracy. What we're going to look at is the parallel coordinates view, which is pretty cool. I'll make this larger. Okay, so now we can see the loss, and we can see which accuracy and
the batch size and learning rate it corresponds to. We can see that the absolute worst thing you can do is combine a small batch size with a very large learning rate, and a large batch size with a very small learning rate is also bad. Combining a large learning rate with a larger batch size is not ideal in this case, but it's not terrible; it does seem to train. So we can draw the same conclusion as we did from the graph. For a small batch size, a very small learning rate is actually very good. Another thing you can do is highlight just a few of the lines to see where the smallest loss values came from. The absolute best in this case seems to be a very small batch size with a relatively low learning rate, and a slightly larger learning rate is also good. The intermediate batch size of 64 with a high learning rate also seems quite good, and 1024 with a relatively larger learning rate seems okay too. So this is another way to do the hyperparameter search, and you can play around with different things: different optimizers, different learning rates, different regularization parameters, really whatever you want. All right, now I'm going to show you another cool thing. Let's say that, although we're not doing so here, we were performing some transformations on our images and wanted to see what they actually look like during training. What we can do is create an image grid with torchvision.utils.make_grid(data), since data holds the images, and then we can do
writer.add_image with the tag "MNIST images" and the image grid. Okay, I'm going to do one more thing. Let's say we want to see how the weights of our network change as we train it, which can be useful for debugging: say the weights are stuck and don't update at all. If we can visualize them, we can see that the weights in this specific layer aren't changing. We do that with writer.add_histogram, with the tag "fc1". We'll look at the distribution of the weights of the last linear layer; we could also do it for the conv layer, but let's just pick the last linear layer for simplicity, passing model.fc1.weight. So we're doing two things at the same time here, for each batch: visualizing the images of the specific batch we send in, and visualizing the weights of the last linear layer. Let's not run all the combinations again, since we've already seen how that looks; let's pick just a single batch size and a single learning rate to make training a little faster. One more thing: you'd want to remove this folder or write to a different one, since plotting again into the same folder might mess up the graphs. So let's remove it; we first have to close down TensorBoard, and then we can remove it. Now let's run this, start TensorBoard, and see how it looks. Now it's just a single plot, since we're only doing a single run, and we have some new sections. Under IMAGES, these are the images for our batch: 8 times 8, so 64 images, and we can also step through the images for each batch.
Right now these look exactly like MNIST images, but if we had some transformations they might look different, and then we'd want to see what they look like and whether they make sense. Next, let's look at HISTOGRAMS, which is pretty cool. These are the distributions of the weights: it looks almost like a normal distribution, with pretty small weights, and 0.005 seems to be the most common value. Most importantly, we can see that the values change as we iterate. If, for example, they all looked the same, we would probably say that something's wrong with our model, so this can be useful for debugging. Now that we understand those parts of TensorBoard, let's look at something very interesting: what they call embeddings, which let you visualize how the model makes its predictions. First, let's move the writer down to the bottom so we can plot things to TensorBoard. We add classes, which is just a list of the digits 0 through 9, since those are our classes. Then, and we'll see where it's best to put this, we define features = data.reshape(data.shape[0], -1). We also create class_labels = [classes[label] for label in targets], where targets holds the correct class for each example. Further down we call writer.add_embedding with features, metadata set to class_labels, and label_img set to data; for the label image we use data.unsqueeze, passing in that
dimension, which is 1. Let me explain this quickly. The data we have now has the number of examples, which is 64, and then 28 by 28, but add_embedding expects the number of channels here, which in this case is just 1. So we add another dimension so that it becomes 64 × 1 × 28 × 28; unsqueeze adds a dimension holding a single value, and that's what this does. We also pass global_step equal to the batch index, and we use just a single batch size and learning rate to keep it simple. Let's see if this works and run it. Okay, add_embedding complained about the metadata argument, so let's fix that. (Approximately ten hours later:) Actually, what I said here was incorrect: we don't have to do this, because the data is already in that form. I think I was looking at Fashion-MNIST, which has that format, but here the MNIST dataset is already N × 1 × 28 × 28, so it's in the correct shape and we can just run it as it is. Before looking at it in TensorBoard, let's change the batch size to 256 so that we get more images to look at in the projector, which we'll see soon. Let's also not plot the embedding for every batch: with 60,000 training examples and a batch size of 256, we have about 235 batches per epoch, so let's add the embedding only when the batch index equals 230. Also, this was targets before, but let's look at how the model actually predicts the labels: we compute the predictions here and use them instead of targets, so I moved the class labels below that. That way we get a visualization of how the model predicts the images. Now let's train this and see how it looks in TensorBoard. So it's done training.
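The add_embedding step can be sketched as below. CLASSES and log_embedding are hypothetical names, and the model passed in is assumed to map N × 1 × 28 × 28 images to 10 class scores. As in the video, the features are the raw pixels flattened to 784 values per image, and only the metadata labels come from the model's predictions, so the projector shows how pixel-space clusters line up with the predicted classes. If you wanted to see the model's learned representation instead, one option would be to pass a hidden-layer activation as the first argument.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

CLASSES = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def log_embedding(log_dir, model, data, step):
    """Write one batch to the projector, labelled by the model's predictions.

    data is assumed to be N x 1 x 28 x 28, the shape an MNIST DataLoader
    yields with ToTensor, so it can be passed as label_img without unsqueeze.
    """
    model.eval()
    with torch.no_grad():
        predictions = model(data).argmax(dim=1)

    # Each image becomes one 784-dimensional row of raw pixel values.
    features = data.reshape(data.shape[0], -1)
    class_labels = [CLASSES[p] for p in predictions.tolist()]

    writer = SummaryWriter(log_dir)
    writer.add_embedding(
        features,
        metadata=class_labels,  # one label string per row
        label_img=data,         # thumbnails shown in the projector
        global_step=step,
    )
    writer.close()
    return features, class_labels
```

In a training loop like the one described above, calling this only when the batch index equals 230 (with a batch size of 256) would log a single batch near the end of the epoch instead of one per batch.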
As we can see, we have the same accuracy and loss plots as before, but just a single graph in this case, since we removed all the other batch sizes and learning rates. Again we have the images, and more of them this time since the batch size is 256, and we can also see the weight distributions and the hyperparameters as before. What's new is the PROJECTOR tab, which is pretty cool. It offers different methods. The first is principal component analysis (PCA), which projects, in this case, the 784 pixel dimensions down to just three, so we can visualize how the images cluster, with each point labeled by the model's prediction. We can see some clusters here: the ones here are close to each other, and the zeros on this side are also quite close. It's a little difficult to see what PCA is doing, so we can also use t-distributed stochastic neighbor embedding, t-SNE. I'm no expert on these methods, and I don't think we have to be: we can just use them and appreciate how they visualize the images and the predictions. t-SNE is quite a popular method, and it has a few hyperparameters you can play around with, the perplexity and the learning rate. Let's pick a perplexity of 15 and a learning rate of 10, and let it run for a thousand or two thousand iterations. Remember that the labels here show how our model actually predicts the images, so we can use this for debugging: we can see where the model has difficulty and which digits it mistakes for others, which can inform what we should gather more data on, and so on. First of all, let's see where the zeros are: over here.
So we can see there's a cluster of zeros here, and some threes are close to it. This might indicate that some zeros are difficult: this one, for example, is very close, and it is labeled a zero, but I would probably say that's a three. So some examples might even be labeled incorrectly, and that's another thing you can use this for. What else might be difficult? Perhaps sevens and nines: we can see that some sevens and also fours are nearby, so nines, fours, and sevens seem to be quite close to each other, while the threes and sixes over here seem to be quite well separated. You can play around with this and see how the model predicts, which is a very cool tool that TensorFlow has implemented. So that's the TensorBoard introduction, an overview of the methods and tools it offers. If you found the video useful, please like it; if you have any comments, write them below, and thank you so much for watching.
Original Description
An in-depth guide to TensorBoard with examples of plotting loss functions, accuracy, hyperparameter search, image visualization, and weight visualization, as well as TensorBoard's embedding projector.
The functions we look at from PyTorch's TensorBoard integration are add_scalar, add_image, add_histogram, add_embedding, and add_hparams.
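The hyperparameter search with add_hparams described in the video can be sketched as follows. The default batch sizes and learning rates roughly mirror the ones mentioned in the transcript, but the dictionary keys "lr" and "bsize", the run-folder naming, the function name hparam_search, the single-linear-layer model, and the random stand-in data are all assumptions. As in the video, the logged accuracy and loss are averages of per-batch running values, not an exact held-out accuracy.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter


def hparam_search(root_dir, batch_sizes=(2, 64, 1024),
                  learning_rates=(0.1, 0.01, 0.001, 0.0001), num_epochs=1):
    # Assumption: random tensors stand in for MNIST, a linear layer for the CNN.
    torch.manual_seed(0)
    dataset = TensorDataset(torch.rand(512, 1, 28, 28), torch.randint(0, 10, (512,)))
    results = {}

    for batch_size in batch_sizes:
        for learning_rate in learning_rates:
            # Rebuilt per run: the loader depends on the batch size, the optimizer
            # on the learning rate, and a fresh model resets the parameters.
            loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
            model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
            optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
            criterion = nn.CrossEntropyLoss()
            # Hypothetical naming: one folder per combination so curves are distinguishable.
            writer = SummaryWriter(f"{root_dir}/MiniBatchSize_{batch_size}_LR_{learning_rate}")

            step = 0
            accuracies, losses = [], []
            model.train()
            for _ in range(num_epochs):
                for data, targets in loader:
                    scores = model(data)
                    loss = criterion(scores, targets)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                    acc = float((scores.argmax(dim=1) == targets).sum()) / data.shape[0]
                    accuracies.append(acc)
                    losses.append(loss.item())
                    writer.add_scalar("Training loss", loss.item(), global_step=step)
                    writer.add_scalar("Training Accuracy", acc, global_step=step)
                    step += 1

            # One hparams entry per run, written after all batches are done.
            metrics = {
                "accuracy": sum(accuracies) / len(accuracies),
                "loss": sum(losses) / len(losses),
            }
            writer.add_hparams({"lr": learning_rate, "bsize": batch_size}, metrics)
            writer.close()
            results[(batch_size, learning_rate)] = metrics

    return results
```

After a run, the HPARAMS tab in TensorBoard should list one entry per (batch size, learning rate) combination, which is what the table and parallel coordinates views are built from.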
✨ Free Resources that are great:
NLP: https://web.stanford.edu/class/cs224n/
CV: http://cs231n.stanford.edu/
Deployment: https://fullstackdeeplearning.com/
FastAI: https://www.fast.ai/
GitHub Repository:
https://github.com/aladdinpersson/Machine-Learning-Collection
▶️ You Can Connect with me on:
Twitter - https://twitter.com/aladdinpersson
LinkedIn - https://www.linkedin.com/in/aladdin-persson-a95384153/
Github - https://github.com/aladdinpersson
OUTLINE:
0:00 - Introduction
2:08 - Initializing SummaryWriter
2:55 - Creating a loss and accuracy plot
5:40 - Doing Hyperparameter search
16:36 - Visualizing Dataset Images and Network Weights
20:55 - Tensorboard Embedding Projector
Uploaded by Aladdin Persson (42 of 60 in the channel's uploads).