Debugging TensorFlow with TensorBoard plugins (TensorFlow Dev Summit 2018)
Key Takeaways
The talk introduces the TensorFlow Debugger (tfdbg) plugin for TensorBoard, an interactive GUI for stepping through, visualizing, and debugging TensorFlow models, after a short tour of other TensorBoard plugins.
Full Transcript
[Music]

Well, thank you everybody for being here. Today we're going to be giving a talk about the new TensorFlow Debugger, which comes included with TensorBoard. It's basically a debugger like you would see in an IDE: it lets you step through models, set breakpoints in them, and watch tensors. But before we do that, I'd like to give you some background on TensorBoard and some of the other developments that happened in the last year, which we unfortunately don't have too much time to go into.

TensorBoard is basically a suite of web applications that was authored by about 20 people, all packed into a two-megabyte command-line web server that works offline. TensorBoard can be used for many purposes, with all the different plugins that have been baked into it. The one you're all probably most familiar with, for those of you who have used TensorBoard, is the scalars dashboard. You can plot anything you want in it, such as loss curves or accuracy, and these things help us understand whether or not our model is converging on optimal solutions.

Here is a really interesting, underutilized feature called the embedding projector. This was originally written by Google so we could do things like project our data into a 3D space and see how things cluster; if you're doing MNIST, the sevens go over here and the nines go there. What you see on the screen is a really cool contribution we recently got from Francois Luus at IBM Research. He sent us some pull requests on our GitHub repository, since we develop in the open, and what he did was add interactive label editing, so you can go in there and change things as algorithms like t-SNE reveal the structure of your data. To learn more, search Google for "Interactive supervision with TensorBoard".

This is another really amazing contribution that we received from a
university student named Chris Anderson. It's called the Beholder plugin, and it gives you a real-time visual glimpse into TensorFlow data structures as your training script is running. It does require a hard drive, by the way; it doesn't work with something like GCS at this point in time. I think this could be a very useful tool going forward in terms of model explainability.

TensorBoard also has some new plugins for optimization. The Cloud team recently contributed a TPU profiling plugin. TPU hardware is a little different from what many of you might be used to, and TensorBoard with this plugin can really help you get the most out of your hardware and ensure that it's being properly utilized.

Now, the TensorBoard ecosystem. Part of the goal of this talk, before we get into the demo, is that I want to attract more folks in the community to get involved with TensorBoard development. We use many of the tools you're familiar with, such as TypeScript and Polymer. We also use some tools you might not be familiar with, like Bazel, but we use it for very good reasons. You can also explore our READMEs, if you go to that folder, for all the different plugins we wrote originally.

The reason why TensorBoard is a little bit more challenging compared to some of the other web applications you may have used or written in the past is that we deal with very challenging requirements. This thing needs to work offline. It needs to be able to build regardless of corporate or national firewalls that may block certain URLs when it's downloading things. For example, one of the first things I did when I joined the TensorBoard team wasn't actually visualizing machine learning, but rather adding a contribution to Bazel which helps downloads be carrier-grade internationally. There are a whole variety of challenges when it comes to an application like this, but those burdens are things we've mostly solved
for you. Here is a concrete example: writing that toilsome thousand-line file was what it took to make TensorBoard look beautiful offline, anywhere in the world, without having to ping fonts.google.com. That's just one of the many burdens that the TensorBoard team carries on behalf of plugin authors.

Now I want to give you a quick introduction for Shanqing, who is the author of this TensorFlow Debugger, with the help of Chi Zeng. As I mentioned earlier, TensorBoard has always done a good job being the flashlight that gives us broad overviews of what's happening inside these black boxes of models. What the TensorFlow Debugger plugin does is turn that flashlight into an X-ray. Using this plugin, you can literally watch the tensors as they flow in real time, while having complete control over the entire process. This X-ray is what's going to make it possible for you to pinpoint problems we've previously found difficult to identify, perhaps down to the tiniest NaN, at the precise moments when they happen. That's why we call it an X-ray: it reveals the graph of math beneath the abstractions we love, such as Keras or Estimator or, as was announced today, Swift. Whichever tools you're using, this can potentially be a very helpful troubleshooting tool. So I'd like to introduce its author, Shanqing, who can show you a demo. Thank you very much.

Okay, so in a moment the screencast will start. Great. Thank you, Justine, for the generous intro. I'm Shanqing, and I'm very glad and honored to present the featured plugin for TensorBoard, among the many amazing plugins that people have created for TensorBoard so far. For some of you who may know TensorFlow Debugger, or tfdbg, the debugger plugin is the graphical user interface of tfdbg, which had only a command-line interface until recently. Like the command-line interface, the debugger plugin will allow you to look into the internals of a running TensorFlow model, but in a much more intuitive,
interactive, and richer environment in the browser.

In this talk I'm going to show you two examples: one of how to use the tool to understand, probe, and visualize a working model that doesn't have any bugs in it, and one of how to use the tool to debug a model with a bug in it, so you can see how to get to the root cause of problems and fix them.

First, let's take a look at the first example, which is being shown on the right part of the screen right now. It's a very simple TensorFlow program that does some regression using generated synthetic data. If we run the program in the console, we can see it works: there is a constant decrease in the loss value during training. Even though the model works, we have no knowledge of how it works, and that's mainly because in graph mode session.run is a black box that wraps all the computation in one single line of Python code. What if we want to look into the model? What if we want to look at the matrix multiplication results in a dense layer, or at the gradient on the MatMul operation, and so forth? The TensorFlow Debugger, or TensorBoard debugger plugin, is a tool that can allow you to do that.

To start the tool, we start the usual TensorBoard binary with a special flag, --debugger_port; in this case we specify the port to be 7000. Once the binary is running, we can navigate to our TensorBoard URL in the browser. At startup, the plugin tells you that it's waiting for connections from TensorFlow session.run calls, and that's because we haven't started the program yet. It also provides code snippets that you can use whether you are using tf.Session, Estimators, or Keras models. In this example we're using tf.Session, so we're going to copy and paste two lines of code into our program. The first line is an import line, and the second line wraps your original tf.Session object with a special
wrapper that has the information of where to connect: the port number.

Now, with our program instrumented, we can start the program again. As soon as the program starts, we can see the graphical user interface in the browser switch to a mode that shows you the graph of the session.run in two ways: in a tree view on the left and in a graph on the right. In the bottom-left corner you can also see which session.run is currently executing. The tree structure corresponds to name scopes in your model; for example, the dense name scope corresponds to the dense layer.

One thing you can do is open the source code view to look at the correspondence between your graph nodes and the lines of the Python program that created those nodes. If you click MatMul, for example, you can see the corresponding node in the graph view, and you can also see which line of the Python source code is responsible for creating that node. In this case it's where we call the dense layer, as expected. If you click the loss tensor, you will see the corresponding node in the graph again, and you can also see where it's created in the Python source code: it's where we call the mean squared error. The gradients name scope corresponds to the backpropagation part of the model, so that's how the model is trained. You can click around, poke around, and explore how a TensorFlow model does optimization and backpropagation, if you're interested. These nodes are all created when we call the gradient descent optimizer.

You can continue to any node of the graph and pause there. We have just continued to the MatMul node in the dense layer. You can also continue to the gradient on the MatMul node, which we just did. In the bottom-right corner of the screen you're looking at succinct summaries of the tensor values we have continued to: you can look at their data type, their shape, and also the range of their values. In
the so-called health pills, you can look at how many of those values are zero, negative, positive, and so forth. If you hover your mouse cursor over those health pills, you can get more information, such as the mean and the standard deviation of the values in the tensor.

Next, we can click these links to open a detailed view of those tensors. In these detailed views you can apply NumPy-style slicing to reduce the dimensionality, so it's easier to look at the values of high-dimensional tensors. We have just reduced the dimension from two to one, so we're looking at the value as a curve now.

Next, we're going to continue to the loss tensor, which is a scalar; its shape is an empty list, as we can see here. We can switch to the full history mode to look at how the value changes as the model is being trained. With the full history mode enabled, we're going to continue over a number of session.run calls, like 50 of them, so we can see in real time how the loss value decreases and how the values of the MatMul and its gradient change. That's how you can use the tool as an X-ray, or X-ray animator, for your models, to have a better understanding of how your model works.

Next, let's take a look at an example of a broken model. That's the debug_mnist model that we ship with TensorFlow. It's the only broken model we ship with TensorFlow, as far as I know, and I'm proud to be the author of it. If we run the model, we can see it doesn't quite work: after two iterations of training, the accuracy gets stuck at about ten percent. We suspect that there might be some bad numerical values, like NaN (not a number) or infinities, in the model, but we're not sure which nodes of the graph are responsible for generating those NaNs and infinities. To answer that question, we can use the debugger plugin. We start the binary again and do a refresh in our browser, and then we can start our debug_mnist example with a special
flag to connect to the debugger plugin. Again, we're looking at the graph. In order to find the nodes responsible for the infinities or NaNs, we can check the checkboxes to activate watchpoints for all the tensors, and we can use the conditional breakpoint feature to continue running the model until any tensor contains infinities or NaNs. Right now you're seeing a list of tensor values; that's the complete list of tensors involved in training the model. In a moment the model is going to stop, and that's because it has hit an infinity in the tensor called cross_entropy/Log. We can see that in the health pill, and in the detailed tensor view we can also see those orange lines, which show you the infinity values.

Now the question is: why do those infinity values happen? We can go back to the source code and find the line of Python code where the tensor is created, and that's where we call tf.log. We can also open up the graph view and trace the inputs. In this case the input is the softmax tensor, so we can click "expand and highlight" to look at the value of the input to log, which is softmax. Now we know that there are indeed five values of the tensor which are zero, and the reason for the infinity is that we're taking the log of zero. With that knowledge we can go back to our source code and fix it, although we're not going to do that in this demo.

All right, so that's the TensorBoard debugger plugin. I encourage you to use it and explore it; hopefully it will help you understand your model better and fix bugs much more quickly. You can just use this simple command line: tensorboard with a special flag. And with that I would like to hand this back to Justine.

Well, thank you, Shanqing. I thought that was a really interesting demo. It was a great leap forward for TensorBoard, and it really shows one of the things we've been doing recently: rather than being a simple read-only reporting tool, we're trying to
explore more interactive directions, as we've shown you today. This is something that folks who are productionizing TensorBoard, such as Kubeflow, should take into consideration.

We also want to attract more contributors, and we have two approaches for this. You can develop in the official repo and send us pull requests; we do our work in the open. This does need approval on security footprint, etc. And there is an escape hatch if that doesn't work out: you can independently develop plugins and create custom static builds without anyone's approval. You can do whatever you want, because part of the goal on this team is to liberate the tools.

With that said, I just want to thank all of you for attending, and I want to thank those of you watching on YouTube. If you like this talk, please hashtag it on Twitter or reach out. Thank you again.

[Music]
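The instrumentation step described in the demo (one import line plus one line that wraps the session) can be sketched roughly as follows. This is a hedged sketch, not the exact snippet shown on screen: it assumes TensorFlow 1.x in graph mode, where the wrapper class `tf_debug.TensorBoardDebugWrapperSession` was available. The host `localhost`, the log directory `/tmp/logdir`, and the helper names `debugger_address` and `wrap_session` are illustrative assumptions; only the port 7000 and the `--debugger_port` flag come from the talk.

```python
# Assumed launch command for the TensorBoard side (the log directory is a
# placeholder, not from the talk):
#   tensorboard --logdir /tmp/logdir --debugger_port 7000


def debugger_address(host="localhost", port=7000):
    """Build the "host:port" string that the session wrapper connects to."""
    return "{}:{}".format(host, port)


def wrap_session(sess, address):
    """Wrap a TF 1.x tf.Session so its session.run calls report to the plugin.

    The import is deferred so that this module can be loaded without
    TensorFlow installed; calling the function does require TensorFlow 1.x.
    """
    from tensorflow.python import debug as tf_debug
    return tf_debug.TensorBoardDebugWrapperSession(sess, address)


# Hypothetical usage inside a TF 1.x training script:
#   sess = tf.Session()
#   sess = wrap_session(sess, debugger_address())
#   sess.run(train_op)  # now pauses under control of the debugger plugin
```

Keeping the address in one helper makes it easy to keep the port in the training script in sync with the port passed to the TensorBoard binary.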
Original Description
Watch this demo of the TensorFlow Debugger, an interactive web GUI for controlling the execution of TensorFlow models, setting breakpoints, stepping through graph nodes, watching tensors flow in real-time, and pinpointing problems down to the tiniest NaN. This tool now comes included with TensorBoard via its open plugin API.
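To make "the tiniest NaN" concrete, here is a small pure-Python sketch of the failure shown in the second demo: taking the log of softmax values that are exactly zero yields infinities. The `health_pill` function is only a rough stand-in for the value-category counts the plugin displays, the softmax values are made up, and clipping the input is one possible remedy assumed here for illustration; the talk does not show the actual fix.

```python
import math


def health_pill(values):
    """Count value categories in a flat list of floats (rough health-pill stand-in)."""
    counts = {"nan": 0, "-inf": 0, "negative": 0, "zero": 0, "positive": 0, "+inf": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            counts["+inf" if v > 0 else "-inf"] += 1
        elif v < 0:
            counts["negative"] += 1
        elif v == 0:
            counts["zero"] += 1
        else:
            counts["positive"] += 1
    return counts


def ieee_log(x):
    """Natural log with IEEE-style results: log(0) gives -inf instead of raising."""
    if x == 0:
        return -math.inf
    if x < 0:
        return math.nan
    return math.log(x)


def clipped_log(x, eps=1e-10):
    """Assumed remedy: clip the input away from zero before taking the log."""
    return math.log(max(x, eps))


# Hypothetical softmax outputs, two of which have underflowed to exactly zero.
softmax = [0.0, 0.25, 0.0, 0.75]
naive = [ieee_log(p) for p in softmax]
fixed = [clipped_log(p) for p in softmax]

print(health_pill(naive))  # two entries land in the "-inf" bucket
print(health_pill(fixed))  # all entries are finite and negative
```

A conditional breakpoint on "any tensor contains infinities or NaNs" amounts to stopping as soon as the `nan`, `-inf`, or `+inf` count of some tensor becomes nonzero.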
Speakers: Justine Tunney and Shanqing Cai
TensorFlow Dev Summit 2018 All Sessions playlist → https://goo.gl/Lsaq1R
Subscribe to the TensorFlow channel → https://goo.gl/ht3WGe
event: TensorFlow Dev Summit 2018; re_ty: Publish; product: TensorFlow - General; fullname: Justine Tunney, Shanqing Cai; event: TensorFlow Dev Summit 2018;