Inside TensorFlow: Quantization aware training

TensorFlow · Beginner ·🧬 Deep Learning ·5y ago


Key Takeaways

Quantization aware training in TensorFlow using the TensorFlow/Keras API

Full Transcript

Hi everyone, my name is Pulkit. I am from the TensorFlow Model Optimization Toolkit team, and I'm here to talk to you about quantization aware training today. This is the work of several people within my team, [inaudible] on the call, and it is also built on the work a lot of other people have done; I'm just presenting it. So let's get started.

The plan for the next hour is that I take you through the fundamentals of quantization, then talk about the TensorFlow/Keras API that we use to achieve it, and, based on how much time we have, we can dig into the internals of how it has been implemented.

But first, to set some background: why bother with something like this? Why did we need this in the first place? The core reason we do most of the work we do is to optimize ML models so that we end up with smaller and faster models. The optimization process is nothing but the process of transforming your machine learning model so that it is more efficient to execute. This results in faster compute and lower memory, disk, and battery usage, which is all-around beneficial to the user experience. It has serious benefits to users: it can unlock use cases which are otherwise not possible, such as on-device speech recognition, mobile vision on the phone, etc.

This is where the Model Optimization Toolkit comes in. It is a suite of TensorFlow ecosystem tools and different techniques that we use to optimize models so they are smaller and faster. Our goal is to ensure that it is general across different models and different hardware. You can find us at this link.

This brings me to quantization, because quantization is probably the most important technique that we have in our toolkit. Before we dive deeper into quantization, why is it important? Because it can achieve pretty awesome things: you can end up with, based on the
quantization scheme that we use, models which are 4x smaller, with 1.5x to 4x faster compute and lower power consumption. And not just that: because you can often implement quantization using fixed-point-only operations, it allows for execution on specialized accelerators like Edge TPUs, DSPs, etc., so you get an additional benefit.

Now, digging into how quantization works. Your ML model is a data flow graph of different tensor computations, and you have static parameters and dynamic parameters. What quantization does is drop your static parameters to lower precision, so instead of being in 32-bit floating point you have int8 parameters, which are fixed point and 8-bit instead of 32-bit. You can then execute your operations in lower precision as well, in int8, say, instead of floating point. This makes it possible to run the entire model in lower fixed-point precision, which is much faster.

The type of quantization that we use is uniform quantization, which is a linear map from a well-defined, limited-range floating-point space onto the entire int8 space. The floating-point range is really large and you generally don't need that much; your weights are typically in a much smaller range. If you can map that range to, in this case, 256 buckets, you just do a linear mapping of these weights.

So it seems like everything is great and we should just be able to go home: you can execute all your models in int8 representation, including computations, and that should just be great, right? Unfortunately there is no free lunch. This does come at a cost, and that cost is that quantization is lossy. It is a lossy compression method: you are going from 32-bit floating-point precision to 8-bit precision, from a space on the order of 2 to
the power 32 values, and this results in losses. There are various different types of losses, and I'll go through all of them so you can understand how these different problems manifest themselves.

One is information loss: you are representing as int8 weights which are actually 32-bit floating point, so you have fewer buckets of information. That is the information representation loss.

The other is that your computations happen in int8, at lower precision, and then get accumulated in int32, so we have computation loss. You can imagine that you are taking two numbers which are imperfectly represented in lower precision; when you add two imperfect numbers, the computation also incurs a loss. Once you accumulate in int32, you need to drop down to int8 again, so there is a rescale loss.

The final thing is that when you convert your model for inference, your converter or compiler applies inference optimizations. In the floating-point domain these optimizations are just fine, but when you drop down to int8 you can incur losses, because your inference path and your original path are a bit different. One example is ReLU fusing [inaudible].

So what is the problem with this? Losses lead to an accuracy drop, and a fast but inaccurate model is not that helpful. Quantization has amazing benefits for users in terms of performance, but we need a way to get the best of both worlds, and this is where quantization aware training comes in. The holy grail is very good performance while also keeping accuracy, and that is what quantization aware training is about.

Now, quantization aware training: what is it? In a nutshell, it is a training-time technique to improve the accuracy of quantized models. Very briefly, the way it does this is that it introduces inference-time
quantization errors during training, so that the training process and the optimization learn robust parameters around that loss.

To delve a bit deeper, the way it recovers this lost accuracy is through a series of steps. The goal is to make the training path as similar as possible to the inference path. Once you do that, you can mimic the errors that the model experiences during inference. Whatever errors your model experiences during inference, all the different types of losses that we talked about, get experienced during training in the forward pass. Once you have that, backprop does its magic and the optimization does its magic, and your trainer can learn parameters which work well around those errors.

Conceptually, all of this innovation came out of Skirmantas from the Google mobile vision team. If you look at the second bullet point, that is the reference paper, where Skirmantas and a bunch of others worked on quantization and training of neural networks for various vision models. What we do is take those concepts and productionize them in a way that TensorFlow users can really get the benefit of. The original TF1 tooling, in case you know it, was created by Suharsh in TensorFlow contrib quantize.

As we discussed earlier, our goal is to model the inference path and mimic the errors. So how do we go about doing it? One of the core tools for mimicking the errors is a concept called fake quant. In the forward pass of your model training process you are training in floating point, but because inference is in int8 or some lower precision, you want to emulate that. So you take your tensors, which are in float, you drop them down to
lower precision, say int8, and then you convert them back to floating point. What that does is introduce the quantization error: you are going to a lower precision, and whatever loss you have gets introduced.

The other interesting point to keep in mind is this. Say you introduce the losses in both your inputs and your weights: both are floating point, you drop them down to int8, and you bring them back to floating point. They are now exact representations, so each floating-point number maps one-to-one to an int8 bucket. Because of that, first, you are modeling the losses exactly. Second, when you do your matmuls, your computation exactly mimics what happens in int8, because all your floating-point numbers line up neatly with the buckets in int8.

The other part is that we need to model the inference path. Your inference optimizations can fuse ReLU activations, they can fold the batch norm into the conv, they can do a bunch of these different types of optimizations. What QAT does is apply the same or similar transformations to your training graph and adjust the fake quant insertions, so that your forward pass really emulates the inference path and your fake quants are placed accordingly. Once we delve into these things later, you'll also learn that you can configure these transforms as well.

And it works pretty well. For example, if you look at some of these vision models, in most cases, once you apply quantization during training, the accuracy is almost exactly the same as floating point. There will be some minor losses, but it's largely the same, and that's a big win
because what this means is that now you can execute fully int8 models on specialized hardware and basically get floating-point accuracy.

So all of this was about what quantization aware training is and what the concept is. Now we dig into how this API is used, how it is built, and how you can work with it. Just FYI, please feel free to interrupt me at any point and ask questions; I'll be happy to take them, and I can also take them at the end.

One of the core principles that has guided our design is that we want the easy things to be absolutely easy, so it is very quick to get going, but we also want to make it possible to do very different things. For quantization aware training in Keras, we broke it down into three different types of users. One is users like app developers and ML engineers who just want to write one or two lines, train, and deploy to device, and we have use cases which support that. Then you have ML engineers and maybe researchers who want to configure individual layers and maybe some parts of the algorithm. And then you have hardcore researchers and hardware designers who want to use custom quantization algorithms and target different types of hardware backends at a very precise level. There are ways to address all three.

To start with the most basic use case: what if you want to quantize an entire model? This is typically what a Keras model looks like, right? You import TensorFlow, you build a model, you throw in a bunch of layers, you compile the model, and you start training it. Now you want to train this with quantization, so how do you do it? It's basically this, that's it. You just import our package, TensorFlow Model Optimization, and you say, hey, I want to quantize this model. You pass
in the existing model, and all of the rest of your code stays exactly the same. We take care of pretty much everything: cloning the model, ensuring the model has all the right transformations, the right fake quant placements, etc. You just do this and go ahead. It's entry level and very simple; you don't need to think too much about it, and you get towards deployment.

As you go further, you might want to tweak things a little more. Maybe I don't want to run my entire model as int8, or quantized; I only want some parts to be quantized, or some parts of my model's computation are very sensitive and I want to keep them in floating point. In that case you might want to quantize only a subset of your model, and the way you go about it is a two-step API. It is still very similar to the earlier case, but now, rather than just saying "I want to quantize the model", you first say "I want to quantize-annotate these specific layers". For example, this model has four layers, and you are saying: I want to quantize the conv layer and the ReLU, but don't bother with the other two layers. Then you say: apply quantization to this model. Again, you can take the quantized model you get out of it, compile it, train it, and go ahead with your work. It ensures that the optimizations and operations are applied only to the subset of layers you are interested in, and it handles the whole data flow graph for you.

Hey Pulkit, just to confirm the use case for selective annotation: why don't we just always annotate everything?

If you look at the previous case, that is the case where it does annotate everything. When you do quantize model, it means that you want to annotate all the layers, so it will apply quantization to all
the layers. It is in the case where you only want to quantize a subset of the layers that we want the user to have this ability. Say I have a model with a hundred layers and I only want to quantize ten of those layers.

What's the use case for that? I thought that if you quantize, for model-quality reasons etc., you may want to just always quantize the whole thing.

No, what you might want to do is run part of your model in floating point and part of your model with quantized operations. You could look at it from two angles. One, you could say there is a certain subset of my model that is very performance sensitive, so you run only that part of the model as int8. Or you could look at it the other way: there is a certain subset of my model that is very quality sensitive, where introducing losses hurts the quality a lot, so I'll leave that as floating point and do the rest as int8. Instead of breaking up your model into different models, which would be a lot of work for the user, you just quantize the specific part, and our converter handles that.

Yeah, this makes sense. In practice, do we need to provide users some guidance on which parts they need to pick? Should there be some XProf advisor or some output to tell them which parts they may want to quantize?

That's a very interesting point. Currently we don't provide any guidance for that. Part of the reason is that it varies quite a lot across different models; different parts can be very sensitive in different models, so it's hard to have a heuristic that works. But you're right that at least performance profiling is quite easy, and that is something where we should be able to give users numbers pretty reasonably. And we could also,
we collect some metrics, but we could also potentially bubble up some TensorBoard-style metrics which give a user some sense of which layer is contributing most to the drop in accuracy that they are seeing. But tying the loss back to a particular layer is not always that straightforward.

Sorry, additionally, it's not even just that it varies by model. You can use the same model on many different tasks, and the accuracy could change a lot based on which task you are trying to solve. So usually it really needs the expertise of the product team that is using this to decide which parts are the accuracy-sensitive parts and where they are willing to spend a little more time to get good accuracy.

Yeah, I think providing this mechanism is a nice foundation. A possible product feature to build upon it is some kind of auto-quantization, where users just specify some performance and quality constraints and we do our own search as part of the training or fine-tuning process to customize it.

That's a very good idea. In fact, one of the proposals we had running at some point was an AutoML-based system to mix and match quantization, sparsity, compression, and a bunch of these things. But what we realized was that we need to build more of the building blocks before we get to that. But thanks.

So, continuing on. Just one quick comment on the previous point: what you can also do is quantize the entire model but override the quantization behavior in some layers, and we will see that. Say you want to custom quantize: you have a bunch of different layers, and for some layers you want to quantize them in a very specific way. In that case we give you the ability to quantize that layer exactly how you want. Again, the API is very similar to what you
were doing earlier, but now you add an additional parameter. When you say quantize-annotate layer on my Conv2D, you also say: this is the quantize config I need you to use. This quantize config basically says how I want to quantize this specific layer.

Now we can take a look at what this config looks like. It has two important functions, get_weights_and_quantizers and get_activations_and_quantizers. Any layer internally has a bunch of weights and a bunch of activations, and you can choose how you want to quantize them. As a layer, you want to inform the quantization infrastructure about two things. One is what I want to be quantized: if I am a layer, I may have, say, five weights (kernel, recurrent kernel, etc.), and I may only want to quantize the kernel, not the recurrent kernel. The other is how I want to quantize them, and the how is encapsulated in an object called a quantizer. We'll delve into that a little later, but for now it is enough to understand that it is an object that knows how to quantize a particular tensor. You just pass that object, and it is possible to configure it. For example, in this case, when you construct the weight quantizer, you are saying that you want the weights to be quantized with four bits instead of the default eight.

So that's what you do: you say I want to quantize this layer, this is the quantize config, and this quantize config specifies everything relevant to that layer. That is basically what the quantize config is.

Moving on from there, the same thing can be extended to quantizing your own layer. In the same way that you quantize an existing layer differently, you can quantize your own custom layer. Say you have a fancy Conv2D variant which has specific
mathematical operations which are unique to your domain. You can just pass it a quantize config, the same as you did earlier, specify the same kinds of functions (get_weights_and_quantizers etc.), and go ahead.

Now we can get into the quantizer object itself. What the quantize config is to a layer, controlling how the layer gets quantized, the quantizer object is to a tensor: it controls how a tensor gets quantized. It is a very simple object with two functions, a build function and a call function, and it follows the same life cycle as Keras layers. The build creates any variables that you need. For example, a quantizer might need a range, so you construct a min and a max variable; you get the tensor shape and you can choose whatever kind of variable you want. The call is when the graph is being constructed: you get an input tensor, you can modify it however you want, and you pass through the resulting tensor, which is the quantized tensor. You can do it differently based on whether it is training time or not. You can put whatever fancy algorithm you want in here: you can use local state to construct a histogram and take a range based on that, apply clipping-based quantization, anything.

The other crucial thing is the concept of model transforms. If you paid attention a little earlier, we were talking about how we want the training path to mimic the inference path, and the inference path might do things like batch norm folding, ReLU fusing, a whole bunch of these types of things. For that you can define model transforms. Again, by default we provide a repository of model transforms which take care of the built-in layers for the 8-bit default TensorFlow quantization scheme
that exists, but if you want to do fancier things, you can write your own model transforms. This is a very simple example of a model transform. You define a pattern, say a batch norm preceded by a conv or a depthwise conv, and then you can replace that subset of the graph. In this case we keep the same subset; we just say that I want you to use this quantize config, and you can provide that metadata.

This is roughly the entire QAT API. It gives you full control over quantization: define fully custom quantization schemes, specify how any layer should be quantized, choose your own model transformations, etc.

[Music]

In summary, quantization aware training allows you to recover model accuracy while getting the benefits of quantization. It is a flexible API, easy for getting things done, but it also allows a lot of flexible experimentation and lets you simulate quantization loss for various different backends and schemes. One of the really cool features is that if, say, you come up with new hardware which has a completely custom scheme of operations, with a different number of bits for convs, a different number of bits for LSTMs, etc., you can define all of that using the building blocks that we provide. As for the model transformation piece, we'll talk a little more about it, but it is basically a full graph transformation library, modeled after the graph transform tool in TensorFlow, so you can apply the same idea one level higher.

If there are any questions, I can answer them, but after this we can dig into some aspects of how this is implemented internally. To use this, whatever I have discussed so far is all you need to know; you don't need to know anything beyond that. Whatever comes next is only if you are interested in
knowing how it works internally, or if you are interested in building your own tools on top of Keras.

So how is the QAT Keras API implemented? It uses a lot of Keras hackery. The way it is designed is that it does a lot of injection into the Keras layers as they are constructed, so that the constructed graph you end up with is quantized. One of the core design principles was that you should be able to do this in the build process of the model, because once you build it, the graph you have should be no different from any other graph. All your training, eval, storage, and metrics code should just work the same. It is no different from any other TF graph; it's just that when you build your model, you get a new model out which has all the goodies inside. We also want users to be able to apply it to their existing models and reuse existing model-building code.

To understand how this works, you need to understand some of the core Keras abstractions, which are basically a layer and a model. A layer represents a neural network layer, and what it does is construct a chunk of TF graph using tensors and ops. A model is a network of layers, and it has train and eval capability, etc. A wrapper is a layer itself, but in object-oriented terminology it is the most basic delegate/adapter pattern, which allows you to inject into the behavior of a layer. It wraps a layer, so you can control the operations before and after the layer's construction.

A layer in Keras has a very particular life cycle, and that life cycle is important because other things get modeled on it. You have an init, which is plain old Python object construction. You have
got the build method, which lazily constructs the TF variables. It is like a constructor for the TF graph: it constructs your variables, does the shape computation, those sorts of things. And then there is the call, where you give it tensors and it constructs the TF graph.

You would write a simple Keras layer like this. Say you have a linear layer: you have a constructor where you define the params, in the build you construct the weights, and in the call function you apply TF ops on the incoming tensors and construct the graph. If you are constructing a model, you define your model, and that is when the init is called. When you build your model, you pass it a shape; that is when all the variables are constructed and the layer's build is called. When you do a fit, predict, evaluate, custom training loop, whatever, that is when the call is invoked.

The reason I am saying all of this is that it takes us to the wrapper. The wrapper is a layer which has a layer as a parameter. So in the build function you can do whatever you want before you build the wrapped layer and whatever you want after you build it, so you can inject your own variables. Similarly, in the call function you get the input tensors, you can add other things to your graph, then construct the layer, then add things after it has been constructed.

One simple example: say you wanted to write a clipping wrapper which just clips the tensor. You define a min and a max in your build function; when your input tensors come in, you clip the input tensor, throwing away the extreme values, and then you do a layer call with that tensor. Everything else stays the same. This is something very simple, but it is what most of the model optimization infrastructure uses very heavily. So we have a quantize wrapper for QAT,
we have a pruning wrapper for sparsity, and we'll have other wrappers for various types of algorithms. The basic idea is that these wrappers modify graph construction. The reality is a bit more complex, especially for quantization, because, as you saw, there are the model transforms and fake quants and all of that.

The model transformer is similar to the graph transform tool for TensorFlow. You can define tree-like subgraphs and do full subgraph transformations on Keras models. You can have a directed-acyclic-graph-based network, define a subset, and replace that subset with what you want; you can update parameters. It's kind of cool: you can keep applying a whole bunch of transformations. It is part of the Model Optimization Toolkit as of now, but we plan to move it to core Keras. It is independent of all of this, so you can use it for general transformations on Keras models.

And that's it. You can use these links for the quantization documentation, and you can find us on GitHub at tensorflow/model-optimization. Feel free to reach out to us; we are actively working with clients. There are a whole bunch of interesting use cases we have seen so far, from micro to server to on-device, vision, non-vision, etc. So yeah, very happy to help.

[Music]
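
The fake quant step described in the talk (drop a float tensor to lower precision, then convert it back to float so the rounding error shows up in the forward pass) can be sketched in plain Python. This is only an illustration under stated assumptions, not the toolkit's implementation: the function name `fake_quant`, the fixed min/max range passed in by the caller, and the list-of-floats stand-in for a tensor are all made up for the example.

```python
def fake_quant(values, min_val, max_val, num_bits=8):
    """Snap floats onto a uniform grid of 2**num_bits buckets, returning floats.

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    levels = 2 ** num_bits - 1              # 255 steps between 256 buckets for 8 bits
    scale = (max_val - min_val) / levels    # float width of one bucket
    out = []
    for v in values:
        clipped = min(max(v, min_val), max_val)       # out-of-range values saturate
        bucket = round((clipped - min_val) / scale)   # integer bucket index in [0, levels]
        out.append(min_val + bucket * scale)          # back to float, now on the grid
    return out


weights = [-1.0, -0.5004, 0.0, 0.3337, 1.0, 2.5]
simulated = fake_quant(weights, min_val=-1.0, max_val=1.0)

# Per-value error that training would see in the forward pass.
errors = [abs(w - s) for w, s in zip(weights, simulated)]
```

For values inside the range, the error is at most half a bucket width; the last value (2.5) lies outside the range, so it saturates at the maximum. Applying `fake_quant` a second time to `simulated` leaves the values unchanged, which is the "numbers line up with the buckets" property mentioned in the talk.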
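
The wrapper idea from the talk (a layer that takes another layer as a parameter and injects work before and after the wrapped layer's build and call) can be shown without Keras at all. The sketch below uses toy classes that only imitate the init / build / call life cycle; `Linear` and `ClipWrapper` are hypothetical names for this example, and plain Python lists stand in for tensors and variables.

```python
class Linear:
    """Toy layer with the init / build / call life cycle (not a Keras class)."""

    def __init__(self, units):
        self.units = units      # plain Python construction, no weights yet
        self.weights = None

    def build(self, input_dim):
        # Lazily create the weights once the input size is known.
        self.weights = [[0.5] * self.units for _ in range(input_dim)]

    def call(self, inputs):
        # y[j] = sum_i inputs[i] * weights[i][j]
        return [sum(x * row[j] for x, row in zip(inputs, self.weights))
                for j in range(self.units)]


class ClipWrapper:
    """Wraps a layer and clips its inputs before delegating to it."""

    def __init__(self, layer, clip_range):
        self.layer = layer
        self.clip_range = clip_range

    def build(self, input_dim):
        # Extra state owned by the wrapper is created around the wrapped build.
        self.min_val, self.max_val = self.clip_range
        self.layer.build(input_dim)

    def call(self, inputs):
        clipped = [min(max(x, self.min_val), self.max_val) for x in inputs]
        return self.layer.call(clipped)   # the wrapped layer itself is unchanged


wrapped = ClipWrapper(Linear(units=2), clip_range=(-1.0, 1.0))
wrapped.build(input_dim=3)
outputs = wrapped.call([0.2, 5.0, -3.0])   # 5.0 and -3.0 are clipped to 1.0 and -1.0
```

The wrapped layer's code is untouched; all added behavior lives in the wrapper, which is why the same model-building code can be reused with or without it.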

Original Description

In this episode of Inside TensorFlow, Software Engineer Pulkit Bhuwalka presents quantization aware training. Pulkit will take us through the fundamentals of quantization aware training, TensorFlow/Keras API used to achieve this, and how it is implemented during this tutorial. Documentation → https://goo.gle/32MN60q Github → https://goo.gle/30YihDB Add the Inside TensorFlow playlist → https://goo.gle/Inside-TensorFlow Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from TensorFlow · TensorFlow · 0 of 60

← Previous Next →


This video teaches the fundamentals of quantization aware training (QAT) in TensorFlow and how to apply it using the TensorFlow/Keras API. It also explains why QAT matters for model optimization, where the goal is smaller and faster models.

Key Takeaways

Import necessary libraries
Load the model
Define the quantization aware training parameters
Implement QAT using TensorFlow/Keras API
Train and evaluate the model

💡 Quantization aware training simulates reduced-precision weights and activations during training, so the model keeps more of its accuracy once it is quantized. The quantized model is smaller and faster to run.
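
What reduced precision does to a set of weights can be illustrated with a minimal sketch in plain Python. It assumes 8-bit asymmetric quantization over the observed min/max range, which is one common scheme and not necessarily the one used in the talk. The function name `fake_quantize` and its parameters are hypothetical; they are not part of the TensorFlow Model Optimization Toolkit.

```python
def fake_quantize(values, num_bits=8):
    """Round floats onto a num_bits integer grid, then map them back to floats.

    Illustrative only: a hypothetical helper, not a TensorFlow API.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return list(values)  # every value is zero; nothing to quantize
    scale = (hi - lo) / (qmax - qmin)      # float width of one integer step
    zero_point = round(qmin - lo / scale)  # integer that represents 0.0
    result = []
    for v in values:
        q = round(v / scale) + zero_point  # quantize to an integer
        q = max(qmin, min(qmax, q))        # clamp to the integer range
        result.append((q - zero_point) * scale)  # dequantize back to float
    return result


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
approx = fake_quantize(weights)
max_error = max(abs(a - b) for a, b in zip(weights, approx))
print(approx)
print("largest rounding error:", max_error)
```

Each output value is close to its input but not always identical to it, and that rounding error is what a model sees when quantization is simulated during training.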
