Inside TensorFlow: TF NumPy
Key Takeaways
This video discusses TensorFlow NumPy, a NumPy-compatible API in TensorFlow that lets existing NumPy code run on the TensorFlow runtime, often faster, and leverage accelerators like GPUs and TPUs. It brings automatic differentiation, compiler optimizations, and distributed execution, and it interoperates with TensorFlow's linear algebra, signal processing, and deep learning APIs. It is available in stable releases starting from TensorFlow 2.4.
Full Transcript
Welcome, everyone. I'll be talking about TensorFlow NumPy. I'm an engineering manager on the TensorFlow team, and I'll be joined by my colleague Peng Wang as well. We'll talk about how to take NumPy and accelerate it using TensorFlow.
The very high-level summary is that you can take NumPy code and dispatch it to the TensorFlow runtime, allowing your code to run faster and to leverage accelerators like GPUs and TPUs. On top of that, you get all the benefits of the TensorFlow ecosystem. This includes auto differentiation; compiler optimizations like operator fusion and loop auto-vectorization; and distributed execution onto clusters and pods of accelerators. You can also use TensorFlow APIs seamlessly: for example, the linear algebra, signal processing, and deep learning APIs of TensorFlow. You can also serialize your code using SavedModel and serve it on clusters of servers or even on mobile.
So as a NumPy user, this should be an exciting set of new capabilities. However, we believe that this is a value addition for TensorFlow users as well, because NumPy brings in a popular, stable, and time-tested API with extensive documentation and even a reference implementation. This API is a great fit for a whole class of problems needed by power users and researchers, and it should nicely complement the Keras deep learning APIs. Please check out the extensive guide we have put on tensorflow.org to get started.
The rest of the talk is structured as follows. We will briefly talk about how to quickly get started with using this, and also go into what is supported versus not. We'll touch upon interoperability with NumPy and TensorFlow and work through some example code. Next, we'll look at how to add new TensorFlow NumPy operations. Finally, we'll dive into some case studies of using this API.
TensorFlow NumPy support will be available in a stable version starting from 2.4; for now, it is available in tf-nightly. Once you've installed that, you can start
using it by importing the tensorflow.experimental.numpy module, and after that you can just start writing regular NumPy code using this module. The example code here creates a 2D tensor of random values, clips it into some range, and does a sigmoid computation on the values. This code can run imperatively, so you can examine the shapes and values that you computed. You can also look at what device this code is being placed on: if you look at the device property, it shows that the data is already placed on the GPU and the computation has been happening on the GPU, if you have one available, without the user having to do anything.
That's all great, but how well does this work? Does it run your code faster? We did a simple benchmark for this toy problem of sigmoid computation. The graph here shows input size on the x-axis and, on the y-axis, the time taken to compute sigmoid on an input of that size. The blue line is the time taken by NumPy itself, the orange line is TensorFlow NumPy running on CPUs, and the red line is TensorFlow NumPy dispatched onto GPUs. The green line is with some compiler optimization on CPU; we'll go into compilation later in the talk.
If you look at these lines on the left-hand side of the plot, you can see that the times are pretty similar. In fact, NumPy might be running a little faster for very small input sizes. This happens because today NumPy's dispatch latency is much lower, on the order of one microsecond, so if your benchmark is dominated by dispatch latency, NumPy does a much better job today. However, as the problem size grows and you move towards the right-hand side of the plot, you see that TensorFlow starts becoming much faster. Towards the right-hand side, at about one million input size, it is about 7x faster than NumPy on this toy problem, and GPU is of course much, much faster.
So the summary is that you can take existing code and, without doing anything extra, if you have GPUs
TensorFlow magically leverages those GPUs and makes the computation much faster. Even on CPUs you have some advantage because of TensorFlow's highly optimized and multi-threaded kernels.
Next, we will talk in a little more detail about what is supported versus not, and what flexibility NumPy brings to users. We already have support for a large API surface of NumPy: around 200 API endpoints have already been added. Please check out the API documentation at the link to find out more about what's supported.
What's currently not supported in terms of features? One big one is mutation: our ND arrays are currently immutable, and this is something that we're going to be working on. Some of the dtypes, like object and recarray, are not supported. We also don't support Fortran order or data views. NumPy's C API and Cython or SWIG integration are also not supported.
Having said that, let's dive into some of the flexibility that NumPy brings even to TensorFlow users. One big feature that we believe will be useful is indexing. NumPy's indexing is pretty powerful. There is the basic indexing shown in this slide: using single values, adding new axes, using ranges and strides, or using ellipses to go over a different number of dimensions. You can also do boolean indexing, in which one of the indices can be a list of boolean values. In this case, for example, the second dimension says true, false, true, and what that means is that it will select the first and the third rows on that dimension. It also supports advanced indexing, which means indices can be tensors or sequences. In this case we show the second dimension having a tuple and the third dimension having an ND array, and NumPy defines the semantics for how this works. Please check out the NumPy indexing guide for more details.
NumPy also brings in flexibility via type promotion, and its type inference is also different from what TensorFlow
natively supports today. For example, NumPy prefers wider types like int64 and float64 when converting literals to ND arrays. You can also do things like adding an int64 to a float64, and the result will be type-promoted to float64.
Another powerful feature, which is shared with TensorFlow, is shape broadcasting. What that means is that you can have inputs of different shapes being passed to a function or an operator, and NumPy defines semantics for how to broadcast those inputs to a common shape and then apply the operator to them. The implementation then leverages that to be much more optimized instead of actually tiling the values. Again, check out the NumPy broadcasting guide for more details on these features.
With this, we'll talk about interoperation with NumPy itself. Notice I said we support around 200 NumPy API endpoints. However, NumPy defines many more, so if as a user you want to leverage other functions that are not currently supported, you can do that. In this example, we start by creating a TensorFlow ND array and then pass it to a NumPy function: np.sum, called with the TensorFlow ND array. How that works is that when the function is called, it forces a conversion from the TensorFlow ND array to a NumPy ND array, which might involve copying data. The function is called, and then a NumPy ND array is returned. Similarly, you can take this ND array and pass it to a TensorFlow NumPy function. Here we call tnp.sum, and that again triggers a conversion, from a NumPy value to a TensorFlow value, because the NumPy function returned a NumPy ND array. This allows you to take existing code and, even if things are not supported, at least get it working. It might not perform as well because of all these data copies, but it at least gets your code running.
The semantics of the operators are a little bit more complicated. If you do something like x
plus y, which is adding a TensorFlow ND array to a NumPy ND array, the semantics are defined by something called array priority. You can set this value when you're defining a new class. In our case, we have defined TensorFlow NumPy with a higher priority. What that means is that in this case the plus operator will be executed by TensorFlow, which forces a conversion of y to a TensorFlow ND array, and the return value will be a TensorFlow ND array as well. You can also pass TensorFlow ND arrays to other APIs that expect NumPy inputs. Here we show that you can call the matplotlib histogram function with a TensorFlow ND array input, and that works seamlessly, again via this conversion to a NumPy value.
All right, next we'll jump into TensorFlow interoperability. Similar to NumPy, we can also start mixing and matching TensorFlow NumPy functions with TensorFlow functions. Here we import TensorFlow first as tf. We create a TensorFlow NumPy ND array and then call a TensorFlow function, tf.sigmoid, on this value. How that works, again, is that this ND array is converted to a TensorFlow tensor, and the return value is a TensorFlow tensor as well. This tensor can again be passed to a TensorFlow NumPy function, in this case tnp.sum, and again that does a conversion to an ND array and calls the function, and the output is an ND array as well. Operators similarly depend on the array priority. Currently the TensorFlow tensor has a higher priority, which means the return value in this case would be a TensorFlow tensor as well.
I will note that, compared to NumPy, here we don't trigger any data copies. These classes are basically thin wrappers on each other, so the conversion is zero-copy. You can also do the conversion explicitly. If you have an ND array, you can convert it to a tensor by calling the .data property, or you can take a tensor and convert it to a TensorFlow NumPy ND array by calling asarray, and these conversions in effect
would happen without copying the underlying data.
Next, we'll look at more examples of interoperability by working through an example. We'll show how you can take NumPy functions and call different TensorFlow functionalities. We'll start by looking at an input pipeline. You can call TensorFlow's Dataset APIs and use them with TensorFlow NumPy functions. In this case, for example, we create random numbers using the NumPy API and then call from_tensor_slices on them. In the map function, we use the clip function defined on these ND arrays, and all of that works seamlessly, without data copies all over the place. This is something that users do pretty commonly: they have input pipelines with NumPy functions, and we believe a lot of these could be converted to TensorFlow NumPy functions and run more efficiently.
We will work through another example where we take this input, compute a very simple toy model on it, and then compute gradients. This demonstrates how gradients can work seamlessly through both TensorFlow and NumPy functions. We open the TensorFlow GradientTape scope, which is TensorFlow's mechanism for defining the code through which gradients are computed. We do a watch (we won't go into the details of these APIs), and then you'll notice that in the scope we call both tnp and tf functions. We mix and match these functions, and again all of this works without overhead because the conversion is almost free, and the gradient tape in this case can work through both the NumPy functions and the TensorFlow functions. Finally, we call tape.gradient, which computes the gradients through all of these function calls and returns them.
To make it more interesting, we will also show how to compute per-example gradients and demonstrate TensorFlow's iterative constructs. TensorFlow has both while loops and a higher-level function called map_fn. How map_fn works is that it
takes in a function and applies it to each row of the input, which is the second argument. We can use that to compute, in this case, per-example gradients. The way we do that is to take the gradient function we just defined and map it over all the rows of the input x. Given this code, we can now write idiomatic Python control-flow code. In this case we show how to iterate over a dataset and compute per-example gradients for each batch of elements, and given this you can start applying optimizer rules to update your parameters. This is all great, and we anticipate that this code will already be faster than doing it in NumPy, and it also allows things like automatic gradients through your code.
We will also talk about how to make this code even faster. One of the tricks you can use is called trace compilation. The way it works is that adding a decorator to any of the functions triggers this machinery. On the first call to the function, we execute the code and the underlying machinery observes what operations are being called. It takes that trace, compiles it, and stores it, and in subsequent calls it invokes the trace instead of re-running the Python code. This gives TensorFlow the opportunity to apply different kinds of optimizations, like operator fusion, which provide a lot of speedup to the code.
Another optimization that provides a lot of speedup is auto-vectorization. The way it works is that it takes iterative code and does a large rewrite on it. It takes the loop body, goes over all the operations in the loop body, and replaces them with operations of higher rank, and by doing that it can completely get rid of loops, which provides large speedups. In this case, replacing map_fn with vectorized_map triggers this vectorization machinery and can provide large speedups. To show how much speedup we
can get, we again did a small benchmark where we varied the batch size and measured the time taken. This graph shows input size versus time taken. The blue line is the original code with map_fn and without compilation, the red plot is with compilation, and the green plot is with both compilation and vectorization. Note that the y-axis is on a log scale. As the input size increases, both compilation and vectorization provide huge speedups.
To summarize, we saw how to take NumPy code and leverage the TensorFlow runtime to make it faster on GPUs. We also talked about some of the advantages that TensorFlow brings in, like auto differentiation, compilation, auto-vectorization, and so on. Next, Peng is going to talk about how to add new NumPy operations, and then we'll also walk through some case studies of using this API.
Hi, everyone. My name is Peng. I'll take you to peek under the hood of TF NumPy, and in particular I'll show you how to add a new operation to TF NumPy. Adding new ops to TF NumPy is the bread and butter of TF NumPy development. The process of adding a new op is basically four steps. When you want to add a new op, you first read the official NumPy doc for that op to comprehensively understand the behavior of the op. Then you think of a way to implement all the behaviors in Python using TF ops. You can look at the numpy_ops folder for more information. We have a comprehensive test suite for NumPy conformance that basically tests all combinations of shapes and dtypes, so we need to run those tests to make sure that we conform to NumPy in all the corner cases, especially the dtypes, because dtype promotions are where most of the corner cases are. Lastly, and this is new compared to NumPy: NumPy doesn't allow incomplete shapes, but we want to support incomplete shapes to
be more like TF, so there are some tricky aspects of how to handle incomplete shapes. I will show all this through three examples.
The first example is the simplest possible op that we can add, a bare-bones op. As Ashish mentioned, we defined our own ND array class, and it is just a thin wrapper around the TF tensor. So if you are adding an op that directly corresponds to a TF op, like cosine, then the work to do is really very simple. You take the TF tensor out of the ND array, give the tensor to the corresponding TF op, and when you get back the result tensor, you apply the wrapper again to turn it back into an ND array, and that's it. There is a little extra here. First, because we want to accept any array-like arguments, we need to convert the argument into an ND array first. Second, we have a decorator called np_doc. This decorator does two things. First, it copies the docstring of the original NumPy op, or a link to the docstring, into the documentation of this new Python function, and if you have an extra docstring in your new function, it will be appended to the official NumPy docstring. Second, np_doc also does tf_export, which exports this new symbol under the tf.experimental.numpy namespace. That's our first example.
Our second example is a little bit more complicated. The only complication is that now we have more than one argument, and as Ashish mentioned before, unlike TF, in TF NumPy we support type promotion, so we have to promote all the related arguments to a common dtype. We have our own promote-dtype utility function, and internally it calls the official numpy.result_type function, so our type promotion rules are exactly the same as NumPy's type promotion rules. One technical challenge here is that this works most of the time, but in some corner cases, especially when the arguments are native
types (native Python types like Python integers or Python floats), numpy.result_type can be value-sensitive when it does dtype promotion: it can decide the dtype based on the actual value of the argument. For example, if the argument is the Python integer 1, it may choose a small integer dtype, but if it's a very large integer, it may choose another integer dtype. That is fine in eager mode, but we also support graph mode, that is, usage within tf.function, and in that case x1 and x2 may be symbolic tensors, so the value is unavailable. In that case we really can't do anything. Instead of using x1 and x2, we basically give x1.dtype and x2.dtype to np.result_type. That may result in different type promotion results than NumPy, but those are really just the corner cases when x1 and x2 are Python native values.
Our last example shows the complication of handling incomplete shapes. In some of our ops, because of the behavior of official NumPy, we need or want to do totally different computations based on the shape and the rank. In a simplified Python version, we can just do Python branching on the shape and rank. But if we want to support incomplete shapes for arguments, then there are some challenges. First, when we want to handle incomplete shapes, we shouldn't use tensor.shape.rank or tensor.shape. Instead we should use tf.rank and tf.shape, which return the shape and rank as a dynamic tensor instead of a static Python value. But then, because the shapes are dynamic, we cannot use Python branching; we need to use tf.cond. So that's one requirement, which requires us to use tf.cond.
At the same time, there is another requirement. If all the shapes of the inputs are already complete, then NumPy users expect the results to also be complete. There are two motivations for this. The first is that this is what most NumPy users expect. The second is that in one of our use cases,
which is Trax, Trax actually requires this shape calculation in its model initialization phase. So it needs all the ops used by its model to successfully propagate complete shapes from input to output, and we need to support this. But this is a problem for tf.cond, because the different branches in our tf.cond are doing totally different computations, so tf.cond cannot infer a complete result shape from those different computations even if all the argument shapes are complete. So in conclusion, we cannot use tf.cond in this case. We have two contradicting requirements.
The solution is that we need to constant-fold the cond away. Because we are only dispatching according to shape, and the shapes are already known for all arguments, the conditions of the tf.cond are already statically known at this time. We just need to leverage this fact and eliminate the cond, because the condition is already known. The concrete solution is that we have our own version of tf.cond, which we call utils.cond, that does a tf.get_static_value on its condition to extract the static value and then picks the true branch or the else branch at graph-building time. The problem we find is that the coverage of tf.get_static_value is not very great. It doesn't support many common ops like add, greater, or logical_or. So when we use those ops in our condition, we need to use our own versions of those ops, which more eagerly do constant folding within the op. For add, for example, it will do get_static_value on the two arguments of add, and if both of them are static, it returns a static Python value.
All right, that's all for our examples of adding a Python op. Next we'll talk about some case studies we have for using TF NumPy. We'll talk about two case studies. The first is JAX. There is a lot of existing JAX code written by
researchers, so the goal is to try to port that JAX code or those JAX libraries onto TensorFlow so that they can enjoy the TensorFlow ecosystem. The problem we have is that JAX is not just TF NumPy: it uses more APIs, beyond TF NumPy. Our solution is that, outside of TF NumPy, we have a TF NumPy extension library that provides a set of JAX-specific APIs. Those APIs include the famous function transformers in JAX, like jit, vjp, and vmap; some facilities for distributed training, like pmap and psum; some mathematical functions outside of NumPy, like convolution and pooling; and a set of stateless RNGs that are not part of NumPy. We successfully did this porting for JAX's unit tests for NumPy ops and for the Stax library, which is a very simple layers library within JAX, and we also did it for the Neural Tangents model.
Our second case study is Trax. Trax is the next generation of the Tensor2Tensor library. It's a framework for state-of-the-art research in sequence-to-sequence modeling, created and maintained by the inventors of Transformer, Reformer, and many famous NLP models. It was initially implemented on JAX, but now we have ported it to TF NumPy, so users of Trax can easily do their Trax training on TensorFlow. There's a difference from the other porting work: instead of doing just a one-off port of the existing Trax code, we actually moved Trax to a multi-backend architecture, so that users can seamlessly switch between the JAX backend, the TF NumPy backend, and even the vanilla NumPy backend. So they have great flexibility when they train their models. The reason we can achieve this is that Trax only uses a well-defined API surface, and those APIs are supported by both JAX and TF NumPy. To switch backends, there are two ways. You can do it on the command line by just giving one extra command
line argument, or you can do it more locally within the code, more programmatically. In Trax there is a fastmath module, and the fastmath module has a use_backend Python scope manager. When you open a Python scope using use_backend, all the code within that scope will use the backend you specified in use_backend. I also want to mention that we have a Trax-to-Keras converter inside Trax, so that you can convert your Trax models into Keras models. We also support SavedModel, so you can save your Trax model into a SavedModel and then later load it into the TF ecosystem.
That's all for my talk. If you want to learn more about TF NumPy, you can follow those links for further reading. In particular, please look at the comprehensive guide we have for TF NumPy, and you can also look at the API documentation for the specific documentation of each NumPy op. We also have some Colabs. We have one Colab that uses a simple multi-layer model to train MNIST classification, but it is a distributed training example in that it uses multiple GPUs. We also have a Colab that shows how you can use TF NumPy along with Keras and distribution strategy together. There is also a Trax Colab that shows how you can use Trax with the TF NumPy backend. That's all for our training talk today. Thank you.
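The workflow described in the talk can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: it assumes a TensorFlow release that ships tf.experimental.numpy (2.4 or later), and the helper names sigmoid, loss_fn, and row_norm, as well as the array shapes, are made up for the example.

```python
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp


def sigmoid(x):
    # NumPy-style code; tnp dispatches each call to the TensorFlow runtime.
    return 1.0 / (1.0 + tnp.exp(-x))


def loss_fn(w, x):
    # Hypothetical toy model mixing a TF NumPy call with native TF calls.
    logits = tnp.matmul(x, w)
    return tf.reduce_sum(tf.sigmoid(logits))


def row_norm(row):
    # Per-row computation, later vectorized across the leading dimension.
    return tnp.sqrt(tnp.sum(row * row))


# 1. Random 2-D input, clipped into a range, then sigmoid.
x = tnp.asarray(tnp.random.randn(4, 3), dtype=tnp.float32)
clipped = tnp.clip(x, -1.0, 1.0)
y = sigmoid(clipped)

# 2. NumPy interoperability: np.sum converts its input to a NumPy array,
#    and tnp.sum converts a NumPy array back to a TensorFlow value.
total_np = np.sum(y)
total_tnp = tnp.sum(np.asarray(y))

# 3. Gradients through mixed tnp / tf code.
w = tnp.ones((3, 2), dtype=tnp.float32)
with tf.GradientTape() as tape:
    tape.watch(w)
    loss = loss_fn(w, x)
grad = tape.gradient(loss, w)

# 4. Trace compilation: tf.function traces the Python code on first call.
compiled_sigmoid = tf.function(sigmoid)
y_compiled = compiled_sigmoid(clipped)

# 5. Auto-vectorization: apply row_norm to every row without a Python loop.
norms = tf.vectorized_map(row_norm, x)
```

The same functions run eagerly, inside tf.function, and under tf.vectorized_map without modification, which is the interoperability point the talk makes; actual speedups depend on input size and hardware.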
Original Description
In this episode of Inside TensorFlow, Software Engineers Ashish Agarwal and Peng Wang present TensorFlow NumPy. Ashish and Peng will discuss how you can accelerate NumPy using TensorFlow, and bring all the benefits of the TensorFlow ecosystem to NumPy!
NumPy API on TensorFlow → https://goo.gle/2TMDi0Q
Module: tf.experimental.numpy → https://goo.gle/2JvuDhi
GitHub → https://goo.gle/3eiLUWg
Add the Inside TensorFlow playlist → https://goo.gle/Inside-TensorFlow
Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow
#InsideTensorFlow #TFNumPy #TensorFlow