Look Mom, No Indices! Vector Calculus with the Fréchet Derivative by Charles Frye

Weights & Biases · Beginner ·🔢 Mathematical Foundations ·6y ago

Key Takeaways

The video introduces the Fréchet derivative, a lesser-known style for doing vector calculus that helps simplify calculations and reduce pain, and explores its applications in machine learning, linear algebra, and calculus.

Full Transcript

What we're going to talk about today is a different approach to doing calculus, in particular vector calculus, based on something called the Fréchet derivative. Vector calculus is at the center of machine learning. For one, it's how we get our exact solutions for linear regression: if you've ever done the normal equations in a stats class to figure out the values of the weights for your parameters in linear regression, people use vector calculus to come up with that answer. When we calculate eigenvectors and eigenvalues, we're using methods from vector calculus to determine which algorithms work and how well they work. But maybe a little less basic, and more central to contemporary deep learning methods: gradient descent is based on gradients, which are derivatives of functions that take in vectors, and backpropagation as an algorithm uses a whole bunch of different ideas from vector calculus. These are really core ideas in deep learning.

But vector calculus, which is essentially the same thing as what people would call multivariable calculus in at least the American high school system, is something that people have a lot of fear of or dislike for. I think the reason why is that vector calculus combines two things that people are a little sketchy on, linear algebra and calculus, and part of the reason people are sketchy about linear algebra and calculus is that people explain both of those things the wrong way. You'll hear people say that linear algebra is an algebra for solving equations with vectors and matrices instead of just numbers, and they'll say that calculus is the study of methods for mathematically representing rates of change and areas. While these are definitely true statements, and they give deep insight into these two mathematical disciplines, they're not really the right way to think about either of these things when it comes to doing machine learning work.
The former uses a lot of intuition that people develop as pure mathematicians, and the latter uses a lot of intuition that people develop as physicists or engineers, but most people doing machine learning aren't either of those things. Maybe the more common toolkit is software engineering rather than hardware engineering, signal processing, and things like that. So there are better ways to understand these two general mathematical ideas that make use of concepts closer to the wheelhouses of folks who are doing machine learning, software development, and software engineering every day.

Linear algebra can be thought of as the study of functions that can be represented by arrays; these functions are also known as linear maps. At the last deep learning salon I talked about this approach to linear algebra, and there's a link there to the video on that. So the question is: is there a better way to understand calculus in the same way? The answer is yes. If we think of calculus, and in particular differential calculus, which is what we mostly need for machine learning, it's really about methods for linearly approximating functions. So there's this deep connection between what happens in linear algebra and what happens in calculus that isn't really taught, especially when people teach single-variable calculus, but that unifies these two sets of ideas and connects the things that we end up using all over the place in machine learning and deep learning.

The way we're going to approach this is by actually defining the derivative in a different way, and that's going to end up making our vector calculus easier. This style of defining the derivative is called the Fréchet derivative, after the French mathematician who invented it. It's going to give us the same answers as what we learned in our calculus classes, but it's going to give us a new way of thinking, and there are three major
benefits to thinking this way. One is that we end up with a single style, a single definition, a single method of proof and of understanding that works for gradients of single-variable, vector, and even matrix functions. Even better, this extends all the way into the heady realms of functional calculus. That's not where we're going today, but it's cool to know that this single idea can go from the kinds of calculus one can learn as a high school student up to the most complicated kinds of calculus. Second, in many problems, and this is where I got the title for my talk, all the indices disappear from our calculations. We no longer have to juggle lots of i's and j's and k's telling us which part of which matrix or which vector belongs to which component of the gradient or the derivative. Everything is done in a completely index-free manner. It's less cluttered, it's easier to follow, it shortens things, and for me it's just simpler. And then lastly, we're going to be using little-o notation instead of limits, which is akin to the big-O notation from computer science. This is something people maybe have a bit better intuition about, from having had to understand how quickly certain algorithms run. Tom's talk had some big-O statements about how long algorithms would take, but none of the talks so far have included limits, if that gives you a sense for which of these things are more common in the toolkits and the practice of machine learning and software engineering.

We're going to spend most of our time talking about how this gives us a way to do calculus on single-variable functions, but in the end we will point to how it gets used for vector calculus. Basically, this talk is a teaser for the sequence of blog posts where I go through how to actually use this set of techniques line by line, with every single manipulation
carefully explained. So if you want that, you'll have to check out that sequence of blog posts; I didn't want to be doing math in front of people for half an hour. The links for that will be at the end.

So, a single-variable function. What I mean by that: when we do calculus, we're typically thinking about functions that take in and put out real numbers. But real numbers aren't real, at least for computers; there are no real numbers inside computers. So any time I say anything about real numbers, you might substitute floating-point numbers instead. We're thinking about functions that take in one single floating-point number and put out another one. Squaring, adding two, multiplying by five: these are all the kinds of functions we might be thinking about.

The standard definition of a derivative that one learns in a calculus class is that the derivative is the limit of a certain ratio: the ratio between the change in output and the change in input, as the change in input gets smaller. That limit there means: make this change smaller, make this epsilon value go to zero, and we say that the limit of that ratio is the derivative. So we're specifically defining the derivative at a point by means of a limit. I don't know about you, but when I took calculus classes, that was basically the moment when I lost track of math. It took me a long time to come back around and get back to understanding and enjoying math the way I did before calculus, and so I have a preference for doing things with as few limits involved as possible.

The Fréchet definition gives you the same results, but it centers the concepts of approximation and linearity rather than limits and ratios. What the Fréchet definition says is that if the change in the value of the output at a point epsilon away is some function of x times epsilon, plus something smaller, we call that function the derivative. What's weird about this definition is that the
thing we're defining isn't alone on one side of the equation, the way it normally is. So it's an operational definition rather than a definition in the usual sense. The second thing that's different about this definition is that instead of the limits appearing explicitly, we've hidden them away in this last little term, highlighted in purple: the little-o of epsilon. We'll talk a lot about what exactly that term means, but the nice thing is that, loosely and intuitively, it just means something smaller than epsilon. So all the gnarliness with limits, all the complexities of calculating this thing, are just stuffed away inside this little-o, and we do most of our work in calculating our gradients and derivatives by doing a little bit of regular old algebra or linear algebra on all the rest of the stuff.

So that's the centering of approximation. The other thing is that what we have here is an equation for a line. The other way of stating that definition is: if I can always approximate the behavior of f near x with a line passing through f of x with slope f prime of x, we call the function that gives us the slope the derivative. Our function f could be wobbling and wiggling all over the place; it could be sine of x or cosine of x, or it could be something exotic like the loss function of a neural network. But we're always going to pretend that it's just a line, and say we're going to approximate it with a line. We want the best line to approximate it, and the slope of that line is what we call the derivative. So the key ideas are that we are going to approximate the function, and the way we're going to do it is with a line. These are basically the important ideas about derivatives for machine learning, especially deep learning.

So say I just want to update my model's
parameters in order to make it perform better. We have all learned lots of ways to do that, but let's pretend, for a brief tabula rasa, veil-of-ignorance moment: how do I do that? One way is that I could choose my new parameters such that my best linear guess says the performance will be better. It's really hard to know what the loss is going to be at tens of thousands of values of the parameters; it can be expensive. Han was pointing out that evaluating the loss for a transformer network pretty quickly gets you out-of-memory errors. So what we're going to do is try to come up with a best linear guess that tells us how the function is going to behave at a different point, and that's what we're going to use to update our parameters. It's not going to be perfect. In fact, we know that as we move further and further away it gets worse. That's another thing this little-o of epsilon tells us: it says that the gap between these two things is going to get bigger at about the same rate that the value epsilon gets bigger. So we don't want to change the parameters too much; that's the last piece we can read off from this equation. And this leads us directly to gradient descent. Gradient descent says that our new parameters, on the left-hand side of that assignment arrow at the bottom, are our previous parameters plus some small number times that linear approximation, evaluated at our current value of the parameters.

What we want to use our definition for is this: from knowledge of what a function is, we want to figure out what that function's derivative is. The way we do it is we write out the left-hand side of our definition for the function we're interested in. The left-hand side of our definition was f of x plus epsilon, so if we consider the function x squared, we would just be writing out x plus epsilon, squared. And x plus epsilon, squared, that's a binomial, right? If I were to multiply that out,
it's x plus epsilon times x plus epsilon, and what we get out of that is an x squared and an epsilon squared, and then one x times epsilon and one epsilon times x. First, outer, inner, last is how people talk about it in the American school system. Combining those two x-times-epsilon terms, what we see is that x plus epsilon, squared, is equal to x squared plus 2x epsilon plus epsilon squared.

The next step is that we pattern match the right-hand sides. Pattern matching is something people are maybe more comfortable with from computer science and programming; functional programming languages elevate pattern matching to actually being the way some functions are defined. So pattern matching is a key skill in computer science. We pattern match our two right-hand sides. Down here I've got f of x, plus something that's a function of x times epsilon, plus something that's smaller than epsilon, and those are exactly the three terms in that equation at the top: x squared, plus 2x (that's a function of x) times epsilon, plus epsilon squared. For small things, something squared is smaller than something: if I'm less than one, then squaring makes me smaller. So epsilon squared is also something smaller than epsilon, so it's little-o of epsilon. That tells us that the derivative of x squared is 2x.

So far so simple: x squared is a function whose derivative lots of people don't have much trouble understanding. We'll see in a little bit that this same pattern matching technique also applies when it comes to vector calculus, and in that case it makes things much easier to do.

I want to talk a little bit about this little-o term and what it means, in part because hopefully this will help build your intuition and understanding for big-O. Pretty often, when I have to think about what big-O really means, I have to go back to Wikipedia, back to
a whiteboard, and think about it for a while, so hopefully this will help speed that process up the next time you have to do that. Effectively, a term is little-o of epsilon if it gets smaller much faster than epsilon does. It's like a fancy notion of what it means to be less than. One number being less than another number is pretty easy to figure out; this is a way of talking about how one function is, in some sense, less than another function. The technical definition is right there in the middle of the slide: if, as x gets small, the ratio of these two functions goes to zero, then we can replace f of x with little-o of g of x, which says f of x is less than g of x in this sense. This means that for a small enough epsilon, anything little-o of epsilon can be safely ignored. This is what we mean when we say that the derivative allows us to approximate the original function. It's pretty close to the core idea of what limits were used for in the definition of the derivative, but now we've separated it out from the part where we calculate the derivative and put it on just the part where we figure out how bad our approximation is. It's a really nice separation of concerns; it's a way of abstracting away how bad our approximation is.

To give a couple of examples: zero is going to be smaller than x, x squared is smaller than x, and any number times x squared is also smaller than x. What that means is that any time we see anything that looks like this in an equation we're writing, we can replace it with little-o of x and continue. Some terms that are not little-o of x: one. That's something that doesn't get smaller at all, so it doesn't get smaller faster than x gets smaller. x does not get smaller faster than x, because they're the same; the ratio of those two things is always 1, so that's a term that's not little-o of x. If I multiply x by something, the ratio of k x and x is going to be k; that's also not 0, and so that term is
not little-o of x. There are lots of rules for manipulating these terms, and I find them a lot easier to remember and to use than the whole cavalcade of complicated rules for calculating limits. We pack away all of those rules into a calculation strategy for working with little-o terms, and it pays a lot of dividends: it makes a lot of derivative calculations much easier.

It's similar in spirit to big-O. Big-O notation from computer science lets us talk about how fast algorithms are while abstracting away any irrelevant details. For example, maybe we've got n data values and we want to check if a query is one of them. If the data is in a list, searching it will take time proportional to the length. On the other hand, if the data is in a binary tree, famously, searching it will take time proportional to the logarithm of n in order to find the data. So if what we want to do with our data is search it, a binary tree is better, because O(log n) is faster than O(n). Now, there are lots and lots of important details when it comes to actually implementing these algorithms and figuring out exactly how fast each one is, but generally, if we're working with a really big number, we know that the thing that is O(log n) is almost always going to be smaller than the thing that is O(n). So it saves us a lot of effort and work and cognitive load to use big-O notation to think about things, rather than trying to calculate the exact speeds and the exact number of operations that something takes. It lets us abstract away those irrelevant details and talk about which algorithm is fastest without really specifying them. Similarly, little-o lets us talk about the best linear approximation while abstracting away exactly how good or bad that approximation is, because that's often a lot harder to really understand. I can say that this thing is best without knowing how much better it is than other choices, or how much worse it is than
the best possible choice. If you want to read more about these, the general term for all these ways of turning limits into these similar expressions is Landau symbols. I would definitely check that out; there's lots of good tutorial material online about them.

Okay, so now that we understand how this Fréchet style of derivative lets us do the kinds of derivatives one does in an AB or BC sort of intro calculus class, we now apply it to functions of vectors, and we can define the gradient in exactly the same way. When I say that something is a function of a vector, for folks who do pure math, I'm talking about something that takes in an element of R^n, a real-valued vector, and returns a real value, and then the gradient is a function that takes in a vector and returns a vector. But for folks who are doing machine learning, what I'm thinking of is a function that takes in an array of floats and returns a single float, and then the gradient is something that takes in an array of floats and returns an array of floats of the same shape.

Again, the Fréchet definition is going to center approximation and linearity. What it says is that if the change in the value of the output at a point epsilon away is now the inner product of some function of x with epsilon, plus something smaller, we call that function the gradient. We could call it the vector derivative or just the derivative, but people like to call it the gradient, especially in machine learning. The only real difference between this definition and the first one is that instead of taking the slope, the value of the derivative, and multiplying it by epsilon, we're taking this inner product, or dot product, or scalar product, depending on your discipline, between those two things.

This inner product can be written many, many ways. I think the most important alternative way to write the inner product is in terms of matrix multiplication. When we write it that way, instead of writing those two
angled brackets, we write one vector transposed times the other vector. The gradient is a vector, epsilon is a vector, and the way we multiply one vector with another is by taking the transpose of one and then doing matrix multiplication between the two of them.

The important things to note are, first, that this reduces to the derivative for vectors of length one. This inner product here, this vector transpose, is the sum of the products of the elements of the vectors: you do a little for loop over the vectors, zip them together, and multiply those values. If there's only one thing, then it's just multiplying the two values together, and that's exactly what we did for the scalar-valued derivative. So this is basically the same definition, but now we've gone up one level and defined it in a way that works for both vectors and scalars. (Sorry, I think I didn't mention: scalar is the term people use in linear algebra for just a single number.) Second, the second term here, the gradient transpose epsilon, is a linear function of epsilon for a fixed x. What I'm doing is just multiplying these things and adding them together, and that function is always going to be linear. It defines, effectively, a plane or a hyperplane, which is the higher-dimensional equivalent of a line. So just like how we had the notion of the derivative as a line in one dimension, we have the notion of the derivative as a plane or a hyperplane, a linear function that approximates the original function.

Here's how we end up using our definition; it looks exactly the same as the way we used our definition for the scalar derivative. If we want to know the derivative of the squared L2 norm (people use this for regularization in machine learning, and gradients get calculated on it, so maybe you've wondered what that gradient is), what we do is multiply it out. The squared norm looks just like the x
plus epsilon, squared, that we had before. We get one term that's the squared norm of x, another term that's the squared norm of epsilon, and then a term in the middle that's basically two times x times epsilon. When we do things this way, the connections between scalar calculus, vector calculus, and calculus with matrices become really, really clear, because we're using the same symbols and the same tools for each one of these sub-disciplines of calculus.

And again, we just pattern match the right-hand sides. We look to make sure that there's one term that looks like the original function, in blue. There's one term that's smaller than epsilon, that is, something that's smaller than the norm of epsilon: the norm of epsilon squared. Again, squaring makes small things smaller, so the term in purple is our little-o of norm of epsilon term. And then the middle term is the term that gives us our gradient, so it says that the gradient of the squared norm is 2x. Notice that there's a transpose built into that middle term, so we drop the transpose when we decide what the vector is. It's really easy to see with vectors, because the gradient needs to be the same shape as the inputs or else we couldn't add them together when doing gradient descent, but with matrices that's an important one that you're going to need to watch out for. The other notational gotcha is that people will often just write little-o of epsilon without the two straight lines around it that indicate the norm, as a notational convenience when writing little-o terms. So when you go out there and look for other people using the Fréchet derivative style, you might notice that they do that a little bit differently than what I do here.

Lastly, we can use this to do derivatives of functions of matrices, and the definition is exactly the same. This is, I think, the real killer app for the Fréchet derivative. I found that every other approach just falls
apart: it's much more complicated and really difficult to follow when it comes to functions of matrices. I really suffered through trying to calculate derivatives for things like linear regression, when these should be things we can do pretty quickly and easily, and the Fréchet derivative lets us do that. It's literally the exact same definition, but now with capital letters, because people like to use capital letters when they're working with matrices. So again, we look at the value at a point epsilon away. (Capital epsilon looks like an E. Sorry, I didn't come up with the Greek language.) The change in the value at a point epsilon away is now the inner product of some function of X with E, plus something smaller, and again we call that function the gradient.

The only thing that's different here is that we now need to know how to do an inner product and calculate a norm of a matrix, which maybe people are a little less familiar with than they are with inner products of vectors. The inner product of vectors already showed up in Han's talk, with the cosine similarity: the cosine similarity is just the normalized version of the dot product. So what is the inner product of two matrices? What is the norm of a matrix? There's a particular definition in terms of traces that you can find if you look online, but the way to get that definition that I really like is that the inner product of two matrices is just: turn each matrix into a vector and then calculate that inner product. That's this last line here, where it says vectorize X. That's literally like flatten in NumPy or TensorFlow or whatever your favorite tensor library is: just flatten them out and then do the dot product between those two. It blows my mind that this actually works and gives you a really good answer for what it means to do the inner product of two matrices, but that's how it's done. It turns out that the form that's easier to work with when we're
doing our algebra, when we're calculating stuff with the Fréchet derivative, is the one in terms of the trace. If you want more details about how we calculate matrix-valued derivatives, and how to understand this maybe arcane and strange trace-based expression for the inner product, check out the blog post that is linked in the corner of this slide, and should show up in the chat, about why this is such a sensible way to calculate the lengths of matrices. It turns out to be really closely related to things like [inaudible].

I don't have an example of calculating a matrix-valued derivative because, you know, maybe you should always leave people wanting a little bit more. If you want to see how to do it and get some really cool examples, you'll need to check out the series of blog posts. But hopefully I've convinced you that these benefits are there and are real for thinking of vector and matrix calculus in this style: we can write one definition, one single style of doing calculations, that works for gradients of all kinds of functions. You'll notice that basically the indices disappeared. I used a few in calculating those inner products, but that was just to define those things; we don't need to use any indices at all. And we kept the limits to a minimum by hiding them, essentially, inside our little-o notation. Those are, I think, the three big benefits of using this Fréchet derivative style.

The blog post series is linked at the top there. There are three-plus-one blog posts. I had initially intended to write just this linear regression one and one on deep linear neural networks, which are really near and dear to my heart: neural networks with no nonlinearities. They're nice because you can do math on them, even though they can't solve ImageNet or anything like that. But then while I was writing this blog post, literally in the middle of it, I came across another example, just
coincidentally, where calculating the derivative of the determinant turns out to be much, much easier this way, and it's something that you do sometimes have to calculate. I also have another blog post on Gaussians as exponential families. This is an idea that was more in vogue when people liked graphical models and things like that. Gaussians, the normal distribution and multivariate normal, have this really elegant connection to information geometry and these very cool, heady ideas, but the way this connection is often presented is really hard to follow because there's a lot of vector calculus. So I went ahead and calculated these things in the Fréchet style, because I couldn't find anybody else doing it that way, and found that it just made things so much shorter and so much easier to write down. It really emphasized what was important and unique in that case, and not all the machinery and mechanics of calculating these gradients.

Also, a shout-out to Terry Tao's blog. Terry Tao is one of the greatest living mathematicians, and if you need another endorsement for using the Fréchet derivative, that's the style he uses in almost all of his calculations of derivatives and gradients. That's where I first came across this style, and it's made doing calculus for neural networks, for linear algebra, for lots of things that I'm interested in, just so much easier. It's really deepened my understanding of both linear algebra and calculus. So that's all I have. Hopefully there are questions, but if you can't think of any questions now, feel free to hit me up via email or on Twitter. Thanks a lot.

Thank you, Charles. I hate calculus a lot less, or a little bit less if I'm being honest, so thank you for that. We have a few questions. Rachel really liked your talk, called it a beautiful talk, and was wondering if you have any other recommendations to go deeper into these topics in the way that you are teaching,
and I posted all of the links from your slides in the chat already, but do you have any more?

Yeah, so there are links in those blog posts to other places where I had come across these ideas. I don't have them immediately at the top of my head, so I guess you could just send me your email, and I'll make sure to contact you specifically with some additional resources once I have a chance to look.

And then someone asked: is the lambda calculus the same as functional calculus?

I'm guessing this is about the lambda calculus as a model of computation. If you wanted to do calculus on lambda expressions, you would need to do functional calculus, but lambda calculus and functional calculus are different things.

And in the last question, someone asked: when is Fréchet's method applied in practice, does it give the exact same results as the traditional methods, and how can you verify this?

Yeah, so the results are exactly the same for anything except functional calculus, basically. Once you get to functional calculus, it turns out that you need a little bit of extra machinery; the Fréchet derivative and the Gateaux derivative are the two ways people do it, and those get slightly different answers. You could prove that they give you the same answer, so that would be something you'd have to go about doing. As for where it's used, I've found that a lot of people who are doing things like developing automatic differentiation libraries use an approach much closer to this one than the traditional calculus approach. One person would be Michael Betancourt, who does Stan, which is a Bayesian inference library that also has automatic differentiation, just like TensorFlow. His style is a little bit different from this one, but it also centers approximation and linearity and things like that. So I don't know if he would explicitly say he does the Fréchet
derivative but it's the same same set of ideas I saw actually a question about from beaudion asking shouldn't it be little o of epsilon squared and this is actually an interesting point so I didn't get a chance to mention this but little o is kind of like a less than sign and people are usually used to working with Big O which is like a less than or equal to sign and so with little o the powers are a little bit different than they are with Big O so if you were if you were surprised that it wasn't an epsilon squared is he used to that way of writing Taylor expansions then that's the reason why because there is this tiny difference and I just slightly prefer it but you can do the whole thing with Big O instead
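The verification question and the little-o remark can be checked together numerically. The sketch below is not from the talk: it assumes NumPy, uses a hand-picked invertible matrix `M` and perturbation `E`, and takes Jacobi's formula, det(M) tr(M^{-1} E), as the candidate Fréchet derivative of the determinant. The helper names `det_frechet` and `remainder_ratios` are hypothetical. If the candidate is right, the remainder divided by the size of the step should shrink toward zero as the step shrinks, which is what "little o of epsilon" means.

```python
import numpy as np


def det_frechet(M, E):
    """Candidate Frechet derivative of det at M, applied to perturbation E.

    Assumes M is invertible and uses Jacobi's formula:
    det(M + E) = det(M) + det(M) * tr(M^{-1} E) + o(||E||).
    """
    return np.linalg.det(M) * np.trace(np.linalg.solve(M, E))


def remainder_ratios(M, E, scales):
    """Return |det(M + tE) - det(M) - D(tE)| / ||tE|| for each scale t."""
    base = np.linalg.det(M)
    ratios = []
    for t in scales:
        step = t * E
        remainder = np.linalg.det(M + step) - base - det_frechet(M, step)
        ratios.append(abs(remainder) / np.linalg.norm(step))
    return ratios


# Illustrative inputs (assumptions, not taken from the talk).
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
E = np.array([[0.5, -1.0, 2.0],
              [1.0, 0.0, -0.5],
              [-2.0, 1.5, 1.0]])

scales = [1e-1, 1e-2, 1e-3]
ratios = remainder_ratios(M, E, scales)
for t, r in zip(scales, ratios):
    print(f"scale={t:g}  remainder/||step||={r:.3e}")

# Traditional cross-check: a central finite difference along direction E.
h = 1e-5
finite_diff = (np.linalg.det(M + h * E) - np.linalg.det(M - h * E)) / (2 * h)
print("Frechet form:", det_frechet(M, E), " finite difference:", finite_diff)
```

The finite difference only probes one direction at a time, so it is closer in spirit to a directional (Gateaux-style) check. For an ordinary finite-dimensional function like the determinant, the two should agree.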

Original Description

Vector calculus is at the center of many ML methods, but it's an unfortunate source of agonizing pain. In this talk, Charles will introduce a lesser-known style for doing vector calculus based on the Fréchet derivative that helps keep the pain to a dull throb.

Slides: http://wandb.me/2020-04-14-salon-frechet
Blog post series link: http://wandb.me/blog1
Blog posts referenced in the presentation:
https://charlesfrye.github.io/math/2018/02/28/how-big-is-a-matrix.html
https://charlesfrye.github.io/stats/2019/07/05/gaussian-log-linear.html
https://terrytao.wordpress.com/2013/01/13/matrix-identities-as-derivatives-of-determinant-identities/

The content of this and related lectures has now been packaged into a short course, "Math for Machine Learning": http://wandb.me/m4ml-videos

Charles Frye (he/him/his) is a researcher studying neural network optimization at the Redwood Center for Theoretical Neuroscience at the University of California, Berkeley and a deep learning instructor at Weights & Biases.

👩🏼‍🚀 Weights and Biases: We're always free for academics and open source projects. Email carey@wandb.com with any questions or feature suggestions.
- Blog: https://www.wandb.com/articles
- Gallery: See what you can create with W&B - https://wandb.ai/fully-connected
- Continue the conversation on our slack community - http://wandb.me/fs

The Fréchet derivative is a powerful tool for simplifying vector calculus and linear algebra calculations, with applications in machine learning and beyond. This video introduces the concept and explores its uses in various contexts.

Key Takeaways

Approximate the behavior of a function near a point with a line passing through the function's value at that point
Update the parameters of a model to make it perform better by choosing new parameters that minimize the loss function, approximated by a linear function
Write out the left-hand side of the function definition
Pattern match the right-hand side of the function definition
Calculate the derivative of the function
Multiply the squared norm out to find the gradient
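
The steps listed above can be worked through on a concrete case. The takeaways do not name the function, so the least-squares loss below, with data matrix X, targets y, weights w, and a small perturbation ε, is an assumed example and may differ from the one on the slides. The idea is to write out f(w + ε), multiply out the squared norm, and pattern match against the form f(w) + ⟨∇f(w), ε⟩ + o(‖ε‖).

```latex
\begin{align*}
f(w) &= \lVert Xw - y \rVert^2 \\
f(w + \varepsilon)
  &= \lVert (Xw - y) + X\varepsilon \rVert^2 \\
  &= \lVert Xw - y \rVert^2
     + 2\,(Xw - y)^\top X \varepsilon
     + \lVert X\varepsilon \rVert^2 \\
  &= f(w)
     + \langle 2 X^\top (Xw - y),\, \varepsilon \rangle
     + o(\lVert \varepsilon \rVert) \\
\nabla f(w) &= 2 X^\top (Xw - y)
\end{align*}
```

The last term, ‖Xε‖², is bounded by a constant times ‖ε‖², so it vanishes faster than ‖ε‖ and is absorbed into the little-o term. Setting this gradient to zero gives the normal equations XᵀXw = Xᵀy for linear regression.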

💡 The Fréchet derivative provides a new way of thinking about vector calculus, making it easier to work with and understand, and can be used to simplify calculations in linear algebra and calculus
