1D convolution for neural networks, part 6: Input gradient

Brandon Rohrer · Intermediate · 📐 ML Fundamentals · 6y ago

Key Takeaways

Part 6 of a series on 1D convolution for neural networks. Derives the input gradient for backpropagation and shows that it equals the output gradient convolved with the kernel flipped left to right.

Full Transcript

Our next step is to regroup these so that all of the partials with respect to a particular input value are grouped together. These top two lines include all of the partials with respect to x sub j. They are the same small expressions from the previous set of equations, just reordered, but now we're starting to gather up all of the contributions to the partial of y with respect to an individual x element, an individual input.

If you look at them, you can represent them with a pattern: the partial of y sub i plus k with respect to x sub i is w sub k. This is a shorthand way to represent all of the equations here, and the pattern holds for any input element x sub i. If we gather up all of the partials with respect to it, we can represent all of those expressions with this shorthand. So for any input x sub i, the partial of the output y sub i plus k with respect to it is equal to w sub k. That's a neat little way to condense that, and something that we're going to make good use of.

Now we can plug this back into our chain rule, where the input gradient is equal to the summation of the output gradient times each of these partials. Substituting in w sub k, we have the input gradient expressed as the output gradient times w sub k, summed over k. So this is a fairly slick way to do our backpropagation.

There's one more step we can do. If we take w sub k and flip it left to right, which we're going to represent with this left-pointing arrow above it, then everything that was minus k becomes plus k, so we can change the sign on the k index in our output gradient and everything else stays the same. We just did a little trick by pre-flipping this w sub k.

Now this is a sliding dot product. We have an array, which is our output gradient, and we have this kernel, our flipped w sub k, and we're summing over the full length of that kernel for each value of our input x sub i. So we can represent that even more concisely: our input gradient is our output gradient convolved with the reversed version of our kernel.

This is a really slick little result. It says that the derivative of a convolution is a convolution with the kernel flipped. There's a pleasing symmetry to that. Math is beautiful, exhibit 673. Very, very slick.

Original Description

Part of a 9-part series on 1D convolution for neural networks. Catch the rest at https://e2eml.school/321

Playlist

Uploads from Brandon Rohrer · Brandon Rohrer · 52 of 60

1 Robot Learning with a Biologically-Inspired Brain (BECCA)
2 BECCA talk at AGI 2011
3 Robot Learning with a Biologically-Inspired Brain (BECCA), The Sequel
4 BECCA listens to The Hobbit
5 Learning the building blocks of speech: BECCA extracts a hierarchy of audio features
6 BECCA listens for sound effects in The Hobbit
7 BECCA finds movie trailers while watching the Big Bang Theory
8 Listening for unexpected sounds: BECCA detects anomalies in audio data
9 Learning the building blocks of vision: BECCA extracts a spatio-temporal hierarchy of features
10 Watching for the unexpected: BECCA detects anomalies in video data
11 BECCA finds a stationary target
12 BECCA finds a stationary target at 3X speed
13 BECCA watches the X-men and Bruce Lee
14 BECCA plays Quidditch
15 BECCA chases a ball
16 BECCA chases a ball, part 2
17 Becca chases a ball, part 3
18 BECCA creates features from MNIST
19 How reinforcement learning works in Becca 7
20 Deep Learning Demystified
21 How Data Science Works
22 How Convolutional Neural Networks work
23 How Bayes Theorem works
24 How Deep Neural Networks Work
25 Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)
26 How Support Vector Machines work / How to open a black box
27 How autocorrelation works
28 Getting closer to human intelligence through robotics
29 A minimalist's guide to slicing and indexing pandas DataFrames
30 How decision trees work
31 Data scientist archetypes
32 How to use python's datetime package
33 How optimization for machine learning works, part 1
34 How optimization for machine learning works, part 2
35 How optimization for machine learning works, part 3
36 How optimization for machine learning works, part 4
37 How convolutional neural networks work, in depth
38 How to pick a machine learning model 4: Splitting the data
39 How to pick a machine learning model 3: Choosing a loss function
40 How to pick a machine learning model 2: Separating signal from noise
41 How to pick a machine learning model 1: Choosing between models
42 How to pick a machine learning model 5: Navigating assumptions
43 What do neural networks learn?
44 Interview with iRobot's Director of Data Science Angela Bassa
45 How Backpropagation Works
46 Evolutionary Powell's method: A discrete optimizer for hyperparameter optimization
47 1D convolution for neural networks, part 1: Sliding dot product
48 1D convolution for neural networks, part 2: Convolution copies the kernel
49 1D convolution for neural networks, part 3: Sliding dot product equations longhand
50 1D convolution for neural networks, part 4: Convolution equation
51 1D convolution for neural networks, part 5: Backpropagation
52 1D convolution for neural networks, part 6: Input gradient
53 1D convolution for neural networks, part 7: Weight gradient
54 1D convolution for neural networks, part 8: Padding
55 1D convolution for neural networks, part 9: Stride
56 The Four Grand Challenges of Robots in the Home
57 How Convolution Works
58 The Softmax neural network layer
59 Batch normalization
60 Getting ready to learn Python, Mac edition #1: Files and directories
