1D convolution for neural networks, part 6: Input gradient

Brandon Rohrer · Intermediate · 📐 ML Fundamentals · 6y ago

Key Takeaways

Part of a series on 1D convolution for neural networks, this video derives the input gradient for backpropagation: the gradient with respect to the input is the output gradient convolved with the flipped (left-to-right reversed) kernel.

Full Transcript

Our next step is to regroup these so that all of our partials with respect to a particular input value are grouped together. So these top two lines include all of the partials with respect to x sub j. It's just all of the small expressions from the previous set of equations, rearranged and reordered, but now we're starting to gather up all of the contributions to the partial of y with respect to an individual x element, an individual input.

If you look at them, you can represent them with a pattern. We see that the partial of y sub i plus k with respect to x sub i is w sub k. This is a shorthand way to represent all of the equations here, and you can see that the pattern holds for any x sub i. For any input element, if we gather up all of the partials with respect to it, we can represent all of those expressions with this shorthand: for any input x sub i, the partial of the output y sub i plus k is equal to w sub k. That's a neat little way to condense it, and something we're going to make good use of.

Now we can plug this back into our chain rule, where the input gradient is equal to the sum, over k, of the output gradient times each of these partials. We can then substitute in this expression, w sub k, and we have the input gradient in terms of the output gradient times w sub k. So this is a fairly slick way to do our backpropagation.

There's one more step we can do. If we take w sub k and flip it left to right, which we're going to represent with this left-pointing arrow above it, then everything that was minus k becomes plus k, so we can change the sign on the k index in our output gradient and everything else stays the same. We just did a little trick by pre-flipping this w sub k. Now this is a sliding dot product: we have an array, which is our output gradient, and we have this kernel, our flipped w sub k, and we're summing over the full length of that kernel, for each value of our input x sub i.

So we can represent that even more concisely: our input gradient is our output gradient convolved with the reversed version of our kernel. This is a really slick little result. It says that the derivative of a convolution is a convolution with the kernel flipped. There's a pleasing symmetry to that. Math is beautiful, exhibit 673. Very, very slick.
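The result above can be checked numerically. The following is a minimal sketch, not the author's code: it assumes the forward pass is a plain "valid" 1D convolution (no padding, stride 1), it uses a made-up output gradient, and every function name and test value is hypothetical. It computes the input gradient as the output gradient convolved with the reversed kernel, then compares that against a finite-difference estimate.

```python
def convolve_full(a, b):
    """Full 1D convolution: out[n] = sum over m of a[m] * b[n - m]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for m, a_m in enumerate(a):
        for k, b_k in enumerate(b):
            out[m + k] += a_m * b_k
    return out


def convolve_valid(x, w):
    """Forward pass (assumed): keep outputs where the kernel fully overlaps x."""
    return convolve_full(x, w)[len(w) - 1 : len(x)]


def input_gradient(grad_y, w):
    """dL/dx as the output gradient convolved with the flipped kernel."""
    return convolve_full(grad_y, w[::-1])


def numerical_input_gradient(x, w, grad_y, eps=1e-6):
    """Central finite differences on L = sum(grad_y * y), used only as a check."""

    def loss(x_values):
        y = convolve_valid(x_values, w)
        return sum(g * y_j for g, y_j in zip(grad_y, y))

    grads = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        grads.append((loss(x_plus) - loss(x_minus)) / (2 * eps))
    return grads


# Illustrative values only (not from the video).
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
w = [1.0, -2.0, 0.5]
grad_y = [1.0, 0.0, -1.0, 2.0]  # one entry per output: len(x) - len(w) + 1

analytic = input_gradient(grad_y, w)
numeric = numerical_input_gradient(x, w, grad_y)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print("max abs difference:", max_error)
```

Because the loss in this sketch is linear in the input, the two gradients should agree to within floating-point rounding. Note that the input gradient uses the full convolution, so it has one entry per input element even though the forward pass produced fewer outputs than inputs.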

Original Description

Part of a 9-part series on 1D convolution for neural networks. Catch the rest at https://e2eml.school/321
