Batch vs Mini-Batch vs Stochastic Gradient Descent Explained | Deep Learning 9

ByteQuest · Beginner · 📐 ML Fundamentals · 7mo ago

About this lesson

In this video, we look at the three ways Gradient Descent is actually used in machine learning: Batch Gradient Descent, Stochastic Gradient Descent (SGD), and Mini-Batch Gradient Descent. The idea is the same in all three; what changes is how much data we use to compute the gradient before each weight update.

Batch Gradient Descent uses the entire dataset for every update, so each step is slow, but the updates are very stable and the loss curve moves smoothly. Stochastic Gradient Descent does the opposite and updates after every single data point, which makes each update fast but the gradient estimate extremely noisy, so the loss curve jumps around. Finally, there is Mini-Batch Gradient Descent, the version used in real applications: it processes the data in smaller batches, such as 32 or 64 samples, so it converges faster than full batch and is much more stable than single-sample SGD.

By the end of this video, you'll know how these three differ and why mini-batch became the standard choice in machine learning.

Links to related videos:
Neural Networks: https://youtu.be/sE6OaMndGZg
BackPropagation: https://youtu.be/nAMkcgxKwfA
Activation Functions: https://youtu.be/Kz7bAbhEoyQ
Vanishing/Exploding gradients: https://youtu.be/CzNFuL_5uig
Data Normalization: https://youtu.be/W2vqsTg-rDU

📚 Welcome to the Channel! If you're passionate about learning complex concepts in the simplest way possible, you're in the right place. I create visual explanations using animations to make topics more intuitive and engaging, especially in Algorithms, AI, machine learning, and beyond.

🎥 Animations created using Manim: Manim is an open-source Python library for creating mathematical animations. Learn more or try it yourself: 🔗 https://www.manim.community

Let's Connect:
GitHub: https://github.com/ByteQuest0
Reddit: https://www.reddit.com/r/ByteQuest/
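
The lesson's central point is that the three variants share one update rule and differ only in how many samples feed each gradient. The sketch below illustrates this with a single function where only `batch_size` changes. It is not taken from the video: the function name `gradient_descent`, the synthetic linear-regression data, the mean-squared-error loss, and the learning rates are all assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit y ~ X @ w under mean-squared-error loss, updating once per batch."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]
            # Gradient of mean((X_b @ w - y_b) ** 2) with respect to w
            grad = 2.0 * X_b.T @ (X_b @ w - y_b) / len(idx)
            w -= lr * grad
    return w

# Synthetic data (hypothetical): y = 3*x0 - 2*x1 plus a little noise
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(256, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + 0.01 * data_rng.normal(size=256)

# Batch: one update per epoch, each using all 256 samples
w_batch = gradient_descent(X, y, batch_size=len(X))
# Stochastic: 256 updates per epoch, one sample each (smaller step for the noisy gradients)
w_sgd = gradient_descent(X, y, batch_size=1, lr=0.01)
# Mini-batch: 8 updates per epoch, 32 samples each
w_mini = gradient_descent(X, y, batch_size=32)

print("batch:     ", w_batch)
print("stochastic:", w_sgd)
print("mini-batch:", w_mini)
```

On this easy problem all three settings recover weights close to `true_w`. The difference is in the cost and noise of each update: the batch run makes 50 updates in total, the mini-batch run 400, and the stochastic run 12,800, each based on a progressively noisier gradient estimate.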

