76 lessons

🎮 Reinforcement Learning

RL algorithms, reward modelling, RLHF, policy gradients, Q-learning and multi-agent RL

You Won't Believe How This Cop Got Away With This... #police #lawyer
Reinforcement Learning
Hampton Law Advanced 1w ago
How to build your own LLM from Scratch | Rakesh Gohel
Reinforcement Learning
Rakesh Gohel Advanced 1w ago
Preference Alignment & RLHF in LLMs Explained with Huggingface Practical | RLHF, PPO Part-3
Reinforcement Learning
Sunny Savita Advanced 1w ago
Reinforcement Learning from Human Feedback (RLHF) - High-Level Intuition
Reinforcement Learning
SH AI Academy Advanced 2w ago
GLP-1s: Overdosing, Side Effects & Long-Term Risks | Dr. Abud Bakri & Dr. Andrew Huberman
Reinforcement Learning
Huberman Lab Clips Advanced 1mo ago
The Types of LLM Fine-Tuning: SFT, RLHF, DPO, and LoRA Explained
Reinforcement Learning
SH AI Academy Advanced 1mo ago
Understanding Reinforcement Learning with Prime Intellect and Unsloth | Nemotron Labs
Reinforcement Learning
NVIDIA Developer Advanced 2mo ago
Huggingface TRL vs Unsloth RL: Reinforcement Learning Frameworks. How to fine tuning LLMs - Gemma 4
Reinforcement Learning
Byte Goose AI. Advanced 2mo ago
S02E04 — The Model Was Getting Rewarded for Mistakes — Reward Model
Reinforcement Learning
AI X-Rayed Advanced 3mo ago
Unsloth RL Training. Nvidia NeMO RL using GRPO. Reinforcement Learning from Verifiable Rewards RLVR
Reinforcement Learning
AI Podcast Series. Byte Goose AI. Advanced 3mo ago
Can You Trust an LLM Judge? An RL Researcher's Take
Reinforcement Learning
Deep Learning with Yacine Advanced 3mo ago
Deep Dive: Teaching Arcee Trinity Mini to Read Medical Research with RLVR and GRPO
Reinforcement Learning
Julien Simon Advanced 4mo ago
#Meta rolls out new performance program with bigger #rewards for top #employees
Reinforcement Learning
Business Insider Advanced 5mo ago
Why LLMs Shouldn’t Follow Instructions (But Do)
Reinforcement Learning
ML Guy Advanced 5mo ago
[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI
Reinforcement Learning ⚡ AI Lesson
Latent Space Advanced 6mo ago
23. What is RLHF? Reinforcement Learning from Human Feedback Explained In Hindi
Reinforcement Learning
AI SayI Advanced 6mo ago
Training a Unitree G1 to Walk w/ Reinforcement Learning
Reinforcement Learning ⚡ AI Lesson
Sentdex Advanced 6mo ago
Agent Reinforcement Fine Tuning – Will Hang & Cathy Zhou, OpenAI
Reinforcement Learning ⚡ AI Lesson
AI Engineer Advanced 6mo ago
Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 9: RL for LLMs
Reinforcement Learning
Stanford Online Advanced 6mo ago
Why Every Skyrim AI Becomes a Stealth Archer
Reinforcement Learning ⚡ AI Lesson
Siraj Raval Advanced 7mo ago
LLM Fine-Tuning Crash Course: Finetune model on PDFs, Instruction FT, Preference Training (DPO/RLHF)
3:36:14
Reinforcement Learning ⚡ AI Lesson
Sunny Savita Advanced 7mo ago
Keynote: Olmo-Thinking: Training a Fully Open Reasoning Model - Nathan Lambert
Reinforcement Learning ⚡ AI Lesson
PyTorch Advanced 8mo ago
Learn to align LLMs through post-training in this new course with AMD!
Reinforcement Learning
DeepLearningAI Advanced 8mo ago
Strategy vs Plan: The Difference Every Comms Pro Gets Wrong
Reinforcement Learning
Joanna Parsons Advanced 10mo ago
Unified Agentic RAG - NEW AI for Medical Diagnosis
Reinforcement Learning
Discover AI Advanced 10mo ago
3 Communication Mistakes That Make Leaders DISMISS Your Ideas
Reinforcement Learning
Joanna Parsons Advanced 10mo ago
verl: Flexible and Scalable Reinforcement Learning Library for LLM Reasoning and Tool-Calling
Reinforcement Learning
PyTorch Advanced 11mo ago
Reinforcement Learning Models - Live Review 2
Reinforcement Learning
Dr Mehrdad Arashpour Advanced 11mo ago
The RLVR Revolution — with Nathan Lambert (AI2, Interconnects.ai)
Reinforcement Learning
Latent Space Advanced 11mo ago
Reinforcement learning with Unitree G1 humanoid - Dev w/ G1 P.5
Reinforcement Learning ⚡ AI Lesson
Sentdex Advanced 11mo ago
AI Singularity Discovered
Reinforcement Learning ⚡ AI Lesson
Discover AI Advanced 11mo ago
Learn to post-train LLMs in this free course
Reinforcement Learning
DeepLearningAI Advanced 12mo ago
Let’s Talk Tokens: AMA on Reinforcement Fine-Tuning (RFT), GRPO, and AI Rewards
Reinforcement Learning
Predibase by Rubrik Advanced 1y ago
Stella Li - Spurious Rewards: Rethinking Training Signals in RLVR
Reinforcement Learning
Cohere Advanced 1y ago
'It's Not the Land of 10,000 Things!'
Reinforcement Learning
MLOps.community Advanced 1y ago
Tricks to Fine Tuning // Prithviraj Ammanabrolu // MLOps Podcast #318
Reinforcement Learning
MLOps.community Advanced 1y ago
Why 90% of Machine Learning Is Labeling—and Why That Era Is Over
Reinforcement Learning
Dev In the Details Advanced 1y ago
Reward Models | Data Brew | Episode 40
Reinforcement Learning ⚡ AI Lesson
Databricks Advanced 1y ago
DPO | Direct Preference Optimization (DPO) architecture | LLM Alignment
Reinforcement Learning
AILinkDeepTech Advanced 1y ago
How to Train LLMs to "Think" (o1 & DeepSeek-R1)
Reinforcement Learning
Shaw Talebi Advanced 1y ago
Unlocking Enterprise AI: The DeepSeek Innovation Transforming Data Privacy
Reinforcement Learning ⚡ AI Lesson
Lucidate Advanced 1y ago
RLHF: Reinforcement Learning through Human Feedback, PPO paper.
Reinforcement Learning
Tanisha Choudhary Advanced 1y ago
10 Ways to Communicate Effectively At Work
Reinforcement Learning
Joanna Parsons Advanced 10mo ago
If You're the ONLY Internal Comms Person in Your Company, Watch This
Reinforcement Learning
Joanna Parsons Advanced 12mo ago
New AI Framework: Post-Training
Reinforcement Learning
Discover AI Advanced 1y ago
NO AI Self-Improvement w/ RL
Reinforcement Learning ⚡ AI Lesson
Discover AI Advanced 1y ago
Knowledge Graphs w/ AI Agents form CRYSTAL (MIT)
Reinforcement Learning
Discover AI Advanced 1y ago
AI Agents: NEW Inference Reasoning Q-NET (QLASS)
Reinforcement Learning
Discover AI Advanced 1y ago
📚 Continue on Coursera External links · Free to audit
Generative AI Advance Fine-Tuning for LLMs
📚 External: Coursera ↗
Self-paced
Marketing Design with Easil
📚 External: Coursera ↗
Self-paced
Introduction to Learning
📚 External: Coursera ↗
Self-paced
Designing Larger Python Programs for Data Science
📚 External: Coursera ↗
Self-paced
Everyday Parenting: The ABCs of Child Rearing
📚 External: Coursera ↗
Self-paced
Understand and Apply Artificial Intelligence Fundamentals
📚 External: Coursera ↗
Self-paced