Foundations

Reinforcement Learning

RL algorithms, reward modelling, RLHF, policy gradients, Q-learning and multi-agent RL

0 lessons
Skills in this topic
RL Foundations (beginner): Formalise a problem as an MDP
Policy Gradient Methods (intermediate): Implement REINFORCE from scratch
RLHF & Alignment (advanced): Describe the RLHF pipeline end-to-end

No lessons yet for Reinforcement Learning

Content is imported every 6 hours. Check back soon!