LLM Alignment (RLHF, DPO, ORPO) + Hands-on Project
Support BrainOmega
Buy Me a Coffee: https://buymeacoffee.com/brainomega
Stripe: https://buy.stripe.com/aFa00i6XF7jSbfS9T218c00
PayPal: https://paypal.me/farhadrh
Want to master LLM alignment end to end, from RLHF and DPO all the way to cutting-edge ORPO, while actually building and fine-tuning models yourself?
In this masterclass, we combine theory + code in one long-form tutorial:
Fine-tune LLaMA-3 8B with DPO using Hugging Face TRL + PEFT (LoRA + quantization)
Then build a Llama-like model from scratch in PyTorch and implement ORPO (Monolithic Preference Optimization) ste…
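To make the two objectives concrete before diving in: a minimal, dependency-free sketch of the per-pair DPO loss and the ORPO loss. Function names and the scalar per-pair framing are illustrative only (real training, e.g. with Hugging Face TRL, works on batches of token-level log-probabilities); the formulas follow the standard DPO and ORPO definitions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    beta scales how strongly the policy is pushed away from the reference.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(sigmoid(beta * margin))

def orpo_loss(avg_lp_chosen, avg_lp_rejected, lam=0.1):
    """ORPO loss for one pair: SFT term + odds-ratio term, no reference model.

    Inputs are length-averaged log-probabilities under the policy alone.
    """
    def log_odds(avg_lp):
        p = math.exp(avg_lp)            # average per-token probability
        return math.log(p) - math.log(1.0 - p)

    # Odds-ratio penalty: prefer higher odds for the chosen response.
    l_or = -math.log(sigmoid(log_odds(avg_lp_chosen) - log_odds(avg_lp_rejected)))
    l_sft = -avg_lp_chosen              # standard NLL on the chosen response
    return l_sft + lam * l_or
```

Note the key contrast the video builds toward: DPO needs a frozen reference model to anchor the policy, while ORPO drops it and instead adds a small odds-ratio penalty on top of ordinary supervised fine-tuning.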
Watch on YouTube →
Chapters (19)
1. What is Alignment in Large Language Models (LLMs)? (2:42)
2. Alignment: A Path to Trust, But Not Full Explainability (5:13)
3. The Future and Core Philosophy of Alignment (7:51)
4. LLM Alignment Approaches: RLHF Overview (10:30)
5. DPO vs. RLHF (13:14)
6. How DPO Actually Works: The Loss Function (17:40)
7. Fine-Tune LLaMA-3 with Direct Preference Optimization (DPO) (20:57)
8. Running the DPO Code on Google Colab Pro (22:11)
9. Data Processing Pipeline for DPO (25:32)
10. Model Loading: Quantization & LoRA (PEFT) (28:48)
11. Configuring the DPOTrainer (Hugging Face TRL) (31:47)
12. Inference with a DPO-Tuned LLaMA-3 (35:50)
13. DPO Wrap-Up: Conclusion & Next Steps (37:15)
14. ORPO Introduction: From SFT → RLHF → DPO → ORPO (51:37)
15. ORPO Project Structure & Configuration (54:32)
16. Building Llama From Scratch: RMSNorm & RoPE (1:00:31)
17. Building Llama From Scratch: Grouped-Query Attention (GQA) (1:04:21)
18. Building Llama From Scratch: The Full Model (1:10:50)
19. The ORPO Loss Expl
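Chapters 16–18 rebuild Llama's core blocks in PyTorch. As a taste of the first one, here is RMSNorm in plain Python; this dependency-free scalar version is only a sketch of the idea the video implements with tensors:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm as used in Llama: rescale activations by their
    root-mean-square (no mean subtraction, unlike LayerNorm),
    then apply a learned per-dimension gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

After normalization (with unit gains), the output vector has squared norm approximately equal to its dimension, which keeps activation scale stable across layers.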
DeepCamp AI