LLM Alignment (RLHF, DPO, ORPO) + Hands-on Project
Support BrainOmega
Buy Me a Coffee: https://buymeacoffee.com/brainomega
Stripe: https://buy.stripe.com/aFa00i6XF7jSbfS9T218c00
PayPal: https://paypal.me/farhadrh
Want to master LLM alignment end to end, from RLHF and DPO all the way to cutting-edge ORPO, while actually building and fine-tuning models yourself?
In this masterclass, we combine theory + code in one long-form tutorial:
Fine-tune LLaMA-3 8B with DPO using Hugging Face TRL + PEFT (LoRA + quantization)
Then build a Llama-like model from scratch in PyTorch and implement ORPO (Monolithic Preference Optimization) ste…
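To make the two objectives concrete before diving in: a minimal, dependency-free sketch of the per-pair DPO loss and the ORPO loss. Function names and the scalar per-pair framing are illustrative only (real training, e.g. with Hugging Face TRL, works on batches of token-level log-probabilities); the formulas follow the standard DPO and ORPO definitions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    beta scales how strongly the policy is pushed away from the reference.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(sigmoid(beta * margin))

def orpo_loss(avg_lp_chosen, avg_lp_rejected, lam=0.1):
    """ORPO loss for one pair: SFT term + odds-ratio term, no reference model.

    Inputs are length-averaged log-probabilities under the policy alone.
    """
    def log_odds(avg_lp):
        p = math.exp(avg_lp)            # average per-token probability
        return math.log(p) - math.log(1.0 - p)

    # Odds-ratio penalty: prefer higher odds for the chosen response.
    l_or = -math.log(sigmoid(log_odds(avg_lp_chosen) - log_odds(avg_lp_rejected)))
    l_sft = -avg_lp_chosen              # standard NLL on the chosen response
    return l_sft + lam * l_or
```

Note the key contrast the video builds toward: DPO needs a frozen reference model to anchor the policy, while ORPO drops it and instead adds a small odds-ratio penalty on top of ordinary supervised fine-tuning.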
Watch on YouTube →
Chapters (19)
1. What is Alignment in Large Language Models (LLMs)? (2:42)
2. Alignment: A Path to Trust, But Not Full Explainability (5:13)
3. The Future and Core Philosophy of Alignment (7:51)
4. LLM Alignment Approaches: RLHF Overview (10:30)
5. DPO vs. RLHF (13:14)
6. How DPO Actually Works: The Loss Function (17:40)
7. Fine-Tune LLaMA-3 with Direct Preference Optimization (DPO) (20:57)
8. Running the DPO Code on Google Colab Pro (22:11)
9. Data Processing Pipeline for DPO (25:32)
10. Model Loading: Quantization & LoRA (PEFT) (28:48)
11. Configuring the DPOTrainer (Hugging Face TRL) (31:47)
12. Inference with a DPO-Tuned LLaMA-3 (35:50)
13. DPO Wrap-Up: Conclusion & Next Steps (37:15)
14. ORPO Introduction: From SFT → RLHF → DPO → ORPO (51:37)
15. ORPO Project Structure & Configuration (54:32)
16. Building Llama From Scratch: RMSNorm & RoPE (1:00:31)
17. Building Llama From Scratch: Grouped-Query Attention (GQA) (1:04:21)
18. Building Llama From Scratch: The Full Model (1:10:50)
19. The ORPO Loss Expl
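Chapters 16–18 rebuild Llama's core blocks in PyTorch. As a taste of the first one, here is RMSNorm in plain Python; this dependency-free scalar version is only a sketch of the idea the video implements with tensors:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm as used in Llama: rescale activations by their
    root-mean-square (no mean subtraction, unlike LayerNorm),
    then apply a learned per-dimension gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

After normalization (with unit gains), the output vector has squared norm approximately equal to its dimension, which keeps activation scale stable across layers.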
DeepCamp AI