LLM Alignment (RLHF, DPO, ORPO) + Hands-on Project

BrainOmega · Beginner · 🧠 Large Language Models · 4mo ago
💖 Support BrainOmega
☕ Buy Me a Coffee: https://buymeacoffee.com/brainomega
💳 Stripe: https://buy.stripe.com/aFa00i6XF7jSbfS9T218c00
💰 PayPal: https://paypal.me/farhadrh

🎥 Want to master LLM alignment end to end, from RLHF and DPO all the way to cutting-edge ORPO, while actually building and fine-tuning models yourself? In this masterclass, we combine theory + code in one long-form tutorial: fine-tune LLaMA-3 8B with DPO using Hugging Face TRL + PEFT (LoRA + quantization), then build a Llama-like model from scratch in PyTorch and implement ORPO (Monolithic Preference Optimization) ste…
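As a preview of the hands-on DPO portion (chapters 7–11), here is a minimal sketch of that fine-tuning setup, assuming a Llama-3 8B checkpoint and a public preference dataset; the dataset choice and hyperparameters are illustrative, and exact DPOTrainer/DPOConfig argument names vary across TRL releases.

```python
# A minimal sketch of the DPO fine-tuning setup the video walks through.
# Checkpoint, dataset, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint

# 4-bit NF4 quantization so the 8B model fits on a single Colab GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA adapters: only these low-rank matrices are trained
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Any preference dataset with prompt/chosen/rejected examples works here
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="llama3-dpo",
    beta=0.1,                      # strength of the preference margin
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    max_steps=200,
)
trainer = DPOTrainer(
    model=model,
    ref_model=None,                # with PEFT, TRL derives the frozen reference
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,    # named `tokenizer` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```

Passing ref_model=None together with a PEFT config lets recent TRL versions reuse the base model (adapters disabled) as the frozen reference, so a second 8B copy never has to fit in memory.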

Chapters (19)

0:00 1. What is Alignment in Large Language Models (LLMs)?
2:42 2. Alignment: A Path to Trust, But Not Full Explainability
5:13 3. The Future and Core Philosophy of Alignment
7:51 4. LLM Alignment Approaches โ€“ RLHF Overview
10:30 5. DPO vs. RLHF
13:14 6. How DPO Actually Works: The Loss Function (see sketch below)
17:40 7. Fine-Tune LLaMA-3 with Direct Preference Optimization (DPO)
20:57 8. Running the DPO Code on Google Colab Pro
22:11 9. Data Processing Pipeline for DPO
25:32 10. Model Loading: Quantization & LoRA (PEFT)
28:48 11. Configuring the DPOTrainer (Hugging Face TRL)
31:47 12. Inference with a DPO-Tuned LLaMA-3
35:50 13. DPO Wrap-Up: Conclusion & Next Steps
37:15 14. ORPO Introduction: From SFT โ†’ RLHF โ†’ DPO โ†’ ORPO
51:37 15. ORPO Project Structure & Configuration
54:32 16. Building Llama From Scratch: RMSNorm & RoPE (see sketch below)
1:00:31 17. Building Llama From Scratch: Grouped-Query Attention (GQA) (see sketch below)
1:04:21 18. Building Llama From Scratch: The Full Model
1:10:50 19. The ORPO Loss Explained (see sketch below)
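For reference, a minimal sketch of the DPO objective covered in chapter 6 (Rafailov et al., 2023). The inputs are assumed to be the summed token log-probabilities of each response under the trained policy and the frozen reference model; beta=0.1 is an illustrative default, not necessarily the video's setting.

```python
# Minimal DPO loss sketch. Inputs: per-example summed log-probs of the
# chosen/rejected responses under the policy and the frozen reference.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how far the policy moved away from the reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry objective: maximize the chosen-vs-rejected margin
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```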
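Chapter 16's building blocks, sketched in PyTorch following the standard Llama design; dimensions are illustrative and the video's own implementation may differ in detail.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescale by the root mean square; no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

def rope_frequencies(head_dim: int, seq_len: int, base: float = 10000.0):
    """Complex rotations exp(i * m * theta_k), one per (position, dim-pair)."""
    theta = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), theta)  # (seq, dim/2)
    return torch.polar(torch.ones_like(angles), angles)

def apply_rope(x, freqs):
    """Rotate queries/keys; x has shape (batch, seq, heads, head_dim)."""
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_c = x_c * freqs[None, :, None, :]          # broadcast over batch, heads
    return torch.view_as_real(x_c).flatten(-2).type_as(x)
```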
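Chapter 17's grouped-query attention in the same sketch style: the K/V projections use fewer heads than Q, and each K/V head is shared by a group of query heads. The head counts below match Llama-3 8B (32 query heads, 8 KV heads) but are assumptions for illustration; RoPE application is omitted to keep it short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, dim=4096, n_heads=32, n_kv_heads=8):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each K/V head so every query group attends to its shared head
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))
```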
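Finally, a sketch of the ORPO objective from chapter 19 (Hong et al., 2024): plain supervised NLL on the chosen answer plus a weighted odds-ratio penalty, with no reference model at all. The log-probabilities here are assumed to be length-normalized per-token averages, and lambda=0.1 is illustrative.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, chosen_nll, lam=0.1):
    # log odds(y|x) = log p - log(1 - p), computed stably from log p
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Penalize the model when the rejected answer's odds approach the chosen's
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    # Monolithic objective: one backward pass covers SFT and preference terms
    return chosen_nll + lam * ratio_loss
```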
Next Up: 5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems (Dave Ebbelaar, LLM Eng)