How to Fine-tune LLMs with RLVR (OpenAI's RFT API)
Work with me: https://aibuilder.academy/yt/k-94oCJ_WJo
Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/k-94oCJ_WJo
This is the 3rd video in a larger series on reinforcement learning (RL) with LLMs. Here, I walk through a concrete example of fine-tuning OpenAI's o4-mini to detect HDFS anomalies using reinforcement learning with verifiable rewards (RLVR).
GitHub Repo: https://github.com/ShawhinT/rlvr-hdfs-classification
Dataset: https://huggingface.co/datasets/shawhin/HDFS_v1_blocks
Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosU_UY8NtZAMaraz74sMHo2W
References
[1] arXiv:2509.16679 [cs.CL]
[2] arXiv:2509.04501 [cs.CL]
[3] arXiv:2501.12948 [cs.CL]
[4] https://platform.openai.com/docs/guides/reinforcement-fine-tuning
Introduction - 0:00
RL with LLMs - 0:15
RLVR - 1:42
SFT vs RLVR - 2:23
Example: HDFS Classification with RLVR - 4:09
Step 0: Imports - 6:37
Step 1: Train-Validation Split - 7:40
Step 2: Format Data - 10:23
Step 3: Create Grader - 12:27
Step 4: Fine-tune Model - 15:38
Step 5: Evaluate Model - 19:07
Limitations - 22:40
What's Next? - 25:00
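The grader in Step 3 is what makes the reward "verifiable": each model answer is checked against a known label and scored automatically. The following is a minimal local sketch of that idea, not the grader used in the video or the RFT API itself. The label strings "normal" and "anomaly" and the function names are assumptions made for illustration.

```python
from typing import Sequence


def verifiable_reward(model_output: str, reference_label: str) -> float:
    """Return 1.0 if the model's answer matches the reference label, else 0.0.

    Matching ignores case and surrounding whitespace. The labels themselves
    ("normal" / "anomaly") are hypothetical and not taken from the repo.
    """
    predicted = model_output.strip().lower()
    expected = reference_label.strip().lower()
    return 1.0 if predicted == expected else 0.0


def mean_reward(outputs: Sequence[str], labels: Sequence[str]) -> float:
    """Average the per-example rewards over a batch of answers."""
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    if not outputs:
        return 0.0
    total = sum(verifiable_reward(o, l) for o, l in zip(outputs, labels))
    return total / len(outputs)


if __name__ == "__main__":
    # Two toy answers scored against their reference labels.
    answers = ["normal", " Anomaly\n"]
    references = ["normal", "normal"]
    print(mean_reward(answers, references))
```

An exact-match reward like this is the simplest case for a binary classification task; the same average doubles as validation accuracy when applied to held-out examples.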