Reinforcement Learning from Human Feedback

Free to audit on Coursera

Coursera · Beginner · 🧠 Large Language Models · 3mo ago

Skills: RL Foundations (80%)

Key Takeaways

Trains large language models with reinforcement learning from human feedback

Original Description

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences. Reinforcement Learning from Human Feedback (RLHF) is currently the main method for aligning LLMs with human values and preferences. RLHF is also used for further tuning a base LLM to align with values and preferences that are specific to your use case. In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:

1. Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets.
2. Use the open source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
3. Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
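To make the two datasets concrete, here is a minimal sketch of what a "preference" record and a "prompt" record might look like, together with a pairwise loss that is commonly used to train a reward model from preference pairs. The field names (`input_text`, `candidate_0`, `candidate_1`, `choice`), the example text, and the helper functions are illustrative assumptions, not taken from the course description, and the course's pipeline may store or score its data differently.

```python
import math

# Hypothetical preference record: one prompt, two candidate completions,
# and a human label giving the index of the preferred candidate.
preference_example = {
    "input_text": "Summarize: The cat sat on the mat all afternoon.",
    "candidate_0": "A cat sat on a mat.",
    "candidate_1": "Cats are mammals.",
    "choice": 0,
}

# Hypothetical prompt record: a prompt only, with no completion attached.
prompt_example = {"input_text": "Summarize: The dog slept by the door."}


def split_preference(record):
    """Return (chosen, rejected) completions from a preference record."""
    candidates = [record["candidate_0"], record["candidate_1"]]
    chosen = candidates[record["choice"]]
    rejected = candidates[1 - record["choice"]]
    return chosen, rejected


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


chosen, rejected = split_preference(preference_example)
# The loss is small when the reward model scores the chosen completion higher.
loss_agrees = pairwise_preference_loss(2.0, 0.0)
loss_disagrees = pairwise_preference_loss(0.0, 2.0)
```

In this sketch the preference data supplies the training signal for a reward model, while the prompt data only provides inputs for the LLM to respond to during the reinforcement learning step.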

More on: RL Foundations

Build a Doom AI Model with Python | Gaming Reinforcement Learning Full Course

Nicholas Renotte

Deep Reinforcement Learning for Atari Games Python Tutorial | AI Plays Space Invaders

Nicholas Renotte

Training & Testing Deep reinforcement learning (DQN) Agent - Reinforcement Learning p.6

Build a Game Bot (LIVE)

How to Win Slot Machines - Intro to Deep Learning #13

Build an Mario AI Model with Python | Gaming Reinforcement Learning

Nicholas Renotte

Related Reads

Building an open-source offline voice assistant with Ollama—looking for contributors and brutally honest feedback

Learn how to build an open-source offline voice assistant using Ollama and contribute to the AURA project for a private and extensible AI experience

Optimizing LLM Inference for Human-Computer Interaction

Optimize LLM inference for human-computer interaction to achieve low latency and high responsiveness, crucial for user experience

Open-Weight LLM API Integration: A Developer's Guide to Flexible AI Integration

Learn to integrate open-weight LLM APIs for flexible AI integration, enabling fine-grained control and vendor-agnostic solutions

Meet GPT-Red: an LLM super-hacker OpenAI built to make its models safer

OpenAI's GPT-Red LLM super-hacker strengthens model defenses against cyberattacks, making GPT-5.6 the most robust release yet

MIT Technology Review

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)