Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

📰 arXiv cs.AI

Masked IRL uses LLMs to disambiguate reward functions from demonstrations and language, improving robot adaptation to user preferences

Published 1 Apr 2026
Action Steps
  1. Collect demonstrations of a task
  2. Use LLMs to identify relevant state details
  3. Disambiguate reward functions using language and demonstrations
  4. Fine-tune the reward model for improved generalization
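The steps above can be sketched as a minimal masked reward update. This is an illustrative assumption, not the paper's exact formulation: the reward is linear in state features, the LLM supplies a 0/1 relevance mask over those features, and a MaxEnt-IRL-style gradient step (matching expert feature expectations) is applied only to the masked features.

```python
# Hypothetical sketch of masked reward learning (steps 2-3 above).
# The feature names, mask source, and update rule are illustrative
# assumptions, not the paper's actual algorithm.

def masked_irl_update(w, mask, demo_feats, policy_feats, lr=0.1):
    """One MaxEnt-IRL-style gradient step, restricted to masked features.

    w            : current reward weights, one per state feature
    mask         : 1.0 if the LLM marked the feature task-relevant, else 0.0
    demo_feats   : average feature vector over the demonstrations
    policy_feats : average feature vector under the current policy
    """
    return [
        wi + lr * m * (d - p)  # irrelevant features (m == 0) stay untouched
        for wi, m, d, p in zip(w, mask, demo_feats, policy_feats)
    ]

# Toy example: two features; a (hypothetical) LLM says only
# "distance to cup" matters, not "table color".
w = [0.0, 0.0]
mask = [1.0, 0.0]            # from the LLM: feature 0 relevant, feature 1 not
demo_feats = [0.9, 0.5]      # expert demonstrations
policy_feats = [0.2, 0.5]    # current robot policy
w = masked_irl_update(w, mask, demo_feats, policy_feats)
print(w)  # only the relevant weight moves; the masked-out weight stays 0
```

Masking before the gradient step is what resolves the ambiguity: demonstrations alone cannot distinguish features that merely co-occur with the task from features the user actually cares about, so the language-derived mask zeroes out the former.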
Who Needs to Know This

ML researchers and roboticists can use this approach to learn more accurate reward models and enable more effective human-robot collaboration

Key Insight

💡 LLMs can help disambiguate reward functions by identifying relevant state details and specifying what matters for a task
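One way to realize this insight is a structured query to the LLM that returns a relevance mask over state features. The feature list, instruction text, and JSON reply format below are assumptions for illustration; the paper's actual prompting scheme may differ, and the LLM reply is stubbed rather than fetched from a real API.

```python
# Hypothetical prompt for identifying task-relevant state details.
import json

features = ["distance_to_cup", "gripper_height", "table_color"]
instruction = "Carry the cup without spilling."

prompt = (
    f"Task: {instruction}\n"
    f"State features: {', '.join(features)}\n"
    "Reply with a JSON list of 0/1 flags, one per feature, "
    "where 1 means the feature is relevant to the task."
)

# Stubbed LLM reply (a real system would call an LLM API here):
llm_reply = "[1, 1, 0]"
mask = json.loads(llm_reply)
print(dict(zip(features, mask)))  # -> {'distance_to_cup': 1, 'gripper_height': 1, 'table_color': 0}
```

Requesting a machine-readable reply (here JSON) keeps the LLM's judgment easy to plug into the downstream reward-learning step.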

Share This
🤖 Masked IRL uses LLMs to improve robot learning from demos & language!