gpt-oss-safeguard technical report

📰 OpenAI News

OpenAI releases gpt-oss-safeguard technical report, detailing performance and safety evaluations of two open-weight reasoning models

advanced Published 29 Oct 2025

Action Steps

Read the technical report to understand the performance and safety evaluations of gpt-oss-safeguard-120b and gpt-oss-safeguard-20b
Use the models to classify content against a provided policy, rather than as the core functionality for end-user interaction
Evaluate the safety metrics provided in the report to understand how gpt-oss-safeguard models function in chat settings

Who Needs to Know This

AI engineers, data scientists, and researchers can benefit from this report to understand the capabilities and limitations of gpt-oss-safeguard models, and to inform their development of safe and effective AI systems

Key Insight

💡 The gpt-oss-safeguard models are fine-tunes of their gpt-oss counterparts and are designed to reason from a provided policy, making them suitable for content classification tasks