Learning to reason with LLMs
📰 OpenAI News
OpenAI's new model, o1, demonstrates improved reasoning capabilities through large-scale reinforcement learning and chain of thought techniques
Action Steps
- Understand the concept of chain of thought and its application in LLMs
- Explore the use of reinforcement learning in training LLMs
- Evaluate the performance of o1 on various human exams and ML benchmarks
- Investigate the potential applications of o1 in real-world scenarios
Who Needs to Know This
AI researchers and engineers can benefit from this new model, as it provides a more efficient and effective way to train LLMs, while product managers and developers can leverage o1 to build more intelligent and reasoning-enabled applications
Key Insight
💡 The use of large-scale reinforcement learning and chain of thought techniques can significantly improve the reasoning capabilities of LLMs
Share This
🤖 OpenAI's new model, o1, achieves state-of-the-art results in reasoning-heavy tasks! 📈
Key Takeaways
OpenAI's new model, o1, demonstrates improved reasoning capabilities through large-scale reinforcement learning and chain of thought techniques
Full Article
# Learning to reason with LLMs | OpenAI
[Skip to main content](https://openai.com/index/learning-to-reason-with-llms#main)
[](https://openai.com/)
* [Research](https://openai.com/research/index/)
* Products
* [Business](https://openai.com/business/)
* [Developers](https://openai.com/api/)
* [Company](https://openai.com/about/)
* [Foundation(opens in a new window)](https://openaifoundation.org/)
Log in[Try ChatGPT(opens in a new window)](https://chatgpt.com/)
* Research
* Products
* Business
* Developers
* Company
* [Foundation(opens in a new window)](https://openaifoundation.org/)
[Try ChatGPT(opens in a new window)](https://chatgpt.com/)Login
OpenAI
Table of contents
* [Evals](https://openai.com/index/learning-to-reason-with-llms#evals)
* [Chain of Thought](https://openai.com/index/learning-to-reason-with-llms#chain-of-thought)
* [Coding](https://openai.com/index/learning-to-reason-with-llms#coding)
* [Human preference evaluation](https://openai.com/index/learning-to-reason-with-llms#human-preference-evaluation)
* [Safety](https://openai.com/index/learning-to-reason-with-llms#safety)
* [Hiding the Chains of Thought](https://openai.com/index/learning-to-reason-with-llms#hiding-the-chains-of-thought)
* [Conclusion](https://openai.com/index/learning-to-reason-with-llms#conclusion)
* [Appendix A](https://openai.com/index/learning-to-reason-with-llms#appendix-a)
September 12, 2024
[Release](https://openai.com/research/index/release/)
# Learning to reason with LLMs
[Contributions](https://openai.com/openai-o1-contributions/)[Use o1(opens in a new window)](https://chatgpt.com/)
Loading…
Share
OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to [trusted API users(opens in a new window)](https://platform.openai.com/docs/guides/rate-limits/usage-tiers).
Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

o1 performance smoothly improves with both train-time and test-time compute
## Evals
To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.


[](https://openai.com/)
* [Research](https://openai.com/research/index/)
* Products
* [Business](https://openai.com/business/)
* [Developers](https://openai.com/api/)
* [Company](https://openai.com/about/)
* [Foundation(opens in a new window)](https://openaifoundation.org/)
Log in[Try ChatGPT(opens in a new window)](https://chatgpt.com/)
* Research
* Products
* Business
* Developers
* Company
* [Foundation(opens in a new window)](https://openaifoundation.org/)
[Try ChatGPT(opens in a new window)](https://chatgpt.com/)Login
OpenAI
Table of contents
* [Evals](https://openai.com/index/learning-to-reason-with-llms#evals)
* [Chain of Thought](https://openai.com/index/learning-to-reason-with-llms#chain-of-thought)
* [Coding](https://openai.com/index/learning-to-reason-with-llms#coding)
* [Human preference evaluation](https://openai.com/index/learning-to-reason-with-llms#human-preference-evaluation)
* [Safety](https://openai.com/index/learning-to-reason-with-llms#safety)
* [Hiding the Chains of Thought](https://openai.com/index/learning-to-reason-with-llms#hiding-the-chains-of-thought)
* [Conclusion](https://openai.com/index/learning-to-reason-with-llms#conclusion)
* [Appendix A](https://openai.com/index/learning-to-reason-with-llms#appendix-a)
September 12, 2024
[Release](https://openai.com/research/index/release/)
# Learning to reason with LLMs
[Contributions](https://openai.com/openai-o1-contributions/)[Use o1(opens in a new window)](https://chatgpt.com/)
Loading…
Share
OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to [trusted API users(opens in a new window)](https://platform.openai.com/docs/guides/rate-limits/usage-tiers).
Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

o1 performance smoothly improves with both train-time and test-time compute
## Evals
To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.

![Image 5: Competition code (CodeForces)](https://cdn.openai.com/reasoning
DeepCamp AI