Computer-Using Agent
📰 OpenAI News
OpenAI introduces Computer-Using Agent, a model that lets AI interact with digital interfaces the way humans do.
Action Steps
- Understand the concept of Computer-Using Agent and its capabilities
- Explore the research preview of Operator, which is powered by CUA
- Review the safety measures and limitations of CUA
- Consider potential applications of CUA in various industries and domains
Who Needs to Know This
AI researchers and engineers can study Computer-Using Agent as a basis for agents that operate graphical interfaces directly, while product managers and designers can explore new applications for this technology.
Key Insight
💡 Computer-Using Agent combines GUI perception with structured problem-solving, enabling it to perform digital tasks without using OS- or web-specific APIs.
Full Article
# Computer-Using Agent | OpenAI
January 23, 2025
Powering Operator with Computer-Using Agent, a universal interface for AI to interact with the digital world.
Today we introduced a research preview of [Operator](https://operator.chatgpt.com/), an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT‑4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs), meaning the buttons, menus, and text fields people see on a screen, just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.
CUA builds on years of foundational research at the intersection of multimodal understanding and reasoning. By combining advanced GUI perception with structured problem-solving, it can break tasks into multi-step plans and adaptively self-correct when challenges arise. This capability marks the next step in AI development, allowing models to use the same tools humans rely on daily and opening the door to a vast range of new applications.
While CUA is still early and has limitations, it sets new state-of-the-art benchmark results, achieving a 38.1% success rate on OSWorld for full computer-use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks. These results highlight CUA’s ability to navigate and operate across diverse environments using a single general action space.
We’ve developed CUA with safety as a top priority to address the challenges posed by an agent having access to the digital world, as detailed in our [Operator System Card](https://openai.com/index/operator-system-card/). In line with our iterative deployment strategy, we are releasing CUA through a research preview of Operator at [operator.chatgpt.com](http://operator.chatgpt.com/), initially for [Pro](https://openai.com/chatgpt/pricing/) tier users in the U.S. By gathering real-world feedback, we can refine safety measures and continuously improve as we prepare for a future with increasing use of digital agents.
## How it works

CUA processes raw pixel data to understand what’s happening on the screen and uses a virtual mouse and keyboard to complete actions. It can navigate multi-step tasks, handle errors, and adapt to unexpected changes. This enables CUA to act in a wide range of digital environments, performing tasks like filling out forms and navigating websites without needing specialized
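
The perceive-then-act cycle described in this section can be illustrated with a minimal Python sketch. It is not OpenAI's implementation or API: `ToyScreen`, `toy_policy`, `run_agent`, the `Action` type, and the screen coordinates are all hypothetical names invented for illustration. A dictionary stands in for the raw pixels, and a hand-written rule stands in for the model. The sketch only shows the loop shape: observe the screen, choose a mouse or keyboard action, apply it, and repeat until the task is done.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical; this is not OpenAI's API.
FIELD_POS = (100, 50)    # assumed location of a text field
BUTTON_POS = (100, 120)  # assumed location of a submit button


@dataclass
class Action:
    kind: str            # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class ToyScreen:
    """Stand-in for a GUI with one text field and one submit button."""

    def __init__(self) -> None:
        self.field_text = ""
        self.focused = False
        self.submitted = False

    def screenshot(self) -> Dict[str, object]:
        # A real agent would receive raw pixels; a dict stands in for them.
        return {
            "field_text": self.field_text,
            "focused": self.focused,
            "submitted": self.submitted,
        }

    def apply(self, action: Action) -> None:
        # Virtual mouse: clicks focus the field or press the button.
        if action.kind == "click":
            pos = (action.x, action.y)
            if pos == FIELD_POS:
                self.focused = True
            elif pos == BUTTON_POS and self.field_text:
                self.submitted = True
        # Virtual keyboard: typing only lands if the field has focus.
        elif action.kind == "type" and self.focused:
            self.field_text += action.text


def toy_policy(obs: Dict[str, object], goal_text: str) -> Action:
    """Hand-written rule standing in for the model's decision step."""
    if obs["submitted"]:
        return Action("done")
    if obs["field_text"] == goal_text:
        return Action("click", *BUTTON_POS)
    if not obs["focused"]:
        return Action("click", *FIELD_POS)
    return Action("type", text=goal_text)


def run_agent(
    screen: ToyScreen,
    policy: Callable[[Dict[str, object], str], Action],
    goal_text: str,
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act; stop on "done" or after max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = screen.screenshot()
        action = policy(obs, goal_text)
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)
    return history


if __name__ == "__main__":
    screen = ToyScreen()
    for step in run_agent(screen, toy_policy, "hello"):
        print(step)
```

Because the policy re-reads the screen before every action, it reacts to what actually happened rather than to what it expected, which is the property that lets an agent of this shape recover from a failed step. The `max_steps` cap is an assumed safeguard against looping forever.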