Computer-Using Agent

📰 OpenAI News

OpenAI introduces Computer-Using Agent, a model that enables AI to interact with digital interfaces like humans

advanced Published 23 Jan 2025

Action Steps

Understand the concept of Computer-Using Agent and its capabilities
Explore the research preview of Operator, which is powered by CUA
Review the safety measures and limitations of CUA
Consider potential applications of CUA in various industries and domains

Who Needs to Know This

AI researchers and engineers can leverage Computer-Using Agent to develop more sophisticated AI models, while product managers and designers can explore new applications for this technology.

Key Insight

💡 Computer-Using Agent combines GUI perception with structured problem-solving, enabling it to perform digital tasks without using OS- or web-specific APIs

Full Article

# Computer-Using Agent

January 23, 2025

Powering Operator with Computer-Using Agent, a universal interface for AI to interact with the digital world.

Today we introduced a research preview of [Operator](https://operator.chatgpt.com/), an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs), the buttons, menus, and text fields people see on a screen, just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.

CUA builds off of years of foundational research at the intersection of multimodal understanding and reasoning. By combining advanced GUI perception with structured problem-solving, it can break tasks into multi-step plans and adaptively self-correct when challenges arise. This capability marks the next step in AI development, allowing models to use the same tools humans rely on daily and opening the door to a vast range of new applications.

While CUA is still early and has limitations, it sets new state-of-the-art benchmark results, achieving a 38.1% success rate on OSWorld for full computer use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks. These results highlight CUA’s ability to navigate and operate across diverse environments using a single general action space.

We’ve developed CUA with safety as a top priority to address the challenges posed by an agent having access to the digital world, as detailed in our [Operator System Card](https://openai.com/index/operator-system-card/). In line with our iterative deployment strategy, we are releasing CUA through a research preview of Operator at [operator.chatgpt.com](http://operator.chatgpt.com/) for [Pro](https://openai.com/chatgpt/pricing/) Tier users in the U.S. to start. By gathering real-world feedback, we can refine safety measures and continuously improve as we prepare for a future with increasing use of digital agents.

## How it works

![Image 1: A flowchart showing the process of a CUA system interpreting input as text or screenshots, generating actions, and applying commands to a virtual machine.](https://images.ctfassets.net/kftzwdyauwt9/1CgBJ2rSQnriAldNvIdInw/32befbf64486d1cabe6377b42a06d4a9/Infographic_Transparent__Web_.png?w=3840&q=90&fm=webp)

CUA processes raw pixel data to understand what’s happening on the screen and uses a virtual mouse and keyboard to complete actions. It can navigate multi-step tasks, handle errors, and adapt to unexpected changes. This enables CUA to act in a wide range of digital environments, performing tasks like filling out forms and navigating websites without needing specialized
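The loop described in this section (observe the screen, choose an action, apply it with a virtual mouse and keyboard, repeat) can be illustrated with a toy sketch. This is not OpenAI's implementation: `FakeComputer`, `scripted_policy`, `run_agent`, and the `Action` type are all hypothetical names invented for this example, the "screenshot" is a small dictionary instead of raw pixels, and a hand-written rule set stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical screen coordinates for the two widgets in the toy UI.
FIELD_POS = (100, 50)
BUTTON_POS = (100, 120)


@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeComputer:
    """Stand-in for a virtual machine: one text field and one submit button."""

    def __init__(self) -> None:
        self.field_text = ""
        self.focused = False
        self.submitted = False

    def screenshot(self) -> Dict[str, object]:
        # A real agent would receive raw pixels; a dict summarizes the screen here.
        return {
            "field_text": self.field_text,
            "focused": self.focused,
            "submitted": self.submitted,
        }

    def apply(self, action: Action) -> None:
        # Translate a generic mouse/keyboard action into a UI state change.
        if action.kind == "click":
            if (action.x, action.y) == FIELD_POS:
                self.focused = True
            elif (action.x, action.y) == BUTTON_POS and self.field_text:
                self.submitted = True
        elif action.kind == "type" and self.focused:
            self.field_text += action.text


def scripted_policy(goal_text: str, screen: Dict[str, object]) -> Action:
    """Hand-written rules standing in for the model's next-action choice."""
    if screen["submitted"]:
        return Action("done")
    if not screen["focused"]:
        return Action("click", *FIELD_POS)
    if screen["field_text"] != goal_text:
        return Action("type", text=goal_text)
    return Action("click", *BUTTON_POS)


def run_agent(
    computer: FakeComputer,
    policy: Callable[[str, Dict[str, object]], Action],
    goal_text: str,
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act, and repeat until the policy reports completion."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = computer.screenshot()      # perceive current state
        action = policy(goal_text, screen)  # choose the next action
        history.append(action)
        if action.kind == "done":
            break
        computer.apply(action)              # act through mouse/keyboard
    return history


if __name__ == "__main__":
    machine = FakeComputer()
    for step in run_agent(machine, scripted_policy, "hello"):
        print(step)
```

Because the policy re-reads the screen on every step rather than following a fixed script, the same loop structure leaves room for recovering from unexpected states, which is the property the paragraph above attributes to CUA.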
