Qwen 3.5 | Building a Visual AI Agent to Control Your Computer

Roboflow · Beginner · 🤖 AI Agents & Automation
In this video, Matvei Popov, Machine Learning Engineer at Roboflow, explores the capabilities of the Qwen 3.5 vision language model (VLM). Unlike previous architectures that separated language and vision encoders, Qwen 3.5 natively combines vision, language, and coding capabilities within a single model, unlocking agentic capabilities such as tool calling and direct computer use.

Matvei starts by showing how to run Qwen 3.5 locally using Roboflow Inference and how to set it up within Roboflow Workflows for basic image description tasks. He demonstrates how to use the 0.8B and 2B models, adjust parameters such as the system prompt, and configure token generation limits.

In the second half of the video, Matvei builds a visual AI agent capable of controlling his computer. Using a custom Python script and Roboflow Inference, he tasks Qwen 3.5 with navigating the Roboflow UI to kick off a new model training job. The model analyzes screenshots of the UI and outputs normalized screen coordinates for specific buttons, and the clicks are then executed autonomously. The demo shows how this technology can be used to automate complex UI manipulation or guide physical systems.

= Additional Resources =
Roboflow Inference: https://inference.roboflow.com/
Roboflow Workflows: https://roboflow.com/workflows

= Chapters =
00:00 Introduction to Qwen 3.5: Why Native Vision-Language Models Matter
02:03 Setting Up Qwen 3.5 in Roboflow Workflows
03:51 Running Qwen 3.5 Locally with Roboflow Inference
05:47 Building a Visual Agent for "Computer Use" Automation
07:29 Live Demo: Qwen 3.5 Clicks Buttons and Starts a Training Job
08:38 Exploring Tool Calls and Future Use Cases
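
The description mentions that the model outputs normalized screen coordinates which are then turned into clicks. The video's actual script is not reproduced here, so the following is only a minimal sketch of that one step, under stated assumptions: the function name `normalized_to_pixels` and its `scale` parameter are hypothetical, and the normalized range (0 to 1, or 0 to 1000) is an assumption that should be checked against what the model actually returns.

```python
from typing import Tuple


def normalized_to_pixels(
    nx: float,
    ny: float,
    screen_width: int,
    screen_height: int,
    scale: float = 1.0,
) -> Tuple[int, int]:
    """Map a normalized (nx, ny) point onto a screen of the given pixel size.

    `scale` is the upper bound of the normalized range: 1.0 for values in
    [0, 1], 1000.0 for values in [0, 1000]. Which one applies depends on the
    model and prompt, so it is an assumption to verify, not a known fact.
    """
    if screen_width <= 0 or screen_height <= 0:
        raise ValueError("screen dimensions must be positive")
    if scale <= 0:
        raise ValueError("scale must be positive")

    # Clamp to the valid range so a slightly out-of-range prediction
    # cannot produce an off-screen click.
    fx = min(max(nx / scale, 0.0), 1.0)
    fy = min(max(ny / scale, 0.0), 1.0)

    # Pixel indices run from 0 to size - 1.
    x = min(int(fx * screen_width), screen_width - 1)
    y = min(int(fy * screen_height), screen_height - 1)
    return x, y


if __name__ == "__main__":
    # Hypothetical model output for a button at the horizontal center,
    # a quarter of the way down a 1920x1080 screen.
    print(normalized_to_pixels(0.5, 0.25, 1920, 1080))
```

The resulting pixel pair could then be handed to a desktop automation library to perform the click. Clamping is a deliberate choice in this sketch: a prediction slightly outside the expected range still lands on screen instead of raising an error mid-task.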

