Create Image-to-Speech Apps with Azure AI
Want to build applications that can see and speak? In this video, you’ll explore how Azure AI Vision and Azure AI Speech work together to create multimodal experiences—like describing images aloud for accessibility and automation.
Follow a step-by-step walkthrough inside Azure AI Foundry to create resources, test Vision Studio’s dense captioning, and convert image descriptions into speech using the Speech playground. By the end, you’ll understand how to connect these services to build smarter, more accessible applications.
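The Vision-to-Speech flow described above can be sketched in code. This is a minimal, hedged example assuming the current Python SDKs (`azure-ai-vision-imageanalysis` for dense captioning and `azure-cognitiveservices-speech` for text-to-speech); the endpoint, keys, region, image URL, and voice name are placeholders you would replace with values from your own Azure AI Foundry resource, as shown in the "Finding Keys and Endpoints" chapter.

```python
"""Sketch: caption an image with Azure AI Vision, then read it aloud
with Azure AI Speech. All endpoints/keys below are placeholders.

Assumed packages (install first):
    pip install azure-ai-vision-imageanalysis azure-cognitiveservices-speech
"""

def captions_to_script(captions):
    """Join dense-caption texts into one sentence suitable for narration."""
    return "I can see: " + "; ".join(captions) + "."

def describe_image(endpoint, key, image_url):
    """Return the dense-caption texts Azure AI Vision finds in the image."""
    # Imports are local so the pure helper above works without the SDKs installed.
    from azure.ai.vision.imageanalysis import ImageAnalysisClient
    from azure.ai.vision.imageanalysis.models import VisualFeatures
    from azure.core.credentials import AzureKeyCredential

    client = ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    result = client.analyze_from_url(
        image_url, visual_features=[VisualFeatures.DENSE_CAPTIONS]
    )
    return [caption.text for caption in result.dense_captions.list]

def speak(text, key, region):
    """Synthesize text with a neural voice and play it on the default speaker."""
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_synthesis_voice_name = "en-US-JennyNeural"  # assumed voice
    speechsdk.SpeechSynthesizer(speech_config=config).speak_text_async(text).get()

if __name__ == "__main__":
    # Placeholder values -- substitute your resource's endpoint, keys, and region.
    captions = describe_image(
        "https://<your-resource>.cognitiveservices.azure.com/",
        "<vision-key>",
        "https://example.com/photo.jpg",
    )
    speak(captions_to_script(captions), "<speech-key>", "<region>")
```

In a production app you would typically return the synthesized audio as a stream rather than playing it locally, and load the keys from environment variables or a managed identity rather than hard-coding them.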
Chapters (14)
0:00 – Why Multimodal AI Matters
0:29 – What Is Azure AI Vision?
0:40 – Azure AI Service vs Resource vs Studio
1:01 – Creating an Azure AI Foundry Resource
1:21 – Managing Resources in the Azure Portal
1:57 – Setting Up a New Project
2:05 – Navigating Vision Studio
2:44 – Finding Keys and Endpoints
2:49 – Using Dense Captioning in Image Studio
3:19 – Testing AI Speech (Text-to-Speech)
4:00 – Connecting Vision and Speech
4:44 – Portal vs Studio: When to Use Each
5:34 – Monitoring, Security, and Production Use
5:56 – Building Accessible AI Applications

DeepCamp AI