Inference Characteristics of Streaming Speech Recognition
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io
Deploying streaming speech recognition (ASR) is quite different from serving an LLM. In this video, I break down the unique challenges, architecture, and surprising behaviors of Kyutai’s Moshi streaming ASR model and its Rust inference server, covering GPU/CPU memory patterns and the economics of streaming vs. batch ASR.
Kyutai STT offline in browser: https://github.com/lucky-bai/wasm-speech-streaming
0:00 - Introduction
1:50 - Kyutai Moshi architecture
4:03 - Short quiz
4:45 - CPU and GPU memory experiments
6:08 - Streaming inference server
8:56 - Cost calculation
10:10 - Differences vs LLM inference
DeepCamp AI