NOT USING Granite 4.1 ASR - The Fastest ASR?

Sam Witteveen · Advanced · 📐 ML Fundamentals · 2h ago
In this video, I dive into IBM's newly released Granite Speech 4.1 models and look at what makes them interesting, particularly the three 2B variants and how each one makes a different trade-off between accuracy, richness of output, and throughput. We look at the base Granite Speech 4.1 2B, which reaches 5.33% WER on the Open ASR Leaderboard; the Plus variant, which adds speaker-attributed ASR and word-level timestamps; and the NAR (non-autoregressive) version, which changes the architecture to generate the whole sequence at once rather than token by token, for much better GPU throughput. I also walk through multilingual support across English, French, German, Spanish, Portuguese, and Japanese, plus the bidirectional translation capabilities that make this useful for enterprise edge deployments. All three models are Apache 2.0 licensed and available on Hugging Face.

🔗 Links:
Granite Speech 4.1 2B → https://huggingface.co/ibm-granite/granite-speech-4.1-2b
Granite Speech 4.1 2B Plus → https://huggingface.co/ibm-granite/granite-speech-4.1-2b-plus
Granite Speech 4.1 2B NAR → https://huggingface.co/ibm-granite/granite-speech-4.1-2b-nar
IBM Research Blog → https://research.ibm.com/blog/granite-4-1-ai-foundation-models
Twitter: https://x.com/Sam_Witteveen

🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes

👨‍💻 GitHub: https://github.com/samwit/llm-tutorials

⏱️ Time Stamps:
00:00 Intro
00:20 IBM Granite Collection
00:27 Granite Docling
00:46 Granite Speech 4.1
01:16 Granite 4.1 Blog
01:38 Granite Speech 4.1 2B
04:02 Granite Speech 4.1 2B Plus
06:15 Granite Speech 4.1 2B NAR
07:30 NLE: Non-autoregressive LLM-based ASR by Transcript Editing Paper
07:45 Architecture
09:45 Code Time
12:00 Granite Speech Model Github

#DellProPrecision #DellProMax
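
For context on the 5.33% figure quoted above: WER (word error rate) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code. The function name `word_error_rate` is hypothetical, and the only normalization assumed here is lowercasing and whitespace splitting, whereas real benchmarks typically apply heavier text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. This is an illustrative sketch, not a benchmark scorer.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 5.33% corresponds to roughly one word-level error (substitution, deletion, or insertion) for every 19 reference words.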
