Your AI Is ~8× More Expensive in Some Languages — Here's Why

Shane | LLM Implementation · Intermediate · 🧠 Large Language Models · 8mo ago
Non-English prompts can explode token counts. In our demo with the cl100k_base tokenizer, the same sentence tokenizes to 15 tokens in English, 21 in Spanish (1.4x), and 115 in Telugu (7.7x), which maps directly to higher API cost. Counts vary by model and tokenizer; this video explains why training data and tokenization create a hidden "Token Tax" and how to plan for it.

This presentation is inspired by the core concepts in the book "AI Engineering" by Chip Huyen. If you want a deeper dive into these topics, I highly recommend checking it out.

Timestamps
00:00 - The ~8x AI Cost Nobody Warns You About
00:50 - Problem: Why AI is an English-First World
01:34 - The AI's Library: Common Crawl
02:32 - The Under-representation Crisis (The Official Numbers)
03:56 - DEMO: Proving the 7.7x "Token Tax"
05:03 - How This Impacts Your Projects
05:58 - What This Means for the Future of AI
06:40 - Your Mission & Next Steps

Connect & Subscribe:
🎓 Join our FREE AI Engineering Community on Discord: https://discord.gg/rQMxdJJC
🔔 Subscribe for our next series: https://www.youtube.com/@UCf12NnZycD7LB8prrgdTOyg

#artificialintelligence #ai #aiengineering
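The "Token Tax" multipliers above are simple arithmetic over token counts. A minimal sketch, using the counts from the demo and a hypothetical per-token price (real pricing varies by provider and model):

```python
# Token counts from the video's cl100k_base demo: one sentence, three languages.
counts = {"English": 15, "Spanish": 21, "Telugu": 115}

# Hypothetical price for illustration only; check your provider's pricing page.
PRICE_PER_1K_TOKENS = 0.01  # dollars

baseline = counts["English"]
for lang, n in counts.items():
    multiplier = n / baseline          # how many times more tokens than English
    cost = n / 1000 * PRICE_PER_1K_TOKENS  # per-call input cost at the assumed rate
    print(f"{lang}: {n} tokens, x{multiplier:.1f} vs English, ${cost:.5f} per call")
```

To reproduce the counts themselves, the demo's tokenizer is available via the tiktoken library: `len(tiktoken.get_encoding("cl100k_base").encode(text))` gives the token count for a given string.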

