Optimizing Models for Production

Free to audit on Coursera

Coursera · Intermediate · Large Language Models · 2 months ago
The Optimizing Models for Production course is designed for developers, engineers, and technical product builders who are new to generative AI but already have intermediate machine learning knowledge, basic Python proficiency, and familiarity with development environments such as VS Code. It is aimed at those who want to engineer, customize, and deploy open generative AI solutions while avoiding vendor lock-in. The course prepares learners to make generative AI models more efficient, scalable, and cost-effective for real-world deployment. Learners begin with quantization, applying INT8 and INT4 precision reduction with tools like bitsandbytes while balancing accuracy against efficiency. Next, they explore inference optimization strategies, including batching, KV-cache management, and token-level computation scheduling, to reduce latency in interactive applications. The course also covers memory footprint reduction and adaptive batch sizing for dynamic workloads. In the final module, learners apply practical hardware optimization techniques such as GPU memory tuning and mixed-precision inference, and use profiling tools like nvidia-smi and PyTorch Profiler to identify bottlenecks. By the end, learners will be able to deliver optimized models across diverse hardware environments, supported by performance benchmarks and reproducible deployment pipelines.
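To make the accuracy-versus-efficiency trade-off in INT8 quantization concrete, here is a minimal sketch of symmetric absmax quantization in plain Python. It is not taken from the course and it is not the bitsandbytes API, which uses more elaborate schemes. The function names `quantize_int8` and `dequantize` and the sample weights are assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric absmax quantization: map floats to integers in [-127, 127].

    Illustrative helper (hypothetical name), not a library function.
    """
    max_abs = max(abs(v) for v in values)
    # One scale factor for the whole list; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Made-up example weights, for illustration only.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is bounded by half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each value is stored as one small integer plus a shared scale, which is where the memory saving comes from. The round-trip error printed at the end is the accuracy cost, and it grows as the largest weight, and therefore the step size, grows.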