Free to audit

Optimizing Models for Production

Name: Optimizing Models for Production
Uploaded: 2026-03-30T13:56:32.142Z
Channel: Coursera

Coursera · Intermediate · 🧠 Large Language Models · 2mo ago

Skills: LLM Engineering 90% · ML Pipelines 70%

The Optimizing Models for Production course is designed for developers, engineers, and technical product builders who are new to generative AI but already have intermediate machine learning knowledge, basic Python proficiency, and familiarity with development environments such as VS Code. It is aimed at those who want to engineer, customize, and deploy open generative AI solutions while avoiding vendor lock-in. The course prepares learners to make generative AI models more efficient, scalable, and cost-effective for real-world deployment. Learners begin with quantization, applying INT8 and INT4 precision reduction with tools such as bitsandbytes while balancing accuracy against efficiency. Next, they explore inference optimization strategies, including batching, KV-cache management, and token-level computation scheduling, to reduce latency in interactive applications. The course also covers memory footprint reduction and adaptive batch sizing for dynamic workloads. In the final module, learners apply practical hardware optimization techniques such as GPU memory tuning and mixed-precision inference, and use profiling tools such as nvidia-smi and PyTorch Profiler to identify bottlenecks. By the end, learners will be able to deliver optimized models across diverse hardware environments, supported by performance benchmarks and reproducible deployment pipelines.
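To make the accuracy-versus-efficiency trade-off in INT8 quantization concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration under stated assumptions, not course material and not the bitsandbytes API. The function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and real libraries typically use finer-grained (for example block-wise) scales.

```python
# Illustrative sketch only: symmetric per-tensor INT8 quantization.
# Hypothetical helper names; this is not how bitsandbytes is invoked.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in quantized]


# Made-up sample weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per weight is bounded by half the scale step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as one signed byte instead of a 32-bit float, at the cost of a rounding error of at most half the scale step. Small values such as 0.003 collapse to zero here, which is the kind of accuracy loss that has to be weighed against the memory savings.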
