Free to audit · Opens on Coursera

Production ML with Hugging Face

Name: Production ML with Hugging Face
Uploaded: 2026-03-30T13:57:36.268Z
Channel: Coursera

Coursera · Intermediate · 🧠 Large Language Models · 1mo ago

Skills: Model Deployment 95%, LLM Engineering 70%

Learn to deploy ML models to production using the Sovereign Rust Stack, a pure Rust implementation with zero Python runtime dependencies. This hands-on course teaches you to work with three model formats (GGUF, SafeTensors, APR), implement MLOps pipelines with CI/CD and observability, and deploy models across GPU, CPU, WebAssembly, and edge targets.

Through real-world projects, including a Python-to-Rust transpiler (Depyler), browser-based speech recognition (Whisper.apr), and LLM inference benchmarking (Qwen), you'll master format conversion, cryptographic model signing, and performance optimization. The course culminates in a capstone project that deploys Qwen2.5-Coder in all three formats and benchmarks the results.

What makes this course unique: instead of relying on Python frameworks, you'll build with production-grade Rust tooling that compiles to native binaries and WebAssembly. Learn to run sub-millisecond inference in browsers, bundle models into executables, and achieve 2x performance gains over standard tools.

Ideal for ML engineers and software developers ready to move beyond notebooks into production deployment.
