Write Once, Run Everywhere with PyTorch Transformers - Pedro Cuenca, Hugging Face

PyTorch · Advanced · 🧠 Large Language Models · 3w ago
The Hugging Face transformers library is built on pure PyTorch and can be succinctly described as a model-definition framework. It provides a unified, familiar, clear, and concise interface to many machine learning architectures across modalities. Serving and inference optimizations are not its focus; however, transformers model definitions have become the de facto reference implementations that many other projects build on. These include training libraries, fast deployment engines such as vLLM and SGLang, and on-device libraries like MLX and llama.cpp.

This session describes the path towards increasingly simple downstream integration of transformers models into inference and deployment libraries, and how transformers and PyTorch core features let the ecosystem use new models as soon as they are released. We'll go through the journey towards easier modeling, which in turn means easier downstream porting and adaptation. The end game is pure interoperability, where no code changes are required. This is now possible with vLLM and SGLang, and we'll show how (see the sketches below). We'll close by discussing our ideas for upcoming interop features with MLX and llama.cpp.
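To make the "model-definition framework" idea concrete, here is a minimal sketch of the unified interface. The model id is an illustrative assumption; any causal LM hosted on the Hub loads through the same two `Auto*` entry points:

```python
# A minimal sketch, assuming transformers and torch are installed.
# The model id is an example (assumption); any Hub-hosted causal LM
# works through the same unified interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write once, run everywhere:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```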
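And a hedged sketch of the zero-code-change interop the talk demonstrates: recent vLLM releases can serve a model directly from its transformers definition via the `model_impl` option (exact option name and availability depend on your vLLM version):

```python
# A hedged sketch, assuming a recent vLLM release that supports
# model_impl="transformers"; check your version's docs. Same example
# model id as above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    model_impl="transformers",  # run the transformers modeling code as-is
)
params = SamplingParams(max_tokens=32)
result = llm.generate(["Write once, run everywhere:"], params)
print(result[0].outputs[0].text)
```

The same option is exposed on the command line as `--model-impl transformers` in the vLLM versions that support it.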
Watch on YouTube ↗