Write Once, Run Everywhere with PyTorch Transformers - Pedro Cuenca, Hugging Face

PyTorch · Advanced · 🧠 Large Language Models · 3w ago
The Hugging Face transformers library is built on pure PyTorch and can be succinctly described as a model-definition framework. It provides a unified, familiar, clear, and concise interface to many machine learning architectures across modalities. Serving and inference optimizations are not its focus; however, transformers model definitions have become the de facto reference implementations that many other projects build on. These include training libraries, fast deployment engines such as vLLM and SGLang, and on-device libraries like MLX and llama.cpp.

This session describes the path towards increasingly simple downstream integration of transformers models into inference and deployment libraries, and how transformers and PyTorch core features let the ecosystem use new models as soon as they are released. We'll go through the journey towards easier modeling, which in turn means easier downstream porting and adaptation. The end game is pure interoperability, where no code changes are required. This is now possible with vLLM and SGLang, and we'll show how (see the sketches below). We'll close by discussing our ideas for upcoming interop features with MLX and llama.cpp.
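To make the "model-definition framework" idea concrete, here is a minimal sketch of the unified interface. The model id is an illustrative assumption; any causal LM hosted on the Hub loads through the same two `Auto*` entry points:

```python
# A minimal sketch, assuming transformers and torch are installed.
# The model id is an example (assumption); any Hub-hosted causal LM
# works through the same unified interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write once, run everywhere:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```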
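And a hedged sketch of the zero-code-change interop the talk demonstrates: recent vLLM releases can serve a model directly from its transformers definition via the `model_impl` option (exact option name and availability depend on your vLLM version):

```python
# A hedged sketch, assuming a recent vLLM release that supports
# model_impl="transformers"; check your version's docs. Same example
# model id as above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    model_impl="transformers",  # run the transformers modeling code as-is
)
params = SamplingParams(max_tokens=32)
result = llm.generate(["Write once, run everywhere:"], params)
print(result[0].outputs[0].text)
```

The same option is exposed on the command line as `--model-impl transformers` in the vLLM versions that support it.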
Watch on YouTube ↗