Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Inp... Deifilia Kieckhefen

Name: Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Inp... Deifilia Kieckhefen
Uploaded: 2026-04-20T20:22:24Z
Channel: PyTorch
Description: Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology Distri...

PyTorch · Advanced ·🧠 Large Language Models ·3w ago

Skills: Advanced RAG60%

Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures. Moreover, existing tensor parallelism strategies typically replicate input data across all processes, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands-weather forecasting, medical imaging, or video processing-unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization. Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations, convolutions, linear layers, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures and shows performance when scaling multi-billion-parameter models sharded across up to 8 processes and compares the scalability to DDP, FSDP, and Megatron-LM approaches.

Watch on YouTube ↗ (saves to browser)