
Training at Scale

Train large models with mixed precision, gradient checkpointing, and distributed strategies.
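Gradient checkpointing trades compute for memory. Instead of keeping every intermediate activation for the backward pass, it keeps only some of them and recomputes the rest when they are needed. The framework-free Python sketch below illustrates this on a hypothetical chain of scalar `tanh` layers. All function names are illustrative assumptions, not a library API. It stores only the input of every `every`-th layer, recomputes each segment during backpropagation, and returns the same gradient as the store-everything version.

```python
import math

def layer(w: float, a: float) -> float:
    """Hypothetical scalar layer: a_out = tanh(w * a_in)."""
    return math.tanh(w * a)

def layer_backward(w: float, a_in: float, upstream: float) -> float:
    """Chain rule through one layer: d tanh(w*a)/da = w * (1 - tanh(w*a)^2).
    For simplicity the layer output is recomputed from its input here."""
    out = math.tanh(w * a_in)
    return upstream * w * (1.0 - out * out)

def grad_full(ws: list, x: float):
    """d(output)/dx, storing every layer input (no checkpointing)."""
    inputs = []
    a = x
    for w in ws:
        inputs.append(a)  # keep every layer input for the backward pass
        a = layer(w, a)
    g = 1.0
    for w, a_in in zip(reversed(ws), reversed(inputs)):
        g = layer_backward(w, a_in, g)
    return g, len(inputs)  # gradient, number of activations kept from forward

def grad_checkpointed(ws: list, x: float, every: int):
    """d(output)/dx, storing only the input of every `every`-th layer."""
    checkpoints = {}
    a = x
    for i, w in enumerate(ws):
        if i % every == 0:
            checkpoints[i] = a  # checkpoint: input of layer i
        a = layer(w, a)
    g = 1.0
    # Backward pass, one segment at a time, starting from the last segment.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, len(ws))
        seg = [checkpoints[start]]  # seg[j] is the input of layer start + j
        for i in range(start, end - 1):
            seg.append(layer(ws[i], seg[-1]))  # recompute the dropped activations
        for i in reversed(range(start, end)):
            g = layer_backward(ws[i], seg[i - start], g)
    return g, len(checkpoints)

if __name__ == "__main__":
    ws = [0.9, -1.1, 0.7, 1.3] * 4  # 16 layers
    print(grad_full(ws, 0.3))
    print(grad_checkpointed(ws, 0.3, every=4))
```

With 16 layers and `every=4`, the forward pass keeps 4 activations instead of 16. The cost is that each segment's forward computation runs again during backward. In PyTorch, `torch.utils.checkpoint` applies the same idea to real modules.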


After this skill you can…

  • Use FP16/BF16 mixed precision training
  • Apply gradient accumulation for large batches
  • Set up DDP and FSDP on multi-GPU clusters
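The outcomes above depend on a few numeric facts that can be checked without a GPU. The following framework-free Python sketch does not use PyTorch, and all of its function names are hypothetical. It first simulates FP16 storage with `struct`'s half-precision `'e'` format to show why FP16 training uses loss scaling. It then checks that two kinds of averaging reproduce the full-batch gradient. The first is averaging micro-batch gradients, which is what gradient accumulation does. The second is averaging per-rank gradients, which is the all-reduce that DDP performs. The sketch assumes a toy one-parameter model with mean-squared-error loss and equal-sized micro-batches and shards.

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision to simulate FP16 storage."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def scaled_fp16_grad(g: float, scale: float = 2.0 ** 16) -> float:
    """Loss scaling: multiply before the FP16 cast, divide in full precision after."""
    return to_fp16(g * scale) / scale

def grad(w: float, xs: list, ys: list) -> float:
    """dL/dw for a toy model y_hat = w * x with L = mean((w * x - y) ** 2)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w: float, xs: list, ys: list, micro_batch: int) -> float:
    """Gradient accumulation: sum micro-batch gradients, each divided by the step count."""
    assert len(xs) % micro_batch == 0, "sketch assumes equal-sized micro-batches"
    steps = len(xs) // micro_batch
    total = 0.0
    for s in range(steps):
        lo, hi = s * micro_batch, (s + 1) * micro_batch
        total += grad(w, xs[lo:hi], ys[lo:hi]) / steps  # like (loss / steps).backward()
    return total

def data_parallel_grad(w: float, xs: list, ys: list, world_size: int) -> float:
    """Each simulated rank computes a local gradient; an all-reduce mean combines them."""
    assert len(xs) % world_size == 0, "sketch assumes equal-sized shards"
    shard = len(xs) // world_size
    local = [grad(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
             for r in range(world_size)]
    return sum(local) / world_size  # the average DDP's gradient all-reduce produces

if __name__ == "__main__":
    tiny = 1e-8
    print(to_fp16(tiny))           # 0.0: too small for FP16, the gradient is lost
    print(scaled_fp16_grad(tiny))  # close to 1e-8: scaling keeps it representable
    xs = [float(i) for i in range(1, 9)]
    ys = [2.0 * x + 1.0 for x in xs]
    w = 0.5
    # full batch, 4 accumulation steps of 2, 4 ranks of 2 -> -85.5 -85.5 -85.5
    print(grad(w, xs, ys), accumulated_grad(w, xs, ys, 2), data_parallel_grad(w, xs, ys, 4))
```

All three gradient computations agree, so a large effective batch can come from accumulation, from more ranks, or from both. BF16 keeps FP32's 8-bit exponent, so a gradient of 1e-8 does not underflow in BF16, and BF16 training therefore usually skips loss scaling. FSDP additionally shards parameters, gradients, and optimizer state across ranks, which this sketch does not model.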

Prerequisites

Watch (10 videos)

Lightning Talk: Optimized PyTorch Inference on aarch64 Linux CPUs - Sunita Nadampalli, Amazon (AWS)
PyTorch · intermediate hands-on
→ Optimize PyTorch inference on aarch64 Linux CPUs
→ Use Arm Compute Library for optimization
DeepSpeed: Efficient Training Scalability for Deep Learning - Tunji Ruwase, Snowflake
PyTorch · advanced hands-on
→ Train large-scale deep learning models
→ Optimize compiler for efficiency
Keras 3 Distributed Training: Scaling Models with JAX using DataParallel, and ModelParallel
Google for Developers · beginner hands-on
→ Train large deep learning models
→ Use Keras 3 for distributed training
Optimize PyTorch: Build and Accelerate Layers
Coursera · advanced hands-on
→ Apply optimizations like mixed precision
→ Boost training throughput
Pushing the Performance Envelope: An Optimization Study for 3... Suvaditya Mukherjee & Shireen Chand
PyTorch · advanced hands-on
→ Train 3D generative models with PyTorch
→ Optimize performance of Variational Autoencoders
Deep Learning with PyTorch Live Course - Training Deep Neural Networks on GPUs (Part 3 of 6)
freeCodeCamp.org · beginner hands-on
→ Train neural networks on GPUs
→ Implement GANs with PyTorch
Stock Price Prediction using GRU | Deep Learning Project in Tamil | Gated Recurrent Unit
Adi Explains · intermediate hands-on
→ Train a GRU model for stock price prediction
→ Optimize deep learning models for time series forecasting
Tips N Tricks #6: How to train multiple deep neural networks on TPUs simultaneously
Abhishek Thakur · intermediate hands-on
→ Train multiple neural networks on TPUs
→ Optimize hyperparameters for TPU training
Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models // Tunji Ruwase
MLOps.community · intermediate hands-on
→ Train deep learning models at scale
→ Optimize model training for performance
EP8: Training Models at Scale | AWS for AI Podcast
Amazon Web Services · advanced hands-on
→ Scale AI model training
→ Optimize AI infrastructure for large models