PostLN, PreLN and ResiDual Transformers

Machine Learning Studio · Advanced · 🧠 Large Language Models · 2y ago


About this lesson

PostLN Transformers suffer from unbalanced gradients, which can vanish or explode and make training unstable. A learning-rate warmup stage is the usual practical fix, but it introduces extra hyperparameters to tune, which makes Transformer training harder. In this video, we look at some alternatives to the PostLN Transformer, including the PreLN Transformer and ResiDual, a Transformer with dual residual connections.

References:
1. "On Layer Normalization in the Transformer Architecture", Xiong et al. (2020)
2. "Understanding the Difficulty of Training Transformers", Liu et al. (2020)
3. "ResiDual: Transformer with Dual Residual Connections", Xie et al. (2023)
4. "Learning Deep Transformer Models for Machine Translation", Wang et al. (2019)
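The three layouts named in the description differ only in where layer normalization sits relative to the residual connection. The following is a minimal sketch in plain Python, not the implementation from the video or the papers. The sublayer `toy_sublayer` is a hypothetical stand-in for attention or feed-forward, layer normalization has no learned gain or bias, and the ResiDual version assumes both streams start from the same input and are combined at the end as described in Xie et al. (2023).

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [u + v for u, v in zip(a, b)]

def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [math.tanh(2.0 * v) for v in x]

def post_ln(x, f, num_layers):
    """PostLN: normalize AFTER adding the residual, x <- LN(x + f(x))."""
    for _ in range(num_layers):
        x = layer_norm(add(x, f(x)))
    return x

def pre_ln(x, f, num_layers):
    """PreLN: normalize the sublayer INPUT, x <- x + f(LN(x)), then a final LN."""
    for _ in range(num_layers):
        x = add(x, f(layer_norm(x)))
    return layer_norm(x)

def residual_dual(x, f, num_layers):
    """ResiDual sketch: a PostLN-style stream plus an un-normalized residual stream."""
    x_ln = x  # stream updated like PostLN
    x_d = x   # stream that accumulates sublayer outputs like PreLN (assumed init)
    for _ in range(num_layers):
        out = f(x_ln)
        x_ln = layer_norm(add(x_ln, out))
        x_d = add(x_d, out)
    # Final output combines both streams; the dual stream is normalized once here.
    return add(x_ln, layer_norm(x_d))

if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0, 0.1]
    for name, block in [("PostLN", post_ln), ("PreLN", pre_ln), ("ResiDual", residual_dual)]:
        print(name, [round(v, 3) for v in block(x0, toy_sublayer, 6)])
```

In the PostLN loop every residual addition passes through a normalization, while in the PreLN loop the identity path from input to output is never normalized until the end. The ResiDual sketch keeps one stream of each kind, which is the idea behind its name.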


