From T5 to T5X: A Game-Changing Evolution with JAX & FLAX

Discover AI · Beginner ·🧠 Large Language Models ·3y ago


Key Takeaways

The video traces the evolution of the T5 model into T5X, Google's implementation of T5 on JAX and Flax, discusses its reported use in Google Search, and points to a code implementation that runs in Google Colab notebooks on TPUs.

Full Transcript

Hello community, let's talk about Transformer-based language models and the current state of affairs. We look at the T5X model and, of course, the PaLM model by Google. In the last video I showed you that BERT and GPT are, more or less, the encoder stack of the Transformer architecture (this is BERT) and the decoder stack of the Transformer architecture, which became GPT by OpenAI.

With GPT being unidirectional, we found out that we can use the decoder stack of the Transformer as a language model, that is, a model trained purely for the task of next-word prediction. BERT, on the other hand, is bidirectional: it is an encoder-only stack, and it is great at making a single prediction per input token, so it is great for classification tasks.

Currently we have T5, and this is a multitask unified model, or MUM for short, and this is the technology that powers Google Search today, if you believe current research publications. To be specific, since mid-2022 you have T5X: the Text-to-Text Transfer Transformer in JAX and Flax, utilizing the compute infrastructure by Google. If you want to know more about JAX, I have a video about JAX, TensorFlow, and PyTorch, and if you want to know about the hardware configuration of H100 GPUs by NVIDIA or the ASICs by Google, I also have a video for this.

Now, in July 2020 we had the publication by Google about its T5 architecture and the ideas behind it, in a preprint. They focused specifically on transfer learning in natural language processing. They say we often have a pre-training part with unsupervised learning on unlabeled data, since we have so much data available for free from the internet. They examined this transfer learning model, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task. The actual downstream tasks are question answering, document summarization, and sentiment classification. Their idea was to put everything in a text-to-text framework, and they wanted to explore the limits of transfer learning.

Of course they used a Transformer architecture. This means, more or less, that on the encoder stack they also used something like BERT's MLM (masked language modeling): they dropped out 15% of the tokens in the input sequence, and they even cared about reducing the computational cost of pre-training their model.

But the great thing about Google is that they open-sourced it. You have here the GitHub repository by Google Research, where they provide the code, and as you see it was updated just two weeks ago. They even provide you with a free Google Colab notebook where you can experience T5 yourself; of course not the full-fledged half-trillion-parameter model, but the smaller models that fit within the Google Colab notebook.

And here we are now in our Colab notebook from Google about T5. As you can see, it is about fine-tuning Text-to-Text Transfer Transformers for closed-book question answering, and you have here all the code. You see how you set it up on a TPU in some easy steps, then you have Natural Questions, and you have the code in detail. If you want, I can make a video going with you step by step through the code, but it is really easy to implement. You have the transfer to new tasks; they explain in detail how it is done and how it is coded, a very nice implementation. You have the expected results and, of course, how you evaluate your model, and they give you all this code to play around with. Then, most important, there is the predict functionality of the model: you have here your question that you can define, and you see the output here.

Now, since you are working on a free Google Colab notebook, you cannot use the largest, half-trillion-parameter model, but the smaller T5 implementations also show you what you can achieve with the current open-source T5 code. It is free for you to explore on a free Google Colab notebook.
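The masking step mentioned in the transcript can be illustrated with a small sketch. In the T5 paper the denoising objective replaces each run of consecutive masked tokens with a single sentinel token, and the target lists the dropped spans after their sentinels. The code below is a simplified illustration and not the code from the Google notebook: the function name `span_corrupt` is hypothetical, the `<extra_id_N>` sentinel spelling follows the convention of the Hugging Face T5 tokenizer (the paper writes `<X>`, `<Y>`, `<Z>`), and the masked positions are picked by hand, whereas real pre-training samples roughly 15% of the tokens at random.

```python
def span_corrupt(tokens, masked_positions):
    """Replace each run of consecutive masked tokens with one sentinel token.

    Returns (corrupted_input, target) as space-joined strings.
    Hypothetical helper for illustration only.
    """
    masked = set(masked_positions)
    inputs, targets = [], []
    sentinel_id = 0
    prev_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not prev_masked:
                # A new masked span starts: emit one sentinel on both sides.
                sentinel = f"<extra_id_{sentinel_id}>"
                sentinel_id += 1
                inputs.append(sentinel)
                targets.append(sentinel)
            # The dropped token itself only appears in the target.
            targets.append(tok)
            prev_masked = True
        else:
            inputs.append(tok)
            prev_masked = False
    # A final sentinel marks the end of the target sequence.
    targets.append(f"<extra_id_{sentinel_id}>")
    return " ".join(inputs), " ".join(targets)


# Example sentence from the T5 paper; positions 2, 3 and 8 are chosen by hand.
tokens = "Thank you for inviting me to your party last week .".split()
corrupted, target = span_corrupt(tokens, masked_positions=[2, 3, 8])
print(corrupted)
print(target)
```

The model is trained to produce `target` when given `corrupted`, which keeps the targets short and is one of the ways the authors reduce the cost of pre-training.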
So, summarizing, we can say T5X combines pre-training and fine-tuning for specific tasks: it is pre-trained on a multitask mixture before being fine-tuned for a specific task, and Google itself claims it is 1,000 times more powerful than BERT. I can't verify this. But we have a question now: what about the future of this, especially if you think about conversational AI tools like ChatGPT? The answer is easy, but before I give you the answer in my next video, I want to show you that T5 also evolved into Flan-T5. I have two videos for you where I show you the code and the tuning of the hyperparameters. I hope you enjoyed this, and I see you in my next video.
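The text-to-text idea behind the multitask mixture can be sketched in a few lines: every task becomes a pair of strings, with a task prefix on the input telling the model what to do. This is a minimal illustration and not code from the T5 or T5X repositories. The prefixes for translation, summarization, and sentiment follow the examples in the T5 paper; the `"question: "` prefix, the helper name `to_text_to_text`, and the sample sentences are assumptions made for this sketch.

```python
# Task prefixes. The first three follow examples in the T5 paper;
# the closed-book QA prefix is a hypothetical choice for this sketch.
PREFIXES = {
    "translation_en_de": "translate English to German: ",
    "summarization": "summarize: ",
    "sentiment": "sst2 sentence: ",
    "closed_book_qa": "question: ",
}


def to_text_to_text(task, source, target):
    """Cast one labeled example into an (input text, target text) pair."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return PREFIXES[task] + source, target


# A tiny "multitask mixture": different tasks, one shared string format.
mixture = [
    to_text_to_text("translation_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("sentiment", "a charming and often affecting journey", "positive"),
    to_text_to_text("closed_book_qa", "Who wrote Faust?", "Johann Wolfgang von Goethe"),
]

for model_input, model_target in mixture:
    print(f"{model_input!r} -> {model_target!r}")
```

Because even the class label of a sentiment example is emitted as text, the same model, loss, and decoding procedure can serve every task in the mixture, and fine-tuning on a new task only requires casting its examples into the same form.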

Original Description

After explaining BERT vs GPT (last video) we now examine current tech like Google's T5X (for Google search) and in my next video new PaLM: Pathways Language Model (if combined w/ RLHF - Reinforcement Learning with Human Feedback). T5X = Google's T5 on JAX and FLAX. Plus code implementation for T5X.

My sources (all rights are with the corresponding authors):

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
https://arxiv.org/pdf/1910.10683.pdf

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
https://arxiv.org/pdf/1808.06226.pdf

Illustrating Reinforcement Learning from Human Feedback (RLHF)
https://github.com/huggingface/blog/blob/main/rlhf.md

Fine-Tuning the Text-To-Text Transfer Transformer (T5) for Closed-Book Question Answering
https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb#scrollTo=zSeyoqE7WMwu

PaLM + RLHF - Pytorch
https://github.com/lucidrains/PaLM-rlhf-pytorch

#ai #t5 #chatgpt #reinforcementlearning

The video teaches the evolution of the T5 model to T5X, its application in Google Search, and how to implement T5X using Google Colab notebooks and TPUs, with a focus on transfer learning and multitask unified models.

Key Takeaways

Understand the basics of Transformer-based language models
Learn about the T5 and T5X models
Implement T5X using JAX and FLAX
Fine-tune LLMs for specific tasks
Use Google Colab notebooks and TPUs for implementation
Explore the code and hyperparameter tuning for T5X
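
As a concrete companion to the first takeaway, the contrast the transcript draws between BERT (bidirectional, encoder-only) and GPT (unidirectional, decoder-only) comes down to which positions each token may attend to. The sketch below builds the two attention masks as plain nested lists; the function names are hypothetical, and real implementations build these masks as arrays inside the attention layer.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


# Rows are query positions, columns are key positions; 1 means "may attend".
for name, mask in [("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))]:
    print(name)
    for row in mask:
        print(" ", row)
```

An encoder-decoder model such as T5 uses both patterns: the encoder reads the input with the bidirectional pattern, and the decoder generates the output text with the causal one.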

💡 T5X combines pre-training on a multitask mixture with fine-tuning for specific tasks. The video cites a claim by Google that it is 1,000 times more powerful than BERT.
