TF2: Pre-Train BERT from scratch (a Transformer), fine-tune & run inference on text | KERAS NLP

Discover AI · Advanced ·🧬 Deep Learning ·3y ago

Key Takeaways

This video demonstrates how to pre-train BERT, a Transformer encoder, from scratch, fine-tune it on a classification task, and run inference on raw text using KerasNLP and TensorFlow 2. It covers the whole process, from building the encoder model to running inference.

Full Transcript

Hello community! Today we build a company-specific Transformer architecture on company data. So what we're going to do: we're going to pre-train a Transformer from scratch, we're going to fine-tune this Transformer, and finally I show you how to run inference.

So finally we can start coding. We are here in a Jupyter Colab notebook, and as you can see, we will pre-train a Transformer from scratch, we will fine-tune it, and I will show you how to do the inference. So, first step, we will set up. Then we will pre-train our Transformer with the masked language model mask generator. Then we're going to fine-tune the model on a downstream task; we will use a classification task for our pre-trained BERT model. And when we have a fine-tuned BERT model, we will do some model inference. So great, I would say let's just jump right into it.

Runtime: we need a GPU, yes, and here we go. At first, of course, we need Keras and we need TensorFlow. And what I show you here is KerasNLP. You know that Keras in itself is a high-level API for TensorFlow 2, and they have a toolbox: KerasNLP is a toolbox of modular building blocks. You have pre-trained state-of-the-art models, and you can even go down to Transformer encoder layers. We will build our BERT model, our encoder stack, with those layers from KerasNLP. It's so fast and it is so beautiful, and I have to show you this. So we installed TensorFlow, we installed KerasNLP, it's running, yes, beautiful.

And of course we need datasets. We need two datasets: at first we have SST-2 and then we have WikiText. Now, either you download it if you have it somewhere on an S3, or, best thing, you go to Hugging Face, you go to Datasets, you put in SST-2, and you see you have a sentence and a label: 0 means a negative connotation, 1 a positive one. And here you have everything: the Stanford Sentiment Treebank is a fully labeled corpus, yes, yes, yes. And the other one, well, Wikipedia, you know Wikipedia, but this one is a version, WikiText-103, that is over 110 times larger than the Penn Treebank; you have over 100 million tokens, yes, yes, yes, beautiful. So on Hugging Face, within Datasets, you find every dataset that we need. So beautiful, we downloaded it and now we have it.

Now, I told you that we are operating here in TensorFlow, so we use the beautiful tf.data API that optimizes between our CPU and GPU or TPU processing. At first we have to have the pre-processing parameters. We have an input sequence length of 128 tokens. We have a masking rate of 25 percent; you can put it to 18, you can put it to 21, whatever you prefer. Let's make it a little bit more complicated, let's mask 25 percent. And then of course you load your data with a beautiful TensorFlow data pipeline; then we load our CSV files. Great... batch size is not defined. Yes, of course, because I did not execute this cell, I am talking too much. Beautiful, so this is done, this is done, and if you want to have a peek, show me some elements: here you have your sentences. Beautiful.

Now, we do not need a baseline here, and here we go: we now have to pre-train a naked Transformer. There is nothing, there are no pre-trained weights we can download. No, we pre-train our own Transformer. So what we need: we need an additional layer, of course, and we have here the masked language modeling objective that we train it on. We will not train it on NSP, the next sentence prediction.

So great. Now you have to know that in this toolbox of KerasNLP there is a beautiful tokenizer: this is our WordPiece tokenizer that we need for the BERT model, have a look at this. And we need something that is called a masked language model mask generator. I showed you in my last video, when we did all of this in PyTorch, that we also had a specific function to generate the masking of our sentence tokens for us. Now here we have the command keras_nlp.layers.MaskedLMMaskGenerator, beautiful. And then, as I told you, we will use tf.data, so that each batch is pre-computed on the CPU while the GPU or our TPU does the heavy lifting, the training on the batch that came before. So we have an optimal use of CPU and GPU resources with our tf.data command. I have a specific video for you just on this command.

And here we go. Now, for the tokenizer, you have a BERT tokenizer, or we go with the WordPiece tokenizer. This is the class definition; I just want to show you all the parameters you can choose: strip the French accents, yes or no; split, yes, split here on Japanese or Korean characters, yes. So beautiful, whatever you can do. I don't have to show you the example because this is clear, or if you want to have a dtype of string you can have this. Beautiful.

So now for the mask generator: this is the class, those are the parameters. We're going to use the vocabulary size, the selection rate, a mask token ID, but let's have a look at this when we really do the computing.

And yeah, I've got a question about the ragged tensor: what is it? Just think about it as a variable-length list, so a list with elements that have variable length. Think about variable-length features, like the set of actors in a movie, or think about sentences that have a different length, a different number of words in the sentences; those are batches of variable-length sequential inputs. So this is a ragged tensor.

So here we go. Of course we need a lot of pre-processing parameters. For the batch size of the pre-training we go with 128. For the fine-tuning, depending on your GPU, maybe increase it; 32 works fine for me on a free Colab notebook. The sequence length here is rather short; I checked, we have just some short sentences in our dataset, but please increase this to 256 or 512 or 1024.
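The ragged tensor and the fixed sequence length mentioned above can be pictured without any framework. The following is a minimal pure-Python sketch, not the TensorFlow ragged tensor API: a ragged batch is modeled as a list of token-id lists of different lengths, and each row is padded or truncated to one sequence length. The helper name `pad_or_truncate`, the padding id `PAD_ID`, and the token ids are hypothetical, and the sequence length is shortened from 128 to 8 for readability.

```python
from typing import List

SEQ_LENGTH = 8  # the video uses 128; kept small here so the output is readable
PAD_ID = 0      # assumed padding token id (hypothetical)


def pad_or_truncate(batch: List[List[int]],
                    seq_length: int = SEQ_LENGTH,
                    pad_id: int = PAD_ID) -> List[List[int]]:
    """Turn a ragged batch (rows of different length) into a dense batch."""
    dense = []
    for tokens in batch:
        clipped = tokens[:seq_length]                # cut rows that are too long
        padding = [pad_id] * (seq_length - len(clipped))
        dense.append(clipped + padding)              # fill short rows with pad_id
    return dense


# Three "sentences" with 3, 1 and 10 tokens: a ragged batch.
ragged_batch = [[5, 9, 2], [7], [3, 4, 6, 8, 1, 2, 9, 5, 7, 7]]
dense_batch = pad_or_truncate(ragged_batch)

for row in dense_batch:
    print(row)
```

A longer sequence length keeps more of each long sentence but also adds more padding to short ones, which is why the value is best chosen from the sentence lengths in your own dataset.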
This depends on your dataset, on the length in tokens of the sentences that you have.

Now for the BERT model, for our encoder stack: the number of layers, or Transformer blocks as they are called if you think about the original Transformer architecture. Normally you would go with 12 or with 24 or even higher; for demonstration purposes I just put it to three here. I also reduce the dimensionality to 256, and the number of heads is four here. For an operational example you have to increase those numbers significantly: for the number of layers I would go with 24 or higher, dimension 512 minimum. The dropout rate you can define, beautiful. And then you have the learning rate for the pre-training and for the fine-tuning, the epochs we spend on pre-training, and the fine-tuning epochs we need to fine-tune our pre-trained system. Beautiful, we defined all of this.

And here we go. As I told you, like in my video where I showed you this on the PyTorch system, we need at first a tokenizer, and we have here from our KerasNLP toolbox the WordPiece tokenizer; here are our parameters. Then we need the masked language model mask generator as a layer; those are the parameters. And then we are already done. We now define a function called preprocess, where as an input we have our tokenizer and then our masker that is applied to our inputs. The masker is very simple: our mask generator does the masking of the tokens in our sentences. What we get out of this is the features, where we have the token IDs and the mask positions, plus we get the labels and the weights. So what we have to return from this function is features, labels, weights. And then we use prefetch with tf.data.AUTOTUNE: we say, hey system, autotune yourself, become the best possible configuration. You can do this in TensorFlow 2 very easily. We also use the prefetch command; I have a specific video on this.

Now what I want to show you is what we get back. We get back, exactly as I showed you, the features. The features, you know, are the token IDs and the mask positions. So we have here our token IDs, and the second part of the features is the mask positions. And then, as I told you, we get the labels, those are our labels, of course in tensor form, and then we have here our weights.

So again, just to make it absolutely clear: we have two features, our token IDs, where some tokens have been replaced with our mask token, and the mask positions, which keep track of which token in our sentence, which subword in our sentence, we have masked; the system is challenged to find the corresponding correct token. The labels are simply the IDs that we masked out, so the system knows: hey, there is a mask, while the others are original tokens. And there is an additional piece of information: because not all sequences will have the same number of masks, we also keep a sample weight tensor. Remember, this is our sample weight tensor here, which removes the padded labels from our loss function by giving them, in the easiest way you can think of, zero weight. Beautiful.

So we have our input ready, and now comes the most beautiful part of all: we can build our BERT model, our encoder stack. We do this of course with this high-level API in the toolset called KerasNLP, which provides all the things we need to do this easily. Two things I want to point out to you. We have here our token and position embedding layer; this is the very first layer when the original input comes into our BERT model. And then we can build our Transformer encoder layers, 4, 6, 12 layers if you want, simply with this command: keras_nlp.layers.TransformerEncoder. The bread and butter of the Transformer model, the self-attention mechanism and the feed-forward network: this is beautiful, this is done for us. So we will use this. Just to give you an idea, here is the class definition of TransformerEncoder with all the parameters you can use. Here we have the token and position embedding, so the very first layer, if you want, in our BERT system. Of course we have some normalization that brings the data to the same scale, to reduce some variance in our data, with all the parameters you can adjust. And then we have a dropout rate, which helps prevent overfitting on our data.

And now we apply all of this in our model. So here we go: at first we embed our tokens with a positional embedding, so the very first layer is our token and position embedding. Beautiful. We apply this embedding layer, the very first layer, to our inputs, and what we get is the output of the very first layer, unbelievable. Then we do some normalization and we do a dropout. And now comes the beauty: let's say we want to have, I don't know, six Transformer blocks, six stacked encoder layers, where each layer has a self-attention mechanism and a feed-forward network, as I showed you in my last video. So we say: for i in range(number of layers). What is the number of layers? Hold on a sec, number of layers... come on, where's my number of layers? Number of layers is three. Oh yeah, as I told you, it's just for demonstration purposes, so you would have 24 here if you're operational. So for i from 0 to 3 we now have our beautiful Transformer encoder layer, with all the parameters that we have defined. And here our Keras model is defined by our inputs and by our outputs, and we call it the encoder model, because it is not a complete Transformer model but the encoder stack, the BERT model.

And then we say okay, let's do this and show me the summary of my model architecture. Model dim is not defined. Okay, why is model dim not defined? Just hold on, I can't believe it: with all the talking I do not execute the cells. We don't need this, we don't need this, we don't need this, and we need this, yes. So here we go, this is now the model that we built, our encoder model, and you can have a look here at the layer structure and the trainable parameters. First we have our input layer, beautiful, then we have our token and position embedding layer, we have normalization, we have defined our dropout, and now come our three Transformer blocks, the encoder layers in our encoder stack. Here we have them: one, two, three. We have defined the maximum sequence length and the dimension, and this is our model.

And now, easy: we now have to pre-train the Transformer model, and this is it. We have, as you know, our mask head that we now put on top of our untrained BERT model, and we do the pre-training with our specific layer for the masked language model; this is the head. It takes about 1 hour and 20 minutes, because, as I showed you last time, the pre-training is really, really intensive and takes a lot of time, but the beauty is that the fine-tuning is short. So here we go, create a pre-training model by attaching... yes, I just started it, and then I show you.

So we have the input: as I showed you, our token IDs and our mask positions, our features. Great. Then we take the token IDs from the input, and this is the input to the encoder model that we just built. What we receive, you're not going to believe it, is the encoded tokens of the model. Remember, this is our encoder model, the model that we just built. And now on this encoder model we put an additional layer, an additional head, that we use for the pre-training, and this is our command for it: the masked language model head, right? Here you have our encoded tokens that we just encoded with our encoder model, and from our input features the mask positions. We define the output, beautiful. And then we say, here we go, this is our model command, keras.Model: we have defined the inputs, we have the outputs, and now we compile the model. Beautiful: you know the AdamW optimizer; jit_compile is True. If you have seen one of my last videos, you'll know what it does. Beautiful.

And then we have the pre-training of the model, and of course we choose the mighty WikiText dataset that we want to do the pre-training on. Here we go: pretraining_model.fit, and here we have our pre-training dataset, the validation dataset, and the number of epochs we want to train our model for. And when this is done we say, beautiful, save this now for further fine-tuning. I've got a lot of questions from you: how can I save the complete model? Well, it's easy: you say model.save and then you give the path to a directory on your disk, or wherever you want your model to be saved.

Yeah, it's still calculating, and it takes, as I have written here, about an hour and 20 minutes. Since I do not have the time, I already did a run in advance, so I just load the result from yesterday. So beautiful, here we go. I'm now connected, and I just say: okay, I know that I have my model on my Google Drive, in the directory called encoder model, and now I want to copy the encoder model to our working directory in this free Colab notebook. It is done, let's have a look. We should now have an encoder model directory, and if you look into this directory you see we have the saved model, the metadata, assets, variables: we have the whole model on our disk now. And then, easily, you say load_model and you give the path to the directory. This is it, and we have it back.

Now, as I showed you, we have finished the pre-training, and now we come to the fine-tuning. After pre-training, we now fine-tune the model on our SST-2 dataset. This is a dataset where we have a qualifier, zero and one, for bad and good. We have this dataset and we now want to train our Transformer, our BERT model, on a classification task: after the pre-training, we fine-tune it on a classification task to boost our performance on this downstream task.

So what do we need? And now this is so beautiful: pre-processing for fine-tuning is much simpler, because we just tokenize our input sentences. So we have defined the preprocess function here, where we just have the tokenizer of our sentences, and you know that the labels are already available. The labels, you see: zero for negative and one for positive, and this is the sentence. We just tokenize all sentences from our SST-2 dataset, as easy as this. Then we use the prefetch command to pre-compute the pre-processed batches on the fly on our CPU, like I showed you before. Beautiful, you get the tensor.

And now comes the beautiful part. At first we reload our model from disk; let's say we have done the pre-training already. So we now say keras.models.load_model, wherever your directory is, and you know we have our directory here, encoder model, so I load it from my directory. Beautiful. Then we take the input, of course our tokenized input, and we call our encoder model on the inputs. And then comes the nice part, and this is the pooling layer. In one of my last videos I showed you the structure of the fine-tuning that's going to happen: we just add an additional Keras layer to the BERT model, a specific pooling layer. We have here the GlobalAveragePooling1D layer from Keras, where we pool our tokens that we have at the end of the last layer of our BERT system. And then, very simple, we predict an output label with an activation function, whatever you like, ReLU or whatever. Then we have our model, our inputs and outputs for the Keras model, and as before, the next step is that we compile the model and we train the model. To train we have the fine-tuning dataset, the fine-tuning validation dataset, and the number of epochs for our fine-tuning. As you can see, we are now in our last epoch, we have just eight seconds left, and we are finished in two, one, zero seconds. Done, yes, beautiful.

You see here that our loss went down from 0.4 to 0.26 to 0.19, our accuracy increased from 0.80 to 0.89 to 0.92, our validation, yes, everything, you know this, and this is beautiful. Now we have a fine-tuned model.

And now I want to show you something. When you save this model, you can just save the model in itself, but there is something very special I would like to point out to you: we have a fine-tuned model and you can now save it together with our tokenization layer. This is really an advantage you have with TensorFlow, or with KerasNLP: all the pre-processing is done in a graph, and so you can save and restore a model that can directly run the inference task on raw text. So we do not save only the model, we can save the tokenization too, and whenever we do the inference task we do not have to apply any tokenizer anymore; it is all within our model. This is the beauty and the advantage you have if you work in TensorFlow versus PyTorch. And of course, filming this video in January 2023, TensorFlow is still the framework most implemented in industry, and you can see there is definitely an advantage here if you integrate this.

So let me show you this. If we now look in our directory, we have an encoder model, and new is that we have the final model. Again we have the saved final model: the metadata, fingerprint, variables, assets, everything. So here you can see you have your pre-trained model saved to disk and you have your final model available.

And now let's say all of this happened yesterday, it's a new day, and we now do the model inference task. You remember I showed you we do three tasks here; after fine-tuning we do the inference task: we now apply our pre-trained and fine-tuned model to a particular application. So here we go. Let's say we start new, so the first command you would use is of course keras.models.load_model, loading the model from our path, final model, it's here. Do not compile it, because that's already done, and we have restored our model. Beautiful.

And then we start with the inference task. We have some input, two sentences. The first sentence is a negative sentence, remember: terrible, no-go, trash; this is a movie review if you want. The second sentence is a positive one. So we have negative and positive, and you remember that one indicates positive and zero indicates negative. So what we expect when we run the inference task is that, since the first sentence is negative, we get a zero, and since the second sentence is positive, we get a one. So let's do this: let's reload the model, our final model with all the assets, and then run it on untokenized input. We have the pure human sentences, the human text, since we saved the model together with our tokenizer in TensorFlow 2. So let's execute this. It reloads the complete model and we run the inference, and here is our result: the inference of our pre-trained and fine-tuned BERT model is zero and, yeah, I'd say one. Zero, as I showed you, is negative, so the first sentence is zero, is negative. And the second sentence, as you can see here, is 0.999, so let's say it is one, and one indicates a positive sentiment, and indeed the sentence is positive. So the system is working.

It was of course just a demonstration, but if you have your client dataset or your private dataset and you build a Transformer model from scratch just on your dataset, you do not need any political news or any small talk; you are highly focused. You want an efficient system trained explicitly and only on your dataset. You can build your own encoder model, you can build your own fine-tuning of the pre-trained model, and you have a model ready to deploy whenever you want. Yes, yes, yes, this is a modular approach to NLP model building, and it works really well with KerasNLP if you are familiar with Keras, which is in itself a high-level API to TensorFlow 2.
With this KerasNLP toolbox it is really, really easy to build and pre-train your model, to fine-tune your model, and to run inference tasks with your personal optimized model. I hope you enjoyed this a little bit. Here are all the data for you if you want to run this again. I'll leave you the link to some other resources on the internet if you want to have a deep dive into certain commands or into certain layer normalization commands. It is easy. I hope you enjoyed it and I see you in my next video.
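The fine-tuning head described in the transcript (average-pool the token outputs of the last encoder layer, then predict one label) and the reading of the inference scores (close to 0 is negative, close to 1 is positive) can be sketched in plain Python. This is an illustration under assumptions, not the notebook's Keras code: the two-dimensional token vectors, the weights, the bias, the 0.5 threshold, and the function names are all hypothetical, and a sigmoid output is assumed because the scores shown in the video lie between 0 and 1.

```python
import math
from typing import List, Tuple


def global_average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average the per-token vectors into a single sentence vector."""
    num_tokens = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / num_tokens for d in range(dim)]


def sigmoid(x: float) -> float:
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_sentiment(token_vectors: List[List[float]],
                      weights: List[float],
                      bias: float,
                      threshold: float = 0.5) -> Tuple[float, str]:
    """Pool the tokens, apply one dense unit, and map the score to a label."""
    pooled = global_average_pool(token_vectors)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    score = sigmoid(logit)
    label = "positive" if score >= threshold else "negative"
    return score, label


# Hypothetical encoder outputs for a two-token sentence (dimension 2).
encoded = [[1.0, 2.0], [3.0, 4.0]]
pooled_example = global_average_pool(encoded)

# Two hypothetical dense units: one pushes the score up, one pushes it down.
score_pos, label_pos = predict_sentiment(encoded, weights=[1.0, 1.0], bias=0.0)
score_neg, label_neg = predict_sentiment(encoded, weights=[-1.0, -1.0], bias=0.0)

print(pooled_example, round(score_pos, 3), label_pos, round(score_neg, 3), label_neg)
```

In the notebook the pooled vector comes from the pre-trained encoder and the dense weights are learned during fine-tuning; the sketch only shows how the pieces connect.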

Original Description

TF2 KERAS Transformer pre-train: we pre-train BERT from scratch on a company- or domain-specific dataset, then fine-tune the model (BERT) and run an inference task on our (domain-specific pre-trained and fine-tuned) BERT model. All steps in real time. KERAS NLP. In my other video we coded the pre-training of a BERT model for SBERT in PyTorch; today in TensorFlow (TF2), more specifically in Keras, with the toolbox KerasNLP for NLP. We build the complete BERT Transformer model in code and pre-train the model.
00:00 Build a Transformer (BERT) from scratch
16:30 Code to pre-train a Transformer (BERT model)
20:41 Code to fine-tune a unique BERT model
26:19 Code BERT model inference on plain text
Video transitions are provided by Canva Professional.
#naturallanguageprocessing #datascience #finetune #pretrain #trending


This video teaches how to pre-train BERT (a Transformer encoder) from scratch, fine-tune the model, and run inference on text using KerasNLP and TensorFlow 2, building and training the model step by step in a Colab notebook.

Key Takeaways
  1. Set up a Jupyter notebook on Google Colab with a GPU runtime
  2. Pre-train a Transformer from scratch using the masked language model (MLM) mask generator
  3. Fine-tune the pre-trained Transformer on a classification task
  4. Download and load datasets from Hugging Face
  5. Pre-process data with an input sequence length of 128 tokens and a masking rate of 25%
  6. Build a BERT-style encoder from scratch by stacking Transformer encoder layers
  7. Use the KerasNLP high-level API to build the model
  8. Apply the token and position embedding layer as the first layer
  9. Apply layer normalization, then dropout to help prevent overfitting
💡 BERT can be pre-trained from scratch with KerasNLP and TensorFlow 2, and the fine-tuned model can be saved together with its tokenizer so that inference runs directly on raw text.
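
The masking step behind takeaways 2 and 5 can be illustrated with a small pure-Python sketch. It is not the KerasNLP mask generator: it only mimics the outputs the transcript describes (masked token IDs, mask positions, labels, and sample weights that are zero for padded entries). The ids `PAD_ID` and `MASK_ID`, the cap `PREDICTIONS_PER_SEQ`, the function name, and the example tokens are assumptions, and the sketch always substitutes the mask token, whereas BERT-style masking usually also keeps or randomly replaces a share of the selected tokens.

```python
import random
from typing import Dict, List

PAD_ID = 0               # assumed padding token id (hypothetical)
MASK_ID = 1              # assumed id of the [MASK] token (hypothetical)
MASK_RATE = 0.25         # masking rate used in the video
PREDICTIONS_PER_SEQ = 4  # hypothetical cap on masked positions per sequence


def mask_sequence(token_ids: List[int],
                  rng: random.Random,
                  mask_rate: float = MASK_RATE,
                  max_predictions: int = PREDICTIONS_PER_SEQ) -> Dict[str, list]:
    """Mask a share of the non-padding tokens and build labels and weights."""
    candidates = [i for i, tok in enumerate(token_ids) if tok != PAD_ID]
    wanted = max(1, round(len(candidates) * mask_rate))
    num_to_mask = min(wanted, max_predictions, len(candidates))
    positions = sorted(rng.sample(candidates, num_to_mask))

    masked = list(token_ids)
    for pos in positions:
        masked[pos] = MASK_ID                      # hide the original token

    labels = [token_ids[pos] for pos in positions]  # the ids that were masked out
    padding = max_predictions - len(positions)
    return {
        "token_ids": masked,
        "mask_positions": positions + [0] * padding,
        "labels": labels + [0] * padding,
        # Padded label slots get weight 0.0, so they drop out of the loss.
        "weights": [1.0] * len(positions) + [0.0] * padding,
    }


# Six real tokens followed by two padding tokens.
tokens = [12, 45, 7, 33, 90, 21, 0, 0]
example = mask_sequence(tokens, random.Random(0))

for key, value in example.items():
    print(key, value)
```

Keeping every output at a fixed length (here four label slots) is what makes the zero weights necessary: sequences with fewer masked tokens still produce same-shaped tensors, and the weights tell the loss which slots are real.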


Chapters (4)

00:00 Build a Transformer (BERT) from scratch
16:30 Code to pre-train a Transformer (BERT model)
20:41 Code to fine-tune a unique BERT model
26:19 Code BERT model inference on plain text