Pre-Train BERT from scratch: Solution for Company Domain Knowledge Data | PyTorch (SBERT 51)

Discover AI · Advanced · 🧬 Deep Learning · 3y ago


Key Takeaways

The video pre-trains BERT from scratch in PyTorch on company domain-knowledge data, using the Hugging Face datasets and transformers libraries, as the basis for fine-tuning the model for expert information retrieval systems.
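
As a rough orientation for this takeaway: "from scratch" means the model is built from a configuration with randomly initialised weights, instead of being loaded with from_pretrained. The following is a minimal sketch, assuming the transformers library and PyTorch are installed. The vocabulary size (20k) and maximum position (256) are the values mentioned in the transcript; the reduced hidden size, layer count, and head count are my own assumptions so that the sketch runs quickly, whereas the video keeps the defaults of 12 layers and 12 heads.

```python
from transformers import BertConfig, BertForMaskedLM

# Vocabulary size and maximum position follow the transcript (20k, 256).
# The remaining sizes are shrunk here (assumption) to keep the sketch fast.
config = BertConfig(
    vocab_size=20_000,
    max_position_embeddings=256,
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=512,
)

# Building the model from a config gives random weights: nothing is downloaded.
model = BertForMaskedLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"layers={config.num_hidden_layers} parameters={n_params:,}")
```

Raising num_hidden_layers, hidden_size, or vocab_size in this sketch grows the model and, with it, the compute time needed for pre-training.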

Full Transcript

Hello community, finally we are there: pre-training a BERT model for our SBERT Extreme. Have a look at this article: "The inability to fine-tune the models for company-specific data may result in lower than expected performance in AI." Now we will address this problem: we are going to pre-train a specific BERT model. Normally we would have a huge pre-trained BERT model, trained on billions of sentences, and then we just apply our little fine-tuning with, say, 100,000 sentences, so the main task was done in the pre-training. But who needs all of Wikipedia and all of the news from 2017 to 2019? We are going to take the top 500 R&D projects that will finish in 2023, across all disciplines of science, the most innovative projects, because I want to see where EU science is heading in 2023. We will build a pre-trained BERT Extreme model on just 50k sentences, and we will build our expert information retrieval system.

Hello community, now is finally the time: we are going to do the pre-training of a BERT model right from the beginning, because we have domain knowledge, and it is about outstanding EU R&D projects that will finish in 2023. I want to build a special expert information retrieval system, not a lexical search but a neural search engine, to find out about the R&D results that will come up in 2023. If you want to follow along, there is a Colab notebook for you; I leave you the link in the description. I follow an article from www.thepythoncode.com; this is the simplest article I could find.

Okay, here we go. We install the Hugging Face datasets library and the transformers library, and then we import everything. It is not perfect, but just follow the example. What is nice is that on Hugging Face you have something called datasets: it is a library, and they have, I don't know, hundreds and thousands of datasets already available for you to download. This is something beautiful; let me show you. You import the list of datasets from datasets, and they have 16,849 different datasets. Right now we download a specific dataset called CC-News, just to show you how easy it is to download. As you can see, it takes some time because it has millions of sentences, but the good thing is that we do not have to wait for it, because we are going to use a completely different dataset. So let's stop here and go on.

An alternative to the public datasets from Hugging Face is your own dataset. I have my own dataset on my Google Drive as a CSV file, and I just copy it over to my local directory. If you do not have a Hugging Face dataset but have your data in CSV, JSON, JSON Lines, text, or Parquet format (Parquet is what I would recommend), you should be able to use the load_dataset command: you simply give the format of your dataset and the path to your data files, and this is it. This is exactly what we are going to do. Yes, beautiful, the dataset is loaded.

Now we split it into a training dataset and a test dataset. Just for demonstration purposes I choose the test dataset to be extremely small, a fraction of 0.001: I have about 40,000 elements in my dataset, so my test dataset will have just 40 examples.
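
To make the split sizes concrete: in the notebook the split is done with the Hugging Face datasets library, while the library-free sketch below only illustrates what a test fraction of 0.001 means for roughly 40,000 sentences. The function name, the seed, and the placeholder sentences are assumptions for illustration, not the author's code.

```python
import random

def train_test_split(rows, test_size=0.001, seed=42):
    """Shuffle the rows and hold out a fraction `test_size` as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_size))
    return rows[n_test:], rows[:n_test]

# Hypothetical stand-in for the ~40,000 project sentences in the CSV file.
sentences = [f"project sentence {i}" for i in range(40_000)]
train, test = train_test_split(sentences, test_size=0.001)
print(len(train), len(test))  # 39960 40
```

With the more usual test fraction of 0.1, the same call would hold out 4,000 of the 40,000 sentences.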
So it is just for demonstration purposes; normally I would recommend about 90% for the training data and about 10% for the test dataset.

Now, for our tokenization, it is very easy. What we do... there are some things in here that do not belong in here at all, I'm shocked. So what we do: we split our Hugging Face dataset very simply into two text files, a training text file and a test text file. This is great. If you already have this and want to create the dataset, there is something beautiful called LineByLineTextDataset: you prepare a tokenizer, give the file path to your text dataset, define a block size, and you can create your specific dataset.

We go on, since we now have our training text file, and in there are all my sentences for my special EU projects. We follow along here from thepythoncode.com. As you can see, this is not as beautiful as what I showed you last time with the tokenization; this is, let's call it, the quick and dirty route. You remember, normally for the tokenizer you should have the normalization (the cleaning of the data), then the pre-processing, then you choose a specific model, and then you have a post-processing.

Now you can do it quick and dirty and say: I have some special tokens, and I have a vocabulary size; normally with BERT it is 30k, and I go here with 20k. The maximum sequence length of your input is normally 512 or 1024; for demonstration purposes only I go with 256. Please go at least with 512 if you do it for a company. And then truncation: if you have extra-long sentences that go beyond 256 tokens, should they be truncated, yes or no? For demonstration purposes I say yes, cut it off, because we do not have such extra-long sentences. Beautiful.

What we take now is our tokenizer; you know, we have a WordPiece tokenizer. Then we just train our tokenizer on the files we have just defined, with the vocabulary size we have set and the special tokens we have defined. So the quick and dirty version is: start the training of the tokenizer now. Yes, and if I do not execute the cell, I should not be amazed that the system does not know what I am coding. So here we go, training is done. For my model path I have a directory, and I say: please save this now to my pretrained-BERT directory. Let's have a look: in my pretrained-BERT directory we have our vocabulary file. In the vocabulary file we have our special tokens, beautiful, then our characters, and then, since we are in BERT, you can see the beautiful BERT prefix on the subword pieces. How far down can we go? 20,000; this is exactly the size of my vocabulary. If you do it for company knowledge, please go with 50,000 or 60,000, but I
want to be fast; I want to speed up the process just for demonstration. This is where you can optimize.

Beautiful. Next we define some of the tokenizer configuration file. This is easy, we have done it already: lowercase is true, we have defined our tokens, and the maximum sequence length is defined. You can of course define more in the tokenizer configuration file.

Now it is interesting, because you have more or less two options. Remember, we now have the tokenizer saved under my model path, with the config JSON and the vocabulary. Let's have a look at the config JSON: it is rather unimpressive, because it really just holds our special tokens and the maximum length. There is nothing specific hidden in there.

Last time I showed you that if you want the fast implementation, the Rust implementation, you simply take BertTokenizerFast from transformers and give it the tokenizer we have just defined. This is now our new tokenizer. Have a look at it to check that it is true: we have transformers, models, BERT, the fast BERT tokenization, and this is a BertTokenizerFast. Great. Now we say: please save it. Here we have chosen a different path, a different directory, and we go with saved model. So here we go, saved model, just update, and here we now have four files. We have our vocabulary file; you are not going to believe it, it should be absolutely the same as before. Yes, it is limited to, oops, come on, 20,000; yes, here we are, "translator", this sounds very reasonable. Then we have the tokenizer config, which is now in the form you know from my last video, and the special tokens map; you know this, so it is more or less redundant. Beautiful.

What else? We can check whether we really have the fast, Rust-optimized tokenizer: it is true. And if you look at the tokenizer now in
detail, you will see a summary: the vocab size is 20,000 tokens, is_fast is true, the padding side is right, the truncation side, and the special tokens: the unknown, separation, and padding tokens, the start token, and the mask token, because we are going to pre-train our BERT model on masked language modeling. Beautiful, so here we have our new tokenizer.

In the official Colab notebook (I give you the link in the description) they have a different way. They take BertTokenizerFast.from_pretrained with my model path; the model path is my pretrained-BERT directory, so we have the vocabulary and the config JSON, and from these they create the tokenizer. What I wanted to show you is that in this way you have more information available, because you are using a from_pretrained statement. They load everything we have, and then they say model config, BERT configuration. Now, we are just on the level of the tokenizer here, so it should not be there; because we call from_pretrained it now accesses some pre-trained configuration, and I think this is not the optimal way, but okay. You see, we have our tokens and our lowercase setting, but then we also have the hidden size from the BERT architecture, for example; then again the mask token that we need, which belongs to our tokenization; then the maximum position embeddings, which is a BERT architecture thing; then again token, token, token, and the vocabulary size. So we do have a little bit of a mixture. Okay, they go quick, fast, and dirty here with BertTokenizerFast.from_pretrained. I prefer my classical method, but you can go with whatever method you like, no problem at all.

Okay, beautiful, let's make some space here. Now comes the point where we have to encode. We have our 500 project descriptions, I don't know exactly anymore, and we now have to encode our sentences into tensors. There are two ways: you can encode with truncation, where you say: after 256 tokens, cut, and I
forget the rest of the sentence; or you can say: I want to encode without any truncation. You see, the mapping functions that tokenize the sentences with and without truncation are a little bit different. We apply the tokenizer to our text, of course, and then we say either truncation is true or not; we pad to the maximum length if we truncate; and of course we want the special tokens mask returned. This is what you are familiar with. Then we remove the columns we do not need, beautiful. And if you want, Hugging Face has a beautiful deep dive into language modeling. Then there is the main data processing function, which concatenates all texts from the dataset and generates chunks of the maximum sequence length. So there is nothing special about this; it is very simple code.

Yes, beautiful, let me show the result again. My training dataset, remember, had 39,000-something rows, close to 40k, and my test dataset was extremely small, with just 40. If you have a look at the training dataset, you see the features: our text, our input IDs, our token type IDs, our attention mask (which we need to differentiate between padding tokens and real tokens), and the special tokens mask; the number of rows is 40k. Great. So this was the tokenizer configuration.

Now, finally, we come to our BERT model. We can now design our BERT model, congratulations. In Hugging Face some classes are defined for us, and this is the beautiful class transformers.BertConfig. Here we have everything we need: we can define the vocabulary size, the hidden size, the number of hidden layers of BERT, the number of attention heads in the BERT layers, the intermediate size, the hidden activation, the dropout rate, the attention dropout probability, the maximum position embeddings, whatever you want. I have the link for you here: Hugging Face Transformers, BertConfig. Have a look at this; this is where the magic happens.

Again: the hidden size is simply the dimensionality of the encoder layers and the pooler layer. The number of hidden layers defaults to 12; we can set it to 24, whatever you like; it is the number of hidden layers in the Transformer encoder, the encoder stack. The number of attention heads also defaults to 12; it is simply the number of attention heads for each attention layer in the Transformer encoder. The intermediate size is the dimensionality of the feed-forward layer. The hidden act is a nonlinear activation function, and you can choose between gelu, relu, swish, or gelu_new. Hey, gelu_new, that one is new to me. Okay, so this is where all the magic happens, and we are going to do this now.

We initialize the BERT model with a configuration. We say our vocab size is exactly the vocabulary size of our tokenizer, and the maximum position embeddings equal the maximum length we also defined with our tokenizer. Then you can go on and set the number of hidden layers, the number of attention heads, and so on; we go with the defaults, 12 and 12. We go with the minimum configuration because I just want to show you how it works. So we define the BERT configuration, the model configuration. Let's have a look at the configuration we have created. Beautiful, here we go: the BERT model configuration, the design of our BERT model. The number of hidden layers is 12, the number of attention heads is 12, the model type is a BERT
model, yes; the maximum position is 256. The vocabulary size I reduced because I have only 500 EU research projects, and I do not need vocabulary about American presidents or the news cycle or whatever, so I go with 20k. Please try it out for your specific case and play around with this a little bit. It also gives me the actual transformers version, the intermediate size, the hidden size, the classifier dropout, and the attention dropout; you can set all of them. This is our BERT configuration.

And you are not going to believe it: we now use this BERT configuration for our BERT model. But we are now doing the pre-training of the BERT model, so there is nothing pre-trained that we can load, neither the architecture, which is now defined in our config, nor the weights. Normally, when you load BERT with from_pretrained, you also load the weights from the Hugging Face model. This is not going to happen here, because we are starting from scratch; there are no weights available.

The pre-training of the original BERT model in 2019 used two heads for two tasks: the masked language model (MLM) and next sentence prediction (NSP). I leave you the link to the original arXiv preprint in the description. We are not going to train on MLM and NSP: experience has shown that if you train on MLM alone, on masked language modeling, where you mask out words in a sentence, this does the main job of pre-training. And you are not going to believe it, there is already a class for us: it is called BertForMaskedLM, it takes the BERT configuration, and this defines our model when we start with pre-training. You will say: isn't this beautiful? Yes, it is. I know you are amazed.

Just so that you see: normally we have a BERT model, we say from_pretrained, and we take a BERT base model. None of this is available here. We do not download anything from Hugging Face; we define the BERT architecture
that we want to have: 24 layers, 256 layers, whatever you can afford in compute time, and we start right from scratch.

If you want to have a look at this, yes, why not: here is BertForMaskedLM. This is a transformers class that is defined for you; here are all the different parameters, and you can play around and optimize your model however you want.

Talking about your model: this is now our model. Maybe I need some more space on the screen; let's start from the very beginning. We have a BERT for masked language modeling. At first we have our embeddings, beautiful, and then we start with our encoder, and, you are not going to believe it, we have self-attention, yippee. So there is the first BERT layer, then the second BERT layer, and then we go up to, where is it, 9, 10, 11, BERT layer 11. Then comes the output, and then we have this additional training head. We do the pre-training of BERT on the masked language model, and this is our training head. As you can see, it is more or less a simple linear structure with an activation function, here GELU. Beautiful. You know everything about this, I do not have to tell you anything.

Just a reminder: the original BERT was trained with two heads. If you want to do this and train with two heads, next sentence prediction and MLM, there is also a predefined class for you on Hugging Face: the transformers class BertForPreTraining, which takes a config. It is a BERT model with the two heads already programmed for you; you just have to execute the command. Great.

So now that we know we have a specific head for masked language modeling: how do we do the masking? My God, how do we have to code this? No, everything has been taken care of, it is beautiful. So how do we do the masking for our task? There is something, maybe you know it,
it's called data collators. They are objects that form a batch by using a list of dataset elements as input, and they are beautiful. Some of them, and you are not going to believe it, like DataCollatorForLanguageModeling (yes, it exists), also apply some random data augmentation, like random masking, on the formed batch. Now isn't this a coincidence that we find this here? If you want to find out more, here is the link, Hugging Face data collator, where you get all the information. As you can see, DataCollatorForLanguageModeling is the specific class that we are going to use. I know, I know. I leave you the link in the description so you can have a look at this in detail.

We have a tokenizer, we have the masking task, and you can now define the amount of masking. Normally it was 15%; some say 18, some go to 20%. Let's go with 20%. It is really up to you to choose the amount of masking; of course it has some effect on accuracy, but also on compute time. What do we get back, what kind of object is this? Well, it has all our information: our vocabulary size, that the tokenizer is Rust-optimized, our mask, padding, separation, and unknown tokens, everything else, and our probability for the masking. Great. So we have now generated our data collator, which does the masking for us.

If you have multiple GPUs, please install accelerate. If not, like here, where we are working on a free Google Colab notebook with just one GPU, we go on to the training arguments. This is hell; there are so many training arguments, my goodness. I have here the most important ones you should know about. My model path... just hold on, where am I... my model path, yes, here we go; now I recognize it. Beautiful. Then, simply put, evaluation: yes, we allow it. The number of training epochs, you know this; if you are rich you can afford more. The per-device training batch size: 16, 32; let's try 32, I don't
know if it works. Then gradient accumulation, the per-device batch size, and the logging steps; I have put them really low, because, as I will show you, training takes quite some time and I want to see some results very soon. I save every 100 steps. You can activate auto-finding the perfect batch size, you can load the best model at the end, and of course you can define a limit on the total number of models you want to save. You know all this. Beautiful.

Let's now define our training arguments for the pre-training of BERT. "Default values will change in version 5", yes, beautiful. Just to have a look at the complete set, wait a second. So here we go: if you really want to optimize, for the Adam optimizer you have beta1, beta2, and epsilon; you can define a seed, the number of workers, disable this, evaluation delay, stop... Jesus, whatever you can think of, you have it here. The logging directory has runs and December in it; hey, this is today, I am giving away when I am doing this. Oh wow. Optimization; you can push the model to the Hugging Face Hub so that other people can use it; the save strategy; you can of course use a TPU, no problem; the warmup ratio; everything.

Okay, but here we go. Now this is almost the most important line, because now we pre-train our BERT model. We have our Trainer: we have defined our model, we have defined our arguments for the training, we have defined our data collator with the dynamic masking, we have a training dataset, and we have a test dataset. And you see, this is it. Congratulations. And then comes the moment: trainer.train(). Beautiful.

As you can see, here we go. Yes, yes, I don't need this. Have a look at this; we are just waiting for the first time estimate. I have the minimum vocabulary, 20k; this is the minimum, there is no way you can go below 20k. I have limited the number of epochs to just 10. You would need, I don't know, 1,000, 10,000, 100,000 epochs; well, you need a lot of data. The pre-trained models you can download from Hugging Face have been trained on millions of sentences, for days and weeks, on multiple TPUs. So the more data you have, the deeper your BERT model is, and the higher the dimensionality, the better your model performance is, of course. But this is just for demonstration. I don't know if you have a GPU, or maybe two GPUs, in your local system, or you can say: I have so much money, I can go to AWS and spin up a GPU cluster with, I don't know, 10, 12, 24 GPUs. Great, do it; it is unbelievable, it is fantastic.

But just to show you: if we do it here on a free Google Colab notebook with a GPU runtime (yes, I have a GPU activated), you see that even in my minimum configuration it is 4 hours and 41 minutes. I just want to let it run for some minutes to show you where we are. Ah, here we have 10 steps. Wow, our training loss is 9.1, so high, and the validation loss is 8.6. Given that we have such a limited validation set of sentences, this is not bad at all, but of course this is just the very first measurement. So I would say, let it run for 10 to 15 minutes, and let's see if we go in the right direction.

And then I would say this is it for the pre-training of your BERT model: you now know how to do the pre-training of a very specific BERT model on your specific dataset. Choose a scientific dataset, choose a legal dataset, choose whatever dataset you like. If you work for companies or clients, you can really optimize your system on the client datasets. But you have to be aware that the pre-training is really where the system gets its main knowledge, before you can fine-tune it. The pre-training is really intense; it is time-intensive and resource-intensive. If you have a GPU at home, maybe you can afford to let it run for 10 to 12 hours minimum, maybe even for 2 or 3 days, so that you get some good results even with a limited dataset. The amount of training you invest here is the critical element for the accuracy of the model. So let's wait some minutes, and I will be back with you in a second.

So, after close to 10 minutes, you see we now have the first 50 steps. If you look at the training loss: 9.1, 8.5, 8.1, 7.9, 7.6. Yes, it is going in the right direction. It means nothing yet, because we still have to do the training, but at least it is not diverging. And the validation loss, you are not going to believe this: 8.6, 8.1, 8.0, 7.6, 7.4. Yes, beautiful, it looks like it could be converging to our solution. But as you can see, it would take about 5 hours here in our simplest version on this free GPU.

I had a question from one of my viewers: if you want to recreate the original BERT model, what would the training cost? I can tell you, I have read an article: if you do the BERT training in the
traditional form, on Wikipedia and the English book corpus, on AWS, now at the end of 2022 or beginning of 2023, it would cost you between $1,800 and $2,000 of compute time on AWS. I have not done it; this is just information I received. So, to give you an idea: if you want to train the classical BERT model on Wikipedia, for a very general approach including the news and whatever, it costs you about $2,000 on AWS as a rough estimate. Please do not quote me on this; it is just a guideline. If of course you have a GPU or a TPU system at home and you just have to pay the electricity bill, well, even better.

Hey, step number 60; yes, it is still going in the right direction. Beautiful. I think we should stop the demonstration here; who knows what is going to happen with the system. I hope you enjoyed it a little bit. We now have a model that is trained in about 5 hours and that you can use as your specific BERT Extreme model, trained here on EU research that will be published in 2023. We have the project descriptions, so we know which topics will emerge in 2023 in R&D in Europe. I will leave you the links to everything I told you in the description of this video, and I hope to see you in my next video.

Just to finish up: if we now have a BERT model pre-trained on European research topics, you just duplicate it. You remember, in SBERT, our sentence transformers, we have two identical BERT models with connected weights, and then you just train your SBERT model. There are just three lines of code to build your SBERT model from your BERT model. Then you train your SBERT model, let's say, on the American research and development projects that will be executed in 2023 or 2024, and you compare the European R&D sector to the American R&D sector in your specific domain area and with your domain knowledge. So you have created an expert
model specifically for some, no, for domain-specific knowledge, and you can now build an information retrieval system: a neural system that is not a lexical system but a semantic, neural information network. Congratulations.
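
The difference between lexical and neural search mentioned at the end of the transcript can be sketched without any model: sentences become vectors, and retrieval ranks documents by cosine similarity to the query vector. This is a minimal, library-free sketch. The mean pooling step is the common SBERT-style choice and is an assumption here, and the 3-dimensional token vectors and document names are made up for illustration; in a real system they would come from the pre-trained BERT model.

```python
import math

def mean_pool(token_vectors):
    """Average the token vectors into one sentence vector (mean pooling)."""
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / len(token_vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, corpus):
    """Return (score, doc_id) pairs sorted by descending cosine similarity."""
    return sorted(((cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items()),
                  reverse=True)

# Toy token vectors standing in for real BERT outputs (hypothetical values).
corpus = {
    "battery project": mean_pool([[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]),
    "genomics project": mean_pool([[0.0, 0.2, 0.9], [0.1, 0.0, 0.8]]),
}
query = mean_pool([[0.8, 0.2, 0.1]])
best_score, best_doc = search(query, corpus)[0]
print(best_doc)  # battery project
```

No word of the query has to appear in the best document; the ranking depends only on how close the vectors are, which is what makes the search semantic.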

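The dynamic masking that the transcript delegates to the data collator can be sketched in a few lines of plain Python. The sketch assumes the scheme of the original BERT recipe, which to my knowledge is also the default behaviour of DataCollatorForLanguageModeling: each non-special token is selected with the masking probability (20% in the video), and a selected token is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The function name and the token IDs are hypothetical.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, special_ids,
                mlm_probability=0.2, seed=None):
    """Return (inputs, labels) for masked language modeling.

    Labels are -100 (ignored by the loss) except at the selected positions,
    where they hold the original token ID.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; select others with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # 80%: mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels

# Hypothetical IDs: 2 = start token, 3 = separator, 4 = mask token.
ids = [2] + list(range(100, 130)) + [3]
inputs, labels = mask_tokens(ids, mask_id=4, vocab_size=20_000,
                             special_ids={0, 1, 2, 3, 4}, seed=0)
print(sum(label != -100 for label in labels), "of", len(ids), "tokens selected")
```

Because the selection is redone every time a batch is formed, the same sentence is masked differently in each epoch, which is why the masking is called dynamic.
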
Original Description

We pretrain a BERT (Bidirectional Encoder Representations from Transformers) model from scratch in PyTorch, on domain specific data (eg confidential company data). We code in Python to train an optimized Tokenizer for our data, design a BERT architecture from scratch and start pre-training of BERT with a masked Language Model Head (MLM). We define the vocabulary size according to our needs (from 8K to 60K), define the depth of our BERT architecture (eg 96 layers) and train days on (a single) GPU for our domain specific knowledge encoding. BERT :: Bidirectional Encoder Representations from Transformers is a transformer-based machine learning technique for natural language processing. With an advanced BERT model (pre-trained on our special texts) we can then build a SBERT model (Sentence Transformers) for a Neural Information Retrieval (IR) system. official Links to my sources (all rights with them): https://www.thepythoncode.com/article/pretraining-bert-huggingface-transformers-in-python !! COLAB to follow along: https://colab.research.google.com/drive/1An1VNpKKMRVrwcdQQNSe7Omh_fl2Gj-2?usp=sharing #sbert #ai #naturallanguageprocessing
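
The knobs named in this description (vocabulary size and depth) translate directly into model size. The following is a small sketch of the parameter arithmetic, assuming the standard BERT encoder layout with a pooler and without the MLM head. The function name is my own, and the defaults are the usual BERT-base values rather than the video's settings.

```python
def bert_param_count(vocab_size=30522, hidden=768, layers=12, intermediate=3072,
                     max_positions=512, type_vocab=2, pooler=True):
    """Count the weights of a BERT encoder, biases and LayerNorms included."""
    # Word, position and token-type embeddings plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Query, key, value and output projections plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward projections plus one LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + 2 * hidden)
    total = embeddings + layers * (attention + feed_forward)
    if pooler:
        total += hidden * hidden + hidden
    return total

print(f"{bert_param_count():,}")  # 109,482,240 (BERT-base)
print(f"{bert_param_count(vocab_size=20_000, max_positions=256):,}")  # 101,204,736
```

Shrinking the vocabulary from about 30k to 20k and the positions from 512 to 256 saves only about 8 million parameters, because most of the weights sit in the 12 encoder layers; the number of layers and the hidden size are the settings that dominate model size and compute time.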

Playlist

Uploads from Discover AI · Discover AI · 11 of 60

← Previous Next →

Step Into the Unknown (by YouChat) - May 2023 be your best year yet

Step Into the Unknown (by YouChat) - May 2023 be your best year yet

Wishing you all an amazing 2023 filled with Love, Laughter, and Happiness!

Wishing you all an amazing 2023 filled with Love, Laughter, and Happiness!

Create a Smarter Future!

Create a Smarter Future!

The Art of Text to Vector Transformation: A Comprehensive Look at AI and NLP Transformers

The Art of Text to Vector Transformation: A Comprehensive Look at AI and NLP Transformers

Feature Vectors: The Key to Unlocking the Power of BERT and SBERT Transformer Models

Feature Vectors: The Key to Unlocking the Power of BERT and SBERT Transformer Models

Domain-Specific AI Models: How to Create Customized BERT and SBERT Models for Your Business

Domain-Specific AI Models: How to Create Customized BERT and SBERT Models for Your Business

Achieve Unimaginable Levels of Domain Knowledge through SBERT Extreme in 3D (SBERT 48)

Achieve Unimaginable Levels of Domain Knowledge through SBERT Extreme in 3D (SBERT 48)

Unlocking Scientific Domain Knowledge w/ BPE Tokenizer: An Amazing Journey! (SBERT 49)

Unlocking Scientific Domain Knowledge w/ BPE Tokenizer: An Amazing Journey! (SBERT 49)

SBERT Extreme 3D: Train a BERT Tokenizer on your (scientific) Domain Knowledge (SBERT 50)

SBERT Extreme 3D: Train a BERT Tokenizer on your (scientific) Domain Knowledge (SBERT 50)

Discover Vision Transformer (ViT) Tech in 2023

Discover Vision Transformer (ViT) Tech in 2023

Pre-Train BERT from scratch: Solution for Company Domain Knowledge Data | PyTorch (SBERT 51)

Pre-Train BERT from scratch: Solution for Company Domain Knowledge Data | PyTorch (SBERT 51)

Flan-T5-XL model on a free COLAB | A free LLM - that explains itself w/ reasoning /write essay | AI

Flan-T5-XL model on a free COLAB | A free LLM - that explains itself w/ reasoning /write essay | AI

BERT and GPT in Language Models like ChatGPT or BLOOM | EASY Tutorial on Large Language Models LLM

BERT and GPT in Language Models like ChatGPT or BLOOM | EASY Tutorial on Large Language Models LLM

Free Alternative to ChatGPT: Flan-T5-XL GUI (open-source) #shorts

Free Alternative to ChatGPT: Flan-T5-XL GUI (open-source) #shorts

From T5 to T5X: A Game-Changing Evolution with JAX & FLAX

From T5 to T5X: A Game-Changing Evolution with JAX & FLAX

How to start with ChatGPT? | Short Introduction to OpenAI API #shorts

How to start with ChatGPT? | Short Introduction to OpenAI API #shorts

The Future of Conversational AI? Google's PaLM w/ RLHF | LLM ChatGPT Competitor

The Future of Conversational AI? Google's PaLM w/ RLHF | LLM ChatGPT Competitor

Microsoft and ChatGPU

Microsoft and ChatGPU

From Zero to FLAN-T5 XL Model GUI with Gradio: A Step-by-Step Guide on Free COLAB Notebook PyTorch

From Zero to FLAN-T5 XL Model GUI with Gradio: A Step-by-Step Guide on Free COLAB Notebook PyTorch

Google's 2nd Answer to "BING ChatGPT": Sparrow | after BARD w/ LaMDA | 2nd Gen Conversational AI

TF2: Pre-Train BERT from scratch (a Transformer), fine-tune & run inference on text | KERAS NLP

3D Visualization for BERT: How to Pre-Train with a New Layer & Fine-Tune with Downstream Task Layer

FLAN-T5-XXL on NVIDIA A100 GPU w/ HF Inference Endpoints, let's explore 11b models!

ChatGPT - Can it Lie to you?

ChatGPT Alternative: Perplexity by Perplexity.AI

2023 KerasNLP Tutorial: Explore Latest KERAS Toolbox & NLP Processing Library for BERT - TF2

Self-aware AI: You.com/chat vs Perplexity.ai | Live Demo, LLMs show Future of ChatGPT w/ BING

BLOOM 176B Inference on AWS | Bigger than GPT-3 for more Power!

Fine-tune ChatGPT? Buy Embeddings /OpenAI? What are Embeddings? My own ChatGPT? | Visual Q+A

Unleashing the Power of BLOOM 176B with AWS ml.p4de.24xlarge, DJL & DeepSpeed: The Ultimate Boost!

After ChatGPT: NEW BioGPT by Microsoft | Do YOU trust Microsoft for your Medication?

Improve ChatGPT: Modular, Adaptive, Smart LLM | Inside ChatGPT

Fine-tune ChatGPT w/ in-context learning ICL - Chain of Thought, AMA, reasoning & acting: ReAct

The Intersection of Copyright Law and Human Faces: Exploring Virtual K-Pop with MAVE

New TECH: Vision Transformer 2023 on Image Classification | AI

PyTorch code Vision Transformer: Apply ViT models pre-trained and fine-tuned | AI Tech

New BING ChatGPT: Unlock the Power of Emotions in your Search Engine!

New BING ChatGPT loses its mind

Self-Attention Heads of last Layer of Vision Transformer (ViT) visualized (pre-trained with DINO)

Visualizing the Self-Attention Head of the Last Layer in DINO ViT: A Unique Perspective on Vision AI

Microsoft strongly restricts access to ChatGPT on new BING - WHY?

PyTorch ViT: The Ultimate Guide to Fine-Tuning for Object Identification (COLAB)

New BING Chat AGGRESSIVE

Panoptic Image Segmentation: Mask2Former explained | Identify all objects!

Code Panoptic Image Segmentation w/ Vision Transformer & Mask2Former - A PyTorch tutorial

Dream Job Alert: AI Prompt Engineer - $335K | AI Prompt Design: A Crash Course

Streamlining Similar Image Detection with ViT in PyTorch: A Step-by-Step Guide

Microsoft's CEO in Trouble #shorts

Why wait for KOSMOS-1? Code a VISION - LLM w/ ViT, Flan-T5 LLM and BLIP-2: Multimodal LLMs (MLLM)

OpenAI's ChatGPT can NOW summarize external Sources on the Internet?

ChatGPT polarizes

Hospital /Clinic AI Decision Models: Performance of 12 AI LLM Systems (incl $$) Radiology, Biomed

ChatGPT Prompt Engineering w/ in-context learning (ICL) - 7 Examples | Tutorial

Chat with your Image! BLIP-2 connects Q-Former w/ VISION-LANGUAGE models (ViT & T5 LLM)

ChatGPT: Multidimensional Prompts

ChatGPT: In-context Retrieval-Augmented Learning (IC-RALM) | In-context Learning (ICL) Examples

Code your BLIP-2 APP: VISION Transformer (ViT) + Chat LLM (Flan-T5) = MLLM

Buy Microsoft "Azure OpenAI Service" or buy from OpenAI its API for ChatGPT access & tuning?

Pretraining vs Fine-tuning vs In-context Learning of LLM (GPT-x) EXPLAINED | Ultimate Guide ($)

Reversible Transformer: ReFORMER for GPU Memory Optimization! Reversible Residual Layers?

This video shows how to pre-train BERT from scratch in PyTorch on company domain knowledge data, so that the resulting model can later be fine-tuned for an expert information retrieval system. It covers the basics of BERT pre-training, defining a BERT architecture from scratch, and training a WordPiece tokenizer with a chosen vocabulary size and set of special tokens.
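
As a rough illustration of the tokenizer step, the sketch below trains a WordPiece tokenizer with the Hugging Face `tokenizers` library. The three-sentence corpus is a made-up stand-in for the domain data, and the choice of `BertWordPieceTokenizer`, `min_frequency=1`, and the exact special-token list are assumptions rather than the settings used in the video. Only the 20,000-token vocabulary size comes from the lesson.

```python
from tokenizers import BertWordPieceTokenizer

# Hypothetical stand-in for the domain corpus (one sentence per entry).
corpus = [
    "The project develops new battery materials for electric aircraft.",
    "Researchers study quantum sensors for medical imaging.",
    "The consortium builds a neural search engine for research results.",
]

# Special tokens that BERT-style models conventionally reserve (assumed list).
special_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train_from_iterator(
    corpus,
    vocab_size=20_000,   # upper bound; a tiny corpus yields far fewer tokens
    min_frequency=1,     # keep rare word pieces because the toy corpus is small
    special_tokens=special_tokens,
    show_progress=False,
)

# Encode a new sentence with the freshly trained vocabulary.
encoding = tokenizer.encode("Battery materials for aircraft")
print(encoding.tokens)
print(tokenizer.get_vocab_size())
```

On a real corpus of tens of thousands of sentences the learned vocabulary would approach the requested size. With the toy corpus it stays very small.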

Key Takeaways

Install the Hugging Face datasets and transformers libraries
Import the necessary libraries
Download the public CC-News dataset (shown only to demonstrate how Hugging Face datasets are fetched)
Load your own data from a CSV file (here copied from Google Drive) with load_dataset
Split the dataset into training and test sets
Define and train a custom WordPiece tokenizer with a vocabulary size of 20,000 tokens
Pre-train BERT from scratch using PyTorch
Encode sentences into tensors, with or without truncation, using the custom tokenizer
Create a BERT model configuration with 12 hidden layers, 12 attention heads, and a vocabulary size of 20K
Set the maximum position embeddings to 256

💡 Pre-training BERT from scratch can be done with PyTorch and the Hugging Face datasets and transformers libraries; fine-tuning the pre-trained model afterwards can improve its performance on domain-specific data.
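
For readers who want to see what the pre-training objective does to a sentence, here is a library-free sketch of the masked language modeling rule from the original BERT recipe: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. The function name `mask_tokens` and the token ids are hypothetical, and whether the video uses exactly these ratios is an assumption. In practice a data collator from the transformers library would normally handle this step.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, special_ids,
                mlm_probability=0.15, rng=None):
    """Return (inputs, labels) for masked language modeling.

    labels hold the original id at selected positions and -100 elsewhere,
    the index that PyTorch's cross-entropy loss ignores by default.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never select special tokens; select the rest with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels


# Toy sequence: [CLS]=2, 100 ordinary tokens, [SEP]=3 (ids are made up).
ids = [2] + list(range(10, 110)) + [3]
inputs, labels = mask_tokens(
    ids, mask_id=4, vocab_size=20_000,
    special_ids={0, 1, 2, 3, 4}, rng=random.Random(0),
)
print(sum(label != -100 for label in labels), "positions selected for prediction")
```

The model is trained to predict the original token only at the selected positions, which is why unselected positions carry the ignore index.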

More on: LLM Foundations

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Related AI Lessons

Want to get started with deep learning

Get started with deep learning by leveraging resources like Andrej Karpathy's playlist and frameworks such as TensorFlow or PyTorch

Reddit r/deeplearning

Building a Deepfake Detector From Scratch — What Nobody Tells You

Learn to build a deepfake detector from scratch and understand the challenges involved in detecting AI-generated fake media

Medium · Deep Learning

Unfolding the Meandering Path: High-Dimensional Invariance and the Flat 2D Plane of Neural…

Learn about high-dimensional invariance and its relation to the flat 2D plane of neural networks, and how to apply these concepts to improve model performance

Medium · Deep Learning

Implementing Neural Style Transfer from Scratch: The Project That Started It All

Learn to implement Neural Style Transfer from scratch and understand its significance in deep learning

Medium · Deep Learning

Image Classification with ml5.js

The Coding Train