Deploy and Use any Open Source LLMs using RunPod

AI Anytime · Beginner ·🧠 Large Language Models ·2y ago

Key Takeaways

This video tutorial demonstrates how to deploy and use open-source large language models (LLMs) on RunPod, a GPU rental service. It covers model selection, pod configuration from the dashboard, and programmatic deployment, and touches on fine-tuning and serverless endpoints. The tutorial uses Hugging Face, the Gradio-based text-generation-webui, and LangChain, with models such as Mistral 7B Instruct and the ELYZA Japanese Code Llama model.

Full Transcript

Hello everyone, welcome to the AI Anytime channel. In this video, we're going to see how we can deploy a large language model on RunPod. If you don't know what RunPod is, it's a GPU provider, or to be more specific, a GPU rental provider, that helps you get up and running with large language models. If you don't know how to work with or deploy an LLM, it can be very difficult to select an open-source model and then configure and deploy it on clouds like AWS SageMaker, a machine learning platform, or GCP Vertex AI. What RunPod does is make this easy to set up, and it's really affordable. If you look at the GPU rates, I think it's among the top two or three in the world. There are competitors to RunPod like Lambda Labs, Vast.ai, and SageMaker, but RunPod is one of the most affordable GPU rental providers. You can spin up a GPU of your choice and work with large language models. One option is serverless deployment, where you don't have to worry about the servers; they set everything up for you, and you can just spin up a template and work with that. The other is fine-tuning, which RunPod also supports through different tools: Text Generation Inference, oobabooga, and a lot of other tools you can use, such as Unsloth and LLaMA-Factory. There are different tools for both purposes, deploying and fine-tuning large language models, and RunPod helps you with all of them. It's one of my favorite GPU providers as well; for my fine-tuning and training, I basically use RunPod. I'll show you how easy it is to deploy a large language model so you can play around with it for your use cases. You can take a use case and experiment with it in just 5 minutes. So let's see how we can
do it here. If you look at my screen, I am on the Mistral 7B Instruct model. Mistral is a large language model created by a French startup named Mistral AI. They are making a lot of noise and doing great stuff, and OpenAI can sense the heat; there's a bit of a furor going on between open source and closed source. You can see I'm going to use their 7B Instruct v0.2 model; there is a v0.1 and a v0.2. I also have a Japanese LLM, because sometimes you want to play with LLMs for different languages like Japanese, Italian, French, Mandarin Chinese, or whatever. You can take any model from Hugging Face; it will be just as easy, and I'll show you how I'm going to do that. You can see I already have a pod running. RunPod provides GPU pods and CPU pods. Depending on how big your model is, you don't always need a GPU; for example, if you want to use a GGUF model, you can just spin up a CPU pod. You don't need a GPU every time, that's what I'm trying to say. Now that you have identified a model, let's see how we can start a pod. I'll first show you what I have already done. If you expand this, this is how it looks. I am running a template already created by RunPod called TheBloke LLMs, as you can see over here. So we are using TheBloke template on RunPod; TheBloke is famous for creating quantized LLMs on Hugging Face. You can see I have 5 GB of disk space left after everything I have done, and a 100 GB pod volume. There are two things here: one is the volume and the other is the container. The container is not persistent, so every time you exit or stop the pod, everything within the container will be deleted. The volume is persistent, so for example, if you download files or load files, keep them in that volume, and
if you stop the pod, that will be deleted as well. Excuse me, sorry, it will not be deleted. The container will get deleted, but the volume will still be there; the volume is persisted in its directory. You can see I have an A100 GPU, shown as 1 x A100 80 GB (the A in A100 stands for Ampere, by the way, if you don't know), and I'm using 80 GB of VRAM, 16 CPUs, and 125 GB of RAM. That's good enough for a 7B model. You can see I'm using Secure Cloud. There are two types of cloud on RunPod: one is Secure Cloud and the other is Community Cloud. Community Cloud is more affordable because it's community, so it's more interruptible: the same cloud clusters can be used by different users, so if there's more load, you might face some interruptions in your work. Secure Cloud is pretty much self-explanatory: everything else remains the same, but you have private access to the particular pod you are using. Right now it says Exited, because I have exited the pod and I'm not using it, but I will spin it up. You can see it says start for 1.89, so I'm paying around $1.89 per hour, which is around 150 rupees or so in Indian currency. What you have to do is click on Start. Once you click on Start, you can see I have 1 x A100 80 GB. I'm going to start the pod, and you can see it shows the container logs (start container, stop container), so you can see all the logs over here. I'm first showing you a live demo of how we do this, and then we'll create a new pod; I'll show you how you can do that, and how to do it programmatically as well. You can see that it shows pod utilization, pod uptime, and disk utilization; you can
see how much storage I'm already occupying with the files and whatnot, and you can find all the details over here. Then you can also go to Connect. Before that, let me show you a few things. Under More Actions you can edit the pod, so you can increase the volume sizes and so on, but that will cost you a bit more money per hour. Now let me go to Connect. Once you click on Connect, you will see a few things appear over here: one is called Connect to HTTP Service, port 5000, and the other is HTTP Service, port 7860. I'm already using a tool, TheBloke template, which provides me a Gradio application to work with, and that's why you can see it says Connect to HTTP Service, port 7860. Once you click on that, it opens; you can see the URL uses a port and your pod ID, followed by proxy.runpod.net. You'll see a Gradio application on the screen, and you can find my past chats. You can see this is the chat, and I'll show you which model I've been using in this example. This is a Gradio application, and the first tab is called Chat. When you come in for the first time, you will have no model loaded, so you have to go to the Model tab. In the Model tab you can see I have None right now, but I have already loaded ELYZA-japanese-CodeLlama-7b-instruct. ELYZA is a Japanese model that has been fine-tuned on Llama 2, and you can see I'm using their Code Llama 7B Instruct model. This is also where you can download a model or a LoRA. Now imagine you want to bring this Mistral model into the RunPod Gradio app that you are using. The app is text-generation-webui, if you don't know; I'm typing it wrong, but the search will get it right. If you look at this, this is what we are using here. It has been integrated into that template: text-generation-webui by oobabooga. Okay, tough to pronounce, by the way, but fantastic
work that they have done for the open-source community. This has been integrated into a template; RunPod has created different containers and images for you, so you can just spin one up and be ready to use it. Here you can see the download box for a model or LoRA. When you come in for the first time, you will have nothing over there, so just come here and paste the model path. If you are using a quantized model, in this case a GGUF model, you have to give the entire Hugging Face repository path and also the file name of the GGUF file. Let me show what I mean by that. Let's go to TheBloke's Hugging Face repository and take a model. Which model can we take? Let's take a famous one: CodeLlama 70B Python GGUF, one of the latest models by Meta AI. If you want to work with the CodeLlama 70B Python GGUF model, you can take any one of these files. For example, if you want to use the medium 4-bit quantized one, you can copy its name and give it in this file name field. Basically, you have to define the file name so that it downloads that GGUF file, and then just hit Download. Once you hit Download, the model will get downloaded into the persistent volume. That is how you download a model. I'm not doing it right now; I'll just show you how to load one. Once you download a model for the first time, you have to do a refresh, and the model will be available here. In my case, you can see the Japanese Code Llama model is visible, so let's click on that and click Load. Once you click on Load, you can see it says loading model. Sometimes RunPod has this issue where connectivity might be a problem. Okay, now the settings have been updated. When you come to Chat, you can see I already have a couple of chats that I did previously.
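The repository path and file name that the download box asks for correspond to how Hugging Face addresses individual files. The sketch below is only an illustration of that mapping, assuming the usual `resolve/<revision>/<file>` URL layout on huggingface.co; the helper name `gguf_download_url` and the GGUF file name in the example are hypothetical, so copy the real file name from the repository's file list.

```python
def gguf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for one GGUF file in a Hugging Face repo."""
    # A repo id has the form "owner/model", e.g. "TheBloke/<model>-GGUF".
    if repo_id.count("/") != 1:
        raise ValueError("repo_id should look like 'owner/model'")
    # GGUF repos hold several quantization levels, so one file must be named.
    if not filename.lower().endswith(".gguf"):
        raise ValueError("expected a .gguf file name")
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


# Example values: the file name is an assumption, not taken from the video.
example_url = gguf_download_url(
    "TheBloke/CodeLlama-70B-Python-GGUF",
    "codellama-70b-python.Q4_K_M.gguf",
)
print(example_url)
```

If the `huggingface_hub` package is installed, `hf_hub_download(repo_id=..., filename=...)` takes the same two values and fetches the file into the local cache.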
You can see I have some code here; let me copy this code. I'm going to write: "Below is my code. Write the code documentation in Japanese language." Once you click on Generate, if the model is not loaded, it will throw an error, so we'll see. It says the AI is typing, and you can see the reply: "I have written the code documentation in Japanese for your program," and it's giving you the output. Here it is, the code documentation, blah blah blah; you can see it's generating that. Very, very simple. This is extremely helpful if you want to set up a large language model for your use case. It's a Gradio application, text-generation-webui by oobabooga, and it's that easy to deploy and test out. It also has a lot of customization: you can do a bit of prompt engineering, you can choose which mode you want (chat, chat-instruct, or instruct), and you can change the chat style, themes, and all of those things over here. I'm not that worried about those. The good thing about text-generation-webui is that you can also fine-tune. If you look at the Training tab (I'll show that in the next video; I'm also creating a video on how to use this to fine-tune), you can bring your own data and fine-tune on RunPod with ease. You don't have to worry about a lot of infra and configuration. It has a Notebook tab as well, with Markdown, HTML, tokens, and so on, and a Default tab; there is a lot of configuration you can do. But the Model tab is the important part. In the Model tab, you can see it supports Transformers and llama.cpp. For example, if you don't want to go with an A100 GPU because you can't afford that much compute power, you can go ahead with llama.cpp or ctransformers, created by Marella, a fantastic framework to work with. You can select any
of the other options as well, but I'm going to go with the base Transformers loader for this model. You can configure loading in 8-bit or 4-bit, double quantization, the quantization type, and things like that. You can use Flash Attention 2 for Mistral and others, but for Phi-2 it's not enabled yet as I'm creating this video. So you can configure how you want to deploy and use the model. Now, how can you do this programmatically, and also from the pod dashboard? First, let me close this. I'll stop the pod. Once I stop it, they charge only for volume storage, which is about 0.028 per hour. You can see I have now stopped that pod, and if I open the app again, you will see it does not work; it says not found. So you can fully control it. You can also use an API, and I'll show that programmatically. Let me open Google Colab quickly to show you how you can do this. I won't run the program because it will cost me a bit of money, but I'll show you how to do it programmatically; it's very easy. I'm going to create a new notebook, and I'll show you there. But the first thing I'm going to show you is how you can set this up on your own. That's important, because I already have a setup for my experiment, but what if you want to do it on your own? Let me first show you what you have to do. You have to click on GPU Pod. Let me walk through the left-hand side. If you come to Home, you will see an overview of your dashboard for whatever you have done. Then you have Explore, where you can explore RunPod templates and search. Let's see how the search works... okay, something happened, it's fine. I'm going to go to Pods, where you saw all the pods. Then you have Serverless, and Serverless is important because that's what I was talking about. For example, if you quickly
want to deploy an LLM and get a custom endpoint that you can use as an API in LangChain or with the requests module in Python, this is how you can do that. It supports these models: Llama 13B is the only large language model it supports right now, but it has a lot of Stable Diffusion options, including Automatic1111, Stable Diffusion XL, and Stable Diffusion 1.5. It also has Faster Whisper for speech-to-text, which is based on CTranslate2, so it's extremely fast. They're coming up with other models very soon. If you click on Start, it's very easy to deploy, so I'm not going to do that here. You can create an endpoint that is completely serverless, so you don't have to worry about setting up a virtual machine or installing CUDA and things like that; they will do everything for you. Just click on Start, wait a few minutes, and use it as an API endpoint. What else do we have? We have Storage, where you can create a new network volume. We have Templates, where you can create new templates. You have Secrets, Settings, and Billing; I'm not going to go through all of those. Now, when you click on GPU Pod, you can see it says choose template; we'll see that. Here you have the latest generation and the previous generation of GPUs. The A100 is a previous-generation GPU, Ampere. In the latest generation we have the H100, the RTX 4090, and the RTX 6000 Ada. You can see the new generation with all the pricing listed, depending on what pricing you are going ahead with. You can also find the previous generation; for example, if you want to deploy on an A100, you can do that. For bigger LLMs you might need two or three A100s, and you can see it supports a maximum of 8 A100s. Now let me click on Deploy and see what it does. Once you click on Deploy, you can see a bit of infra configuration on the left-hand side, and on the right-hand side you
can select a template. If you don't want to select any template and want to go ahead with PyTorch as the base template, as a backend to work with language models or large language models, you are free to do that as well. You can see that by default it selects PyTorch 2.1. Here you can search, scroll down, and you will see something called TheBloke; you have to pick RunPod TheBloke LLMs. Then you can do a bit of customization. You can see mine is not interruptible because I'm using Secure Cloud. You can also increase the container disk and volume disk sizes later. Then you have to click on Continue, but I'm not going to do that because it's going to cost me, and I have already shown you my previous pod. Let me go to Pods; once you do that, you'll see something like this appear. That is how you do it from the dashboard. Now, how can you do it programmatically? I think that should be your area of interest, so let me show that quickly. I'm not going to run it, but I'll give you the program as a GitHub gist or GitHub repository. Let me call the notebook RunPod setup. Here, you have to install runpod, so you do pip install runpod. Then you can write the imports. You do import os, because you want to set up the API keys. While that installs, I'm going to do import runpod, and then from IPython.display import display and Markdown. These are the things that I need. Then I will add some more code cells, and this is how you set the key: you do os.
environ, and you can keep the key in a secret or in a .env file, or you can use getpass to read it: RUNPOD_API_KEY, and then your API key. That is how you do it. And how will you get the API key? If you go to Secrets, you will see I don't have any secret; you would have to create one. The API key itself is in Settings. Under API Keys you can see I have an API key, which I will delete now. Here you can create an API key: click on API Key and give it read or write permission, depending on what kind of workflow you want to manage. Now let me run this. Once you run it, it says invalid syntax, because I forgot a comma over here. Let's fix that, and now it should work. Okay, it says no module named IPython. That is surprising... it's because I forgot to write IPython with a capital P, so that's a typo. Now os.environ is done. What you do next is, maybe you can also do it like this: runpod.api_key equals, and you get the RunPod API key with os.
getenv. You can do that too: RUNPOD_API_KEY, and then you can pass your key here as the default. I'm just keeping a placeholder here; you would put your own API key, but this is fine, and you can understand how to pass an API key. Now you can add an if/else: if there is no RunPod API key, raise an error saying it was not found, something like that. Next is how you can get a GPU. For example, if you want to find all the GPUs visible to you, what kinds of GPU you can use, you can do runpod.get_gpus; you can see it suggests get_gpus here. I'm not running it because I haven't set up the keys. Now I'll show you how you can create a pod, and I'll link the GitHub gist in the description. To create a pod, there is a function, runpod.create_pod, so you can create a pod programmatically. If you don't have access to the RunPod dashboard, this is how you can do it programmatically, and you can use it through LangChain when you want to run inference. So, runpod.create_pod, and then you can define a lot of things. First is the name. The name is a string, for example Llama 2 13B; you can take the name from the repository name. Then the image name; you give an image name here, which is basically a container image, and it comes from the Text Generation Inference Docker image. Then you have the GPU type. You give it as the GPU type ID, not the GPU type, just the ID. For example, for an A100 you write something like NVIDIA A100 80GB PCIe; that's the ID, and you'll get it when you run the get_gpus command. Then you can set the cloud type, whether you want secure or community. So you do cloud_type equals, and here you write SECURE. Let me just write it over here:
SECURE for the cloud type. Then you can also have Docker arguments and things like that; if you want to customize more, you can pass docker_args. I'll give this to you in my repository, and you can get it from there. If you are deploying a bigger model, like 13B, 30B, 40B, or 70B, you also have to consider how many GPUs you need. For example, on A100s for a 13B model, if you're not using GGUF, you might need two GPUs. So, gpu_count. Then you can also set volume_in_gb and the rest; you can see autocomplete suggests them. For volume_in_gb, say I need around 195. Then container_disk_in_gb; I'll say 10. Then you can also define ports. If you are running Gradio, you have to give something like 80/http, and I also need, for example, 29500/http. So you can define ports. Last, you set where you are going to mount the volume, so let's do volume_mount_path, and I'll set this to /data. That is how you can create a pod: you define an entire configuration for your RunPod pod, and once you run it, it will create a pod that will be visible here inside Pods. So you can do it two ways: programmatically, or directly from the dashboard. The dashboard is much easier, just clicking and selecting some options. Programmatically, you write programs and control it that way. Once you do that, how do you talk to the inference server? LangChain has an integration for that. From langchain.llms, you import HuggingFaceTextGenInference. I don't have LangChain installed here, which is why it's not autocompleting, but it's HuggingFaceTextGenInference; they have text-gen inference and you can just use that. Then you define an inference server URL.
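The API-key handling and the `runpod.create_pod` arguments walked through above can be collected into one small sketch. This is not the gist from the video: the helper names `build_pod_config` and `create_pod_if_configured` are hypothetical, and the image tag and the `--model-id` Docker argument are assumptions about a Text Generation Inference image. The values for GPU type ID, cloud type, volume size, container disk, ports, and mount path are the ones mentioned in the walkthrough. The billable call only runs when `RUNPOD_API_KEY` is set.

```python
import os


def build_pod_config(model_id: str, gpu_count: int = 1) -> dict:
    """Collect the keyword arguments for runpod.create_pod in one place."""
    return {
        "name": model_id.split("/")[-1],
        # Assumed image tag; pin a specific version in real use.
        "image_name": "ghcr.io/huggingface/text-generation-inference:latest",
        "gpu_type_id": "NVIDIA A100 80GB PCIe",  # an ID, as listed by runpod.get_gpus()
        "cloud_type": "SECURE",                  # or "COMMUNITY"
        "docker_args": f"--model-id {model_id}", # assumed launcher argument
        "gpu_count": gpu_count,
        "volume_in_gb": 195,
        "container_disk_in_gb": 10,
        "ports": "80/http,29500/http",
        "volume_mount_path": "/data",
    }


def create_pod_if_configured(config: dict):
    """Create the pod only when RUNPOD_API_KEY is set; otherwise return None."""
    api_key = os.getenv("RUNPOD_API_KEY")
    if not api_key:
        return None  # no key available, so skip the billable call
    import runpod  # pip install runpod; imported lazily on purpose

    runpod.api_key = api_key
    return runpod.create_pod(**config)


# The model id is an example; any Hugging Face repo id could be passed.
config = build_pod_config("mistralai/Mistral-7B-Instruct-v0.2")

if __name__ == "__main__":
    pod = create_pod_if_configured(config)
    print(config["name"], "->", "created" if pod else "skipped (no API key set)")
```

Keeping the configuration in a plain dictionary makes it easy to review what will be billed before anything is created, and to bump `gpu_count` for larger models.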
Let me do that: it's inference_server_url, and here you have to use the pod ID. Let me show that. You will have https, and remember the proxy.runpod.net address I was showing. What it does is take your pod ID, so let me write pod and then the ID, so the pod ID; after that comes whatever port you are running on, and then proxy.runpod.net. You will get it from the URL when you open the pod from the dashboard, or after the pod has been deployed; if you print the pod, it will show you the ID as well. Now you can use HuggingFaceTextGenInference (not this one, so let me remove it). Here you have inference_server_url, and that is nothing but the inference server URL we have defined. I'll remove the model name and path, because we are using a deployed model. Now you can define inference parameters like max_new_tokens, which I'm going to keep at 512, and you can completely customize the temperature and keep adding things. Then, when you have the LLM, you can use it inside chains like RetrievalQA, with conversational memory, and things like that. That is how you can do it programmatically. I already have a very good gist that I use to set this up programmatically, and I will share it. RunPod is fantastic, and I recommend it for all your LLM experimentation: you can control it, it's really affordable, and you can easily fine-tune and create LLMs for your specific use cases. I hope you understood how easy it is to set up a large language model. You can bring any LLM from Hugging Face, or you can use a GGUF quantized model through llama.cpp, ctransformers, and so on. So this is what I wanted to cover, guys.
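As a rough illustration of the last step, the proxy address can be built from the pod ID and port and then handed to LangChain. The sketch assumes the `https://<pod id>-<port>.proxy.runpod.net` pattern described above; `proxy_url` and `make_llm` are hypothetical helper names, the pod ID in the example is made up, and the temperature is an arbitrary choice. The import path for `HuggingFaceTextGenInference` depends on the installed LangChain version (older releases expose it from `langchain.llms`), and the class also needs the `text_generation` package.

```python
def proxy_url(pod_id: str, port: int) -> str:
    """RunPod proxy address for an HTTP port exposed by a pod."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def make_llm(pod_id: str, port: int = 80):
    """Return a LangChain LLM that calls the inference server running on the pod."""
    # Imported lazily so the URL helper works without LangChain installed.
    # Assumption: this import path matches the installed LangChain version.
    from langchain_community.llms import HuggingFaceTextGenInference

    return HuggingFaceTextGenInference(
        inference_server_url=proxy_url(pod_id, port),
        max_new_tokens=512,  # value used in the walkthrough
        temperature=0.1,     # assumed value; tune as needed
    )


# Made-up pod ID, used only to show the URL shape for the Gradio port.
print(proxy_url("abc123xyz", 7860))
```

With a running pod, `make_llm(pod["id"])` would give an object that can be passed into a chain such as RetrievalQA, assuming `pod` is the value returned when the pod was created.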
If you haven't worked with RunPod before, I hope you'll now be able to work with it very easily. Just go and create an account; you can use a Google login as well, and you can see I have logged in with Gmail. You can add some money over here; you can see there is a $10 minimum, which you can pay with your credit card, and then you can work with it. You can see there's a Docs link; you can open the docs, click on Documentation, and go ahead and work with it. If you have any questions, thoughts, or feedback, please let me know in the comment box; I'm more than happy to help. You can also reach out to me through my social media channels; you'll find that information on the channel banner and the channel About page. If you haven't subscribed to the channel yet, please do subscribe. If you haven't liked the video, please like it now, and also watch the other videos; we have more than 200 videos on generative AI. Please share this video and the channel with your friends and peers. Thank you so much for watching. See you in the next one.

Original Description

In this comprehensive tutorial, I walk you through the process of deploying and using any open-source Large Language Models (LLMs) utilizing RunPod's powerful GPU services. If you're intrigued by the potential of generative AI and looking for affordable ways to work with LLMs without the hassle of managing heavy infrastructure, this video is tailor-made for you. I cover the basics of serverless computing, the necessity of high GPU VRAM for running LLMs, and demonstrate how to create GPU instances in the cloud specifically for language model tasks. You'll learn how to efficiently allocate GPU VRAM based on the size of the LLM you're working with, leveraging RunPod's diverse range of GPUs. The tutorial includes a practical demonstration using a user-friendly template that simplifies deploying and interfacing with LLMs through a text generation web UI. Whether you're a novice eager to dive into the world of LLMs or a seasoned developer looking to optimize your workflow, this guide offers valuable insights and tips on making the most out of RunPod's offerings. Don't forget to like, comment, and subscribe for more tutorials on leveraging cloud computing for generative AI projects. GitHub Gist: https://gist.github.com/AIAnytime/be79b6a23a8ca5864604ce7f98c1574c Join this channel to get access to perks: https://www.youtube.com/channel/UC-zVytOQB62OwMhKRi0TDvg/join #runpod #llm #ai
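The point about matching GPU VRAM to model size can be made concrete with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is an estimate only, not a sizing rule from the video; the 1.2 overhead factor for activations and KV cache is an assumption, and real requirements vary with context length, batch size, and the serving stack.

```python
import math


def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


def gpus_needed(params_billions: float, bits_per_param: int,
                vram_per_gpu_gb: float = 80, overhead: float = 1.2) -> int:
    """Rough GPU count for a model; the overhead factor is an assumed margin."""
    total_gb = weight_memory_gb(params_billions, bits_per_param) * overhead
    return math.ceil(total_gb / vram_per_gpu_gb)


# Compare 16-bit weights with 4-bit quantized weights on 80 GB GPUs.
for size in (7, 70):
    for bits in (16, 4):
        print(f"{size}B @ {bits}-bit: "
              f"{weight_memory_gb(size, bits):.1f} GB weights, "
              f"about {gpus_needed(size, bits)} x 80 GB GPU(s)")
```

Under these assumptions, a 7B model in 16-bit needs about 14 GB for weights and fits on a single 80 GB card, while 4-bit quantization cuts the weight footprint to a quarter of that.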

AI Anytime
42 ChatPDF... #chatgpt  for PDF files. #ai #generativeai #shorts #youtubeshorts #coding #tech #ai
ChatPDF... #chatgpt for PDF files. #ai #generativeai #shorts #youtubeshorts #coding #tech #ai
AI Anytime
43 Free AI Image Generation #ai #chatgpt #coding #tech #shorts #youtubeshorts #shortvideo #generativeai
Free AI Image Generation #ai #chatgpt #coding #tech #shorts #youtubeshorts #shortvideo #generativeai
AI Anytime
44 Transform old photos into Vibrant Memories with Deoldify AI: Build a Streamlit App
Transform old photos into Vibrant Memories with Deoldify AI: Build a Streamlit App
AI Anytime
45 Open Assistant: The Real Open-sourced LLM
Open Assistant: The Real Open-sourced LLM
AI Anytime
46 Thanks to @YannicKilcherand team for the open sourced LLM Open Assistant. #ai #shorts #tech
Thanks to @YannicKilcherand team for the open sourced LLM Open Assistant. #ai #shorts #tech
AI Anytime
47 Search Engine for AI generated images. #ai #tech #technology #generativeai #chatgpt  #shorts #video
Search Engine for AI generated images. #ai #tech #technology #generativeai #chatgpt #shorts #video
AI Anytime
48 Generative AI Video Platform "Synthesia" #shorts #youtubeshorts #ai #tech #chatgpt #generativeai
Generative AI Video Platform "Synthesia" #shorts #youtubeshorts #ai #tech #chatgpt #generativeai
AI Anytime
49 Text to speech Voice AI platform. #shorts #youtubeshorts #ai #tech #technology #python #coding
Text to speech Voice AI platform. #shorts #youtubeshorts #ai #tech #technology #python #coding
AI Anytime
50 Create Amazing Videos with ChatGPT and Pictory: Free AI-powered Video Creation
Create Amazing Videos with ChatGPT and Pictory: Free AI-powered Video Creation
AI Anytime
51 Want to create beautiful video using #chatgpt and #pictory ? Watch the tutorial on channel. #ai
Want to create beautiful video using #chatgpt and #pictory ? Watch the tutorial on channel. #ai
AI Anytime
52 Animate your photos using AI. Bring old family photos to life. #ai #tech #shorts #shortvideo #coding
Animate your photos using AI. Bring old family photos to life. #ai #tech #shorts #shortvideo #coding
AI Anytime
53 Create a PDF Search and Summarization Tool in less than 100 Lines of Code: GPT-Index and Streamlit
Create a PDF Search and Summarization Tool in less than 100 Lines of Code: GPT-Index and Streamlit
AI Anytime
54 Text to Video Generation using Videocrafter: Intuitive Math behind Latent Diffusion Model
Text to Video Generation using Videocrafter: Intuitive Math behind Latent Diffusion Model
AI Anytime
55 Gamma AI: Create presentation PPT easily with #ai . #chatgpt #shorts #shortvideo #tech #coding
Gamma AI: Create presentation PPT easily with #ai . #chatgpt #shorts #shortvideo #tech #coding
AI Anytime
56 Tripnotes: Free AI tools for your trip planning. #ai #chatgpt #shorts #youtubeshorts #video
Tripnotes: Free AI tools for your trip planning. #ai #chatgpt #shorts #youtubeshorts #video
AI Anytime
57 Meet Bark (New Text to Speech Model): Clone Any Voice to Generate Music and Speech
Meet Bark (New Text to Speech Model): Clone Any Voice to Generate Music and Speech
AI Anytime
58 Fliki: The free AI video creation tool. #ai #shorts #shortvideo #youtubeshorts #chatgpt #tech #news
Fliki: The free AI video creation tool. #ai #shorts #shortvideo #youtubeshorts #chatgpt #tech #news
AI Anytime
59 Ask Anything Tool: Chat with Your Video using ChatGPT, MiniGPT4, and StableLM
Ask Anything Tool: Chat with Your Video using ChatGPT, MiniGPT4, and StableLM
AI Anytime
60 HuggingChat: Open Source ChatGPT (Interface and Model)
HuggingChat: Open Source ChatGPT (Interface and Model)
AI Anytime

This video tutorial shows how to deploy and use open-source LLMs on RunPod, covering model selection, fine-tuning, and serverless deployment. Viewers learn how to rent RunPod GPUs to work with LLMs such as Mistral 7B and Dogs.

Key Takeaways
  1. Initiate a pod on RunPod
  2. Deploy a large language model on RunPod
  3. Use RunPod's template for deployment
  4. Fine-tune a model on RunPod
  5. Create a quantized LLM on Hugging Face using RunPod's TheBloke template
  6. Click Start to deploy the LLM
  7. Create a fully serverless endpoint
  8. Select a template, or use PyTorch 2.1 as a base template
  9. Customize the deployment with RunPod and TheBloke's LLMs
💡 RunPod provides a powerful and affordable way to deploy and use open-source LLMs, allowing users to fine-tune and customize models for specific use cases.
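
Once a serverless endpoint exists (step 7 above), calling it is a plain HTTPS request. The sketch below is a minimal illustration and not code from the video: it assumes RunPod's synchronous `runsync` route, and the endpoint ID, API key, and the `prompt` key inside `input` are placeholders, since the expected input fields depend on the worker deployed behind the endpoint.

```python
import json
import urllib.request

# Base URL of RunPod's serverless API.
RUNPOD_API_BASE = "https://api.runpod.ai/v2"


def build_runsync_request(endpoint_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a synchronous (blocking) request for a serverless endpoint."""
    # Assumption: the worker reads a "prompt" field. Check the input schema
    # of the template you actually deployed and adjust this payload.
    payload = {"input": {"prompt": prompt}}
    return urllib.request.Request(
        url=f"{RUNPOD_API_BASE}/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_prompt(endpoint_id: str, api_key: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_runsync_request(endpoint_id, api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, call `run_prompt` with the endpoint ID shown in the RunPod console and an API key from your account settings. Building the request separately from sending it makes the URL and payload easy to inspect before any GPU time is billed.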
