Recipe Generator App from Cooking Videos using Whisper and ChatGPT

AI Anytime · Intermediate · Large Language Models · 3y ago


Key Takeaways

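This video tutorial builds a Recipe Generator App. It downloads a YouTube cooking video with Pytube, transcribes the audio with OpenAI's Whisper, asks an OpenAI GPT-3 model (text-davinci-003) to turn the transcript into a recipe with separate ingredients and steps, and shows the video, transcript and recipe side by side in a Streamlit web UI.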

Full Transcript

Hello everyone, welcome to the AI Anytime channel. In this video we are going to build a food recipe generator app that generates recipes from video files. We'll take YouTube videos as the input source, and based on the input video, OpenAI's GPT-3 model will help us generate the food recipe. We are going to use a couple of models here. The first is an ASR model (ASR stands for automatic speech recognition), and the ASR model we'll use is Whisper, which was also released by OpenAI. The second is a large language model: we'll use GPT-3 to run a prompt and get the food recipe from the video. So the flow goes from video to audio to transcript, and then we run the prompt on that transcript to generate the recipe. The goal is to simplify the recipe. Sometimes these videos run more than 20 or 30 minutes and you have to listen to the entire thing, so it's better to have a summary that gives you a holistic view of the recipe. In the prompt output we'll have the ingredients listed separately, followed by all the required steps. That's what we're going to do today. Here on the OpenAI playground, I'll use the OpenAI GPT-3 API, because I haven't got access to GPT-4 yet. We'll also use the open-source Whisper model. You can use the Whisper API instead if your data isn't that confidential, for example if you're doing this for hobby purposes. I'm going to use Whisper because it's state of the art, one of the best models out there for automatic speech recognition.
It's extremely powerful at speech-to-text, it supports multiple languages, and with some workarounds you can also do speaker diarization when there are multiple speakers in your audio. I've selected this video, "One Pot Creamy Garlic Mushroom Chicken Pasta", but you can use any other video. Now the question is how to extract the audio from the video, because Whisper needs an audio file to perform the transcription. That's where Pytube comes in, a Python library for working with YouTube. We'll take the video as the source, first get the audio, and then pass that audio to the Whisper model. To give you an overview, Whisper is a neural network trained on more than 600,000 hours of audio, most of it English. It performs multilingual speech recognition, as I said, and the README also mentions speech translation, language identification and voice activity detection. It's a Transformer sequence-to-sequence model available in several sizes; the smallest, tiny, has around 39M parameters. The README shows how to load the English-only models and the multilingual models, which detect the language automatically before transcribing. We're going ahead with the English model for this tutorial. You can also look at the metrics there, and the documentation is very neat and clean.
Whisper is extremely powerful for automatic speech recognition. There are plenty of other options, such as Silero VAD, Vosk and other offline models, but Whisper works out of the box and performs very well. We're going to build the application in Streamlit. If you haven't used Streamlit, it describes itself as a faster way to build and share data apps. We'll use it to build the UI and integrate these models into it to run inference. I've created a virtual environment in the Anaconda prompt, but you can use whatever you like: a standalone environment created with venv and activated before installing the dependencies, the Anaconda prompt, or another IDE. I'm in VS Code, in my recipe generator repository, and these are the requirements. As I said, we'll use the OpenAI GPT-3 API; you can also use GPT-3.5 Turbo, which is affordable when you compare API prices per token. Then we have python-dotenv to load the .env file, openai-whisper for Whisper, Streamlit to build the web app, and Pytube to handle all the operations on the YouTube video. Now I'll create app.py and start writing the code. The first line is import streamlit as st. By the way, I've already installed all the dependencies; if you need to install them, run pip install -r requirements.txt.
That installs all the dependencies for you, so I won't do it here. Whisper can be a little tricky to install on a Windows machine: it requires ffmpeg, and it also requires Numba, which (at the time of recording) didn't support Python 3.11, or in some cases even Python 3.10, so you may have to downgrade your Python version if you run into issues following this tutorial. So import streamlit as st is the first import. Next come the other dependencies. The first is Pytube, to work with YouTube videos: from pytube import YouTube. We'll use this YouTube class, and you can see its docstring right here in VS Code. Then the utilities: from pathlib import Path, import shutil, import whisper, import os for file handling, import openai, and from dotenv import load_dotenv. These are all the dependencies I need for this application. The first thing I'll do is call load_dotenv(). I already have a .env file where I've stored my OpenAI API key. If you don't know where to find the key, go to the OpenAI platform dashboard (platform.openai.com) and open View API keys. If you've already created some keys, they're listed there; if you haven't used the platform before, you'll need to create an API key to follow along.
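A minimal sketch of the key-loading step just described. Two assumptions: the variable name OPENAI_API_KEY (the video doesn't show the name it uses) and that the .env file sits in the directory you launch the app from. python-dotenv's load_dotenv does not overwrite variables that are already set, so a key exported in the shell takes precedence.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the OpenAI API key from the environment.

    The default name OPENAI_API_KEY is an assumption: use whatever name you
    stored the key under in your .env file.
    """
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        # Copy KEY=value pairs from ./.env into os.environ (existing vars win)
        load_dotenv(".env")
    except ImportError:
        pass  # python-dotenv not installed: rely on variables exported in the shell
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to .env or export it")
    return key
```

In app.py this would become openai.api_key = load_api_key().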
I've already done that, so I set openai.api_key by reading the key from the environment with os.getenv, using whatever name you stored it under in the .env file. With the API key loaded, the next step is the Whisper model. I define a function, load_model, which uses the whisper module we imported to load the model. Whisper comes in five sizes; I'm going with the base model because it's quick to load. If you're on a powerful GPU system you can use the medium or large model, but for this video I'm using base. Then the function just returns the model. One thing to understand: Streamlit provides a couple of powerful decorators that load something like this model once and keep it in a cache, so you don't have to reload it every time you use it during the same runtime. Newer versions of Streamlit provide st.cache_resource and st.cache_data; earlier there was only st.cache. I'm using cache_resource for the model. If you're loading a CSV file or doing that sort of operation, use the cache_data decorator instead.
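The cached model loader can be sketched like this. It uses st.cache_resource when Streamlit is installed and falls back to functools.lru_cache otherwise; that fallback is an addition for running outside Streamlit, not something the video does. Whisper's English-only checkpoints carry a .en suffix (for example base.en), while plain base is multilingual; the video loads base.

```python
import functools

try:
    import streamlit as st

    # st.cache_resource keeps one shared model object across Streamlit reruns
    cache_resource = st.cache_resource
except ImportError:
    # Outside Streamlit, a plain in-process memo gives the same load-once behaviour
    cache_resource = functools.lru_cache(maxsize=None)


@cache_resource
def load_model(size="base"):
    """Load a Whisper checkpoint once per process.

    "base" is the size used in the video; the other standard sizes are
    "tiny", "small", "medium" and "large".
    """
    import whisper  # openai-whisper package; needs ffmpeg on PATH to decode audio

    return whisper.load_model(size)
```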
So now the model isn't reloaded every time I perform an operation in the same runtime; that's why I'm using the decorator. With the model loading done, the next step is to use Pytube's YouTube class to get the YouTube video and extract its audio. First I write save_video, which takes the URL and a video_file_name. Inside, I create youtube_object by passing the URL to the YouTube class imported above. I want the highest-resolution stream, so I call youtube_object.streams.get_highest_resolution(). Then a try/except: if the video is available, download it with Pytube's download method, which is pretty self-explanatory; if any exception occurs, print "An error occurred while downloading". Otherwise, print that the video download completed successfully. Finally, return video_file_name. This saves the video in the current directory.
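A sketch of save_video as described, with one hypothetical addition: a youtube_cls parameter so a stand-in class can replace pytube.YouTube in tests. get_highest_resolution returns the best progressive stream (audio and video in one file), which is usually 720p at most; Stream.download accepts a filename argument.

```python
def save_video(url, video_file_name, youtube_cls=None):
    """Download the highest-resolution progressive stream and return its file name.

    youtube_cls is a hypothetical hook for tests; by default pytube's YouTube is used.
    """
    if youtube_cls is None:
        from pytube import YouTube as youtube_cls  # imported lazily

    youtube_object = youtube_cls(url)
    stream = youtube_object.streams.get_highest_resolution()
    try:
        stream.download(filename=video_file_name)  # saved in the working directory
    except Exception as exc:  # e.g. network errors or unavailable videos
        print(f"An error occurred while downloading: {exc}")
    else:
        print("Video download completed successfully")
    return video_file_name
```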
I could have downloaded only the audio, but I also save the video because I want to show it in the UI; that gives the end user a better experience. Next I write save_audio, which downloads the audio file directly from the YouTube URL. I'm not going to extract the audio from the downloaded video, because Pytube provides this out of the box with only_audio; let me show you what I mean. I define save_audio, which takes the URL, create a variable yt = YouTube(url), and this time call yt.streams.filter(only_audio=True).first(), because I only need the audio. That's how we get the audio directly from the URL. Then I download that stream, store the returned path in out_file, and split it into base and extension with os.path.splitext(out_file). Since we want an MP3, the new file name is file_name = base + ".mp3".
Then another try/except: in the try block, os.rename(out_file, file_name). In the except clause we catch WindowsError (it's an exception, not a method), which Windows raises if the target file already exists; in that case we call os.remove(file_name) and then os.rename(out_file, file_name) again. After that, audio_file_name = Path(file_name).stem + ".mp3". The video file name isn't the same, so video_file_name = save_video(url, Path(file_name).stem + ".mp4") saves the video as an MP4. Now we have both the audio file name and the video file name. I also print to the terminal, which helps if you hit an error: print(yt.title + " has been successfully downloaded"). Finally, return yt.title, audio_file_name and video_file_name. So we've written both functions, one for saving the video and one for saving the audio, and load_model is already written. Next comes the transcription function, audio_to_transcription, which takes the audio file as its parameter and uses the load_model function written above.
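save_audio as walked through above, sketched with a few deviations worth naming. It uses os.replace, which overwrites an existing target on every platform, in place of the rename/WindowsError/remove sequence. The video download is delegated to an optional video_saver callable (hypothetical; pass the transcript's save_video) so the block stands alone, and youtube_cls is a hypothetical test hook. It also returns the full path of the renamed audio file rather than only a stem-based name. Renaming to .mp3 changes only the extension, not the encoding; that is fine here because Whisper decodes audio through ffmpeg, which inspects the actual file contents.

```python
import os
from pathlib import Path


def save_audio(url, youtube_cls=None, video_saver=None):
    """Download only the audio track, rename it to .mp3, and return
    (title, audio_file_name, video_file_name).

    youtube_cls and video_saver are hypothetical hooks; by default pytube's
    YouTube class is used and no video is downloaded.
    """
    if youtube_cls is None:
        from pytube import YouTube as youtube_cls  # imported lazily

    yt = youtube_cls(url)
    audio_stream = yt.streams.filter(only_audio=True).first()
    out_file = audio_stream.download()  # returns the saved file's path
    base, _ext = os.path.splitext(out_file)
    audio_file_name = base + ".mp3"
    # Only the name changes, not the codec. os.replace overwrites an existing
    # target on every OS, so no separate WindowsError branch is needed.
    os.replace(out_file, audio_file_name)

    video_file_name = Path(audio_file_name).stem + ".mp4"
    if video_saver is not None:
        video_saver(url, video_file_name)  # e.g. the save_video function from the video
    print(yt.title + " has been successfully downloaded.")
    return yt.title, audio_file_name, video_file_name
```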
Inside audio_to_transcription, we call model.transcribe(audio_file) and store the output in a variable called result; transcribe is the method Whisper provides for this. I only need the text part of the result. The result also includes extra details, metadata such as the start and end times of each segment, which I don't need here. So transcript = result["text"], and the function returns transcript: we pass in the audio file and get the transcript back. Once we have the transcript, we write the text-to-recipe step with GPT-3. I define text_to_recipe, which takes the input text, and write the standard completion call: response = openai.Completion.create(...), with a capital C, by the way. Inside create we define our key-value pairs, starting with the model, text-davinci-003. OpenAI has several GPT-3 model families, Ada, Babbage, Curie and Davinci; I think text-davinci-003 is based on InstructGPT, and it works very well across many NLP tasks.
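The audio_to_transcription step described above, as a sketch. The optional model parameter is a hypothetical addition so an already loaded (or cached) model can be passed in; without it the function loads the base checkpoint itself. Whisper's transcribe returns a dict whose "text" key holds the full transcript, alongside "segments" (with start and end times) and "language"; the .strip() removes the leading space Whisper's text often starts with.

```python
def audio_to_transcription(audio_file, model=None):
    """Transcribe an audio file with Whisper and return only the text."""
    if model is None:
        import whisper  # openai-whisper package

        model = whisper.load_model("base")
    result = model.transcribe(audio_file)  # dict with "text", "segments", "language"
    # "segments" carries per-segment timestamps; the recipe prompt only needs the text
    return result["text"].strip()
```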
We also have GPT-4 now, which I don't have access to yet, but once I do, I'll make a video on it too. So the model is text-davinci-003, and next comes the prompt. Let's keep it very simple: "Write the food recipe from the below text", followed by the text. That text is the transcript returned by audio_to_transcription. Then there's temperature, which controls the creativity and randomness of the Davinci model; let's keep it at a standard value. Next, max_tokens: let's set it to around 600. Then top_p = 1, and the penalties: frequency_penalty = 0 and presence_penalty = 0. (I missed a comma there, by the way.) Finally the function returns the generated text: the response provides a list of choices, so I take the first one and only its text, response["choices"][0]["text"]. So this function sets the model, prompt, temperature, max tokens, top_p, frequency penalty and presence penalty. We don't need any more functions; text_to_recipe was the last one.
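A sketch of text_to_recipe with the settings from the video, split so the request can be checked without calling the API. The prompt wording is approximated from the narration, and temperature=0.7 is an assumption because the value isn't stated. It targets the pre-1.0 openai package the video uses: openai 1.0 removed openai.Completion in favor of a client object (OpenAI().completions.create), and OpenAI has since retired text-davinci-003, so both the call and the model name need updating to run today. The added finish_reason check flags output that stopped at max_tokens.

```python
# Approximation of the prompt described in the video
PROMPT_TEMPLATE = "Write the food recipe from the below text:\n\n{text}"


def build_recipe_request(text, temperature=0.7, max_tokens=600):
    """Keyword arguments for the GPT-3 completion call.

    temperature=0.7 is an assumption; the other values follow the video.
    """
    return {
        "model": "text-davinci-003",
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "temperature": temperature,  # higher = more random wording
        "max_tokens": max_tokens,    # caps the length of the generated recipe
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
    }


def text_to_recipe(text):
    """Send the transcript to the legacy Completion endpoint and return the recipe."""
    import openai  # pre-1.0 openai package; openai.Completion was removed in 1.0

    response = openai.Completion.create(**build_recipe_request(text))
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        print("Recipe was cut off by max_tokens; consider raising it")
    return choice["text"]
```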
In total we've written five functions. The first, load_model, loads the base model of OpenAI's open-source Whisper for automatic speech recognition. save_video saves the YouTube video to the local directory, and save_audio saves the audio directly, using only_audio=True. The transcription function uses the base model, and text_to_recipe uses openai.Completion.create to run the prompt and generate the recipe from the input text. (You may hear some background noise; there have been thunderstorms in my city.) Now let's write the code for the Streamlit application. If you're not familiar with Streamlit, by default it renders the app as a narrow column centered on the page, and I want the full width. So the first call is st.set_page_config(layout="wide"); if you want to use the full width of the page, you define this first. Next, st.subheader with a title along the lines of "Food Recipe Generator App from Cooking Videos". Then we need an input for the URL: url = st.text_input("Enter YouTube URL of the cooking video").
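The header section as a small sketch. Passing the streamlit module in as a parameter is a hypothetical design choice that lets the function run against a stand-in object; in the app you'd call it with the real module.

```python
def render_header(st):
    """Draw the page header and return the URL the user typed ("" until they type).

    `st` is the streamlit module, passed in so the function can be exercised
    without a running app (a hypothetical design choice).
    """
    # Use the full browser width instead of the default centered column.
    # Call this before other st.* commands; older Streamlit versions require it first.
    st.set_page_config(layout="wide")
    st.subheader("Food Recipe Generator App from Cooking Videos")
    return st.text_input("Enter YouTube URL of the cooking video")
```

In app.py: import streamlit as st, then url = render_header(st).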
Let's run it. To run a Streamlit application, use streamlit run app.py. You can see the subheader and the input box where we can enter the YouTube URL. Now let's add a button. First we check that the URL is not None, and then we add a Streamlit button with st.button, labeled "Generate". Inside it we'll call all the functions we wrote above. Since we're using the full-width layout, we'll divide it into three columns. The good thing about Streamlit is that you can define the layout: if you've worked with HTML or other web elements, you know about splitting a page into columns and rows, and you can do the same in Streamlit. We write col1, col2, col3 = st.columns(3). By default the columns are equal in width, but if you want to give more weight to the first column, say twice the width, you can do that too; I'll keep them equal. To put content in a column, we enter it with a with block: with col1:, and whatever goes inside is rendered in column one. In column one I use st.info, which is a status message. There's also st.write, which is like a print statement in Python, but I prefer the status message methods Streamlit provides: info, warning, success and error.
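To make the column weighting concrete, this hypothetical helper computes the share of the row each column receives for a given st.columns spec. It illustrates the documented rule (an integer gives equal columns, a list gives proportional widths); it is not Streamlit's own code, and it ignores the gap Streamlit leaves between columns.

```python
def relative_widths(spec):
    """Fraction of the row each column gets for an st.columns spec.

    Illustration only: an int N means N equal columns, and a list of numbers
    means widths proportional to those numbers.
    """
    weights = [1] * spec if isinstance(spec, int) else list(spec)
    total = sum(weights)
    return [w / total for w in weights]


# Corresponding Streamlit calls (these need streamlit installed):
#   col1, col2, col3 = st.columns(3)          # three equal columns, as in the video
#   col1, col2, col3 = st.columns([2, 1, 1])  # first column twice as wide as the others
```

relative_widths([2, 1, 1]) gives [0.5, 0.25, 0.25]: the first column gets half the row.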
So I use st.info here with "Video uploaded successfully". Next I want to show which video we're using. If you look back up, save_audio returns the title, audio_file_name and video_file_name, so here I write video_title, audio_file_name, video_file_name = save_audio(url), passing in the url variable. Then I show the video with st.video(video_file_name). Let's see if we get a response. Streamlit is already running in this terminal, so I go back to the browser and click Rerun, and the Generate button appears. I copy the URL of this video, paste it into the input box, and hit Generate to see whether we get the desired response or an error. It says "Running", and the info message "Video uploaded successfully" appears. Now Pytube fetches the video and the audio, and the video is shown here. You can see we've got our video.
We have the video in the UI, so the first column is done. Now, how do we use an automatic speech recognition model like Whisper on it? You could also use other ASR models, such as Silero VAD, Vosk or other Kaldi-based models, and AssemblyAI provides an API for transcription. We use the video's audio to extract the transcription and then pass the transcription to the GPT-3 model to generate the recipe. We already wrote the functions at the top, so now we just use them. With column one done, next is with col2:. In column two we again show an info message, as in column one: "Transcript is below". Here I use the transcription function. Let's also print audio_file_name, so that if we don't get a transcription we can track it in the terminal; I'm using the Anaconda prompt, but you'll see it in whatever terminal you use. I name the variable transcript_result and assign it audio_to_transcription(audio_file_name), using the audio_file_name variable we have here. Then I use st.success, another Streamlit status message, which shows the text on a green background, and pass it transcript_result. Let's see if we get the transcription from Whisper.
I go back and hit Generate again, since the URL remains the same. This time we should get the video and also the transcription. It's running and fetching the video again: we're not handling any caching, so the video is downloaded into the directory every time. If you want to extend this application, and I encourage you to, I'd do it with other web technologies or a microservice setup rather than Streamlit. For example, you can store the results in a database and check next time whether the same video is already there, so you don't have to download it again; you match it and serve it on the UI from the database. Under "Transcript is below" it's still running while it generates the transcript. That can take from a few seconds to minutes, depending on the size of the audio file and the computational power of your machine. Whisper really wants a GPU for big files: with 10 or 15 minutes of audio and no GPU, transcription can take up to an hour. There are workarounds, such as libraries like faster-whisper, which reimplements Whisper with support for quantized models, that you can use for faster responses. While this finishes, I'd like to show you a diagram of what we've built, so let me connect my tablet. And we've got the transcript now.
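The database idea above could start as small as this: a hypothetical TranscriptCache backed by Python's built-in sqlite3, keyed by video URL, so a repeat request skips both the download and the Whisper run. The class and function names are illustrative. Inside Streamlit, decorating the transcription step with st.cache_data is a lighter alternative, but by default it lives in memory and doesn't survive restarts the way a file-backed database does.

```python
import sqlite3


class TranscriptCache:
    """Hypothetical URL -> transcript store backed by SQLite."""

    def __init__(self, path=":memory:"):
        # Pass a file path such as "transcripts.db" to keep entries across restarts
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "url TEXT PRIMARY KEY, transcript TEXT NOT NULL)"
        )

    def get(self, url):
        row = self.conn.execute(
            "SELECT transcript FROM transcripts WHERE url = ?", (url,)
        ).fetchone()
        return row[0] if row else None

    def put(self, url, transcript):
        with self.conn:  # commits the insert when the block exits without error
            self.conn.execute(
                "INSERT OR REPLACE INTO transcripts (url, transcript) VALUES (?, ?)",
                (url, transcript),
            )


def get_or_transcribe(cache, url, transcribe_url):
    """Return a cached transcript, or run transcribe_url(url) once and store it.

    transcribe_url is a hypothetical callable wrapping save_audio plus Whisper.
    """
    cached = cache.get(url)
    if cached is not None:
        return cached
    transcript = transcribe_url(url)
    cache.put(url, transcript)
    return transcript
```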
You can see the transcription here while I set up my tablet. Under "Transcript is below" we have a transcription that begins "One pot dishes are super handy just because it saves time", and you can see how accurate it is. The metric for this is word error rate (WER), which is what you use when evaluating speech recognition models, and Whisper's word error rate is impressive compared with other models. The words here seem very accurate. If you scroll down, there's a long transcription, and you can read the last line, "delicious creamy garlic mushroom and chicken one pot wonder", and match it against the video line by line. Now that we have the transcription, the next task is column three. Columns one and two are done; the last task is generating the recipe with the prompt we wrote in the function above. So with col3:, again st.info, this time with "Recipe is generated below". Then a variable, recipe_result = text_to_recipe(transcript_result), passing the transcript we got from the Whisper model in the previous column, and again st.success(recipe_result).
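Putting the three columns together, a sketch of the results section. The streamlit module and the pipeline functions are passed in as parameters, a hypothetical choice for testability; save_audio must return (title, audio_file_name, video_file_name) like the video's version. One fix relative to the narration: st.text_input returns an empty string, not None, before anything is typed, so checking the URL's truthiness is what actually skips the empty case.

```python
def render_results(st, url, save_audio, audio_to_transcription, text_to_recipe):
    """Three equal columns: the video, its Whisper transcript, and the GPT-3 recipe."""
    # st.text_input returns "" (not None) before anything is typed
    if not url or not st.button("Generate"):
        return
    col1, col2, col3 = st.columns(3)
    with col1:
        st.info("Video uploaded successfully")
        video_title, audio_file_name, video_file_name = save_audio(url)
        st.video(video_file_name)  # st.video accepts a local file path
    with col2:
        st.info("Transcript is below")
        print(audio_file_name)  # trace in the terminal if transcription fails
        transcript_result = audio_to_transcription(audio_file_name)
        st.success(transcript_result)
    with col3:
        st.info("Recipe is generated below")
        recipe_result = text_to_recipe(transcript_result)
        st.success(recipe_result)
```

In app.py this would run after the header, as render_results(st, url, save_audio, audio_to_transcription, text_to_recipe).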
Then we again use st.success to display recipe_result. Let's see if we get the recipe result as well. I hit the Generate button again, and the app fetches the video, transcribes it, and then generates the food recipe. Meanwhile, I'm trying to cast my tab so I can show you the workflow we used in this video, but I'm facing some internet connection issues. Oh, I think it's working now, so let me bring it up for you. Here is what we are doing: we take a video and pass it to Pytube, which gives us the audio. We pass that audio to Whisper, which is a Transformer model, and get the transcription. Once we have the transcription, we pass it to GPT-3 and get the recipe. We are using a very simple prompt, which you can tweak. So we are using a large language model, GPT-3 (specifically the DaVinci 3 model, text-davinci-003), and an ASR model, Whisper. Both models were released by OpenAI. That's the flow: from a video to a recipe. Now let me go back to the Streamlit application. You can see "Recipe is generated below"; it's still working. And now we have a very nice recipe. First it lists the ingredients, everything you need to prepare the dish from this video: 750 grams (it also gives the weight in pounds) of boneless, skinless chicken thighs, then onion powder, and all the other ingredients, like
mushrooms, that are used to prepare this dish. This looks fantastic: it minimizes your effort, because you get the recipe in a concise, organized form. You just get these ingredients and then follow the instructions. The first instruction is to place the chicken thighs in a mixing bowl, add the dried basil, dried thyme, onion powder, garlic powder, salt, and pepper, and massage the chicken until everything is well coated. Then you peel the onion, cut it into thin slices, and dice it into small, even-sized pieces. The third step works with the garlic cloves, and then the mushrooms come in and you slice them. In the sixth step you put the pot over high heat, then turn the heat down to medium, add the oil and spices, and wait for four minutes until the mushrooms are golden and soft. After that you mix through the chicken stock and cream, and at the end you add the pasta and mix to bring the sauce back. We are not getting the full response because max_tokens is set to 600; there would also be steps 12, 13, and 14, which OpenAI would generate here. You can increase max_tokens to maybe 800 or 1,000 to get the full response. Still, this looks fascinating. You have this kind of unstructured data, just a set of text, and you don't know what to do with it.
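The cut-off recipe above is what happens when a completion hits its max_tokens budget. OpenAI's completion responses report this in the finish_reason field of each choice: "length" means the output was truncated, "stop" means the model finished on its own. A minimal sketch, assuming you read finish_reason from the first choice of the response; the helper name next_max_tokens, the step of 200, and the 1,000-token ceiling are illustrative choices, not part of the original app.

```python
from typing import Optional


def next_max_tokens(finish_reason: str, current: int,
                    step: int = 200, ceiling: int = 1000) -> Optional[int]:
    """Suggest a larger max_tokens when a completion was cut off.

    Returns None when no retry is needed (the model stopped on its own)
    or when the budget has already reached the ceiling.
    """
    if finish_reason != "length":
        return None  # "stop": the model finished, nothing was truncated
    if current >= ceiling:
        return None  # already at the cap; shorten the prompt instead
    return min(current + step, ceiling)


# With the video's setting of 600 tokens, a truncated answer suggests 800.
print(next_max_tokens("length", 600))  # 800
print(next_max_tokens("stop", 600))    # None
```

Keep in mind that the prompt (which contains the whole transcript) and max_tokens share the model's context window, so a long cooking video leaves less room for the generated recipe.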
Now, if you have a set of questions, you can use prompting to pass them along with the text and get your response. To recap what we did in this video: we built a food recipe generator app for cooking videos. We took a video from YouTube, "One Pot Creamy Garlic Mushroom Chicken Pasta", and passed its URL to the app we built with Streamlit. The app shows the video and then the transcription: we performed speech-to-text with Whisper, an open-source Transformer model. Once we had the transcription, we passed it to the GPT-3 DaVinci 3 (text-davinci-003) model to get the food recipe. You can also perform a lot of other analytics on top of this. This is just an idea for you if you want to extend it further: you could find the category of the food, veg or non-veg, and tag each recipe accordingly. You could also calculate a carbon footprint from the quantities we get, such as 750 grams of chicken. It's up to you how you take it forward. If you add the carbon-footprint logic or the veg/non-veg tagging, or build something unique inspired by this tutorial, please let me know and tag me; I would like to test it out as well. That's what I have for this video. If you liked it, please like it and share it with your friends. Just for your information,
the code will be available on the AI Anytime GitHub repository, which I will link in the description. If you have any thoughts or feedback, please drop them in the comment box and I'll reply. If you haven't subscribed to the channel, do subscribe and share it with your friends and peers. That's all for today's video. Thank you so much for watching, and see you in the next video.
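The veg/non-veg tagging suggested as an extension could start as simple keyword matching over the generated ingredient lines. A minimal sketch, assuming the ingredients arrive as one string per line; the keyword set and the function name tag_recipe are hypothetical and deliberately small, and whether eggs count as veg is left out because conventions differ.

```python
import re
from typing import Iterable

# Hypothetical keyword list for illustration; extend it for real recipes.
NON_VEG_KEYWORDS = {
    "chicken", "beef", "pork", "lamb", "mutton", "bacon", "ham",
    "turkey", "fish", "salmon", "tuna", "prawn", "shrimp", "crab",
}


def tag_recipe(ingredients: Iterable[str]) -> str:
    """Return "non-veg" if any ingredient line mentions a meat or seafood keyword."""
    for line in ingredients:
        # Split each line into lowercase alphabetic words ("chicken", "thighs", ...).
        for word in re.findall(r"[a-z]+", line.lower()):
            # Accept simple plurals such as "prawns" -> "prawn".
            singular = word[:-1] if word.endswith("s") else word
            if word in NON_VEG_KEYWORDS or singular in NON_VEG_KEYWORDS:
                return "non-veg"
    return "veg"


print(tag_recipe(["750 grams boneless, skinless chicken thighs", "sliced mushrooms"]))  # non-veg
print(tag_recipe(["sliced mushrooms", "1 onion"]))  # veg
```

Keyword matching is crude (it would tag a plant-based "chicken-style" product as non-veg); asking the language model for the category in the same prompt is an alternative with different trade-offs.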

Original Description

In this tutorial, I will show you how to build a Recipe Generator App using Whisper, Pytube, ChatGPT, and Streamlit. By the end of this tutorial, you will have a good understanding of how to integrate these technologies to build an efficient and user-friendly recipe generator app.

First, Pytube is a library for working with YouTube videos, for example downloading a video or only its audio stream. You can use Pytube to get the audio of your cooking videos and pass it to Whisper for transcription. Then I used Whisper, a Transformer-based speech recognition model, to transcribe the audio, so that relevant information such as ingredients and instructions can be extracted. Then, GPT-3 is a language model that generates text from a given prompt; here it turns the transcript into a recipe. Finally, Streamlit is a framework for building interactive web applications, which you can use to build a user-friendly interface for your Recipe Generator App.

Find the cooking video here: https://www.youtube.com/watch?v=85eG2gsJ0Ko
GitHub Repo: https://github.com/AIAnytime/Food-Recipe-Generator-using-Whisper-and-GPT-3
Streamlit Docs: https://streamlit.io/
Whisper Repo: https://github.com/openai/whisper
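The description maps onto three stages: download the audio with Pytube, transcribe it with Whisper, and prompt a GPT-3 completion model. A minimal sketch, assuming Pytube's audio-only stream download, the open-source openai-whisper package, and the openai Python client version 1.0 or later. The prompt wording, the Whisper model size ("base"), and all function names are illustrative assumptions, not taken from the original repository. The video used text-davinci-003, which OpenAI has since retired, so the sketch substitutes gpt-3.5-turbo-instruct; swap in whichever completions model your account offers.

```python
"""Sketch of the video -> audio -> transcript -> recipe pipeline.

pytube, whisper, and openai are imported inside the functions that need them,
so the prompt builder and the pipeline wiring can be used without them.
"""
from typing import Callable, Dict

WHISPER_MODEL_SIZE = "base"  # assumption: the size used in the video is not stated
COMPLETION_MODEL = "gpt-3.5-turbo-instruct"  # stand-in for the retired text-davinci-003


def build_recipe_prompt(transcript: str) -> str:
    """Hypothetical prompt asking for ingredients and steps as separate sections."""
    transcript = transcript.strip()
    if not transcript:
        raise ValueError("transcript is empty")
    return (
        "Extract the recipe from this cooking-video transcript.\n"
        "List the ingredients under 'Ingredients:' (one per line, with quantities),\n"
        "then the steps under 'Instructions:' as a numbered list.\n\n"
        f"Transcript:\n{transcript}\n"
    )


def download_audio(url: str, out_dir: str = ".") -> str:
    """Download the audio-only stream of a YouTube video and return the file path."""
    from pytube import YouTube
    stream = YouTube(url).streams.filter(only_audio=True).first()
    if stream is None:
        raise RuntimeError(f"no audio-only stream found for {url}")
    return stream.download(output_path=out_dir)


def transcribe(audio_path: str) -> str:
    """Transcribe locally with open-source Whisper (needs ffmpeg on PATH)."""
    import whisper
    model = whisper.load_model(WHISPER_MODEL_SIZE)
    return model.transcribe(audio_path)["text"]


def generate_recipe(prompt: str, max_tokens: int = 600) -> str:
    """Call the completions endpoint; the client reads OPENAI_API_KEY from the env."""
    from openai import OpenAI
    client = OpenAI()
    response = client.completions.create(
        model=COMPLETION_MODEL, prompt=prompt, max_tokens=max_tokens
    )
    return response.choices[0].text.strip()


def video_to_recipe(
    url: str,
    fetch: Callable[[str], str] = download_audio,
    asr: Callable[[str], str] = transcribe,
    llm: Callable[[str], str] = generate_recipe,
) -> Dict[str, str]:
    """Run the three stages; each one can be swapped out, e.g. for testing."""
    audio_path = fetch(url)
    transcript = asr(audio_path)
    recipe = llm(build_recipe_prompt(transcript))
    return {"transcript": transcript, "recipe": recipe}
```

Calling video_to_recipe(url) with the defaults runs the real stages. In a Streamlit app, loading the Whisper model on every call is slow; caching the loaded model (for example with st.cache_resource) avoids reloading it on each rerun.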


This video tutorial teaches how to build a Recipe Generator App using Whisper, Pytube, ChatGPT, and Streamlit, covering how these technologies fit together for efficient, user-friendly recipe generation. The app chains automatic speech recognition (Whisper) with a large language model (GPT-3) to generate recipes from cooking videos.
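The walkthrough judges Whisper's accuracy by word error rate: WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the model's transcript into a reference transcript, and N is the number of reference words. A minimal sketch using a word-level edit distance; it only lowercases and splits on whitespace, so punctuation differences count as errors. Libraries such as jiwer offer the same metric with more normalization options.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Illustrative pair: two substituted words out of six reference words.
print(round(word_error_rate("one pot dishes are super handy",
                            "one part DCs are super handy"), 3))  # 0.333
```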

Key Takeaways

Use Whisper for automatic speech recognition on the audio of cooking videos
Download the audio of a YouTube video using Pytube
Pass the audio to OpenAI's open-source Whisper model for transcription
Prompt GPT-3 (text-davinci-003) to generate the recipe from the transcript
Simplify long cooking videos into a summary of ingredients and steps
Create a virtual environment in the Anaconda Prompt and install the dependencies
Import the necessary dependencies and load the .env file
Use Streamlit to build a web application and integrate Whisper model for inference
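The walkthrough lays the app out in three Streamlit columns, labelling the transcript column "Transcript is below" and the recipe column "Recipe is generated below" with st.info and showing results with st.success. A minimal sketch of that layout, assuming the caller supplies two hypothetical hooks, transcribe_url (YouTube URL to transcript) and text_to_recipe (transcript to recipe); the title, input label, and first-column label are illustrative. Streamlit is imported inside the function so the module loads without it.

```python
from typing import Callable


def render_app(transcribe_url: Callable[[str], str],
               text_to_recipe: Callable[[str], str]) -> None:
    """Three-column page: video | transcript | recipe."""
    import streamlit as st

    st.title("Food Recipe Generator")
    url = st.text_input("Enter a YouTube URL")
    if st.button("Generate") and url:
        col1, col2, col3 = st.columns(3)
        with col1:
            st.info("Video")
            st.video(url)  # st.video accepts YouTube URLs
        with col2:
            st.info("Transcript is below")
            transcript_result = transcribe_url(url)
            st.success(transcript_result)
        with col3:
            st.info("Recipe is generated below")
            recipe_result = text_to_recipe(transcript_result)
            st.success(recipe_result)
```

To use it, call render_app with your own transcription and recipe functions at the top level of app.py and launch it with streamlit run app.py. Streamlit reruns the whole script on every interaction, so expensive steps such as loading the Whisper model are worth caching.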

💡 The key insight of this video tutorial is the integration of Whisper, Pytube, ChatGPT, and Streamlit into a Recipe Generator App, demonstrating how chaining a speech recognition model with a large language model turns long cooking videos into concise, user-friendly recipes.
