Building Python Reports with ChatGPT

DataCamp · Intermediate ·🧠 Large Language Models ·1y ago

Key Takeaways

This video demonstrates how to build Python reports with ChatGPT, using GPT-4o, DataCamp, and Plotly to create a dashboard that analyzes traffic data. It showcases what generative AI can do for coding and data work, making these skills more accessible than before.

Full Transcript

All right, all right, all right, I think we are live. Hello everyone! I am so happy to be here, walking you through how to build a Python dashboard using ChatGPT. Now, of course, we're going to be using ChatGPT today, but this week is all about DeepSeek, so everyone give us your hottest take on DeepSeek in the chat. Let us know where you're joining from, what you're looking to get out of today's code-along, and what you think is going on in AI today. Also, you would do us a huge favor if you like this video and subscribe to the channel. We try to give you as much good content as we can, so I would really appreciate it if you can help us there; it really helps with the algorithm and lets us spread data and AI skills to even more people.

With that said, I'm going to share my screen. I have a few things prepared here today, and we're going to get started. We're going to talk about how to build a Python dashboard with ChatGPT, and I'm going to walk you through exactly where the idea for this code-along came from, because it's actually based on something that we built internally on the media team.

As I mentioned, my name is Adel, VP of Media at DataCamp. I'm seeing tons of folks engaging in the chat, and lots of Germans in the audience today, great to see that. I'm going to walk you through a workflow that we've done internally at DataCamp using ChatGPT to build a dashboard that looks at our own traffic: what we've done, how we've done it, and then we're going to attempt to rebuild it in less than an hour. I'm seeing a comment from Germany here mentioning that DeepSeek is simply a ChatGPT killer. Well, we'll see about that. It
seems like foundation models are becoming more commoditized; I think that's the main thing. But what works about ChatGPT is that it's also a really great product. We have folks from Morocco, folks from the US; great to see people joining from all over the world for this code-along. Steve, Moayad, again, really appreciate it.

Let's get started. To prompt along, I just need you to go to ChatGPT and sign up if you don't have an account, or sign in if you do. I'm going to be using GPT-4o. Results may vary depending on which model you use; for example, you may use GPT-4o mini or GPT-4. I'm not sure exactly what falls under the Plus tier versus the free tier today, but I'm using a Team tier here, and it should work no matter what model you're using. The notebook environment where we'll actually be coding today is going to be shared in the chat, and I'll walk you through it as well.

With that, I'm going to first explain what we're trying to build today. If you go on the DataCamp blog, you'll see quite a lot of articles and tutorials; we try to provide as many useful learnings for our learners as we can. That content is created by these four people that you see on the right, who work with data scientists and AI experts to create content that is useful for you. So if you've read or enjoyed the blog, these four people are responsible for a lot of the content that you enjoy. For example, today is January... actually, yesterday we released an article on fine-tuning DeepSeek R1, in case you're interested, so make sure to check it out. We create a lot of this content. Now, that said, when we look at how effective this content is, the way we want to measure it is by trying to understand traffic
right? So we want to monitor traffic, understand it, and see how many people are looking at this content. However, when we were looking at tools that would help us understand this, no tool was fit for what we wanted to do. We wanted something that helps us look at granular metrics for each article. We needed something simple that anyone can interpret and anyone can run with a bit of coding experience, so you don't need to be an expert Python coder; you just need to be able to run a notebook and understand what's going on. We needed something that refreshes daily. And that's the problem: we had the raw data, but we needed to put it all together.

So instead of working with our analytics team, who is working on much more strategic projects like understanding sales attribution, we created our own dashboard with a combination of generative AI tools. We used the DataCamp DataLab AI Assistant and ChatGPT to create the report that I'm about to show you. Today we're going to be using ChatGPT exclusively, because I don't expect everyone to have access to the DataCamp DataLab AI Assistant, and ChatGPT is widely used.

So what we're going to do is build a dashboard, one that we run every day, that looks at the performance of a sample of DataCamp media content that we published over the past year. I'll give a bit of a description of the dataset and of some of the changes that have been made to it so that we can do this publicly. With that said, let's create it with ChatGPT. Make sure to check out the notebook environment that we're sending in the chat, because you're going to need it to be able to code along and prompt along
as well.

So this is the code-along environment. It's in DataLab, our own cloud-based IDE, and this is essentially the working environment that we'll be using today. I'll explain a few things first so you can navigate DataLab. If you open the context panel and go to Files, we have the workbook; the file called workbook is where we're going to be coding. There's also a solution file with all of the solution code that was created with ChatGPT, by me, yesterday. There's also an additional notebook in case we need it; it's the same notebook as the workbook, just in case we run into any errors. The data is here in a CSV file, and I have a few pictures that are featured in the notebook, in case you are curious.

Let me provide some context on the dataset. It is essentially month-over-month traffic data for a bunch of different articles, and it looks like this; we already have the dataset imported. Each row is essentially an article: its title; what type it is, a DataCamp blog or a DataCamp tutorial; and the URL of the article. Here we only have the part of the URL after https://www.
datacamp.com, so if you just add that as a prefix to this URL, you'll be able to access the article. The category shows what subcategory it belongs to; for example, this one is on data analysis, and so on and so forth. Then, when was it published? This one was published on the 9th of January 2024. What is the month it was published in? That's just the first month of the year. Then the publish year, when it was last updated, the last update year, and so on. And then you see here this traffic data. This is of course data that was all synthetically generated; I'm not going to share our own internal traffic data with you, otherwise our data privacy team will be quite unhappy.

I'll walk you through it again. Title is the title of the content piece; then the type, for example tutorial; the URL; and the category of the content piece. The traffic data is by month: in January it had this traffic, in February it had that traffic, in March it had that traffic, and so on.

Just a few notes. The metadata is all accurate: the title, the type, the URL, the category, the publish date, when it was last updated, and so on. This is all publicly available information, and you'll be able to access this content and see it for yourself. However, the traffic data has been synthetically created for the purpose of anonymity, and it does not respect the original distribution of the data. Also, for the sake of this code-along, I'm sharing the clean dataset with you. In reality, what we do is
using DataLab, we combine SQL cells with Python cells to query our database. We have all of this traffic in our BigQuery data warehouse, and we query it based on URL. We also have our CMS data, which we access through its API in Python. We merge the two on the URL field, and then we do a bunch of data transformations and cleaning to get the clean data that I just showed you. So essentially, almost all of the metadata that you see here, last update year, month, and so on, comes from our CMS, and all of the traffic data comes from our data warehouse. But again, I'll repeat: this data has been synthetically generated for this code-along and is not DataCamp's actual traffic data.

Given this, when we were coming up with this dashboard, we were thinking about answering questions such as: are we on track to hit our monthly production goals? We have goals for how many tutorials we want to commission and create each month; is there an easy way to see if we're on track for that goal? Are we on track with our monthly traffic goals? We have goals for each month on how much content we want to create and how many views we want that content to get, so here we're keeping track of traffic goals. Are we on track with our production goals year to date, so cumulatively, month over month? Same for traffic. When an article is released, how much traffic is it getting in its first month? I'll give you an example: we released an article on DeepSeek, and given how crazy the DeepSeek hype has been, we've seen a lot of interest and traffic on that article. So in the month of January 2025,
for example, we've seen a large amount of traffic going to that particular piece. What is our top-performing content across blogs and tutorials? That's a question we're going to try to answer as well. And then, how much traffic should we expect next month, given all the content that is already live?

First, I'm going to import the relevant packages. I'm going to be using pandas and Plotly Express, and I have it here: import plotly.express as px. I'm going to import numpy as np in case I need it, from datetime I'm going to import datetime, and I'm also importing plotly.graph_objects as go. I'm going to run this, and then I'm going to import the dataset that you see here. I can download it from here so I can upload it to ChatGPT; I recommend that you download the file too, so you can upload it to ChatGPT and we can get started from there.

Before we get started: our goal is to build a pipeline that runs every day, so we're going to build functions that can be put together in a single pipeline. One thing to note is that ChatGPT is volatile. I was able to successfully run through this code-along two days ago, and I'm going to use the same prompts I used in my test run, but if ChatGPT doesn't provide a correct answer, the solution code can be found in the solution section on the right-hand side. Sometimes I may do quick verifications in Google Sheets, because the generated code may omit some rows for some reason, so I'm just going to jump into this file. You should all have view access to this with my sharing permissions; all you need to do is just
make a copy, and then you have it from there. I'll just be creating some pivot tables on the fly to check that my data is correct. I'm going to cancel this because I already have my pivot data here, and I'm going to put it on the left so I can switch quickly between ChatGPT and the sheet.

Now I'm going to start with a prompt. I'm going to upload my file, which is already here, and tell ChatGPT: "You're an expert data scientist. I am providing you a dataset containing data for different blogs and tutorials published throughout 2024. I want you to deeply internalize the data. Then I will provide you a set of functions to create that will provide me a better understanding of my data." So I'm going to start by asking ChatGPT to understand the data. It's going to import pandas, determine the file path, display the head, and so on. Then it says the data has this many rows and this many columns, and it gets the numeric data here.

I saw someone mentioning that they're having a hard time creating a workbook. If you are, let us know; we'll create another duplication link, and I'll make sure that we send that over in the chat. I'm going to create it just in case.

With that said, ChatGPT has just understood the data. The first step is understanding production volume over time. The way I want to visualize this is to plot, using Plotly, a stacked bar chart showing production by type, so blogs and tutorials, month over month. Ideally I have one bar chart with stacked bars that shows production volume for tutorials and production volume for blogs.
What I will do is use this prompt: "I want you to create a function that takes as input this dataset and returns a stacked bar chart in Plotly that shows production volume of content month over month. The column to use for production volume is publish month, and the bar chart should be stacked based on the type column." Let's do this here. It's already creating the function: the input is data, it's taking the publish month, taking the different categories, and ordering them. Then it's creating the stacked bar chart using Plotly, coloring by type, and laying it out. It's going to tell me there's an error, but let's actually try the code out, because I think ChatGPT generally has an issue with Plotly. My file is called traffic data 2024... and as you can see, boom, we have it. So the ChatGPT error was not related to the code itself; it's related to how ChatGPT interprets HTML files when it shows them.

As you can see here, we have 12, 6, 15, 12, 15, and it's in chronological order. Just to sense-check this, I'm going to open this pivot table. If I go to publish month and filter on everything released in January, I see that we released 18 articles in January, and here 12 plus 6 is 18. If I break it down by type: 12 tutorials, 6 blogs. So the first prompt got it right from the start.

What I want you to focus on here is the quick feedback loop between checking the data, checking ChatGPT, checking the output, and making sure that it's accurate. This is probably one of the best takeaways from this code-along when it comes to effective use of AI for data tasks. So now we've got our first chart. Now, given I
want to include this in a pipeline, what I'm going to do is return fig.show() here, so each time I run this code it returns the chart. I have this cell at the end that says "put it all together in one single pipeline". What this cell does is define a function called deliver dashboard pipeline that takes my traffic data as input and plots production volume over time. If I run this pipeline on my traffic data, and do the same for my future functions, I just need to run this one function and we're good to go.

Step one is complete. Step two is understanding traffic volume over time: plot, using Plotly, a stacked bar chart showing total traffic month over month. So I'm going to ask: "I want you to create a function that takes as input this dataset and returns a bar chart in Plotly that shows total traffic month over month. The month should be on the x-axis and the traffic on the y-axis. The months should be in chronological order," because that's a mistake that ChatGPT makes quite often. ChatGPT went on a tangent here, as you can see; it doesn't understand why it isn't able to render the chart. I think it's simply due to HTML files and how ChatGPT is set up, but the original code works.

So it's defining plot total traffic monthly. One thing worth noting: it's defining the month names, January, February, and so on, because if you remember, our dataset looks like this, with one column per month. So it's defining all of these month names, and it's summing
across all of these months to get the total. Then we get two columns, month and total traffic, so essentially a group-by, and then it's visualizing this data. I'm going to copy it here under step two. It seems correct: it's creating a list of months in chronological order, summing across my data, keeping only month and total traffic, converting month to categorical data, and then building the figure. Oops, I need to change the name here to 2024. OK, so we have it here.

We do have annoying data labels, but we can just ask ChatGPT to remove them. The way I'll do it is to screenshot the chart, paste it in, and ask: "Can you remove the data labels from the bars? I just want to be able to hover on the bars to see the values." Now it's creating a new version without labels. So what did it do here? We have labels, month, total traffic... I wonder what updates were made. fig.update_layout, x-axis... it doesn't seem like there were any updates. Let's see, I'm going to put 2024. OK, so it worked. What happened? I would need to double check against the version with data labels. Ah, we had text equals total traffic; that was the problem.

So we now have the figure. I'm going to add fig.show() here as well, run this again, and then add it to my pipeline: plot total traffic over time, making sure that I update my input to data. Step two is now complete.

Step three is cumulative metrics over time. What I want to show is a bar chart that
shows cumulative production over time, and I also want a line chart that shows cumulative traffic over time. So I'm actually going to create a set of three functions. The first function takes as input this dataset and returns a stacked bar chart in Plotly that shows cumulative production volume of content month over month; the column to use for production volume is publish month, and the bar chart should be stacked based on the type column. The second is a function that takes as input the dataset and returns a line chart in Plotly that shows cumulative traffic month over month, with the month on the x-axis, the traffic on the y-axis, and the months in chronological order. The final chart is a multi-line chart that shows cumulative traffic month over month for the different types of content: the month on the x-axis, the traffic on the y-axis, in chronological order, with the type column determining the multiple lines.

With that, let's get started with the first function. I'm going to paste the prompt. It's taking a bit of time to think... OK, it's getting started: def plot cumulative production stacked bar. We're defining the months, which is great. The publish months are converted into ordered categories, we're grouping by publish month and type, we're doing a cumulative sum by type, and then we're adding a bar chart that is colored by type and setting the bar mode to stack. So technically it should be correct, but we always need to try it out. Again: the months have been defined, publish month is converted into categorical data, then for production volume we've grouped by publish month and type and found the number of articles per month using the .size() method and resetting the index to count, and then we've done a cumulative sum month over month over this data
frame, and then we're creating a figure here. With that said, I'm just going to add 2024 here, and boom. So we have 12 articles and 6 articles in January, which is correct based on what we saw earlier. What I'm going to do is look at February: January plus February should be 27 plus 18, that's 45, I think. Let's double check. If I filter on both January and February: 45 indeed, 45 rows, as you can see here on the bottom right. So our data is correct, and it is indeed split by type. First function created.

First thing, I'm going to add show here, then run this again so we don't have a lot of output in the notebook, and add it to the pipeline: plot cumulative production over time.

Now the second substep of step three is the prompt we just saw: "Create a function that takes as input this dataset and returns a line chart in Plotly that shows cumulative traffic month over month. The month should be on the x-axis, the traffic on the y-axis, and the months should be in chronological order." I see someone here saying they're joining late from South Florida; even if late, you're so welcome.

So I'm going to run this. It's defining plot cumulative traffic monthly. Again the months are being defined, because this is traffic data and the columns are named after the months. We're computing a sum for each month, similar to what we did for total traffic by month, but here we're also doing cumsum, which is
cumulative sum, and we're building a line chart instead, so lines plus markers. So we should be good to go. Again: we've defined the names of the months, we're summing across traffic, so we have a total traffic data frame with just the month and its total traffic, we're ordering by month, and then we're doing a cumulative sum and building a line chart. I'll pass traffic data 2024. Indeed we see January, February, and so on. The best way to verify this is to look at December: in December we have 23.516 million in traffic. If I go to the sheet and sum all of these... oops, I think I still have filters on. If I clear them, indeed, 23.516 million in traffic, so it is correct.

Now I'm going to add show here, and I'm also going to delete any package imports, because ChatGPT tends to be repetitive when it comes to this, just to clean up the code and make sure it's readable. Then I'm going to take this function, sorry for the scrolling everyone, and add plot cumulative traffic monthly to the pipeline. Oh, here I put traffic data 2024; it should be data, because I'm taking my input from my function's data argument. Then I'm adding the comment "plot cumulative traffic over time". That's our second substep of step three. I'm going to run this again to remove any output.

Now I'm going to write the third prompt, which is a multi-line chart: "I want you to create a function that takes as input this dataset and returns a multiple line chart in Plotly that shows cumulative traffic month over month, but for the different types of content." So instead of just one line, I want two lines, one for blogs and one for tutorials. "The month should be on the x-axis and the traffic on the y-axis. The months should be
in chronological order. The type column determines the multiple lines."

Now what it's doing here is cumulative traffic by type. We have the months, the data is grouped by type, and the months are summed. And then for month in months, if month equals January... it seems to be adopting a bit of a strange method here. It seems like it's computing the cumulative sum for each type. What I would have done is probably create two lists or two dictionaries, one per type. I'm anticipating that we may have an issue here. So again: plot cumulative traffic by type, we're grouping by type and summing over the different months, and then for month in months, if the month is January, we're creating a new column called cumulative plus the month name, so cumulative January equals January; else we're adding to it. Then we're melting the data on the months and adding a variable name. Some not-so-simple data manipulation is happening here, but let's see how it works. I'm going to put 2024... oh wow, it actually worked. So 16.2 million for tutorials by end of year, and 7.3 million for blogs. Let's verify this real quick. If I filter my data on tutorials only and sum January through December: 16.2 million on tutorials, which is correct. And if I go to blogs: 7.3 million, as you can see here. So technically correct; ChatGPT has not let us down so far.

Now this next one is going to be tricky. When I tried it two days ago, it produced quite a few errors and I had to debug my way through. Given that we are almost at the hour mark, I want to make sure that we have time for questions, so if we're not able to debug it in under 10 minutes or so, you can
reference the solution code if it doesn't work. We still need to see. This one is a bit complex, because what we want is a bar chart that shows traffic for articles released in a given month, in that same month. For example, for all content released in January, I want to see its traffic only in January; for all content released in February, its traffic only in February; and I want it grouped by type. This shows whether the content we're releasing is a slow burn or explosive content that is immediately popular.

I'm going to have a bit of a long prompt here, because I want ChatGPT to think step by step: "I want you to create a function that takes as input this dataset and returns a grouped bar chart in Plotly that shows, for a content piece released in a given month, its traffic in that month. So for any content piece released in January, its January traffic; for any content piece released in February, its February traffic; and so on and so forth. Here's how I want you to approach it." I want to create a bit of a system design for this function so that ChatGPT understands what it's trying to do step by step. "First, we create an empty data frame to store grouped values for each month of release." (This should not say data framed; it should be data frame.) "For each month of release, for example January, we isolate the content and look at the traffic in that month only. We group by type of content. We add the grouped data into the empty data frame. We repeat this for all months, and then we visualize the grouped data using a grouped bar chart. The grouping should happen on the type column, where for each month there are two bars side by side for the different types of content."

So again, I'll repeat what I'm asking of ChatGPT here. It first creates an empty data frame that stores grouped values. Then for each month, so for example in
January, it filters only on content released in January and the traffic in that particular month, and then it groups by type. So it looks at tutorials released in January and their traffic in January only, and blogs released in January and their traffic in January only, and it adds that grouped data into the empty data frame that we initialized earlier. Then we repeat this for all months and visualize. I'm going to paste the prompt and see what happens.

OK, first it's defining the months, then it's creating an empty data frame to store the values: month, type, traffic. Then for month in months, it looks at the monthly content, filtering on publish month. If monthly content is not empty, it groups monthly content by type, sums that month's column, and resets the index. So if the data is not empty, we're grouping by type for that month only and summing, then assigning the month, and then concatenating onto the grouped data variable. Then we're renaming columns appropriately, reordering, and visualizing. Let's see how this works; hopefully it works from the get-go. I'm going to put 2024 here. Again: it's defining the months, it's creating an empty data frame called grouped data, and then for each month we're filtering on data where the publish month ends with... the month number, so month three. OK, now I understand: where the publish month ends with the same month number, because we have the year-month format. With that said, let's see how it works.

Ha, it's an empty data frame. I think one of the issues here is that, if you look at the data, publish month should not be 2024-01, 2024-02; it's probably better to have it by name. So I tell it: "The visualization returns an empty data frame. Consider changing the publish month column values
into month names before isolating on the month name. OK, so again, what I think the issue is: it's trying to match this field on these names, and we need to convert these into month names before we actually do any isolation on the data. So it created a month mapping: it mapped the months manually, mapped the months to the mapping, and created a list of the values; if publish month equals the month, then we're grouping by here. Let's see how that works. I'm going to do 2024. Oh, we have another error. OK, these are correct. Let's just copy-paste the error into ChatGPT. Oh, it's already going through the errors itself, huh? It seems like there was a mistake at the month-mapping level. OK, it seems like it already fixed its own error. Let's see. So here, what it did is month code... ah, it added the mapping here as well. I'm going to try this before we give it the error again. So I copy-pasted this and I'm going to do 2024. Still not working. So I say: it is still returning an empty DataFrame. I want to be mindful of time, and the solution code does have the solution for this one. So we're doing the month mapping again; let's see how this one performs instead.

It's still returning an empty DataFrame. I think the main issue here is that the mapping between month name and month, and how the month is formatted here, is not being represented correctly. So this requires actually jumping in and debugging this code. Let's add ValueError statements in case the grouped data is empty. What I'm trying to do here is have the function return an error message in case the grouped data is empty, which is what I think is going on. Let's see if this is indeed what is going on. 2024. Uh-huh, yeah: the grouped data is empty, and this might be due to a missing value in publish month. So at least we know what our issue is, and with that
though, I will share this cell that we have here. This does have the correct code, and your visualization should look something like this. We still have 10 minutes and a couple of steps to wrap up, and I want to be mindful of time, so do check out the solution code for this one. I'm happy to debug this with you if you reach out to me; I'd love to jump in on this. Now, that said, let's go back to the solution code. I will ignore this for now, and we can always return to it if we have additional time.

I'm going to jump to step five, which is to output a table of the top blogs and tutorials published. What I want to see here is a table that shows our top blogs and our top tutorials sorted by total traffic. So what I want to ask is: I want you to create a function that takes as input this dataset and then returns two tables in Plotly. One, a table of all of our top blogs sorted from highest total traffic to lowest. The table should contain the name of the article, the type of the article, and its total traffic to date. The second table should return all our top tutorials sorted from highest total traffic to lowest. Same thing: it should contain the name of the article, the type of the article, and its total traffic to date. With that said, I'll ask ChatGPT to create the tables. It's going to create a top content table. It's going through our months to define the traffic. It's looking at our top blogs: for any data of type blog, it's sorting values by total traffic, so that's been computed. Same thing for top tutorials. It's isolating the column names, and then it's creating a table for each of those, which is great. I think this is going to be correct on the first try. We already have a show figure here, so I'm going to do 2024, and as you can see,
boom, we have AWS interview questions, [inaudible], and so on and so forth. These are the top blogs and these are the top tutorials. Amazing. So I'm going to add the show here, and also add the show here, and then I'm going to add this to my pipeline: output tables of top blogs and tutorials, and it takes the input data.

Now, the last step that I want to show you is something that we look at quite closely, which is extrapolated monthly traffic. I want to explain it first before I go into it. If you look at an article that was released last week or this week, and assume that article has done 1,000 views in a span of 7 days, we can estimate what it will give us in a month by using a simple formula: total traffic from the day it was released until today, divided by the number of days it's been live, times 30. So we calculate its daily traffic, times 30; this should be how much it will give us on a monthly basis. What I want is to see one view, one bar, that tells me how much monthly traffic we should expect to get given our traffic year to date. So for example, if we created 100 articles and their total monthly traffic potential is 1 million, then if we stop creating content next month, we should probably get around 1 million in traffic from that content. That's a good measure of how much your existing content is going to give you next month, the following months, and so on.

So the prompt that I will use here is: I want you to create a function that takes as input this dataset and then returns the total traffic potential for the content in a Plotly bar chart. Here's how I want you to approach it. For every article, calculate its traffic potential. Its traffic potential is its total traffic divided by the number of days
it's been live, times 30. We should use the public... the publish date column to calculate the number of days it's been live as of today; so I'm going to update this: we should use the publish date column to calculate the number of days it's been live as of today. Then create a bar chart in Plotly. It should be a single bar that shows us total traffic potential for the content. So I'm going to go here and put that in. OK, so it's going to plot traffic potential. What it's going to do is define the traffic months; it's going to compute the total traffic an article has gotten so far; then it's going to take publish date and convert it to datetime. It's going to set errors equals coerce, which means that even if you get an error, it coerces it. It's going to get today's date, and it's going to calculate days since live, so today minus data publish date, in days, using datetime. Then it's going to create the traffic potential column, so traffic potential equals data total traffic divided by data days since live, times 30. Then it's going to sum up the traffic potential for all of these articles and create one bar chart that shows it. I think this is going to be correct, so let's see. I'm going to do 2024 here; it should be around 3 million or so. Indeed, 3.052 million for this content.

What I want you to note is that we're calculating the difference from today, which is January 28th, 2025, whereas the last signal of traffic, given this is 2024 content and data, was December 2024. So we're probably underestimating a bit, because for some content that was released in January we're essentially dividing by 13 months while capturing only up to 12 months of traffic. So I'm going to add show, then copy-paste this, removing the import, because datetime has already been imported above, and then I'm going to take this, generate traffic potential, and I'm going to
put data. Then essentially, if I did my job correctly here and I run this, and I schedule this to run every day using DataLab, then I will get a daily report on how traffic is performing. So let's run this and see how it's doing. Oh, we have a little bit of an error here. What's going on? Wait, I'm going to run this again from scratch and see what happens. I'm going to first delete step five. OK, we're back to good: monthly production volume of content stacked by type, total traffic month over month, cumulative monthly production volume stacked by type, cumulative traffic month over month. We also had that third one, which I think we didn't add, the cumulative multi-line chart. Yeah, we didn't add the multi-line chart, so let's get this one. Let's do fig show here, by type, and then data. OK, so I got monthly production volume of content, total traffic month over month, cumulative monthly production volume of content month over month stacked by type, cumulative traffic month over month, cumulative traffic month over month by content type, top blogs by total traffic, top tutorials by total traffic, and total traffic potential for all content. And again, to remind you, if I do a scheduled run on a recurring basis, so I set this to, yes, daily at 8 a.m.
in the morning, we've just saved the majority of the time that we spend on reporting for our editorial team here at DataCamp.

So, putting it all together (and I'm going to make sure that we answer questions), to recap: we created a simple dashboard that can be used every day by a content team, in under 60 minutes. We used the pipeline paradigm, which makes this flexible. For example, what we've done in our own work to make this flexible for next year and the year beyond is that we have an input that says desired year, and this filters on all of the content that was created next year, this year, and so on. So a pipeline framework makes it flexible; you don't need to make many code changes over time. That said, this is a prototype. To make this production-ready, we took the following steps: we connected it to our data warehouse, so there's no need to get data manually; it runs daily; and it connects to other data sources, like time spent on page, other quality sources, and so on. The types of questions we were able to answer are: does updating content help? What content needs to be updated? What content is declining in traffic? How does content perform in AI versus cloud versus Excel versus Python, and so on? What is the most effective piece of content by extrapolated monthly traffic? And we were able to do this in 55 minutes or so. If we had an extra couple of hours, we would be able not only to answer these questions but also to debug that problem we had with first-month traffic.

This all shows you just what a paradigm shift coding with LLMs is. My advice here is: keep the feedback loop tight and in the same context, so try to avoid having multiple chats. Keep the feedback loop really
tight with ChatGPT and keep it in the same context; that way it remembers what your dataset is. If you remember, the first prompt I had here was to internalize the data and understand it, and that's a pretty important step. Focus on close-ended problems. Give system design tips when it struggles; it didn't work out for me here, but it did on the practice run. Make sure to review the code and the output; don't take it for granted. You saw how we were testing in Google Sheets to make sure the data was accurate, because you don't want to be in a situation where you present a dashboard and the data is inaccurate. And most importantly, framing the question is really, really important; that's one of the keys to succeeding at coding with ChatGPT.

So with that, I'll go to Q&A. Let's answer some questions. I'm going to keep the slide here just in case. I see only a couple of questions so far; I see a third one, if I'm not mistaken.

First one: are you able to export the data into LibreOffice Calc? I'm not exactly sure what LibreOffice is, but technically, if it runs on CSV, yes, you should be able to do this.

Another question, from Carlo Lee (Car Lee, sorry): very impressive so far; is there a recommended process for assessing how many prompts you need to break up a complex request like this, or is it more of a trial-and-error thing? I would put it as a trial-and-error thing, to be honest, because at the end of the day it's going to be touch and go based on how effective ChatGPT is within the context that you're creating the code in. For example, if I started off with a completely different dataset and said, here's a new dataset, now update the framework, the request, it would probably answer differently. So
if you have a really strong understanding of the system design that you want to go for, then yes, going step by step is really good.

Another question I'm seeing: how can I optimize the performance of a Python dashboard? There are quite a few different ways to do this. It depends on the quality of the code that you have. ChatGPT is not necessarily well known for writing the most effective code, but it gets the job done, and you can probably use it to look at the code and make it more effective. That's one. Two, it also comes with experience; this is what experienced coders know: how to make code run more effectively. So do check out, for example, a video we're going to release on three tips for faster pandas code. One thing that we could have done here a lot is use method chaining or assign instead of creating new columns, and so on.

Another question: I'm a newbie; how do you embed the data into a dashboard? Hope it's not a stupid question. Nothing is a stupid question. If I'm not mistaken, I think you're referring to how I embed data like this into the dashboard. Is that correct? Let me know in the comments if so, and I'll answer accordingly. Here we're actually using a Plotly figure called Table, so go.
Table, and we're able to display the table accordingly. That said, I do think that pandas also has a display function, if I'm not mistaken, and you're able to also display it as HTML, for example, in your notebook.

All right, I don't see any additional questions. I have to say it's been a pleasure, it's been an honor. I really appreciate everyone joining in for this code-along. I hope it was useful; let me know on LinkedIn if it wasn't. I'd love to see what we can do to improve the code-along experience for you. Just one thing: I would really appreciate it if you like and subscribe to the YouTube channel. It really helps us achieve our goals and would get us in a good spot for next week's code-alongs. Only if you enjoy the content, no requirements, but I would really appreciate it. With that said, I'll give everyone their time back; I took five extra minutes from your day, so I really appreciate those who stuck around to the end. And with that, have a great week.
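
As a supplement to the walkthrough, here is a minimal sketch of the two calculations the transcript describes only verbally: traffic earned in the month of release (the step that kept returning an empty DataFrame) and extrapolated monthly traffic. The row layout, the field names (`publish_date`, `type`, `traffic`), and the `YYYY-MM` month keys are assumptions made for illustration; the notebook's real column names are not shown in the transcript, and the session itself used pandas and Plotly rather than plain Python. The point of the first function is that the release month and the traffic keys are derived in the same format, which is one plausible way to avoid the mismatch the speaker suspected.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows; the real dataset's columns are not shown in the transcript.
articles = [
    {"title": "A", "type": "blog", "publish_date": date(2024, 1, 10),
     "traffic": {"2024-01": 500, "2024-02": 300}},
    {"title": "B", "type": "tutorial", "publish_date": date(2024, 1, 20),
     "traffic": {"2024-01": 200, "2024-02": 900}},
    {"title": "C", "type": "blog", "publish_date": date(2024, 2, 5),
     "traffic": {"2024-02": 400}},
]


def release_month_traffic(rows):
    """Sum, per (release month, type), the traffic earned in the release month only."""
    totals = defaultdict(int)
    for row in rows:
        # Build the lookup key in the same "YYYY-MM" format as the traffic keys,
        # so the release month and the traffic month cannot drift apart.
        month_key = row["publish_date"].strftime("%Y-%m")
        totals[(month_key, row["type"])] += row["traffic"].get(month_key, 0)
    return dict(totals)


def traffic_potential(total_traffic, publish_date, today):
    """Extrapolated monthly traffic: total traffic / days live * 30."""
    days_live = max((today - publish_date).days, 1)  # avoid dividing by zero on day 0
    return total_traffic / days_live * 30


grouped = release_month_traffic(articles)
# grouped[("2024-01", "blog")] == 500; February traffic for article A is ignored

# The speaker's example: 1,000 views in 7 days
estimate = traffic_potential(1000, date(2025, 1, 21), date(2025, 1, 28))
# round(estimate, 2) == 4285.71
```

Each `(month, type)` total maps to one bar in a grouped bar chart, and summing `traffic_potential` over all articles gives the single-bar figure discussed in the session.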

Original Description

Resources (including link to code-along notebook): https://bit.ly/3Q08XbT
Code Along with Us! https://bit.ly/3PUR6Dg

Generative AI is breaking down barriers in coding and data work, making these skills more accessible than ever before. By transforming natural language into functional code and automating complex workflows, tools like ChatGPT can help anyone analyze data. In this session, Adel Nehme, VP of Media at DataCamp, will walk you through how to build a Python report using ChatGPT. Throughout the code-along, he’ll outline best practices when building reports using ChatGPT (GPT-4o), build a full-fledged report analyzing DataCamp media content, and cap off with additional tips and tricks for being effective with Python using ChatGPT.


This video teaches how to build Python reports with ChatGPT, covering topics like data analysis, visualization, and dashboard creation. It demonstrates the use of generative AI in coding and data work, making these skills more accessible than ever before. By following this video, viewers can learn how to create effective prompts for ChatGPT, use ChatGPT for data analysis and visualization, and build dashboards with multiple visualizations.

Key Takeaways
  1. Create a dashboard to analyze traffic data using Plotly
  2. Import a CSV file containing month-over-month traffic data for articles
  3. Use a notebook environment to code and prompt along with ChatGPT
  4. Define a function to create a bar chart in Plotly that shows total traffic month over month
  5. Create a cumulative sum of traffic data by month and type
  6. Use Plotly to create a multiple line chart of cumulative traffic by type
💡 Generative AI is breaking down barriers in coding and data work, making these skills more accessible than ever before. By transforming natural language into functional code and auto-generating visualizations, ChatGPT can be used to create effective dashboards and analyze traffic data.
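
Takeaways 5 and 6 (a cumulative sum of traffic by month and type, plotted as one line per type) can be sketched as follows. This is an illustrative, standard-library-only version under assumed data: the `monthly_traffic` dictionary and its numbers are hypothetical, and in a pandas notebook the same running total is usually obtained with a `groupby` followed by `cumsum`.

```python
from itertools import accumulate

# Hypothetical monthly totals per content type (not real DataCamp figures).
monthly_traffic = {
    "blog": {"2024-01": 100, "2024-02": 150, "2024-03": 50},
    "tutorial": {"2024-01": 200, "2024-02": 100, "2024-03": 300},
}


def cumulative_by_type(monthly):
    """Return {type: [(month, running_total), ...]} with months in calendar order."""
    result = {}
    for content_type, by_month in monthly.items():
        months = sorted(by_month)  # "YYYY-MM" strings sort chronologically
        running = accumulate(by_month[m] for m in months)
        result[content_type] = list(zip(months, running))
    return result


cumulative = cumulative_by_type(monthly_traffic)
# cumulative["blog"] == [("2024-01", 100), ("2024-02", 250), ("2024-03", 300)]
```

Each list of `(month, running_total)` pairs corresponds to one line in a multi-line chart, with months on the x-axis and cumulative traffic on the y-axis.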
