Analyzing Coronavirus with Python (COVID-19)

NeuralNine · Intermediate ·💻 AI-Assisted Coding ·6y ago
Today we are going to use Python with Pandas and Matplotlib to do some statistical analysis of the coronavirus, COVID-19. Datasets Link: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

What You'll Learn

The video utilizes Python with libraries such as pandas and matplotlib to analyze COVID-19 data, focusing on statistical analysis, data visualization, and growth rate simulation. It covers various aspects of COVID-19 data analysis, including data preprocessing, visualization, and speculative modeling.
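
The preprocessing and growth-rate steps walked through in the transcript can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes the CSV layout described in the transcript (Province/State, Country/Region, Lat, Long, then one column per date), the sample countries and numbers are invented, and `diff` and `shift` stand in for the explicit day-by-day loop used in the video, so the first row here is NaN rather than a copy of the confirmed counts.

```python
import pandas as pd

# Tiny invented sample in the same wide layout as the downloaded CSVs.
# Column names are assumed from the transcript; the values are made up.
raw = pd.DataFrame({
    "Province/State": ["A1", "A2", None],
    "Country/Region": ["Aland", "Aland", "Borduria"],
    "Lat": [0.0, 1.0, 2.0],
    "Long": [0.0, 1.0, 2.0],
    "1/22/20": [1, 1, 0],
    "1/23/20": [2, 2, 1],
    "1/24/20": [3, 5, 3],
})


def prepare(df):
    """Drop location columns, sum provinces per country, put dates in rows."""
    df = df.drop(["Province/State", "Lat", "Long"], axis=1)
    df = df.groupby("Country/Region").sum()
    return df.T


confirmed = prepare(raw)

# Day-over-day difference: how many cases were added on each date.
new_cases = confirmed.diff()

# New cases relative to the previous day's total, in percent.
# A previous-day total of zero gives inf, as noted in the transcript.
growth_rate = new_cases / confirmed.shift(1) * 100

print(confirmed)
print(growth_rate)
```

With the real data, `raw` would instead come from `pd.read_csv` on the downloaded confirmed-cases file, and the same `prepare` function could be applied to the deaths and recovered files.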

Full Transcript

What is going on, guys, welcome to this video. Chances are you are in quarantine right now, and the reason for that is the coronavirus, the COVID-19 virus. In this video I want to talk a little bit about this virus, but not really from a societal, medical, or political perspective, but from a mathematical and programming perspective. So we're going to talk about how to analyze COVID-19 in Python: how we can use libraries like NumPy, pandas, and Matplotlib to get a statistical analysis of this virus, how it spreads, what the mortality rate is, the growth rate, and so on. We're going to visualize the individual things in graphs like this one. This would be the total confirmed cases of COVID-19 all over the world, so we have Austria, Italy, US, Spain, France, Germany, India, and we could add more countries if we would like to do that. So this is what we're going to do in today's video, so let us get into the code.

Now of course, before we can get into our analysis, what we need is some data from the internet. One of my favorite pages right now is worldometers.info/coronavirus. I mean, favorite is probably a bad word for this, but here you can see the most current data. It's almost always updated, and you can see how many cases we have, how many deaths we have, and how many recovered cases we have. The problem with this page, however, is that it does not offer any CSV or XML or JSON files. We basically just have this and have to work with it. Of course we could do some web scraping with the bs4 library and extract the information here, but instead of doing that I found another page, which is this one here, humdata, or human data, I don't know, and it basically has some coronavirus data in the format of CSV files. We have the confirmed cases, the deaths, and the recovered cases in CSV files here, so you just have to download them and that's it. I've
done this already and I have them on my desktop, so let's take a look into them and see what they look like. As you can see, we have the province or the state here, we have the country or the region, then we have some coordinates with latitude and longitude, and then, it's not easy to see right now, we have the dates and the number of total confirmed cases. So this would mean that on the 22nd of January 2020 this country had two cases, then three cases, five, seven, and so on, and this goes all the way up until yesterday. Today is the 23rd and it goes up until the 22nd, so tomorrow it will probably also include today. So you have to update this file, or actually you could also just download it into Python directly if you want.

When you have these files, what you do is move them into the directory of your Python file, and then you can just rename them because the names are pretty long. So you could just say covid19_confirmed.csv, and this would be the deaths, so we could say covid19_deaths.csv, and then we have covid19_recovered.csv. This is how you get the files themselves, so how you get the data, and now we're going to talk about how to get this data into our script.

Now before we can actually start doing anything with this data, we need to import a couple of libraries. The most important library of today is pandas, which we're going to give an alias of pd, and we're also going to import matplotlib.pyplot as plt. These two libraries are data science libraries. If you don't know them, check out my data science tutorial series, where I teach you how to use them. These two libraries are what we're going to need today, and the first step right now is to import the CSV files into our script as data frames. So we're going to say confirmed equals pd.read_csv and we're going to pass covid19_confirmed for confirmed cases, and then we're going to do the same thing with the deaths and the recovered, so we're going to say deaths and recovered, and of course we have to change it here. Oh, I forgot the .csv, sorry: covid19_deaths.csv and covid19_recovered.csv. So these are the three files and they're now in our script, but we cannot really work with them yet because they don't have the format that we need. If I go ahead and print confirmed, for example, we'll see that the structure is not really optimal for calculations. Oh, this was an old script, sorry. There you go.

So basically you can see that we have Province/State with a lot of NaN values, with some random index here, then we have the country or the region, then we have the latitude, the longitude, and then we have all the dates. Now, in order to work with this data as efficiently as possible, I am not interested in the province or the state, so I'm just going to drop that column. I'm also going to aggregate, to combine, all the individual country entries. For example, if I have a lot of entries for the US, for the United States, I have New York, I have California, I have all these individual things, and I'm going to sum them up into one entry, which is US. So I have one row for the US data, one row for the China data, for Japan, for all the individual countries, but I'm not going to split it up further into provinces and states. We're also going to drop the coordinates because we're not really interested in them. They might be interesting depending on what you're trying to implement, but for our statistical analysis they're not really important, so we're going to drop these columns.

So the first thing that we're going to do is say confirmed equals confirmed.drop. This is the function we're going to use in order to drop this, and we just need to specify the column names, so we're going to
say drop Province/State, Lat, and I think the other one is Long, and we need to specify axis equals 1. Then we can go ahead and see if this worked, and as you can see, we now have the country, the region, whatever, and we have all the dates.

Now the next step is to somehow aggregate these values, because as I said, we have a lot of US entries right now. Duplicate entries, actually they're not duplicates, they're just the same country but they have different numbers. We need one entry for the US, so what we need to do is group by country. So we're going to say confirmed equals confirmed.groupby, and in this groupby method I'm going to pass confirmed, and I'm going to pass the Country/Region column, sorry, and we're going to use the aggregate function here. So what we're doing here is grouping by the country or region, and the result shall be aggregated, so we're going to say aggregate. And what aggregation are we going to apply? We're going to sum them up. There are also other aggregations, but in this case we are interested in the sum, and this is what we're going to do for all the things that we just imported. So we're going to do the same thing for the deaths, we're going to do the same thing for the recovered, and let's just not make any mistakes here because that would be stupid. Should be fine. And then we do the same thing here again, so we're saying deaths, and actually confirmed was the first one, sorry, second one deaths, and don't forget this one, deaths and recovered. There you go, we can now print it.

One last thing needs to be done. I mean, it actually doesn't need to be done, but it's the way I prefer it. Right now we have the countries here as the key and then we have the individual dates as the features, or actually both are the keys and the number then is the feature, but I would like to transpose this and have it the other way around. So the rows shall be the dates and the columns shall be the
individual countries. So what I'm going to do is say confirmed equals confirmed.T, which returns a transposed version of the data frame, then deaths equals deaths.T, and then recovered equals recovered.T. When we now print this, you'll see that we have it the other way around, and that's how we wanted it: we have the countries here, the dates here, and these are the confirmed cases. Now let's just check the other two data frames as well, just to make sure I didn't make any mistakes, and then we can get to the next step. Yep, it's fine, as you can see.

So let us get into the next step. What we're now going to do is use the data that we already have to calculate some new data. We're not actually creating it, we're using the data that we have to extract some new information, because we now have the confirmed cases, the deaths, and the recovered cases, and with these three values we can calculate the growth rate, the active cases, the death rate, the recovery rate, and all of that.

The first thing that I want to do is calculate the growth. I want to know how many more people are infected every day. I have the confirmed cases, but I don't have any increases in infections here. I just know that on one day, for example, I have 100 cases and the next day I have 102 cases, but I don't have a data frame that tells me, okay, on this day you have two more cases, and I also don't have a data frame that tells me this is two percent more than yesterday, and this is what I want. So what I'm going to do now is first create a new_cases data frame, and for this I'm just going to copy the confirmed cases. I'm going to use the copy function here. It's very important that you use the copy function and don't just say equals confirmed, because then you pass the reference, and when you change new_cases you also change confirmed. So if you
want to just copy the array, always use the copy function. Not array, sorry, data frame. So in this case we have a copy, but this copy shall now be changed. We're going to say for every day in, and now we're going to specify a range, and this range shall go from the first day that we can do a calculation for, which is the second day. What we're going to do now is take the cases today minus the cases yesterday to see how many new cases we have, and of course this does not work with day zero because we cannot look at the day before that. So we're going to say range from 1 instead of 0 up until the length of the data frame, which is basically how many rows it has. Then for each day we're going to say new_cases.iloc[day], so at this particular day, we have confirmed of this day minus confirmed of day minus one, which is the day before, and then we get a data frame full of the new cases. So we can go ahead and just print new_cases.tail, maybe the last 10 entries, and then we can see how many new cases we have.

Okay, we have an error here. What is the error? Of course, I forgot to use iloc, which is basically looking up by the index position, because our key is a date, so we cannot just use a number. But now it works, and you can see this does not mean that in Albania we have ten cases in total, but that on this particular day we have two new, ten new cases, sorry. It's basically saying that we have ten more cases than yesterday, then we have five more cases than yesterday, and so on. If you want to double check that, we can just go ahead and say confirmed.tail(10), and you can then see, hopefully, that this is actually the case. So you can see on the 13th of March we had 33 cases in Albania and on the 14th we have 38, which is five more, and yes, in fact on this date we had five more than yesterday. Then we have 42, which is four more, as you can see, four more than yesterday. This is how we calculate new cases, and now what we want to
do is calculate the growth rate. We want to know how much percent this is, because if a country like China has ten more people it's not that much of a big deal, but if a country like Switzerland or Austria or the Netherlands has ten or a hundred more cases, this is a much higher number because we have less population here in Europe, for example, in contrast to China. So what we're going to do now is say growth_rate equals confirmed.copy(), so we're just going to make another copy. The reason I make a copy is just to copy the structure, and then I change all the values. Then we say again for day in range, and we're basically doing the same thing, for day in range up to the length of confirmed, but this time we're going to say growth_rate.iloc[day] equals the new cases of this particular day, so how many new cases we have on this day, divided by the confirmed cases yesterday. So we're looking at the number of new cases compared to the number of cases that we had yesterday to determine the growth rate in percent. What I'm also going to do here is multiply by a hundred, just because later on in the Matplotlib visualizations I don't want to see 0.8, I want to see 80 percent. So I'm going to say times 100 here, and then we have the growth rates.

We can go ahead and print growth_rate.tail(10) to see the last 10 entries, and there you go. You can see that in the case that we looked at, the five new cases were 15%, which means that the five cases were fifteen percent of the total infected people the day before. You can see that this varies quite a lot, so we have 43. Of course, if a country has one infected person and on the next day we have, I don't know, two infected people, then you have a 100 percent growth rate, which is just not really interesting because you're just dealing with two cases. What you can also see here is NaN and inf, which happens when up until now we had
zero cases here and this is actually the first case. From zero to any number will be that number divided by zero, and it's actually not defined, but in Python this is infinity. You can start calculating from the next value on. So this is how you calculate the new cases and the growth rate.

Now these two values are definitely interesting, but what is even more interesting is how many active cases we have. I can talk about the growth rate, but if more people are recovering than getting sick, I actually have a negative growth rate, and in this case I would still get a positive growth rate. If two people get sick, it's two more people, even if a thousand people recover on that day it's still plus two, because I'm looking at the absolute total confirmed cases. They are not getting less, they're always getting more or staying the same, so we cannot decrease that number. What we're actually interested in is how many active cases we have, and what the overall growth rate is, the actual growth rate of these cases.

So what we're going to do now is say active_cases equals, guess what, confirmed.copy(), and now we're going to say for every day in range, and this time we're going to start with zero. I'm going to explain why in a second. We're saying from zero to the length, and we're starting with zero because here I don't have to go one day back. I'm not looking at past data, I'm not looking at yesterday, I'm looking at this particular day. The amount of active cases on a certain day, so active_cases.iloc[day], is just the amount of confirmed cases on that day minus all the people that have died from this disease, unfortunately, so deaths.iloc[day] up until now, and also minus all the people that have recovered, fortunately, from this disease. This is obviously true because you have all the cases that were ever confirmed, and then you just subtract all the people that have died from this
disease and all the people that have recovered, because the rest of the people are people that didn't have any outcome yet. So these are the active cases.

Of course, right now I can also calculate the overall or the actual growth rate. For this, again, I copy this, and here again we say range from 1 to the length of confirmed, because here I'm looking back. We're going to say that the overall growth rate on a certain day is based on the active cases: the active cases on that day minus the active cases of the day before, day minus one, divided by the number of active cases one day before. So basically we're again comparing it to the numbers of yesterday, and again I'm going to multiply all this by 100 so that we have the actual percentages there. When we print that, um, what do I want to print, overall_growth_rate, and I want to print the tail of that, the last 10 entries, and there you'll see that we have an error. Okay, what's the error? Did I forget the iloc again? Yes, I did. Now it works.

There you can see, okay, it's actually not that different, but you now have different percentages, most of the time at least. If you have cases like China, where almost nobody is getting sick anymore, you have like ten new cases a day, but at the same time a lot of people are recovering, like 100 or 200 a day or something like that, so you actually have negative growth. But if you look at the first growth rate here, you'll always have positive growth or zero growth, which is not really representative, because of course the numbers are technically growing in China, but actually this country is already on the way to recovery. So this is what we get here with the overall rate. We could actually look at China if we wanted, so we can look at one particular country here. We just say overall growth rate of China, the last 10 days, and you can see it's
negative. You can see that we have minus 8%, minus 10%, minus 11%, so basically people are recovering faster than new people are getting sick, which is a very good thing. I think China is the only country, or one of the few countries, where this is the case. In other countries, let's look at Italy, for example, or the United States, it's not beautiful to look at. These numbers are horrible. As you can see, in Italy we have 40, 41, 18, 16, 12% of new cases every day, active new cases, so it's not just overall confirmed new cases but actively sick people today, which is really not a good thing. So this is how you calculate the active cases and the actual overall growth rate.

Now before we get into the visualizations, we want to talk about two more values here, and the first one is the death rate. The death rate is very important because it gives us some information about the lethality or the severity of the virus. In this case we're just going to again say confirmed.copy(), and we want to know what percentage of the people who have been diagnosed with COVID-19 have lost their life to it, which is a very unfortunate thing. This is one of the most important things that we need to know about the virus: how many people is it actually killing? So what we're going to do is say again for every day in range, 0 up until the length of confirmed. We're starting with 0 again because we're not looking back a day, we're just looking at each individual day. We're going to say that the death rate on a particular day for a specific country, so basically death_rate.iloc of this particular day, is the amount of people that have died from COVID-19, which is the deaths on that particular day, divided by all the people who have been diagnosed with COVID-19, so we're going to say confirmed.iloc[day]. This again gives us a percentage, so we're going to multiply this by 100, and this is the death rate.

Now the second thing that we're going to talk about is
the hospitalization rate, which means how many of the affected people actually need a hospital. Now of course we're not able to just calculate this from the data here, so we're going to have to use some estimates. So we're going to say hospitalization, hard word, hospitalization_rate_estimate equals, and I'm not claiming that this number is true, I'm just using it as a calculation here, but I read in an article that 5% of the people infected with COVID-19 will need a hospital bed. Even if this number is not true, you can change it if you want to 2%, you can change it to a hundred percent, it doesn't matter, but I'm going to show you how you could do some calculations with this.

So we could say hospitalization_needed equals confirmed.copy(), and this data frame shall have the information of how many people need a hospital bed on a particular day in a particular country. We're going to say for day in range, 0 to the length of confirmed, and we're going to say that hospitalization_needed on a particular day is just the amount of total, or actually not confirmed, we need the active cases, because just because you were once diagnosed with it doesn't mean that you are still active. So of all the active cases that we have right now, 5% are going to need one. As I said, this hospitalization rate estimate doesn't have to be true, it's just what I read, and it could be completely wrong, so you can use any number you like. But 5% is a realistic thing. It might be lower, might be higher, I don't know, but let's just work with this one here. So basically you're just multiplying 5 percent times the active cases.

This is important because every country has a certain amount of hospital beds, so they have a capacity for sick people, and in Italy, for example, this capacity is running short right now. Especially in the northern parts of Italy, we have too many sick people and too
few hospitals, too few people to take care of these sick people. So this is an important number as well, and these are the values that we're going to look at.

So now we're getting to the visualization part, where we put our data into graphs. Let's put a comment in here: visualization of all of this. The first thing we're going to visualize is just the confirmed cases for a couple of different countries. So we're going to say countries and we're going to define a list here. You can put in all the countries you're interested in. I think most countries are in this list, if not all of them. I'm interested in Italy because that's a particularly hard situation right now, I'm interested in Austria because I live in Austria, I'm interested in the US because it's one of the most important countries in the world, I'm interested in China because that's where it all came from, and let's just put India because a lot of the people watching are Indians. Let's also put France and Spain because these are also countries that have a pretty high infection count, and one more, the UK is also interesting because the UK also has some problems right now.

We want to have a line graph for each of these countries, so we're going to say for country in countries, and we're just going to say that we want to plot the confirmed cases of that particular country. What the hell is this, sorry. The confirmed cases of this particular country, we're going to plot these. We can use pandas to plot. Of course it needs Matplotlib, but it plots directly out of pandas, so we could just pick a series or a data frame and just say plot, and then we could go ahead and say plt.show in the end. But one more thing, because like this we're not going to see which line is which: let's put a label here, so label equals the country name, and then we're going to say plt.legend so that we know which line is what. In plt.legend, the location of this legend is the upper left, because there we want to have
our legend. Should be fine. Okay, doesn't work. Okay, UK is not a symbol, let's just ignore the UK for now. I don't know what the name is here, but for the rest it works. As you can see, we have Italy, Austria, US, China, India, France, Spain, and you can see that the curve of China looks pretty different from all the other curves, because China had its exponential growth time in January up until the middle of February, and then the growth rate started declining, as you can see, and they're barely even getting any new cases at all. So it's almost stagnating here, which is a very good thing. If you look at countries like Italy, for example, this is still a pretty steep growth curve. It's growing very fast, actually exponentially, and the same goes for most of the other countries, as you can see here: Spain, France, and the US is also skyrocketing right now. Austria is also on an exponential growth rate here, but it's a little bit better than most countries, or maybe just a little bit earlier. Most of these countries have some regulations now, or some countermeasures to this virus, so probably they're also showing some effect. As you can see, India, which is the purple one here, is almost not having any cases at all. It might stay the same, maybe, I don't know, it might also change in the future. I hope not, but it could happen. I'm not an expert, I'm not here to analyze this politically or from a societal aspect, I'm just here to show you the math and the programming.

This is how you visualize it, and we're going to add a little bit more of a NeuralNine-like style here. So we're going to say ax equals plt.subplot() and we're going to set some color values here. We're going to say ax.set_facecolor, and this shall be just black, then ax.figure.set_facecolor, and this should be a dark gray, which is #121212. Then we also have some tick parameters: ax.tick_params, axis equals x, colors white, so we just want to have some white ticks
because we have a black or a dark background. Then we also want to have a title: ax.set_title, and the title shall be, come on now, shall be COVID-19 total confirmed cases by country, for example, and the color of this should be white. And then of course, okay, the legend is already there, so I think that's actually it. This should look way better right now, let's see. Yeah, there you go, this looks a little bit more like a NeuralNine graph.

As you can see, China is quite different. We could also just remove China from the list or replace it with, let's say, Sweden, to zoom a little bit into the similar growth rates here. What we could also do is cut off the first couple of days, because as you can see there's almost no growth happening until the 21st, 22nd, 23rd of February. So we could just cut off the first 30 days or something. We could do this by just slicing with 30 colon, which means that we start at day 30, and then you could see the growth a little bit better. Or actually, let's just cut off 35, that might be even better. There you go, now you can see that it's really growing exponentially.

So let us visualize the growth rate for each individual country. For the growth rate we're going to do this a little bit differently, because we're not plotting a simple line plot, we're plotting a bar chart, and because of that I'm going to plot all the countries individually so that it's a little bit easier to see all the information. Let's just use three countries here, or maybe four countries, China as well, and most of you are probably not interested in Austria, so let's take Germany. Instead of total confirmed cases, we now want to have the confirmed cases growth rate. So we're going to say for every country we want to have a bar chart, not from day 35 but from the beginning, and I have plot.bar. We actually don't need this here, but actually this needs to be in the loop then, because every time we do a plt.show we have to do the
styling again, so we're going to do it like this. And we're going to say not "by country" but "in" a specific country, so we're going to use an f-string here and enter the country name. I think this should work, if I didn't forget something. There you go. But actually this is not right, because we're still plotting the confirmed cases, and we need to plot the growth rate. That looked just too exponential, that's not true, that would be a disaster.

As you can see, this is the actual confirmed cases growth rate. Here we have basically no cases, then here we have some cases, but I guess this is a very small number in total cases, and then all of a sudden you have a growth rate of almost five hundred and seventy percent in one day, which is ridiculously high. Then we get down to two hundred and eight percent, which is still extremely high, also 150, which is still extremely high, and then it gets a little bit more moderate, but you still have 50 percent of growth every day, which is like almost doubling, or actually more than doubling, every two days. And then recently you have a growth rate of 13%, 10%, 7%, which is actually quite good, so this is a little bit better.

Let's look at the next country, which is Germany. There you have similar patterns, where you're at a growth rate of 12. Then you go to the US. The US still has a pretty high growth rate, they are at 30%, which is very, very high. And then we have China, and China is very interesting because it starts at a pretty high growth rate, remains quite high, and then you have almost zero growth here, very little growth. As we already talked about, this is the growth of the confirmed cases, so recoveries are not even in this statistic, so actually the growth is negative, as we saw. So this is how you visualize the growth rate.

Now I've gone back to the format of the first plot, where we plot multiple graphs at once, and I'm going to change this here to total deaths because that's
actually a number that's really interesting here. Who cares about people getting sick if everything is fine in the end and no one has taken any harm? But the problem with COVID-19 is that it is also lethal and it kills a lot of people, so this is something that needs to be looked at: how many people is it killing by country? As you can see, in China there was a pretty steep curve of total deaths, but they recognized things fast and stopped the disease from spreading, and they stalled at a death count of around 3,200, whereas Italy has already passed China in terms of deaths, even though they have not yet reached the infection level of China. I mean, percentage-wise they did, but not in total. The total deaths of Italy are over 5,000, going to 6,000, and this is a really problematic situation. You can also see that Germany and the US are still pretty low, but we could also look at other countries like Spain, for example, which has a lot of deaths already, and also France, and there you'll see that a lot of people are dying too, and it is growing. Unfortunately, things are growing exponentially.

So then again, let's look at the percentages. We have to go back to the other format, and we're going to say .plot.bar. We don't need a label then, and we're going to use an f-string again here, total deaths of the country, and we don't need a legend. There you go, this should be it. Actually not deaths, sorry, death rate by country, so we want to know what percentage of people are dying in each country. Okay, there you go. In France you have an 8% death rate, a 10% death rate, so between four and ten percent of people who are getting coronavirus in France are actually dying, which is a pretty high number. What's happening here? We have some problems. What did we do? Okay, of course, sorry, we need to call show every time we plot. So again, as you can see, we started with France. Okay, that's a problem. In Italy you can see that the death rate is actually
rising, which is very concerning, because even 2% is a pretty high death rate: if 2% of the people getting a disease are dying from it, that is pretty high. But Italy is currently at around 9%, and as you can see the tendency is obviously rising. So more and more people are dying in Italy, not only in total numbers; the death rate itself is radically increasing. Now of course you could say that's not the case, because one explanation could be that so many people are getting sick in Italy that the numbers are not correct anymore. Let's look at the website: it says that Italy has 63,000 cases right now, but chances are that they have so many infections already that they cannot test fast enough. So many people are sick and so many people are dying that the rate goes up, because they cannot confirm enough cases to keep it low. But of course it could also just be that more people are dying because the health care system doesn't have enough capacity, so this could be another explanation. I'm not here to interpret these things, because I'm not an expert, as I said, but the mathematics shows that the death rate of Italy is exponentially rising, or at least, OK, it doesn't have to be exponential, it could be linear, but it's definitely rising, which is a pretty concerning thing.

Also in Germany you can see that the death rate is rising, but here you have to keep in mind that this death rate is pretty low. These are already percentages, so this is not thirty-five percent, this is 0.35 percent. So 0.35 percent of Germans who get COVID-19 are dying right now. As you can see, the tendency is rising, but it's actually a pretty low number. In Austria it's pretty similar. In Switzerland it's a little bit higher. In the US you can see that the
death rate is decreasing. I don't know why that is; maybe because they are doing more and more tests right now. There was a time when the US did very few tests on people, and now they're doing more tests, so maybe this is the reason why the death rate of the US is declining. This is a good thing. Then you can see China actually having a constant death rate, of course, because there was a certain number of confirmed cases and a certain number of people died, and they settled at a death rate of around, you could say, four percent, which is a pretty good indication of the mortality rate of this virus. I'm just saying what I think right now, this is not expert knowledge, but since there haven't been any new cases in China recently and people are still recovering or dying, this is a pretty solid death rate, at least for the country of China. So you can probably expect the death rate to be somewhere around two to four percent, maybe. I know experts say that it could be lower, which is of course a good thing; I hope this is true. Also, you can see that in Spain the numbers are rising; the death rate is rising to around six percent. And France is actually quite interesting, because you have 8%, 10%, then it drops dramatically, then it increases a little bit, so they have a death rate of around 4% again, which is moderate, I would say. And this is how you visualize the death rates and the actual deaths.

So now we're going to experiment a little with different numbers that are not to be taken too seriously. These are just some ideas, some experiments; these are not actual numbers. Let's say we have a simulated growth rate of confirmed cases, let's say 10% a day, so basically 10% more infections than yesterday. Now what we could do is simulate what this would mean for the future: how many people would be sick in 40 days, in 30 days, and so on. You can of course
adjust this number as you want. In countries like Italy it was somewhere around 40 to 50 percent a day, and right now it's lower, more like 10 to 20, but you could also use 5% a day if you want. Just pick a number and ask: if this is the growth rate every single day, so 10 percent more people are getting sick each day, what will be the result in a couple of days? So we're going to pick 10% here as the simulated growth rate, and now we're going to append some data to the data set that we already have: we take the confirmed cases and add the data to that.

First we define some new dates, because we need to put them into the index column. We say dates equals, and there is a function from pandas called date_range which allows you to create new date items. For the start: today is the 23rd and the data set ends at the 22nd, so the 22nd of March 2020 is the last day that we used. Actually, sorry, not true: the start is the 23rd, because we want to start from the first entry that's not there, so from today basically. So this is the start date. Then we say periods equals the number of days into the future we want to predict; let's say we want to see 40 days into the future with this simulated growth rate. It's not the real future, it's just the future that we simulate here. And we say the frequency is daily, so "D". Then we convert what we just got into a series, so we say pd.Series of dates, and then we change the format into a string: dates equals dates.dt.strftime, and then we specify the string format. We want the month, the day, and the year, because that's
the format in our data frame. If you don't know the format in the data frame, just print confirmed and you'll see it. I'm not going to do this right now; we did it in the beginning. So these are now the indices, the new dates for our data. Now we say simulated equals confirmed.copy(), so we create a new data set here, and then simulated equals simulated.append, and we append a data frame of the dates. So we're basically taking the confirmed cases data frame and adding some dates at the end of it. We can see what this looks like: print simulated. OK, now we get the visualization first, so let's just comment that out real quick; actually we just have to remove the plt.show(), or we might get an exception because of that, but that doesn't matter. So we have the dates here at the end as values, which is of course a problem. What I forgot to say is index equals dates: we don't add the dates as data, we pass index=dates, so that the values we just created go into the index column. As you can see, we have the indices right here, and we just continue from the 23rd up until the first of May, which is in the future of course.

So we're going to make some simulated predictions here. For this we say for day in range, from the length of confirmed, so basically the last day, up to the length of confirmed plus the number of days into the future, in this case 40. Then, for every day in the simulated data frame, the value is the value of the previous day, so simulated of day minus 1, times 1 plus the simulated growth rate. I have to multiply by 1.1, not by 0.1, because otherwise I only get 10% of the previous value. So in this case I increase it by
10%. There we get our simulated values, and that's actually it. Now we only have to plot this again. So this is what we would end up with for each country. Let's say the title is "Future Simulation for Italy", and then we plot the simulated values: simulated of Italy, .plot, and then plt.show(). I hope I didn't forget anything here. OK, KeyError 60, what's... OK, I always make the same mistakes: iloc. There you go. And the result is this here. Again, don't take this too seriously, because as I said I'm not an expert, but this would be the number after 40 days: in this case it says 2.6 million. It's mathematically true that if you grow 10% every day, in 40 days you reach that number. But of course you need to take into account that Italy has a limited population; they have 60 million people in their country. It's quite possible that growth slows down as more and more people get infected, because at some point you reach a limit, and there are also countermeasures. So don't think that this is necessarily what's going to happen in Italy, but if the growth rate were 10 percent every day, this would happen.

We could also go with five percent and see what happens if the cases grow five percent every day: we would end up with a value of 417,000, a little bit better. But with twenty percent, and that's not too high a number, because we have countries where the growth rate is still 40 percent a day, we get a number that cannot actually be reached, something like 80 million. As you can see, this is not realistic, because with a population of 60 million you cannot reach more than 60 million. But this just shows what would happen: you could say that with a twenty percent daily growth rate, in 40 days probably all of Italy would be infected,
mathematically speaking. Of course this is probably not what's going to happen in the real world. We could also do this for other countries, but the results wouldn't be that different. For example, let's take Austria again, because we don't have too many infected people. With a growth rate of 10%, in 40 days you would have 140,000. Actually I thought it would be higher, but it's still pretty high for Austria, because Austria has about 8 million people, so this would be a pretty high percentage. So this is the first simulation that we did, on how things could grow if we specify a fixed growth rate.

One more thing I want to mention here: of course you can compute the actual mean growth rate, the average growth rate of a country. I'm not going to do it right now, I'm just mentioning that it's possible. You could go to the confirmed cases growth rate data frame, run a mean aggregate function over all the values for a country, get the mean growth rate of that country, and then use this as the growth rate. But of course this is also not really realistic, because on some days a country will grow 50 percent, 100 percent, 500 percent, as we saw, and on a lot of days it will only grow 10% or 5%. You could also take the median value, but still, it's not realistic for a country to have the same growth rate every day. If you want to do that, you can; that's just one thing I wanted to mention.

Now the next thing we're going to do is even more speculative than this one, and it has to do with the death rate. We talked about how Italy, for example, has a lot of deaths and the death rate is rising, probably because not enough people can be tested, so a lot of people will be sick without being confirmed or diagnosed with COVID-19. So what we could do is get an estimate of the death rate and then estimate the number of cases that a country probably has in reality. Because when people are dying of COVID-19, this is probably something you're not going to miss, at least not often. If 10 people died, you're probably going to recognize that 10 people died of COVID-19, whereas if 10 people get sick, you're not necessarily going to know. So you definitely have more infected people than confirmed cases, but the death number should be pretty accurate.

Again, I'm not using any number here that has to be taken seriously, but for the estimated death rate: experts are saying that COVID-19 probably has a death rate of about 3% or 2%, so we're just going to go for 2.5%, which is 0.025 as a fraction. If we use this death rate of 2.5%, we can calculate how many cases, not active cases but total cases, Italy should have in reality. We have the total deaths right now, so let's say print deaths of Italy, and what I want is the tail, the last, let's say 5; the default is 5. Let's run this, and you see that 5,476 people have died. So we could take value 5 here, and I think this should give us the number. No, that doesn't give us the number. .iloc of 5, maybe? Oh no, I know what the problem was. Yeah, there you go, this is the number of total deaths.

Now, the number of deaths should be the total number of people infected times the death rate. This is the ideal world: if you know what the death rate is, this should be true. So you have a number of people that are infected, times a certain death rate, and this should equal the number of
people who have died. Now if you know how many people have died and you know the death rate, but you're not sure about the infected people, you can just say infected people equals deaths divided by death rate. Of course, again, this is very speculative, but I'm just saying: if the death rate really was 2.5% everywhere in the world, this should be the case. So we take the deaths and divide them by the estimated death rate, and we get the estimated number of people that are actually infected in Italy, and this number is 219,040.

Now, I don't think this number is entirely accurate, because 2.5 percent is probably the death rate you have if everything is working fine. This is probably how many people the virus kills if they get medical care, because there are a lot of old people and a lot of sick people who will die of COVID-19 because of their physical state even though they get medical care. But in a country like Italy more people are dying because they're not getting medical care. They could survive if there were enough healthcare capacity, but more of them are dying because they are not getting the medical care. You can see this because in countries like Germany or Austria you have a death rate below 1%, and in countries like Italy you have a very high death rate; I don't know what the number was, wasn't it something like 8 or 9 percent? So actually, if you used 8 percent, you would probably get the number that you see in the news. But of course I could also say that in Austria the death rate is less than 1%, so we could say 0.0067, because that's, for example, the death rate in Austria, and if I use this number on Italy, which is completely unrealistic, I get a ridiculously high number: almost a million infected people in Italy, which is probably not true if I apply
the death rate of another country, because in Austria right now we have enough capacity, we have a good healthcare system. In Italy this is not the case right now; people are not getting medical care, so this number might be misleading. So this is again something to be taken with a grain of salt, but this is how you could estimate the actual number of infected people if you know the death rate and the deaths.

Now before I end this video, I want to talk a little bit about the relevance of everything we just saw. Why is it important to care about COVID-19, why is it important to analyze the numbers, and why should we care? As I said in the beginning, I'm not an expert on economics, political policies, societal issues, biology, or medical systems, so I wouldn't say that I'm qualified to do any interpretation here. But since I understand the mathematics and have analyzed the numbers a little bit, I know that the situation right now is really concerning. We have very high growth rates, exponential growth rates. People are dying in Italy right now because the capacity of the healthcare system is not enough. People compare it to China and say that it's not a problem, because China had more people infected and Italy has fewer than China, but at the same time the deaths in Italy are almost twice as many as in China, because of course the capacity in Italy is much, much smaller than in China; China has something like 1.3 billion people. And the reason this is relevant is that a lot of people do not understand the mathematics behind this. A lot of people are not really grasping the concept of exponential growth. If you take a simple mathematical series like 2 to the power of n, it grows very slowly in the beginning, so you could say, OK, if that was the infection rate, it would be pretty slow after 10 days,
because you would have one infection, then 2, 4, 8, 16, 32, 64, 128, then you wo
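The constant-growth simulation walked through in the transcript can be sketched roughly as follows. This is only an illustration under stated assumptions: the small `confirmed` frame holds made-up placeholder values rather than the real data set, and the `%m/%d/%y` index format is assumed from the transcript's description. The video extends the frame with `DataFrame.append`, which was removed in pandas 2.0, so this sketch swaps in `reindex` to add the future dates.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's `confirmed` DataFrame
# (made-up values): index = date strings, one column per country.
confirmed = pd.DataFrame(
    {"Italy": [1000.0, 1100.0, 1210.0], "Austria": [100.0, 110.0, 121.0]},
    index=["03/20/20", "03/21/20", "03/22/20"],
)

simulated_growth_rate = 0.10  # assumed constant growth of 10% per day
days_ahead = 40               # how far into the simulated future to go

# Date labels for the simulated days, starting the day after the last
# real entry and formatted like the existing index.
future_dates = pd.date_range(
    start="2020-03-23", periods=days_ahead, freq="D"
).strftime("%m/%d/%y")

# Extend the frame with empty (NaN) rows for the future dates.
simulated = confirmed.reindex(confirmed.index.append(future_dates))

# Each simulated day is the previous day times (1 + growth rate).
# Positional access via iloc avoids the KeyError that plain integer
# indexing on a DataFrame raises.
for day in range(len(confirmed), len(simulated)):
    previous = simulated.iloc[day - 1].to_numpy()
    simulated.iloc[day] = previous * (1 + simulated_growth_rate)

print(simulated["Italy"].tail())
```

Changing `simulated_growth_rate` to 0.05 or 0.20 reproduces the kind of comparison made in the video; the model has no population cap, which is why high rates quickly produce impossible totals.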


This video teaches how to analyze COVID-19 data using Python, including data preprocessing, visualization, and growth rate simulation. It provides a comprehensive understanding of COVID-19 data analysis and speculative modeling. The video utilizes popular libraries such as pandas and matplotlib to perform data analysis and visualization.
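As a rough sketch of the two derived quantities the video plots, the snippet below computes a daily growth rate and a death rate in percent. The definitions are assumptions (growth as day-over-day percentage change of cumulative confirmed cases, death rate as cumulative deaths over cumulative confirmed cases), and the numbers are made up purely to show the arithmetic.

```python
import pandas as pd

# Made-up cumulative counts for one country, for illustration only.
index = ["03/19/20", "03/20/20", "03/21/20", "03/22/20"]
confirmed = pd.DataFrame({"Italy": [100.0, 150.0, 300.0, 330.0]}, index=index)
deaths = pd.DataFrame({"Italy": [2.0, 3.0, 9.0, 11.0]}, index=index)

# Daily growth of confirmed cases in percent (assumed definition:
# change relative to the previous day). The first day has no
# predecessor, so its value is NaN.
growth_rate = confirmed.pct_change() * 100

# Death rate in percent: cumulative deaths over cumulative confirmed cases.
death_rate = deaths / confirmed * 100

# Mean and median daily growth, two candidates for a constant
# simulated growth rate, as mentioned in the video.
mean_growth = growth_rate["Italy"].mean()
median_growth = growth_rate["Italy"].median()

print(growth_rate.round(1))
print(death_rate.round(2))
print(round(mean_growth, 1), round(median_growth, 1))
```

In the video each of these columns is then drawn per country with `.plot.bar(...)` followed by `plt.show()`. The gap between mean and median here shows why a single average is a weak summary when a few days have very large jumps.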

Key Takeaways
  1. Import necessary libraries such as pandas and matplotlib
  2. Preprocess COVID-19 data using pandas
  3. Visualize COVID-19 data using matplotlib
  4. Simulate growth rates of COVID-19 cases
  5. Estimate actual cases using death rates
💡 The video highlights the importance of considering growth rates and death rates when analyzing COVID-19 data, and demonstrates how to use Python and popular libraries to perform data analysis and visualization.
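The last takeaway, estimating actual cases from deaths, comes down to one division: infected ≈ deaths / death rate. A minimal sketch follows, assuming the Italy death count of 5,476 quoted in the video; the function name `estimate_infected` and the rate values passed to it are illustrative choices, not the author's code.

```python
def estimate_infected(total_deaths: float, assumed_death_rate: float) -> float:
    """Estimate total infections as deaths divided by an assumed death rate.

    `assumed_death_rate` is a fraction (0.025 means 2.5%), not a percentage.
    """
    if not 0 < assumed_death_rate <= 1:
        raise ValueError("assumed_death_rate must be a fraction in (0, 1]")
    return total_deaths / assumed_death_rate


italy_deaths = 5476  # total deaths for Italy quoted in the video

# Rates in the spirit of the video: about 8% (near Italy's observed rate),
# 2.5% (the assumed "expert" figure), and 0.67% (a sub-1% rate like the
# one described for Austria).
for rate in (0.08, 0.025, 0.0067):
    estimate = estimate_infected(italy_deaths, rate)
    print(f"{rate:.2%} death rate -> about {estimate:,.0f} infected")
```

The spread of results across the three rates shows how sensitive the estimate is to the assumed death rate, which is why the video treats it as speculation rather than a measurement.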
