Python for Data Science Full Course 2026 [Free] | Learn Data Science With Python | Simplilearn
Key Takeaways
This video teaches Python programming for data science, including data analysis, visualization, and machine learning techniques
Full Transcript
Hey there, welcome to this Applied Data Science with Python course by Simplilearn. If you've ever wanted to dive into the world of data science and learn how to turn raw data into powerful insights, you are in the right place. Whether you're a beginner or have some experience with Python, this course will take you through the key tools and techniques for tackling real-world data problems. By the end you will be comfortable using Python libraries like NumPy, Pandas, and Matplotlib, and you will have a solid understanding of the data science pipeline.
So here's a sneak peek of what we're going to cover in this video. First, we'll start with the basics of the Python libraries NumPy, Pandas, and Matplotlib, which are the backbone of any data science project. We'll also walk you through the data science pipeline, a road map for solving data problems. Next, we will dive deeper into Pandas, learning about Series, DataFrames, and lambda functions, which are essential for working with and manipulating data. Then we'll move on to some advanced Pandas techniques, including data inspection, column operations, and how to aggregate data. You'll also learn how to visualize data with Matplotlib and explore different chart types. After that, we'll jump into exploratory data analysis (EDA), where we will explore charts, histograms, scatter plots, and even heat maps to uncover patterns and insights. We'll also cover key statistical concepts like central tendency and dispersion, and tackle data cleaning and outlier management to prepare you for data analysis. Finally, we'll explore more advanced concepts like skewness, kurtosis, and data preprocessing. We'll also touch on probability, which is the foundation of many data science models.
Also, if you're looking to kickstart your career in data science, I highly recommend checking out the Data Scientist Master's program in collaboration with Microsoft and Simplilearn.
This 11-month program offers live interactive classes led by top industry experts, giving you hands-on experience with AI-powered tools, Python, SQL, and machine learning. You'll also work on real-world projects, including a capstone project, earning a Microsoft certificate along with Simplilearn's data science certificate to showcase your skills. So whether you're a beginner or looking to upgrade your skills, this course is perfect for AI enthusiasts, students, and recent graduates. And the best part is you will get career support to help land your dream job. Enroll today and start your journey towards becoming a certified data scientist with real-world expertise.
To get started, here's a quick quiz question for you. The question is, which Python library is best for data manipulation? Your options are NumPy, Pandas, Matplotlib, or Cort. Let me know your answers in the comment section below. So without any further ado, let's get started.
See, let's understand the journey to data science. Whenever you talk about any data science project, there is the CRISP-DM framework that we follow. We'll be studying that later. Any data science project will always go through all of these phases, and I'll tell you in short what I mean by that. Whenever you have a project from a client, the first phase is where you understand the business. Let's say your client is Pfizer. You understand what they are into, what their domain is, what their problem statement is, and what challenges they are facing. A business analyst comes in here and understands all these things. Then he passes the requirements to the technical team, who will get the data and start analyzing it. In a data set from, let's say, Pfizer, there will be all pharma-related terms.
As a data scientist you have to understand each and every column and correlate it with the domain knowledge, because if you don't understand the data you will not be able to solve the client's real-life challenges. So data understanding comes first, and after data understanding comes data preparation. Data in real life will never be as clean as expected. You expect the data to be awesome, but it is going to be far worse. You really have to get your hands dirty cleaning the data and doing some visualization on it. This is where libraries like NumPy, Pandas, Matplotlib, and Seaborn come into the picture, and we are going to see them.
Then comes modeling. This is where machine learning comes into the picture. The library that you learn is mainly scikit-learn, along with some others. If you are doing deep learning, you learn TensorFlow. Then you get into evaluation of the model. So first you analyze and clean the data, then you create multiple machine learning models, because one model may not be the best. You create many of them. It is just like a class: as a teacher I teach all of you, and when I have to send someone to an Olympiad competition, I choose the best student. That is what happens in the evaluation stage: you select the best model. After the best model is selected you get into the deployment stage, and deployment means you have to share the model, so you productionize it.
So today we are here to understand these libraries, and once you understand them you are ready to get into the machine learning part of this flow. Not now, but one thing I definitely want to say: if a data science project takes a total of 100 days to complete successfully, then 70 to 80 days go into phases 1, 2, and 3. Then 10 to 15 days go into phases 4 and 5, and another 10 to 15 days go into phase 6. So although machine learning is so important, actually creating models does not take that much time. The most time is spent here, getting all of these earlier things aligned. This is very important for anyone to realize. The data preparation stage is one of the most important things.
So that's how the overall data science pipeline is. Also, at times, if you are not able to create a good model, you may have to go back. Let's say you do data preparation in a bit of a hurry and try to create a model quickly. Your model will be worse, something the client will just not accept. So you may have to go back and redo it. In fact, if there is any misunderstanding in the first phase, you may have to go back and understand the requirements again. So it's not an easy job, friends. That's something important for all of us to understand.
Now, we need not get into all of this. This was just to give you an idea. What is more important is getting started with the NumPy library. But this helps you understand the overall flow: what you are learning and why you are learning it. A productionized model looks something like this. So what are we essentially doing over here? I have hosted this productionized model using a Streamlit app. As you can see, this is a tip prediction app.
We will actually be dealing with this data set when I explore Seaborn and Matplotlib for you, so we will look into it at that point. Right now it's not important, but this is a data set for a US restaurant where I am trying to predict the tip that a customer is likely to give based on the total bill, how many people visited in all, the gender of the customer who paid the bill, whether the customer smokes or not, the day of the visit, and whether the visit was for lunch or dinner.
You can actually play around with this. Let's say I go to the restaurant and my total bill is $15. It's a US data set, so dollars. I went with my wife, so two people in total. I was the one who paid the bill. I do not smoke. I went on Saturday, and I went for dinner. With these parameters, I'm likely to give a tip of $2.45. In the same way, you can play around. If the visit was at lunch time, then it is $2.55. If the customer smokes, then this is the value. If the customer is female, the tip given is this, and if it is a male, the tip given is this. So what we understand is that a male gives a relatively higher tip, and a smoker gives 2.36 while a non-smoker gives a relatively higher tip. You can also take a moment and try playing around with this, because it is deployed on the cloud.
Now, this deployment part which I showed you comes at the end of machine learning, or somewhere at the end of deep learning. Further ahead, what does the data look like? In this case it's a CSV file, and it looks something like this. This is the dependent variable, which I'm trying to predict on the basis of the other factors. We'll be working on this data set as well later on. So now let's get started with NumPy.
So, getting started with NumPy. What exactly is NumPy? Well, NumPy is the backbone of machine learning and data science in Python. It is one of the most important libraries in Python for numerical computing. You need to understand that "Num" in NumPy means numerical and "Py" means Python. You may ask: why NumPy? I already know lists, so why do I need a NumPy array? Because NumPy is super fast. NumPy is almost 10 to 15 times faster than a list, and that is what you need in data science. NumPy supports large multi-dimensional arrays along with a collection of mathematical functions. It is very powerful for this, and it is fast and efficient because it is implemented in C.
Now, talking about C and Python. If I go back and say I want to write a function to find the factorial of a number, this is how you write it, and then you test it like this. This is a recursive function to find the factorial of a number. When you call this factorial function, control goes over here and some logic is applied. If you see, this function is in the Python language. However, NumPy is implemented in C, and to this day C and C++ are among the fastest languages. So whenever I call some NumPy function, the body of that function is written in C. This mixed-language programming is called Cython. I'll show you. I ask: can you show the body of any NumPy function as it is written in the C language? You can see it over here. I think I did not phrase it properly. So see, whenever you make a call, how does Python use it, and how is it implemented in the back end? Whenever I say sum, it will find the sum of the array. So in NumPy, finding the sum is as simple as that.
But in the back end the code looks like this, and when sum is called there is Python code which gets executed in turn. That is what I want to say. Python and C together is what you call Cython code, and it looks something like this. We are least bothered about it, but I am explaining how they have created all of this. This is not important, but internally, whenever I make any call, this is how it happens. That is why I said NumPy is super fast. We say the operations of NumPy are vectorized in nature. Vectorized means the operation is applied to the whole array at once, with the looping done in the body written in C and C++.
One more thing, about the difference between a NumPy array and a list. NumPy creates arrays, and arrays are collections of items of the same type. This is very, very important. A list can have items of any data type, but an array must compulsorily hold items of the same type.
Now, how important is NumPy? Well, very important. Just to show you, I'm randomly opening a few deep learning case studies and one machine learning case study as well. If you see over here, in this deep learning case study of image classification, I'm importing NumPy. And which functions of NumPy have I used over here? You see np.newaxis, np.arange, then np.where, np.random.random. I may not be able to explain what we are doing over here, but I just wanted to show you that it is getting used. np is what I have imported NumPy as. Similarly over here as well, you see np.expand_dims, np.arange, np.where, np.random. Similarly np.round, then the flatten function, which is also from NumPy, np.mean, np.log, np.array. This is just to make you understand that, see, this is required.
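The two points made above, that NumPy operations are vectorized and that an array holds items of a single type while a list can mix types, can be sketched as follows. This is a minimal illustration, not the code shown in the video; the values and variable names are assumptions chosen for the example.

```python
import numpy as np

# A Python list may mix types freely.
py_list = [1, "two", 3.0]

# A NumPy array stores items of one type (values here are illustrative).
arr = np.array([1, 2, 3, 4, 5])

# With a list, doubling every item needs an explicit Python-level loop.
doubled_list = [x * 2 for x in [1, 2, 3, 4, 5]]

# With an array, the same operation is vectorized: one expression,
# and the loop runs inside NumPy's compiled code.
doubled_arr = arr * 2

print(doubled_list)  # [2, 4, 6, 8, 10]
print(doubled_arr)   # [ 2  4  6  8 10]
print(np.sum(arr))   # 15
```

How much faster the vectorized version is depends on the array size and the machine, so no timing figures are shown here.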
This is just to make you understand that it is required. And you know what? Up to 12th standard, in the examinations, they never allowed us, meaning we the engineers, to use a calculator. But when we took admission in engineering, all of a sudden using calculators was allowed. So why didn't they expect me to do all the calculations step by step? Why was a calculator allowed? Because at that point a calculator was mandatory. In 10th standard, 12th standard, or anywhere below that, I was doing all the simple math, but now I am doing advanced math, so I am already assumed to be a pro at the basics. When you are doing advanced mathematics and you need something basic, you use the calculator, then go back and continue at the advanced level. NumPy is like that. Assume the basics are all here in NumPy. Whenever you do any machine learning work, you need not implement all these small things yourself. You can say, NumPy, would you please solve this for me, and it will do the needful. As simple as that. That was just to make you understand the practical significance of this topic.
So now let us get into NumPy. The first thing is we have to import the NumPy library. That's import numpy as np. Now, is it mandatory that you import it as np? No. It is a de facto standard to label it as np, but it is not mandatory. Almost everyone labels it as np. In fact, if you go to Google and look into any of the documentation, you will see NumPy imported as np only. So import numpy as np, and then I would like to print the version of NumPy. That is np.__version__, with a double underscore on each side of version, and done.
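The import and version check just described look like this. The version string printed will depend on your own installation.

```python
import numpy as np  # "np" is the conventional alias, not a requirement

# __version__ has two underscores on each side.
print(np.__version__)
```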
Now, if you go to Colab, maybe your version is different, but that does not matter. You need not worry about it. Now let us create a one-dimensional NumPy array. First, the syntax: np.array, where you give the data and the dtype. The data can be any array-like object, and the dtype is optional. If it is not provided, NumPy will infer it by itself. Since it is optional, I would first like to play around without it.
So I want to create a NumPy array, and I'm using the array function, np.array. You can pass in a list or a tuple and it will create a NumPy array for you. But as I told you during the basics of Python, 95% to 98% of the time you will be using a list only, so I'll use a list. Let's say 1, 3, 5, 2, 8. I will print that array and also print the type of that array.
Now, how is a NumPy array laid out? Look at this figure of 1, 3, 5, 2, and 8. It is exactly the same as a list, with indexes 0, 1, 2, 3, 4, and also -1, -2, -3, -4, and -5. As you can all see, this is a 1D array consisting of five items in total. So here we have created the array, and its type is numpy.ndarray, an n-dimensional array, because it can be 2D, 3D, ND, and so on.
Next I want to show you how I can do indexing on this array. Indexing is exactly the same as for a list, so I don't want to invest a lot of time. arr[0] will be 1, arr[1] will be 3, and arr[4] will be 8. arr[-1] would be 8, arr[-2] would be 2, arr[-3] would be 5, and so on. When I run it, you can see exactly that.
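A minimal sketch of the 1D array creation and indexing walked through above, using the same values 1, 3, 5, 2, 8. The variable name arr is taken from the narration.

```python
import numpy as np

# dtype is omitted, so NumPy infers it from the data.
arr = np.array([1, 3, 5, 2, 8])

print(arr)        # [1 3 5 2 8]
print(type(arr))  # <class 'numpy.ndarray'>

# Positive indexes count from the front, starting at 0.
print(arr[0], arr[1], arr[4])     # 1 3 8

# Negative indexes count from the back, starting at -1.
print(arr[-1], arr[-2], arr[-3])  # 8 2 5
```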
Slicing is also the same as for a list. If I say 0:3, then the starting index is 0 and the ending index is 3 - 1, which is 2. So whatever we studied for lists is applicable over here.
After this, I come back to the creation part. This is what we created. I also want to point out that I never gave the dtype. It was in the syntax, but I didn't supply it. I'll take the code over here so that it stays clean. So here I have 1, 3, 5, 2, 8, and I didn't supply the dtype, but I can print it. The name of the array is arr, so I can say arr.dtype. What will it give? int64. On some systems, for example very old ones, it might show int32 as well. Whether it shows 64 or 32, we are least bothered. It is an integer, and the rest we never require. The 64 in int64 is nothing but the number of bits in the integer. If you understand this technical detail, well and good; otherwise it is just not important.
If I supply one of the values as 2.5, remember I said NumPy needs all the values to be of the same data type. Here one value is a float, so you might expect an error. Instead, NumPy says: what I'll do is convert everything to float, because float is the higher data type. So it converts everything to float.
Then I can create a NumPy array with the same values and use the dtype parameter. If I say dtype float, you can see the array is of float type. I can also say float64, by the way, but it is not required. Whether you write float or don't mention it, NumPy will figure it out automatically, so it is usually better not to mention it. If I say integer there, obviously all the values are already integers, so what is the point in mentioning it? But imagine there is a value like 5.22 in the array.
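The slicing and dtype behaviour described above can be sketched as follows. The array values follow the narration; which integer width is reported (int64 or int32) depends on the platform and NumPy version, so the comment hedges on that.

```python
import numpy as np

arr = np.array([1, 3, 5, 2, 8])

# A slice 0:3 starts at index 0 and stops at index 3 - 1 = 2.
print(arr[0:3])   # [1 3 5]

# dtype is an attribute, so there are no parentheses after it.
print(arr.dtype)  # an integer type, typically int64

# One float among the integers: NumPy converts everything to float.
mixed = np.array([1, 3, 2.5, 2, 8])
print(mixed)        # [1.  3.  2.5 2.  8. ]
print(mixed.dtype)  # float64

# The dtype can also be requested explicitly.
as_float = np.array([1, 3, 5, 2, 8], dtype=float)
print(as_float)     # [1. 3. 5. 2. 8.]
```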
Now if you say dtype is equal to integer, it will actually do type casting on 5.22. So 5.22 becomes 5, because you forcefully asked for it to be converted to an integer. That's one thing. So this was our 1D array.
Now I am planning to create a 2D array, something like this: two rows and four columns. The values can be anything. This is row 0 and row 1, and column 0, column 1, column 2, column 3. Let's call it arr again. How do you define it? Every row is a list, and the entire thing goes into a bigger list. So let us create a 2D array. For a 1D array I use one square bracket. For a 2D array there will be two square brackets. If you recall the figure, there are two square brackets: that's my row 0 and that's my row 1. Each row has to be given in the form of a list, so it is built on top of lists only. So I say arr is equal to np.array, open a square bracket and close it. That outer square bracket creates the outer list. Inside it go the inner lists: 10, 20, 30, 40 and then 50, 60, 70, 80. See how conveniently I have created it, exactly the same as the figure. A 1D array has a single square bracket and a 2D array has two square brackets.
It is not mandatory that you write the rows one below the other. You can also write everything on a single line, and that is completely okay. It's just that it is not as readable, so I don't prefer it. I prefer this layout, because you don't end up making mistakes while the array is being created. 10, 20, 30, 40, 50, 60, 70, 80. It's a NumPy array. How do I know there are two rows and four columns? There is a shape attribute.
If you remember, I can print arr.dtype and that gives me int64, and if I say arr.shape, that tells me how many rows and columns there are. I want to tell you that dtype and shape are attributes of the array and not methods. The function that we have seen is np.array. Does dtype have a round bracket after it, like np.array has? Or shape? No. These are attributes. There are hardly five to seven attributes, so there is nothing challenging in remembering them. Right now don't ask me what all the other attributes are. We'll be studying them in detail anyway, so you need not worry about it.
As I was saying, shape showed me that there are two rows and four columns. But can I write the same code for the 1D array? It's also called arr, so it will once again declare arr as a 1D array. And if I ask for shape, what happens? See, the return type of shape is a tuple. So here also it will return a tuple, (5,), indicating that there are five items, or five values, in the array.
This is how you create a 2D array. If I want to access the number 20, it is at row 0, column 1. So for indexing, print arr[0][1], row 0 and column 1, which is 20. I can also write the same thing as arr[0, 1]. Both are the same, and you can see both give the same output. Whichever you find easier, you can use. Negative indexing is also there, but it is not required right now. Whatever is required, we can definitely look into. Other things we'll do later. Let's not get into the complicated part of slicing right now.
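A minimal sketch of the forced integer cast, the 2D array, the shape attribute, and the two indexing styles described above. The 2D values 10 to 80 come from the narration; the other values in the cast example are assumed to be the earlier 1, 3, 5, 2, 8 array with 5 replaced by 5.22.

```python
import numpy as np

# Forcing dtype=int truncates 5.22 to 5 (assumed values, see lead-in).
truncated = np.array([1, 3, 5.22, 2, 8], dtype=int)
print(truncated)    # [1 3 5 2 8]

# Each row is a list; the rows sit inside one outer list.
arr2d = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
])
print(arr2d.shape)  # (2, 4) -> two rows, four columns

# Two equivalent ways to reach row 0, column 1.
print(arr2d[0][1])  # 20
print(arr2d[0, 1])  # 20

# shape is always a tuple, even for a 1D array.
arr1d = np.array([1, 3, 5, 2, 8])
print(arr1d.shape)  # (5,)
```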
I now want to do one more thing, and that is to create one more array. The values can be anything, I'm just assuming them. This was the 1D array. This was the 2D array. Now I aim to create a 3D array. The 3D array is not truly a three-dimensional object. It's called a 3D array, but you should not picture a solid shape. What I say is that this entire thing is one array, where the upper block is the zeroth array and the lower block is the first array. Inside the zeroth array there is row 0 and row 1, and column 0, column 1, column 2. Inside the first array there is again row 0 and row 1, and column 0, column 1, column 2.
The way I define it is like this. First of all, every array is a list. So this is one array, I'll mark it like this, and that's another one. The two are separated by a comma, and you cover them with one outer bracket. Done. That is how it becomes a 3D array. A 1D array has one square bracket, a 2D array has two square brackets, and a 3D array has three square brackets. Let's check whether that's really the case. So I say arr is equal to... okay, it actually heard me, you see the suggestion over here. I could write everything on a single line, as I told you, but I am not doing that.
Let's check it out. First of all, the way I have declared it is exactly like these pink brackets. Print arr has printed it exactly like my figure. The type of the entire thing is numpy.ndarray. And the shape? It says (2, 2, 3): we have two arrays, each of two rows and three columns. Perfect. You can see that this one is 2 × 3 and this one is 2 × 3. And when I want to access 60, it is in the zeroth array, row 1, column 2. Indexes 0, 1, 2 give me 60.
Or I can write arr[0, 1, 2]. What about arr[1, 0, 2], is it 90? The 1 selects this array, the 0 selects this row, and the 2 selects this column: 90, that's correct. So that is how you create a 3D array. I'm pasting this as an image for your future reference so that it is convenient for everyone to figure out. Anything more about 3D we'll check out later. Right now this much is more than sufficient.
This was the 1D array. I said that if by any chance I put a float value here, print it, and check the type and the dtype, then everything is converted to float, because between integer and float, float is given the higher precedence. Therefore everything became float. If I put in Boolean values like True and False, strictly they are not allowed, because an array has to be a collection of values of the same data type. But if I put them in, True gets converted to 1.0 and False to 0.0. No problem.
You know what I am planning? I want to show you some other functions as well, some basic functions of NumPy, precisely the arithmetic functions. So I want to find the sum of these values. How do I do that? Well, print np.sum of the array, and the sum of these values is given over here. In fact, before I do this, let us do it on a basic array without a float, to keep it simple. Let's say 10, 20, 30, 40, 50. So 10, 20, 30, 40, 50 is the 1D array that is created. I check the dtype and everything is int, and the sum is 150. That's good. Let us try other functions. I have the minimum, the maximum, and the average, which is the mean.
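The 3D array and the Boolean-to-float conversion described above can be sketched like this. The narration only confirms that 60 sits at [0, 1, 2] and 90 at [1, 0, 2]; the remaining values (10 through 120 in steps of 10) are an assumption that is consistent with those two positions.

```python
import numpy as np

# Two 2 x 3 arrays wrapped in one outer list: three levels of brackets.
arr3d = np.array([
    [[10, 20, 30], [40, 50, 60]],      # zeroth array
    [[70, 80, 90], [100, 110, 120]],   # first array
])
print(arr3d.shape)     # (2, 2, 3) -> 2 arrays, each 2 rows x 3 columns

print(arr3d[0][1][2])  # 60: array 0, row 1, column 2
print(arr3d[0, 1, 2])  # 60: same element, comma style
print(arr3d[1, 0, 2])  # 90: array 1, row 0, column 2

# Mixing int, float and bool: everything ends up as float.
flags = np.array([10, 2.5, True, False])
print(flags)           # True -> 1.0, False -> 0.0
print(flags.dtype)     # float64
```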
I have the median, the standard deviation, and the variance. Many functions are there. So I can say np.min of the array, which is 10; the maximum, which is 50; the mean, which is the average; and the median. All of you are aware of what the median is: after sorting the values, whatever comes at the center is the median. Then you have the standard deviation and the variance. We'll be learning these when we get into the statistics part, but in very simple words, standard deviation is how much you deviate from the mean.
For example, say it takes me 1 hour to go from my home to the office, because the distance is 25 kilometers. However, if there is no traffic, I'm lucky and I end up reaching in 45 minutes, 15 minutes earlier. And if there is heavy traffic, I end up reaching in 1 hour 15 minutes. So the average time taken is 1 hour, which is the mean, and the standard deviation is 15 minutes. I take the mean plus or minus one standard deviation to reach my office, so it can be anywhere from 1 hour - 15 minutes, which is 45 minutes, to 1 hour + 15 minutes, which is 1 hour 15 minutes. That is the range. This can help me take a decision. If I have a meeting, let's say with the manager, I'll prefer to leave 1 hour 15 minutes early, because I cannot be late. If it's a regular day and I can take that risk, I can assume 1 hour or 45 minutes to reach the office. That is nothing but the standard deviation.
So what I am doing over here is just printing these as well, and you have them over here: standard deviation and variance. Standard deviation is nothing but the square root of variance.
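The aggregate functions mentioned above, applied to the same 10, 20, 30, 40, 50 array, can be sketched as follows. Note that np.var and np.std compute the population variance and standard deviation by default (they divide by n, not n - 1).

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(np.sum(arr))     # 150
print(np.min(arr))     # 10
print(np.max(arr))     # 50
print(np.mean(arr))    # 30.0
print(np.median(arr))  # 30.0

# Variance: mean of squared deviations from the mean
# (400 + 100 + 0 + 100 + 400) / 5 = 200
variance = np.var(arr)
print(variance)        # 200.0

# Standard deviation is the square root of the variance.
std_dev = np.std(arr)
print(std_dev)         # about 14.14
print(np.sqrt(variance) == std_dev)  # True
```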
So the variance is 200, and the square root of this is about 14.14. You also have the product. What will the product do? 10 into 20 into 30 into 40 into 50. You also have the cumulative sum. What is the cumulative sum doing? First it says 10. 10 + 20 is 30. Then 30 + 30 is 60, 60 + 40 is 100, and 100 + 50 is 150. So this happens with respect to plus. If I use cumprod, it will be like this: take the 10 down. What is 10 into 20? 200. What is 200 into 30? 6,000. What is 6,000 into 40? 240,000. Then 240,000 into 50, and so on. Then diff. What is diff doing? As you can see, it finds the difference between consecutive values: 20 - 10 is 10, 30 - 20 is 10, and so on.
Then np.nonzero. Now, I don't actually know what this function is. It was suggested to me, but it has never been required in my career. So what will I do? Copy this and go to Google. The first thing I want to show you is the official NumPy documentation, which is available on the numpy.org website. It says this version is released, and I can click on that. Here I can learn NumPy: there is a quickstart tutorial, and a lot of functions and a lot of help are available over here on shape, size, and all these things. So first I open the documentation. Second, I don't understand np.nonzero, so I'll search for np.nonzero, and I'll also open this GeeksforGeeks link, because sometimes you don't understand from the documentation alone. Nowadays I prefer using ChatGPT for getting these answers, but I'm showing you the approaches.
So this will return the indices of the elements that are non-zero. There is an example as well, showing the indexes of the elements which are non-zero. The first pair of indices is (0, 1). Where is that? See, easy: row 0, column 1.
These are row 0, row 1, row 2, and column 0, column 1, column 2. So (0, 1) is 8, (1, 0) is 7, (2, 0) is -5, and (2, 2) should be the last one, and that is what it is. Simple. In this way, if you get stuck somewhere, there are a lot of resources available as help, so there is nothing to worry about. I can just execute it on my array, and since none of the values are zero, it returns the index of every element.
After that, sort. The array is already sorted, so sort can't do anything. Then argsort. What is argsort? That's a little difficult to say straight away, but I'll explain. Look at argmin first. What is the minimum value in this array? 10. What is the index of that? Zero. That is argmin. For argmax, what is the maximum value? 50. What is the index of that? Four. Then argsort sorts as per the index. I'll do one thing for argsort: I will change the array so that it is not sorted.
Now I say np.argmin. The minimum is -30, which is at index 2. For argmax, the highest value is 101, which is at index 0. If I sort the array in ascending order, it will be -30, 20, 40, 50, 101. The index of -30 is 2, the index of 20 is 1, the index of 40 is 3, then 4, and then 0. That is exactly what argsort returns: the indexes that would be there if the array were sorted.
I'll only keep the common functions now. We saw that these work. Now, when I make one item a float, do they still work? They work. The value which I changed became 2.2, and that's fine. When I add Boolean values, you see that True and False become one and zero and the dtype is still float, so float has higher precedence than both integer and Boolean. I'll do one more thing. Now let me add True and False.
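The product, cumulative, difference, nonzero, and arg functions discussed above can be sketched together. The 1D arrays use the values from the narration. The 2D matrix for np.nonzero is an assumption: only the values 8, 7, and -5 and their positions are mentioned, and the element at (2, 2) is given a placeholder value of 1.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(np.prod(arr))     # 12000000  (10*20*30*40*50)
print(np.cumsum(arr))   # [ 10  30  60 100 150]
print(np.cumprod(arr))  # [10, 200, 6000, 240000, 12000000]
print(np.diff(arr))     # [10 10 10 10]

# nonzero returns one index array per dimension.
matrix = np.array([
    [0, 8, 0],
    [7, 0, 0],
    [-5, 0, 1],   # the 1 at (2, 2) is a placeholder value
])
rows, cols = np.nonzero(matrix)
print(list(zip(rows.tolist(), cols.tolist())))
# [(0, 1), (1, 0), (2, 0), (2, 2)]

# argmin / argmax / argsort return indexes, not values.
unsorted = np.array([101, 20, -30, 40, 50])
print(np.argmin(unsorted))   # 2 -> position of -30
print(np.argmax(unsorted))   # 0 -> position of 101
print(np.argsort(unsorted))  # [2 1 3 4 0]
```

Indexing the array with the argsort result, as in unsorted[np.argsort(unsorted)], gives the sorted values themselves.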
So we know that nothing happens. It will be float only. But what if I add a string like "dan"? Do you see what happened? Oh, I wrote it in the same cell. Right. So we have this, and I added "dan". See, everything became a string. That means the float value did not convert everything to float; that also became a string, which indicates that string has the highest precedence. Okay, fine, that's understood.

And if I uncomment this and run the cell, you get this. It says right here, on line number 10, that it cannot find the total. Obviously, how do you find the total of these string values? They are not integers. Everything is converted to string, and NumPy calls that U32, Unicode 32. And that is important. If you recollect, what is the full form of NumPy? Numerical Python. So it is mainly used for numerical calculations, not string calculations, and it makes no sense to try numerical computations on strings. Over here, U stands for Unicode. Unicode is nothing but a string representation format: everything is represented by something as per the Unicode format. So that's okay. That is done.

Then I have this image of Abraham Lincoln. Whenever you represent images, they are represented something like this. Movies are represented something like this too. As a human, this is how you see the image, but a machine sees it like this. Combined, it looks like this. For a machine, the number 0 means black and 255 is white. So you see over here these are pure black and they are 0, 0, or you can see in the third-last row, 255. Then what happens? Because all these values are between 0 and 255, what will happen if I perform the operation of dividing by 255?
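The type-promotion behaviour described above (before the image example) can be reproduced with a small sketch. The exact arrays used in the video are not shown in the transcript, so the values below are assumptions chosen only to mirror the steps: integers, then one float, then booleans, then one string.

```python
import numpy as np

# Assumed example values; only the mix of types matters here.
only_ints = np.array([1, 2, 3])
with_float = np.array([1, 2.2, 3])
with_bool = np.array([1, 2.2, True, False])
with_str = np.array([1, 2.2, True, "dan"])

print(only_ints.dtype.kind)   # i  (integer)
print(with_float.dtype)       # float64
print(with_bool.tolist())     # [1.0, 2.2, 1.0, 0.0]
print(with_str.dtype.kind)    # U  (Unicode string)
```

Checking `dtype.kind` rather than the full dtype name keeps the example independent of platform details such as the default integer size.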
So what will happen when 0 is divided by 255? 0. What will happen when 255 is divided by 255? 1. Now answer one question. Do you all agree that any number between 0 and 255, when divided by 255, will necessarily be in the range 0 to 1? Yes. This is what is called normalization, and it will be helpful in machine learning and deep learning. And the reason I went off topic all of a sudden to explain this is that it has some significance here.

The next function which I want to talk about is np.zeros. It says: if you give me 5, I am going to create an array of five values, all zeros. If you give me 12, a NumPy array of 12 values, all zero. You can also choose the dtype. So if you say dtype integer, then instead of float values like here, it will create integers. Now, if you give me [3, 5], I'll create a matrix of three rows and five columns, all zeros. And this 3, 5 can even be supplied as a tuple. It's not common, but you can supply it, and both of these are ultimately going to create a NumPy array. If you want, I'll just print the type of the array. It's a numpy.ndarray. And here as well, it's a numpy.ndarray. It's not that because a list is given or a tuple is given something else is getting created. It's absolutely the same. Here as well I have the facility to give dtype. As you see over here, it will create a NumPy array of three rows and five columns.

Just like this zeros function, you also have a ones function, which will create an array of ones.
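A minimal sketch of the two ideas above: dividing 0 to 255 pixel values by 255, and creating arrays with np.zeros. The small `pixels` array is a made-up stand-in for an image, not data from the video.

```python
import numpy as np

# Hypothetical 3x3 "image" with pixel values in the 0..255 range.
pixels = np.array([[0, 128, 255],
                   [64, 255, 0],
                   [255, 32, 16]])
normalized = pixels / 255          # every value now lies between 0 and 1
print(normalized.min(), normalized.max())  # 0.0 1.0

print(np.zeros(5))                 # [0. 0. 0. 0. 0.]  (float by default)
print(np.zeros(5, dtype=int))      # [0 0 0 0 0]

# The shape can be given as a list or as a tuple; the result is the same.
from_list = np.zeros([3, 5])
from_tuple = np.zeros((3, 5))
print(type(from_list).__name__, from_list.shape)    # ndarray (3, 5)
print(type(from_tuple).__name__, from_tuple.shape)  # ndarray (3, 5)
```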
And just like ones, you also have twos, threes, fours and fives functions. Which is wrong, by the way; I'm kidding. You don't have twos, threes, fours, fives. You only have zeros and ones. All the functions of NumPy are added for some purpose, not for time pass. As I showed you, whenever images, which are usually in the 0 to 255 range, are normalized, they end up in the 0 to 1 range. And that's why only zeros and ones exist, and twos, threes, fours don't.

Sometimes in image processing, let's say you are working on something like Instagram. You upload some image to Instagram and you try to apply some filter on it. So Instagram will be creating a copy of this image. How is a copy created? The pixel values are not copied directly. First you create a dummy matrix of all zeros, and then, as you apply, let's say, a blur filter, what happens with each pixel? The result of that operation is written over here, so that zero is changed. What happens with the next pixel? That new pixel value is saved over here. So while creating a copy, you create a matrix of all zeros of the same dimension, or it could be all ones. And that is why these functions exist. As I told you, all of them exist for some reason. These functions are mainly useful in deep learning. That was the zeros and the ones functions.

Now can someone explain to me in simple words what an identity matrix is? What are the properties of an identity matrix? Okay, the identity matrix is here, and in identity as well, of course, you have dtype. dtype is there almost everywhere. And there is an eye function as well, which also creates an identity matrix. I can show you np.eye of three. Now you will be like, but how is this different?
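As a quick illustration of the functions just mentioned, the sketch below creates an array of ones and compares np.identity with np.eye for the same size. The shapes are arbitrary choices for the example.

```python
import numpy as np

ones_grid = np.ones((2, 3), dtype=int)   # 2 rows, 3 columns, all ones
print(ones_grid.tolist())                # [[1, 1, 1], [1, 1, 1]]

ident = np.identity(3)                   # 3x3 identity matrix
eye3 = np.eye(3)                         # same matrix when called with defaults
print(ident.tolist())                    # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(np.array_equal(ident, eye3))       # True
```

With default arguments the two calls give the same result; np.eye additionally accepts parameters such as a diagonal offset.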
Because if I copy this and go over here, it's absolutely the same. Then what is the difference between the identity and the eye function? Well, here we have k. I'll do one thing: I'll create a 5 × 5 identity matrix. Okay, so this is how it looks. Rows 0, 1, 2, 3, 4 and columns 0, 1, 2, 3, 4, by the way. That's k: k can take any of these values. How? Well, copying this, there is a parameter called k. If I take k = 0, there is no change in the output; it's the same as this figure. But when I say k = 1, see what happens. The diagonal is here. Of course it's not a perfect identity matrix, and I am not calling it an identity matrix either. When k = 2, see, it shifted here, so the ones start from here. When k = 3, it shifted here. When k = 4, it shifted here. And when k = 5, all zeros. And when k = 6, all zeros. But we don't need it that often; it just goes as a complement to identity. So all these functions exist for some reason, but this one is very rarely required.

Is it applicable for positive values only? I don't know, I never tried it, to be very honest, and there is no real need to. But let's say k = -1: does it start from the back or not? Okay, with minus one the diagonal shifts along the rows instead of the columns. I never tried it; this is the first time someone asked me and I'm trying it out. But to be very honest this is very rarely required, so don't worry about it. I just wanted to teach you the identity function, and because the eye function is complementary to it, I showed you that too. But I'm sure you got the answer to your question, am I correct?

Now, see, we have seen so many functions inside NumPy.
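The effect of the k parameter described above can be checked with a short sketch on a 5 × 5 matrix. Positive k moves the diagonal of ones to the right, negative k moves it down, and once the offset reaches the matrix size nothing is left.

```python
import numpy as np

n = 5
main = np.eye(n, k=0)       # ones on the main diagonal, same as np.identity(5)
shift1 = np.eye(n, k=1)     # ones start at row 0, column 1
shift4 = np.eye(n, k=4)     # a single one at row 0, column 4
empty = np.eye(n, k=5)      # offset beyond the last column: all zeros
below = np.eye(n, k=-1)     # ones start at row 1, column 0

print(np.array_equal(main, np.identity(n)))  # True
print(int(shift1.sum()), shift1[0, 1])       # 4 1.0
print(int(shift4.sum()), shift4[0, 4])       # 1 1.0
print(int(empty.sum()))                      # 0
print(int(below.sum()), below[1, 0])         # 4 1.0
```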
I don't even remember all the functions we studied. Let's say the sum function, the min function, the sort function, the argmin function, argmax, and on and on, so many functions, and we studied all of these inside the numpy module itself. In fact, there are some other functions as well which I would like to talk about, like the rand function, the randn function and the randint function. In all of these, rand stands for random, and because these functions are all more or less about generating random numbers, they are put into a module called random. And the best part is that it is a submodule inside NumPy. If you have to call, let's say, the rand function, then you say np.random.rand. That is how you call this function. I just wanted to show you that this is how it is done.

Now, we don't have to look into all of these functions, but you saw in the deep learning notebook that the random module is used a little, not too much. One application of the random module is generating OTPs. Every time you do a transaction, an OTP has to be generated which is random, not following any pattern, and it goes to the user and then it must tally. So how do I use it? np.random.rand of five. It will generate five random numbers between 0 and 1. Every time I run it, it generates five new random numbers between 0 and 1. Then you have randn. It will generate numbers mostly between about minus 1 and 1; strictly, it samples from the standard normal distribution, so values outside that range also appear. See, every time I'm running it, the values change. Then you have randint: you specify low and high and it generates numbers between them. So between 1 and 10, generate five numbers. They will always be between 1 and 10. Look at the last line of output.
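The three calls discussed above can be sketched as follows. The transcript does not show the exact arguments typed in the video, so the sizes and bounds here are assumptions. Note that for randint the upper bound is excluded, so `randint(1, 10, ...)` returns integers from 1 to 9.

```python
import numpy as np

uniform = np.random.rand(5)                 # 5 floats, uniform in [0, 1)
normal = np.random.randn(5)                 # 5 floats from the standard normal distribution
integers = np.random.randint(1, 10, size=5) # 5 integers with 1 <= value < 10

print(uniform)
print(normal)
print(integers)
```

The output changes on every run, which is why no expected values are shown.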
Between 1 and 10, five numbers will be generated, and there are many more. So random generates values between zero and one. Yes, very similar to the one above, I do agree. We need not look into all of them, but in deep learning, when you have a set of misclassifications and you want to randomly plot one of the misclassified images, the random module turns out to be a big help.

Another thing, something very common in machine learning. Like I told you some time back, the code is not important right now. In machine learning, when you have a data set, you always have to separate what you have to predict from what is used to predict it, like the pink values are used to predict the values in the yellow box. So you have to separate them. Now I am going to consider a similar analogy where I have the values 11, 22, 33, 44, 55, 66. Here the tips column is the second column, but usually the column to be predicted is the last one, and the ones used for predicting it come before it. So what I want to do is separate this blue part into X and this pink part into y.

Before that I would like to write down some naming conventions. These are rows 0, 1 and 2, and columns 0, 1 and 2. Counting from the end, I can also call them the -1 column, the -2 column and the -3 column. In the same way the rows can also be called -1, -2, -3. Let us create a data set. So I say np.array. Perfect, this saves so much of my time.

Now, how I do it is going to be very interesting. In my X array I say which rows I want and which columns I want. I want all the rows, that is, from row 0 to row 2, so I will write 0:3. And for columns, I want all columns from column 0 to column 1, for which I will write 0:2; then only it will take columns 0 and 1.
Okay, I can also write this as just a colon, because by default, if I don't give the starting and the ending index, it takes everything. And for the columns, I can say start from zero and go up to minus one. When I write :-1, the start is zero by default, and the stop of minus one is excluded, so it goes from column 0 up to column -2. As far as y is concerned, I want all the rows of column minus one. So these are my rows, and this is my single column. I didn't write :-1; a leading colon means start from zero. I am directly saying -1, which clearly says I want only the minus one column.

Let us check it out once. You can see over here: that's X and that's y. Have a close look at it. It's not difficult, it's just different. I will also add this image first so that it doesn't go away, and you can keep observing it.

I repeat. My X is all the rows, so I can say 0:3, which takes rows 0, 1 and 2. The columns are 0 to 1, but I cannot write 0:1, because if I write one it will go up to column 0 only. So I will say 0:2, and it will go from column 0 to column 1. This 0:3 can conveniently be written as :3, because the zero is optional; by default it is zero. Here also, 0:2 can be written as :2. And column 2 is nothing but column minus one, so I can also write it as :-1, and that's what I wrote. And here when I write just a colon, I mean 0 to 3, which is as good as saying everything. And for y, I can write a colon or 0:3 for the rows and column 2 for the column, or I can say I want the minus one column. Ultimately it's one and the same.
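The slicing steps above can be put together in a short sketch. The transcript only reads out the first few values of the data set, so the 3 × 3 array below (11 through 99) is an assumption; the last column plays the role of the value to predict.

```python
import numpy as np

# Assumed 3x3 data set; the last column is the target.
data = np.array([[11, 22, 33],
                 [44, 55, 66],
                 [77, 88, 99]])

X = data[:, :-1]   # all rows, every column except the last
y = data[:, -1]    # all rows, only the last column

print(X.tolist())  # [[11, 22], [44, 55], [77, 88]]
print(y.tolist())  # [33, 66, 99]

# Equivalent spellings of the same selection.
print(np.array_equal(X, data[0:3, 0:2]))  # True
print(np.array_equal(X, data[:3, :2]))    # True
print(np.array_equal(y, data[0:3, 2]))    # True
```

Writing the split with -1 rather than a fixed column number keeps it working when the data set has a different number of feature columns.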
So now that we are done with this, the next function, which is again very popular in machine learning, deep learning and almost everywhere, is the arange function. First of all, recall the range function, where I used to say for i in range(5): print(i), and that used to print 0 1 2 3 4. NumPy says, I also have a similar function, called arange. arange works like the range function, but the difference is that arange stores these values in a NumPy array. So here I say np.arange of five, and if I save it into a variable like arr, you can see it has 0 1 2 3 4, and if I print the type, it's a NumPy array. If I say np.arange of 12, it goes 0 to 11. np.arange of 1 to 12 goes from 1 to 11. If I say 1 to 12 at a step size of 2, it goes like this. Exactly the same as range. The most important thing over here is that the output will always be a NumPy array.

The next thing: if I say np.arange of 12, there is one more function called reshape. Reshape says, you have the values 0 to 11 from np.arange; I will reshape this array into three rows and four columns: 0 1 2 3, 4 5 6 7, 8 9 10 11. For this to work, 3 into 4 must be equal to 12. It is possible if and only if the product of the dimensions turns out to be 12; else it's an error. Let's try the various scenarios. I can say 4, 3 and it works: four rows, three columns. I can say 6, 2: six rows, two columns. I can say 2, 6, and it works.
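A minimal sketch of arange and reshape as described above. The shape (5, 3) in the last step is an arbitrary example of dimensions whose product is not 12.

```python
import numpy as np

arr = np.arange(5)
print(arr.tolist(), type(arr).__name__)   # [0, 1, 2, 3, 4] ndarray
print(np.arange(1, 12).tolist())          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(np.arange(1, 12, 2).tolist())       # [1, 3, 5, 7, 9, 11]

grid = np.arange(12).reshape(3, 4)        # 3 * 4 == 12, so this works
print(grid.tolist())                      # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(np.arange(12).reshape(6, 2).shape)  # (6, 2)

# 5 * 3 != 12, so NumPy refuses to reshape.
try:
    np.arange(12).reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)                     # True
```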
I can say 1, 12, because 1 into 12 is also 12, right? Now, np.arange and np.arange with reshape are both giving an output; that's the first output, that's the second output. Can you tell me what the difference between the two is, or are they the same? What do you think? Of course they look the same. What do you say? Very nice. Correct. This is a one-dimensional array, single square bracket, and this is a two-dimensional array. Very good. A 1D array and a 2D array. That is absolutely fine.

Now, for this reshape I said that whatever you put over here, the product must be equal to 12. So 3 into 2 is 6, into 2 is 12: 3 into 2 into 2 is also 12. It does create a three-dimensional array. You can see it has created three arrays, each of size 2 × 2. So if the product is equal to 12, it works in the same way. If you want, you can write 2, 2, 3 here: 3 twos are 6, 6 twos are 12. You see over here two arrays of two rows and three columns. And now you see over here this is a four-dimensional array, four brackets. You never require it; at least I never required it, ever. But this is just for your conceptual understanding that a 4D array is also possible.

Okay. So the next thing after this, if you have a NumPy array, is what I can call advanced slicing. Assume that I have a NumPy array with the values 0 1 2 3 4 5 6 7 8. Now in this NumPy array, if I say print arr of, let's put it here only, print arr. Oh, this is giving the output as this. Okay. See, these are the row and column indexes which are selected. Row 0, column 1, 1. Row
Original Description
🔥Professional Certificate in AI and Machine Learning - https://www.simplilearn.com/professional-aiml-program?utm_campaign=K4st3DFOROE&utm_medium=Lives&utm_source=Youtube
🔥IIT Kanpur - Professional Certificate Course in Generative AI and Machine Learning - https://www.simplilearn.com/iitk-professional-certificate-course-ai-machine-learning?utm_campaign=K4st3DFOROE&utm_medium=Lives&utm_source=Youtube
🔥IITM Pravartak - Advanced Executive Program In Applied Generative AI - https://www.simplilearn.com/applied-generative-ai-course?utm_campaign=K4st3DFOROE&utm_medium=Lives&utm_source=Youtube
This video on Python for Data Science Full Course 2026 by Simplilearn is designed to help beginners learn how Python is used in real-world data science projects. You’ll start with Python basics and gradually move into data analysis, data manipulation, and visualization using popular libraries like NumPy, Pandas, and Matplotlib. The course explains core data science concepts in simple terms with practical examples. You’ll also understand how Python supports data-driven decision-making across industries. By the end of this free course, you’ll be confident using Python for data science tasks and ready to move toward advanced analytics and machine learning. Perfect for students, professionals, and anyone starting a career in data science.
Related Videos:
✅ 1. Python Full Course 2026 - https://youtu.be/lRfjnTYJnJ8
✅ 2. Python RAG Tutorial 2026 - https://youtu.be/WQTaM7tBlvc
✅ 3. Data Science Roadmap For 2026 - https://youtu.be/ZpINzjm_4Ks
✅ 4. FREE Data Scientist Course - https://youtu.be/_NsDrON32ww
✅ 5. Data Science Interview Questions - https://youtu.be/vtCwJKUhizg
✅Subscribe to our Channel to learn more about the top Technologies: https://bit.ly/2VT4WtH
⏩ Check out More AI Videos By Simplilearn: https://youtube.com/playlist?list=PLEiEAq2VkUULyr_ftxpHB6DumOq1Zz2hq
#datascienceprojectswithsourcecodeinpython #pythondatasciencefullcourse #pythondatascienceproject #pythondatasci