Applied Data Science With Python Full Course 2026 [Free] | Python For Data Science | Simplilearn
Key Takeaways
Covers applied data science with Python, including data analysis and machine learning
Full Transcript
Hey everyone, welcome to the Applied Data Science with Python course by Simplilearn. Today data plays a huge role in almost every industry. It helps organizations understand patterns, make smarter decisions, and solve real business problems. So if you want to learn how to work with data using Python, this course is a great place to start. In this course, we will take things step by step and build your understanding from the ground up in a simple and practical way. Here's what we'll cover. We'll begin with the basics of Python and the learning setup, so you get comfortable with the environment and understand how to start working with code. Then we'll move on to NumPy, where you will learn how arrays work, how to perform mathematical operations, and how to handle data efficiently using dimensions, shape, indexing, vectorization, and broadcasting. After that we will explore pandas, which is one of the most important libraries in data science. You will learn how to work with Series and DataFrames, organize data, filter it, and handle things like categorical values and date-time information. Next we'll look at data visualization using libraries like Matplotlib and Seaborn, where you will learn how to use charts like scatter plots, histograms, box plots, pair plots, and heat maps to explore patterns, correlations, outliers, and missing values in data. We'll also cover the mathematical foundations of data science, including linear algebra, vectors, matrices, and probability. These concepts are very important because they help you understand how data science and machine learning work behind the scenes. By the end of this course you will have a solid understanding of the core concepts needed to start your journey in data science with Python. Also, if you want to build strong practical skills in Python and data science, check out Simplilearn's Data Science with Python course.
This is designed for learners who want to understand Python programming, data analysis, data visualization, and machine learning in a structured and beginner-friendly way. You'll get to work with key libraries like NumPy, pandas, Matplotlib, and Seaborn, and also gain hands-on exposure to important concepts like data cleaning, feature engineering, statistics, and real-world data analysis. On completing the course, you will receive a course completion certificate from Simplilearn, which can help you strengthen your profile and showcase your skills. Check the description below for the link and start your data science journey with Simplilearn today. Before we jump in, here's a quick quiz for you. Which Python library is mainly used for working with tables of data, like rows and columns? Your options are NumPy, pandas, Matplotlib, or Seaborn. Let me know your answers in the comment section below. >> The course is all about... yeah, yeah, I got it: applied data science with Python. Okay. So if we talk about this course in particular, the learning path we are going to take is this. Initially we will start with the basic course introduction, that is, the introduction to data science, which focuses on data science and its applications. That's going to be covered today. After completing this, we straight away start with the advanced libraries of Python, beginning with NumPy, focusing on the concepts of NumPy and its uses. After NumPy we will move on to pandas, which is definitely a basic library required for data analysis, including its types, data structures, and functions. Then we move ahead to the data visualization libraries, focusing on visualization techniques and different types of charts. Another important aspect of data science is statistics and mathematics, which covers the fundamentals of statistics, its types, data categorization, and the concepts of scalars and vectors.
Then we will move forward to probability distribution, how distribution is carried out. Then we will also understand in detail advanced statistics concepts, probability which will help in analyzing the results through hypothesis testing. And last but not the least, we would be practically loading the data set and performing analysis on it with the help of data wrangling and feature engineering. Getting my point? I hope today's introduction is helping you out to understand what is expected from you. How we are going to move forward in this data science journey. Yes. So can the audio be turned on so that we can communicate directly instead by chat. If you want to communicate with me you can always say to unmute you so that we can discuss also. So that's not an issue had if you want I can unmute you. We can have a discussion if you have any particular doubt but unmuting everyone can create a chaos in the session. Got it? Got it? Yeah. So again I'm telling you the format. Generally you can communicate through chat but if you have issues you can tell me to unmute you. We can discuss. You can even share your screen. I can help you out with that. keep my uh you know discussions very very birectional right and we will go in for break in the middle of the session that's after two hours and poll will be conducted even last 10 minutes we will keep it for discussion Q&A if you have any issues got it dharun deep tulapati soda sivagami yeah now tell me if you still have any doubts regarding the course the prerequis requis it. Please let me know if you are familiar with the format of simply learn. We give lots of hands-on exercises of Jupiter notebook. There is at the end you're supposed to complete the course and projects and of course the ebook and the reference material. So are we excited to start this journey? Yes, learners are we excited to start this journey? Yeah. Yeah. 
You will do the assignments right the daily assignments that we can discuss it on the uh session itself and the project is submitted at the end got it the yeah and I hope you are also aware about the LMS your learning management system from where you get the course material learners are you aware about that the learning management system right you can yes learners do you understand this or not LMS no just 15 uh days required just dika after the course ends then you will get two weeks to um submit the project. Okay. So this is the LMS. You can login into your LMS and this is the screen uh or the dashboard since I am taking lot of courses. So you have to log into your applied data science course. You see this on the left hand side you have the instructor slides, notebook, lab guide, incremental capstone data sets all this reference material I request you to download and be ready with that. Now clear heads say and any extra material that I will share that will also be shared on the LMS. All right. So as you all are very much familiar now that what is data? So Deepak says I have five years of experience from data background as a business. Oh, you're already working into this domain. So you have a lot of knowledge. So you're working in Tableau or PowerBI. What kind of business analyst you're working like deeper? So says you already have a software background. So you which language are you familiar with? Can you just give me uh the background? HTML, web development, PowerBI. Okay. That's good. So you have idea about data analysis deep that's great good SQL and VBM my working experience is process engineering sagnik saying posess pursuing MBA in rural management from symbiosis is it xim is symbiasis is that uh sagnik have two years of x consulting all right says web application developer full stack net sequence 10 plus on a break now okay Aron I'm inity assurance. Lakshi is in quality assurance. Okay. Daran is pursuing be maybe at a profession level. 
I used Python for simple. Oh great heads great. I'm having 2.5 experience in sales manage in uh okay so oh so that was Xavier Institute of Management Bhneshwar. Okay XIMv okay got it got it son. Got it. Great, great to hear from you all and as we as I move along I'll I'll keep on asking please give me your background how is it practically going to be used to you so I I love interacting uh from my learners and you can always share your experience especially analyst analysis where this learnings can be really helpful you can always always share it uh with everybody uh because it's a learning from all of us and you can always say I want to unmute share my experience knowledge knowledge it would be beneficial to all. Okay. So if I talk about Tulapati 13 years of experience in production support okay Tulapati that's good that's good says five plus India in CFD and thermal simulation what do I understand by CFDA I don't know what is CFD Navd says 20 years in IT infra and azour cloud so Navd you have quite working in that and now you want to f you know shift into this data analysis domain Nav Navep is that the case? Da says computational flu. Okay. CFD is computational fluid dynamics. So how is it useful? What is computational fluid dynamics? I would like to hear something on that. She says I have 10 years of experience from banking. I have been into operations and technal roles. Okay. currently on a career break looking into application of machine learning in banking industry having knowledge of okay that's great you have knowledge but you want to uh now totally get into the data science era right so Dika can you just put us a light on what is this computational fluid dynamics how is it really uh useful data science is used for pre-post-rocessing automation of the CTF if you can help out with that if I can unmute you. Yeah, please unmute yourself and just shed a light on data analysis. This is something very new to me. Yeah. Yes, Dea. We can't hear you. 
You also please check your mic settings. We can't hear you. It's unmuted. Please check out log out. Log in. I heads was able to later on do it. Something issue with your mic. There we go. Please check on that and the time the moment you are ready you can you can always tell it to me to unmute so that we can have a good discussion on that. Okay. Okay. So we all understand what is data. Data is nothing but raw facts and figure and I have been uh in 10 years of experience in automation. Okay. So Sudha. So now let me start with the course and then we'll do the interaction again. Now my question is is data part or is data now a big hit or is data has always been part of the human civilization? Yes. Is data always been part of the human civilization? That's my next question everybody. Always part. How can you say that? Tulapati the mani says yes it has always been but now it's been tracked and traced now. How is it tracked and traced now? Why is it a big hit now when it has always been part of a human civilization? Why is it hit now? He say why why is it a big hit now? And if I talk about um let's talk about ancient man. What was the data that ancient man used to use? Let's say we are just trying to understand that is data why is you know data now easy to track and trace is data has been always part of the human civilization. If I talk about ancient man where the man did not have anything it used to live in caves and make food by burning fire and doing that then also there was data. How was data stored in those days? Tell me. Hi, tell me Deepak. Yeah, we used to write it. It used to carve it on stones, not exactly paper, maybe on the leaves and bark. People who survey that kind of data are known as archaeologist. Yeah. They they even people who really study that ancient time. what was the how old were the coins and what is re written and you know the different statues associated with that. 
So they are known as archaeologists, and they have their own way of analysis. Then with time, especially with the invention of the wheel and the steam engine, the industrial revolution became part of human civilization. Agreed? That was the next change: with the industrial revolution everything became faster and moved to large-scale production. Do you all agree? After the steam engine, the next big revolution was the invention of computers, information technology, the internet, right? That is where we all converge. And the data that was being stored as files and paper, where all the documents were kept, is now all being stored in the form of digital data. That is why you see a big buzz globally, where people are talking about saving energy, about where this data is going to get stored, the data centers, and the energy needed to run these huge amounts of data. Do we all hear that now? Yes or no? So Deepak says that data has always existed, but the scale, speed, and tools have now made it transformational. Absolutely correct. It has always transformed with evolution or revolution. And why does data science feel like a revolution? It's not about having data but about finally being able to unlock its power. Yes, we are able to unlock its power because, with the help of computers, the data is now digitally stored, and this digital data is being generated in huge amounts. We now understand big data in terms of the five V's: volume, variety, veracity, velocity, and the value that we want to add to the data. So there is a huge amount of data on which we can do analysis and bring benefits to companies. Got it? And now we are living in the fifth big revolution of human civilization, which is known as the AI revolution. And the new oil for the whole world is not crude oil but data.
So the every country is running for that particular power because now whichever country will have you know the biggest power resources the energy resources for storing the data processing the data will become the main part right do we see that globally also it is such such a big impact is happening globally you know people are just want to want to become the big powerhouse right heads mani Okay, can I call you Padya? Because there are two three Deepak. So I think so. Padhya is a good way I can call you. Can I can you can I do that if you allow me? Yeah. Thank you. Thank you so much. Right. So now you know data is the powerhouse uh you know semiconductor devices because see ultimately it is the hardware and the software which will go together. Data is not something alone which is kept. So it is the hardware and the software where we combine together to move forward. Got it? Getting my point? And if we look at data more technically, please try to understand. If we look at data more technically, it is divided into two parts. One is known as the categorical part. This is known as the uh yes AI can replace a data scientist. It can do the analysis but very important part of adhya we need to understand with all the AI that we see around is who is going to results of the AI right there is no verification when we talk about different LA uh you know the genai tools the the automation happening so human involvement at least if you ask me personally from my exper experience is extremely extremely important. We cannot be completely dependent on it. If you ask me from my personal view whether it's even a small response even a small answer that you get from chat GPT don't rely verification of the fact the answer the code or the result or the automation definitely needs to be done at the human level. Got it? Of course, it is going to make the work faster, right? Like for example, making PPS, designer PPS was a hardcore time. 
But now you can generate it in a in hardly 2 minutes time. But who's going to verify points covered or not? Got it. Heads say all getting my point learners. Please be very very clear with this point because that's the big revolution. I see this was never part of the discussion. Do you think it was part of the discussion even in my last three sessions in my of my career? No. But now I have to talk about Genai the output of it how is it affecting the whole thing? Right? It was never part of the discussion even 3 years back or rather two years back also with evolvement of all the chat GPT now claude anthropic is going to take big wave in the market you know they're trying they're always talking about that so we will see these changes in front of our eyes you know we are witnessing the revolution itself okay so will we also learn autonomous using Python here no autonomous this Python here that's different course that's geni course okay no autonomous over here here we will te learn the very basics how to code how to understand different libraries and how analysis can be performed we will not rely on any tools first learn the basics then only you will be able to analyze the results of the uh genai or the AI tool whether they are correct or Got it. Is this point now clear? Right. So I understand now learners have more confusion. There is so much to explore. So much to explore even at my level you know being in this field for the past 16 years. I have to learn a lot uh you know and there are so many tools which you know are there still uh open in the market which I have to learn explore and they are really really helpful to us. What level of statistics knowledge is needed? Yeah, statistics knowledge very basic statistic knowledge is required. Uh in terms of mean, mode, median, the graphs, the charts I think so that's more than enough and probability is going to play a major role you know in whole of AI machine learning probability has a major role to play. 
So a few concepts of probability will also be covered. Got it? Yeah. Okay. So now if we talk about data, data can be divided into two main parts. The first is categorical and the second is numerical. What is categorical data? Data that can be divided into categories. For example, what is your marital status? Are you married, single, divorced, widowed, etc.? Which political party do you belong to? Democratic, Republican, nationalist, what kind of party? What eye color do you have? Green, blue, brown, black? So that is categorical data. Numerical data is further divided into discrete and continuous. Here I'm talking about absolute technical knowledge, how we understand data for analysis. Discrete data is nothing but counted items: the number of children, the defects per hour. Continuous data is measured: the weight, the voltage. That means the data can range from minus infinity to plus infinity. For example, the weight of any object can vary from 0 to 10.37 g to 115.34 kg or even more than that. So it's continuous, which means it can take decimal values. Is my point clear? Right. So now let's understand how data science is practically used in the real world, in the business domain. The model that data science basically works on is known as the DIKW model. What do we mean by the DIKW model? D stands for data, I stands for information, K stands for knowledge, and W stands for wisdom. Okay. So let's understand with a simple example what these terms mean. Data is raw facts and figures, for example, the sales of a car company for the last one year. So data is nothing but the sales of the car company for the last one year.
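The categorical versus numerical split described above can be sketched in pandas. This is only an illustration: the records, column names, and values below are invented and do not come from the course material. It shows one way categorical, discrete, and continuous columns might be told apart in code.

```python
import pandas as pd

# Hypothetical records, invented purely for illustration.
people = pd.DataFrame({
    "marital_status": ["married", "single", "divorced", "single"],  # categorical
    "eye_color": ["brown", "blue", "green", "brown"],               # categorical
    "num_children": [2, 0, 1, 0],                                   # numerical, discrete (counted)
    "weight_kg": [72.5, 58.3, 80.1, 64.0],                          # numerical, continuous (measured)
})

# Tell pandas explicitly which columns hold categories rather than free text.
for col in ["marital_status", "eye_color"]:
    people[col] = people[col].astype("category")

# Split the column names by kind of data.
categorical_cols = people.select_dtypes(include="category").columns.tolist()
numerical_cols = people.select_dtypes(include="number").columns.tolist()

print("categorical:", categorical_cols)
print("numerical:  ", numerical_cols)
print("marital status categories:", people["marital_status"].cat.categories.tolist())
```

Note that pandas cannot tell discrete from continuous on its own beyond integer versus float storage; that distinction is a modeling decision the analyst makes.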
Processed data, please try to understand, and there is a slight difference between data and information: processed data is known as information. What do we mean by that? Now that I have the car company's sales, I can ask which months have the maximum sales, and maybe I represent those in dark blue and the months with minimum sales in green. Getting my point? Are you understanding? This is where statistics comes into the picture: finding the maximum, the minimum, the average, the standard deviation, the variance of the data. Getting my point, learners? Right. Now this data and information is for the last one year. But knowledge comes with experience. Knowledge will always come with experience. Saying "I'm very knowledgeable" doesn't mean much if you have worked for only one year. So rather than taking the data for the last year, I take the data for the last five years. What I observe is that the same pattern is followed: during the months of festivities, maybe during Navratri or Christmas time, the sales of cars increase. I observe this not just in the last year but across the last five years, and there are a few months where sales are very, very low. So this is the insight, the pattern that I detect from this data. And wisdom means how I can use this knowledge, this insight, to impact my business. So now I can tell my sales and marketing team to run more offers during the peak months, so that the team's work has more impact on the business. Did you get this point, Sudha?
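The DIKW steps in the car sales example can be walked through in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the monthly sales figures, the years, and the helper name `top_months` are all invented for illustration, and "peak months" is taken here to mean the three best-selling months of each year.

```python
from statistics import mean, pstdev

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# DATA: raw facts and figures. Hypothetical monthly car sales (units), invented for illustration.
sales_by_year = {
    2021: [40, 38, 45, 42, 41, 39, 36, 37, 50, 75, 60, 70],
    2022: [42, 40, 44, 43, 40, 38, 35, 39, 52, 80, 62, 74],
    2023: [45, 41, 47, 44, 43, 40, 37, 40, 55, 82, 66, 78],
    2024: [44, 43, 48, 46, 44, 41, 38, 41, 56, 85, 65, 79],
    2025: [47, 44, 50, 47, 45, 43, 39, 42, 58, 88, 68, 83],
}

def top_months(values, n=3):
    """Return the names of the n months with the highest sales (hypothetical helper)."""
    ranked = sorted(range(12), key=lambda i: values[i], reverse=True)
    return {MONTHS[i] for i in ranked[:n]}

# INFORMATION: processed data for the last one year (maximum, minimum, average, spread).
latest = sales_by_year[2025]
info = {
    "best_month": MONTHS[latest.index(max(latest))],
    "worst_month": MONTHS[latest.index(min(latest))],
    "average": mean(latest),
    "std_dev": pstdev(latest),
}

# KNOWLEDGE: a pattern that repeats across all five years, not just one.
recurring_peaks = set.intersection(*(top_months(v) for v in sales_by_year.values()))

# WISDOM: an action for the business based on that pattern.
peak_list = ", ".join(sorted(recurring_peaks, key=MONTHS.index))
action = f"Plan offers and stock around: {peak_list}"

print(info)
print(recurring_peaks)
print(action)
```

With real data the "knowledge" step would usually need a more careful seasonality check than a simple set intersection, but the progression from raw numbers to a decision is the same.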
So this is how we actually use data to have an impact on the business. I'll give you one very practical example. It's old now, about 22 years old since we are living in 2026, but it still has a lot of impact. Hurricane Frances was about to hit Florida's Atlantic coast, and Linda M. Dillman, Walmart's CIO, pressed her team to analyze the data from Hurricane Charley, which had struck several weeks earlier. So the idea was: one hurricane struck a few weeks before, so let's analyze that data, see what people were buying and what the history shows, and apply that to this hurricane also. So what do you think are the items that people would like to buy when a natural calamity is about to hit? We see these events a lot now. What are the things you think people would want to buy more? That's the idea. What our logic says is that when a natural calamity is about to hit, the Walmart people would naturally think that customers would buy more water bottles, flashlights, groceries, food items, right? Exactly. Shelter materials. Absolutely correct. But in the actual scenario, something more turned up. The New York Times reported that the experts mined the data and found that the stores would indeed need certain products, and not just the usual flashlights: strawberry Pop-Tarts increased in sales to about seven times the normal sales rate. And, as Ms. Dillman said, the pre-hurricane top-selling item was beer. So can you imagine this thing?
What was the item which actually came out to be a big hit and that proved a big profit to the organization? It was not what we think logically but what the data speaks after the analysis. Getting my point? Are you understanding the power of data analysis and data science? Now everybody is getting this point and it's 22 years old, right? Example. Okay. So here I've just prepared a few uh you know uh quick uh okay we'll do the Python questions later on. Let's let's start. Yeah. Siva. Yes. Sivi share it. The example says that the hurricane was about to hit on the Florida's coast. Right. But a hurricane had uh hit before that. So she said that let's uh you know start you know uh doing analysis of the data where it the hurricane hit it before. So during the analysis they found it was not the grocery items or the essential items which were a big hit or a sale but it was the strawberry pop-tarts or the beer item. The sales had increased seven times. Got it now? Now why it was that that's that's another level of exploration you can check it out uh on the net you know because this is an right example uh taken right but you know ultimately something beyond what our logic says you know that what the data speaks that is what is your role as a data scientist to explore something more than what we understand and what the data speaks. Got it. Malipuri, Dika, Sivagami, Hay, Karthik, Kirana, Kamill. Are you all getting my point? So if you have downloaded the reference material, I request you all to download it. And now we begin with our lesson number two to understand what is data science. Yes, learners. Are you understanding? And the PT that I'm sharing at my level, I will be sharing it after the session. Right? Whatever material is shared by me beyond the LMS, I will share it on the LMS. Okay, got it everybody? Are you are we good to go or still anybody has any doubt please let me know. All right, got it learners. 
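The hurricane analysis described above boils down to comparing what sold before the earlier storm with what sells on a normal day. The sketch below shows that comparison in plain Python. Only two things come from the example as told: the roughly sevenfold jump in strawberry Pop-Tarts and beer being the top seller. Every unit count and item name here is invented for illustration.

```python
# Hypothetical average daily unit sales per store; all numbers are invented for illustration.
baseline = {
    "flashlights": 40,
    "bottled_water": 300,
    "strawberry_pop_tarts": 50,
    "beer": 400,
}
pre_hurricane = {
    "flashlights": 80,
    "bottled_water": 750,
    "strawberry_pop_tarts": 350,
    "beer": 900,
}

# Lift = pre-hurricane sales divided by normal sales for the same item.
lift = {item: pre_hurricane[item] / baseline[item] for item in baseline}

# Rank items by how much more than usual they sold.
ranked_by_lift = sorted(lift.items(), key=lambda pair: pair[1], reverse=True)

top_by_lift = ranked_by_lift[0][0]
top_by_volume = max(pre_hurricane, key=pre_hurricane.get)

for item, ratio in ranked_by_lift:
    print(f"{item:22s} {ratio:.2f}x normal")
print("largest jump:", top_by_lift)
print("top seller:  ", top_by_volume)
```

The two rankings answer different questions: lift shows which items behave unusually before a storm, while volume shows which item moves the most units overall.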
So in this particular lesson, we will understand the basics of data science, how the different steps of the data science process are carried out, which packages are used for data science, and the different types of plots available for visualization. All right, so we begin. So now, learners, do we understand the term data science in a better way? Data science is a multidisciplinary field. Why multidisciplinary? Because it involves not only computer science or a programming language but also statistics, maths, linguistics, every field. And it uses scientific methods. We don't draw conclusions randomly; conclusions are drawn from facts and experiments. We don't randomly accept any fact. Got it? So it is the field that uses scientific methods, processes, algorithms, and systems to derive meaningful insights from structured and unstructured data. What do we mean by structured and unstructured data? Structured data refers to your tabular data: your Excel files, your CSV files, your SQL data are all structured data. And what do we mean by unstructured data? Anybody who understands this term? Yeah. Audio, videos, log files all come under unstructured data. Absolutely correct. Absolutely correct. Clear? Is the definition of data science now clear to everybody? So using a search engine or making a purchase on Amazon provides valuable data to the data-science-driven software systems operating in the background. Data on interactions with online platforms is gathered to understand user preferences and suggest search results or items to buy. So the idea is to bring profit to the business and to give more meaning and insight to the whole process.
So data science as I told you is a combination of sub subject expertise, scientific methodologies and technology such as mathematical and statistical model scientific tools and method such as Python which can operate not only on Mac even on Windows even on Linux. Different design, different libraries are available and different data processing tools are also available which help in data science such as Tableau and PowerBI are also very very important tools. Even SQL is an important tool. Right. Right. Right. So now if we look at the application of data science which we see all around us for the first uh you know application is in healthcare. So we all wear smart watches right? So our smart watches capable of telling our health. They are able to calculate the BP, the temperature, how many steps we are doing and how many and it tells if you're sitting for long go and take a walk and is it helping out? are are are the smart watches even you might have heard about uh the story where the smartwatch was able to predict that the person is having a heart attack and then the person was able to rush to the hospital and get itself cured that that's the prediction with the smartwatch had made have you have you heard that story you can check it out on internet so that's the advantage you know sometimes your BP is getting low your temperature is getting higher you stop activity so your smartwatch was capable enough you know because your movement and your body uh symptoms were different so it was able to predict that. So all the data gets collected it gets transferred to the server and can be analyzed and can help you in giving more informed decisions in improving your health in in you know in getting a more diet or more calorie conscious uh person. So that's how it can help in decision making. Getting my point? Is this example getting clear to everybody? Yes or no? Only to Arun. Only to Arun. What about others? Come on learners, you can respond. Devika, Dhuv, Kituka. 
Okay, that's great. Another example that we see normally and we never realize that this is data science that whenever we are typing into the Google prompt or anywhere you know it keeps on recommending the words whether we want data science in healthcare or a data science in healthcare research paper these kinds of recommendations are there. So which make it fast and realtime analytics is made possible by modern and advanced infrastructure tool. Now there is a big change in this kind of technology also. Now what is the technology that we generally use? Anybody who can give an insight this there is when we are doing any kind of search in today's uh scenario are we generally typing generally typing then what are we doing to then what are we doing to tell me can I call you Krishna not really autofilling no we are generally speaking talking to the machines okay now tell me AI is a general word they yes through voice we are now the machines are capable of understanding what we are talking to them isn't it come on uh you know search this for me this image for me that image for me we are talking we don't even want to type isn't it getting my point is this point getting clear to everybody or not and this is again a a part of AI I which comes under the domain of NLP. Please try to understand technically this comes under the domain of NLP that is natural language processing. Okay. So of course that's not part of this course. We are just here to understand data science and the journey is a little bigger that you have to understand then machine learning, deep learning, then NLP and then the computer vision etc. Okay. In finance domain again data science has a major role to play. Are you applicable for a loan or not. So when you want to file an application you file like what is your earnings? How many dependents do you uh are there in your family? Are you uh you know do you have a medical insurance or not? What is your civil score? 
So based on the data analysis of your credit cards, credit history, approved amount, and risk, a decision can be taken on whether you will be granted the loan or not. Clear? That's for somebody in the banking domain. And Sivagami, NLP is natural language processing. All right. So I think the point of how data science is practically impacting our lives is now clear to everybody. Now let's understand the different steps in the data science process. First is the problem definition. We need to understand what kind of problem we are looking at. You should be clear about the definition: what is the project, who are the people involved in it? Once we are clear on why we want to analyze the data, we move on to data collection. This is very important, because data now comes in different forms. There is a variety of data, structured and unstructured, coming from different sources, so integration of the data is also important. And the third and most important thing is the integrity of the data, or rather, not the integrity, the authenticity of the data. The data needs to be reliable. If you are working on fake data, will you ever get correct results? You will never, ever get correct results. Getting my point? Right. So we have problem definition, data collection, and then data cleaning and exploration. Got it? After data collection is where we begin with data science in this course: practically, we will work on data cleaning and exploration, then move on to feature engineering, where categorical data needs to be converted into numerical data, and data binning and feature scaling are important. This course covers up to this point. Model building and training are part of the machine learning and deep learning courses. Okay.
So model building and training are part of the machine learning and deep learning courses. Then we go ahead with model evaluation and final deployment. Clear? I hope you are understanding that this is what we are trying to cover in this data science journey of ours.

Yes, the difference between data cleaning and feature engineering. Data cleaning is the cleaning of missing values, null values and duplicate values. Feature engineering is transforming the data so that it can be fed into the model. Cleaning and exploration deals only with null values, missing values and duplicate values, whereas feature engineering, as I told you, involves scaling, binning, winsorization and encoding. These are the concepts that we will understand in feature engineering. So now are we clear?

The first step is defining the goal or the question to be addressed through the data analysis, which forms the foundation for the subsequent steps. Data collection is gathering the relevant data sets or information sources necessary to address the defined problem. In data cleaning and exploration we pre-process the data by handling missing values, outliers and other inconsistencies, and explore the data to gain insights. Feature engineering is transforming the features of the data set to improve the model's performance. For model building and training, we need to understand the different algorithms so that we can feed the data into them, and ultimately we evaluate, optimize and fine-tune the model for peak performance.

Python is a preferred language for data science because it's a high-level, readable, interpreted language, and moreover it supports multiple packages such as NumPy and Pandas for data cleaning, exploration and visualization. For visualization we will be covering Matplotlib, Seaborn and Plotly in detail. Got it?
So that is why this course is a little advanced compared with basic Python: we will start straight away from libraries such as NumPy, Pandas, etc. Got it? As I've been telling you, what is the biggest advantage of Python? It's an open-source, interpreted, high-level language that supports object-oriented programming. It offers ease of use, simple syntax, scalability, a wide variety of libraries, compatibility with all major operating systems, and new data science libraries created daily by a vast number of online communities, because it is an open-source language. So it has a wide community, powerful visualization, etc. Clear?

The different packages that we are going to cover in this course start with NumPy. It is a Python library for scientific computing. It supports large multi-dimensional arrays and matrices and includes a comprehensive mathematical library. Then we will move on to Pandas, which offers efficient storage and manipulation of structured data. After Pandas, we'll move on to SciPy. It's scientific Python, an open-source library built on top of NumPy, which is used for implementing scientific formulas. statsmodels is another library, which is used for estimating many different statistical models and conducting statistical data exploration. We have understood that statistics is going to be an important part of data science, so there are certain libraries which help in doing the statistics part of it.
Then we have scikit-learn, which is a widely used open-source machine learning library for Python, also known for its simplicity. We will not cover scikit-learn in much detail because it is mostly used for machine learning. For deep learning, PyTorch and Keras are the frameworks or libraries that are mostly used. But we will cover in detail the data visualization library Matplotlib, which is used for building static, animated and interactive visualizations with graphs such as line plots, scatter plots, bar charts, histograms, pie charts, etc. Got it, Lana? And Seaborn is a data visualization library in Python that is built on Matplotlib. Then we will also cover Plotly for creating interactive, publication-quality graphs and visualizations; it is suitable for web-based applications as well. Clear? Are we getting all the points till here? Any questions, any doubts till here, learners?

"Just a question: is it necessary to learn arrays? Where do we apply them?" Oh, we are going to learn a lot about arrays, Mani, and they have immense applications. All the unstructured data, audio, video, images, is stored in terms of arrays. So that's the first thing we'll start with. Nothing to worry about.

So let's do a quick revision of the different types of plots with examples. The first type of graph is a line plot. What does a graph consist of? A graph consists of an x-axis and a y-axis. Right, learners? And what does the line plot do? Its points are always connected by straight lines, and it is often used to visualize trends or the relationship between two variables over time. So which graphs show changes over time? It could be the weather report. It could be the stock market.
It could be the sales over a month, which help us to take investment decisions. Agreed, learners? And just to improve the appearance of the line plot, I can add markers to it at specific data points: where there is a dip in the sales, where there is a rise in the sales, a temperature, or any other quantity. It could even be your marks in a subject, the result of any college, school or university. So a marker plot displays data points with markers, useful for scatter plots and for visualizing individual data observations. A marker plot is used to display individual data points, such as marking specific locations on a map for a survey. Great.

Then we also have the scatter plot. A scatter plot is a collection of points plotted on two axes, horizontal as well as vertical. Here both the x-axis and the y-axis hold numerical data. For example, I may want to find out the relationship between the height and weight of a person, or between the sales and the price of petrol. Where both quantities are numerical, we use the scatter plot. So a scatter plot analyzes the relationship between two variables, like comparing height and weight in a population. Clear? Are we understanding the different plots?

Another important graph is the area plot, which is also known as the stack plot, because the areas are built one on top of the other. An area plot represents data with shaded areas and is useful for showing cumulative totals or proportions over time. So an area plot visualizes cumulative data with changes over time, such as tracking total sales. For example, if I want to keep track of the sales of a company, this area represents the first quarter's sales. I think we are all familiar with these graphs. We all understand these graphs.
Next is a very common, popular graph known as the bar graph, which is used for categorical data. It uses rectangular bars, vertical or horizontal, to compare data across the categories on the other axis, usually the x-axis. So this is the data, 1, 2, 3, 4, 5; it could be the sales or the prices for comparison.

Then there are grid plots, where we divide the graph with horizontal and vertical lines, which assist chart viewers in determining what value an unlabelled data point represents. So grid plots help us read each data point in detail, and they also enable side-by-side comparison of multiple plots, enhancing visual analysis.

Next is the histogram. It gives the distribution of the data, and it is for numerical data. Please try to understand: this graph is used for numerical continuous data, whereas the bar plot is used for categorical data. That is the technical difference. They are not the same. All right, learners, please try to understand things more technically: bar charts are used for categorical data, whereas histograms are used for numerical continuous data, and they give the distribution of the data set by dividing the values into bins. What are bins? They divide the range into small groups, and the frequency of each bin is represented with a bar. Got it? So histograms visualize the distribution of numerical data like income levels or exam scores, and they help in finding the characteristics of the data and the underlying patterns, and in guiding the decision-making process.
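The idea of bins can be sketched numerically before any plotting. The following is a minimal illustration, assuming a made-up list of exam scores and bins of width 10 (neither comes from the lecture); it uses np.histogram to count how many values fall into each bin, which is the same count a histogram's bars would show.

```python
import numpy as np

# Hypothetical exam scores, invented purely for illustration.
scores = np.array([35, 42, 48, 55, 58, 61, 64, 67, 70, 72, 75, 78, 81, 88, 95])

# Bin edges 30, 40, ..., 100: each bin covers a range of 10 marks.
bin_edges = np.arange(30, 101, 10)

# counts[i] is the number of scores that fall into bin i.
counts, edges = np.histogram(scores, bins=bin_edges)

# Print a rough text histogram: one '#' per score in the bin.
for left, right, count in zip(edges[:-1].tolist(), edges[1:].tolist(), counts.tolist()):
    print(f"{left:>3}-{right:<3} | {'#' * count}")
```

A bar chart, by contrast, would have one bar per category (for example, one per subject), with no binning step at all.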
And last but not least, I think this is the graph which almost everybody uses: the pie chart, where the whole entity is taken as the pie and the parts of that entity are taken as fractions of it. We all understand this graph. It is a circular graph in which the components are plotted as segments of the pie, and the idea is that it shows proportions of the whole, like market shares or survey responses. Got it, learners?

So finally, a quick recap of what we have done in this lesson number two: data science involves the analysis and interpretation of data to generate actionable insights. Having understood the definition, we will now start working on the NumPy library, that is, Numerical Python, which is an open-source library predominantly used when working with arrays. Seaborn is a data visualization library in Python that is built on top of Matplotlib, and Python is a preferred programming language for data science projects across the industry.

So NumPy is the advanced numerical Python library available. And as we understand, Python is a completely open-source language, so it has an open-source community which helps in taking NumPy forward. I'll just send this link to you. Please check out this NumPy link. Since Python has an open-source community, all of this material is available. NumPy helps us to create powerful n-dimensional arrays. Numerical computing tools, open source, interoperable, performant, easy to use: all these are the practical benefits of the NumPy library. Here you can run the code as well, and NumPy is used with Pandas, statsmodels, signal processing, image processing, graphs and networks. It has enormous uses. Got it, Mani? I think that was your question. Sagnik, Sivagami, Laksh, are you all getting this point? So this library has lots of uses. Now let's start exploring it. Okay, so, fundamentals of NumPy.
This is the link. NumPy is a numerical Python package which is free, and it is an open-source library that is mostly used for mathematical operations in scientific and engineering applications. What are the other advantages? It is a Python library used for working with arrays. It consists of a multi-dimensional array object and a collection of functions for manipulating it. It performs mathematical and logical operations on arrays. So it's a very powerful library which helps us to perform different mathematical operations on arrays. Another important point to note is that the array object in NumPy is called the n-dimensional array, or ndarray. So NumPy is numeric Python: a package for computation and for creating homogeneous n-dimensional arrays.

Can you all tell me what the term homogeneous means? It means that the elements of an array are all of the same data type. We will understand it practically.

So what are the different properties of arrays? First, arrays are mutable. What does the term mutable mean in Python? That it can be changed or modified. Absolutely correct; let's say, by the users. Second, an array is homogeneous. What does that mean? The elements are all of the same type. Absolutely correct. Third, the elements can be accessed using their integer position, that is, indexing, just as we access the elements of a list or a tuple. What are the two types of indexing available in Python? Positive as well as negative indexing. Very good, Arun. Arrays mostly deal with numeric data. And last but not least, arrays have high performance in calculation.
That's the beauty of NumPy arrays. So now let's understand how arrays are different from lists. The first and foremost point is that a list can hold heterogeneous data. That means it can contain an integer value, a float value, a boolean value as well as a string. And please try to understand: a list stores pointers to the data locations. A NumPy array, on the other hand, stores the data directly, and that too in contiguous memory locations, one after the other. Therefore, accessing the elements is faster and easier. Is this point getting clear to everybody? And can anybody tell me, not the memory address, but in which memory do the data or the variables get stored? Excellent, Mani, you were very quick on that. Yes, it is the RAM, the volatile memory, which stores all the variables. Clear?

So how do we go about creating arrays? First and foremost, NumPy is not part of basic Python, so we need to import the library: import numpy as np. The second most important thing is that np.array is the name of the function that creates the array. So if I pass the values 0, 1, 2, 3, the array gets created. And how do I know whether it's an array or not? By using the type function I can see the type of a, and it belongs to the class numpy.ndarray. ndarray stands for n-dimensional array. Then come the different attributes of the array. a.ndim gives the number of dimensions of the array; here it is one. shape gives the size along each dimension, such as the number of rows and columns. This data has four elements along its single axis, so the shape is (4,). And the number of items is four, so the value of the length is also four. Are you all getting this code, or do I need to repeat it?
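The steps just described can be sketched as follows. This is a minimal reconstruction, assuming the same four values 0, 1, 2, 3 and the variable name a used in the explanation; the lecturer's actual notebook may differ.

```python
import numpy as np  # NumPy is not part of core Python, so it has to be imported

a = np.array([0, 1, 2, 3])  # build an array from a Python list

print(type(a))   # the class of the object: numpy.ndarray
print(a.ndim)    # number of dimensions
print(a.shape)   # size along each dimension, always a tuple
print(len(a))    # number of items along the first axis
```

For a one-dimensional array the shape tuple has a single entry, which is why it prints as (4,) rather than as a rows-and-columns pair.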
Everybody got this? How do we create arrays? We import the library, and then np.array is the function that creates the array. And these are the functions and attributes which give me the type, the dimension, the shape and the length of the array. Got it?

Now, do we understand the range function in Python? Yes: when I give range(40), by default it starts from zero and goes till 39. In the same way we have the np.arange function, and by using the shape attribute I am now changing these 40 elements into a two-dimensional array with five rows and eight columns. Clear? Getting my point? Can I change its shape to eight rows and five columns? Yes. Why? Because the number of elements is the same: 8 into 5 is 40. Can I change it to four rows and 10 columns, or vice versa? Yes. Can I change it to four rows and four columns? No, because that would need 16 elements, not 40. Can I change it to 2 into 2 into 10, that is, into three dimensions? Yes, because 2 into 2 into 10 is again 40. So that's the beauty of arrays.

So what do we mean by a one-dimensional array? A one-dimensional array consists of only one axis. Therefore the shape consists of a single value, four, since the array holds the four elements 0, 1, 2, 3. One-dimensional arrays are also known as vectors. Getting my point? A two-dimensional array consists of axis 0 and axis 1. Axis 0 refers to the rows; here there are two rows. Axis 1 refers to the columns; here there are three columns. Therefore, the shape becomes (2, 3). A two-dimensional array is known as a matrix. A 3D array you can think of as a loaf of sliced bread. When you buy bread, it comes in slices, so you can consider axis 0 as the slices of the bread. Here there are four slices, and each slice consists of three rows and two columns.
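The reshaping rules above can be checked with a short sketch. It assumes np.arange(40) as in the explanation; it uses reshape rather than assigning to the shape attribute, which gives the same result here, and the variable names are illustrative only.

```python
import numpy as np

a = np.arange(40).reshape(5, 8)   # 0..39 arranged as 5 rows x 8 columns

b = a.reshape(8, 5)               # allowed: 8 * 5 == 40
c = a.reshape(4, 10)              # allowed: 4 * 10 == 40
d = a.reshape(2, 2, 10)           # allowed: 2 * 2 * 10 == 40, now three dimensions

reshape_failed = False
try:
    a.reshape(4, 4)               # not allowed: 4 * 4 == 16, not 40
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)

# The sliced-bread picture: 4 slices (axis 0), each with 3 rows and 2 columns.
bread = np.arange(24).reshape(4, 3, 2)
print(bread.ndim, bread.shape)
```

The only constraint is that the product of the new shape equals the number of elements; the number of dimensions is free to change.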
Right? So for the 3D array, the axes and the shape are strongly connected with each other. Whatever the dimension is, that many axes you will have, and they define the shape of the array. Clear? Is this point getting clear to everybody?

And if we look at the different attributes of an array: shape gives me the size along each dimension, the number of rows and columns. ndim gives me the number of dimensions of the array. dtype gives me the data type of the individual array elements, for example int32. The 32 is the number of bits used to store each element. And if we divide it by 8... why 8? What is one byte equal to? So nobody knows in this batch? 8 bits. So when I divide these 32 bits by 8 bits, I get the item size, which is four bytes. Got it? Now, are you understanding all the attributes, learners?

So have we understood the advantages of NumPy? It provides an array object that is faster than the traditional Python list. It provides supporting functions. Arrays are frequently used in data science, and unlike lists they are stored in one contiguous place in memory. We are understanding all these points.

So now you can run this first code. Whatever code you feel is missing, you can ask me; I can give you the code or you can type it in. First try to find out the version of your NumPy. Run this code and tell me the version quickly. Sivagami, Sagnik, Tamil, Daad, tell me the version of NumPy. None of you are able to run the code? Good, Mani. Mani has 2.1.3. Kamill has 1.24.4, an even older version than mine. 2.3.5, Arun; that's pretty recent. It's a double underscore. Yes, it's a double underscore. Okay. So everybody is able to run the code, and nobody is getting any error? Great. Okay.
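A short sketch of the version check and the bits-to-bytes arithmetic described above. The dtype is set explicitly to int32 here so that the numbers match the explanation on every machine; that explicit dtype argument is an addition for illustration, not something shown in the lecture.

```python
import numpy as np

# The version string lives in a double-underscore attribute.
print(np.__version__)

# Force 32-bit integers so the result does not depend on the platform default.
a = np.array([1, 2, 3], dtype=np.int32)

bits_per_element = a.dtype.itemsize * 8   # bytes -> bits
print(a.dtype)            # int32
print(bits_per_element)   # 32 bits per element
print(a.itemsize)         # 32 / 8 = 4 bytes per element

# Floating-point values default to float64: 64 / 8 = 8 bytes per element.
b = np.array([1.0, 2.0])
print(b.dtype, b.itemsize)
```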
So, what is the first thing that we are supposed to do? We are supposed to first import the NumPy library. Do we see the code? Then we are printing this array. The type it belongs to is class numpy.ndarray, the dimension is one, the shape is (3,), the length is three, and the data type is integer for all the elements. Are you able to run this code, everybody?

And what if I create an array from a string? Come on. This code is not there in your file, right? Copy it. So if I create an array from a name, this also belongs to numpy.ndarray, and the dimension is zero. Why? Because there is only one word associated with it. So the dimension is zero, the shape is empty, that is (), and the data type is Unicode 11, written <U11: the string is stored as a Unicode string of length 11. Learners, are you there with me?

Now let me remove this and create an array with 1, 2 and one float value; all the others are integers. What do you see in the output? It turns the integers into floating point. Why? Because arrays are a homogeneous data structure. They contain a single data type. So all the elements now become float64. Got it, learners? And the moment I add a string to it, you see the output: all the elements have now become strings, and the data type is Unicode 32, <U32. The elements here are all stored as Unicode strings. Got it?

And now let's start creating NumPy arrays and multi-dimensional arrays. An array with zero dimensions, that is, with only one value, is known as a scalar. Please try to understand this highlighted code. Any single value, whether I put it in as an integer or a floating-point value, gives a zero-dimensional array, which is known as a scalar. Then comes the one-dimensional array with one axis. "Can you go back to the data type?" Which data type?
"The previous one." Yeah. What is the confusion? The data type is Unicode 32. And you might say, ma'am, here it is Unicode 11, right? It automatically decides the width. "It seems it's becoming a list instead of an array." No, no, this is an array only. But here it's a scalar value. Got it? And here the list is converted to an array. Now clear? Are lists and tuples homogeneous or heterogeneous? Heterogeneous, but arrays are homogeneous. Okay, that's also one big point of difference.

"I have a question. Does one data type supersede the other?" Yes. The floats are superseding the integers, and the strings are superseding the numeric values. Do you see this? So if I define one of them as a string, all of them become strings. Yes, good question, Mani. But the irony is that, being a numerical Python library, it gives more preference to the string than to the numerical value. Isn't it ironic? Since it is a numerical Python library, you would expect it to favour the numerical value. The reason is that every number can be written as a string, but not every string can be converted to a number, so the string type is the only one that can hold all the values. That's the analysis which comes up. Clear now? Is your issue resolved? Yeah.

So here we have the scalar value. Then we have the one-dimensional array, which is known as a vector. A two-dimensional array is known as a matrix. Now a quick trick for telling the dimensions apart: if you have double square brackets, the array is two-dimensional, and if you have triple square brackets, it's three-dimensional. One pair of brackets means one-dimensional, and if there are no brackets, it is zero-dimensional. Are you understanding this quick tip and trick, everybody? Are you able to see the output? If this is the output, it's a zero-dimensional array. This is one-dimensional.
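The type coercion and the bracket-counting trick discussed above can be sketched as follows. The sample values and variable names are assumptions chosen for illustration; the exact string width NumPy reports (such as <U32) depends on the values involved, so the sketch only checks that the result is a Unicode string type.

```python
import numpy as np

# Homogeneity: NumPy picks one dtype that can hold every element.
mixed = np.array([1, 2, 3.5])         # the integers are stored as floats
with_str = np.array([1, 2.5, "a"])    # every element is stored as a string

print(mixed, mixed.dtype)
print(with_str, with_str.dtype)

# A single string gives a zero-dimensional array; width = number of characters.
word = np.array("hello")
print(word.ndim, word.shape, word.dtype)

# Counting the opening square brackets gives the number of dimensions.
scalar = np.array(42)                        # no brackets  -> 0-D (scalar)
vector = np.array([1, 2, 3])                 # one bracket  -> 1-D (vector)
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # two brackets -> 2-D (matrix)
print(scalar.ndim, vector.ndim, matrix.ndim)
```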
This is two rows and three columns. Now look at the three-dimensional array. Since it consists of two matrices, the first axis becomes two. Each matrix consists of two rows, so the next dimension is also two. And each matrix consists of three columns. Therefore, what is the shape of the three-dimensional array? (2, 2, 3). Got it? That is why the shape is (2, 2, 3): it consists of two matrices with two rows and three columns each. Sivagami, Soda, Tlapati, are you all able to do it? Odkumar, Deepak, Vad, Kamill.

So are you feeling that the basics are missing, Upadaya? At the beginner level, are you finding the code difficult? The learners who were saying that they don't know the basics of Python, are you able to cope with this code or not? Who is facing difficulty? Sagnik says he needs to learn the basics. Are you finding it difficult to interpret the code, Sagnik? My suggestion to you is that rather than struggling in this session, it's better to take up the Python refresher course. That will make the concepts of Python clearer, and then you can get back to this course. I would suggest that to the learners who are facing difficulty in understanding this code, because going forward the code is going to get more complicated with NumPy, Pandas and then visualization. So I request all those learners who don't know the basics of Python to go through the basics and then get back to the data science course. Or, Sagnik, you would have to put in more effort to understand the basics. Got it? Anybody else who's facing difficulty, who's finding this course tough, very tough, or irrelevant? A quick feedback on that. Sudhanya, Shivagami, Tulapati, Ud, Deepak.

"Would we learn the applications of arrays with examples?" Yes, we are learning arrays with examples.
Yes, not here in this course, but as we move to linear algebra and when we get to deep learning, arrays will of course come into the picture. In fact, tensors are all n-dimensional arrays.

Mani, it's not an error. It says that the memory has been exceeded. So that's the disadvantage: the list is unable to store such a huge number of elements. Have you understood this point practically?

Secondly, %time. This is a magic command which helps us to measure the time required to multiply these elements by two, 10 times. The loop runs 10 times, for i in range(10), and each time computes the array into two. When I run this, I get an output of 234 milliseconds. And if I want to see the same output for the list, what time does it take? You see, it takes much longer than the array was taking. It's still not giving me the output. Let's see. So it takes 13.2 seconds. Do you see the huge difference? Yes, learners, are you able to see the difference? So which one is better, arrays or lists? Arrays. Arrays are much faster in computation. That is why arrays, this library of homogeneous data, have been built. Clear? Everybody has got this point, learners.

Now let's move on to our next file, that is 3.02A. What we are saying here is that when the same operation is performed on arrays and on lists, lists take a longer time than arrays. So arrays are definitely much faster and more efficient than lists. Now clear? So what makes arrays faster than lists? You tell me, Mani. I told you the reason. How are arrays stored in memory and how are lists stored in memory? It is the memory allocation part: NumPy arrays are stored in one contiguous place in memory, whereas a list stores pointers to the data. Now got it?
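The timing comparison can be reproduced outside a notebook with a plain timer. This is a sketch under stated assumptions: it uses time.perf_counter instead of the %time magic (which only works in IPython or Jupyter), a list comprehension for the list version, and a smaller element count than the lecture's, so the absolute numbers will differ from the 234 milliseconds and 13.2 seconds quoted above.

```python
import time
import numpy as np

n = 200_000                       # assumed size, kept small so the sketch runs quickly
values_list = list(range(n))
values_array = np.arange(n)

# Multiply every element by two, 10 times, using a Python list.
start = time.perf_counter()
for _ in range(10):
    doubled_list = [x * 2 for x in values_list]
list_seconds = time.perf_counter() - start

# The same work with a NumPy array: one vectorized expression per pass.
start = time.perf_counter()
for _ in range(10):
    doubled_array = values_array * 2
array_seconds = time.perf_counter() - start

print(f"list : {list_seconds:.4f} s")
print(f"array: {array_seconds:.4f} s")

# The array's elements sit in one contiguous block of memory.
print(values_array.flags["C_CONTIGUOUS"])
```

On typical machines the array version is considerably faster, but the exact ratio depends on the hardware and the NumPy build.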
Is this point getting clear to everybody? Now I want everybody to be ready with this 3.02. So what are the different attributes of a NumPy array? The first is ndim, which gives the number of dimensions of the array; then the shape, the size, the data type, as well as the item size of the array. So, have you been able to load my file, everybody? What is the first thing required to create NumPy arrays? We import numpy as np. Yes, the library.

Can you tell me the dimension of this array? How many square brackets are there? Two. So it has two dimensions; the dimension is two. The shape is (2, 3) because it consists of two rows and three columns. Size: it consists of six elements. The data type is an integer type. The item size is the number of bits divided by 8, and array.data gives me the memory location. Look at the output. Yes, learners, are you there with me? This is the array. This is the type. This is the dimension, the shape, the size. The array stores int32, and 32 divided by 8 gives the size of one array element in bytes, so each element takes four bytes of memory. And the array's data is at this memory location. Now, good to go?

So now this is again my two-dimensional array. "The size of one array element is 8 for me." Yes, Mani. Then the data type must be int64 for you. 64 divided by 8 is 8, so 8 bytes. Got it? The default integer type depends on your system, that is, on your platform and NumPy version. Okay, Mani? See, it's system dependent. Okay, learners.

So, moving ahead. The dimension is completely based on the axes. What does axis 0 represent in a matrix, or in a two-dimensional array? Tell me quickly. No, it represents the columns. Oh, sorry, the rows. Axis 0 represents the rows. Okay, Mani? Axis 0 represents the rows and axis 1 represents the columns. Very good.
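The attributes listed above can be printed for a small two-row, three-column array. This is a minimal sketch; the array values and the name arr are illustrative, and the dtype is left at the platform default on purpose, so the item size may print as 4 or 8 depending on the system.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # two square brackets -> two dimensions

print(arr.ndim)       # number of dimensions (axes)
print(arr.shape)      # (rows, columns)
print(arr.size)       # total number of elements = rows * columns
print(arr.dtype)      # default integer type, platform dependent
print(arr.itemsize)   # bytes per element = bits / 8
print(arr.data)       # a memoryview onto the array's raw buffer

# Axis 0 runs along the rows, axis 1 along the columns.
n_rows, n_cols = arr.shape
print(n_rows, n_cols)
```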
Got it, learners? shape gives me the size of the array along each axis. The output data type is always a tuple. So what is the shape? This one consists of two rows and three columns. This one consists of three rows and two columns. Got it? When I talk about the size, it is the total number of elements in the array. It is equal to the product of the shape: 2 into 3 is 6. So here, what is the shape? (3, 4). If you look at this particular matrix, it consists of three rows and four columns. So what will the size be? 3 into 4, that is 12. Got it? Yes, learners, are you getting this?

Moving on to the data type, it shows the data type of the elements in the array: numpy int32, numpy float64, numpy int16 and numpy int64 as well. The default depends on the platform; it determines how many bits are used to represent each element. Got it? Is this point getting clear to everybody?

And of course the item size. itemsize shows the length of one array element in bytes. For float64, 64 divided by 8, because one byte is equal to 8 bits, equals 8 bytes. For a 32-bit type such as int32 or float32, 32 divided by 8 equals four bytes. And the array's data attribute offers direct access to the raw memory of the NumPy array. Got it? Clear, learners? Any questions, any doubts till here? Dulabati, Sagnik, Kamill, Sivagami, Nancy, Badya, Dika. I hope these PPTs are helping you understand the concepts better. The PPTs that I prepared for you all, are they useful, Mani? Yeah.

So now let's understand some basic functions of arrays. The first function is transpose. What does transpose do? It interchanges the rows and columns. No, they are not on the LMS yet. I will be uploading them on the LMS after today's session. So nothing to worry about. There's nothing to lose. So transpose interchanges the rows and columns.
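A minimal sketch of transpose, assuming a small two-row, three-column array with illustrative values. Both spellings shown, the T attribute and the transpose method, are standard NumPy and give the same result.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3): two rows, three columns

t = a.T                        # same as a.transpose()

print(a.shape)                 # (2, 3)
print(t.shape)                 # (3, 2): rows and columns interchanged
print(t)                       # the first row of t is the first column of a
```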
So if you look at the shape of this array, it is (2, 3), and when I use transpose it becomes (3, 2). Clear?

Another function is flatten. It converts any n-dimensional array to a one-dimensional array. There are two ways to perform this flatten operation. One is row-major order, for which we can pass the parameter order equal to C. What does row-major mean? First the elements of the first row are flattened and then the elements of the second row, like 1, 2 and then 3, 4. The other is column-major order: first the elements of the first column and then the elements of the second column are flattened, using the parameter order equal to F. Getting my point? Is this point getting clear to everybody? Again, when I say a.flatten(), by default it is row-major, that is, order equal to C: first the elements of the first row, then the second row, and so on. But if I explicitly pass order equal to F, then the elements are flattened column by column. "There is no order argument in the code for row-major?" Yes, by default it is row-major only. If you want to pass the parameter, you can pass order equal to C; otherwise there is no need. b equal to a.flatten() is row-major by default. Yes.

And then we have the reshape function, where you can reshape the data from lower dimensions to higher dimensions and from higher dimensions to lower dimensions. No restriction on that. So if you look at the shape of this data, how many rows are there? Six rows and one column. (6, 1) can be reshaped to (2, 3) and can also be reshaped to (3, 2). Agreed? Similarly, if the original shape is (3, 4), it can be reshaped to (6, 2), (2, 6), (4, 3), (1, 12) or (12, 1). See, the product has to remain the same. But I cannot reshape it to just 4, right? I cannot do that, because the number of elements would no longer match. But can I reshape to (2, 3, 2)? Yes, because 2 into 3 into 2 is again 12.

Right. Now I want to tell you about one major difference.
the difference between copy and view. Please look here, learners, and try to understand the code. Right? We have created the array a over here, and I have created a copy of a and assigned it to x. Now if I make changes in my original array, will there be changes in my x? No. Why? Because x points to one memory location and a points to a different memory location. So if I make changes in a, they do not get reflected in x. Getting my point? But if I create a view of this b array, whatever changes I make in b are reflected in my x. Why? Because b and x are now pointing to the same memory location. Getting my point? Is this point clear to everybody? So now let's understand these concepts practically. So getting back to 3.02, right? First is the reshape function, which helps us to give a new shape to the current elements of the array, provided the new shape holds the same number of elements. So if I have created this one-dimensional array of 12 elements, this is how I get the output. Learners, are you there with me? Yes, learners, are you there with me? Everybody is able to run the code? Now can I change this one-dimensional array to (12, 1)? Is it possible? Can I change it to (12, 1)? Yes. Can I reshape it to (3, 4)? Yes. Can I reshape it to (2, 6)? Very good. And can I reshape it to three dimensions with (3, 2, 2)? Three means the number of slices or matrices, the first two represents the rows and the other two represents the columns. So these are my three matrices with two rows and two columns. Got it? Can I reshape it to further dimensions, that is (2, 3, 2, 1, 1)? So with reshape you can change from lower dimensions to higher dimensions and from higher dimensions to lower dimensions. Agreed? Yes. Learners, are you there with me? Great. And if I give reshape(-1), it will automatically flatten it back to one dimension. Okay.
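The copy versus view difference described above can be sketched as follows. The variable names follow the spoken explanation loosely and the values are assumptions; `np.shares_memory` is used only to confirm whether two arrays use the same data.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
x = a.copy()          # independent data buffer
a[0] = 100            # change the original; x is unaffected

b = np.array([1, 2, 3, 4])
v = b.view()          # new array object looking at the same data
b[0] = 100            # change the original; v sees the change

print(x, np.shares_memory(a, x))
print(v, np.shares_memory(b, v))

# reshape(-1) lets NumPy work out the single remaining dimension,
# which brings a 3-D array back to one dimension.
c = np.arange(12).reshape(3, 2, 2)
flat = c.reshape(-1)
print(flat.shape)
```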
Now coming on to the flatten function. These are the arguments that can be passed for order. The default is C, which means the row-major style as we have seen. F stands for column-major order. The names come from the memory layouts of the C and Fortran languages. A and K are used much less often. A means flatten the array elements in column-major order if the array is Fortran contiguous in memory, and in row-major order otherwise. K means flatten the array elements in the order in which they are laid out in memory. So as we have seen, a is my array and b is a.flatten(). By default it will do it in row-major order, and this flatten with order equal to F is going to be column-wise. So Kamill, now you want a break? We are just about to end the session in the next 40, 50 minutes. Kamill is asking for a break. Learners, do you want a break or no? Maybe a 5 minute break is helpful. I can give you that. Okay. So let's get back to the discussion. So this is my flatten. By default, we understand, it is in row-major order: first row and then the second row. And when I give order F, it is in column-major order. This is clear. But the important point to be noted over here is that b is a copy. That means it is a new array created in memory. So what we need to understand is that flatten creates a copy of the array, not a view of the array. What does that mean? That if you change elements in array b, the elements in array a are not changed. Clear? Is the difference clear? So flatten creates a copy. And how can we check? If I assign 10 into b, the change is reflected in b but not in a. Clear? Are you all there with me? Everybody is getting this point? Yes, learners. Great. Right, now back to the flatten function. It returns a copy of the array flattened into a one-dimensional array. So flatten will always convert any n-dimensional array to a one-dimensional array. So this is a three-dimensional array.
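The point that flatten returns a copy can be checked directly. This is a minimal sketch with assumed values; the assignment into the first element of `b` stands in for the change made in the session.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = a.flatten()       # b is a new one-dimensional array with its own data

b[0] = 10             # modify the flattened copy only

print(a)              # a keeps its original values
print(b)
print(np.shares_memory(a, b))
```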
And when I do array.flatten(order='F'), what does order F mean? Column-major order. So first 1 and then 7, then 2, sorry, then 4 and then 10, then 2 and then 8, then 5 and 11. Getting my point? Is this point clear to everybody? Everybody is able to run the code, right? So this is how we get this three-dimensional array flattened to 1-D. If I do it by default, it is going to give me 1, 2, 3, 4, 5 and so on. And when I run the transpose, transpose interchanges the rows and columns. So if this is my two-dimensional array with two rows and three columns, it gets changed into three rows and two columns. Got it? Clear, learners? So if this is my array, now I reshape it into a three-dimensional array with four slices or four matrices, three rows and one column. Yeah, I can specify the axes of the transpose. Yes, I can. Let's see the first example. If you look over here, this is my first array, and when I take the transpose of it, the (2, 3) changes to (3, 2). This point is clear. Okay. Now look at another example here. What am I doing here? I am creating a one-dimensional array and reshaping it into a three-dimensional one with the shape four matrices, three rows and one column. That point is also clear. I've created a three-dimensional array. Now when I run the transpose of it, it changes the (4, 3, 1). Now do you see the transpose? This has got interchanged with this, and this has got interchanged. So now I have one matrix with three rows and four columns. Got it? Mani, Devika, Kamill, Odla. And as you can see, every file has an assisted practice associated with it. Please look here, learners. Please try this assisted practice. Like over here, this is the temperature list given, and once we convert it into an array you can find out its dimension, shape, size, the data type and the item size. So with my files, Hatsky, the advantage is you get the solutions of these assisted practices also.
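The spoken sequence 1, 7, 4, 10, 2, 8, 5, 11 above is consistent with a 3-D array holding 1 to 12 in shape (2, 2, 3); that shape is an assumption inferred from the numbers, not taken from the instructor's file. Under that assumption, the flatten and transpose behaviour can be sketched like this:

```python
import numpy as np

# Assumed array: values 1..12 arranged as 2 matrices, 2 rows, 3 columns.
a = np.arange(1, 13).reshape(2, 2, 3)

default_flat = a.flatten()            # row-major (order="C")
column_flat = a.flatten(order="F")    # column-major: first index varies fastest

# Transposing a 3-D array reverses the order of its axes.
b = np.arange(12).reshape(4, 3, 1)
reversed_axes = b.T                           # shape (1, 3, 4)
custom_axes = np.transpose(b, (1, 0, 2))      # explicit axis order -> (3, 4, 1)

print(default_flat)
print(column_flat)
print(reversed_axes.shape, custom_axes.shape)
```

Passing an axes tuple to `np.transpose` is one way to specify which dimensions get swapped, as mentioned in the session.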
Okay. So, will you all be able to do the assisted practice? Please do that so that you can understand it in a much better way. Dura is ready. Hatsky is ready. Quick, quick response, learners. Quick, quick. Karthik, Keruba, Laksha, Madhuberta, Darun, Ud, Sudana. Okay, great. Now look over here. What are we doing? This was the question that I asked you in the MCQ. You remember this question? So what were we doing? list1 is equal to [1, 2, 3], and when I do asterisk 2, it is not multiplying each element, it is repeating the list. Agreed? Learners, are you getting this point? But if I want to multiply each of the elements in the list by two, how do I go about it? I have to explicitly run a for loop. Yes, absolutely correct, Mani. I have to explicitly run the for loop. So what am I doing over here? I have first created an empty list. Then for i in list1, I keep on appending each term multiplied by two into list2, and that's how I get the elements 2, 4 and 6. Got it? Is this point clear to everybody? Ready? The other way is this. Do we understand this kind of notation? Anybody who understands this notation, what is this concept known as? Very good, Mani. Only Mani understands in this batch, nobody else? This is known as list comprehension. It saves a lot of code, right? I don't need to define another empty list and append to it. For list3, I write i * 2 for every element i in the list, and just by writing this one single line I am able to multiply each of the elements. Now clear? Right. So now do you see the difference? If I have to multiply or perform any arithmetic operation on a list, I have to explicitly run the for loop. But look at the beauty of the array.
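The list versus array comparison above can be written out as a short sketch. The names `list1`, `list2` and `list3` follow the spoken description; `arr` is a name introduced here for illustration.

```python
import numpy as np

list1 = [1, 2, 3]

repeated = list1 * 2               # repeats the list, does not multiply

list2 = []                         # explicit for loop
for i in list1:
    list2.append(i * 2)

list3 = [i * 2 for i in list1]     # list comprehension, same result in one line

arr = np.array(list1)
doubled = arr * 2                  # NumPy multiplies element by element

print(repeated, list2, list3, doubled)
```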
When I just write a * 2, without any for loop, I directly get the output, or if I'm doing addition of two arrays I directly get the output. So isn't it giving me fast mathematical results? Right? And how and why is it giving me this? Because it works on two main concepts. This is a very, very important point of the NumPy library that we are about to understand. Arithmetic calculation in NumPy runs on two main concepts. Please try to understand, learners. The first is broadcasting, that is, making the arrays of the same shape. Broadcasting is the process that makes the arrays of the same shape. And the second concept is vectorization, that is, NumPy running the for loop for us implicitly for the element-by-element operation. Got it? Broadcasting is making the arrays of the same shape, and vectorization is the implicit for loop for the element-by-element operation. Getting my point? Now again, to make the concept clear: if it was a list, we have to explicitly run the for loop where each element gets multiplied, and then we get the output. But in arrays, how do we go about it? I'm sorry for the pen. Again, I don't know why, on this slide it always gives me this issue. I've seen that. Anyway, so for arrays, how does it work? First, the broadcasting would happen, that is, the arrays will become of the same shape, right? And then automatically, at the back end, element-by-element multiplication will happen to get the output. Clear? Is this point clear to everybody? To understand it even better, let's take this example. I want everybody to concentrate here. What is the shape of this array? Shape, I'm not asking the dimension. Hatsky, it's (4, 3). It consists of four rows and three columns. What is the shape of the second array? Mani, Hatsky, what is the shape of the second array? Again it is (4, 3). Why (5, 3), Sagnik? 1, 2, 3, 4. How come five? Right. So both the arrays are of the same shape. Yes.
Both the arrays are of the same shape. That means no broadcasting is required, the element-by-element operation is done, and then we get the output. So it is not necessary that broadcasting will happen every time, but definitely yes, vectorization, the implicit for loop, will run to perform the mathematical operations. Okay. Now look at the second example over here. What is the shape of this array? It's (4, 3), and the other one is (1, 3). Are they of the same shape? No. But the columns are matching. Do you see the columns are matching? And can I expand this one row to four rows? Yes, I can broadcast one row to four rows. And therefore now the two arrays become of the same shape, and the vectorized addition happens. Clear? Give me a minute. Something is slow. My net is working slow. So this one is four rows and one column. And what is the other shape? One and three. Now you would say, ma'am, nothing is matching. Please try to understand: on each axis, either the two sizes have to match or one of them has to be one. In this case it has four rows and one column, and the other has one row. So one of them is one. Can I expand it to four rows? Yes. Again, over here one has one column and the other has three columns. Can I expand it to three columns? Yes. So broadcasting happens in both the arrays to make them of the same shape and size, and therefore the arithmetic operations happen. Clear? So the important point to be noted over here is that it is not necessary that broadcasting will happen, and it is not necessary that it will happen for both the arrays. It all depends on the dimensions. Now clear? Let's understand it with more examples. Please look here, learners. Okay. So first of all we have np.arange with elements zero to two. Strictly it is one-dimensional with shape (3,), but think of it as one row and three columns. This other one is a scalar value. So it gets expanded to one row and three columns, and then the vectorized element-by-element addition happens. Clear? Are you all getting this point?
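The three shape combinations discussed above can be tried out in a short sketch. The values are assumptions chosen for illustration, and `np.broadcast_to` is used only to make the stretched array visible; NumPy does not need it to be called before adding.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)            # shape (4, 3)
row = np.array([[10, 20, 30]])             # shape (1, 3)
col = np.array([[0], [100], [200], [300]]) # shape (4, 1)

same_shape = a + a                         # (4, 3) + (4, 3): no broadcasting needed
stretched = np.broadcast_to(row, (4, 3))   # what the (1, 3) row looks like after stretching
row_sum = a + row                          # (4, 3) + (1, 3) -> (4, 3)
both = col + row                           # (4, 1) + (1, 3): both are stretched -> (4, 3)

print(stretched)
print(row_sum)
print(both)
```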
How one-dimensional and scalar values are getting added? Then np.ones with (3, 3). That means we have created a matrix of the shape (3, 3); np.ones creates all the values as ones. And what is np.arange? This is treated as (1, 3). Now, three and three are matching. If this would have been four, it would not broadcast, it would give you an error. Since three and three are matching and this one can be expanded to three, both the arrays become of the same shape and we get the output. And over here we have three rows and one column. Now do you see this? This gets expanded to three columns and this gets expanded to three rows to make them of the same shape, and therefore vectorization happens. Have you got this? This is the most important concept of the NumPy library: why are arithmetic calculations fast? Because of the concepts of broadcasting and vectorization. Clear, learners? This one? This one. Yeah, np.ones means that we are creating an array of three rows and three columns with all values equal to 1. That point is clear. np.arange means that it will start from zero and go till two, and the shape is treated as (1, 3). Now since the columns are matching over here, can I expand this one row to three rows? Yes. So it gets expanded to three rows. So now both the shapes are (3, 3) and I get the output. Better, Mani? Any doubt? There we come. But where will broadcasting not happen? Where the columns do not match and neither of them is one. Suppose one is (1, 3) and the other one is (1, 4), right? Since three and four are not matching, the broadcasting will not happen. (4, 3), the prior one? This one. This one. Yeah. What is the issue? Tell me. The concept is that if the arrays are not of the same shape, then broadcasting will be attempted. Broadcasting will only happen if, on each axis, one of the sizes is one or the two sizes are matching. A size has to be one to be broadcast. Like this is one, it gets broadcast. This is one row.
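As a sketch of the np.ones and np.arange examples above, including the failing (1, 3) with (1, 4) case. The exact arrays on the slide are not shown in the transcript, so these are assumed stand-ins.

```python
import numpy as np

ones = np.ones((3, 3))                 # 3 x 3 matrix of 1.0
r = np.arange(3)                       # [0 1 2], shape (3,), treated as one row
added = ones + r                       # each row becomes [1. 2. 3.]

column = np.arange(3).reshape(3, 1)    # shape (3, 1)
grid = column + r                      # both sides stretched -> shape (3, 3)

try:
    np.ones((1, 3)) + np.ones((1, 4))  # 3 vs 4: neither is 1, so no broadcasting
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(added)
print(grid)
print(broadcast_failed)
```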
It gets broadcast. This is one column, it gets broadcast. This is (1, 3), so the one row gets broadcast. It happens if any one of them is one. With np.arange plus five, five is a scalar value, so five gets broadcast into (1, 3). Now clear? If you see, this is like (1, 1), so it broadcasts into (1, 3). Now clear, Hatsky, Mani? It is not necessary that broadcasting will always happen. If the two arrays are already of the same shape, then only vectorization happens. It is the vectorization, the implicit for loop, that will always run. Now better? So now if you look at this particular example, is broadcasting happening over here? Tell me, a getting multiplied by 2, or a + 2, is broadcasting happening over here? Yes or no? Yes, because it will make the scalar of the same shape, and then multiplication or addition will happen. So vectorization will definitely happen. Broadcasting depends on the shape and size of the arrays. Clear? So the different arithmetic operations that can be performed on NumPy arrays are np.add, using the plus symbol; np.subtract, using the minus symbol; np.negative, using the unary minus sign; np.multiply, using the asterisk; and np.divide. What is the difference between this symbol and this symbol? Tell me quickly. Single-slash division and double-slash division, also known as floor division. What is the difference? Nobody remembers? The single slash gives me the true quotient. What is 5 divided by 2? It is 2.5 by single-slash division. But if it is floor division, it will only give me 2. And how do I get the remainder? Using the mod operator, right? If I do 5 mod 2, the answer will be one, the modulus or the remainder. Clear? Is the difference between the operators clear? That's part of basic Python. So now I add an array with this shape and b. Now tell me, is broadcasting happening over here? So a one-dimensional array with a zero-dimensional array, right?
If I am adding a one-dimensional array with a zero-dimensional array, is broadcasting happening? This is a vector, this is a scalar. Yes. Is broadcasting happening? Quick response, learners. Yes or no? Yes, it will become 10, 10, 10 and then element-by-element addition. Agreed? You can use np.add or the plus sign also. Let's understand subtraction. Yes, we are using np.subtract or a minus b. Is broadcasting happening over here? What is the shape of these two? This is (2, 3). This is (2, 3). Is broadcasting happening over here? No, no need. But is element-by-element vectorization happening? 30 - 10, 40 - 20, 60 - 30. Yes, good, Hatsky. Good. And see, Kamill. Now what about here? Is broadcasting happening for a and b in multiplication? No, because again they are of the same shape. Vectorization, definitely yes. What about division? Are a and b of the same shape and size? Is broadcasting happening over here? Is it no or yes? It is yes, because this is two-dimensional and this is one-dimensional. So it will repeat it. So broadcasting is happening and then we get the output. Clear, Hatsky? What is the shape of this? This is (2, 3) and this is treated as (1, 3). So the one row will get repeated, and the columns are matching. Got it? All right. So the power of a and b. Again no broadcasting, same shape. So it means a to the power b: 2 to the power 2 is 4, 2 to the power 3 is 8, 2 to the power 6 is 64. Right? And why is it known as NumPy? Now can you tell me, after understanding the concepts, why is it known as NumPy? Can anybody tell me why it is known as NumPy, and why it is a homogeneous data structure? Why do we create it as homogeneous? See, basically we are going to deal with numerical data. We don't do a lot of arithmetic calculations on strings, right? So the data will be either of integer type or float type. That's important.
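The arithmetic operations walked through above can be sketched together. The subtraction and power values follow the numbers spoken in the session; the second row of `a` and `b` and the divisor array are assumptions added for illustration.

```python
import numpy as np

a = np.array([[30, 40, 60],
              [5, 7, 9]])
b = np.array([[10, 20, 30],
              [2, 2, 2]])

diff = np.subtract(a, b)       # same as a - b; same shape, so no broadcasting
true_div = a / b               # single slash: true division, float result
floor_div = a // b             # double slash: floor division
remainder = a % b              # modulus

scalar_sum = np.array([1, 2, 3]) + 10                 # scalar is broadcast
power = np.array([2, 2, 2]) ** np.array([2, 3, 6])    # element-wise power
row_div = a / np.array([10, 20, 30])                  # (2, 3) / (3,): row is repeated

print(diff, true_div, floor_div, remainder, sep="\n")
print(scalar_sum, power, row_div.shape)
```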
That's why it is homogeneous. It does numeric computation fast because of the two concepts, that is, vectorization and broadcasting. So this is the strong foundation whenever we talk about NumPy, right? Yeah, because it deals with arrays. We create arrays. What are the properties of that array? That is what you need to tell, Sudhanya. Saying arrays by itself is not enough. So when we create arrays, the properties that you need to understand are that they are mutable, that is, they can be changed; that the elements are of the same type; that they can be accessed using integer positions, right, indexing as we understand it, positive and negative indexing; that they generally deal with numeric data; and that they have high performance in calculation. And one of the basic differences between a list and an array is that a list is a heterogeneous type of data which stores pointers to the data locations. It stores pointers to the data locations so that accessing can be done. Whereas in a NumPy array all the data is stored one after the other in memory, and that makes accessing it much faster. And when we say broadcasting and vectorization, broadcasting is making both arrays of the same shape and size, and vectorization is that it implicitly runs the for loop at the back end. Getting my point? These are the points that you have to remember while answering questions in your interview. Simply saying that it deals with homogeneous arrays will not be impactful. Now coming on to the practical part of it. How do we create arrays? First and foremost we need to import the library numpy, and it is the np.array function which helps us to create the arrays. Right? We understand type: this belongs to the class numpy.ndarray. So a is an object of the class n-dimensional array. The different attributes of an array are ndim, which gives the number of dimensions of the array, and .shape, which gives the rows and columns. len gives the number of elements along the first axis of the array.
And as we had the range function in normal Python, we have the arange function for arrays in NumPy. It works the same way as range, so it will create elements from 0 to 39, and we can reshape it to (5, 8). And we are very clear that the parameters that we pass over here can only be factors of 40; when we multiply them, they should give 40. Can I change it to three dimensions also? Yes, (5, 8, 1) can also work. Clear? Are you all getting this point, learners? Yeah, today I'll be uploading this set, because we'll be completing this topic. Once we complete one topic, then only I upload the slides. Today I'll be doing that. Fine? Is it fine? Okay. Now the dimensions of the array. The dimension of the array is very strongly linked with the axes and with the shape. A one-dimensional array is also known as a vector. Wrong, Mani. Zero-dimensional arrays are known as scalars. So zero-dimensional arrays are known as scalar values. One-dimensional arrays are known as vectors. Two-dimensional arrays, with axes equal to zero and one, are known as matrices. And of course arrays with three and more than three dimensions are known as n-dimensional arrays. This point is clear to everybody, right? And what are the different attributes of an array? The shape. The shape gives me the number of rows and columns. ndim gives me the number of dimensions. .dtype gives me the data type, whose default depends on the platform, along with the number of bits required to store that data. itemsize gives me the size of one item in bytes, and size gives me the total number of elements. Clear? The attributes are also clear to everybody. Now to the functions. What does the transpose function do, learners? What does the transpose function do? It interchanges the rows to columns or columns to rows. Either way works.
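The arange, reshape and dimension vocabulary above can be sketched as follows. The scalar value 7 is an assumption used only to show a zero-dimensional array.

```python
import numpy as np

vector = np.arange(40)             # 0..39, one-dimensional (a vector)
matrix = vector.reshape(5, 8)      # 5 * 8 == 40, two-dimensional (a matrix)
cube = vector.reshape(5, 8, 1)     # 5 * 8 * 1 == 40, three-dimensional
scalar = np.array(7)               # zero-dimensional (a scalar)

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("cube", cube)]:
    print(name, arr.ndim, arr.shape, arr.size)

print(type(matrix))                # numpy.ndarray
print(len(matrix))                 # length of the first axis only
```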
Absolutely correct. What does the flatten function do? What does the flatten function do? It converts an n-dimensional array to one dimension. And reshape can reshape the array to any dimension, from lower to higher or higher to lower. Yes, different dimensions. So transpose is also clear. Flatten. Now flatten can happen in two ways. What does row-major order mean? When the order is equal to C, this is the default order. That means first the elements of the first row are flattened and then the next row. Or the other one, we can have order F, in which the elements of the first column are flattened and then the next column. Clear? And copy versus view: a copy is a different memory location, a view is the same memory location. Right? So copy is creating a new memory location, a new copy, and the original is not disturbed. Very good, learners. Good. And for reshape we have to understand that the new dimensions have to be factors of the total number of elements. So we can reshape the objects also. Clear? So this difference is also clear. Now when we move on to the arithmetic calculations that need to be performed on arrays, broadcasting is the first concept: first of all the arrays have to be of the same shape. Element-by-element calculations happen only once the arrays are of the same shape or have been broadcast to the same shape. Right? And it is not necessary that broadcasting always happens, because if the arrays are already of the same shape then only vectorization happens. That is the implicit running of the for loop for the element-by-element operation, so that each and every element can be calculated. Getting my point? So when we are talking about the looping logic over here, right, this is the explicit looping logic that we have to apply for the list.
But in the vectorization logic, first the two arrays will become the same shape, that is, the broadcasting happens, and then element-by-element multiplication happens to get the output. And this is the power which the NumPy library has, which makes its arithmetic performance faster. Getting my point? Again I'm repeating, this is the power which the NumPy library has, which makes it faster. So again, repeating those examples. Since these two arrays are of the same shape, (4, 3), no broadcasting happens, but definitely vectorization happens, because there is element-by-element addition of each element. Here this is (4, 3) and this is (1, 3). Please remember, learners, the columns are the same, and an axis can be broadcast only if its size is one. The rule is that on each axis the sizes have to be equal or one of them has to be one; then only the broadcasting happens. But in this case what happens? This is (4, 1) and this is (1, 3). So if you look at the rows, one of the sizes is one, so we will expand the rows. And for the columns, this size is one, so we can expand it to three columns, again making them of the same shape so that the vectorization logic can happen. Now clear, Mani, Sudhanya, Odla, Devika? I hope this revision is helping everybody. And if I talk about the np.arange function, right, it creates what we treat as a (1, 3) array for me, with one row and three columns, and I am adding a scalar value to it, right? So this is one row and three columns. So this is 0, 1, 2 and this is five, and therefore five gets added to each of the elements. Clear? Right. So broadcasting is happening over here. And when we look over here, we have this (3, 3) array and (1, 3). Since the columns are the same and one of the axes is one, it gets expanded to change the shape. Similarly over here, this is (3, 1) and (1, 3). They both can get expanded, in rows and columns, to perform arithmetic calculations. And if there is a mismatch, that is, the columns do not match: this is (4, 3) and this is (1, 4).
See, the one can get expanded, but if the columns are not matching then the broadcasting will not happen. Clear? And now we are supposed to understand indexing and slicing. So what we can do is, let me complete the theory part and then we move on to the practical file. Does it help, learners? Let's quickly do that. So there are two types of indexing in Python, positive and negative, and that indexing is supported in our arrays also. Right? So if we talk about a one-dimensional array, it's like a normal list or tuple. We understand positive indexing starts at zero from the left-hand side and moves from left to right. Similarly, negative indexing goes from right to left. So if I ask, what is the value of a at index two? Can anybody tell me what is the value of this array at index position two? Can anybody tell me? It is equal to three. We all understand. And if I give this as minus 2, what is the value? At minus 2 the value is five. Yeah, this is as simple as what we have been using in a tuple or list, right? When I talk about a two-dimensional array, here we have rows and columns. So if we are talking about positive indexing, for the rows we start from zero and go till two. And if I talk about the columns, it is 0, 1 and 2, right? So here, how do I access any element? Suppose I want to access five. This refers to the first row and the first column. If I talk about this one, we are talking about the zeroth row and the second column. Getting my point? Can we have a mixture of positive and negative indexing in arrays? Yes. Please be clear with this particular point. This point is not there in basic Python. Can we have a mixture of positive and negative indexing? Yes. For example, for the rows I've given positive indexing, 0, 1, 2, and for the columns negative, -1, -2. So I can access this as the zeroth row and the minus 2 column. Can they all be negative also? Yes, it can be -1, -2 and -3 also. Getting my point? Okay.
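A small sketch of the positive, negative and mixed indexing above. The 1-D values 1 to 6 and the 3 x 3 matrix 1 to 9 are assumptions that agree with the answers given in the session (index 2 gives 3, index -2 gives 5, row 1 and column 1 gives 5).

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
third = a[2]              # positive index, counted from the left starting at 0
second_last = a[-2]       # negative index, counted from the right starting at -1

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
centre = m[1, 1]          # row 1, column 1
top_right = m[0, 2]       # row 0, column 2
mixed = m[0, -2]          # positive row index with a negative column index
all_negative = m[-1, -3]  # last row, first column

print(third, second_last, centre, top_right, mixed, all_negative)
```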
So, another important point that you need to understand is that when we are talking about a two-dimensional array, the rows always come before the columns. Before the comma, we are talking about the row value, and after the comma we are talking about the column value. Clear? Now, let's see how well you can answer this question. Right? So this is the array. Before the comma I am talking about the zeroth row, and if you remember slicing, which columns am I talking about? The third and the fourth column, because the fifth is not included. Agreed? Learners, are you understanding my point? So if I talk about this, these are rows 0, 1, 2, 3, 4 and 5, and the columns again start at zero. So whenever we are talking about this two-dimensional array, if I'm talking about positive indexing, I would number my rows 0, 1, 2, 3, 4, 5. Agreed? And my columns as 0, 1, 2, 3, 4, 5. This point is clear for the given array? Okay, now look here. If I am asking about this indexing, before the comma this is my row, so I'm talking about the zeroth row. And after the comma, do you understand slicing, Sudhanya? This is going to start from the third column, right? 0, 1, 2, 3, start from the third column. Five is not going to be included, so the fourth is the last one included. So what are the intersecting values? I will get the output as 3, 4. Now clear, Sudhanya, Hatsky, Devika, Deepak, Arun, Navd? What does 4 colon mean? That means we are starting from the fourth and going till the last. That's for the rows as well as for the columns. We'll start from the fourth and go till the last. Right? So which are the intersecting elements between them? 28, 29, 34, 35. Now tell me what will happen in this case. Before the comma there is only a colon. What does this colon stand for? Only Mani knows the answer. What about the others? That we are talking about all the rows?
We are talking about all the rows, and after the comma about the second column. So we get the output as 2, 8, 14, 20, 26, 32. Agreed? Getting my point? Is this point clear now, Sudhanya? Yeah. Now who's going to tell me the last one? There are two colons over here. Who's going to explain the last one to me? What is this concept known as, Arun? The concept of a single colon is known as slicing, and the concept of a double colon is known as striding. Agreed? Right. So what happens over here? Before the comma I'm talking about the rows. So if these are my rows 0, 1, 2, 3, 4, 5, the starting position is two. I start from two and I will go till the end. The step parameter is also two, so it will jump two steps. Okay. So we are talking about the second and the fourth row. And if I talk about columns, 0, 1, 2, 3, 4, 5, it starts from the zeroth column and jumps two positions, so it's talking about the zeroth, second and fourth column. Clear? So what are the elements which are intersecting? 12, 14, 16 and then 24, 26, 28. Getting my point? Now moving to the three-dimensional array. Learners, are you there with me? Three-dimensional arrays are specified with the help of three axes: axis equal to 0, axis equal to 1 and axis equal to 2. Right? So the first axis or index helps us to select the matrix. Please try to understand, learners. The first index, i, helps us to select the matrix. The second index, j, selects the row, and the third index, k, selects the column. Getting my point? Is this point clear to everybody? Right? So if this is my array, since I've used three square brackets, a three-dimensional array is being created over here. Now if I want to access a particular element, we understand three parameters i, j, k are given. The value of my i is 2, the value of my j is zero, and the value of my k is equal to 1. Right, learners, can you see that on the screen?
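The four two-dimensional slices discussed above can be reproduced with a 6 x 6 array holding 0 to 35. That array is an assumption inferred from the outputs quoted in the session, so treat this as a sketch rather than the instructor's exact file.

```python
import numpy as np

a = np.arange(36).reshape(6, 6)   # rows and columns both indexed 0..5

row0_cols_3_4 = a[0, 3:5]         # row 0, columns 3 and 4 (5 is excluded)
bottom_right = a[4:, 4:]          # rows 4..end and columns 4..end
column_2 = a[:, 2]                # a lone colon selects every row
strided = a[2::2, ::2]            # rows 2 and 4, columns 0, 2 and 4

print(row0_cols_3_4)
print(bottom_right)
print(column_2)
print(strided)
```

In `start:stop:step`, leaving out `stop` means "go to the end", which is why `2::2` picks rows 2 and 4.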
So this is j. My net is going slow and my pen is not that smooth. So now do you see the i, j, k values? What does i represent? It selects the matrix. So there are three matrices, or slices of the bread, numbered 0, 1 and 2. Are you getting my point? This is at index 2, right? So we are talking about this particular matrix. j represents the row, so we are talking about this zeroth row. And k over here is one. The columns are 0, 1 and 2. So this is the value that I get when we combine the matrix, the row and the column. We get the output as 31. Getting my point? Did you get this? Everybody got this? How are we accessing an element in three dimensions? Yes or no? I'm waiting for the response. No, ma'am? Okay. Yeah. Please look here. Have we understood what i, j and k are doing? i selects the matrix, the second index selects the row and the third index selects the column. So if i is equal to two, which matrix gets selected? See, in an n-dimensional array we have a number of slices, 0, 1 and 2 over here. So we are talking about this matrix. j represents the rows, right? So this is the zeroth row that I am talking about. And this is the k that we are talking about, 0, 1 and 2. This is the k that I am talking about. Clear? Is this point clear to everyone? Now let's see how well you understand the next example. So please be clear with the commas. Commas play a very important role. So here i is the set of matrices that we are going to select, 0 colon 2. Here, 1 colon represents the j, and here the k value is 0 colon 2. Getting my point? Right. So here when I talk about i, it means the first two planes, because the one at index two is not considered. If I talk about j, that is the rows 0, 1 and 2, so it is from one onwards. So now, within the two matrices or slices considered, we are talking about row one and row two, the last two rows, and similarly with the columns I'm talking about the first two columns.
So the elements that come out column uh in uh which are common come out to be 13 14 16 17 23 24 26 27 got my point sagami. Why I is not included in this? I is included in this. I is 0 colon 2. So it starts from it slicing 0 1. So the second matrix is not included. We are talking about the zero and the first matrix. We are including the I also sagami got it. Now J is the number of rows and K is the number of columns. Now getting my point. I stands for the matrix selection. J stands for the rows and K stands for the column. Right? So, numpy library has a lot of applications in the real world scenario whether it's data analysis, machine learning, NLP, neuroscience, image processing, robotics, climatic science, all of this use a lot of uh applications. Got my point? Can we move ahead? Are you ready? I'm waiting for your response. Can we start with 3.04? Are you ready everybody? So again, Numpy library also supports positive as well as negative indexing. So what do we understand? Numpy library also supports positive as well as negative indexing. Getting my point? Got it. All right. So here we have created three arrays. Do we understand? This is my one-dimensional array because one simple square brackets. These are two square brackets. So we use two dimensional arrays. And here we have three square brackets. We are using the threedimensional adding. Getting my pointed learners. All right. And positive and negative indexing. Do we understand that positive and negative indexing? This is 0 1 2 3 4 and 5 and minus one. So when I give a minus1, I get the answer as 6. And if I do positive indexing one, I get the answer as two. Got it learners? Everybody is able to interpret the output. Run it. And if I talk about and if I talk about this this is my twodimensional array with two rows and three columns right and if I say a is equal to element 1 comma 1 what will be the answer? Can you tell me if I want to access 1, one element, what will be the answer? This is zero. 
This is my first row. This is zero. And this is my first column. The answer will be five. Why are four? It will be five. Got it? Everybody is getting this uh point. It will be five. Getting my point learners? Right now if I talk about threedimensional array and shape now do we understand this? Now if I say find out the element 1 comma 1 comma 1 how will you do that? So this is my first matrix. This is my second matrix. So this is zero. This is one. So we are talking about this matrix. In this we are talking about this row and this column. So the 11th element comes out. Everybody's getting this point. Everybody has understood indexing very well. So if we if I want to access zero matrix zero row and the first column. So answer is two. Are we getting this? Why are we getting two over here? Facing difficulty. Please let me know. Anybody who's facing any difficulty, please let me know. for you different datas are displaying. Why have you not uh copied my file or have you taken it from the LMS? Even if the data is different what what difference does it make? The answer logically you should understand it correctly like I keep changing values in my file. So that could be one of the reasons sagami otherwise the arrays are the same right yes or no or that is why it's okay it's okay but are you able to get the concept or logically the output that's okay that's perfectly fine it's okay I can I keep changing things. Maybe I can put this as 1 0 1. What will be the output? So I keep making changes. So it's okay. Yeah, we can have more than three dimensions also. Yes, we can. Should I give you an example of more than three dimension? Let's say you want me to create something with more than three dimension. Okay, I'll explain you with that. It's something with more than three dimension. Can you tell me? Can you tell me how will I create that? Okay. Now, let me give you the code for that. for example. Okay. So now let's do this array a R rp like we had done. 
np.arange, start typing with me. No, what we'll do is play smart with it: np.arange(40), like we had given before, and now I will straight away reshape it to the dimensions that I want. Suppose I want it as 4, 5, 2, and maybe I can break the 4 further into 2, 2. Right? Got it? Are you getting my point? That's the beauty of reshape. Okay. So now do you see this? Of course a four-dimensional array has four opening brackets, but I did not write them in the code. I did not put in any logic. So what is it doing? How do I know this is four-dimensional? Count the brackets: 1, 2, 3, 4. Right? So it has four dimensions. Now how is it working? It has two groups of two blocks, and every block holds five rows of two values. Are you getting this point? How is it working? Now suppose I want to increase the dimension of it, how will I go about it? Since all the factors are prime numbers now, I will just keep on adding a one onto the shape. So now it becomes one column. And if I add one more over here, I can increase it further. So 1, 2, 3, 4, 5, 6. Exactly, but the product of the factors needs to stay the same, here 40. Got it? Sa, Mani, Sivagami had said that the product of the factors has to be the same. Now clear? Of course, understanding and dealing with them becomes a little more tedious, right? So this is an array with four or six dimensions. Maybe I can put this as d. So now how will I access an element? Maybe the element at 0 1 0 0 1 0 1. The logic becomes a little more complicated: one index per axis, 1, 2, 3, 4, 5, 6. So what is the error? It says index 1 is out of bounds for axis 5. So this is out of bounds. That's why it's giving me an error. So let me take the right element: 22 gets extracted.
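The reshape experiment above can be sketched like this. The shapes (2, 2, 5, 2) and (2, 2, 5, 2, 1, 1) are inferred from the spoken description and the quoted error, so they are assumptions; the index used to reach 22 is likewise just one index that works for that shape.

```python
import numpy as np

# 40 values reshaped to four dimensions: 2 * 2 * 5 * 2 == 40.
a4 = np.arange(40).reshape(2, 2, 5, 2)

# Adding axes of size 1 raises the dimension without changing the
# number of elements: 2 * 2 * 5 * 2 * 1 * 1 is still 40.
a6 = a4.reshape(2, 2, 5, 2, 1, 1)

# Six dimensions need six indexes, one per axis.
value = a6[1, 0, 1, 0, 0, 0]

# The last axis has size 1, so index 1 on it is out of bounds.
try:
    a6[1, 0, 1, 0, 0, 1]
    message = ""
except IndexError as exc:
    message = str(exc)

print(a4.ndim, a6.ndim, value)
print(message)
```

A reshape only succeeds when the product of the new dimensions equals the number of elements, which is the "same factors" rule mentioned in the session.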
So logic logically little changes like in rows and column then the number of slices then the number of matrices little change in that you know now clear is it better say is your doubt clear or still uh you have doubt hate man does if you want I can share this code with video you can uh play around with it. Fine. Is this code fine with everybody? Okay. So now getting back to the uh yeah the example we were here. So if I access one element I think so indexing is quite clear to everybody and can I access individual elements and do arithmetic operations on that of course that if I access individual element as 2 1 then I can do addition or uh subtraction multiplication anything clear? Another important uh factor or point for numpy library is that we can have you know mixture of indexes both positive both negative or positive and negative indexes. Getting my point? We can have both mixture of indexes. Clear? So when it is beyond three there then matrix would pick I pick up I for here. Then you have to check you know see the logic changes over here. If you you know what I have understood is that uh basically if you go back to this particular slide please look here what I'm trying to explain over here since the shape is one how many it talks about axis zero so how many columns are there four. So that is why it only gives me the columns. But the moment it becomes twodimensional, it becomes row and column. Right? But when it becomes uh threedimensional, it becomes the slice or the matrix and then rows and column. So the logic keeps on shifting of the row and column. Getting my point? So with four dimension it becomes uh with the number of matrices and other things and more and more complicated. So sometimes it's difficult to visualize also see also to understand. Got it man? 
So I J K is going to be the last three ones in in in the if I talk about six dimension or any of the four dimension I J K would be the last three ones and the first one would represent the other matrices now clear and that logic would keep on shifting it's not straight logic now got Exactly. With the dimension, it changes with the dimension. Absolutely correct. Got it. Arun Navdeep. Okay. So now we are clear that in whenever we are talking about indexing and slicing, we are talking about both positive as well as negative elements. Okay. So if this is my twodimensional, I can access this 1 one threedimensional is also now very much clear to everybody that we are creating these three brackets. And when I use 1 1 0 this is the first first matrix with first row and the zero element. I hope you all are getting this. I hope everybody is getting this point. Everybody you can play around with the code. I would suggest everybody to be confident enough now to play around the code and uh you check out the results. Okay. Now can anybody uh tell me that this is my array 3D1? Okay, this is my array 3D 1 and this is my array 3D. Okay, with all the three brackets, why is this giving me an error? Why am I getting the error over here? This is my 3D 1 and this is 3D. Why am I getting error over here? Yeah, because they cannot be broadcasted together. That is why it is giving me an error. Broadcasting the shapes are different. Got it learners? Negative indexing is also clear. It starts from the right hand side to the left hand side. So when I give minus3 it gives me four. And for the twodimensional it is minus1 the last row and the last element. Clear? And another beauty of numpy is we can have of positive as well as negative indexing. 
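A compact recap of positive, negative and mixed indexing, plus the broadcasting error. The three arrays are reconstructed from the answers quoted in the session (6, 2, 4, 5, 11, 2) and are therefore an assumption; the second operand in the failing addition is purely hypothetical, chosen only because its shape is incompatible.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6])
a2 = np.array([[1, 2, 3], [4, 5, 6]])
a3 = np.array([[[1, 2, 3], [4, 5, 6]],
               [[7, 8, 9], [10, 11, 12]]])

last = a1[-1]          # negative index counts from the right
third_last = a1[-3]
second = a1[1]         # positive index counts from the left

center = a2[1, 1]      # row 1, column 1
corner = a2[-1, -1]    # last row, last column

deep = a3[1, 1, 1]     # matrix 1, row 1, column 1
mixed = a3[1, -1, 0]   # positive and negative indexes together

# Shapes (2, 2, 3) and (2, 3, 2) cannot be broadcast together.
try:
    a3 + np.ones((2, 3, 2))
    broadcast_error = None
except ValueError as exc:
    broadcast_error = str(exc)

print(last, third_last, second, center, corner, deep, mixed)
print(broadcast_error)
```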
So now you will be able to do the assisted practice. If you were not able to do it yesterday, you are getting a one-week break, and then we can discuss it next week. Will you be able to do the assisted practice? If you are taking my files, you already have the solutions as well. Will you be able to do it? And if you face difficulty, we can discuss it again. Right, learners? Now if I talk about slicing, can you tell me what happens in slicing? How does slicing work? We have one colon, and there is a start and an end parameter. If the start parameter is not given, it always starts from the zero index. If the end parameter is not given, it will go till the end of the array. But if there are two colons, along with the step parameter, please be clear, because not everybody knows this: in Python this concept is known as striding. Getting my point? This concept is known as striding. Clear? Please do not forget that they are not the same concept. This concept is known as slicing and this concept is known as striding. Got it? Any doubt till here? Yeah, I am saying that when we have one colon with the start and stop parameters, this concept is known as slicing. But when we have two colons with the step parameter, that is known as striding. Fine? So Mi says, thank you for this, I never knew the name of the concept of striding. Yes, that is there. Most people don't know this concept. This is known as striding. Clear? Great. So do we understand now: if this is my one-dimensional array and the starting position is missing, it will start from the zero position and go till n minus one, where n is the stopping position: 0, 1, 2. And if nothing is mentioned, neither the start nor the stopping position, it will give me all the values. Fine. Are we good to go? Now, how do we go about printing the list of three subjects from the fourth index to the end?
So this is 4 followed by a colon. How do I go about it, or with -4? 0, 1, 2, sorry, 2, 3, 4: it will start from the fourth and go till the end. Agreed? Similarly, with negative indexing: -1, -2, -3, -4. So it will start from -4. In this case the value comes out to be the same, Hadoop: it will start from Hadoop and go till the end. All right. Learners, please go back to the line above, example one. Yeah, what is the issue? If you do it without the colon, what output will it give you? Tell me. It will give me an error, because you are not supposed to leave it blank. Obviously it's a syntax error. Either you give one value or you give the slice. Now clear? Yeah, good to clear your doubts. Any question? Any other doubt, learners? And what about slicing with a step value, or striding? If these are my numbers, what does 1:6 mean? 0, 1, 2, 3, 4, 5, 6, 7. So we start from position 1 and go till 6, with position 6 itself not included. So this is the output I get. And when I use the step parameter as two, it will start from here and jump two. So I get the answer as 7, 5 and then 3. And if the step is three, it starts from 7 and jumps three: 7 and 4 will be the output. Is everybody understanding, or is it only Hati and Mani? Can I get a quick confirmation from everybody? Are you all awake in the session? Odair, Deepak, Tulapati, Sudhana, Sagami, come on, respond. Mulapedi, Kamal, Dian, Devika, Arun, Akil. Great. So is it 1:6:2? No, no, Suda, let's try that. See, 1:6:2 means I start from the first position, go till the fifth, and jump two. All right. But the other thing that you are trying to write over here is 1::6. What does that mean? It will start from the first position and jump six. Are you getting this point? That means if this is zero and this is one, it starts from one and then jumps six, because with no stop given it has to go till the end. So 1, 2, 3, 4, 5, 6.
So the output is 7 and 1. So there is a lot of difference. This is the answer. Now is the difference clear? Everybody is getting this point? Please clear your doubts. Slicing with start and end is different from striding. In striding, the step parameter here is jumping six positions. Here we are jumping two positions. Here we are jumping three positions. But here we are only moving one position at a time: we start from the first position and we go till this position. Got it? By default there is no jump parameter, so we move one after the other. But here we are jumping, or stepping. Now clear? And what about a two-dimensional array? You have to be very careful about the comma. What does it represent before the comma? It represents the row. We are talking about the zero row and, counting 0, 1, 2 and 3, the column at index three. So this is the element that I am talking about. All right. Clear to everybody? And of course negative slicing is also clear. We can slice with negative positions also. That means we have to write the left-hand position, the lower position, first. This is -1, -2, -3. So it starts from 56, and the last one is not included. Therefore we get the answer as 56 and 37. So again this is one colon, this is slicing. This is again slicing. Clear? Is there a negative stride? Yes, we also have negative striding. I'll show you with this. Now if I say ::-1, what will be the answer? Can anybody tell me what the answer will be for this particular stride? It will read the whole array and give me the reverse of it. Getting my point, Hudson? Yeah, it's going to be reversed, and if I make it -2 then it will jump two. So Suda says it is not clear. If I do -2 it will jump two, right? So let's understand it over here. Now look over here.
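The difference between slicing (start:stop) and striding (start:stop:step) can be checked with a short sketch. The array 8 down to 1 is an assumption: it is the simplest eight-element array that reproduces every output quoted in the session.

```python
import numpy as np

# Assumed data: positions 0..7 hold the values 8, 7, 6, 5, 4, 3, 2, 1.
nums = np.arange(8, 0, -1)

plain = nums[1:6]      # slicing: positions 1..5, step of 1
step2 = nums[1:6:2]    # striding: positions 1, 3, 5
step3 = nums[1:6:3]    # striding: positions 1, 4
to_end = nums[1::6]    # no stop: positions 1 and 7

print(plain, step2, step3, to_end)
```

Note that 1:6:2 and 1::6 look similar but mean different things: in the first, 6 is the stop position; in the second, 6 is the step and the stop is left empty.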
This is my array for the negative slice, which starts from 13 and goes till 24. Now if I'm doing a negative stride: normally the start parameter says I have to start from zero and the end parameter says I go till 24, but the jump parameter is -1. That means I start here, at 24, and I have to jump minus one, minus one, minus one. Therefore I get the output first as 24, then 37, 57: the reverse of the array. Now clear, Suda, Hat? Say I make this -2. What will happen? Now it will jump two. So 56, then 69 and then 34. Right? It will jump two values in the negative direction. It will always go in the opposite direction. Positive indexing starts from zero and goes left to right, and negative goes from right to left. Fine, Hat? Now clear to everybody? Suda? Does anybody still have any doubt or question? Is negative indexing or slicing only used for reversing, or does it have a lot of uses we are yet to know about? Mostly for reversing, for reading it in the reverse manner. Okay, practically that's the only use, right? So this is again the assisted practice with solution. I hope you will practice it in your break. Right? Will you all be able to do it, learners? Okay. Now getting back to 3.03. Are we understanding all these concepts of arithmetic operations? How broadcasting and vectorization give the output for addition, subtraction, multiplication, division? Are we clear on the difference between these two division operators? Yesterday we discussed the difference between these two operators. Tell me quickly. Both operators give the quotient. Absolutely. One gives the decimal value, the other one gives the integer. So floor divide gives the nearest lower integer. Excellent. Good, learners. Very good. So everybody is clear on the arithmetic operations that we did yesterday. Yeah. Can we begin with statistical functions? Everybody has this file 3.03? Anybody who needs this file 3.03? Do you need the file? Hi, is it a yes for that? Come on. I'm waiting for the confirmation, learners. You got it? Yeah.
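A minimal sketch of negative striding and of the two division operators. The values here are hypothetical, chosen for readability; they are not the numbers from the instructor's notebook.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])   # hypothetical data

reversed_arr = arr[::-1]    # step -1 walks right to left
every_second = arr[::-2]    # step -2 walks right to left, skipping one
neg_slice = arr[-3:-1]      # slicing with negative positions, stop excluded

nums = np.array([7, -7])
true_div = nums / 2         # true division keeps the decimal part
floor_div = nums // 2       # floor division rounds down to the lower integer

print(reversed_arr, every_second, neg_slice)
print(true_div, floor_div)
```

The negative operand shows why "nearest lower integer" matters: -7 // 2 gives -4, not -3.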
Can we begin? Everybody has this 3.03 file. Okay. So if we talk about statistical functions in numpy. Yes. Can you tell me the difference between median and mean? We discussed this point what is the difference between mean and median. Yeah. Now tell me nobody knows the difference between mean and median learners. Mean is the average, median is the center. Right? So where is the what is the advantage of median or mean? Which one has more advantage? Can you tell me advantage? Median is the most of the data concentrated not affected by extreme values. Okay. Mean is the average of the numbers affected by extreme values. Good man. So which one has more advantage? For example, if you are, you know, students of a class and I want to take the average of their heights, which one would be the better method? A mean or a median? Both are true values. Mean man, I don't understand the term true value. Both are true values. So I'm saying if you have to take the average of height of the students, which one would be the better uh method? Mean or median? Let's say mean most of the data surrounds. Why? So, Dana mean why mean would be better? Then if if I want to calculate mean, I want to know the height of each and every student. Will I measure the height of each and every student and then calculate it? Well, that's possible. But why to do that? I can arrange them in ascending order or descending order and then pick the uh one in the center. Right. Agreed or not? Hi. Are you getting my point? Okay, median represents of what most of the population is. What I'm trying to say is that if in a student of class a class I want to find out the average of the height mean is better I want to know the I need to know the height of each and every individual of the student. Some student might not know the height exact height. So do I need to measure measuring with the tape? That's also possible. 
But with median the situation the posi the uh you know calculating the average would become easier that I would arrange them in ascending or descending order and take the value in the center getting my point with one person's height the height of the others can be estimated see idea what my point had Okay, that would represent the average. That's the beauty of median. That would represent the average. Median are nothing but the center value. If they are 11 students, the sixth position or the fifth position will represent the height of the all of the students. Right? That's your notion. Statistically, it is correct. Getting my point or still not Yeah. So please be clear with statistically one way to calculate average is that we take the sum of all the values and divide it by the total number of errors and other is through median. We arrange the data in ascending and descending order and then pick the center value. Clear? Is the difference between mean and median getting clear to everybody? Yeah. Then we understand STD that is standard deviation. Very very important. Sometimes it is important to understand that how far are we away from the average value that is also equally important to understand. percentile is that if you if the whole data is divided as 100% and if I want to know the 25% of the data that helps me to do that 30% of the data 70% of data 65% of the data so it helps to return the nth percentile of the element in an array minimum it returns the minimum element of the array max returns the maximum element of the array Hey, getting my point now clear. So what are the different functions available in Python? Please make this point noted that no data science is complete without statistics. So numpy also supports statistical functions. Pi uh pandas also supports statistical functions such as sci stats model and statistics library which also help us to understand so that we will cover up in detail as we move on. Got it? 
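To make the mean versus median discussion concrete, here is a small sketch with made-up heights (an assumption, not class data) where one extreme value pulls the mean up while the median stays put. It also touches the other functions listed above.

```python
import numpy as np

# Hypothetical heights in cm; 210 is the extreme value.
heights = np.array([150, 152, 155, 157, 160, 210])

mean = np.mean(heights)             # sum of values / number of values
median = np.median(heights)         # center value of the sorted data
std = np.std(heights)               # spread around the mean
var = np.var(heights)               # variance, the square of std
p50 = np.percentile(heights, 50)    # 50th percentile
lowest, highest = np.min(heights), np.max(heights)

# For evenly spaced data the two averages agree.
even = np.arange(11)                # 0..10, eleven values
print(mean, median, std, var, p50, lowest, highest)
print(np.mean(even), np.median(even))
```

The 50th percentile and the median are the same number, which matches what the session demonstrates later with np.percentile.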
So if this is my arange function, and I want to calculate the median over here, since there are 11 values ranging from 0 to 10, the middle value is again five. So the mean and median can be the same sometimes, and they can be different also. Now clear? Standard deviation: np.std is used to calculate the standard deviation. np.var is used to calculate the variance of the data. Similarly, the beauty of all these functions is that whether the array is one-dimensional, two-dimensional or higher-dimensional, we need to use just the same function. I will just use np.median. It will give me the center value of all these points. Now, do you see that there is a difference between the median and the mean value? This is the difference that we get. And this is the value of the standard deviation and this is the value of the variance. Getting my point? Learners, are you there with me? And calculating percentile: if I say 50th percentile, it gives me the same answer as the median. So if you check it over here... the variable was a, not array. So this is the answer. Clear? Coming on to the string functions in NumPy. What does the plus operator mean in strings? Yes, learners? What does the plus operator mean in string functions? It concatenates them. So will I be able to directly concatenate x + y? If this is my array, please try to understand: will I be able to concatenate x + y? No, it doesn't allow me to concatenate. The error says the ufunc add does not contain a matching loop. Okay, so how do we do concatenation of NumPy strings? By using the np.char.add function. Clear? We need to use the np.char.add function. Right? So as I told you, this is not possible over here. Can anybody tell me why this is giving me an error? Both are numerical type. Why is x + y giving me an error? Anybody who can tell me? Yeah, because broadcasting is not possible. Got it? Okay.
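The NumPy string helpers mentioned in this part of the session live in the np.char module. A minimal sketch with hypothetical strings follows; only np.char.add is named at this point in the session, and replace, upper and lower are the related helpers from the same module.

```python
import numpy as np

# Hypothetical string arrays of the same shape.
x = np.array(["hello", "good"])
y = np.array([" world", " day"])

joined = np.char.add(x, y)                  # element-wise concatenation
swapped = np.char.replace(x, "hello", "hi") # substring replacement
shouted = np.char.upper(x)                  # lower case to upper case
quiet = np.char.lower(shouted)              # upper case back to lower case

print(joined, swapped, shouted, quiet)
```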
Now, if I want to replace an old substring with a new one, I can use np.char.replace: in the string, replace hello with hi. So the hello gets replaced with hi. Got it? And of course the simple functions: using upper, lower case gets converted into upper case, and using the lower function, upper case gets converted to lower case. Simple; that much we understand from Python. Okay. So this, pandas, is a very important library which is useful in data analysis. Practically it helps in loading the data and shows how different analyses can be performed. So after the pandas library you will get that confidence: okay, this is how real-time data gets loaded, how the cleaning gets done and how different analyses can be performed on the data. So this is going to be very, very useful. Getting my point? Right. So this is the library which helps in representing data in the form of tables. Now when I say in the form of tables, it means the data is generally structured; it is represented in terms of rows and columns. These slides are my slides; I will upload them after the session and share them with you. Okay. Right. So these are the column names over here. Got it? Right. So pandas basically consists of two data structures. The first is known as the Series, which is like a one-dimensional array, and the other one is the DataFrame, which is like a two-dimensional array. Right? Can we say that a table is nothing but a two-dimensional array? Can I say that a table, an Excel file, a SQL table, a CSV file are all of this kind?
So the question arises when we already had arrays library where we could uh create n dimensional data what was the need of this pandas library till now I'll tell you the reason till now what kind of indexing was being used till now what kind of indexing was being used positive and negative indexing that was by default part of the uh language but in pandas's library we can define our own label indexing I can give my own indexes so this makes the whole tabular data more approachable and understandable. Getting my point learners? Is this point getting clear to everybody? Right. So the two types of data structures available are pandas series which is nothing but onedimensional array which is nothing but the column of your table and panda's data frame which is nothing but your twodimensional array which is nothing but the table itself. Getting my point? Is this point getting clear to everybody? Right. So the different data structures in pandas are we have the series that is onedimensional that is onedimensional labeled homogeneous array. So the difference comes over here is that they are labeled which we can label them according to our uh way and data frames are nothing but twodimensional tabular structure with potentially heterogeneous type column. Right? So a column will always have a data of same type. For example, name, marks, address, they will all be of the same data type. Whereas a t a table or a tabular data structure can be heterogeneous in nature. Got this point? Is this point getting clear to everybody? Right. So now if we look at the pandas series, what happens is that this column of your data you see that indexing over here is positive indexing. So if I do not give my label my own label, then this will refer to positive indexing. So a panda series is like a column in a table. It is onedimensional array holding the data of any type. Clear? So a series consists of a series name. Its index value if we define the index value it will contain that. 
If we do not define that yeah it's always and always one column series is always and always onedimensional data. Okay. And if I do not provide any label indexing, it will take the default in positive indexing that is 0 1 2 and of course the value. So now if you look at the code the first thing that we require is to import the pandas library right since this is not part of the basic python import pandas as pd getting my point are you there right then we move ahead with the temperature right this is what is What is temperature and days? What is the data structure of temperature and days? Can you quickly tell me? Yeah, they both are list. And how do I create a series? I will use the function PD dot series. Please remember Python is a case-sensitive language. So capital S is being used and I pass the parameters temperature and index is equal to the days. Index is equal to the days. Right? So my data is equal to temperature. Please look here. So this is my data. Now do you see the alignment output is automatically vertically aligned. Do you see that? That was not in the case of one-dimensional array. It was giving me flat straight line. So it is vertically aligned automatically. It gives me the data type of the data. And now since I have given my index equal to days, these are my index values. Clear how the series are they are and this uh pandas library is built on the uh numpy library but still it is different. It makes the understanding accessibility much more easier. Do you see that Lana? No. Not necessary. Not necessary. Aron, there is no rule for indexing over here. It's not like database indexing. No, there is no rule like that. Clear? Yes or no? Aron Hut, Manzi, Sivagami, Devika, everybody is getting this point. Can I move ahead? Right. So again, if there are two same indexes, which value it will be pick. Now let's try to understand that Arun we'll do it practically what's your question. 
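The Series creation described above can be sketched as follows. The temperature values and day labels are hypothetical placeholders, since the actual lists are not readable from the transcript.

```python
import pandas as pd

# Hypothetical data: both inputs are plain Python lists.
temperature = [31, 29, 33, 30]
days = ["Mon", "Tue", "Wed", "Thu"]

# Capital S in Series; index= supplies our own labels.
labeled = pd.Series(temperature, index=days)

# Without index=, pandas falls back to positions 0, 1, 2, ...
default = pd.Series(temperature)

print(labeled)
print(default)
```

Printing a Series shows the labels, the values and the dtype in vertically aligned form, which is the output format pointed out in the session.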
So if these are my two series over here right since I have not given the index value so by default it will have 0 1 2 3 and four and this is again it will take the first one around we we'll do that we'll do that I'll answer your query 1 2 3 4 All right. And if I do addition of the series, it will automatically add the two. And what is the concept behind arithmetic operations? Same as numpy. What is the concept behind arithmetic operations? What are the two concepts? Quickly tell me learners. What is that known as? Hats see broadcasting. So now both the series are of the same shape. So only vectorzation happens element byelement operation right or implicit running of the for loop only vectorization happens. Getting my point? Is this point getting clear to everybody? So again uh you know this is the opensource for pandas library. All the uh you know libraries that we are doing whole of python is an open-source community. So all the functions everything you will get it online right. So of course there is lot to explore over here. Yeah, please open the link and this is the all the documentation related to the pandas library. Got it? So basically if we talk about pandas the advantages of pandas over numpy is intrinsic data alignment that the data is you know you will see a beautiful output for column it will give you as in vertically align and for tabular data also it gives you in prop pro in a form of table. Then it has two data structures. data operations are easy to perform as it is built on the numpy library. So this has again advantages of high performance in memory merging and joining intelligent and automated data alignment tools for reading and writing data. A fast and efficient data wrangling are also part of it. Right? And are we also clear about the two data structures? 
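Adding two Series of the same shape, as discussed at the start of this passage, works element by element. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical values; both Series get the default index 0..4.
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30, 40, 50])

# Vectorized addition: no explicit for loop is written.
total = s1 + s2

print(total)
```

Because pandas matches values by index label before adding, two Series with the same default index line up position by position.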
First is the Series, a one-dimensional labeled array; the other one is the DataFrame, a two-dimensional structure. All right. "If we can get the learning materials in advance": that I will try, Heads, because, see, whatever material I prepare I keep working on, and that is why I always give it after the session. It's a little difficult to give it before the session, but I'll try. Okay, thank you all so much. Now let's begin with Series. A Series is a one-dimensional array-like object containing data and labels, or an index. Right? So as we have understood, the speciality of the pandas library is that we can create our own labels. A Series can be created with the help of a NumPy ndarray, a dictionary, a scalar value, as well as a list. Right? So now let's understand how to go about it. First I would like everybody to run this: import pandas as pd. Is this import running fine? Anybody who's getting an error? If we get an error, then we always do pip install pandas, but I think everybody is running import pandas as pd without any error. Quick confirmation from everybody. Kamill, Dika, great. All right. So now look over here. What are we doing? Please try to understand the code. Okay. So if this is my data, 112383, and this is days, and I create pd.Series, this becomes my data. So Sivagami, you had this doubt about dtype is equal to int64: int64 refers to the data type of the values. Right? There are different data types. Now again, your question was: is this going to be homogeneous, or are we going to change this? Let's see. If I change this to 2.3, do you see? It's like arrays: it stays homogeneous, and my whole data type changes to float. Devika, sorry, Sivagami, are you getting this point? Right. Now I am passing this as my data, with index equal to days. So here I am able to get: Monday, this was the value; Tuesday, this; Wednesday, this; and Thursday, this. What happens if I do not pass my index values?
Okay, if I do not pass my index values, then by default it will take the default integer index: 0, 1, 2, 3. Getting my point? Okay. And if I interchange the two, the interchange can also be done. There is no restriction on the interchange. Right. So this is the object data type. Any questions, any doubt? And again, if I add any string value, the data type of the whole series changes to object. Look at the output: it's a one-dimensional array, but all the values are vertically aligned along with their index value, their data value and their data type. Do you understand the beauty of the pandas library? Yeah. Because in pandas we do not call this string; the string data type is known as the object data type. Fine, Hats? Yeah. Okay. Right. And if I want to change a specific index value, then I can create indexes which are heterogeneous in nature. To be noted over here is that the data is homogeneous but the index values can be heterogeneous. If we have two strings in the data, will it also show the same? Yes, yes. See, here three values are of string type, and all of them have become the object data type, Sivagami. Do you see this? Learners, are you all understanding this? Now, as we have understood, there are two types of indexing available in the pandas library. One is integer or position-based indexing; that is the normal one, where we use positive or negative indexing. The other one is label-based indexing. Does the index need to be unique? No. That's what I'm trying to explain: it is not necessary that it is unique. So if I change this 5.7 to 100 again, let's see. All right. So again, getting to this point, series with index. Another important point to be understood over here is that it is not necessary to access it with these label values only. I can also use the default indexing, such as 0, 1, 2, 3, to access the values.
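To make the points about default indexes and data types concrete, here is a minimal sketch. The numbers and day names are hypothetical stand-ins, not the values used in the session.

```python
import pandas as pd

# Hypothetical data; the session's actual values are not recoverable here.
data = [10, 20, 30, 40]
days = ["Mon", "Tue", "Wed", "Thu"]

s_default = pd.Series(data)              # no index passed: labels are 0, 1, 2, 3
s_labeled = pd.Series(data, index=days)  # our own labels become the index

# A single float upcasts every value to float64 (homogeneous data).
s_float = pd.Series([10, 2.3, 30, 40])

# Mixing a string with numbers gives the object dtype.
s_mixed = pd.Series([10, "a", 30, 40])

# Index labels do not have to be unique or of one type.
s_dup = pd.Series([1, 2, 3], index=["x", "x", "y"])

print(s_labeled)
print(s_default.dtype, s_float.dtype, s_mixed.dtype)
```

Printing s_labeled shows the vertical alignment described above: label, value, and the dtype on the last line.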
So what I mean by that is, okay, I'll just show it to you. See, one way is I pass the label value 100. The other way: if I give the value as zero directly, it gives me an error, but if I use iloc, I am specifying that now I am using integer-based indexing, and then I get the first value. Now clear? You can use both ways; if you want to use the default, integer-based indexing, then iloc needs to be used. Clear? Right. So here, if I say iloc[0], I'm accessing the zeroth element, but if I want to use the label index, then I can use loc, label-based indexing, with the label b. And of course negative indexing is also part of it: this is -1, -2, -3, -4, and here we have -5, so the value at position -5 is 10. Clear? Integer location is iloc, label-based is loc, and integer-based can be both positive as well as negative indexing. Even if we have a float value as a label, it will work. Let me change it to 3.45. Okay. So now if I give this as 3.45, I get the value 30.96. Got it? It will work, no restrictions on that. Hats, do you see this? With iloc? No, not with iloc. iloc is only positive and negative integer indexing. Only with loc. It will give me the same answer if I do loc or plain indexing over here, but not with iloc; the moment I give it to iloc, it will give me an error. Got it? Yeah. So now when I want the value for a particular index, for b it is cat, and for 100 it gives me both the values. Is slicing also covered? Yes, slicing is also a part of it, but there is a difference attached over here. Please look here, learners. Right? If this is my series and I've given 2 to 4, what does it mean? The positions are 0, 1, 2, 3 and 4. So let me change this; let me start it from one. Yeah, please look here, learners, there is an error. Okay. Now look over here at what is happening. Sivagami, are you getting the code? Now I'm running each and every line of code. Okay.
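The loc and iloc behaviour described above can be sketched as follows. The three small series are hypothetical and only mirror the cases mentioned: label access, negative positions, a repeated label, and a float label.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]        # position 0
minus_five = s.iloc[-5]  # negative positions count from the end
by_label = s.loc["b"]    # label-based access

# A repeated label returns every matching row.
dup = pd.Series([5, 6, 7], index=[100, 200, 100])
both = dup.loc[100]      # a Series holding 5 and 7

# Float labels work with loc ...
flt = pd.Series([30.96, 12.5], index=[3.45, 7.0])
by_float_label = flt.loc[3.45]

# ... but iloc accepts integer positions only.
try:
    flt.iloc[3.45]
    iloc_rejects_float = False
except TypeError:
    iloc_rejects_float = True

print(first, minus_five, by_label, by_float_label, iloc_rejects_float)
```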
So if this is my series, are you getting that this is integer-based: 0, 1, 2, 3 and 4? It starts from the first position, and the fourth is not included. It starts from b, and the fourth position is not included, so it gives me three outputs. Clear? But when I give b to d in label indexing, the last part is included; it is inclusive. This is the difference in the slicing. Please be clear: the stop is exclusive in iloc but inclusive in loc. Getting my point? Now, creating a pandas series from a dictionary. Let's try to understand and create a series from the dictionary. So what does a dictionary consist of? A dictionary consists of key-value pairs. Right? Very good. So the keys automatically become my index and the values become my data. Again, they can be of heterogeneous type, you know, because a dictionary is a heterogeneous type of data structure. Clear? Right. So we can create a series with the help of dictionaries also, which is very simple: the keys become my index values and the values become my data. Got my point? Is this point getting clear to everybody? Can we use loc and iloc? Now it has become a series, so we can use them with the series only. Okay, don't get confused: there is no concept of dictionary any more, we have converted the dictionary into a series. Okay, are we good to go? We move ahead. All right. Very, very important: some basic functions that we need to understand for a pandas series. The first function is the head function. What is this head function? Head will always return the first n rows. If I give three, I get the first three rows of my series. If I do not pass any parameter, by default it will give five. Right? And what does the tail function do? Tail will return the last n rows. If I have passed the value as two, I get the last two values. Got it? And can you tell me what will be the dimension of a series?
Always and always one dimension. If the dimension is not one, obviously it is not a series. Got it? When we talk about the shape, since it is one-dimensional, it will only give me the number of elements. Right? Shape, axis and dimensions are all related with each other. Another very, very important function is the describe function, which is the heart of the pandas library because it gives the tabular descriptive statistics. As I've been telling you, without statistics no data analysis is complete. So the describe function tells me the count, that is the number of values, the mean, the standard deviation, the minimum, the first quartile, the second quartile or median, and the third quartile, along with the maximum value. That is the beauty of this describe function, which gives all these summary values. Got my point? Is this point getting clear to everybody? All right. So what are we doing over here? We are again creating a series and trying to find out the unique values. So what does the unique function of a series do? It tells me the unique values in the data. So I have 1, 2, 3, 5. In plain Python you can use set to find unique values, but there is no set as such here, so the unique function tells me the unique values in the data. Clear? And nunique gives me the number of unique values. Sivagami, are you able to run the code? Are we clear till here? Are we all understanding series, or is anybody facing difficulty in understanding the concept or running the code? Please let me know. Learners, please let me know. No response? Are you there in the session or no? Yeah. Now let's understand the different operations and transformations that can be performed. Right? So now we are clear. This is my data; I convert it into a series. These are my index values; I create the series with index. And now I add the two. So let me, you know, just cut and paste it into separate cells. So this is my series. Have I passed any index value to this series? No.
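The ideas covered so far (a series built from a dictionary, the slicing difference between iloc and loc, and the summary helpers) can be sketched together. The dictionary below is hypothetical.

```python
import pandas as pd

# Keys become the index, values become the data.
s = pd.Series({"a": 1, "b": 2, "c": 2, "d": 3, "e": 5})

by_position = s.iloc[1:3]   # positions 1 and 2; the stop position is excluded
by_label = s.loc["b":"d"]   # labels b, c and d; the stop label is included

first_three = s.head(3)     # head() with no argument returns 5 rows
last_two = s.tail(2)

print(s.ndim, s.shape)      # one dimension; shape holds the element count
print(s.unique())           # distinct values
print(s.nunique())          # number of distinct values
print(s.describe())         # count, mean, std, min, quartiles, max
```

The label d sits at position 3, so the same start and stop give two rows with iloc and three rows with loc.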
So by default the indexing is 0, 1, 2, 3, 4. Getting my point, learners? And now if I do it with index, what was the output? This one is with a, b, c, d. Do you see the difference between the two series, learners? I've just separated the code; I keep doing these little changes. So this is the normal series, pd.Series of the data, and this is the series with index. Getting my point? Yes or no? Right. So if we do element-wise series addition, the two series have different index values. Now do you see, are the indexes matching? No, the indexes are not matching. So when I do the addition, I get all null values; none of them get added. Are you understanding the output? So while doing arithmetic operations on series, the indexes are matched first; only where the indexes match can the arithmetic operation be done. Got my point? For example, I give the same data and I create this series one. All right. Since the indexes are matching, can I now do series plus series one? Yes, and then I'm not getting null values, since the indexes match. Now, for example, some indexes don't match and some match: the matching ones get added, the others do not. Let me show it to you. Let's keep the first one as the basic one. Here I'll add my own index: one, no, let's keep it as zero, then 10, 2, then 30, and then 4. Okay. Now let's see how it works. Here I've not made any changes; the index values are 0, 1, 2, 3, 4. Here the only matching ones are 0, 2 and 4; the others are not matching. So this is my new series. And now if I add them, look at the difference. Now you'll understand: for zero it gets added, one is not matching and therefore it will be NaN, two matches, three will be NaN, four matches, and 10 and 30 will also be NaN. What does NaN stand for in Python? Not a number. Now clear? Have you understood how the arithmetic operation is carried out in series? Yes. Are we good to go now? Have you understood the differences? Yeah.
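Here is a small sketch of the index alignment just described. The values are hypothetical; the second index follows the 0, 10, 2, 30, 4 labels used in the explanation, and the last pair uses string labels chosen only for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])                           # index 0, 1, 2, 3, 4
s2 = pd.Series([1, 2, 3, 4, 5], index=[0, 10, 2, 30, 4])  # only 0, 2, 4 match

# Addition aligns on labels first; labels present in only one series give NaN.
total = s1 + s2
print(total)

# Identical indexes: plain element-by-element addition, no NaN.
same = s1 + pd.Series([10, 20, 30, 40, 50])

# Completely different labels: every result is NaN.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([1, 2, 3], index=["x", "y", "z"])
nothing_matches = left + right
```

The result of s1 + s2 has seven labels (the union of both indexes) and is float64, because NaN is a floating-point value.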
You can have this small code. Of course, you can have that. Yeah. Take this first code, paste it. Take the other one. And then we go ahead with series and series one. Yeah. Yes, learners. So now when the indexes are not matching, that is why the output is null. Now are you able to interpret this output? Why are all the values NaN? NaN stands for not a number. Got my point? There is another function available in the pandas library, isnull, which helps us to check the missing values. So NaN stands for missing values or null values, and that is why I get all the output as True. Now you see the data type has changed to boolean automatically. We don't have to do anything; automatically it will change the data type and give the output. Getting my point? And exactly the same way we use isnull, there is this function isna. Both mean the same thing; both are checking the null values. isna does exactly the same thing. Got my point? And how do we deal with null values? Once we have checked the null values, we can use fillna. So wherever there is an NA or null value, the values get filled with 100. You can use any value; it's completely up to you. You want to change it to maybe 50? That's also possible. Got my point, learners? Sivagami, are you able to run the code? Others? Kamal, Devika, good, good. Sivagami, the concept of series is very important to understand. The huge errors that Python throws when there is an error, hopefully they don't scare you. Good, good. Can we do exclusion for NA? What do we mean by that, Hats? What do you mean by exclusion for NA? Elaborate your point. Yeah. Yes, learners, do we understand the lambda function in Python? Quickly let me know, what do we mean by a lambda function in Python? A function with no name, written as a single-line expression. You can always say single-line expression. So what are we doing?
We are passing one argument and multiplying it by two. So s.map will take each element from this particular series and then multiply it by two: 1 multiplied by 2, 2 multiplied by 2, then 3 and 4. So you get this output. But can we also do this directly as s * 2? Can I do a direct multiplication? Yes, I can do that. And how is it possible? What is the concept that is running behind it? Can anybody tell me? Yeah, broadcasting, because the scalar value will be broadcast, and then vectorization. Good, good. Hats, clear? Is this point getting clear to everybody? Are you there with me? Next we are creating a text series and a dictionary, and then we are trying to map the dictionary. So what are we trying to do? Let's understand this. We print s_text and then we print mapping_dict. So now do you see this? In the first case I've created a string series, and this is my dictionary. Do you see the outputs, learners? I've just printed them so that we can understand it better. Right? And now we are taking apple to red, banana to yellow and orange to the orange color. So automatically it maps them, and my new series is mapped like this. Got it? So how are we able to do it? We are able to do it with the map function. The map function maps each element of the series using whatever parameter has been passed. Here we are matching the string values with the dictionary, and it does that automatically, I'm not making any effort; and there we are multiplying each element by two. Got it, learners? All right. So basically there are other functions and transformations also available on a series. For example, we have this apply function, with which we are going to add to each element: every element gets two added to it. Then we can map onto the dictionary, which we have just seen. We can also sort the output with the help of sort_values and then check the null values also.
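The map and apply transformations above can be sketched like this. The names s_text and mapping_dict follow the session; the numeric series is a hypothetical stand-in.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

doubled_map = s.map(lambda x: x * 2)  # the lambda runs on each element
doubled_vec = s * 2                   # scalar is broadcast, then vectorized multiply
plus_two = s.apply(lambda x: x + 2)   # apply also works element by element here

s_text = pd.Series(["apple", "banana", "orange"])
mapping_dict = {"apple": "red", "banana": "yellow", "orange": "orange"}

# With a dictionary, map looks each value up as a key.
colors = s_text.map(mapping_dict)

ordered = pd.Series([3, 1, 2]).sort_values()

print(doubled_map.tolist(), colors.tolist(), ordered.tolist())
```

For simple arithmetic the direct form s * 2 is usually preferred, since it avoids calling a Python function once per element.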
And what is fillna? If there are any null values, we can fill them with zero. Are you understanding all these functions that help in the transformation of data? Everybody is able to run these functions? Nice. So this is my series and this is the other series, and when I add and map them, this is how I get the different outputs. So wherever the null check is True, the values are filled with zero. Do you see this? Could we fill with the mean or median instead of zero? Yes, we can do that; it does not have to be zero. Absolutely correct, and practically we do that: it is common to fill with the mean or median values. Yes, practically we definitely do that. All right. Good. Yeah. So moving forward, can we query a series? Selecting and filtering data based on specific conditions is an essential aspect of querying a pandas series. So how do we go about it? First, can you tell me how we can create a series with the help of a dictionary? What happens to my keys? No, wrong. What happens to the keys in the dictionary when it is converted to a series? Absolutely, they become the index values. Right? And my values become my data. Please be clear with this particular concept. Got it? The keys become the index values of the series. So how do we go about querying? Again simply by using a condition on the series: we want all the values greater than 30, so those values get printed. The values which are equal to 20 get printed. And in this case the index is unique. Yeah. It could be unique, it could be non-unique; it's not giving any error, it's completely up to you. In this case it is unique. Yes. Okay. Then the values which are not equal to 40. Then you can also do a multiple-condition check using the and operator. Then you can also use the isin function to check whether a particular value is in the list or not.
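As a sketch of the two topics just covered, filling missing values and querying a series by condition, assuming hypothetical values and labels:

```python
import numpy as np
import pandas as pd

raw = pd.Series([10, np.nan, 30, np.nan, 50], index=list("abcde"))

missing = raw.isna()                  # boolean series, True where a value is missing
filled_zero = raw.fillna(0)           # replace missing values with a constant
filled_mean = raw.fillna(raw.mean())  # mean() skips NaN, so this fills with 30.0

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

greater_30 = s[s > 30]
equal_20 = s[s == 20]
not_40 = s[s != 40]
between = s[(s > 10) & (s < 50)]      # element-wise "and" is &, with parentheses
in_list = s[s.isin([10, 50])]
by_labels = s.loc[["a", "c", "e"]]    # works because these labels exist

print(filled_mean.tolist(), between.tolist())
```

Selecting with loc using labels that are not in the index raises a KeyError, which is the error discussed next for the series with the default 0 to 4 index.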
Then we have loc, that is, with the labels a, c and e. And finally the series iloc function also. Got it? So now you can check out the output. Please try to understand the output. Learners, are you understanding all these conditions? Devika, Sivagami, Kamal, come on. Other learners can also respond. Akel, Arun, Kiruba, Maduprata, Maliputi, Nikita, Nab, MKkesh, Ud, what about you? Deepak, you've not been responding today. Suda, Tlapati, yeah, are you able to interpret the code now? Try to understand the code on your own. Run it one by one. There is no a, c, e here. So series.loc with a, c, e is not able to extract anything, because the series doesn't have those labels; it has 0, 1, 2, 3, 4. That is the issue with loc. Do you see series.loc? There are no a, c, e labels, right? So you can comment this out. We have already understood: since the labels are not there, it will give you an error. Comment out that particular line, the one that queries based on index labels. Got it? Yeah, now run it. And the rest: can we create a series of string type? Yes, we can create a series of string type with the default index. And then we can also find the values which start with "ba". So banana starts with "ba"; rather than "ba" we can also check with "b". All right. So if we talk about descriptive statistics, do we understand mean, standard deviation, minimum, maximum, Q1, Q2? Let's understand it quickly. Mean, we understand: it's the sum of all values divided by the number of values. And somewhere down the line, sometimes the median is more useful than the mean. Let me explain the difference between mean, median, and mode. You might have heard about it. Mode is generally used for categorical values. For example, I want to find out which of the t-shirt sizes sells more. Is it the XL, medium or large size that sells more? So will the mean be useful to me? No, there is no value that I can calculate. The median will also not be useful. The mode is the shirt size that gets sold the maximum number of times. Right?
Mostly it's going to be the large size, or medium, or XL. So the item with the maximum frequency is known as the mode. All right. So there are different ways to calculate the average; it is not always done through the mean. The mean is only useful when we know the values, right? But it is the one most affected by outliers. If I don't know the values, then the mean is not used. The median is used when we want to find the positional average. As I gave you the example, if there are 50 students in a class, or 100, or 500, we will not measure each and every one, right? We'll arrange them in ascending or descending order and pick the middle one. Got it? So these are the different ways in which statistics helps us in the analysis. For a skewed distribution, the median is a better choice because the median is not affected by outliers; the mean is. Standard deviation tells me how far the values are spread from the mean. Quartiles, as I've told you, divide the sorted data into four equal parts. And are we clear about the describe function if we have numerical values such as 2, 3, 4? Count gives me the number of values. Mean gives me the average. Standard deviation: how far are we from the mean? Then the minimum, the maximum and the quartile values. Got it? Is this describe function clear to everybody? Right. And if my data is of object type, or categorical, count gives me the number of values. Unique tells me there are three unique letters: p, q and r. The top value over here is p, and how many times is it repeated? It is repeated two times. Right. And the biggest power of the pandas library is that it helps in the representation of data in the form of tables. Getting my point? So a table has rows and columns. Every row has its index value.
If I'm not defining a label for it, by default it will use the positive integer index. And every column will also have its name. Getting my point? Is this point getting clear to everybody? So the two data structures are series and data frame. A series is a one-dimensional labeled homogeneous array which is generally mutable. So a series refers to one column of the table, and a data frame refers to the two-dimensional tabular data structure that is potentially heterogeneously typed. Right? So we understood that a pandas series is like a column, and the two components of any series that we create are its values, that is, the data we are passing, and its index. If I do not explicitly pass the index values, it will contain the default positive integer labels. Getting my point? Right. So this is what we had seen: temperature and days are my two lists, and to create a series I use the pd.Series function. This is temperature, and index is equal to days. So now each temperature is labeled with its day. Getting my point? Another beautiful point about the pandas library is that it gives the data in vertically aligned tabular form along with the data type of the data. Getting my point? These are the values, so it gives their type, whereas the index can be of any type. It is not necessary that it has to be of int, float or string data type. Can I interchange these two lists? Can I make the data the days and the index the temperature? Yes, we can do that. That is allowed. Everybody's getting this point? Now, the pandas library is built on NumPy. That means, what are the two main concepts of arithmetic operations in NumPy? Okay. Broadcasting and vectorization. Excellent, Devika. What is broadcasting? Broadcasting is making the arrays of the same shape and size. And vectorization is the element-by-element operation.
So what is vectorization? Here vectorization will happen. Since the two series have the same index values, the elements get added together to give this output. And similarly for subtraction. Do you see the vertical alignment, and the data type along with the labels? This is how a series is different from a one-dimensional array. Getting my point? Is this point getting clear to everybody? We are doing revision. We also understand that statistics plays a very, very important role. So in pandas we have the df.mean function, and standard deviation as std. Do you remember this point? Then we have the min and the max functions, and using the quantile function we can calculate the quartiles. Clear? Are you all getting this point? So tell me, what does the describe function tell us? It gives the statistical summary of numerical values: the count, the mean, the standard deviation, the minimum value, Q1, Q2, Q3 and the maximum. And if the data set is categorical or has string values, then it gives the count, the number of unique values (p, q, r), the topmost value and the frequency of that particular value. Now tell me, you know, till where we had done. So a data frame is a two-dimensional data structure, that is, the data is aligned in a tabular fashion. Right. The features of a data frame are: columns are potentially of different data types, the size is mutable, the axes (rows and columns) are labeled, and arithmetic operations can be performed on the data frame. Got it? Yeah. Yeah, I remember, Tlapati. Yeah, I remember you from the last session. Let's say, I also remember. Now, we've done this, right?
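To make the statistics recap concrete, here is a sketch with hypothetical data: one numeric series containing an outlier, one of t-shirt sizes, and one of the letters p, q, r.

```python
import pandas as pd

nums = pd.Series([2, 3, 4, 5, 100])   # 100 acts as an outlier

mean_value = nums.mean()              # pulled upwards by the outlier
median_value = nums.median()          # middle value of the sorted data
quartiles = nums.quantile([0.25, 0.5, 0.75])

sizes = pd.Series(["M", "L", "L", "XL", "L"])
most_sold = sizes.mode().iloc[0]      # most frequent category

letters = pd.Series(["p", "q", "r", "p"])
numeric_summary = nums.describe()     # count, mean, std, min, 25%, 50%, 75%, max
text_summary = letters.describe()     # count, unique, top, freq

print(mean_value, median_value, most_sold)
print(text_summary)
```

With the outlier present the mean is 22.8 while the median stays at 4, which is why the median is preferred for skewed data.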
So axis equal to 0 refers to df.index, that is, the labels for the rows. And then we have axis equal to 1, which is the columns that we have, df.columns. And to get the values we use df.values. And I think somebody among you told me that you were unable to see the data from here. Let's say, I think it was you, right? Now are you able to see the screen? Yeah, now I remember correctly. Okay. Okay. Yeah. Did we work on how to load a CSV file? Did we do this also? You remember that? Good. So a data frame can also be created when we load a file with read_csv, here from music.csv, and we get it in the form of a data frame. Another way to create a data frame is from a dictionary. The function that is used is pd.DataFrame, and we pass the dictionary as data. So the keys automatically become the column headings and the lists become the column values. Clear? Getting my point? Is this point getting clear to everybody? Right. Did we also understand the two types of indexing available in the pandas library? One is known as label indexing; you remember loc, did we do this also? And the other one, integer or position-based, is known as iloc. You remember this point, right? So if you want to access by labels, the column labels are employee ID, skill, age, pay and name. Right? And the row index is this. But can I access them through the default positions? Yes, I can also access them through integer-based positive indexing. That is also possible. Clear? So the two types of indexing are loc and iloc. For loc we pass the label values, which here are integers like zero. List indexing, do you understand list indexing? If I want to access more than one column or row, then I can pass them as a list. But the difference in indexing comes only in the case of slicing.
What is the difference? If I start from the zeroth column, it will also include the last part if I'm using label indexing, but for integer-based indexing it is the same as the normal slicing that we have been doing: only the zeroth and the first column are extracted. Got it, learners? To make the point more clear: this is list indexing. So if I pass employee ID and skill, only the columns employee ID and skill get extracted. Clear? Getting this point? Right. Then if I say df.iloc[0], what gets extracted? The first row, the row at position zero, gets extracted. Clear? Why the row? Because no comma is given over here, so zero is considered as the row. This is again list indexing: here we are talking about the zeroth as well as the first row. Got it, learners? Now look over here. This is label-based indexing. This part refers to the rows and this part refers to the columns. What is this concept known as, the one with two colons? Striding. Very good. So we start with the zeroth column and then we jump by two. Clear? Is this point getting clear to everyone? Over here, if I say employee ID till age, employee ID is included, skill is included, and even the last one, the stop parameter age, is also included. So these values get extracted from here. And did we also understand the inplace parameter? Whenever this parameter is True, the changes are made in the original data frame, and when I give False, the original data frame is not changed. Getting my point? Yes. Learners, are you there with me? All right. Did we do this correlation part? I think we have not done this correlation. We stopped here. Did we do this or no? Not yet. Okay. Okay. So, now let's get back to the code. Everybody has this file? Everybody has 4.02? Are you ready with 4.02, learners?
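The data frame indexing rules recapped above can be sketched as follows. The column names and values are hypothetical stand-ins for the employee table on screen, and sort_values is used only as one example of a method that accepts inplace.

```python
import pandas as pd

df = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "skill": ["python", "sql", "excel"],
    "age": [25, 31, 28],
    "pay": [50, 65, 55],
    "name": ["Asha", "Ravi", "Meena"],
})

first_row = df.iloc[0]                  # no comma: the 0 refers to a row position
two_rows = df.iloc[[0, 1]]              # list indexing: rows 0 and 1
two_cols = df[["emp_id", "skill"]]      # list indexing: two columns

by_label = df.loc[0:1, "emp_id":"age"]  # label slices include the stop label
by_pos = df.iloc[0:1, 0:2]              # position slices exclude the stop position
strided = df.iloc[:, ::2]               # every second column: 0, 2, 4

sorted_copy = df.sort_values("age")     # returns a new frame, df is unchanged
ages_before = df["age"].tolist()
df.sort_values("age", inplace=True)     # modifies df itself and returns None
ages_after = df["age"].tolist()

print(by_label.shape, by_pos.shape, list(strided.columns))
```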
Yes, learners. Everybody is ready with this 4.02? So now we are clear with the code part of it. So what are the different ways in which a data frame can be created? Yes, learners, tell me. First is using a dictionary: the keys of the dictionary become the columns. Second, we understood, is from a list of lists. Have we done it? Have we run all this code? Yes or no? List of lists. Two-dimensional arrays, that's the third point. Then we loaded it from a CSV file. Then we loaded the Excel file. Yes, Devika. Let's say, Arun, did we do all these? Then how can we access the data frame columns? We did the beginning only, not the XLS? Is it that we did not do the XLS and the CSV? We have to do it from here. Okay, we did not do the Excel. Okay. So, I'm sharing this data set. Please download this Excel file and be ready. Yeah. Please download this Excel file, Iris.xls, put it in the same folder as these files, and load the data. Yes, learners. Everybody do it, and if anybody is not able to do it, please let me know. Navdin, Matu. Yes, learners, are you there with me? Others, I'm still waiting for a response. Please download the file, copy it into the same folder where you have copied this pandas notebook, and then run the code. It will run fine. Anybody who's getting an error? Can I move forward? Let's say, Arun, can we move forward? Can I get a quick thumbs up from everybody? So working with a pandas data frame involves employing various methods for selecting and retrieving data, whether it be specific columns, rows or individual cells. No, there is no library to be installed. No, no, no. You don't need to. You simply do pd.read_excel. Let's say, got my point? Which is the library that you installed? Tell me quickly. No, not required. Not at all. Is it not running for you normally? Did you install this library? Can I unmute you quickly? >> Yes, unmute. >> Hello. Can you hear me?
>> Yeah, I can. >> Yeah. Uh, so it asked me to install this stuff. Then >> numpy.xls? Not required at all. We are using pandas over here, not NumPy. >> Okay. Uh, so I think it is the package it named. It just popped up here and then I did install it. It says >> uh, let me put it on the chat. >> Yeah, we did. Uh >> Or do you want to share your screen? We can do that also. You want to share your screen and show us? >> Uh, sure. Let me do that. >> Yeah, please go ahead. >> Um, I'm sharing, right? >> Yeah. >> Can you see my screen? >> Yeah. Just zoom in. >> Uh, okay. So, you see this one here? It just asks. >> No, it's so small, I can't even read it. And yeah, the screen is so black. >> Okay. I cannot zoom it. >> Just copy it, paste it, and put it on the chat. >> Okay. Uh, I'm going to copy and paste it in the chat. >> So these methods and conditions ensure that analysts can navigate and extract the necessary information from the data frame for further analysis and manipulation. So look at this particular code. We are creating a data frame from a dictionary of lists. Right? And if I want to access a single column, how can I do that? Simply by indexing with the column name. But if I want to access multiple columns, I have to pass them as a list. This point is also clear? Are you all getting this particular point, learners? Yes, learners, can I get a quick confirmation? Are you all there in the session or not? Or are you all sleeping? I told you this point: the outer bracket is for the indexing, and the inner bracket is used to pass a list of multiple items. All right, got it now? This concept is known as list indexing. Right? Now if you look at df.iloc[0], you can extract the first row and see the elements of the first row. Got it, learners? Please run the code, and if you're facing any difficulty, please let me know. Right.
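As a sketch of single-column, multi-column and first-row access on a data frame built from a dictionary of lists (the column names and values below are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [5, 12, 18],
    "column2": [1, 2, 3],
    "another_column": ["x", "y", "z"],
})

single = df["column1"]                 # one pair of brackets: a Series
multiple = df[["column1", "column2"]]  # a list inside the brackets: a DataFrame
first_row = df.iloc[0]                 # first row as a Series labeled by column name

print(type(single).__name__, type(multiple).__name__)
print(first_row)
```

The outer brackets do the indexing; the inner brackets are just a Python list of column names.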
So now, if we have the condition that a column of df is greater than 10, that is, if I want to extract certain values based on conditions, then it can be done: only the rows which have values greater than 10 in that column are extracted. So you can access rows based on a condition. So single columns, multiple columns, and rows based on a condition. Are you understanding the different syntax? If I want to access a single cell by label, then the other function that can be used is at. By label you can also use loc; loc and at mean the same thing here. This refers to your zeroth row and the column name. Clear? So just to make things easier, I'm just separating out the rows. Got it? Then iat with 0, 0. Right. So what is the value at 0, 0? We can also access it via the positional values. Are you getting it or not? Please look here, learners. The label index for the columns is the column names: column one, column two, another column. And if I talk about the row index, this is 0, 1, 2. So the label index is 0, 1, 2 and the default positional index is also 0, 1, 2. So when I do df.iloc[0, 0], what will be the output? I'm talking about the zeroth row and the zeroth column, so the answer is five. Now clear? Getting my point, everybody? Can I use a conditional statement also, for all the values of a column greater than 10? So, conditional access for the value of another column: here the condition is on one column being greater than 15, but I want to extract the value of another column. Got it? Is my point getting clear to everybody? Yeah. Then we also talked about the basic functions available in the pandas library. Do we understand the basic functions? The head function, do we understand it? Tell me, learners. Yes, it returns the top five values. Tail, the last five values, right? Info gives the summary of the data types and non-null values. Describe gives me the descriptive statistics, the summary of the statistics.
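The conditional and single-cell access patterns above can be sketched like this, again with hypothetical column names and values chosen so that the cell at row 0, column 0 is 5:

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [5, 12, 18],
    "column2": [1, 2, 3],
    "another_column": ["x", "y", "z"],
})

rows_over_10 = df[df["column1"] > 10]                     # rows where the condition holds
other_col = df.loc[df["column1"] > 15, "another_column"]  # condition on one column, value from another

cell_at = df.at[0, "column1"]    # single cell by row label and column label
cell_loc = df.loc[0, "column1"]  # same cell through loc
cell_iat = df.iat[0, 0]          # single cell by row and column position
cell_iloc = df.iloc[0, 0]        # same cell through iloc

print(rows_over_10)
print(other_col.tolist(), cell_at, cell_iat)
```

at and iat are limited to one cell at a time, while loc and iloc also accept lists and slices.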
Very good. Shape gives me the number of rows and columns of the table. Columns returns the column labels of my data. Then we have understood label-based and integer-based indexing for accessing elements. sort_values sorts the values. Then we will also understand groupby, which groups the data frame based on one or more columns. That we will understand as we move along. Getting my point? Apply, merge, plot and drop. Clear? So now let's take this forward. Import pandas as pd. Again I've created my data frame. So now, please look here, learners, if I explicitly pass the parameter 2 to head over here, then it will only give me the first two rows. If I pass 1 to tail over here, it will give me the last row only. Info gives me the data frame summary, and shape is the number of rows and columns. Is everybody able to understand the code and run the code? Yes, learners? Clear. We also need to discuss the statistical operations. Statistics is the heart of data science. I've been saying it from day one, right? Without statistical operations, you cannot get deeper insights into the data. So what are the different statistical operations that we can do? One of the favorite functions is describe. We all understand describe. We have understood it pretty well. Calculate the mean, the median and the standard deviation. Right. The same functions were available in the NumPy library also, using np.mean, np.median and np.std for standard deviation. Yes, learners. Can you tell me the difference between mean and median? Exactly. Absolutely correct. Mean is the average value. Median is the middle value of the data. Are you getting this point? And look at the beauty. If I use the data frame's mean, it automatically gives it for each of the columns. Do you see that? And if I want to see it for rows, then I can pass axis equal to 1.
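A small sketch of the column-wise and row-wise statistics described above, assuming a made-up two-column data frame (the names `a` and `b` and the numbers are illustrative only):

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 60]})

col_means = df.mean()          # default axis=0: one mean per column
row_means = df.mean(axis=1)    # axis=1: one mean per row
col_medians = df.median()      # middle value of each column
col_std = df.std()             # sample standard deviation (ddof=1)

print(df.head(2))              # first two rows
print(df.tail(1))              # last row only
print(df.shape)                # (rows, columns)
print(col_means, row_means, col_medians, col_std, sep="\n")
```

One detail worth knowing when comparing with NumPy: `df.std()` uses `ddof=1` by default, while `np.std` uses `ddof=0`, so the two can give slightly different numbers on the same data.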
Yeah, I can show it for each row again. See, by default it is axis equal to zero, so there will be no change in the output. Look over here: axis equal to zero means I'll get it for each column. Yeah. And the moment I change the axis, I get it for each row. Now clear, Yogesh? Yeah. Another important kind of analysis is correlation analysis, and this correlation analysis is completely based on Pearson's coefficient of correlation. Yes, learners. Are you getting this point? Okay. So Karl Pearson's coefficient of correlation will have values ranging from -1 to +1. Okay. And here, this is the formula that we are supposed to understand. Please try to understand. And we need to understand the mathematical intuition, not the formula behind it. Okay, sorry. We need to understand just the formula behind it, not the mathematics part of it. I'm really sorry. So for the xi values we take the difference from the average of x, then for yi we take the difference from the average of y, right, and then we are able to do that. Right? So if you look at this, what does +1 mean? +1 means perfect positive correlation. What does it mean? That there is a direct, proportional relationship between x and y: when x increases, y also increases; when x decreases, y also decreases. Right? What is negative correlation? That they share an inverse relationship: if x increases, y decreases, or if x decreases, y increases. Getting my point? Are you all getting this point or not? Yes. Learners, are you there? And when r is equal to zero, that means there is no linear relationship between x and y. So the closer the value is to +1, the stronger the positive relationship, and the closer the value is to -1, the stronger the negative relationship. Getting my point, learners, right? And which is the Python function which helps us to find correlation? It is the df.corr function. Which is the function? It is the df.corr function.
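To connect the formula with the function, here is a hedged sketch that computes Pearson's r by hand (deviations from the mean of x and y, multiplied and normalised) and compares it with the built-in pandas result. The two small series are invented for illustration.

```python
import pandas as pd

# Hypothetical paired observations.
x = pd.Series([1.0, 2.0, 3.0, 4.0])
y = pd.Series([2.0, 4.0, 5.0, 9.0])

# Deviations of each value from its own average.
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r: sum of products of deviations, divided by the
# square root of the product of the summed squared deviations.
r_manual = (dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5

# pandas computes the same coefficient (Pearson is the default method).
r_pandas = x.corr(y)

print(r_manual, r_pandas)
```

Both numbers agree up to floating-point rounding, and the result always lies between -1 and +1.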
So if this is my data frame with maths, physics and history marks, right, and I run df.corr, it will always and always create a square matrix, n by n. Why? Because here there are three columns, so it will create a 3x3 matrix. The diagonal elements will always be one. Please be clear. I have a perfect relationship with myself: maths with maths, physics with physics, history with history. But there is more valuable information added over here: maths and physics share a very strong positive relationship, whereas maths and history, and history and physics, share a negative relationship. Got my point? Is this point getting clear to everyone? Yes, learners. Are you all getting this point? Right. This point is clear. No, we are not going to discuss regression. That's part of machine learning. I'd say we will not discuss that. So getting back to the code, how do we go about it? Here I've created my own data frame, and when I run df.corr, this is the output, the correlation matrix. Getting my point, learners? So I've made a few changes in the data. So here you can see that again the diagonal elements are all one, there is a negative relationship between column 1 and column 2, a positive relationship between column 1 and column 3, right, and a perfect negative relationship between column 3 and column 2. Yeah, I've made a few changes in the data values. You can check it out and you can get this output. Clear to everybody? Right. Then there is another important function, value_counts. This function tallies the occurrences of unique values. It is very, very useful for categorical data. How do we go about it? If this is my category column with A, B, C, D, it gives me each category along with the number of times it is repeated. Clear? Yes.
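A minimal sketch of the correlation matrix and `value_counts` described above. The marks and region values below are invented so that maths and physics move together while history moves the other way; they are not the instructor's numbers.

```python
import pandas as pd

# Hypothetical marks for four students.
marks = pd.DataFrame({
    "maths":   [90, 80, 70, 60],
    "physics": [88, 79, 72, 58],
    "history": [55, 65, 70, 85],
})

# Square (3 x 3) matrix of pairwise Pearson correlations.
corr = marks.corr()
print(corr)

# Tally how often each category occurs (region names are placeholders).
regions = pd.Series(["East", "North", "East", "West", "South", "East", "North"])
counts = regions.value_counts()
print(counts)
```

The diagonal of `corr` is 1 because every column is perfectly correlated with itself, and the matrix is symmetric, so only one triangle carries new information.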
Learners, are you all getting this point? Right. So, if this is the region, right? Right. So the how many categories are there? East, north, south, west. And then it gives me the number of time it has been repeated. Clear. Value counts function is also clear. Can we begin with the next uh topic? Are we all good to go? Ruby, Mutu, Deepak, Navd, who all are that this is the first session. Who all are the ones? 2B is there and I think so it was Yogesh also. So I would just uh you know suggest you that please go through the recordings and if you feel any difficulty then you can get back to me. Okay, Ruby you understood then till now that's great can we move forward then Ruby let's say can I can we can I move with now 4.03 03 learners. Okay, thank you. Thank you so much. What about others? Nav deep Yogesh but you're catching up man. Okay, that's that's okay. Okay, great. Now we need to understand the most important aspect of real time data is the date time module. If we talk about real time data, every data will contain date or time as one of its column. Right? Whether it's aviation industry, what time did the flight depart? What time did the flight come? Whether it's retail industry, at what time did you buy those grocery item, whether it's our session, how long the session is going from this time to that time. So everywhere date and time logs are created. Agreed? Agreed learners or not? Right. and pandas library you know the datetime module is very general to pi python but pandas you know especially pandas library because here we deal with date time uh data a lot therefore it is our duty to convert the data into its special data type we don't want it to keep it as object data type so whenever any CSV file or an excel file is loaded Right. The by default the date type the data type of date column is always object data type. Clear? Right. It is always of the object data type. Right? And it is our duty to convert it into date time data. What is the advantage? 
What is the advantage? If I want to know what the time will be after 1 week, after 3 hours, after 1 year, that is where the datetime module will help me. Clear? Are you getting this point or not, learners? So in Python there is this module, datetime, right, which has different classes in it. The date class gives all the access to the date. The time class gives all the access to time. datetime is a combination of date and time. timedelta is the class which helps us to perform arithmetic operations on datetime objects. Please be clear with this particular point: it is timedelta which allows us to do mathematical operations on the datetime object. And then we have tzinfo, which is known as the time zone info. Clear? Yes, there is a specific library for it. I'm just telling you that we have this datetime; that's the module. We will import the datetime module. Then we will specifically load the date class and then call the today function. All right. Right. So the creation of datetime objects can be through the date or datetime class, getting the current date or time, conversion of object data type, the pandas date_range function, or strptime, the string parse time. We can convert a string to a datetime object also. Okay. So how do we go about it? Let's get back to the file. Okay. Now please write today's date. Today's date is 7-3-2026. All right. And if I print this date, it gives me the value -2022, and it says it is of class int. Right? So am I able to add any value? If I want tomorrow's date, it does not give me the correct answer. And even if I put it in a string, then it does not even allow me to add the value. Do you see this? By default there is no datetime object. How can you create a datetime object? A datetime object can be created by importing the datetime library as dt. Please try to understand, right? By importing the datetime library as dt. Got it? Devika, Navdeep, Yogesh, Mani, Ruby.
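The point about plain numbers versus real date objects can be sketched like this. The fixed date 7 March 2026 is taken from the session; the variable names are illustrative.

```python
import datetime as dt

# Typing the date as numbers is just subtraction, so the result is an int.
not_a_date = 7 - 3 - 2026
print(not_a_date, type(not_a_date))

# A real date object, built from year, month, day.
session_day = dt.date(2026, 3, 7)
print(session_day, type(session_day))

# Current date, and current date plus time (values depend on when you run this).
today = dt.date.today()
now = dt.datetime.now()
print(today, now)

# timedelta is what makes date arithmetic possible.
tomorrow = session_day + dt.timedelta(days=1)
print(tomorrow)
```

Adding 1 to an int or concatenating onto a string knows nothing about calendars; adding a `timedelta` to a `date` does.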
And now if I print it, it prints in the format where the year values come first: the year is 2026, the month is 03 and the day is 07. And its type is a date object. Now clear? And similarly I can do datetime.now, which gives me the combination of the date and the present time. Here I'm waiting for your response. I don't want to be in a hurry. Learners, are you there with me? Is everybody able to run this code or not? Everybody's understanding this? Great. But we understand the beauty of pandas, that now if I want to... No, I did not change any format. This is the default format. By default the date objects are displayed as year, month and day. Okay, I have not changed any format anywhere. You want me to change it? Okay. So now I want to change the format of the date object, right? This is what you want me to change, today. Right? So I will use dt dot, you can write it down with me, learners, dt dot strptime, right? Then which is the date I want to change? Today. In which format do I want to change it? I want the format as %d-%m-%Y, with a capital Y. Okay. Am I using the correct function, strptime? Okay, it is a string that I can change. And if I want to format it... Yeah, sorry. We will not use strptime. I'm so sorry. That is only used for parsing a string. We will use strftime. "str takes at most one argument." Okay. Yes, this is how we will do it. Sorry for that. Got it now? Yeah, I'm just copying the code now. You wanted the month first. Yeah, now better. Can I change it to different styles also? Yes, I can do that. Let's change it to a small y. And let me put the month as %B. Yeah. Look at the different formats. Now it's March 7, 2026. Clear? Are you all getting this point, how we can change the format? So the small m, I'll just show you all the points. And this is my capital Y.
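The difference between `strftime` (date to string) and `strptime` (string to date) that came up above can be sketched as follows, using the session's date of 7 March 2026. The month name produced by `%B` assumes the default English locale.

```python
import datetime as dt

session_day = dt.date(2026, 3, 7)

# strftime: format a date object as a string.
day_first = session_day.strftime("%d-%m-%Y")     # capital Y = 4-digit year
month_first = session_day.strftime("%m-%d-%y")   # small y = 2-digit year
long_form = session_day.strftime("%B %d, %Y")    # %B = full month name

print(day_first)
print(month_first)
print(long_form)

# strptime: parse a string into a datetime, given the format it is written in.
parsed = dt.datetime.strptime("07-03-2026", "%d-%m-%Y")
print(parsed)
```

A useful way to remember it: the `f` in `strftime` is for "format", the `p` in `strptime` is for "parse". Note that `%d` is zero-padded, so the long form prints the day as `07`.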
Now clear? Is everybody able to run the code along with me? Okay. So if I get back to the PPT, I think the PPT will give you more clarity. What is the issue? Who says they have an issue? What is the issue? Would this format of the date, which is a string, be used for time series? Yes. Exactly. That's the idea. Not taking %t. Share your screen quickly. Let me check. It does it for all the values now. Clear, Mani? Yeah. And this point is clear. Yeah. Coming to your question, does it sort automatically? So let's put this date as 2026. Let's put it as the 10th month. This is 23 and this is 23. Okay. So let me see, this is the data frame, right? So let me do the sorting for you. No, it will not sort it at the moment. Please try to understand now. Let me explain the concept and then I'll show you the sorting. First try to understand that when I am creating this data frame, what is the data type of this date? By default it is of object data type. Agreed? So even if I want to do sorting of this... Okay, it considers this as a string. Yes. Okay, let me try it. The function is sort_index? No, we want sort_values. It sorts. Is it sorting? Yes, it is. Got it. Yeah. Now from this column, is it possible to fetch just the month? Even if it is a string type? No. It's difficult to extract from a string. But then why do we want a date and time column? So now the beauty is, Mani, that now we will try to convert it into a datetime object. Now you see, by using pd.to_datetime I am able to create this new column of datetime objects. Clear? And then we have this dt accessor. Now suppose I want to use it on the date column. Date is of string type. Does it give me an error? Yes, it will give me an error, right? Because date is of string type or object type.
But the moment I convert it into a datetime object, then by using this dt accessor I can extract the day, the month and the year. Now clear? It does not give me any error. Now suppose you give a wrong value over here. For example, suppose I give this month as 13. Okay. Does it give any error over here? No. Why? Because it is a string. It does not make any difference to it. But the moment I convert it into a date, it will give me an error. That's the beauty of the datetime object: the month has to be in the range 1 to 12 for this particular date. See, it tells me exactly where the problem is. So that's the beauty of the datetime object. The logic is inbuilt, right, and the logic is so beautiful that if I take the month of February, the last day is the 28th, and suppose I give it as 29th February, there was no 29th February this year, so it still gives me an error. Got it? Now that's the beauty of this library and of creating datetime objects. Strings will not do any kind of verification for us. Got it, everybody? Clear, learners? Are you all getting this point? What I'm trying to say is that the datetime object will also check the day against the month. It is so smart. Which month is this, March? March has 31 days. Okay. So let's put it as April, 04, and I pass the day as 31. It will still give me an error. Let me put it over here. See how smart it is. So it's the fourth month and the day is 31, and it gives an error. Got it? Everybody is getting this point? But the moment I give a valid day over here, it gives me the correct answer. Got it?
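The conversion and validation behaviour described above can be sketched as follows. The data frame and the helper `is_valid_date` are hypothetical, introduced only for illustration; `pd.to_datetime` and the `.dt` accessor are the actual pandas features being discussed.

```python
import pandas as pd

# Hypothetical date strings; as loaded, they are plain text, not dates.
df = pd.DataFrame({"date": ["2026-10-23", "2026-03-07", "2026-02-28"]})

# Convert once, then the .dt accessor exposes the components.
df["date_dt"] = pd.to_datetime(df["date"])
df["year"] = df["date_dt"].dt.year
df["month"] = df["date_dt"].dt.month
df["day"] = df["date_dt"].dt.day
print(df.sort_values("date_dt"))


def is_valid_date(text):
    """Hypothetical helper: True if text is a real calendar date (YYYY-MM-DD)."""
    try:
        pd.to_datetime(text, format="%Y-%m-%d")
        return True
    except ValueError:
        return False


print(is_valid_date("2026-13-01"))  # month 13 does not exist
print(is_valid_date("2026-02-29"))  # 2026 is not a leap year
print(is_valid_date("2026-04-31"))  # April has only 30 days
print(is_valid_date("2026-04-30"))  # a real date
```

A string column accepts any of these values silently; the check only happens at conversion time, which is one reason to convert date columns right after loading.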
Now Mani says, "Dates have been very challenging for me. Which are the most important functions we need to keep handy, which convert a string to a date object and vice versa, and then fetch the month or day from the date object?" I'm telling you exactly that, Mani. First of all, conversion of the object type to the datetime object can be done directly using the pandas library. Please use this function, this is the most handy function: pd.to_datetime. Okay, this is the first function. And if you want to make changes to the format of the date and time, then you can use strftime, which I explained over here. Okay. Second point is clear. Third point, you are saying that you want to access a specific month and day out of it. Then once you have created the datetime object, use this dt accessor for day, month and year. Now clear? Are all three of your queries clear? Yeah. Now another important thing that I want to tell you is this. Suppose, Mani, you are interested in creating a roster, right, where you don't want the weekend values to be entered; you only want the values to be from Monday to Friday. That kind of logic can also be there, right? So when you say that you only want the days from Monday to Friday, that logic you have to implement over here. For example, let's start with the 7th, because we understand today's scenario: we are on the 7th of March today, and within 1 week these are the values that will be generated, right, from the 7th to the 13th of March. Agreed? Now when I want to extract the weekday value, by default this datetime library will put the value zero for Monday. The 9th is Monday. Then Tuesday, Wednesday, Thursday, Friday, and for Saturday and Sunday it takes the values five and six. This point is getting clear? So what logic will you apply? We will apply the logic ourselves. It will not do everything.
We will apply the logic that if the weekday value is equal to 5 then it is a weekend, or if it is greater than 5 then also it is a weekend; all the rest of the values are weekdays. Agreed? Have you understood this concept? Yeah, that's the logic that they have put in. You can say that, yes, the Python calendar values start from zero, right? And it is not only for this week. Suppose I give the values for 10 days, right, then again for the next Monday the value will be zero. For all Mondays the value will be zero. Got my point? And for all weekends the values will be five and six. Is this point getting clear to everybody? So you will have to put in certain logic yourself if the functions are not there. Now clear? And another important thing that you need to do is addition and subtraction of dates, right? Now suppose you want the days after one week. What is the day before and what is the day after? Suppose I want to run this. What is the error? It's not day, it's days. Okay. Yeah. So for the 7th of March, the previous day was the 6th and the next day is the 8th of March. For the 8th it works the same way. And suppose I change the date so that we understand it better. So we start at the 28th of February and then try to find out the values. So do you see, for the 28th of February, the previous date it is saying is the 27th of February, and the next date it is automatically telling me is the 1st of March. I have not put in any logic. Clear? So this is what you want to play around with. Now it's going to be easier once you load the data with this datetime library. Learners, are you all getting this point? So Timedelta in pandas helps us to create durations, or differences between dates or times. So this is the class which helps us to perform mathematical operations on dates, right?
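The weekday rule and the day-before/day-after arithmetic discussed above can be sketched like this. The dates come from the session (the week starting 7 March 2026, and 28 February 2026); the variable names are illustrative.

```python
import datetime as dt

# One week starting on 7 March 2026.
start = dt.date(2026, 3, 7)
week = [start + dt.timedelta(days=i) for i in range(7)]

# weekday(): Monday is 0 ... Sunday is 6, so 5 and 6 are the weekend.
labels = {}
for day in week:
    labels[day] = "weekend" if day.weekday() >= 5 else "weekday"
    print(day, day.weekday(), labels[day])

# Roster with working days only.
working_days = [day for day in week if day.weekday() < 5]
print(working_days)

# Day before and day after; note the keyword is days, not day.
feb_28 = dt.date(2026, 2, 28)
previous_day = feb_28 - dt.timedelta(days=1)
next_day = feb_28 + dt.timedelta(days=1)
print(previous_day, next_day)
```

The month rollover from 28 February to 1 March needs no extra logic, because `timedelta` arithmetic already follows the calendar.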
So how am I creating datetime objects over here? Using pd.date_range. The start value is the 1st of January 2023 with frequency H, and these are the values that I get. Are you all getting this point, learners? The frequency specifies that the 10 values will be created one after each hour, right? And the beauty is, if I give the frequency as D, it will create them on a daily basis. Now if I give the frequency as month end, it will create them for January, February, March, then April, May, June, for 10 months, right? So everything can be created with this date_range function of the library. Clear? Is this point getting clear to everyone? Yeah. So now, moving ahead, we want to add a delta of 3 days onto the dates. So now when I run this code, the 31st of January plus 3 days becomes the 3rd of February. Similarly, the 28th of February plus 3 days becomes the 3rd of March. Clear? Is this point getting clear to everybody? So how can you perform arithmetic operations on the datetime object? Timedelta objects can be used to perform arithmetic operations on dates. For example, adding a timedelta to a date results in a new date. Right? And it's not only about days. You can add weeks, days and hours also. So again, if my starting date is the 31st, you can add the next week along with a number of days and hours. Got it? It will do the mathematics and all the logic for you. Got my point? Is this point getting clear to everybody? How do we learn about the parameters for the date functions? You can check out the online resources, Mani; that would be the suggestion. Search for "datetime timedelta Python". So this is the link; I can share it with you. Got it? Online resources are the best, since everything is available online. You can find out what W means, what RS means, what everything means. Everything can be studied over here. Clear, learners? Is this point getting clear to everybody? Any questions, any doubts over here? So this is the link I've added in this particular file.
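A hedged sketch of the date ranges and timedelta arithmetic described above. To stay independent of the pandas version, it passes offset objects (`pd.Timedelta`, `pd.offsets.MonthEnd`) as the frequency instead of the string aliases used in the session, since the spelling of the hourly and month-end aliases has changed between pandas releases. That substitution is my assumption for portability, not what was typed in class.

```python
import pandas as pd

# Ten values, one per hour, starting 1 January 2023.
hourly = pd.date_range(start="2023-01-01", periods=10, freq=pd.Timedelta(hours=1))

# Ten values, one per day.
daily = pd.date_range(start="2023-01-01", periods=10, freq="D")

# Ten month ends: 31 Jan, 28 Feb, 31 Mar, ...
month_ends = pd.date_range(start="2023-01-01", periods=10, freq=pd.offsets.MonthEnd())

# Adding a 3-day delta shifts every date in the index.
shifted = month_ends + pd.Timedelta(days=3)

# Weeks, days and hours can be combined in one delta.
later = pd.Timestamp("2023-01-31") + pd.Timedelta(weeks=1, days=2, hours=5)

print(hourly)
print(daily)
print(month_ends)
print(shifted)
print(later)
```

The month boundaries (31 January plus 3 days, 28 February plus 3 days) are handled by the library, not by any logic in the code.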
Now clear? Are you able to make the changes along with me? Everybody is getting this point? And then we move on to the resampling of time series. This point is clear, how arithmetic operations can be performed. Then there is what is known as resampling the time series data. Right, Mani, sometimes you want to increase the frequency or resample it. So time series often come with irregular time intervals. Resampling is the process of changing the frequency of time series data, either upsampling or downsampling. So again, you can find out how resampling can be done, right? So this is how the resampling can be done. That will give you more of an idea, since you maybe work more on datetime. Can you quickly share, Mani, what is your background, where are you using this, are you working as an analyst at the moment? Learners, are you understanding this point? Yes, Mani, can I hear about that? So here, what are we doing? We are setting the index to date with inplace equal to True, which means we'll make the changes in the original data. Then select_dtypes with include number, resample with D, and find out all the values. So what is it doing? Let me run it. So these are my values over here. It has not been able to give me the sum of the values. Let me put it to hours only. I'm again changing the data back. Right? So when you take the values, it will give me the sum. You can take the sum, the average, whatever you want to take. Right? Okay, Mani, got it? Clear to everybody? Are we clear with this datetime module? Resampling the data values means, suppose you have these values for the previous months and you want to aggregate all of them, then you can use resampling to calculate the sum of all the values for each particular date: resample it, take the summary of the data. Got it? Are you getting this point? Right.
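The resampling steps just described (set the date as index, keep numeric columns, resample by day, aggregate) can be sketched as follows. The hourly data, the column names and the `label` text column are invented for illustration.

```python
import pandas as pd

# Hypothetical hourly readings over two days, plus a non-numeric column.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1)),
    "value": range(48),
    "label": ["x"] * 48,
})

# Resampling needs a datetime index; inplace=True changes df itself.
df.set_index("date", inplace=True)

# Keep numeric columns only, then downsample hourly -> daily.
numeric = df.select_dtypes(include="number")
daily_sum = numeric.resample("D").sum()
daily_mean = numeric.resample("D").mean()

print(daily_sum)
print(daily_mean)
```

Going from hourly to daily is downsampling, so each day needs an aggregation such as `sum` or `mean`; going the other way (upsampling) creates new rows that have to be filled instead.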
So have we unders understood this date underscore range function which helps us to create date time index object belonging to a particular range. How can we extract the date and time components using the DT accessor accessor where we can extract its month date and time also. And then we have also understood the time delta function which helps in mathematical calculations of the data by adding subtracting the days, weeks or the years. Clear? So now if you want to know how how much time a particular flight took from one place to another then by using time delta you can perform these operations. Clear learners. It's non- numeric. It's qualitative. Very good. What are the two types of categorical data? Tell me what are the two types of categorical data? We have quantitative data as well as quality. Oh, sorry. We have uh ordinal data as well as nominal data. Ordinal as well as nominal data. No. Do we understand ordinal and nominal? Do we understand that man? No. Not qu. Yeah. They categorical data is also known as qualitative and we convert it into binary. But what is ordinal data? Something which is in order. For example, the grades. Exactly. The marking, the ranking, they all are ordinal data, right? And when we talk about nominal, there is no ranking. For example, postal codes or gender. Why is it necessary to analyze categorical data separately in data science or analysis? Tell me why is it important to analyze it separately? Because machine learning algorithms please try to understand because machine learning algorithm do not understand them separately. Okay, you have you they don't understand your eyes colors are red, green or brown or you're male or female. They need to convert it into binary data. Absolutely correct. Absolutely correct. we need to convert it into binary data and there are functions available in Python. I've given you a quick snapshots of that. If the data is nominal, we generally use one hot encoding. 
In pandas, we use the function get_dummies, which is very, very popular. We use it in a lot of machine learning algorithms. And in scikit-learn we use the OneHotEncoder. Clear? Then for label encoding we use the pandas cat codes, that's converting it into categorical values, or we can also use scikit-learn's LabelEncoder. So scikit-learn is the library that we use for machine learning. I've just given you an insight into that. But here we will understand the pandas categorical codes. How does it work? Let's understand. Say we have a column in our data set with different categories, such as apple and orange. By default the data type of this fruit column is object, right? But the moment I do the conversion with astype category, I get this output. Clear? So we get two categories, that is apple and orange. It looks the same, but internally it has converted the string data type to the categorical data type, automatically detecting the categories. So pandas gives us a function to handle categorical data, which is pd.Categorical. So here I pass the values, such as low, medium, high, low, and I pass the categories in order. So it understands that low is less than medium, and medium is less than high. Clear? Is this point getting clear to everybody? And how do I calculate the frequencies of the different categories? By using the value_counts function. Very good, Mani. Are you getting this point? Navdeep, Yogesh, Devika. Great.
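A minimal sketch of the categorical conversion and the ordered categories described above. The fruit and level values follow the examples named in the session; passing `ordered=True` is my assumption about how the ordering was declared, since without it pandas does not compare categories.

```python
import pandas as pd

# A text column with repeated categories.
fruit = pd.Series(["apple", "orange", "apple", "orange", "apple"])

# Convert to the categorical dtype; values look the same, storage differs.
fruit_cat = fruit.astype("category")
print(fruit_cat.dtype)
print(fruit_cat.cat.categories)

# Ordered categories: low < medium < high.
levels = pd.Categorical(
    ["low", "medium", "high", "low"],
    categories=["low", "medium", "high"],
    ordered=True,  # assumption: needed for comparisons like < to work
)
below_high = (levels < "high").tolist()
print(levels.min(), levels.max(), below_high)

# Frequency of each category.
counts = fruit_cat.value_counts()
print(counts)
```

The order of the `categories` list is what defines the ranking, so it should be written from lowest to highest.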
And how can I do one-hot encoding, for ordinal and nominal data? For the nominal one I can use the pd.get_dummies function on df category, with prefix as category. So this is one-hot encoding. The output will always be in true and false. I've not printed the data frame over here, so let's print the data frame, right? This is my data frame and this is my categorical column. Since the first row contains A, right, the first value will be for A. So let's print it like this; I think that would be better. Please look here, learners. Since A is there, only the first index value for A is true. Its alignment has shifted, so again let me share the code. This is the category. Now look here. Since the first value is A, therefore in the first row the value for A is true. All other values will be false. For the second row, B is true; all others will be false. Got it? One-hot encoding with the get_dummies function is clear. And how can we do label encoding? Simply by changing df category to the category type and taking the categorical codes. And then we get the output as 0, 1, 2, 3, 4. Clear? Very good, Mani. String handling is generally converting lower case to upper case, upper case to lower case, stripping, finding the length. So the pandas library also gives me support for this, right? So if I have created a data frame of all string values, or a column consisting of string values, and I want to know the length of each value in the column, then I can use the str.len function. Getting my point? Is this point getting clear to everybody? So if this is my data frame, right, and I create a new column, length, just by giving a new name and passing this str.len function, it gives the length of each of these words. Getting my point? Are you able to run the code, Yogesh? What will this str.lower function do? Yeah, it will convert all the values into lower case.
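One-hot encoding with `get_dummies` and label encoding with categorical codes, as described above, might look like this. The category letters are placeholders, and the final cast to `int` is only there so the printed result looks the same on pandas versions that return booleans and those that return 0/1.

```python
import pandas as pd

# Hypothetical nominal column.
df = pd.DataFrame({"category": ["A", "B", "C", "A", "B"]})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["category"], prefix="category")
print(one_hot)

# First row is A, so only category_A is set in that row.
first_row = one_hot.iloc[0].astype(int).tolist()
print(first_row)

# Label encoding: each category is replaced by an integer code.
codes = df["category"].astype("category").cat.codes
print(codes.tolist())
```

Codes are assigned in the sorted order of the categories, so they imply an order even when the data is nominal; that is why one-hot encoding is usually preferred for nominal columns.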
And if I give it as upper, what will happen with upper? Yes, Ruby. Got it. Then there is this function known as contains. What will the contains function do? Learners, please check the code and let me know. It will check if a string contains a particular piece of data or not. If the data is part of that string, it gives me true. And it is not necessary that it has to be a whole word. It can be a character also. For example, I give it as l. So hello contains l, world also contains l, and the other two become false. Right, learners, getting my point? Clear, learners? Are we good to go? Right, now let's understand the concept of iteration. What is iteration, learners? Looping. Now why is looping considered more important in pandas? Because pandas represents real-time data, so you want to iterate each row together. We don't want to iterate separate items. Agreed, Mani, Yogesh? We want to iterate the whole row together. So the pandas library has developed different functions to iterate over the rows of a real-time data set. So iteration in pandas typically involves traversing through the rows or elements of the data frame or series. However, it is important to note that direct iteration over data frame rows using Python's for loop is generally discouraged for performance reasons. Why is it discouraged, right? Because we want efficient methods of iteration to access all the elements in the data frame. So therefore several functions have been developed. The first one is iterrows. The second one is itertuples. The third one is items. Right? So let's understand them one by one. Yes, learners, are you there with me? Yes. So we have created a data frame with columns one and two and some values, 100, 200, 300. Yes, learners, are you there with me, getting my point? What does iterrows do?
Please try to understand. iterrows will give index and row. It will yield two values, index and row. For each value of index, it will store the pair as a tuple. Getting my point? Then for the next row, the next pair as a tuple, and for the third row, the next pair as a tuple. What I'm trying to say is, let's understand it over here. I hope these PPTs are also helping you out, to understand things better and to get your concepts right. Okay. So for iteration in pandas, iterrows is generally used for smaller data sets, and it's quite an outdated method right now. We generally use the apply function for complex row-wise transformations. And then we have the vectorized operations. What do we mean by a vectorized operation? A whole-array, element-by-element operation. Clear? Is this point getting clear to everybody? So one of the limitations of iterrows is that it does not preserve the data types across the rows. It can change the data types. Each row is automatically converted to a Series, because a single row is a one-dimensional array, which might lead to type changes if the columns are of mixed data types. Why? Because a Series has one homogeneous data type, whereas the columns of a row can be heterogeneous. So that is the flaw in iterrows. And we also understand that it is less efficient for large data frames compared to vectorized operations or methods like apply or itertuples, especially when performing operations that can be applied to entire columns or the data frame at once. Clear? Got my point? It is generally not recommended to modify a data frame while iterating over it using iterrows, as this can lead to unexpected behavior. Got my point? Are these points clear, the limitations of it and why we avoid it at the moment? Okay. Then let's move on to the iterrows syntax.
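The data type limitation of `iterrows` mentioned above can be seen in a small sketch. The `year` and `sales` columns and their values are invented; the point is only that an integer column comes back as floats when each row is turned into a Series alongside a float column.

```python
import pandas as pd

# Hypothetical data: one integer column, one float column.
df = pd.DataFrame({"year": [2021, 2022, 2023], "sales": [100.5, 200.0, 300.25]})

# iterrows yields (index, row) pairs; each row is a Series with ONE dtype,
# so the integer year is upcast to float to sit next to the float sales.
iterrows_years = []
for index, row in df.iterrows():
    iterrows_years.append(row["year"])

# itertuples yields namedtuples and keeps each column's own type.
itertuples_years = []
for record in df.itertuples(index=True):
    itertuples_years.append(record.year)

print(iterrows_years)
print(itertuples_years)
```

If a loop over rows really is needed, `itertuples` avoids this silent type change and is usually faster as well.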
The syntax of the iterrows method in pandas is: for index, row_series in df.iterrows(). When I iterate over a DataFrame with iterrows, it automatically gives me two outputs, the index and the row Series. Clear? So if you look at this particular example, for this DataFrame with the columns year and sales, we get the index value and the row as a Series. So when I extract the values for row year and row sales, I get these values. Is this point clear to everybody? Right. Got it, learners? Yeah. This can be repeated.
And see, if this is my DataFrame and I am using the iterrows function, the two values it returns are the index value and the row. index is the index of the current row we are iterating over, and row_series refers to the data of the current row. So what does it return? An iterator that yields tuples, each containing the index and the data of one row. So how do I access them? While iterating, I can access them through the index and the row Series. Now clear, Mani?
Okay. So which other methods are better for iterating over the DataFrame? Another function that we commonly use is df.apply. With df.apply, I want to find the sum of the values when axis is equal to zero. Axis equal to zero means taking all the rows together, so the function is applied to each column. So for one column, if I take the values at index 0, 1, 2, I get the value 27. And for the other one, taking all the rows, I get these values. Clear? That means I don't have to iterate. In this case I don't even have to write an explicit for loop. It will do it for me automatically and give me the result. Clear? And when I give axis equal to one, it refers to taking all the columns together, so the function is applied to each row. Clear? Getting my point, learners?
itertuples is generally not used either. Generally these two methods are considered pretty outdated. We are not using them. This is just for your information. Okay.
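The two axis settings of df.apply can be sketched as follows. The column names a and b and their values are assumptions for illustration; the first column was chosen so that its sum is 27, matching the figure mentioned in the transcript.

```python
import pandas as pd

# Hypothetical frame; column a is chosen so that its values sum to 27.
df = pd.DataFrame({"a": [8, 9, 10], "b": [1, 2, 3]})

# axis=0: the function receives each column (all rows of that column).
column_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row (all columns of that row).
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(column_sums)  # a -> 27, b -> 6
print(row_sums)     # 9, 11, 13
```

No explicit for loop is written in either case; apply handles the traversal. For a plain sum, df.sum(axis=0) and df.sum(axis=1) give the same results more directly.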
Got my point, Mani? So now let's understand the apply function. So this is my column, and on the existing column I call apply with a lambda function. What is a lambda function in Python? Yes, learners? Yeah, a function with no name and a single expression. So it will take the square of the values. But do we actually require the apply function here? Not really, because now we are looking at vectorized operations, which work element by element. So in this case I can directly use an operator such as the plus sign as well, right? Can I do the operation like this, rather than even using the apply function? Can I directly write it as the column times two? Right. So this is the newer technique, Mani, that I'm talking about: we all use vectorized functions nowadays. Got my point? Yog, Ruby, others, please respond.
And which is the fastest method? Can you tell me, among apply, vectorized operations, and iterrows or itertuples, which one is fastest? Of course the vectorized operation. You all understand that, right, learners? Clear to everybody? And similarly, can we do it over a Series as well? >> Yeah, we can iterate over a Series as well by using Series.items. Clear?
Now let's understand sorting. What is sorting, learners? Arranging the data in ascending or descending order. Right. And what are ascending and descending? Ascending is from lower to higher and descending is from higher to lower.
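The comparison between apply with a lambda and a vectorized operation, discussed earlier in this part of the transcript, can be sketched like this. The column name value and its contents are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical single-column frame.
df = pd.DataFrame({"value": [1, 2, 3]})

# apply with a lambda: the function is called once per element.
squared_apply = df["value"].apply(lambda x: x ** 2)

# Vectorized: the operator acts on the whole column at once.
squared_vectorized = df["value"] ** 2
doubled = df["value"] * 2

# Series.items(): yields (index, value) pairs for a single column.
pairs = list(df["value"].items())

print(squared_apply.tolist(), squared_vectorized.tolist(), doubled.tolist())
print(pairs)
```

Both approaches give the same squares here; the vectorized form is shorter and is usually the faster of the two on large columns.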
Original Description
🔥Data Scientist Masters Program (Discount Code - YTBE15) - https://www.simplilearn.com/in/data-science-course?utm_campaign=T4R0Vd99UXU&utm_medium=Lives&utm_source=Youtube
🔥Partnership is with E&ICT of IIT Kanpur - Professional Certificate Course in Data Analytics and Generative AI (India Only) - https://www.simplilearn.com/iitk-professional-certificate-course-data-analytics?utm_campaign=T4R0Vd99UXU&utm_medium=Lives&utm_source=Youtube
🔥IITG - Professional Certificate Program in Data Analytics and Generative AI (India Only) - https://www.simplilearn.com/iitg-generative-ai-data-analytics-program?utm_campaign=T4R0Vd99UXU&utm_medium=Lives&utm_source=Youtube
In this video on Applied Data Science with Python Full Course 2026 by Simplilearn, we provide a complete guide to learning applied data science using Python with real-world use cases. This course focuses on applying data science concepts to solve business problems. You will learn key topics like data cleaning, data analysis, visualization, and machine learning. The video covers libraries such as NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn. You will also explore concepts like data preprocessing, feature engineering, and model evaluation. The course includes hands-on projects and real-world datasets to build practical skills. It is ideal for students, analysts, and professionals looking to apply data science in real scenarios. You will understand how data science is used in business intelligence and decision-making. This course also highlights career opportunities in data science and analytics roles. If you want practical experience in data science, this course is perfect. Watch this video to learn the complete applied data science roadmap with Python in 2026.
Related Videos:
✅ 1. https://www.youtube.com/watch?v=mnkiYN6qikw
✅ 2. https://www.youtube.com/live/LGCZ-Fhm48c
✅ 3. https://www.youtube.com/watch?v=S8hG_NXDRz8
✅ 4. https://www.youtube.com/watch?v=XTwiahmkc_0
✅ 5. https://www.youtube.com/watch?v=Xhne0Zx