Machine Learning Full Course 2026 | Machine Learning Tutorial | Machine Learning | Simplilearn

Simplilearn · Beginner ·🔢 Mathematical Foundations ·8mo ago

Key Takeaways

This video covers machine learning fundamentals, including supervised and unsupervised learning techniques using scikit-learn and TensorFlow

Full Transcript

Hey there, welcome to our machine learning full course by Simplilearn. You know how your phone seems to read your mind, like when Google finishes typing your sentence before you're done, or Netflix somehow picks the perfect show for your Friday night? That's not magic. That's machine learning, and you're about to learn how it all works. Here's what's crazy: machine learning is literally everywhere around us. It's helping doctors spot diseases earlier, helping banks catch fraud, and even helping your favorite app show you exactly what you want to see. And the best part is that companies are desperately hiring people who understand this stuff. We are talking about real money here. In India, machine learning experts can earn 10 lakh to 25 lakh per annum, while in the US it can go up to $120,000 plus. And don't worry if you're starting from zero, we have got you covered. We'll begin with simple concepts and gradually work up to building actually smart systems. You'll learn cool techniques like how computers recognize patterns, make predictions, and even detect fake news. By the end of this course, you will be the person who knows how to make computers think and learn. So let's get started.
Here's some quick information. If you're interested in launching a high-growth career in artificial intelligence and machine learning, this program might be the best thing you have come across today. The Professional Certificate in AI and Machine Learning offered by Purdue University Online in collaboration with Simplilearn and IBM isn't just another course. It's a complete career-transforming experience. Ranked the number one online AI/ML certification by Career Karma, this program is designed to help you master the most in-demand skills in AI: automation, GPT, generative AI, LLMs, deep learning, agentic frameworks, and so much more. So whether you're just starting out or looking to upskill, you'll get hands-on with 15-plus real-world projects, Hugging Face, TensorFlow, Azure, and even build LLM-based applications.
So what are you waiting for? Hurry up and enroll now. You can find the course link below.
Welcome to Math Refresher: Probability and Statistics. In this lesson, we are going to explain the concepts of statistics and probability, describe conditional probability, define the chain rule of probability, discuss the measures of variance, and identify the types of Gaussian distribution.
Basics of statistics and probability. Data science relies heavily on estimates and predictions. A significant portion of data science is made up of evaluations and forecasts. Statistical methods are used to make estimates for further analysis. Probability theory is helpful for making predictions. Statistical methods are highly dependent on probability theory, and both probability and statistics are dependent on data. Data is information acquired for reference or research via observations, facts, and measurements. Data is a set of facts structured in a form that computers can interpret, such as numbers, words, estimations, and views.
Importance of data. Data helps us understand the information better by identifying possible connections between two features. Data assists in the detection of distortion by uncovering hidden patterns based on prior patterns in the information. Data may be used to anticipate the future or to estimate the current state of affairs. Data also helps in determining whether two pieces of information have anything in common.
Types of data. Data may be quantitative, that is, data that can be measured or counted in numbers, or it may be qualitative, which is data that is generally divided into groups and, in simpler words, cannot be counted or measured in numbers. Let's consider an example. The customer information data of a bank may contain both quantitative and qualitative data. Consider this snapshot, where we have customer ID, surname, geography, gender, age, balance, HasCrCard, and IsActiveMember.
Amongst these variables, surname is qualitative, as it cannot be counted or measured in numbers. Geography and gender are also qualitative, as they cannot be counted in numbers and are mostly groups. HasCrCard (that is, has credit card) and IsActiveMember are numerical in form, but they are categorical: they have been divided into groups of one and zero that represent yes and no as an answer. Hence these two variables are also qualitative. Customer ID is again numerical in form; however, the significance or intuition behind customer ID is categorical, so it may be kept with the qualitative data as well. Age and balance, however, are numerical information that has been measured or counted, and numerical operations can be performed on them. Hence these fall under the quantitative data category.
Introduction to descriptive statistics. A descriptive statistic is a summary measure that quantitatively portrays the most important features of a set of data, allowing for a better comprehension of the information. Data can be measured at different levels. The levels of measurement describe the nature of the information stored in the data assigned to the variables. Qualitative data can be measured as nominal or ordinal. Quantitative data can be measured as interval or ratio type.
Nominal data is categorized using names, labels, or qualities. For example, brand name, zip code, and gender. Ordinal data can be arranged in order or ranked and can be compared. Examples include grades, star reviews, position in a race, and date. Interval data is ordered and has meaningful differences between the data points. Examples are temperature in Celsius and year of birth. Ratio data is similar to the interval level, with the added property of an inherent zero. For example, height, age, and weight. Mathematical calculations can be performed on both interval and ratio data.
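The qualitative versus quantitative split described for the bank snapshot can be written down as a small lookup. This is only an illustrative sketch: the column spellings (`CustomerId`, `HasCrCard`, and so on) are assumed from the spoken names, and treating `Balance` as ratio-level is an assumption, since the transcript only calls it quantitative.

```python
# Levels of measurement for the bank-customer columns discussed above.
# Column spellings are assumed; the level given to "Balance" is an assumption.
LEVELS = {
    "CustomerId": "nominal",      # numeric in form, but only an identifier
    "Surname": "nominal",
    "Geography": "nominal",
    "Gender": "nominal",
    "HasCrCard": "nominal",       # 1/0 stands for yes/no
    "IsActiveMember": "nominal",  # 1/0 stands for yes/no
    "Age": "ratio",
    "Balance": "ratio",
}

QUALITATIVE_LEVELS = {"nominal", "ordinal"}


def data_type(column):
    """Return 'qualitative' or 'quantitative' for a known column."""
    level = LEVELS[column]
    return "qualitative" if level in QUALITATIVE_LEVELS else "quantitative"


def quantitative_columns():
    """Columns on which arithmetic such as a mean is meaningful."""
    return [name for name in LEVELS if data_type(name) == "quantitative"]


print(quantitative_columns())
```

The point of the lookup is that a column being stored as a number (such as the customer ID) does not by itself make it quantitative.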
Population versus sample. Before analyzing the data, it's important to figure out whether it's from a population or a sample. A population is the collection of all available items, that is, every unit in our study. A sample is a subset of the population that contains only a few units of the population. Population data is used for a study when the data pool is very small and can give all the required information. Samples are collected randomly and should represent the entire population in the best possible way.
Measures of central tendency. The central tendency is a single value that aids in the description of the data by determining its center position. Measures of central tendency are sometimes known as summary statistics or measures of central location. The most popular measures of central tendency are mean, median, and mode. The normal distribution is a bell-shaped, symmetrical distribution in which the mean, median, and mode are all equal. The curve here shows the bell-shaped curve, or normal distribution, of a variable X. The point X1 is the point that represents the mean, median, and mode of this distribution.
Mean. The mean is calculated by dividing the sum of all data values by the total number of data values. It is affected by unusual or extreme values; that is, it is sensitive to outliers. The mean can be calculated as the summation over all the values of X in a collection divided by the size of the collection. For example, we have a collection with the values 7, 3, 4, 1, 6, and 7. The sum of these values is 28, and there are six values in total. So 28 / 6 gives us a mean of approximately 4.67.
Median. The median is the middle value in a set of data that has been sorted in ascending order. It is a better alternative to the mean when the data contains outliers or is skewed, since it is less affected by them and stays closer to the actual central value. The median is calculated differently for different sizes of data.
The calculation depends on whether the total number of values is odd or even. If the size of the data is odd, for example in this case where we have five elements, then after sorting, the middle value, that is, the (n + 1) / 2-th term, is the median. In this case (5 + 1) / 2 gives the third term, which is 4, so 4 is the median value. When the total number of values is even, like here where there are six values, the average of the two central values is taken as the median. In this case the median is the mean of 6 and 4, which is 5.
Mode. The mode represents the most common value in the data set. It is not at all affected by extreme observations. It is the best measure of central tendency for highly skewed or non-normal distributions. The mode for categorical data is determined by counting the frequency of each category; the category with the highest frequency is the mode. In this case 7 has the highest frequency, hence 7 is the mode. However, in the case of continuous or quantitative data, the calculation of the mode is slightly different. The first step is dividing the data into classes of equal width, then getting the frequency of data points lying within each class, and finally selecting the class with the highest frequency. Using the range of that class and the frequencies, we can get the final mode value with the formula $\text{mode} = L + \frac{(f_m - f_1)\,h}{(f_m - f_1) + (f_m - f_2)}$. Here $L$ is the lower limit of the modal class, $h$ is the size of the modal class, $f_m$ is the frequency of the modal class, $f_1$ is the frequency of the class preceding the modal class, and $f_2$ is the frequency of the class succeeding the modal class. This gives us the final mode value.
Mean versus expectation. Now let's talk about mean versus expectation.
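As a quick check of the mean, median, and mode calculations covered in this part, here is a minimal sketch using Python's standard `statistics` module on the six values from the transcript. The `grouped_mode` helper and the class frequencies passed to it are hypothetical, added only to show how the grouped-data mode formula is applied.

```python
import statistics

values = [7, 3, 4, 1, 6, 7]  # the collection used in the transcript

mean_value = statistics.mean(values)      # 28 / 6
median_value = statistics.median(values)  # average of the two middle values, 4 and 6
mode_value = statistics.mode(values)      # most frequent value


def grouped_mode(lower, width, f_mode, f_prev, f_next):
    """Mode of grouped data: L + (fm - f1) * h / ((fm - f1) + (fm - f2))."""
    return lower + (f_mode - f_prev) * width / ((f_mode - f_prev) + (f_mode - f_next))


# Hypothetical modal class 20-30 with frequency 12; neighbouring classes have 8 and 6.
example_grouped_mode = grouped_mode(lower=20, width=10, f_mode=12, f_prev=8, f_next=6)

print(round(mean_value, 2), median_value, mode_value, example_grouped_mode)
```

With these inputs the mean is 28 / 6, the median is 5, and the mode is 7, matching the values worked out in the narration.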
In general, we use the expected value, or expectation, when we want to calculate the mean of a probability distribution. It represents the average value we expect to occur before collecting any data. The mean, on the other hand, is used when we want to calculate the average value of a given sample. It represents the average value of raw data that we have already collected. We can understand this with a simple example. To calculate the expected value of this probability distribution, we can use the formula from the previous discussion, $E[X] = \sum x \cdot P(x)$, where $x$ is the data value and $P(x)$ is the probability of that value. For example, we could calculate the expected value for this probability distribution as shown. Here it comes to 1.45 goals. This represents the expected number of goals that the team will score in any given game. We typically calculate the mean, in contrast, after we have actually collected raw data. For example, suppose we record the number of goals that a soccer team scores in 15 different games. To calculate the mean number of goals scored per game, we can use the formula $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all the goals and $n$ is the number of records, or the sample size. The result is as shown on the screen. This represents the mean number of goals scored per game by the team.
Measures of asymmetry. The difference between the three distinct curves can be studied in this image. The central curve is the normal, or no-skewness, curve. Here the mean, median, and mode all lie at the same point. This normal curve is symmetrical about its mean, median, and mode. That means the left-hand side of the curve is a mirror image of the right-hand side.
However, in the case of negatively skewed data, the tail is elongated on the left-hand side, and the mean is smaller than the median and the mode, that is, it lies to the left of the mode. This indicates that the outliers are in the negative direction. In the case of positively skewed data, on the other hand, the data is concentrated on the left-hand side of the curve, while the tail is longer on the right-hand side. The mean is greater than the mode and the median, that is, it lies to the right of them, indicating that the outliers are in the positive direction. Let's consider an example. The graph here shows the global income distribution for the years 2003 and 2013, and a projection for 2035. The global income distribution for 2003 is highly right-skewed. We can observe in the graph that in 2003 the mean of $3,451 was higher than the median of $1,090. Global income is definitely not evenly distributed. The majority of people make less than $2,000 each year, while only a small percentage of the population earns more than $14,000.
Measures of variability. Dispersion. The measures of central tendency provide a single value that stands for the whole data set. However, the central tendency cannot describe the data completely. Measures of dispersion help us focus on the variability in the data; they describe the spread of the data. The range, interquartile range, standard deviation, and variance are examples of dispersion measures.
Range. The range of a distribution is the difference between the largest and the smallest data values. The range does not take all the values of a series into account. It looks only at the most extreme values and ignores everything in between. For example, for the set 13, 33, 45, 67, 70, the range is 57, that is, the maximum, 70, minus the minimum, 13.
Variance.
Variance is the average of all squared deviations. It is defined as the average squared distance between each point and the mean, that is, the dispersion around the mean. For population data, the variance can be computed as $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where $\mu$ is the mean of the data, $x$ is an individual data point, and $N$ is the size of the data. For sample data, the variance can be computed as $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the mean of the sample data and $n$ is the sample size. The variance is expressed in squared units, so its units are not the same as those of the data values. For that reason another variability measure is used.
Standard deviation. Standard deviation is a statistical term used to measure the amount of variability or dispersion around the mean. It is calculated as the square root of the variance and depicts the concentration of the data around the mean of the data set. For population data, the standard deviation can be computed as $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$, where $\mu$ is the mean of the data, $x_i$ are the data points, and $N$ is the size.
Let's consider an example. Let's find the mean, variance, and standard deviation for this data. The data values are 3, 5, 6, 9, and 10. To find the mean, we first find the sum of all these data values, which is 33, and divide it by the count, which is five. We get a mean of 6.6. To compute the variance, we start by computing the deviations, that is, x minus the mean of x. Here 3 is one of the values of the data and 6.6 is the mean, so the first squared deviation is (3 - 6.6) squared. We do that for every value, sum the squared deviations, and divide by the count, which is five. We end up with an overall variance of 6.64.
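The worked example can be reproduced step by step. The sketch below follows the population formula used in the narration (divide by the count) and also shows the sample version (divide by n - 1) for comparison; the variable names are illustrative only.

```python
import math
import statistics

data = [3, 5, 6, 9, 10]  # values from the worked example

mean = sum(data) / len(data)                          # 33 / 5 = 6.6
squared_deviations = [(x - mean) ** 2 for x in data]  # (x - mean)^2 for each value

population_variance = sum(squared_deviations) / len(data)    # divide by N
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1
population_std = math.sqrt(population_variance)              # square root of the variance

# Cross-check against the standard library's population variance.
library_variance = statistics.pvariance(data)

print(round(population_variance, 2), round(sample_variance, 2), round(population_std, 3))
```

Dividing by n - 1 instead of N gives a larger value for the same data, which is why it matters whether the data is treated as a population or a sample.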
The standard deviation, as we know, is the square root of the variance, that is, the square root of 6.64, which is approximately 2.577.
Measures of relationship. Covariance. Covariance is a measure of the joint variability of two variables. It measures the direction of the relationship between the variables, that is, whether one variable tends to change in the same direction as the other. The covariance between variables X and Y can be computed as $\operatorname{cov}(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$, where $\bar{x}$ and $\bar{y}$ are the means of X and Y respectively. The value of covariance can range from minus infinity to plus infinity.
Correlation. Correlation is normalized covariance. It measures the strength of association between two variables. The most common measure of correlation is the Pearson correlation coefficient. The correlation between two variables X and Y can be expressed in terms of covariance as the covariance between X and Y divided by the product of the standard deviation of X and the standard deviation of Y. The value of correlation ranges from negative 1 to positive 1.
Types of correlation. Correlation can be positive, zero, or negative. The first picture here represents a perfect positive correlation, in which a straight line with a positive slope represents the relationship between the two variables. Zero correlation means that the line representing the relationship between the two variables is parallel to the x-axis. Perfect negative correlation can be represented by a straight line with a negative slope. A correlation of 1 implies a perfect positive relationship: when one variable increases, the other variable also increases. A correlation of negative 1 implies a perfect negative relationship: when one variable increases, the other decreases. A correlation coefficient of zero shows that there is no linear relationship between the variables, although they are not necessarily independent of each other. Let's consider an example.
Here we have two variables, height and weight. To compute the correlation between height and weight, we use the correlation formula: the covariance of X and Y divided by the product of the standard deviation of X and the standard deviation of Y. Here height is the X variable and weight is the Y variable. First, to compute the covariance, we compute the $x - \bar{x}$ and $y - \bar{y}$ values and then their product. We then compute the $(x - \bar{x})^2$ and $(y - \bar{y})^2$ values to get the standard deviations of height and weight respectively. Since correlation is defined as the covariance of x and y divided by the standard deviations of x and y, it can also be written as $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$, where the two square roots contain the sums of squared deviations for x and y. Now let's find the values to put into this formula. First we find the overall sum of height to get the mean of height, which is 5.14. Similarly, we get the sum of weight to get the mean of weight, which is 50. We now compute the summation of $(x - \bar{x})(y - \bar{y})$ to get the numerator of the formula. Then we compute the summations of $(x - \bar{x})^2$ and $(y - \bar{y})^2$, that is, the sums of squared deviations of x and y respectively. Putting these values into the final correlation formula gives a correlation of 0.889. This indicates that height and weight have a positive relationship: in this data, as height grows, weight also increases.
In this module, we will be talking about expectation and variance. The expected value, or mean, of a discrete random variable X is a weighted average of the possible values that X can take, with each value weighted according to the probability of that event occurring.
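The Pearson correlation formula from the height and weight example can be sketched in a few lines. The table from the video is not part of the transcript, so the `heights` and `weights` lists below are hypothetical stand-ins and will not reproduce the 0.889 figure; only the formula itself follows the narration.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / (sqrt(sum((x - x_bar)^2)) * sqrt(sum((y - y_bar)^2)))."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / (math.sqrt(sum_xx) * math.sqrt(sum_yy))


def sample_covariance(xs, ys):
    """cov(X, Y) = sum((x - x_bar)(y - y_bar)) / (n - 1)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical data, not the values shown in the video.
heights = [4.8, 5.0, 5.2, 5.4, 5.6]
weights = [42, 48, 50, 53, 57]

r = pearson_correlation(heights, weights)

# The same value via covariance divided by the product of the sample standard deviations.
r_from_cov = sample_covariance(heights, weights) / (
    statistics.stdev(heights) * statistics.stdev(weights)
)

print(round(r, 3))
```

Both routes give the same number because the n - 1 factors in the covariance and in the two standard deviations cancel.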
Usually the expected value of X is denoted by a simple formula, $E[X] = \sum x \cdot P(X = x)$, which is the sum of each possible outcome multiplied by the probability of that outcome occurring. In more concrete terms, the expectation is what we would expect the outcome of an experiment to be on average. We can take the example of a coin. If a coin is tossed 10 times, then on average we expect five heads and five tails. The same logic applies to another example, rolling a die. There are six possible outcomes when you roll a die: 1, 2, 3, 4, 5, and 6, and each of these has a probability of 1/6 of occurring. So the expectation is 1 multiplied by the probability of that happening, which is 1/6, plus 2 × 1/6, plus 3 × 1/6, plus 4 × 1/6, plus 5 × 1/6, plus 6 × 1/6, and that gives us 3.5. The expected value is 3.5. If you think about it, 3.5 is halfway between the possible values the die can take, and this is what we should have expected.
Next we talk about the concept of variance. The variance of a random variable tells us something about the spread of the possible values of the variable. For a discrete random variable X, the variance of X is given by $\operatorname{Var}(X) = E[(X - m)^2]$, where $m$ is the expected value of X. The standard deviation of X is the square root of this variance. The variance does not behave in the same way as the expectation when we multiply random variables by constants or add constants to them. Now there are two different kinds of variance that we should understand: low variance and high variance.
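The die example and the variance definition for a discrete random variable can be checked exactly with fractions. This is a minimal sketch assuming a fair six-sided die, as in the narration.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
probability = Fraction(1, 6)  # each face of a fair die is equally likely

# E[X] = sum of outcome * probability
expected_value = sum(x * probability for x in outcomes)  # 7/2

# Var(X) = E[(X - m)^2], where m is the expected value
variance = sum((x - expected_value) ** 2 * probability for x in outcomes)  # 35/12

print(float(expected_value), float(variance))
```

Using `Fraction` keeps the arithmetic exact, so the expected value comes out as 7/2 rather than a rounded float.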
Low variance simply means that there is a small variation in the prediction of the target function with changes in the training data set. High variance, as we can see here, means a large variation in the prediction of the target function with changes in the training data set. A model that shows high variance learns a lot and performs well on the training data set, but it does not generalize well to unseen data. As a result, such a model gives good results on the training data set but shows high error rates on the test data set. Since a high-variance model learns too much from the data set, it leads to overfitting of the model. So a model with high variance will have a couple of issues: it may lead to overfitting, and it may also lead to an increase in model complexity.
Next we have skewness. Skewness, in simple terms, is a measure of the asymmetry of a distribution. A distribution is asymmetrical when its left and right sides are not mirror images. Right now this one is a mirrored image. A distribution can have right (positive) skewness, left (negative) skewness, or zero skewness. A right-skewed distribution is longer on the right side of its peak, and a left-skewed distribution is longer on the left side. So we can see that this one is more elongated towards the right side, and this one is more elongated towards the left side. We can think of skewness in terms of tails. A tail is the long, tapering end of a distribution. It indicates that there are observations at one end of the distribution but that they are relatively infrequent. A right-skewed distribution has a long tail on the right side, as you can see here. Consider the numbers observed. Let's say we have data on a per-year basis.
We can have skewness towards the right side, where the data drops off as we continue to increase the number of years. For example, we may have high sales towards the beginning of a year, suppose in 2022, but as we proceed to the second half of 2023, we see a dip in performance. That is right-skewed. In the same way, suppose the sales figure was really low to start with, say in 2002, but as we proceeded to 2023 our sales gradually increased. That is skewed towards the left, a negative skew.
Next we have kurtosis. Kurtosis is a measure of the tailedness of a distribution. Tailedness is how often outliers occur, and excess kurtosis is the tailedness of a distribution relative to a normal distribution. A distribution with medium kurtosis is called mesokurtic. A distribution with low kurtosis, like this one, is called platykurtic, and a distribution with high kurtosis, like this one, is called leptokurtic. Tails are the tapering ends on either side of a distribution. They represent the probability or frequency of values that are extremely high or extremely low compared to the mean. In other words, tails represent how often outliers occur. So there are three types of kurtosis: platykurtic, which has negative excess kurtosis; leptokurtic, which has positive excess kurtosis; and mesokurtic, which matches a normal distribution. Mesokurtic means medium-tailed. Normal distributions have a kurtosis of three, so any distribution with a kurtosis of approximately three is mesokurtic. Kurtosis is often described in terms of excess kurtosis, which is kurtosis minus 3. Since normal distributions have a kurtosis of three, excess kurtosis makes comparing a distribution's kurtosis to that of a normal distribution even easier.
Introduction to probability. Probability theory.
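Skewness and excess kurtosis, as described in this part, can both be computed from central moments. The sketch below uses the simple population (moment-based) definitions, which is one common convention among several; the `tolerance` used to call a distribution mesokurtic and all sample lists are hypothetical choices for illustration.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean)^k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Positive for a longer right tail, negative for a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def kurtosis_type(values, tolerance=0.5):
    """Label the tailedness; the tolerance band is an arbitrary illustrative choice."""
    excess = excess_kurtosis(values)
    if excess > tolerance:
        return "leptokurtic"
    if excess < -tolerance:
        return "platykurtic"
    return "mesokurtic"


right_skewed = [1, 1, 1, 2, 10]       # hypothetical: long tail on the right
flat = [1, 2, 3, 4, 5]                # hypothetical: symmetric, light tails
heavy_tailed = [0] * 8 + [10, -10]    # hypothetical: rare extreme values

print(skewness(right_skewed) > 0, kurtosis_type(flat), kurtosis_type(heavy_tailed))
```

Statistical packages often apply small-sample corrections to these quantities, so their results can differ slightly from this plain moment-based version.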
Probability is a measure of the likelihood that an event will occur. Let's consider the example of a coin toss, where the chance of getting heads is 1/2, or 50%. The probability of any given event is between zero and one, both inclusive. The probabilities of all possible outcomes together cannot add up to more than one. Hence the probability of an event X lies between zero and one, and the integral of the probability distribution over X equals 1.
Conditional probability. The conditional probability of an event A is defined as the probability of occurrence of A given that event B has already occurred. The conditional probability of A given B can be computed as the probability of A intersection B, that is, the probability of both A and B happening together, divided by the probability of B: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. It can also be written as $P(A \cap B) = P(A \mid B)\,P(B)$. Let's consider an example. We are flipping two coins. Coin one gets heads, tails, heads, and tails in subsequent flips, while coin two gets tails, heads, heads, and tails in subsequent flips. The probability that coin one will get heads is two out of four, and the probability that coin two will get heads is again two out of four. The probability that both coin one and coin two show heads is just one out of the four flips. Hence the probability that coin one gets heads given that coin two is already heads can be computed as the probability of coin one heads intersection coin two heads, that is 1/4, divided by the probability of coin two heads, that is 2/4, which gives 0.5 or 50%.
Bayes' theorem. Bayes' theorem calculates the conditional probability of an event based on its prior probabilities. Basically, Bayes' theorem incorporates the prior probability distribution to predict the posterior probabilities.
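The two-coin example of conditional probability can be verified by counting directly over the four recorded flips. This is a small sketch that treats the four flips as equally likely observations, as the narration does.

```python
from fractions import Fraction

coin_one = ["H", "T", "H", "T"]  # flips of coin one, in order
coin_two = ["T", "H", "H", "T"]  # flips of coin two, in order
flips = len(coin_one)

# P(coin two is heads)
p_two_heads = Fraction(sum(c == "H" for c in coin_two), flips)

# P(coin one is heads AND coin two is heads)
p_both_heads = Fraction(
    sum(a == "H" and b == "H" for a, b in zip(coin_one, coin_two)), flips
)

# P(coin one heads | coin two heads) = P(both) / P(coin two heads)
p_one_given_two = p_both_heads / p_two_heads

print(p_two_heads, p_both_heads, p_one_given_two)
```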
Bayes' theorem for conditional probability can be expressed as $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. Bayes' theorem allows us to update probability values by using new information or evidence. Here $P(A)$ is known as the prior probability, that is, the probability of the event before any new data is collected. $P(A \mid B)$ is known as the posterior probability. It is the revised probability of the event occurring after taking the new information into consideration. $P(B \mid A)$ is known as the likelihood, and $P(B)$ is the probability of observing the evidence B.
Consider an example of calculating the likelihood of having diabetes based on the frequency of fast food consumption. Here is the observed data. Let's say 20% of people eat fast food, diabetes prevalence is 10%, and 5% both eat fast food and have diabetes. The chance of diabetes given fast food, that is, the conditional probability of D given F, can be calculated as the probability of diabetes and fast food together divided by the probability of fast food. That means 5% divided by 20%, which equals 25%. The final analysis can state that among people who eat fast food the chance of having diabetes is 25%, compared with 10% in the population overall.
The multiplication rule of probability. In general, the probability of A intersection B is given by $P(A \cap B) = P(A \mid B)\,P(B)$. If events A and B are statistically independent, then $P(A \mid B) = P(A)$, assuming $P(B)$ is nonzero, and similarly $P(B \mid A) = P(B)$, assuming $P(A)$ is nonzero, so the probability of A intersection B is simply $P(A)\,P(B)$.
Chain rule of probability. Joint probability distributions over many random variables can be decomposed into conditional distributions over a single variable.
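The diabetes and fast food numbers can be run through both the conditional probability definition and Bayes' theorem to confirm that they agree. The sketch also applies the multiplication rule as an independence check; the variable names are illustrative.

```python
from fractions import Fraction

p_fast_food = Fraction(20, 100)  # P(F): share of people who eat fast food
p_diabetes = Fraction(10, 100)   # P(D): diabetes prevalence (the prior)
p_both = Fraction(5, 100)        # P(D and F)

# Direct definition: P(D | F) = P(D and F) / P(F)
p_diabetes_given_fast_food = p_both / p_fast_food

# Bayes' theorem: P(D | F) = P(F | D) * P(D) / P(F)
p_fast_food_given_diabetes = p_both / p_diabetes  # the likelihood
posterior = p_fast_food_given_diabetes * p_diabetes / p_fast_food

# Multiplication rule: the events are independent only if P(D and F) == P(D) * P(F)
independent = p_both == p_diabetes * p_fast_food

print(p_diabetes_given_fast_food, posterior, independent)
```

Here the joint probability (5%) differs from the product of the marginals (2%), so in this data the two events are not independent.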
It can be expressed as $P(x_1, x_2, \ldots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$. For example, the joint probability of A, B, and C can be given as $P(A, B, C) = P(A \mid B, C)\,P(B \mid C)\,P(C)$.
Logistic sigmoid. The logistic function is a type of sigmoid function that is used to predict the class to which a particular sample belongs. Its output is a probability between zero and one, which can then be mapped to a discrete binary value. The logistic sigmoid is a useful function that follows an S-shaped curve. It saturates when the input is very large or very small. The logistic sigmoid is expressed as $\sigma(x) = \frac{1}{1 + e^{-x}}$, where $e$ is Euler's number.
Gaussian distribution. The Gaussian distribution is a type of distribution in which data tends to cluster around a central value with little or no bias to the left or right. It is often referred to as the normal distribution. In the absence of prior information, the normal distribution is frequently a fair assumption in machine learning. The formula for the Gaussian distribution of X, given mean $\mu$ and variance $\sigma^2$, is $\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mean or peak location, which is also the expected value of X, $\sigma$ is the standard deviation, and $\sigma^2$ is the variance. A standard normal distribution has a mean of zero and a standard deviation of one. A Gaussian distribution can be univariate, describing the distribution of a single variable X. It can also be multivariate, where it is used to describe the distribution of several variables and is represented in 3D or N-D form.
Law of large numbers.
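The logistic sigmoid and the univariate Gaussian density described above translate directly into code. This is a minimal sketch using only the standard library; the branch on the sign of `x` in `sigmoid` is an added numerical-stability detail, not something from the lesson.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)  # safe: x is negative, so exp(x) is at most 1
    return exp_x / (1.0 + exp_x)


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The sigmoid saturates towards 0 and 1 for very small and very large inputs.
print(sigmoid(-10), sigmoid(0), sigmoid(10))

# The standard normal density peaks at the mean and is symmetric around it.
print(gaussian_pdf(0.0), gaussian_pdf(1.0), gaussian_pdf(-1.0))
```

With the defaults `mu=0.0` and `sigma=1.0`, `gaussian_pdf` is the standard normal density mentioned in the narration.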
Now let's talk about the law of large numbers. The law of large numbers states that an observed sample average from a large sample will be close to the true population average, and that it will get closer the larger the sample. The law of large numbers does not guarantee that a given sample, especially a small sample, will reflect the true population characteristics, or that a sample that does not reflect the true population will be balanced by a subsequent sample. In business, the term "law of large numbers" is also used to express the relationship between scale and growth rate. The law is widely used in statistical analysis, alongside the central limit theorem, and in discussions of business growth, so there are multiple real-world settings in which it applies. Take tossing a coin. Each toss has two possible outcomes. Over many tosses, the results spread evenly between heads and tails, and the expected proportion of heads is one half, which over 100 tosses means about 50 tails and 50 heads. In any particular run the counts can deviate from this, but for a fair coin a lopsided result such as 850 heads and 150 tails in 1,000 tosses becomes extremely unlikely. In a small sample, say 10 tosses, the observed proportion of an outcome can be quite different from what we see in a large sample. The numbers of heads and tails are often unbalanced for a low number of trials, but as we toss the coin more times, the observed average leans more and more towards the balanced value.
Next we have the p-value. The p-value is a number calculated from a statistical test that describes how likely we are to have found a particular set of observations if the null hypothesis were true. P-values are used in hypothesis testing to help decide whether to reject the null hypothesis.
And the smaller the p-value, the more likely we are to reject the null hypothesis. So we have this term, null hypothesis. All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between our variables of interest, or that there is no difference among groups. For example, in a two-tailed t-test the null hypothesis is that the difference between the two groups is zero. The p-value tells us how likely it is that our data could have occurred under the null hypothesis. It does this by calculating the likelihood of the test statistic, which is the number calculated by a statistical test from our data. So the p-value tells us how often we would expect to see a test statistic as extreme as, or more extreme than, the one calculated by the test if the null hypothesis were true. There are limitations as well. First, results can be statistically significant without being practically significant. A p-value may show that a variable has an effect, but not the magnitude of that effect in real life; for example, what does a failed drug test actually mean in practice for a pharma company? Therefore, it is recommended to use confidence intervals in addition to p-values to quantify, or give a solid figure to, the results we get. P-values are often interpreted as supporting or refuting the alternative hypothesis, but a p-value can only tell you whether or not the data are consistent with the null hypothesis. It cannot tell us whether our alternative hypothesis is true, or why. Also, the risk of wrongly rejecting the null hypothesis is often higher than the p-value suggests, especially when we are looking at a single study or using small sample sizes.
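As a small worked illustration of a p-value, the sketch below computes an exact two-sided p-value for the null hypothesis that a coin is fair. The function name `two_sided_p_value` and the example counts are assumptions chosen for illustration; a real analysis would normally use a statistics library.

```python
from math import comb

def two_sided_p_value(heads: int, tosses: int) -> float:
    """Probability, under a fair coin, of a heads count at least as far
    from tosses / 2 as the count that was observed."""
    observed_gap = abs(heads - tosses / 2)
    extreme_outcomes = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= observed_gap
    )
    return extreme_outcomes / 2 ** tosses

print(two_sided_p_value(8, 10))      # 8 heads in 10 tosses: weak evidence against fairness
print(two_sided_p_value(850, 1000))  # 850 heads in 1,000 tosses: overwhelming evidence
```

The same proportion of heads gives very different p-values at different sample sizes, which is one reason small studies need cautious interpretation.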
This is because the smaller the frame of reference, the greater the chance that we stumble across a statistically significant pattern completely by accident. Key takeaways. Probability and statistics form the foundation for working with data. The data helps in anticipating the future or making estimates based on past patterns of information. A measure of central tendency is a single value that helps to describe the data by identifying its central position. The mean, median and mode are the measures of central tendency. The distribution where the data tends to cluster around a central value, with no bias or minimal bias toward the left or right, is called the Gaussian distribution.
>> Mathematics for machine learning. My name is Richard Kersner with the Simplilearn team. That's get certified, get ahead. We're going to cover mathematics for machine learning. Today's agenda covers data and its types. Then we're going to dive into linear algebra and its concepts, calculus, statistics for machine learning, probability for machine learning, and hands-on demos, and of course thrown in there in the middle are your matrices and a few other things to go along with all this data. Data and its types. Data denotes the individual pieces of factual information collected from various sources. It is stored, processed and later used for analysis. We see here a huge grouping of information, a lot of tech stuff, money, dollar signs, numbers, and then you perform analytics to derive insights, and hopefully you have your shareholders gathered at the meeting and you're able to explain it in a way they can understand.
So, types of data. We have qualitative, or categorical, data, which is nominal or ordinal, and we have quantitative, or numerical, data, which is discrete or continuous. Let's look a little closer at that data-type vocabulary. Vocabulary words are always people's favorite. Okay, not mine, but let's dive in. Nominal values are used to label our variables without providing any measurable value: country, gender, race, hair color, et cetera. It's something you either mark true or false. This is a label. It's on or off. Either they have a red hat on or they do not. So a lot of times when you're thinking of nominal data labels, think of a true/false kind of setup. Then we look at ordinal. This is categorical data with a set order or scale to it. Salary range is a great example, as are movie ratings. You see here the salary ranges: for 10,000 to 20,000, the number of employees earning that is 150; for 20,000 to 30,000, it's 100; and so forth. One of the terms you'll hear is bucket. This is where you have, say, 10 different buckets and you want to separate the data into those 10 buckets in a way that makes sense. When you get down to brass tacks with ordinal data, a lot of times we're again talking true/false: you're either a member of the 10-to-20K range or you're not. But now we're talking about buckets, and we want to count how many people are in each bucket. Quantitative, or numerical, data falls into two classes, discrete or continuous. Discrete data has a finite set of values that can be categorized: class strength, questions answered correctly, runs hit in cricket. A lot of times when you see this you can think integer, and a very restricted integer; for example, you can only have 100 questions on a test. So it's very discrete.
There are only 100 different values it can attain. So usually you're talking about integers, but within a very small range. They don't have an open end or anything like that. Discrete is very solid, simple to count, a set number. Continuous data, on the other hand, can take any numerical value within a range: water pressure, the weight of a person, et cetera. Usually we start thinking about float values, where the differences between values can get phenomenally small. And there's a whole series of values that falls right between discrete and continuous. Think of the stock market. You have dollar amounts. It's still discrete, but it starts to get complicated when you have a jump in the stock market from $525.33 to $580.67. There are a lot of point values in there. It would still be called discrete, but you start looking at it as almost continuous because it has so much variation in it. So we went over nominal and ordinal, which are almost true/false charts, and we looked at quantitative, or numerical, data, where we're starting to get into numbers. Discrete data could be put into true/false form, but usually it's not. So we want to address this, and the first thing we want to look at is the very basic piece, which is your algebra. We're going to take a look at linear algebra. You can remember back to your Euclidean geometry: we have a line. Well, let's go through this. Linear algebra is the domain of mathematics concerning linear equations and their representations in vector spaces and through matrices. I told you we were going to talk about matrices. A linear equation is simply something like 2x + 4y - 3z = 10. Very linear. Or 10x + 12.4y = z. And you can start solving equations like these by combining them. That's what we mean by a linear equation. With vectors, we have a + b = c.
Now we're starting to look at a direction. For these values, think of an x-y-z plot, so each one is a direction, and the actual distance, like the third side of a triangle, is c. And then your matrix can describe all kinds of things. I find matrices confuse a lot of people, not because they're particularly difficult, but because of their size and the many different things they're used for. A matrix is a chart; think of a spreadsheet, where you have your rows and your columns. And you'll see here we have A * B = C. It's very important to know your counts. Depending on how the math is being done and what you're using it for, you have to make sure you have matching numbers of rows and columns, or a single number. There are all kinds of things at play that can make matrices confusing, but really it has a lot more to do with what domain you're working in. Are you adding in multiple polynomials, where you have something like ax^2 plus by plus, you know, and so on? You start to see that can be very confusing compared with a very straightforward matrix. Let's go a little deeper into these, because these mathematical computations are what we're here to talk about. So we're looking at linear equations. Let's dig deeper into that one. An equation having a maximum order of one is called a linear equation. We have ax + b = c, which is one variable. We have two variables, ax + by = c, then three, ax + by + cz = d, and so forth. But all of these are to the power of one. You don't see x squared. You don't see x cubed. That's what makes them linear equations. And they're additive. If you have already dived into, say, neural networks, you should recognize this ax + by + cz setup, plus the intercept, which is basically what each node in your neural network does: it adds up all the different weighted inputs.
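The "ax + by + cz plus the intercept" sum that a neural-network node computes can be sketched in a few lines of Python. The function name `linear_combination` and all the numbers below are made up for illustration.

```python
def linear_combination(weights, inputs, intercept):
    """Return w1*x1 + w2*x2 + ... + intercept, a first-order (linear) expression."""
    return sum(w * x for w, x in zip(weights, inputs)) + intercept

weights = [2.0, 4.0, -3.0]   # hypothetical coefficients a, b, c
inputs = [1.0, 2.0, 3.0]     # hypothetical values of x, y, z
print(linear_combination(weights, inputs, intercept=0.5))  # 2 + 8 - 9 + 0.5 = 1.5
```

Every input appears only to the first power, which is what keeps the expression linear.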
And we can drill down into that. The most common formula is y = mx + c. So y equals m, which is your slope, times your x value, plus c, which is your y-intercept. They kind of labeled it wrong here, which threw me for a loop, but c is your y-intercept. When you set x equal to 0, y equals c, and that's your y-intercept right there. They just had the value of y reversed. And m is the slope, or gradient, of your line. So you get y = 2x + 3. There are lots of easy ways to compute this, and this is why we always start with the most basic form when we're solving one of these problems. One of the most important takeaways is the slope, or gradient, of the line, that m value. In this case we went ahead and plotted it: for y = 2x + 3 you can see the nice line graph here on the right. Now, matrices. A matrix refers to a rectangular array of numbers arranged in rows and columns. So we're talking m rows by n columns. Here a11 denotes the element in the first row and first column, and it's really read "a one one" in this setup: row one, column one. Similarly, a12 is row one, column two, the first row and second column, and so on. There are a lot of ways to denote this. I've seen a capital letter for the matrix and a lowercase a for the entries, and you can see how the notation can go in all kinds of different directions. Just take a moment to realize there needs to be some designation of what row a value is in and what column it's in. Then we have our basic operations. We have addition. When you think about addition, you have two 2x2 matrices and you just add each pair of corresponding numbers. In this case, 10 + 2 is 12, 5 + 3 is 8, and so on.
And it's the same thing with subtraction. Now again, when you're combining matrices you want to check the dimensions of each matrix, the shape. You'll see shape come up a lot in programming. So when we talk about dimensions, we're talking about the shape. If the two shapes are equal, this is what happens when you add them together or subtract them. And we have multiplication. When you look at multiplication, you end up with a slightly different setup. This always gets to me when we get to matrices: they don't really say why you multiply matrices this way. My first thought is 1 * 2, 4 * 3. But if you look at this, we get 1 * 2 + 4 * 3, 1 * 3 + 4 * 5, 6 * 2 + 3 * 3, and 6 * 3 + 3 * 5. If you're looking at these matrices, think of this more as an equation. Remember a couple of slides back, where we were looking at the two-variable equation ax + by = c. This is a way to make it very quick to solve for those variables. That's why you have the matrix, and that's why the multiplication is done the way it is. Each entry is the dot product of a row and a column: 1 * 2 + 4 * 3, 1 * 3 + 4 * 5, 6 * 2 + 3 * 3, and 6 * 3 + 3 * 5, which gives us a nice little 14, 23, 21 and 33 over here. That can then be used and reduced down to a simple formula for solving the variables, as long as you have enough inputs. Now keep in mind that matrix multiplication is different from the element-wise product of two matrices. When we're talking about matrix multiplication, we're talking about solving equations. When you take the element-wise product, you are just finding 1 * 2 and so on, entry by entry. Keep that in mind because it does come up. I've had it come up a number of times where I am altering data and I get confused as to what I'm doing with it.
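The difference between matrix multiplication and the element-wise product can be checked with NumPy, using the two 2x2 matrices read off the arithmetic above. The variable names are chosen here for illustration.

```python
import numpy as np

A = np.array([[1, 4],
              [6, 3]])
B = np.array([[2, 3],
              [3, 5]])

# Matrix multiplication: each entry is the dot product of a row of A and a column of B.
matrix_product = A @ B
print(matrix_product)        # rows: [14 23] and [21 33]

# Element-wise product: each entry is just A[i, j] * B[i, j].
elementwise_product = A * B
print(elementwise_product)   # rows: [2 12] and [18 15]
```

In NumPy, `*` on arrays is always element-wise; `@` (or `np.dot` for 2-D arrays) performs matrix multiplication.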
Transpose: flipping the matrix over its diagonal. This comes up all the time. You still have 12 in the corner, but instead of the top row being 12, 8, the rows are now 12, 14 and 8, 21. You're just swapping the columns and the rows. And then of course you can take an inverse, A to the minus one, which is the matrix that multiplies with A to give the identity matrix. For a 2x2 matrix, that means swapping the values on the main diagonal, changing the signs of the other two values, and dividing everything by the determinant, and you can see the inverse worked out here on the slide. Vectors. A vector just means we have a value and a direction, and we have four numbers here in our vector. In mathematics, a one-dimensional matrix is called a vector. If you have your x plot and you have a single value, that value lies along the x-axis and it's a single dimension. If you have two dimensions, you can think about putting them on a graph. You might have x and you might have y, and each value denotes a direction. And then of course the actual distance is going to be the hypotenuse of that triangle. You can do that with three dimensions, x, y and z, and you can do it all the way to n dimensions. So when they talk about k-means for categorizing, and how close data points are to each other, they compute that based on the Pythagorean theorem. You take the square of each value, add them all together, and find the square root. That gives you a distance for where that point, or that vector, is. Then you can compare that value to another one, which makes for a very easy comparison versus comparing 50 or 60 different numbers. And that brings us to eigenvectors and eigenvalues.
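The distance calculation described above (square each component, add them up, take the square root) works in any number of dimensions. Here is a minimal sketch; the function name `euclidean_distance` and the sample points are illustrative assumptions.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # the classic 3-4-5 triangle
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # same idea in three dimensions
```

Clustering methods such as k-means use this kind of distance to decide which points are close together.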
Eigenvectors are the vectors that don't change their span during a transformation, and eigenvalues are the scalar values associated with those vectors. Conceptually, you can think of the vector as your picture. You have a picture in two dimensions, x and y, and those two values define a point. But the values change when you skew it. So say we have a vector a, and that's a set value. After the transformation we have 2a: a is your eigenvector and two is the eigenvalue. So we're scaling all the values by two. That means maybe we're stretching it out in one direction, making it tall. If you're doing picture editing, that's one of the places this comes in. When you're transforming your information, the amount by which you scale it is your eigenvalue. And you can see here, for the vector after a linear transformation, we have 3a. a is the eigenvector and three is the eigenvalue. So a doesn't change direction. That's whatever we started with, your original picture. Three is stretching it in one direction, and maybe b is being stretched in another direction. So you end up with a nice tilted picture because you've altered it by those eigenvalues. So let's go ahead and pull up a demo on linear algebra. To do this, I'm going to go through my trusted Anaconda into my Jupyter notebook, and we'll create a new notebook called linear algebra. Since we are working in Python, we're going to use NumPy. I always import that as np. NumPy arrays are probably the most popular module for doing matrices and similar things. Given that this is part of a series, I'm not going to go too much into NumPy. We are going to go ahead and create two different variables: A, a NumPy array with 10 and 15, and B, with 20 and 9. We'll go ahead and run this, and you can see there are our two arrays, 10, 15 and 20, 9. I went ahead and added a space in between so it's easier to read.
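To check the eigenvector idea numerically: multiplying a matrix by one of its eigenvectors gives back the same vector scaled by the eigenvalue. The matrix `M`, the vectors and the helper `mat_vec` below are hypothetical examples, not values from the slides.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

M = [[2, 1],
     [1, 2]]   # hypothetical transformation

v1 = [1, 1]    # eigenvector of M with eigenvalue 3
v2 = [1, -1]   # eigenvector of M with eigenvalue 1
w = [1, 0]     # not an eigenvector: its direction changes

print(mat_vec(M, v1))  # [3, 3], which is 3 * v1
print(mat_vec(M, v2))  # [1, -1], which is 1 * v2
print(mat_vec(M, w))   # [2, 1], not a multiple of w
```

For larger matrices, `numpy.linalg.eig` computes eigenvalues and eigenvectors directly.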
And since it's the last line, we don't have to put a print statement on it unless we want to. We can simply do A + B. When I run this, we have 10, 15 and 20, 9, and we get 30, 24, which is what you'd expect: 10 + 20 and 15 + 9. You could almost look at this addition as just adding up the columns coming down. If we wanted to do it a different way, we could also do A.T + B.T. Remember that T transposes them, so if we do that, we now have 30, 24 going the other way. There are a lot of different ways to do this. I can also do (A + B).T, and you're going to see that comes out the same, 30, 24, whether I transpose A and B individually or transpose the sum at the end. Likewise, we can very easily subtract two vectors. I can do A - B, and when we run that we get -10, 6. Remember, this is the last line in this particular cell, so I don't have to put print around it. And just like before, we can transpose either the individual arrays or the whole result, and then we get -10, 6 going the other way. Now, we didn't mention this in our notes, but you can also do scalar multiplication. Let me just put down "scalar" so you can remember that. What we're talking about here is that I have this array u, 30 and 15, and a scalar a with the value two. If I do a times u, it takes the value two and multiplies it by every value in the array. So 2 * 30 is 60 and 2 * 15 is 30. And just like before, because you do need to flip matrices a lot, you can transpose it and get 60, 30 going the other way. In NumPy we also have what they call the dot product. For two-dimensional arrays it is equivalent to matrix multiplication. Remember we were talking about matrix multiplication; well, let's walk through it. We'll start by defining two NumPy arrays: 10, 20 for our u and 25, 6 for our v.
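The vector operations from this part of the demo can be reproduced with NumPy as below. The array values are the ones read out in the walkthrough; the variable names (`scaled`, `w`, and so on) are chosen here for illustration.

```python
import numpy as np

a = np.array([10, 15])
b = np.array([20, 9])
print(a + b)   # element-wise addition: [30 24]
print(a - b)   # element-wise subtraction: [-10   6]

# Scalar multiplication: the scalar multiplies every element of the array.
w = np.array([30, 15])
scaled = 2 * w
print(scaled)  # [60 30]

# Dot product: multiply matching elements, then add the results.
u = np.array([10, 20])
v = np.array([25, 6])
print(np.dot(u, v))  # 10 * 25 + 20 * 6 = 370
```

Note that transposing a one-dimensional NumPy array with `.T` returns the same array; the row-versus-column distinction only appears for arrays with two or more dimensions.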
Then, if you remember correctly, the dot product for arrays like these is 10 * 25 + 20 * 6. We'll go ahead and print that. And then we'll do np.dot(u, v). When we run this, we get 370 and 370. So this is a straight multiply-and-add, and it's used to solve linear algebra problems when you have multiple numbers going across. This could be very complicated. We could have a whole string of different variables going in here. But for this we get a nice single value for our dot product. We did addition earlier, which is just your basic addition. And of course with a matrix you can get very complicated, so in this case let's create two more complex matrices. This one is a matrix with the values 12, 10, 4, 6, 4, 31. We'll just print out A so you can see what that looks like. We print A out, and you can see that we have a 2x3 matrix for A. It's always kind of fun when you're playing with print values. We could do something like this: print a label, "A =", followed by the matrix, which gives it a nice look. Here's your matrix. That's all this is. The newline just tags a line break on the end. That's all that is doing. And then we can simply add A + B. You should already be able to guess the result, because this is the same as what we did before. There's no difference. We do a simple element-wise addition: 12 + 2 is 14, 10 + 8 is 18, and so on. And just like the matrix addition, we can also do A - B for our matrix subtraction. When we look at this, 12 - 2 is 10, and 10 - 8, where are we? Oh, there we go. It's confusing what I'm looking at; I should have reprinted the original numbers. But we can see here 12 - 2 is of course 10, and 10 - 8 is 2.
4 - 46 is -42, and so forth. So it's the same subtraction as before; we just call it matrix subtraction. It's identical. Now if you remember, up above we had scalar multiplication on a vector, where we apply just one number to every element. You can do scalar multiplication with a matrix too. If you have a single value A and you have B, which is your array, we can simply do A * B. When we run that, you can see here we have 2 * 4 is 8, 5 * 4 is 20, and so forth. You're just multiplying the four across each one of these values. And this next one is an interesting one that comes up, a little bit of a brain teaser: matrix and vector multiplication. When we're looking at this, we're just using regular arrays. It doesn't necessarily have to be a NumPy array. We have A, which is our array of arrays, and B, which is a single array, and from here we can do dot(A, B). This is going to return two values. You could say we're doing the array B first with the first row of A and then with the second row, so it splits it up. So you have a matrix and vector multiplication, and you mix and match. When you get into really complicated back-end stuff, this becomes more common, because now you've got layers upon layers of data, and you'll end up with a matrix and a set of vectors you want to multiply. Now, keep in mind that if you're doing data science, a lot of times you're not looking at this. This is what's going on behind the scenes. If you're in scikit-learn, sklearn, doing linear regression models, this is some of the math that's hidden behind the scenes. Other times you might find yourself having to do part of this yourself and manipulate the data so it fits right, and then you go back in and run it through scikit-learn. And just as we did a matrix and vector multiplication up here, we can also do matrix-to-matrix multiplication.
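Scalar-times-array and matrix-times-vector multiplication can be sketched as follows. The scalar example uses the 4, 2 and 5 read out above; the matrix `A` and vector `b` are hypothetical, since the transcript does not give the demo's actual values.

```python
import numpy as np

# Scalar multiplication: 4 is multiplied across every value.
scalar = 4
values = np.array([2, 5])
print(scalar * values)   # [ 8 20]

# Matrix-vector multiplication (hypothetical values):
# each row of A is dotted with b, giving one number per row.
A = np.array([[1, 2],
              [3, 4]])
b = np.array([10, 20])
result = np.dot(A, b)
print(result)            # [1*10 + 2*20, 3*10 + 4*20] = [ 50 110]
```

Because `A` has two rows, the result has two values, matching the description of the vector being applied to each row in turn.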
And if we run this where we have the two matrices, you can see we get a fairly complicated array out of our dot. Just to reiterate, we also have our transpose of a matrix, which is your T. If we create a matrix A and transpose it, you can see how it flips from 5, 10, 15, 20, 25, 30 to 5, 15, 25, 10, 20, 30, swapping rows and columns. This certainly comes up a lot in the math. It also comes up a lot with x-y plotting. When you put data into pyplot, you have one format where they're looking at pairs of numbers, and another where they want all of the x's and all of the y's. So the transpose is an important tool both for your math and for plotting and all kinds of things. Another tool that we didn't discuss is your identity matrix, and this one is more of a definition. Here we created one of size two, so it comes out as 1, 0 and 0, 1. It creates a diagonal of ones. One place you see this pattern is when you're comparing all your different features to each other to see how they correlate. When you compare feature one to itself, the correlation is always one, whereas other entries usually fall between zero and one depending on how well the features correlate. So when we're talking about the identity matrix, that's what we're talking about: you create this preset matrix, and then you might adjust the numbers depending on what you're working with and what the domain is. And then, to kind of wrap this up, we'll hit you with the most complicated piece of this puzzle, which is the inverse of a matrix. It's a lengthy description, so let's go ahead and put the description in. This is straight out of the NumPy website. Given a square matrix A, here's our square matrix A, which is 2 1 0, 0 1 0, 1 2 1. Keep in mind it's 3x3, so it's square; the row and column counts have to be equal.
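The transpose, identity and inverse operations from this part of the demo can be sketched with NumPy, using the values read out above. The variable names are illustrative; `np.linalg.inv` is the NumPy function the description refers to.

```python
import numpy as np

# Transpose: rows become columns.
A = np.array([[5, 10],
              [15, 20],
              [25, 30]])
print(A.T)             # rows: [ 5 15 25] and [10 20 30]

# Identity matrix: ones on the main diagonal, zeros elsewhere.
I2 = np.eye(2)
print(I2)

# Inverse of the square 3x3 matrix from the demo.
M = np.array([[2, 1, 0],
              [0, 1, 0],
              [1, 2, 1]])
M_inv = np.linalg.inv(M)
print(M_inv)           # first row is [0.5, -0.5, 0.0]

# M times its inverse gives the 3x3 identity (up to floating-point rounding).
print(np.allclose(M @ M_inv, np.eye(M.shape[0])))
```

`np.linalg.inv` raises an error for non-square or singular matrices, which is why the matrix has to be square.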
The function returns the matrix A-inverse satisfying dot(A, A-inverse) = dot(A-inverse, A) = eye(A.shape[0]). So that's our matrix multiplication, in both orders, and the result equals an identity matrix whose size comes from the shape of A. That's a little complicated. So here's our array. We'll go ahead and run this, and you can see what we end up with is an array starting 0.5, -0.5 and so forth, for our matrix with rows 2 1 0, 0 1 0 and 1 2 1. This is getting a little deep into the math. Understanding when you need this is what's really important when you're doing data science, rather than looking up the math and handwriting all the pieces out. You do need to know about the linalg inverse, so if it comes up you can easily pull it up, or at least remember where to look it up. We took a look at the algebra side of it. Let's go ahead and take a look at the calculus side of what's going on with machine learning. So calculus, oh my goodness, and differential equations; you've got to throw that in there because that's all part of the bag of tricks, especially when you're doing large neural networks, but it also comes up in many other areas. The good news is most of it is already done for you in the back end. So when it comes up, you really just need to understand it from the data science side, not data analytics. Data analytics means you're digging deep into actually solving these math equations, and a neural network is just a giant differential equation. So we're going to understand calculus by talking about cars, time and speed. Calculus helps us calculate the instantaneous rate of change. Suppose we plot a graph of the speed of a car with respect to time. As you can see here, going down the highway, I probably merged onto the highway from an on-ramp, so I had to accelerate. So my speed went way up.
Then I got stuck in traffic as I merged in, the traffic opens up, and I accelerate again up to the speed limit, and maybe it levels off up there. So you can look at this as speed versus time. I'm getting faster and faster because I'm continually accelerating, and if I hit the brakes, it goes the other way. The rate of change of speed with respect to time is nothing but acceleration. How fast are we accelerating? The acceleration is the change in speed between the start point x and the end point x + delta x, divided by delta x. If we have those two points, we can put a line through them, and the slope of that line is our acceleration. Now that's pretty easy when you're doing linear algebra. But I don't want to know it just for that line and those two points. I want to know it across the whole of what I'm working with. That's where we get into calculus. The distance delta x between the two points has to be as small as possible, near to zero, in order to approximate the acceleration at a point. If you ever took a basic calculus class, they would draw bars down here and divide this area up. Let's go back up a screen. You divide this time period up into maybe 10 sections, and you calculate the acceleration across each one of those 10 sections. Then we just keep making that spacing smaller and smaller until delta x is almost infinitesimally small. And so we get f'(a) = the limit as h goes to zero of (f(a + h) - f(a)) / h. That is, you're computing the slope of the line; we're just computing that slope over smaller and smaller samples. That's the derivative side of calculus. The other side is the integral. You can see down here we have our nice integral sign, which looks like a giant S. What that means is that we're summing, having taken the sampling interval down as small as we can.
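The limit definition of the slope can be tried out numerically by shrinking h and watching the difference quotient settle. The speed function `speed(t) = t ** 2` and the helper `difference_quotient` are hypothetical, chosen so the exact answer (an acceleration of 6 at t = 3) is easy to verify.

```python
def speed(t: float) -> float:
    """Hypothetical speed curve: speed grows with the square of time."""
    return t ** 2

def difference_quotient(f, a: float, h: float) -> float:
    """Slope of the line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

# As h shrinks toward zero, the slope approaches the instantaneous
# rate of change at t = 3, which for t ** 2 is exactly 2 * 3 = 6.
for h in (1.0, 0.1, 0.001, 0.000001):
    print(h, difference_quotient(speed, 3.0, h))
```

This is the same calculation as drawing a line through two points on the speed graph, repeated with the points closer and closer together.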
So, staying with calculus, finding the area under the curve is the main process in integration. Similar small intervals are made, each of the smallest possible length, from x to x + delta x, where delta x approaches an almost infinitesimally small size. It then helps to find the overall result by summing up all those small pieces together, so we're summing up all the accelerations from the beginning to the end. And so here's our integral: the integral of a(x) dx = A + C, where C is the constant of integration. That is our basic calculus here. Next, multivariate calculus. Multivariate calculus deals with functions that have multiple variables, and you can see here we start getting into some very complicated equations. The change in w over the change in t equals dw/dz times dz/dx times dx/dt. It gets pretty complicated. And it carries over into multivariate integration using double integrals. So you have the double integral of f(x, y) dA over a region, which equals the integral from c to d of the integral from a to b of f(x, y) dx dy, which equals the integral from a to b of the integral from c to d of f(x, y) dy dx. Understanding the very specifics of everything going on in here and actually doing the math is usually calculus one, calculus two and differential equations, so you're talking about three full-length courses to dig into and solve these equations. What we want to take from here is that calculus is about working with all these different slopes. We're still solving a linear expression, y = mx + b, but we're doing it for infinitesimally small x's, and then we want to sum them up. That's what this integral sign means: the integral of a(x) dx = A + C. And when you see these very complicated multivariate differentiations using the chain rule, where the change in w over the change in t equals dw/dz times dz/dx and so forth, that's what's going on here.
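The "sum up lots of thin slices" idea behind the integral can be sketched as a midpoint Riemann sum. The acceleration curve `a(t) = 2t` and the helper `riemann_sum` are hypothetical; for this curve the exact area from 0 to 3 is 9.

```python
def acceleration(t: float) -> float:
    """Hypothetical acceleration curve a(t) = 2t."""
    return 2.0 * t

def riemann_sum(f, start: float, end: float, n_slices: int) -> float:
    """Approximate the area under f between start and end using n_slices thin bars."""
    width = (end - start) / n_slices
    total = 0.0
    for i in range(n_slices):
        midpoint = start + (i + 0.5) * width  # sample each bar at its centre
        total += f(midpoint) * width
    return total

# Summing acceleration over time gives the total change in speed.
print(riemann_sum(acceleration, 0.0, 3.0, 10))
print(riemann_sum(acceleration, 0.0, 3.0, 1000))
```

Making the bars thinner (a larger `n_slices`) is the numerical counterpart of letting delta x shrink toward zero.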
That's what these mean. We're basically looking at the area under the curve and at how the change itself is changing: speed is going up, so how is that changing? And then you end up with multiple layers. If I have three layers in a neural network, how is the third layer changing based on the second layer changing, which is based on the first layer changing? You get the picture: now we have a very complicated multivariate chain of derivatives. The good news is we can solve this mathematically, and that's what happens when you do neural networks and backpropagation. The nice thing is that you don't have to solve this on paper unless you're a data analyst working on the back end, integrating these formulas and building the scripts that actually compute them. So let's talk about applications of calculus. It provides us the tools to build an accurate predictive model. It's really behind the scenes: we want to guess at what the change of the change of the change is. That's a little goofy, I know; I just threw that out there. It's kind of a meta term. But if you can guess how things are going to change, then you can guess what the new numbers are. Multivariate calculus explains the change in our target variable in relation to the rate of change in the input variables. So there are our multiple variables going in: if one variable is changing, how does it affect the other variable? And then in gradient descent, calculus is used to find the local and global minima, or, with the sign flipped, maxima. This is really big. We're actually going to have a whole section here on gradient descent. I talked about neural networks and how the different layers feed into each other, but gradient descent is one of the key tools for trying to guess the best answer to something. So let's take a look at the code behind gradient descent.
And before we open up the code, let's just do a real quick sketch of gradient descent. Let's say we have a curve like this. Most commonly, this is going to represent your error. Oops. Error. There we go. Hard to read there. I want to make the error as low as possible, so what I'm looking for is this line here, which is the minimum value. It does that by sampling a point, and based on that it guesses the minimum might be someplace over here, and it goes, hey, this is still going down. It goes here, then goes back over here, then goes a little bit closer, and it's just playing high-low until it gets to that bottom spot. So we want to minimize the error. On the flip side, you could also want to be maximizing something, where you want to get the best output. That's simply minus the value: looking for where the peak is is the same as negating the function and looking for the valley. That's all that is, and this is a way of finding it. The cool thing is all the heavy lifting's done. I actually ended up putting together one of these a while back, when I didn't know about scikit and I was just starting, boy, it's a long while back, and it's playing high-low. How do you play high-low, not get stuck in the valleys, figure out these curves and things like that? Well, the back end is all the calculus and differential equations to calculate this out. The good news is you don't have to do those. So instead, we're going to put together the code, and let's go ahead and see what we can do with that. So, the guys in the back put together a nice little piece of code here, which is kind of fun.
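The script on screen isn't reproduced in the transcript. A minimal sketch consistent with the walkthrough that follows might look like this. The start value of 3 and the gradient 2 * x + 5 come from the narration; the names `cur_x`, `rate`, `precision`, `previous_step_size`, `max_iters` and `df`, and the specific values 0.01, 0.000001 and 10,000, are assumptions made for illustration.

```python
def gradient_descent(df, start_x, rate=0.01, precision=1e-6, max_iters=10_000):
    """Step downhill along the gradient df until the steps become tiny."""
    cur_x = start_x                # where the search starts
    previous_step_size = 1.0       # any value larger than precision
    iters = 0

    while previous_step_size > precision and iters < max_iters:
        prev_x = cur_x                              # remember the last position
        cur_x = cur_x - rate * df(prev_x)           # move against the gradient
        previous_step_size = abs(cur_x - prev_x)    # how far did we move?
        iters += 1                                  # guard against running forever

    return cur_x, iters


# Gradient as read out in the walkthrough (assumed to mean 2*x + 5).
df = lambda x: 2 * x + 5

local_min_x, iterations = gradient_descent(df, start_x=3)
print("The local minimum occurs at", local_min_x, "after", iterations, "iterations")
```

The number this prints depends entirely on the gradient and settings chosen, so it need not match the value quoted later in the walkthrough.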
Some things we're going to note, and this is really important stuff, because when you start doing your data science and digging into your machine learning models, you're going to find these things are stumbling blocks. The first one is the current x. Where do we start? Keep in mind the model you're working with is very generic, so whatever you use to minimize it, the first question is where do we start? Here the algorithm starts at x = 3, a value we picked arbitrarily. The learning rate is how big a step to take going one way or the other. In fact, I'm going to separate that a little bit, because these two are really important. Here's the function: we're going to use the gradient of our function, 2 * x + 5. Keep it simple. So that's the function we're going to work with. If I'm dealing with increments of a thousand, a rate of 0.1 is going to take a very long time. And if I'm dealing with increments of 0.001, a rate of 0.1 is going to skip over my answer, so I won't get a very good answer. And then we look at precision. This tells us when to stop the algorithm. So again, it's very specific to what you're working on. If you're working with money and you don't convert it into a float value, you might be dealing with 0.01, which is a penny. That might be the precision you're working with. And then of course there's the previous step size and the max iterations. We want something to cut out at a certain point, and usually that's built into a lot of minimization functions. And then here's the actual formula we're going to be working with. Then we come in and go: while previous step size is greater than precision and iters is less than max iters. Say that 10 times fast. We're just saying if we're still greater than our precision level, we've still got to keep digging deeper.
And then we also don't want to go past a thousand, or whatever this is, a million or 10,000 iterations. That's actually pretty high. I almost never set max iterations to more than 100 or 200. On rare occasions you might go up to 400 or 500, depending on the problem you're working with. So we have our previous x equals our current x. That way we can track it over time. The current x now equals the current x minus the rate times the formula applied to our previous x. So now we've generated our new version. Previous step size equals the absolute value of current minus previous, so we're looking at the change in x. Iters equals iters plus one. That's how we know to stop if we go too far. And then we're just going to print that the local minimum occurs at x. If we go ahead and run this, you can see right here it gets down to this point and it says, hey, the local minimum is minus 3.3222 for this particular series we created. And this is created off of our formula here, lambda x: 2 * x + 5. Now, when I'm running this stuff, you'll see this come up a lot with the sklearn kit, and one of the nice reasons for breaking this down the way we did is that I could go over those top pieces. Those top pieces are everything when you start looking at these minimization toolkits and built-in code. So we'll just go to docs.scipy.org, and we're looking at scipy. There we go: optimize, minimize. You can only minimize one value. You have the function that's going in, and this function can be very complicated. We used a very simple function up here, but there's all kinds of things that could be in there. There's a number of methods to solve this, as far as how they shrink down. And your x0, there's your start value. So your function, your start value. There's all kinds of other things that come in here that we could look at, which we're not going to. There's optimization options, constraints, bounds.
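The two pieces called out above, the function and the start value, are all that a basic call to `scipy.optimize.minimize` needs. Here is a minimal sketch, assuming the objective is f(x) = x² + 5x (a function whose gradient is the 2 * x + 5 used in the walkthrough) and reusing the start value of 3. The objective is an assumption chosen for illustration; the transcript only shows the gradient.

```python
from scipy.optimize import minimize


def objective(x):
    # minimize passes x as a 1-D array, even for a single variable.
    # Assumed objective: its derivative is 2*x + 5.
    return x[0] ** 2 + 5 * x[0]


# fun is the function to minimize, x0 is the starting guess.
result = minimize(objective, x0=[3.0])

print("success:", result.success)
print("x at minimum:", result.x[0])
print("function value at minimum:", result.fun)
```

The solver method, bounds and constraints are all left at their defaults here; they are optional arguments to the same call.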
Some of this it does automatically, but the big thing I want to point out here is that you need to have a starting point. You want to start with something that you already know is mostly the answer. If you don't, then it's going to have a heck of a time trying to calculate it out. Or you can write your own little script that does this high-low guessing and tries to find the max value. That brings us to statistics. What this is all about is figuring things out. There's a lot of vocabulary in statistics. So statistics, well, I guess it's all relative. It's definitely not an ed class. There's a bunch of stuff going on. Statistics is concerned with the collection, organization, analysis, interpretation, and presentation of data. That is a mouthful. So we have it from end to end: where does the data come from? Is it valid? What does it mean? How do we organize it? How do we analyze it? Then you've got to take that analysis and interpret it into something that people can use, kind of reduce it to something understandable. And nowadays you have to be able to present it. If you can't present it, then no one else is going to understand what the heck you did. So let's look at the terminology. There is a lot of terminology, depending on what domain you're working in. Clearly, if you're working in a domain that deals with viruses and T cells and where they come from, and you're studying different people, then you're going to have a population. If you're working with mechanical gear, it's a little bit different. If you're looking at the wobbling statistics to know when to replace a rotor on a machine or something like that, that can be a big deal. You know, we have these huge fans that turn in our sewage processing systems, and those fans start to wobble and hum and do different things that the sensors pick up. At what point do you replace them?
Instead of waiting for it to break, in which case it costs a lot of money: instead of replacing a bushing, you're replacing the whole fan unit. That was an interesting project that came up for our city a while back. So, population: all objects or measurements whose properties are being observed. That's your population, all the objects. It's easy to see with people, because we have our population at large, but in the case of the sewer fans, we're talking about the fan units. That's the population of fans that we're working with. You have a parameter, a metric that is used to represent a population or characteristic. You have your sample, a subset of the population studied. You don't want to study them all, because if you come up with a conclusion for everyone, you don't have a way of testing it. So you take a sample. Sometimes you don't have a choice: you can only take a sample of what's going on, because you can't study the whole population. And a variable is a metric of interest for each person or object in a population. Types of sampling. We have the probabilistic approach, selecting samples from a larger population using a method based on the theory of probability, and we'll go a little deeper into these. We have random, systematic, and stratified. And then you have the non-probabilistic approach, selecting samples based on the subjective judgment of the researcher rather than random selection. It has to do with convenience, trying to reach a quota, or snowball sampling, and these are very biased. That's one of the reasons you'll see this big stamp on it that says biased. So you've got to be very careful with that. So, probabilistic sampling. When we talk about random sampling, we select samples at random from the population, so it's as random as you can get. When we talk about systematic sampling, we're selecting samples at a fixed periodic interval, so we kind of split it up.
This would be like a time setup or our different categories. And you might ask the question, what is a category or a group? I'm going to go back a window. Let's say we're studying the economics of an area. We know pretty much that, based on their culture and where they came from, people might need to be separated. And when I say separated, I don't mean separated from the place where they live. I mean that, as far as the analysis goes, we want to look at the different groups and make sure they're all represented. So if we had a group that is 80% of the area, say Hispanic or Indian, and in that same area we have 20% who are what's called expatriates, who left America, your Caucasian group, we might want to take a sample that is representative of both. So when we're talking about stratified sampling and we're talking about groups, those are the groups we're talking about. And that brings us to stratified sampling: selecting approximately equal-sized samples from each group or category. This way we can actually separate the categories, and that gives us an insight into the different cultures and how that might affect them in that area. So you can see these are very different, and it kind of depends on what you're working with, as far as your data and what you're studying. And just to go a little bit further, for random sampling we'd have selecting 25 employees from a company of 250 employees randomly. We don't care anything about them: what groups they're in, which office they're in, nothing. For systematic sampling, we might be selecting one employee from every 50 unique employees in a company of 250 employees. And for stratified sampling, we have selecting one employee from every branch in the company. So we have all the different branches, and there's our groups or categories, by branch. The category could depend on what you're studying, so there's a lot of variation there.
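The three employee examples above can be sketched with Python's standard `random` module. This is an illustration under assumptions: the employee IDs are generated placeholders, and the company is assumed to have five branches of 50 people each, which the transcript does not specify.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical company: 250 employees spread over 5 assumed branches.
employees = [f"emp_{i:03d}" for i in range(250)]
branch_of = {emp: f"branch_{i // 50}" for i, emp in enumerate(employees)}

# Random sampling: 25 of the 250, ignoring branch or any other grouping.
random_sample = random.sample(employees, 25)

# Systematic sampling: one employee from every 50, at a fixed interval
# starting from a randomly chosen offset.
interval = 50
start = random.randrange(interval)
systematic_sample = employees[start::interval]

# Stratified sampling: group by branch, then pick one employee per branch.
members_by_branch = {}
for emp in employees:
    members_by_branch.setdefault(branch_of[emp], []).append(emp)
stratified_sample = {
    branch: random.choice(members)
    for branch, members in members_by_branch.items()
}

print(len(random_sample), "employees in the random sample")
print(len(systematic_sample), "employees in the systematic sample")
print(len(stratified_sample), "employees in the stratified sample")
```

The same shape works for any grouping: swap `branch_of` for whatever category the study calls for.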
You see, this kind of grouping and categorizing is also used to generate a lot of misinformation. If you only study one group and you say this is what it is, then everybody assumes that's what it is for everybody. So you've got to be very careful of that, and it's a very unethical thing to do. So, types of statistics. When we talk about statistics, we're going to talk about descriptive and inferential statistics. There are so many different terms in statistics to break it up, so we're talking about a particular setup here: descriptive and inferential statistics. The base of the word, describe, is pretty solid: you're describing the data. What does it look like? With inferential statistics, we're going to take that from the small sample to a large population. So if you're working with a drug company, you might look at the data and say these people were helped by this drug. They did 80% better as far as their health, or had an 80% better survival rate than the people who did not have the drug. So we can infer that the drug will work in the greater populace and will help people. That's where you get your inferential: we are predicting how it's going to affect the greater population. So, descriptive statistics. It is used to describe the basic features of data and forms the basis of quantitative analysis of data. We have measures of central tendency: your mean, median, and mode. And then we have measures of spread, like your range, your interquartile range, your variance, and your standard deviation. We're going to look at all of these a little deeper here in a second. One way to think of it is that how the data differs, what's the max, min, range, all that stuff, is your spread, and anything that's a single number summarizing the center is usually one of your measures of central tendency. So we talk about the mean. It is the average of the set of values considered.
What is the average outcome of whatever is going on? And then your median separates the higher half and the lower half of the data. So where's the center point of all your different data points? Your mean might include a couple of really big numbers that skew it, so that the average is much higher than if you took those outliers out, whereas the median, by separating the high from the low, might give you a much lower number. You might look at it and say, oh, that's odd. Why is the average so much higher than the median? Well, it's because you have some outliers. Or why is it so much lower? And then the mode is the most frequently appearing value. This is really interesting. If you're studying economics and how people are doing, you might find that the most common income, like in the US, was at one point 24,000 a year, where the average was closer to 80,000, and it's like, wow, what a difference. Well, some people have a lot of money, and that skews the average way up. So the typical person is not making that kind of money. And then you look at the median income, and the median income is a little bit closer to the average. So it does create a very interesting way of looking at the data. Again, these are all central tendencies, single numbers you can look at for the whole spread of the data. So let's look at the measures of central tendency. The mean is the average marks of the students in a classroom. So here we have the mean: the sum of the marks of the students divided by the total number of students. And as we talked about with the median, if we have 0 through 10 and we put half the numbers on one side of the line and half the numbers on the other side of the line, we end up with 5 in the middle. And then the mode: what mark was scored by most of the students in a test? In a simple case where most people scored, like, an 82% and got certain problems wrong, it's easy to figure out.
It's not so easy when you have different areas. Let's go back to the economy: it's a little bit more difficult to calculate if you have a large group that makes 30,000 and a slightly bigger group that makes 26,000. So what do you put down for the mode? There's a number of ways to calculate that, and there are actually different variations depending on what you're doing. So now we're looking at measures of spread. Range: what's the difference between the highest and the lowest value? It's the first thing you want to look at. Say everybody in the test scored between 60 and 100%, so somebody got 100%, or maybe it was 60 to 90%, because it was so hard that a lot of people could not get 100%. And you have your interquartile range. Quartiles divide a rank-ordered data set into four equal parts. It's a very common thing to do as part of all the basic packages. Whether you're working in data frames with pandas, whether you're working in Scala, whether you're working in R, you'll see this come up, where they have your range, your min, your max, and then your interquartile range. What does it look like in each quarter of the data? Variance measures how far each number in the set is from the mean, and therefore from every other number in the set. So you get a sense of how much turbulence is going on in this data. And then the standard deviation is the measure of the dispersion of a set of values from the mean, the square root of the variance. If I'm doing a graph, I might have the value graphed, and then, based on the error, I might graph the standard deviation on the graph as a background so you can see how far off it is. So standard deviation is used a lot. So, measures of spread: marks of a student out of 100. We have here from 50 to 63 or 50 to 90.
So the range is maximum marks minus minimum marks. We have 90 and 45, so the range is 90 - 45, which is 45. And then we have the interquartile range, using the same marks over there. You can see here where the median is, and then there's the first quartile, the second quartile, and the third quartile, based on splitting the data apart at those values. And to understand the variance and standard deviation, we first need to find the mean. So here we're calculating the average, and we end up at approximately 66. Once we know the mean, the variance equals the sum of the marks minus the mean, squared, over the total observations. Why is it squared? Because if you just add the differences together, one's negative, one's positive, one's a little higher, one's a little lower, and they cancel out. So you always see the squared value, over the total observations. And so the standard deviation equals the square root of the variance, which is approximately 16. If you were looking at a predictive model, you would be looking at the deviation based on the error. How much error does it have? That's really important to know: if your model is predicting something, what's the chance of it being way off or just a little bit off? Now that we've looked at some of the basic tools for doing your statistics, let's go ahead and pull up a little demo and show you what that looks like in Python code, so you can get a little hands-on here. For that, let's go back into our Jupyter notebook in Python. Now, almost all of this you can do in NumPy. Last time we worked in NumPy. This time we're going to go ahead and use pandas. And if you remember from pandas, this is basically a data frame: rows and columns. Let's just go ahead and do a print df.head() and run that.
And you can see we have the names, Jane, Michael, William, Rosie, Hannah, and their salaries on here. And of course, instead of having to do all those hand calculations, adding everything together and dividing by the total, we can do something very simple, like use the mean command in pandas. So if I go ahead and do print df, we pick our column, Salary, because we want to find the mean of that column. We go ahead and print this out, and you can see that the average income on here is 71,000. Let's go ahead and label that as the mean. And if we're going to do that, we also might want to find the median. The median is very similar, except the command is just median. We're used to mean and average; it's kind of interesting that they use the two different words. There can be slight differences in some computations, but for the most part the mean is the average. And then the median. Oops, let's put a median label here on df Salary, that way it displays a little better. We can see the median is 54,000. So the halfway mark is significantly below the average. Why? Because we have somebody in here who makes 189,000. Darn you, Rosie, for throwing off our numbers. But that's something you'd want to notice. The difference between these is huge, so what is the meaning behind that when you're studying a populace and looking at the different data coming in? And of course, we also want to find out the most common income that people make in this little tiny sample. So we'll go ahead and do the mode, and you can see here that the mode is 50,000. So this is very telling: the most common salary is 50,000, and the middle point is at 54,000, so half the people are making more than that.
What that tells me is that if the most common income is below the median, then there's a lot of high salaries going up, but there's some really low salaries in there too. This trend is very common when you're analyzing the economy and different people's income, and how big the difference is between these numbers is also very important when we're studying statistics. When you hear someone just say, hey, the average income was such and such, you might start asking questions at that point. Why aren't you talking about the median income? Why aren't you talking about the mode, the most common income? What are you hiding? And if you're doing this analysis, you should be looking at these and saying, hey, why are there these discrepancies? Why are these so different? And of course, with any analysis, it's important to find out the minimum and the maximum. It's simply min, which pulls up your minimum, and max, which pulls up the maximum. It's pretty straightforward as far as knowing what your lowest value and what your highest value is, which you'll use to generate a spread later on. And real quick on mode, note that it prints a 0 next to the value. Like I said, there's a couple of different ways you can compute the mode, although the standard one's pretty good. We can of course do the range, which is your max minus your min. So now we have a range of 149,000 between the upper end and the lower end. You might want to be looking up the individual values on all of these, but it turns out there is a describe feature in pandas. So in pandas we can actually do df Salary describe. And if we do this, you can see there's seven setups. Here's our mean, and our standard deviation, which we didn't compute yet, which would just be std.
And you've got to be a little careful, because when it computes it, it looks at axes and things like that. We have our minimum value, here's our quartiles, our maximum value, and then of course the name, Salary. So these are the basic statistics. You can pull them up with just describe. The result works like a dictionary, so in here I could actually index it with count and run, and now it just prints the count. Because it works like a dictionary, you can pull any one of these values out of here. It's kind of a quick and dirty way to pull all the different information and then split it up depending on what you need. Now, if I just walked in and gave you this information in a meeting, at some point you would just kind of fall asleep. That's what I would do anyway. So we want to go ahead and see about graphing it. We'll put it into a histogram and plot that graph of the salaries. Let's just go ahead and put that in here. So we do our %matplotlib inline. Remember, that's a Jupyter notebook thing. A lot of the newer versions of the matplotlib library do it automatically, but just in case, I always put it in there. Import matplotlib.pyplot as plt. That's my plotting. And then we have our data frame. I guess I really don't need to respell the data frame, but maybe we could just remind ourselves what's in it, so we'll go ahead and just print df. That way we still have it. And then we have our salary: df Salary, plot, hist, with the title Salary Distribution and the color gray. Then plt.axvline at the salary mean value, color violet, line style dashed. This is just all making it pretty: what color, dashed line, line width of two, that kind of thing. And the same for the median. Let's go ahead and run this just so you can see what we're talking about. And so up here on our plot, here's the data.
Here's our data frame printed out, so you can see it with the salaries. We're looking at the salary distribution, and just look at the way the salary is distributed. In this case, let's see, we had red for the median and violet for our average, or mean, and you can just see it: here's our outlier, our person who makes a lot of money, here's the average, and here's the median. As you look at this, you can say, wow, the average really doesn't tell you much about what people are actually taking home. All it does is tell you what the average salary is. Some of the things you want to take away in addition to this: it's very easy to plot an axvline. These are the up-and-down lines for your markers. And as you display the data, you can add all kinds of things to this and get really complicated, but keeping it simple is pretty straightforward. I look at this and I can see we have a major outlier out here. We can definitely do a histogram and stuff like that, but, you know, a picture's worth a thousand words. What you really want to make sure you take away is that we can do a basic describe, which pulls all this information out, and we can print any of the individual values from the describe, because the result works like a dictionary. So if we want to go ahead and look up the mean value, we can also index describe with mean. This cell doesn't have the print on there, so it's only going to display the last one, which happens to be the mean. So if you're doing a lot of statistics, you can very easily reference any one of these. And if you're doing something a little bit more complicated and you need more than just the basics, you can come through and pull any one of the individual values from pandas. So now we've had a chance to describe our data.
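The notebook cells from this demo aren't reproduced in the transcript, so here is a minimal sketch of the same steps in pandas. The five names come from the walkthrough; the individual salary values and the two extra placeholder rows are invented for illustration, chosen so that the summary numbers come out to the ones quoted above (mean 71,000, median 54,000, mode 50,000, range 149,000). They are not the presenter's actual data.

```python
import pandas as pd

# Illustrative data only: salaries and the last two rows are made up.
df = pd.DataFrame({
    "Name": ["Jane", "Michael", "William", "Rosie", "Hannah",
             "Employee 6", "Employee 7"],
    "Salary": [54000, 50000, 55000, 189000, 50000, 40000, 59000],
})
print(df.head())  # first five rows

salary = df["Salary"]
print("mean:  ", salary.mean())
print("median:", salary.median())
print("mode:  ", salary.mode()[0])   # mode() returns a Series; [0] is the first mode
print("min:   ", salary.min(), " max:", salary.max())
print("range: ", salary.max() - salary.min())

# Variance by hand, as described for the marks example:
# squared distance from the mean, summed, over the number of observations.
manual_variance = ((salary - salary.mean()) ** 2).sum() / len(salary)
print("variance (by hand):", manual_variance)
print("variance (pandas, ddof=0):", salary.var(ddof=0))

# describe() returns a pandas Series that can be indexed by label.
summary = salary.describe()
print(summary)
print("count:", summary["count"], " mean:", summary["mean"])
```

One thing to keep in mind: the `std` row in `describe()` uses the sample formula (dividing by n - 1), so it will be slightly larger than the square root of the by-hand variance above.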
Let's get into inferential statistics. Inferential statistics allows you to make predictions or inferences from data. You can see here we have a nice little picture of movie ratings. If we took this group of people and asked how many liked the movie, disliked it, or can't say, and then you ask a random person coming out of the movie who hasn't been in this study, you can infer that there's a 55% chance of them saying liked, a 35% chance of saying disliked, or a 10 or 11% chance of can't say. So that's the real basics of what we're talking about: you're going to infer that the next person is going to follow these statistics. So let's look at point estimation. It is a process of finding an approximate value for a population's parameter, like the mean or average, from random samples of the population. Let's take an example of testing vaccines for COVID-19. With vaccines and flu bugs and all that, it's a pretty big thing how you test these out and make sure they're going to work on the populace. A group of people is chosen from the population. Medical trials are performed. Results are generalized to the whole population. So here's our small group up here, where we've selected them. We run medical trials on them, and then the results are applied to the population. It's a nice diagram, with the arrows going back and forth and the very scary COVID virus in the middle of one. And let's take a look at the applications of inferential statistics. Very central is what they call hypothesis testing, and the confidence intervals which go with that. And then as we get into probability, we get into our binomial theorem, our normal distribution, and the central limit theorem. Hypothesis testing. Hypothesis testing is used to measure the plausibility of a hypothesis, or assumption, by using sample data.
Now when we talk about theorems, theories, and hypotheses, keep in mind that if you are in a philosophy class, a theory is the same as a hypothesis, whereas a theory in science is a statement that has been well tested, although it is always up for debate, because in science we always want to make sure things are up for debate. So a hypothesis is the same as what a philosophy class calls a theory, where a theory in science is not the same. A theory in science says this has been well proven. Gravity is a theory. So if you want to debate the theory of gravity, try jumping up and down. If you want to have a theory about why the economy is collapsing in your area, that is a philosophical debate. Very important. I've heard people mix those up, and it is a pet peeve of mine. When we talk about hypothesis testing, the steps involved are: first, we formulate a hypothesis. We figure out the right test for our hypothesis. We execute the test, and we make a decision. When you're talking about a hypothesis, you're usually trying to disprove it. If you can't disprove it and it works for all the facts, then you might call that a theory at some point. So for a use case, let's consider an example. Four students were given the task of cleaning a room every day. Sounds like working with my kids. They decided to distribute the job of cleaning the room among themselves. They did so by making four chits with their names on them, and the name that gets picked has to do the cleaning for that day. Rob took the opportunity to make the chits and wrote everyone's name on them. So here's our four people: Nick, Rob, Emilia, and Summer. Now Nick, Emilia, and Summer are asking us to decide whether Rob has done some mischief in preparing the chits, that is, whether Rob has actually written his own name on one of the chits. For that, we will find the probability of Rob not getting the cleaning job on the first day, second day, third day, and so on up to 12 days.
The probability of Rob not getting the job decreases with every day that his turn doesn't come up. If his turn never comes up, then he has most likely done some mischief while making the chits. So the probability of Rob not doing the work on day one is three out of four. There's a 75% chance that he didn't do the work. For two days, it's 3/4 times 3/4, which equals 0.56. For three days, you have 3/4 times 3/4 times 3/4, which equals 0.42. When you get to day 12, it's 0.032, which is less than 0.05. Remember this 0.05. That comes up a lot when we're looking at statistics. So we conclude Rob is cheating, as he wasn't chosen for 12 consecutive days. It's very improbable that, by chance alone, he still hasn't gotten the job of cleaning the room on day 12. So we come to our important terminology. We have the null hypothesis: a general statement that there is no relationship between two measured phenomena, or no association among the groups. The alternative hypothesis: contrary to the null hypothesis, it states that something is happening, and a new theory is preferred instead of the old one. So the two hypotheses go hand in hand. The null is always interesting when we're talking about data science and the math behind it. It's about testing the idea that the things have no correlation. The null hypothesis says these two have zero relation to each other, where the alternative hypothesis says, hey, we found a relation, and this is what it is. We have the p value. The p value is the probability of finding the observed, or more extreme, results when the null hypothesis of a study question is true. And the t value is simply the calculated difference represented in units of standard error. The greater the magnitude of t, the greater the evidence against the null hypothesis. You can look at the t value as being specific to the test you're doing, where the p value is derived from your t value, and you're looking for what they call the 5%, or 0.05, threshold: a p value below that is treated as significant evidence of a relationship.
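The chit arithmetic at the start of this part can be checked in a few lines of Python. This is a minimal sketch, assuming the daily draws are independent and that an honest set of four chits gives Rob a 1 in 4 chance of being picked each day; the variable names are illustrative.

```python
# Null hypothesis: Rob's name is on a chit, so each day he is
# skipped with probability 3/4, independently of other days.
p_not_picked = 3 / 4
alpha = 0.05  # the 0.05 threshold mentioned in the walkthrough

for days in (1, 2, 3, 12):
    p_streak = p_not_picked ** days
    print(f"P(Rob not picked for {days} day(s) in a row) = {p_streak:.3f}")

p_twelve_days = p_not_picked ** 12
if p_twelve_days < alpha:
    print("Below the threshold: reject the idea that the chits are fair.")
else:
    print("Not below the threshold: no evidence of mischief.")
```

The first day on which the streak probability drops below 0.05 can be found the same way, by looping over days until `p_not_picked ** days < alpha`.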
Digging in deeper, let's assume a new drug is developed with the goal of lowering blood pressure more than the existing drug. This is a good example because the null hypothesis here isn't about having no drug at all. The null hypothesis is that the new drug doesn't lower blood pressure more than the existing drug. If that's what the data supports, we stay with the null hypothesis: there is no improvement, and the new drug is not doing its job. The alternative hypothesis is that the new drug does significantly lower blood pressure more than the existing drug. Yay, we've got a new drug out there. That's our alternative hypothesis, written H1 or Ha. The p-value comes from the evidence, such as medical trials showing positive results, and a small enough p-value lets us reject the null hypothesis. Again, they're looking for a value below 0.05, or 5%. The t-value comes from comparing the test results and the means of the different samples in order to test the hypothesis. So it's specific to the test: how big was the improvement relative to the noise? This leads us to confidence intervals. A confidence interval is a range of values that we're fairly sure the true value lies in. Let's say you asked the dog owners around you how many cans of food they buy per year for their dog. Through calculations you find that around 95% of the people bought between 200 and 300 cans. Hence we can say we have an interval of 200 to 300 in which 95% of our values lie. The graph really helps here, so you can see what you're looking at. You have your peak, and in this case it's a normal distribution, so you have the nice bell curve, equal on both sides. It's symmetrical. 95% of all the values lie within that central range, and then you have your outliers, 2.5% going each way. So we've touched upon hypotheses.
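To make "difference in units of standard error" concrete, here is a small sketch of a two-sample t-value for the drug example. The blood pressure numbers are made up purely for illustration, and the unequal-variance (Welch) form of the standard error is an assumption; the talk does not say which t-test it has in mind.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical reductions in blood pressure (mmHg) per patient.
# These values are invented for illustration only.
existing_drug = [8, 10, 9, 11, 7, 10, 9, 8]
new_drug = [12, 14, 11, 13, 15, 12, 14, 13]

# Observed difference between the two sample means.
diff = mean(new_drug) - mean(existing_drug)

# Standard error of that difference (Welch form, using sample variances).
standard_error = sqrt(
    variance(new_drug) / len(new_drug)
    + variance(existing_drug) / len(existing_drug)
)

# The t-value is the difference expressed in units of standard error.
t_value = diff / standard_error
print(f"difference = {diff}, standard error = {standard_error:.3f}, t = {t_value:.2f}")
```

A large t-value means the observed difference is many standard errors away from zero; turning it into a p-value requires the t distribution, which a statistics library would normally handle.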
Now we're going to move into probability. Once you've generated your hypothesis, you want to know the probability of something occurring. Probability is a measure of the likelihood of an event occurring. No event can be predicted with total certainty. It can only be predicted as a likelihood of its occurrence. Think of score prediction, how well you're going to do in whatever sport you're in, weather prediction, or stock prediction. If you've studied physics and chaos theory, even the location of the chair you're sitting on has some probability that it might move 3 ft over. Granted, that probability is, I think we calculated, under one in trillions upon trillions. So the higher the probability, the more likely something is to happen, and some things have such a low probability that we never see them. Next we talk about a random variable. A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. Take the coin toss: how many heads will occur in a series of 20 coin flips? On average it's 10, but you can't really know in advance because it's random. How many times is a red ball picked from a bag if there's an equal number of red, blue, and green balls in it? How many times do two dice come up five each, in other words, how often are you going to roll two fives on your pair of dice? For a use case, let's consider rolling two dice. We have a random variable, outcome = Y, which can take the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. So we have a random variable built from a combination of dice, and instead of looking at how many times both dice roll a five, we'll look at the sum.
Let's look at a total sum of five. As far as your random variable goes, you can roll a 1 and a 4, a 4 and a 1, a 2 and a 3, or a 3 and a 2. So if you look at all the different options, four of those rolls add up to five. The total number of options for two dice is 36, so you have a 4 out of 36 chance, every time you roll the dice, of getting an outcome of five. We'll look a little deeper at what that means, but you can think of it this way: at what point, if someone never rolls a five or always rolls a five, can you say, "Hey, that person's probably cheating"? We'll look closer at the math behind that, but for now just consider rolling two dice and gambling as one of the cases. There's also the binomial distribution. It is the probability distribution of getting success or failure as the outcome of an experiment or trial that is repeated multiple times. The key is "bi", meaning two: passing or failing an exam, winning or losing a game, getting heads or tails. So whenever you see a binomial distribution, it's based on a true/false kind of setup. You win or you lose. Let's consider a use case: a football series between two clubs, Barcelona and Dortmund. The teams play a total of five matches, and we have to find the chances of Barcelona winning the series. Let's say the winning chance for Barcelona is 75%, or 0.75. That means in each game they have a 75% chance of winning, and their losing chance is 25%, or 0.25. Clearly 0.75 plus 0.25 equals 1, so that accounts for 100% of the outcomes. From these we can calculate the probability of getting k wins in n matches.
Say you have five games and you want to know how many wins to expect and what the probability of each count is. The probability of getting k wins in n matches is calculated by $P(X = k) = \binom{n}{k} p^{k} q^{n-k}$, where $\binom{n}{k}$ is the number of ways to choose k matches out of n. Here p is the probability of success and q is the probability of failure. So we have total games n = 5, where k can be 0, 1, 2, 3, 4, or 5. p, the chance of winning, is 0.75. q, the chance of losing, is 1 − p, which is 1 − 0.75 = 0.25. For the probability that Barcelona will lose all of the matches, we just plug in the numbers with k = 0 and end up with 0.0009765625. So there's a very small chance they lose all their matches. We can plug in the value for two matches: the probability that Barcelona wins exactly two matches is about 0.088. And of course we can go on: the probability that Barcelona wins exactly three matches is about 0.264, then four matches, and so on. It's always nice to take this information and find the cumulative probability for the outcomes where Barcelona has won three or more matches, X = 3, X = 4, and X = 5. We end up with P ≈ 0.264 + 0.395 + 0.237, which is about 0.896. So the probability of Barcelona winning the series is much higher than 0.75. It's also nice to put out a graph so you can actually see the number of wins against the probability and how that pans out in our binomial case. Continuing with our important terminology: location. The location of the center of the graph depends on the mean value. This is important, because so much of the data we look at, when you start looking at probabilities, almost always has a normal look, like the graph in the middle.
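Going back to the Barcelona series, the binomial formula is short enough to evaluate directly. The sketch below assumes n = 5 matches and a per-match win probability of 0.75, as in the example; the helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.75  # five matches, 75% chance of winning each one

# Probability of each possible number of wins, 0 through 5.
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

p_lose_all = pmf[0]                       # zero wins
p_exactly_two = pmf[2]                    # exactly two wins
p_three_or_more = pmf[3] + pmf[4] + pmf[5]  # wins the series

for k, prob in pmf.items():
    print(f"P(X = {k}) = {prob:.4f}")
print(f"P(X >= 3) = {p_three_or_more:.4f}")
```

Summing the entries for three, four, and five wins gives the chance of taking the series, and the six probabilities together add up to 1.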
Not all data looks normal, though. You also have left-skewed data, where the distribution leans off to the left, and you have right-skewed data. When these probabilities come up skewed, it's really important to take a closer look. Mostly you end up with a normal-looking set of data, but you have to be aware that sometimes the data is skewed. Then there's the height: the height of the peak depends inversely on the standard deviation. You can see down here that when the standard deviation is really large, it squishes the curve down and pushes it out. If the standard deviation is small, most of your data lands right there in the middle and you get a nice peak. Be aware of this, because you might have a probability model that fits certain data but has a lot of outliers. If you have a really high standard deviation and you're doing stock market analysis, your predictions are probably not going to make you much money, whereas with a very small deviation you might be right on target and set to become a millionaire. Which leads us to the z-score. The z-score tells you how far from the mean a data point is, measured in standard deviations. Around 68% of the results fall within one standard deviation of the mean, and around 95% fall within two standard deviations. Then you read the symbols. Of course, they love to throw some Greek letters in there. We have μ − 2σ. μ (mu), that kind of funky u, just means the mean. σ (sigma) is the standard deviation; that's the o with a little tail going off to the right. So μ − 2σ to μ + 2σ is the range where 95% of the results are found, within two standard deviations. Next is the central limit theorem, which goes back to the skew. If you remember, we were looking at the skewed distributions on the previous slide.
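The z-score definition and the 68% and 95% figures can be checked with Python's standard library. This is a sketch under the assumption that the data is normally distributed; the sample values are hypothetical and the helper name `z_score` is illustrative.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical measurements, invented for illustration.
data = [52, 48, 50, 55, 45, 50, 53, 47]
mu = mean(data)        # the mean
sigma = pstdev(data)   # the (population) standard deviation

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

print(f"z-score of 55: {z_score(55, mu, sigma):.2f}")

# Share of a normal distribution within 1 and 2 standard deviations.
standard_normal = NormalDist()  # mean 0, standard deviation 1
within_one = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"within 1 sigma: {within_one:.4f}, within 2 sigma: {within_two:.4f}")
```

The two printed shares are where the "around 68%" and "around 95%" rules of thumb come from.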
On that slide we had left-skewed, normal, and right-skewed distributions. The central limit theorem says that the distribution of the sample means will be approximately normally distributed, evenly distributed, not skewed, if you take large random samples, with replacement, from a population with mean μ and standard deviation σ. You can see here we have our μ − 2σ and the spread down here, and the mean, the median, and the mode. When you're working with very large samples, these numbers should come together and you shouldn't have a skewed value. If you do, that's a flag that something's wrong. That's why it's so important to be aware of what's going on with your data, where your samples are coming from, and the math behind it. And if you're going to do all this, we have to jump into conditional probability. The conditional probability of an event A is the probability that the event will occur given the knowledge that an event B has already occurred. You'll see this as Bayes' theorem, B-A-Y-E-S. You have these funky-looking P brackets: P(A|B) is the probability of A being true given that B is already true, and P(B|A) is the probability of B being true given that A is already true. The theorem is $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$: the probability of B given A, times the probability of A being true, divided by the probability of B being true. Bayes' theorem goes back to the 1700s, when Thomas Bayes worked it out. It's such an important formula, and if you actually do the math it's not complicated. You could write something like x · y = j · k and divide through, and you'd see the same algebra, but it works with probabilities, which makes it really nice. You might have eight or nine different studies going on in different areas, done by different people, and then brought together.
If we look at today's COVID virus spread, the studies done in China and the studies done in the US certainly differ, and the data is different in each of them. But if you can find a place where they overlap, where they're studying the same thing, you can compute the adjustments you need to make in one study to make them comparable. The same is true if you have a study of one group and want to find out more about it. So this formula is very powerful, and it really has to do with the data collection part of the math and data science: understanding where your data is coming from and how you're going to combine different studies from different groups. Let's go into a use case and find the chance of a person getting lung disease due to smoking. It's kind of interesting the way they word this. Let's say a medical report provided by the hospital states that around 10% of all patients they treated suffered from lung disease. So we have a kind of generic medical report. They further found out through a survey that 15% of the patients who visit them smoke. So we have 10% with lung disease and 15% who smoke. Finally, 5% of the people continued to smoke even when they had lung disease. Not the brightest choice, but it is an addiction, so it can be really difficult to kick. So we have P(A), the prior probability that a patient has lung disease, at 10%. We have P(B), the probability that a patient smokes, at 15%. We have P(B|A), the probability that a patient smokes given that they have lung disease, at 5%. And we want P(A|B), the probability that a patient will have lung disease if they smoke. When you put it into the formula, you get a nice solution for P(A|B), the probability that the patient will have lung disease if they smoke.
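The lung disease example is a direct application of Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B). A minimal sketch using the three percentages from the hospital report; the function and variable names are illustrative.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_lung = 0.10               # P(A): patient has lung disease
p_smoker = 0.15             # P(B): patient smokes
p_smoker_given_lung = 0.05  # P(B|A): patient smokes, given lung disease

# P(A|B): patient has lung disease, given that they smoke.
p_lung_given_smoker = bayes_posterior(p_smoker_given_lung, p_lung, p_smoker)
print(f"P(lung disease | smoker) = {p_lung_given_smoker:.2%}")
```

With these inputs the posterior works out to one in thirty, which is the 3.33% figure quoted in the talk.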
Plugging the numbers into the formula, 0.05 × 0.10 / 0.15, gives a 3.33% chance. Hence, there is a 3.33% chance that a person who smokes will get lung disease. Now we're going to pull up a little Python code. Always my favorite. Roll up the sleeves. Keep in mind we're going to do this kind of the back-end way so you can see what's going on, and later we'll get into another demo that shows you some of the tools that are already pre-built for this. Let's start by creating a set. We create a set with curly braces, which means it holds only unique values. You have your list, you have your tuples, which can never change, and then you have the set. So with {4, 7}, you can't create {4, 7, 4}; it'll drop the duplicate 4, so only unique values remain. A quick reminder if you use dictionaries: this should look familiar, because a dictionary uses the same braces, with each key assigned to a value. A set is like a dictionary without the values. It's just the keys, and they all have to be unique. If we run this, we have a set of {4, 7}. We can also take a regular list. I'm going to throw another 4 into it and run it. If I take my list, 1, 2, 3, 4, 4, and convert it with my_set_from_list = set(my_list), the result is {1, 2, 3, 4}. It just deletes that last 4 right out of there. With sets you can also test membership. We print "3 is in the set" along with 3 in my_set, which is a logical expression, then 1 in my_set, then "6 is not in the set" with 6 not in my_set, and so forth. If we run this with my_set being {3, 5, 7}, we get: 3 is in the set, True; 1 is in the set, False; 6 is not in the set, True. You can also use this with a list.
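The set behavior described here can be tried out directly. This is a sketch that follows the spoken walkthrough; the variable names `my_list` and `my_set_from_list` are taken from the talk, while `odd_set` is a stand-in for the {3, 5, 7} set used in the membership checks.

```python
# Curly braces create a set; duplicates are dropped automatically.
my_set = {4, 7, 4}
print(my_set)  # only the unique values remain

# Converting a list to a set removes the repeated 4.
my_list = [1, 2, 3, 4, 4]
my_set_from_list = set(my_list)
print(my_set_from_list)

# Membership tests evaluate to True or False.
odd_set = {3, 5, 7}
print("3 is in the set:", 3 in odd_set)
print("1 is in the set:", 1 in odd_set)
print("6 is not in the set:", 6 not in odd_set)

# The same `in` test also works on a plain list.
print("3 is in the list:", 3 in [3, 5, 7])
```

Sets make the membership test fast and guarantee uniqueness, which is why the dice faces in the next part of the demo are stored in one.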
We could have just used the list [3, 5, 7] and gotten the same response. You usually write it as if 3 in ..., and 3 in my_set still works on just a regular list. Now let's do a little iteration. We're going to do the dice one. Remember: 1, 2, 3, 4, 5, 6. We're going to bring in an iteration tool, from itertools import product, and I'll show you what that means in just a second. We have our two dice. Dice A is a set of values. Each face value can only appear once, which is why they put it in a set. If you remember how range works, range(1, 7) goes up to but does not include 7, so this is 1, 2, 3, 4, 5, 6. Same thing for dice B. Then we create a list that is the product of A and B. If we run this, it prints that out. When they say product here, because it's an iteration tool, we're not multiplying or adding A and B. We're creating a tuple for each pairing of the two. So we've now created tuples of all possible outcomes of the dice, where dice A is 1 to 6 and dice B is 1 to 6. You can see (1, 1), (1, 2), (1, 3), and so forth. You'll remember we had a slide on this earlier where we talked about all the different outcomes of the dice. We can play around with this a little. We can set n_dice = 2 and dice_faces to 1, 2, 3, 4, 5, 6, which is another way of doing what we did before. Then we create an event space, a set built from the product of dice_faces with repeat=n_dice. We'll run this, and you can see it again runs through all the different possible combinations. If we want to print them all out like we had before, we can loop with for outcome in event_space and print each outcome with end set to a comma. So the event space gives us a collection we can iterate over.
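Here is a sketch of the two ways of building the dice outcomes described in this part of the demo. The variable names are reconstructed from the spoken walkthrough and may not match the original notebook exactly.

```python
from itertools import product

# Two dice, each a set of the face values 1 through 6.
dice_a = set(range(1, 7))  # range stops before 7
dice_b = set(range(1, 7))

# product() pairs every face of A with every face of B as a tuple.
outcomes = list(product(dice_a, dice_b))
print(outcomes[:3])

# The same event space, built with repeat instead of two arguments.
n_dice = 2
dice_faces = {1, 2, 3, 4, 5, 6}
event_space = set(product(dice_faces, repeat=n_dice))

# Print every outcome on one line, separated by commas.
for outcome in sorted(event_space):
    print(outcome, end=", ")
print()  # finish the line
print(len(event_space))
```

Using `repeat=n_dice` means the same code scales to more dice by changing a single number.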
As you can see, the first time we printed it, it stacked the outcomes one per line, versus this version, which puts them in a nice row. Since we have end set to a comma in the print, it doesn't hit return and go down to the next line. Next we'll print the length of our event space. That'll be an important number we're going to want in a minute. And of course, if I get carried away typing len, it prints twice and gives me an error. So we have 36 different possible outcomes here. Now we might want to calculate something like the probability that the sum is a multiple of three. We can put together the code: for outcome in event_space, unpack x, y = outcome, and check whether (x + y) % 3 equals zero. That is, we divide the sum by three and look at the remainder, and if it's zero, it's a favorable outcome and we append that outcome to the end of our list. Then we turn it into a set, so favorable_outcome is a set. That's not strictly necessary, because we know the outcomes won't repeat, but just in case, we'll do that. If we print the favorable outcomes, we can see what they look like. These all sum to multiples of three: 1 + 2 is 3, 5 + 4 is 9, which divided by 3 is 3, and so forth. Just like we looked up the length of the event space, let's print the length of our favorable outcomes. And of course, I forgot to add a bare print in the middle. Because the loop above ends each print with a comma instead of a newline, we need an empty print to start a fresh line. If I run this, we end up with 12. So we have 36 total outcomes, and 12 of them add up to a multiple of three.
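The multiple-of-three filter can be sketched as follows. The event space is rebuilt here so the snippet runs on its own, and the name `favorable_outcome` is reconstructed from the spoken walkthrough.

```python
from itertools import product

dice_faces = {1, 2, 3, 4, 5, 6}
event_space = set(product(dice_faces, repeat=2))

# Keep the outcomes whose sum leaves no remainder when divided by 3.
favorable_outcome = []
for outcome in event_space:
    x, y = outcome
    if (x + y) % 3 == 0:
        favorable_outcome.append(outcome)

# Converting to a set is a safeguard; the outcomes are already unique.
favorable_outcome = set(favorable_outcome)

probability = len(favorable_outcome) / len(event_space)
print(len(favorable_outcome), "favorable out of", len(event_space))
print(f"The probability of a sum that is a multiple of 3 is {probability:.4f}")
```

Counting favorable outcomes and dividing by the size of the event space is the whole calculation; nothing more is needed for equally likely outcomes.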
We can easily compute the probability of this by taking the length of our favorable outcomes over the length of the event space. If we print it out, and I'll just type probability on the last line, we end up with 0.333. That's a third. We might want to make this look nice, so let's put in another line: the probability of getting a sum which is a multiple of three is 0.3333. We can compute the same kind of thing for five dice. If we do this for five dice and run it, you can see we have a huge number of outcomes. It just goes on and on down here. If we look at the length of the event space, we have 7,776 outcomes. That's a lot. And we can ask a question like the one above: what's the probability that the sum is a multiple of five but not a multiple of three? We go through all of these outcomes, unpacking d1, d2, d3, d4, d5 = outcome. If you add them all together and the remainder of the sum divided by five is zero, but the remainder of the sum divided by three is not zero, then it's a multiple of five and not a multiple of three, and we append it. Then we look at the favorable outcomes. We'll turn that into a set and check the length. It's always good to see what we're working with. We have 904 out of 7,776. Then we just do a simple division to get the probability that we roll a sum that is a multiple of five but not a multiple of three. Dividing those two numbers, we get about 0.1163, or roughly 11.6%. So you get a nice visual that this is not really complicated math right here on probabilities.
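The five-dice version follows the same pattern with a larger event space. A sketch, with names reconstructed from the walkthrough:

```python
from itertools import product

n_dice = 5
dice_faces = {1, 2, 3, 4, 5, 6}
event_space = set(product(dice_faces, repeat=n_dice))

# Keep outcomes whose sum is a multiple of 5 but not a multiple of 3.
favorable_outcome = set()
for outcome in event_space:
    d1, d2, d3, d4, d5 = outcome
    total = d1 + d2 + d3 + d4 + d5
    if total % 5 == 0 and total % 3 != 0:
        favorable_outcome.add(outcome)

probability = len(favorable_outcome) / len(event_space)
print(len(favorable_outcome), "favorable out of", len(event_space))
print(f"probability = {probability:.4f}")
```

The qualifying sums are 5, 10, 20, and 25, since 15 and 30 are also multiples of three and get filtered out.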
Probability here is just a matter of how many options you have and how many of those give you the result you're looking for. This leads us to the confusion matrix. A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. On the slide you'll see predicted against actual, with four cells: true positive, false positive, false negative, and true negative. What does that mean? If you divided your data and used two-thirds of it to create the model, you might then test the model against the actual labels of the last third to see how well it comes out. How many times was it a true positive versus a false positive? You can imagine that in medical situations this is a pretty big deal. In some cases you don't want to give a false positive, so you might adjust your model accordingly. Say, with a COVID test, it might be better to have a false negative and have people go back and get retested than to have 30% false positives, at which point the test is pretty much invalid. For a use case like cancer prediction, let's consider an example where a cancer prediction model is put to the test for its accuracy and precision. The actual result from a person's medical report is compared with the prediction made by the machine learning model. You can see here our actual versus predicted values for whether they have cancer or not. Cancer is a big one, and there you don't want a false negative. In other words, you don't want the model telling you that you don't have cancer when you do. That's what you'd really be watching for in this particular domain.
Again, you've created a model, and you have hundreds of people or thousands of pieces of data coming in. There's a really famous case study where they have the imagery and all the measurements they take, about 36 different measurements. If you run a basic model on it, you want to know just how accurate it is. How many wrong results do you have, either telling people they have cancer when they don't, or telling people they don't have cancer when they do? Then we can take these numbers and feed them into our accuracy, our precision, and our recall. Accuracy is a metric that measures how accurately the results are predicted. It's your total of correct results, the true positives plus the true negatives, over all the results. So what fraction of them were right versus wrong? Precision is a metric that measures how many of the cases predicted as positive actually turned out to be positive. If you're talking about something like COVID testing, you really want this to be a high number, whereas you might care about the opposite with cancer, where you want no false negatives. Precision is the true positives over the true positives plus the false positives. Recall measures how many of the actual positive cases we were able to predict correctly with our model. It's the true positives over the true positives plus the false negatives. Next we'll do a demo on the naive Bayes classifier. Before I get too far into the naive Bayes classifier, since we're going to pull it from sklearn, that is scikit-learn, let's look at kind of an interesting page there on classifiers.
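The three metrics just defined can be written as one-line functions. The counts below are illustrative values for a two-class problem, chosen only to show the arithmetic; the function names are not from the original demo.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Of everything predicted positive, how much really was positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything actually positive, how much the model found."""
    return tp / (tp + fn)

# Illustrative confusion-matrix counts.
tp, fp, fn, tn = 25, 3, 7, 65

print(f"accuracy  = {accuracy(tp, tn, fp, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
```

Notice that precision is hurt by false positives while recall is hurt by false negatives, which is why the right metric to optimize depends on which mistake is more costly in the domain.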
When you go into the sklearn site, there are a lot of ways to do classification. I'll zoom in so you can see some of the titles. There's everything from nearest neighbors to linear models, but we're going to focus on naive Bayes over here. This is just a sample data set they put together, and you can see how some of these classifiers have very different output. Naive Bayes, remember, is probably one of the simplest predictors out there. It builds on what we've been talking about with true/false outcomes and Bayes' theorem, and the naive part is the assumption that the features are independent of each other given the class. Even when the features do have some connection, we can often still use it for prediction. That's what makes it a naive Bayes classifier, versus many of the other classifiers out there. For this we're going to use the social network ads data, a little data set, so let me open up the file. Here we go. It has user ID, gender, age, estimated salary, and purchased. You can see a user ID, male, age 19, estimated salary 19,000, and purchased 0. So each person either made a purchase or not. Look at that last column, 0 or 1: we should be thinking of binomials, and of a simple naive Bayes classifier kind of setup. If we close this out, we import numpy as np. It's nice to have a good visual of our data, so we bring in matplotlib. Here's pandas for our data frame. Then we import the data set, reading it from the social network ads CSV file. We print the head so you can see it again, even though I showed it to you in the file. X is the dataset's iloc of columns 2 and 3, as values, and y is column 4.
Let me run this so it's a little easier to go over. Counting columns from zero, 0, 1, 2, you can see that columns 2 and 3 are age and estimated salary. That's what iloc means: we're selecting by position number versus by label. With the regular loc you'd actually say age and estimated salary. Column 4 is whether they made a purchase. So those are the three columns we're looking at. We've imported the libraries and the data, so our data set is all set. Next we need to split the data up. From sklearn's model_selection we import train_test_split, which does a nice job. We can set the random state so the random split is reproducible, and we take 25% of the data for the test set, our X test and y test, while the other 75% goes to X train and y train. That way, once we create our model, we have held-back data to see how well it performs on prediction. The next step in preprocessing is feature scaling. A lot of this should start to look familiar. If you've done a number of the other modules, you'll notice the pattern: we bring in our data, we take a look at what we're working with, and we split it into training and testing. In this case we also scale it. Scaling means putting the features on a comparable range, values around minus one to one or someplace in that middle ground. This way one feature with huge numbers doesn't dominate. If we go back up to the data, a salary is 20,000 versus an age of 35. With a lot of the back-end math, there's a good chance the 20,000 will skew the results, and the estimated salary will have a higher impact than the age, instead of the two being balanced so the calculations can weigh them properly.
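To see what feature scaling does to columns like age and salary, here is a plain-Python sketch of standardization, which subtracts the mean and divides by the standard deviation. Standardization is an assumption here, since the talk does not name the scaler used; it is the transformation scikit-learn's StandardScaler applies. The sample rows are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical rows in the style of the social network ads data.
ages = [19, 35, 26, 27, 47]
salaries = [19000, 20000, 43000, 57000, 25000]

scaled_ages = standardize(ages)
scaled_salaries = standardize(salaries)

for a, s in zip(scaled_ages, scaled_salaries):
    print(f"age {a:+.2f}   salary {s:+.2f}")
```

After scaling, both columns sit on the same footing, so a salary in the tens of thousands no longer outweighs an age in the tens. In a real pipeline the mean and standard deviation are learned from the training set only and then reused on the test set.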
Finally, we get to actually create our naive Bayes model. We import GaussianNB, the Gaussian naive Bayes. The Gaussian one is the most basic, and that's what we're looking at now. It turns out, though, if you go to the sklearn site, they have a number of different ones you can pull in. There's Bernoulli, which I've never used, categorical, complement, and here's our Gaussian. So there are a number of options. When it comes to naive Bayes, Gaussian is the most commonly used, so when people talk about pulling in naive Bayes, that's usually what they mean. One of the nice things about GaussianNB, if you look at its page on the sklearn site, is that it has a lot of cool features. One of them is partial_fit. That means if you have a huge amount of data, you don't have to process it all at once. You can feed it into the GaussianNB model in batches. There are many other things you can do with it as far as fitting the data and how you manipulate it. We're just doing the basics. So we create our classifier as GaussianNB(), and then we do a fit with our training data and our training labels, X train and y train. We run this, and it tells us it ran the code. Now we have our trained classifier model. The next step is to run a prediction: y_pred = classifier.predict(X_test). So we've fit the data, and now we predict, and that gets us to our confusion matrix. From sklearn.metrics you can import confusion_matrix, which saves you from doing all the simple math. It does it all for you. Then we create our confusion matrix from y test and y pred.
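The split, scale, fit, predict, and confusion-matrix steps can be sketched end to end with scikit-learn. Because the social network ads file is not included here, this sketch generates synthetic age and salary data with a made-up purchase rule; that data, the rule, and the use of StandardScaler are assumptions for illustration, so the resulting numbers will not match the demo.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ads data (NOT the real data set).
rng = np.random.default_rng(0)
n = 400
age = rng.integers(18, 61, size=n)
salary = rng.integers(15_000, 150_001, size=n)
# Made-up rule: older or higher-paid users tend to purchase.
y = ((age > 40) | (salary > 90_000)).astype(int)
X = np.column_stack([age, salary]).astype(float)

# Hold back 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the Gaussian naive Bayes classifier and predict the test set.
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

For data too large to fit in memory, `GaussianNB.partial_fit` can be called on successive batches instead of a single `fit`, passing the full list of classes on the first call.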
So we have our actual values and our predicted values. This is the chart we looked at, with true positive, false positive, false negative, and true negative. If we run this, there we have it: [[65, 3], [7, 25]]. In sklearn's layout the rows are the actual classes and the columns are the predicted classes. So in this run, 65 people who did not purchase were correctly predicted as not purchasing, and 3 were predicted to purchase but didn't. Then 25 who did purchase were correctly predicted, and 7 were predicted not to purchase but did. So there's our confusion matrix. At this point, if you were in a shareholder or board meeting, you'd start to hear some snoozing if you said, "Hey, here's my confusion matrix." So let's visualize the results. From matplotlib.colors we import ListedColormap. My machine is actually going to throw an error here because of the way this is set up: I have a newer version installed than when they put the demo together. We need our X set and y set, which are our X train and y train. Then we create our X1 and X2 and put them into a grid, running from the X set minimum to the X set maximum, and if you come all the way over here, we step by 0.01. That fine step is what gives us a nice smooth boundary. Then we plot the contour, set the x limit and the y limit, and put the scatter plot in. Let's run this. To be honest, when I'm doing these graphs, there are so many different ways to put the code together that it's a lot easier to pull up the graph and then go back and explain it. The first thing to note is that this is the training set. We have those who didn't make a purchase, the red, and we've drawn a nice region for them, defined by the naive Bayes model.
And then we have those who did make a purchase, the green. You can see that some of the green dots fall into the red area and some of the red dots fall into the green, so even our training set isn't going to be 100%. We couldn't do that. So we're looking at our different data coming down. We arrange our X1 and X2 so we have a nice plot going on, and then we create the contour. That's that nice line that's drawn down the middle here between the red and the green. That's what this is doing right here with the reshape. And notice that we had to do the .T. If you remember from NumPy, if you did the NumPy module, the model wants pairs: x1, x2, then x1, x2 on the next row, and so forth. What we build here is one row of all your x1s and one row of all your x2s, so you have to flip it. That's what we're looking for in this setup. And then the scatter plot is, of course, your scattered data across there. We're just going through all the points, and that puts these nice little dots onto our plot. We have our estimated salary and our age, and then of course the dots are whether they made a purchase or not. And just a quick note, this is kind of funny. You can see up here where it says X set, y set equals X train, y train, which seems a little weird to do. That's because this was probably originally a function definition, so it's its own module that could be called over and over again. That's really a good way to do it, because the next thing we're going to do is the exact same thing, but visualizing the test set results. That way we can see what happened with our test group, our 25%. You can see down here we have the test set. If you look at the two graphs next to each other, this one obviously has 75% of the data, so it's going to show a lot more. This one is only 25% of the data.
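The four numbers from the confusion matrix earlier in the demo (65, 3, 7, 25) are enough to work out accuracy, precision, and recall by hand. The sketch below assumes scikit-learn's layout, where rows are the actual classes and columns are the predicted ones, and assumes class 0 means no purchase and class 1 means purchase. The helper names are made up for illustration.

```python
# Confusion matrix from the demo: rows = actual class, columns = predicted class.
cm = [[65, 3],
      [7, 25]]


def accuracy(cm):
    """Correct predictions (the diagonal) over all predictions."""
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return correct / total


def precision(cm, k):
    """Of everything predicted as class k, the fraction that really was k."""
    predicted_k = cm[0][k] + cm[1][k]
    return cm[k][k] / predicted_k


def recall(cm, k):
    """Of everything that really was class k, the fraction we predicted as k."""
    actual_k = sum(cm[k])
    return cm[k][k] / actual_k


print(f"accuracy          {accuracy(cm):.2f}")      # 0.90
print(f"precision class 0 {precision(cm, 0):.2f}")  # 0.90
print(f"recall class 0    {recall(cm, 0):.2f}")     # 0.96
print(f"precision class 1 {precision(cm, 1):.2f}")  # 0.89
print(f"recall class 1    {recall(cm, 1):.2f}")     # 0.78
```

These are the same quantities a classification report prints per class, just computed directly from the counts.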
You can see that there's a number that are kind of on the edge as to whether we could guess by age and income that they're going to make a purchase or not. But that said, it's still pretty clear. It's pretty good as far as how well the estimate does. Now, graphs are really effective for showing people what's going on, but you also need to have the numbers. So from sklearn we're going to import metrics, and then we're going to print the metrics classification report from the y test and the y predict. You can see here that our precision for the zeros is 0.90, and there's our recall, 0.96. We have an F1 score and a support. We have our precision and the recall on getting it right, and then we have our accuracy, the macro average, and the weighted average. So you can see it pulls in pretty good as far as how accurate it is. You could say that when it predicts they're not going to purchase, it's right about 90% of the time, and when it predicts they are going to purchase, it's right about 89% of the time. The other numbers as you get down have a little bit different meaning, but it's pretty straightforward. Here's our accuracy, here's our macro average, and the weighted average, and everything else you might need. And if you forgot the exact definition of accuracy, it is the true positives plus the true negatives over all of the predictions. Precision is your true positives over all predicted positives, true and false. And recall is the true positives over true positives plus false negatives. We can just real quick flip back there so you can see those numbers. Here's our precision. Here's our recall. And here's our accuracy.

What is machine learning? Hello, my name is Richard Kersner. I'm with the Simplilearn team. That's www.simplilearn.com. Get certified. Get ahead. Today we're covering what is machine learning. What's in it for you? We're going to cover the basics of machine learning.
What is machine learning? Artificial intelligence versus machine learning versus deep learning. How does machine learning work? Types of machine learning. Machine learning prerequisites. Applications of machine learning. Here we have our Frankenstein-looking robot. Today, let me tell you what machine learning is. Machine learning works on the development of computer programs that can access data and use it to automatically learn and improve from experience. Watch a robot builder construct a house in two days. This was back on July 29th, 2016, so that's pretty impressive, and there's been time since then for it to continue to grow in its development. It's smart enough to leave spaces in the brickwork for wiring and plumbing, and it can even cut and shape bricks to size. Amazon Echo relies on machine learning, and with more data, it becomes more accurate. Play your favorite music, order pizza from Domino's, voice control your home, request rides from Uber. Have you ever wondered about the difference between AI, machine learning, and deep learning? Artificial intelligence is a technique which enables machines to mimic human behavior. This is really important, because the way we gauge how well our computations work is by how well we're mimicking human behavior. We're using this to replace human work and make it more efficient, more streamlined, and more accurate. So artificial intelligence is the big picture of all this put together. IBM Deep Blue chess and electronic game characters are just a couple of examples of artificial intelligence. Machine learning is a technique which uses statistical methods to enable machines to learn from their past data. This means that if you have your input from last time and you have your answer, you use that to help improve the next guess it makes at the correct answer.
IBM Watson, the Google search algorithm, and email spam filters are all part of machine learning. And then there's deep learning, which is a subset of machine learning composed of algorithms that allow a model to train itself and perform tasks. AlphaGo and natural speech recognition are a couple of examples. Deep learning is associated with tools like neural networks, where it's kind of a black box. As it learns, it changes all these things that we as humans would have a very hard time tracking, and it's able to come up with an answer from that. Now, let's see how machine learning works. First, we start with the training data. The training data goes into the machine learning algorithm, which puts the data through processing, and that gives us another machine learning algorithm, the trained one. Then we take new data, because you have to test whatever you did to make sure it works correctly, and we put that into the same algorithm. Once we do that, we check our prediction, we check our results. If we've set aside some training data and we find out it didn't do a good job predicting it, and it gets a thumbs down as you see, then we go back to the beginning and we retrain the algorithm. A lot of times it's not just about getting the wrong answer. It's about continually trying to get a better answer. So the first time you might be like, "Oh, this is not the answer I want." Depending on what domain you're working in, whether it's medical, economic, business, stocks, whatever, you try out your model, and if it's not giving you a good answer, you retrain it. If you think you can get a better answer, you retrain it, and you keep doing that until you get the best answer you can. Let's see the types of machine learning: supervised learning and unsupervised learning. There are a number of ways to divide up machine learning and how it works. These are two main categories you can divide it into.
In supervised learning, we have a known set of data. In this case, we have a bunch of apples. We have a machine learning algorithm. It goes through the process and trains a model based on that known data. Once you've trained your model on the known stuff, you can then put unknown data in there and you get a new response. In this particular one, it's an apple, so it's trying to figure out whether it's an apple or another fruit. There are many different algorithms you can use for this supervised training. Just to list some of the top ones currently being used, and this is by no means the complete list: there's polynomial regression, random forest, linear regression, logistic regression, decision trees, k-nearest neighbors, and naive Bayes. Like I said, this is just a short list of some of the many tools that are out there nowadays. And if you have supervised learning, then we should also look at unsupervised learning. In unsupervised learning, we have unknown data. In this case, you can see we have a bunch of fruit, and we might not have labeled it. We've never had anybody look at it and say this is what this is. We take that data and put it through the machine learning algorithm, that goes through the processing, and then the trained model says, "Hey, can I see a pattern here?" From that pattern it divides the data up into a response, in this case apples and pears. You can see some of these things look just like the others, and it tries to put them all together so that you get similar things in similar groups. And again we have a nice list of algorithms here. These are not the only algorithms used for this, so don't limit yourself to just these. These are just some of the primary ones used today.
And of course we have k-means clustering, singular value decomposition, fuzzy c-means, partial least squares, Apriori, hierarchical clustering, and principal component analysis.

Machine learning prerequisites. Computer science fundamentals and programming: for any of the machine learning out today, you have to know some basic scripting or programming. Intermediate statistical knowledge: you have to understand a little bit about probabilities. If A is occurring, how likely is B to happen? If there are clouds overhead, how likely is it going to rain? Linear algebra and intermediate calculus: the linear algebra is very important because you have to understand what it means to draw a line through the data points. That's the most fundamental of the linear regression models. You draw a line through all your data and you use that line to compute new values. Intermediate calculus means you need to have a little bit of understanding of what a differential equation is. You really don't need to be an expert, because the computer does all the heavy lifting for you, but it's important to know the terms when they come up, unless you're doing some advanced programming on the actual models themselves. And data wrangling and cleaning: I would say this might be the biggest one in here. You have to start getting a grip on how to clean up your data. The saying is bad data in, bad data out. Good data in, and you're more likely to have good data out. Some applications of machine learning: object detection and instance segmentation. You can see here where they use machine learning to go in there and find where the different cats and the different objects are in the picture. And then in segmentation, it actually cuts them out. It's kind of a fun one, especially if you have a Google Pixel phone and you can put little animated objects on top of the pictures or movies you're taking. Number plate detection.
You can see here where we have a car, and it comes in there and finds the number plate on the car. Once it's done that, it can then do automatic translation. Automatic translation is where we pick up some symbols, in this case on a machine, and it does machine translation so that you can know what it's saying even if you don't speak that language. So to summarize, we covered the basics of machine learning. What is machine learning? We talked a little bit about the process or the workflow of machine learning. We looked at two different divisions of machine learning, supervised and unsupervised. We went over the prerequisites you should have going into machine learning: you should have the basic fundamentals, a little bit of computer science and programming or scripting skills. You should know some basic linear algebra and maybe a little bit of calculus, with differential equations as part of the calculus. You should have some basic to intermediate statistical knowledge and know what the terminology means. And you should have an idea of what data wrangling and cleaning is. How do you take your data coming in and make sure you don't have missing values? Make sure your float values are processed correctly, and your integer values versus something that's categorical, like yes/no or true/false. And then we looked at a couple of applications of machine learning. Of course, there are so many applications out in today's market. It is one of the biggest growing markets out there. This is just a very brief summary of some of the things that are going on.

>> Today, we are diving into machine learning, the technology behind things like Netflix recommendations, Siri, and even the face unlock on your phone. Machine learning helps devices get smarter by learning from data and predicting what we might like or need. And here's why machine learning is huge for your career. Right now, machine learning jobs are among the fastest growing roles worldwide.
Companies in every industry, tech, healthcare, finance, and more, are looking for people with machine learning skills to improve their products, automate tasks, and make smarter decisions. Machine learning engineers in the US earn around $112,000 on average, with plenty of room for growth as you gain experience. So, if you want to jump into this exciting field, learning machine learning can open doors to high-paying, in-demand jobs. In this video I'll guide you through the ultimate roadmap to master machine learning in 2025, one step at a time. So let's get started. In the first month, start with the foundations of programming. Programming is the language you'll use to communicate with your computer and bring machine learning algorithms to life. This month is all about Python, the language of choice for most machine learning practitioners. Here's what to focus on. First, learn Python basics. Begin with Python's fundamentals like variables, data types, loops, and functions, and spend time writing small programs daily to get comfortable. After that, explore the key libraries like NumPy, pandas, and scikit-learn. NumPy is for numerical operations; it makes handling large data sets faster and easier. Pandas is for manipulating and analyzing data; it lets you filter, sort, and reshape data in a breeze. And scikit-learn is for implementing algorithms in just a few lines of code. Now, you might have heard about R, another language used in machine learning, but don't stress about it now. Python will serve you well, especially as a beginner, because it's simpler and more flexible. Aim to spend an hour or two each day coding. By the end of this month, you'll have a solid base to build on. Now, in the second month, get organized with version control and data structures. This month is about learning how to organize and manage your code effectively and sharpening your problem-solving skills with data structures and algorithms.
First is version control with Git. Think of Git as your project history tracker. Imagine working on a big project and making changes, then realizing something went wrong. You want to go back to an earlier version, right? That's where Git comes in. Here's what you should practice. Number one is committing changes, which means saving different versions of your work as you progress. Then branching, which means working on separate features without affecting your main code. And then merging, which means combining changes from different branches once they are ready. You should set up an account on GitHub or GitLab to store your projects online. Not only will this be super useful, but it'll also start building your portfolio. Next is data structures and algorithms. Think of data structures like tools in a toolkit. Each one, like arrays, stacks, queues, and so on, serves a specific purpose. Here's how to approach them. Number one, arrays and lists, which are for storing data in sequence. After that, you can get familiar with stacks and queues, which are for tasks that need ordered data access. And then you have sorting and searching algorithms. These make your programs more efficient, and that's super important in machine learning, where data can get massive. The goal here is to build up your problem-solving skills, which are key to machine learning success. So take it slow, practice daily, and you'll see progress. Now in the third month, learn to access data with SQL. In machine learning, a lot of work involves accessing and organizing data from databases. SQL, or Structured Query Language, is your ticket to getting the data you need for training ML models. Here's what you should focus on. SELECT and WHERE: these commands help you pull specific pieces of data. Then you can move on to joins. Joins combine data from different tables. This is so powerful that you'll use it all the time.
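To make SELECT, WHERE, and JOIN concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their rows are invented for illustration; any sample database would do.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'Ben', 'US'), (3, 'Chen', 'US');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 2, 40.0), (3, 2, 120.0), (4, 3, 15.0);
""")

# SELECT ... WHERE: pull only the rows that match a condition.
us_customers = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name", ("US",)
).fetchall()

# JOIN: combine each order with the customer who placed it.
big_orders = conn.execute("""
    SELECT c.name, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount > 100
    ORDER BY o.amount DESC
""").fetchall()

conn.close()

print(us_customers)  # [('Ben',), ('Chen',)]
print(big_orders)    # [('Asha', 250.0), ('Ben', 120.0)]
```

The `?` placeholder passes the value separately from the query text, which is a good habit once the filter values come from user input.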
And then come GROUP BY and aggregate functions. They are great for summarizing data to find patterns. Spend time working with sample databases you can find online and practice writing queries. Being comfortable with SQL will save you time when preparing data for your models. After completing the third month, you can move on to mathematics, which is about building your analytical mind. This month we are tackling the math behind machine learning. Don't worry, you don't need to be a math genius, but understanding certain concepts will make everything feel less mysterious. In this month, focus first on linear algebra. This is the math behind how models see data, so study vectors, matrices, and operations like multiplication. Next comes calculus. You'll use calculus to help your models learn, so focus on derivatives and gradients, which help minimize errors in your model. Then you can move on to probability and statistics. Understanding probability helps you make sense of data, so learn about distributions like the normal distribution and the binomial distribution, and then variance and standard deviation. Once you have learned the math, you'll move on to data handling and visualization, which is the heart of machine learning. With math under your belt, it's time to dig into data handling and visualization. Data preparation is vital because your model is only as good as the data you feed it. Number one is data manipulation. Using pandas and NumPy, you'll clean and organize your data. You'll handle missing values, cleaning up messy data so it doesn't confuse your model. Then you'll learn to transform variables, converting data into formats that work for models. And then you'll move on to encoding categorical data, like changing text values such as female or male into numbers. Once you're done with data manipulation, next comes data visualization.
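The data manipulation steps above (dropping missing values, transforming variables, encoding categories) might look like this in pandas. The tiny table and its column names are hypothetical, and dropping rows is only one of several ways to deal with missing values.

```python
import pandas as pd

# Hypothetical raw data with a couple of missing entries.
raw = pd.DataFrame({
    "age": [25, None, 47, 31, 52],
    "gender": ["female", "male", "male", None, "female"],
    "purchased": ["yes", "no", "no", "yes", "yes"],
})

# 1. Remove rows that have any missing value.
clean = raw.dropna().copy()

# 2. Transform variables: age was stored as float because of the gap.
clean["age"] = clean["age"].astype(int)

# 3. Encode categorical text as numbers.
clean["gender"] = clean["gender"].map({"female": 0, "male": 1})
clean["purchased"] = (clean["purchased"] == "yes").astype(int)

print(clean)
```

Three of the five rows survive here. On a real data set you would first check how much data `dropna` throws away before deciding whether to drop or fill the gaps.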
Visualization is how you get to see your data before training a model. Here you should learn Matplotlib and Seaborn, so you can create line charts, histograms, scatter plots, and heat maps. This lets you explore patterns and spot outliers, and understanding these patterns in your data is crucial for building effective models. Now in the sixth month, you'll move on to machine learning fundamentals. It's time to start building your own models. You'll focus on two main types of machine learning this month. Number one is supervised learning. This is when you train a model on labeled data, where the outcome is already known. You'll work with algorithms like linear regression, which predicts a continuous outcome. Then you'll work with decision trees, which break decisions down into a tree structure. And then you have support vector machines, which separate data into classes. After supervised learning comes unsupervised learning. Here your model identifies patterns in data without labeled outcomes. Two popular techniques in unsupervised learning are, number one, clustering, like k-means clustering, which groups similar data points, and then dimensionality reduction, which reduces data complexity by focusing on key features. You can use scikit-learn to try out these algorithms on sample data sets. This will give you hands-on experience with model training, and you'll learn to fine-tune models to get better results. Moving on to the seventh month, you'll be building and training models with advanced libraries. By now you have experimented with some basic models, so let's step it up with advanced tools like TensorFlow and PyTorch. These libraries offer more flexibility and power. Here you can start with simple models and work your way up. These libraries allow for building neural networks, which you'll study more next month.
Once you have become familiar with TensorFlow and PyTorch, you can move on to model training and evaluation. You have to learn to split data into training and testing sets and evaluate models using metrics like accuracy and precision. Your goal this month should be to get comfortable with these libraries and understand how they handle data and model training behind the scenes. Once you are done with this, you'll move on to the eighth month, where you'll be dealing with advanced machine learning. This month's first concept is ensemble learning, which means combining multiple models to get better predictions. Here you'll learn about bagging, for example random forests, where multiple decision trees make predictions, and then boosting, like AdaBoost and XGBoost, where each new model learns from the mistakes of the ones before it. After ensemble learning comes deep learning. Here you explore neural networks, which mimic the human brain. You'll learn about neural network basics, so you can start with simple fully connected networks and then move on to backpropagation and gradient descent. These help your model learn and improve. You can use TensorFlow or PyTorch to practice building neural networks, and you can work on projects to reinforce these concepts. Moving on, you should specialize in topics like NLP and computer vision. Machine learning applications are so powerful, and here you'll get a taste of two major fields. The first is NLP, or natural language processing, which works with text data for tasks like sentiment analysis and text classification. You can start with basic preprocessing like tokenization and stop word removal, and move on to building simple NLP models. After that you can try computer vision. For image data you have to learn CNNs, convolutional neural networks. These networks analyze visual patterns, making them ideal for image classification.
Practice with open data sets like text documents or images, and apply the concepts you learn to see results in real-world applications. Now in the 10th month, you'll be dealing with model deployment, which is bringing your models to life. Here you'll be using Flask or Django. You can use these frameworks to create a web API so users can interact with your model. For example, build a web app that lets people upload images for classification. Then you can also try out Docker: package your model and its dependencies so it can run on any machine. This is super helpful for deploying models without compatibility issues. By the end of this month, you'll be able to share your models with the world. Moving on to the 11th month, you'll be starting with cloud and production. This month, you'll learn how to deploy models on the cloud and ensure they perform well in real-world environments. You'll be dealing with cloud platforms like AWS, Google Cloud, or Azure, and you have to learn to deploy models on the cloud to provide accessibility and scalability. Then comes monitoring and maintenance: understand how to track your model's performance over time and update it as needed. These skills are essential for maintaining models in production and ensuring they stay reliable. And finally, you'll be creating real-world projects and building your portfolio. Here you should choose topics that interest you and showcase your skills. First, start with full projects, complete projects that go from data cleaning and model building to deployment. Ideas could be a sentiment analysis tool or an image recognition app. Then build your portfolio: organize and document your projects, host them on GitHub, and create an online portfolio to share with potential employers or collaborators.
By following this roadmap, you'll be well prepared to handle real-world machine learning challenges and have an impressive portfolio to show for it. Thank you so much for joining me on this ultimate roadmap to mastering machine learning in 2025. I hope this month-by-month guide gives you a clear, achievable path into machine learning, from the basics to deploying your own models. Remember, learning machine learning is a journey. There will be challenges along the way, but with patience and consistent practice you'll see progress step by step.

Hello and welcome to machine learning tutorial part one. This is part one of a machine learning series put on by Simplilearn. My name is Richard Kersner. I'm with the Simplilearn team. That's www.simplilearn.com. Get certified, get ahead. What's in it for you today? Well, we'll start off with a brief explanation of why machine learning and what machine learning is. Then we'll get into a few of the types of machine learning and some machine learning algorithms: linear regression, decision trees, and support vector machines. Finally, we'll do a use case where we're going to classify whether a recipe is for a cupcake or a muffin using the SVM, or support vector machine. Sounds like a delicious way to explore machine learning. So, why machine learning? Why do we even care about having these computers come up and be able to do all these new things for us? Well, because machines can now drive your car for you. It's still very much in the infant stage, but it's just exploding, as we see with Google's Waymo, and then Uber had their program, which unfortunately crashed. They know that this is huge. This is going to be the huge industry that changes our whole transportation infrastructure. Machine learning is now used to detect over 50 eye diseases. Do you know how amazing that is, to have a computer that double-checks for the doctor for things they might miss? That's just huge in the health industry.
They actually do already have that in some areas, maybe not for eyes but for other diseases, where they're using the camera on your phone to help pre-diagnose before you go in and see the doctor. And because the machine can now unlock your phone with your face. I mean, that's just cool, having it be able to identify your face or your voice and turn stuff on and off for you depending on where you are and what you need. Talk about the ultimate automation of the world we live in. And as we dig in deeper, we have a nice example from Facebook. As you can see here, they have the Facebook post with Halloween: "Comment yes if you want it. Order here." Nobody likes spam posts on Facebook that annoy them into interacting with likes, shares, comments, and other actions. I remember the original ones were all "if you don't click on here, you will have bad luck," or some kind of fear factor. Well, this is a huge thing in social media when people are getting spammed. This tactic, known as engagement bait, takes advantage of Facebook's news feed algorithm by boosting engagement in order to get greater reach. To eliminate engagement bait, the company reviewed and categorized hundreds of thousands of posts to train a machine learning model that detects different types of engagement bait. In this case we're using Facebook, but this is of course across all the different social media.
They all have different tools they're building. Here the Facebook scroll GIF gets replaced, kind of like a virus coming in there: it notices that there's a certain setup with Facebook and it's able to replace it. And they have vote baiting, react baiting, share baiting. These are kind of general titles, but there certainly are a lot of ways of baiting you to go in there and click on something. So all this data was fed into the machine. And then they have the new post. The new post comes up and takes over part of the Facebook setup, and that's what you're looking at: this new post that, like a virus, has replaced that. So what Facebook did to eliminate this is they started scanning for keywords and phrases like this and checking the click-through rate. It starts looking for people who are clicking through without even looking at it, or clicking through on something that normally wouldn't be clicked through. Once Facebook has scanned for these keywords and phrases, it is able to identify the spam coming in, and this makes your life easier because you're not getting spammed. It's not like walking through an airport where, in a lot of countries, you have hundreds of people trying to sell you a timeshare. Come join us. Sign up for this. It eliminates that annoyance. So now you can just enjoy your Facebook and your cat pictures. Or maybe it's your family pictures. Mine is family. Certainly people like their cat pictures too. Another good example is Google's DeepMind project AlphaGo. A computer program that plays the board game Go has defeated the world's number one Go player, and I hope I say his name right, Ke Jie. The ultimate Go challenge, game three of three, was on May 27th, 2017. So that was just last year that this happened. And what makes this so important is that, you know, Go is just a game.
So it's not like you're driving a car or something in our real world, but they are using games to learn how to get the machine learning program to learn. They want it to learn how to learn, and that is a huge step. A lot of this is still in its infant stage as far as development, as we saw with what happened to the Uber cars I referred to earlier. They lost their whole division because they jumped ahead too fast. So it's still an infant stage, but boy, this is like the beginning of an amazing world that is automated in ways we can't even imagine. We've looked at a lot of examples of machine learning, so let's see if we can give a little bit more of a concrete definition. What is machine learning? Machine learning is the science of making computers learn and act like humans by feeding them data and information without being explicitly programmed. We see here we have a nice little diagram where we have our ordinary system, your computer. Nowadays you can even run a lot of this stuff on a cell phone because cell phones have advanced so much. Then with artificial intelligence and machine learning, it takes the data, learns from what happened before, and predicts what's going to come next. And really the biggest part of what's going on in machine learning right now is that it improves on that. How do we find a new solution? So we go from descriptive, where it's learning about stuff and understanding how it fits together, to predictive, where it's forecasting what's going to happen, to prescriptive, coming up with a new solution. When we're working on machine learning, there are a number of different diagrams that people have posted for what steps to go through. A lot of it might be very domain specific. If you're working on photo identification versus language versus medical or physics, some of these steps are switched around a little bit or new things are put in. They're very specific to the domain. This is kind of a very general diagram.
First, you want to define your objective. Very important to know what it is you're wanting to predict. Then you're going to be collecting the data. So once you've defined an objective, you need to collect the data that matches. You spend a lot of time in data science collecting data and the next step preparing the data. You got to make sure that your data is clean going in. There's the old saying, bad data in, bad answer out or bad data out. And then once you've gone through and we've cleaned all this stuff coming in, then you're going to select the algorithm. Which algorithm are you going to use? You're going to train that algorithm. In this case, I think we're going to be working with SVM, the support vector machine. Then you have to test the model. Does this model work? Is this a valid model for what we're doing? And then once you've tested it, you want to run your prediction. You want to run your prediction or your choice or whatever output it's going to come up with. And then once everything is set and you've done lots of testing, then you want to go ahead and deploy the model. And remember, I said domain specific. This is very general as far as the scope of doing something. A lot of models you get halfway through and you realize that your data is missing something and you have to go collect new data because you've run a test in here someplace along the line. You're saying, "Hey, I'm not really getting the answers I need." So, there's a lot of things that are domain specific that become part of this model. This is a very general model, but it's a very good model to start with. And we do have some basic divisions of what machine learning does that's important to know. For instance, do you want to predict a category? Well, if you're categorizing thing, that's classification. For instance, whether the stock price will increase or decrease. So in other words, I'm looking for a yes no answer. Is it going up or is it going down? 
And in that case, we'd actually say, is it going up? True. If it's not going up, it's false, meaning it's going down. This way, it's a yes, no. 01. Do you want to predict a quantity? That's regression. So remember, we just did classification. Now we're looking at regression. These are the two major divisions in what data is doing. For instance, predicting the age of a person based on the height, weight, health, and other factors. So based on these different factors, you might guess how old a person is. And then there are a lot of domain specific things like do you want to detect an anomaly? That's anomaly detection. This is actually very popular right now. For instance, you want to detect money withdrawal anomalies. You want to know when someone's making a withdrawal that might not be their own account. We've actually brought this up because this is really big right now. If you're predicting the stock whether to buy stock or not, you want to be able to know if what's going on in the stock market is an anomaly, use a different prediction model because something else is going on. You got to pull out new information in there or is this just the norm? I'm going to get my normal return on my money invested. So being able to detect anomalies is very big in data science these days. Another question that comes up which is on what we call untrained data is do you want to discover structure in unexplored data and that's called clustering. For instance, finding groups of customers with similar behavior given a large database of customer data containing their demographics and past buying records. And in this case, we might notice that anybody who's wearing certain set of shoes goes shopping at certain stores or whatever it is. They're going to make certain purchases. By having that information, it helps us to market or group people together. So then we can now explore that group and find out what it is we want to market to them if you're in the marketing world. 
And that might also work in just about any arena. You might want to group people together whether they're uh based on their different areas and investments and financial background, whether you're going to give them a loan or not. before you even start looking at whether they're a valid customer for the bank, you might want to look at all these different areas and group them together based on unknown data. So, you're not you don't know what the data is going to tell you, but you want to cluster people together that come together. Let's take a quick detour for quiz time. Oh, my favorite. So, we're going to have a couple questions here under quiz time and um we'll be posting the answers in the part two of this tutorial. So, let's go ahead and take a look at these quiz times questions and hopefully you'll get them all right and it'll get you thinking about how to process data and what's going on. Can you tell what's happening in the following cases? Of course, you're sitting there with your cup of coffee and you have your checkbox and your pen trying to figure out what's your next step in your data science analysis. So, the first one is grouping documents into different categories based on the topic and content of each document. Very big these days. you know, you have legal documents, you have uh maybe it's a sports group documents, maybe you're analyzing newspaper postings, but certainly having that automated is a huge thing in today's world. B, identifying handwritten digits in images correctly. So, we want to know whether uh they're writing an A or capital A, B, C, what are they writing out in their hand digit, their handwriting. C behavior of a website indicating that the site is not working as designed. D predicting salary of an individual based on his or her years of experience with HR hiring uh setup there. So stay tuned for part two. 
We'll go ahead and answer these questions when we get to the part two of this tutorial or you can just simply write at the bottom and send a note to SimplyLearn and they'll follow up with you on it. Back to our regular content. And these last few bring us into the next topic which is another way of dividing our types of machine learning and that is with supervised unsupervised and reinforcement learning. Supervised learning is a method used to enable machines to classify predict objects, problems or situations based on labeled data fed to the machine. And in here you see we have a jungle of data with circles, triangles and squares. And we label them. We have what's a circle, what's a triangle, what's a square. And we have our model training and it trains it. So we know the answer. Very important when you're doing supervised learning, you already know the answer to a lot of your information coming in. So you have a huge group of data coming in and then you have new data coming in. So we've trained our model. The model now knows the difference between a circle, a square, a triangle. And now that we've trained it, we can send in in this case a square and a circle goes in and it predicts that the top one's a square and the next one's a circle. And you can see that this is uh being able to predict whether someone's going to default on a loan because I was talking about banks earlier. Supervised learning on stock market whether you're going to make money or not. That's always important. And if you are looking to make a fortune on the stock market, keep in mind it is very difficult to get all the data correct on the stock market. It is very uh it fluctuates in ways you really hard to predict. So it's quite a roller coaster ride. If you're running machine learning on the stock market, you start realizing you really have to dig for new data. So we have supervised learning and if you have supervised, we need unsupervised learning. 
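To make the supervised-learning idea above a little more concrete, here is a minimal sketch in plain Python. It is only an illustration: the shapes are described by a single made-up feature (number of corners), the labeled examples are invented for this sketch, and "training" is reduced to remembering which label goes with which corner count. The names `train` and `predict` are hypothetical helpers, not part of any library.

```python
# Labeled training data: (number_of_corners, label).
# These examples are illustrative assumptions, not data from the course.
labeled_shapes = [
    (0, "circle"),
    (3, "triangle"),
    (4, "square"),
    (0, "circle"),
    (3, "triangle"),
    (4, "square"),
]


def train(examples):
    """Count labels per corner count and keep the most frequent one."""
    counts_by_corners = {}
    for corners, label in examples:
        counts = counts_by_corners.setdefault(corners, {})
        counts[label] = counts.get(label, 0) + 1
    return {
        corners: max(counts, key=counts.get)
        for corners, counts in counts_by_corners.items()
    }


def predict(model, corners):
    """Look up the learned label; unseen corner counts are 'unknown'."""
    return model.get(corners, "unknown")


model = train(labeled_shapes)

# New, unlabeled data: a shape with 4 corners, then one with 0 corners.
predictions = [predict(model, corners) for corners in (4, 0)]
print(predictions)
```

The point of the sketch is the workflow, not the model: the answers are known for the training data, and the trained model is then asked about new data it has not been given labels for.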
In unsupervised learning, machine learning model finds the hidden pattern in an unlabeled data. So in this case, instead of telling it what the circle is and what a triangle is and what a square is, it goes in there, looks at them and says for whatever reason, it groups them together. Maybe it'll group it by the number of corners and it notices that a number of them all have three corners, a number of them all have four corners, and a number of them all have no corners. And it's able to filter those through and group them together. We talked about that earlier with looking at a group of people who are out shopping. We want to group them together to find out what they have in common. And of course, once you understand what people have in common, maybe you have one of them who's a customer at your store, or you have five of them are customer at your store, and they have a lot in common with five others who are not customers at your store. How do you market to those five who aren't customers at your store yet? They fit the demographics of who's going to shop there, and you'd like them to shop at your store, not the one next door. Of course, this is a simplified version. You can see very easily the difference between a triangle and a circle, which is might not be so easy in marketing. Reinforcement learning. Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the result. And we have here where the in this case a baby. It's actually great that they used an infant for this slide because the reinforcement learning is very much in its infant stages. But it's also probably the biggest machine learning demand out there right now or in the future. It's going to be coming up over the next few years is reinforcement learning and how to make that work for us. And you can see here where we have our action. And the action in this one, it goes into the fire. 
Hopefully the baby didn't just a little candle, not a giant fire pit like it looks like here. When the baby comes out and the new state is the baby is sad and crying because they got burned on the fire. And then maybe they take another action. The baby's called the agent because it's the one taking the actions. And in this case, they didn't go into the fire. They went a different direction. And now the baby's happy and laughing and playing. Reinforcement learning is very easy to understand because that's how as humans that's one of the ways we learn. We learn whether it is you know you burn yourself on the stove, don't do that anymore. Don't touch the stove. In the big picture, being able to have machine learning program or an AI be able to do this is huge because now we're starting to learn how to learn. That's a big jump in the world of computer and machine learning. And we're going to go back and just kind of go back over supervised versus unsupervised learning. Understanding this is huge because this is going to come up in any project you're working on. We have in supervised learning, we have labeled data. We have direct feedback. So someone's already gone in there and said, "Yes, that's a triangle. No, that's not a triangle." And then you predict an outcome. So you have a nice prediction. This is this this new set of data is coming in and we know what it's going to be. And then with unsupervised training, it's not labeled. So, we really don't know what it is. There's no feedback. So, we're not telling it whether it's right or wrong. We're not telling it whether it's a triangle or a square. We're not telling it to go left or right. All we do is we're finding hidden structure in the data, grouping the data together to find out what connects to each other. And then you can use these together. So, imagine you have an image and you're not sure what you're looking for. So, you go in and you have the unstructured data. 
find all these things that are connected together and then somebody looks at those and labels them. Now you can take that labeled data and program something to predict what's in the picture. So you can see how they go back and forth, and you can start connecting all these different tools together to make a bigger picture. There are many interesting machine learning algorithms. Let's have a look at a few of them. Hopefully this gives you a little flavor of what's out there, and these are some of the most important ones that are currently being used. We'll take a look at linear regression, decision tree, and the support vector machine. Let's start with a closer look at linear regression. Linear regression is perhaps one of the most well-known and well understood algorithms in statistics and machine learning. Linear regression is a linear model, that is, a model that assumes a linear relationship between the input variables x and the single output variable y. And you'll see this if you remember from your algebra classes: y = mx + c. Imagine we are predicting distance traveled y from speed x. Our linear regression model representation for this problem would be y = m * x + c, or distance = m * speed + c, where m is the coefficient and c is the y intercept. And we're going to look at two different variations of this. First we're going to start with time as a constant. And you can see we have a bicyclist. He's got his safety gear on, thank goodness. Speed equals 10 meters per second. And so over a certain amount of time, his distance equals 36 kilometers. We have a second bicyclist who's going twice the speed, or 20 m/s. And you can guess if he's going twice the speed and time is a constant, then he's going to go twice the distance. And that's easy to compute. 36 * 2, you get 72 km. And so if you had the question of how far somebody going three times that speed, or 30 m/s, would travel, you can easily compute the distance in your head. We can do that without needing a computer.
But we want to do this for more complicated data. So it's kind of nice to compare the two. But let's just take a look at that and what that looks like in a graph. So in a linear regression model, we have our distance against the speed, and we have our m equals the +ve (positive) slope of the line. And we'll notice that the line has a positive slope. And as speed increases, distance also increases. Hence, the variables have a positive relationship. And so with the speed of the person as x, y = mx + c gives the distance traveled in a fixed interval of time. And we could very easily compute, either following the line or just knowing it's 3 * 10 m/s, that this is 108 km distance that this third bicyclist has traveled. One of the key definitions on here is positive relationship. So the slope of the line is positive. As speed increases, so does distance. Let's take a look at our second example, where we make distance a constant. So we have speed equals 10 m/s. They have a certain distance to go, and it takes him 100 seconds to travel that distance. And we have our second bicyclist who's still doing 20 m/s. Since he's going twice the speed, we can guess he'll cover the distance in half the time, 50 seconds. And of course, you could probably guess on the third one, 100 divided by 3 since he's going three times the speed. You can easily guess that this is 33.33 seconds. We put that into a linear regression model or a graph. If the distance is assumed to be constant, let's see the relationship between speed and time. And as time goes up, the amount of speed to go that same distance goes down. So now your m equals a -ve (negative) slope of the line. As the speed increases, time decreases. Hence the variables have a negative relationship. Again, there's our definition: positive relationship and negative relationship, dependent on the slope of the line.
And with a simple formula like this, and even a significant amount of data, let's see the mathematical implementation of linear regression. And we'll take this data. So suppose we have this data set where we have x and y: x = 1, 2, 3, 4, 5, a standard series, and the y values are 3, 2, 2, 4, 3. When we take that and we go ahead and plot these points on a graph, you can see there's kind of a nice scattering and you could probably eyeball a line through the middle of it. But we're going to calculate that exact line for linear regression. And the first thing we do is we come up here and we have the mean of x. And remember, mean is basically the average. So we add 5 + 4 + 3 + 2 + 1 and divide by five. And that simply comes out as three. And then we'll do the same for y. We'll go ahead and add up all those numbers and divide by five. And we end up with a mean value of y equal to 2.8, where x_mean is the average, or mean, value of x and y_mean is the mean value of y. And when we plot that, you'll see that we can put in the y = 2.8 and the x = 3 in there on our graph. We kind of gave it a little different color so you could sort it out, with the dashed lines on it. And it's important to note that when we do the linear regression, the linear regression line should go through that dot. Now, let's find our regression equation to find the best fit line. Remember, we go ahead and take our y = mx + c. So, we're looking for m and c. So, to find this equation for our data, we need to find our slope m and our intercept c. And we have y = mx + c, where m equals the sum of (x - x_mean) * (y - y_mean) over the sum of (x - x_mean)^2, that is, m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2). That's how we get the slope of the line. And we can easily do that by creating some columns here. We have x and y. Computers are really good at iterating through data. And so we can easily compute this and fill in a table of data.
And in our table you can easily see that if we have our x value of 1, and if you remember, x_mean, the mean value, is 3, then 1 - 3 equals -2 and 2 - 3 equals -1, so on and so forth. And we can easily fill in the columns of x - x_mean and y - y_mean. And then from those we can compute (x - x_mean)^2 and (x - x_mean) * (y - y_mean). And you can guess that the next step is to go ahead and sum the different columns for the answers we need. So we get a total of 10 for our (x - x_mean)^2 and a total of 2 for (x - x_mean) * (y - y_mean). And we plug those in, we get 2/10, which equals 0.2. So now we know the slope of our line equals 0.2. So we can calculate the value of c. That'd be the next step: we need to know where it crosses the y axis. And if you remember, I mentioned earlier that the linear regression line has to pass through the mean value, the one that we showed earlier. We can just flip back up there to that graph. And you can see right here, there's our mean value, which is x = 3 and y = 2.8. And since we know that value, we can simply plug that into our formula, y = 0.2x + c. So we plug that in, we get 2.8 = 0.2 * 3 + c. And you can just solve for c. So now we know that our intercept c equals 2.2. And once we have all that, we can go ahead and plot our regression line, y = 0.2 * x + 2.2. And then from this equation, we can compute new values. So let's predict the values of y using x = 1, 2, 3, 4, 5 and plot the points. Remember, the 1, 2, 3, 4, 5 were our original x values. So now we're going to see what the line thinks the y values are, not what they actually are. And when we plug those in, we get the predicted y, designated y_p. You can see that x = 1 gives 2.4, x = 2 gives 2.6, and so on and so on. So we have our y predicted values of what we think it's going to be when we plug those numbers in. And when we plot the predicted values along with the actual values, we can see the difference. And this is one of the things that's very important with linear regression, and in any of these models: to understand the error.
And so we can calculate the error on all of our different values. And you can see over here we plotted x and y and y predicted. And we draw a little line so you can sort of see what the error looks like there between the different points. So our goal is to reduce this error. We want to minimize that error value on our linear regression model. Minimizing the distance: there are lots of ways to measure the distance between the line and the data points, like sum of squared errors, sum of absolute errors, root mean square error, etc. We keep moving this line through the data points to make sure the best fit line has the least squared distance between the data points and the regression line. So to recap, with a very simple linear regression model, we first figure out the formula of our line through the middle and then we slowly adjust the line to minimize the error. Keep in mind this is a very simple formula. Even though the math is very much the same, it gets much more complex as we add in different dimensions. So this is only two dimensions, y = mx + c. But you can take that out to x, z, j, q, all the different features in there, and you can fit a linear regression model on all of those using the different formulas to minimize the error. Let's go ahead and take a look at decision trees, a very different way to solve problems from the linear regression model. A decision tree is a tree-shaped algorithm used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction. We have data which tells us if it is a good day to play golf. And if we were to open this data up in a general spreadsheet, you can see we have the outlook, whether it's rainy, overcast, or sunny; temperature, hot, mild, or cool; humidity; windy; and did I like to play golf that day, yes or no. So, we're taking a census, and certainly I wouldn't want a computer telling me when I should go play golf or not.
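The two bicyclist scenarios can be checked with a few lines of Python. This is a small sketch under stated assumptions: the fixed time of 3600 seconds is inferred from 10 m/s covering 36 km, and the fixed distance of 1000 meters is inferred from 10 m/s taking 100 seconds; neither number is given directly in the transcript. The function and constant names are illustrative.

```python
def distance_km(speed_m_per_s, time_s):
    """Distance covered at a constant speed, converted to kilometers."""
    return speed_m_per_s * time_s / 1000


def travel_time_s(distance_m, speed_m_per_s):
    """Time needed to cover a fixed distance at a constant speed."""
    return distance_m / speed_m_per_s


speeds = [10, 20, 30]      # meters per second, for the three bicyclists
FIXED_TIME_S = 3600        # assumed: 10 m/s for 36 km implies one hour
FIXED_DISTANCE_M = 1000    # assumed: 10 m/s for 100 s implies 1000 m

# Scenario 1: time is constant, so distance grows with speed.
distances = [distance_km(v, FIXED_TIME_S) for v in speeds]

# Scenario 2: distance is constant, so time shrinks as speed grows.
times = [round(travel_time_s(FIXED_DISTANCE_M, v), 2) for v in speeds]

print(distances)  # [36.0, 72.0, 108.0]
print(times)      # [100.0, 50.0, 33.33]
```

One caveat worth keeping in mind: with distance fixed, time = distance / speed is a curve rather than a perfectly straight line, so a fitted line would only approximate it. The takeaway from the example is the direction of the trend, which is downward.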
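The column-by-column computation just described can be sketched in plain Python. This assumes the y values read 3, 2, 2, 4, 3, a reading that is consistent with the stated mean of 2.8; the variable names are illustrative and not taken from the course.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 2, 4, 3]  # assumed reading of the y values; their mean is 2.8

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# One row per data point:
# (x, y, x - x_mean, y - y_mean, (x - x_mean)^2, (x - x_mean) * (y - y_mean))
rows = []
for x, y in zip(xs, ys):
    dx = x - x_mean
    dy = y - y_mean
    rows.append((x, y, dx, dy, dx * dx, dx * dy))

# Sum the last two columns, then divide to get the slope m.
sum_dx_squared = sum(row[4] for row in rows)
sum_dx_dy = sum(row[5] for row in rows)
slope = sum_dx_dy / sum_dx_squared

for row in rows:
    print("  ".join(f"{value:6.2f}" for value in row))
print(f"sum (x - x_mean)^2            = {sum_dx_squared:.2f}")
print(f"sum (x - x_mean)*(y - y_mean) = {sum_dx_dy:.2f}")
print(f"slope m                       = {slope:.2f}")
```

Floating-point arithmetic means the sums are only approximately 10 and 2, which is why the printout rounds to two decimal places.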
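As a rough check of the worked example, the sketch below fits the line, predicts y for the original x values, and measures the error for each point. It assumes the data set is x = 1..5 with y = 3, 2, 2, 4, 3 (the reading consistent with a y mean of 2.8). The helper `fit_line` is a hypothetical name introduced here, and the sum of squared errors is used as one common way to summarize the error.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 2, 4, 3]  # assumed reading of the y values


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m * x + c."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    m = numerator / denominator
    c = y_mean - m * x_mean  # the line passes through (x_mean, y_mean)
    return m, c


m, c = fit_line(xs, ys)

# Predicted values for the original x values.
predicted = [m * x + c for x in xs]

# Error (residual) per point: actual minus predicted.
residuals = [y - y_p for y, y_p in zip(ys, predicted)]
sum_squared_error = sum(r * r for r in residuals)

print(f"m = {m:.2f}, c = {c:.2f}")
print("predicted:", [round(y_p, 2) for y_p in predicted])
print("residuals:", [round(r, 2) for r in residuals])
print(f"sum of squared errors = {sum_squared_error:.2f}")
```

With these inputs the slope comes out at about 0.2 and the intercept at about 2.2, so the predictions are roughly 2.4, 2.6, 2.8, 3.0 and 3.2.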
But you could imagine if you got up in the night before you're trying to plan your day and it comes up and says tomorrow would be a good day for golf for you in the morning and not a good day in the afternoon or something like that. This becomes very beneficial and we see this in a lot of applications coming out now where it gives you suggestions and lets you know what what would uh fit the match for you for the next day or the next purchase or the next uh whatever you know next mail out in this case is tomorrow a good day for playing golf based on the weather coming in. And so we come up and let's uh determine if you should play golf when the day is sunny and windy. So we found out the forecast tomorrow is going to be sunny and windy. And suppose we draw our tree like this. We're going to have our humidity. And then we have our normal, which is uh if it's if you have a normal humidity, you're going to go play golf. And if the humidity is really high, then we look at the outlook. And if the outlook is sunny, overcast, or rainy, it's going to change what you choose to do. So if you know that it's a very high humidity and it's sunny, you're probably not going to play golf cuz you're going to be out there miserable, fighting off the mosquitoes that are out joining you to play golf with you. Maybe if it's rainy, you probably don't want to play in the rain. But if it's slightly overcast and you get just the right shadow, that's a good day to play golf and be outside out on the green. Now, in this example, you can probably make your own tree pretty easily cuz it's a very simple set of data going in. But the question is, how do you know what to split? Where do you split your data? What if this is much more complicated data where it's not something that you would particularly understand like studying cancer? 
They take about 36 measurements of the cancerous cells and then each one of those measurements represents how bulbous it is, how extended it is, how sharp the edges are, something that as a human we would have no understanding of. So how do we decide how to split that data up? And is that the right decision tree? But so that's a question that's going to come up. Is this the right decision tree? For that we should calculate entropy and information gain. Two important vocabulary words there are the entropy and the information gain. Entropy. Entropy is a measure of randomness or impurity in the data set. Entropy should be low. So we want the chaos to be as low as possible. We don't want to look at it and be confused by the images or what's going on there with mixed data. And the information gain, it is a measure of decrease in entropy after the data set is split. Also known as entropy reduction. information gain should be high. So we want our information that we get out of the split to be as high as possible. Let's take a look at entropy from the mathematical side. In this case, we're going to denote entropy as I of P of and N where P is the probability that you're going to play a game of golf and N is the probability where you're not going to play the game of golf. Now, you don't really have to memorize these formulas. There's a few of them out there depending on what you're working with. But it's important to note that this is where this formula is coming from. So when you see it, you're not lost when you're running your programming, unless you're building your own decision tree code in the back. And we simply have a log 2 of p over p plus n minus n over p + n * the log 2 of n of p plus n. But let's break that down and see what actually looks like when we're computing that from the computer script side. Entropy of a target class of the data set is the whole entropy. So we have entropy play golf. 
And when we look at this, if we go back to the data, you can simply count how many yeses and no in our complete data set for playing golf days. In our complete set, we find we have five days we did play golf and nine days we did not play golf. And so our I equals, if you add those together, 9 + 5 is 14. And so our I equals 5 over 14 and 9 over 14. That's our PNN values that we plug into that formula. And you can go 5 over 14=.36. 9 over 14=64. And when you do the whole equation, you get the -.36 log<unk>^2 of.36 minus.64 log<unk> of 64. And we get a set value. We get 94. So we now have a full entropy value for the whole set of data that we're working with. And we want to make that entropy go down. And just like we calculated the entropy out for the whole set, we can also calculate entropy for playing golf and the outlook. Is it going to be overcast or rainy or sunny? And so we look at the entropy. We have P of sunny times E of three of two. And that just comes out how many sunny days yes and how many sunny days? No, over the total, which is five. Don't forget to put the we'll divide that five out later on. uh equals p overcast = 4 comma 0 plus rainy = 2a 3 and then when you do the whole setup we have 5 over4 remember I said there was a total of five 5 over 14 * the i of 3 of 2 + 4 over 14 * the 4 0 and 514 over i of 23 and so we can now compute the entropy of just the part that has to do with the forecast and we get 693 similar Finally, we can calculate the entropy of other predictors like temperature, humidity, and wind. And so, we look at the gain outlook. How much are we going to gain from this entropy play golf minus entropy play golf outlook? And we can take the original 0.94 for the whole set minus the entropy of just the rainy day and temperature. And we end up with a gain of.247. So, this is our information gain. Remember, we define entropy and we define information gain. The higher the information gain, the lower the entropy, the better. 
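Here is a short sketch of the entropy and information gain calculation for the outlook split, using only the Python standard library. It takes the per-outlook counts quoted above, sunny (3 yes, 2 no), overcast (4 yes, 0 no) and rainy (2 yes, 3 no), which add up to 9 yes and 5 no over 14 days. The function and variable names are illustrative choices for this sketch.

```python
import math


def entropy(p, n):
    """I(p, n) for p positive and n negative examples, in bits."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # a zero count contributes nothing (0 * log2(0) is taken as 0)
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result


# (days played, days not played) for each outlook value, as quoted above.
outlook_counts = {"sunny": (3, 2), "overcast": (4, 0), "rainy": (2, 3)}

total_yes = sum(p for p, _ in outlook_counts.values())
total_no = sum(n for _, n in outlook_counts.values())
total_days = total_yes + total_no

# Entropy of the whole data set.
whole_entropy = entropy(total_yes, total_no)

# Weighted entropy after splitting on outlook.
outlook_entropy = sum(
    (p + n) / total_days * entropy(p, n) for p, n in outlook_counts.values()
)

# Information gain is the drop in entropy caused by the split.
gain_outlook = whole_entropy - outlook_entropy

print(f"entropy(play golf)          = {whole_entropy:.3f}")
print(f"entropy(play golf, outlook) = {outlook_entropy:.3f}")
print(f"gain(outlook)               = {gain_outlook:.3f}")
```

The results land at roughly 0.94, 0.69 and 0.247, matching the figures in the walkthrough up to rounding.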
The information gain of the other three attributes can be calculated in the same way. So we have our gain for temperature equals 0.029. We have our gain for humidity equals.152. And our gain for a windy day equals 0048. And if you do a quick comparison, you'll see the.247 is the greatest gain of information. So that's the split we want. Now let's build the decision tree. So, we have the outlook. Is it going to be sunny, overcast, or rainy? That's our first split because that gives us the most information gain. And we can continue to go down the tree using the different information gains with the largest information. We can continue down the nodes of the tree where we choose the attribute with the largest information gain as the root node and then continue to split each subnode with the largest information gain that we can compute. And although it's a little bit of a tongue twister to say all that, you can see that it's a very easy to view visual model. We have our outlook. We split it three different directions. If the outlook is overcast, we're going to play. And then we can split those further down if we want. So if the over outlook is sunny, but then it's also windy. If it's uh windy, we're not going to play. If it's uh not windy, we'll play. So, we can easily build a nice decision tree to guess what we would like to do tomorrow and give us a nice recommendation for the day. So, we want to know if it's a good day to play golf when it's sunny and windy. Remember the original question that came out, tomorrow's weather report is sunny and windy. You can see by going down the tree, we go outlook sunny, outlook windy. We're not going to play golf tomorrow. So, our little smartwatch pops up and says, I'm sorry, tomorrow's not a good day for golf. It's going to be sunny and windy. And if you're a huge golf fan, you might go, "Uhoh, it's not a good day to play golf." We can go in and watch a golf game at home. 
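The two steps above, picking the root by largest information gain and then walking the tree for a sunny and windy day, can be sketched as follows. The gain values are the ones quoted in the walkthrough. The function `play_golf` is a hypothetical helper that encodes only the branches spelled out here; the rainy branch is not described in this passage, so the sketch returns `None` for it rather than guessing.

```python
# Information gain per attribute, as quoted in the walkthrough.
gains = {
    "outlook": 0.247,
    "temperature": 0.029,
    "humidity": 0.152,
    "windy": 0.048,
}

# The attribute with the largest gain becomes the root of the tree.
root_attribute = max(gains, key=gains.get)


def play_golf(outlook, windy):
    """Walk the branches described above; None means 'not described here'."""
    if outlook == "overcast":
        return True
    if outlook == "sunny":
        return not windy
    return None  # the rainy branch is not spelled out in this passage


decision = play_golf("sunny", windy=True)

print("root attribute:", root_attribute)
print("play golf when sunny and windy?", decision)
```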
So, we'll sit in front of the TV instead of being out playing golf in the wind. Now that we looked at our decision tree, let's look at the third one of our algorithms we're investigating. Support vector machine. Support vector machine is a widely used classification algorithm. The idea of support vector machine is simple. The algorithm creates a separation line which divides the classes in the best possible manner. For example, dog or cat, disease or no disease. Suppose we have a labeled sample data which tells height and weight of males and females. A new data point arrives and we want to know whether it's going to be a male or a female. So we start by drawing a line. We draw decision lines. But if we consider decision line one, then we will classify the individual as a male. And if we consider decision line two, then it'll be a female. So you can see this person kind of lies in the middle of the two groups. So it's a little confusing trying to figure out which line they should be under. We need to know which line divides the classes correctly. But how the goal is to choose a hyper plane and that is one of the key words they use when we talk about support vector machines. Choose a hyper plane with the greatest possible margin between the decision line and the nearest point within the training set. So you can see here we have our support vector. We have the two nearest points to it and we draw a line between those two points. And the distance margin is the distance between the hyper plane and the nearest data point from either set. So we actually have a value and it should be equal distant between the two points that we're comparing it to. When we draw the hyperplanes, we observe that line one has a maximum distance. So we observe that line one has a maximum distance margin. So we'll classify the new data point correctly. And our result on this one is going to be that the new data point is MEL. 
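The margin idea can be illustrated without any machine learning library. The sketch below compares two candidate decision lines by their margin, meaning the distance from the line to the nearest training point, and classifies a new point with the line that has the larger margin. Everything numeric here is a made-up assumption for illustration: the height and weight samples, the two candidate lines, and the new point. A real support vector machine searches over all possible hyperplanes instead of comparing just two.

```python
import math

# Hypothetical (height in cm, weight in kg) samples, invented for this sketch.
females = [(150, 50), (155, 55), (160, 58)]
males = [(175, 75), (180, 82), (185, 90)]

# Candidate decision lines in the form w[0]*x + w[1]*y + b = 0 (assumed values).
candidate_lines = {
    "line 1": ((1.0, 1.0), -234.0),
    "line 2": ((1.0, 1.0), -240.0),
}


def signed_value(point, w, b):
    """Positive on one side of the line, negative on the other."""
    return w[0] * point[0] + w[1] * point[1] + b


def margin(points, w, b):
    """Distance from the line to the nearest point in the training set."""
    norm = math.hypot(w[0], w[1])
    return min(abs(signed_value(p, w, b)) / norm for p in points)


def classify(point, w, b):
    """In this toy setup, the positive side of the line is the male side."""
    return "male" if signed_value(point, w, b) > 0 else "female"


training_points = females + males
margins = {
    name: margin(training_points, w, b)
    for name, (w, b) in candidate_lines.items()
}

# Prefer the line with the greatest margin.
best_line = max(margins, key=margins.get)
best_w, best_b = candidate_lines[best_line]

new_point = (172, 65)  # hypothetical new person
labels = {
    name: classify(new_point, w, b)
    for name, (w, b) in candidate_lines.items()
}
result = classify(new_point, best_w, best_b)

print("margins:", {name: round(m, 2) for name, m in margins.items()})
print("label from each line:", labels)
print("chosen line:", best_line, "->", result)
```

In this toy setup the two lines disagree about the new point, and the line with the wider margin is the one whose answer is kept.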
One of the reasons we call it a hyper plane versus a line is that a lot of times we're not looking at just weight and height. We might be looking at 36 different features or dimensions. And so when we cut it with a hyper plane, it's more of a three-dimensional cut in the data, multi-dimensional. It cuts the data a certain way. And each plane continues to cut it down until we get the best fit or match. Let's understand this with the help of an example. Problem statement. You always start with a problem statement when you're going to put some code together. We're going to do some coding now. Classifying muffin and cupcake recipes using support vector machines. So the cupcake versus the muffin. Let's have a look at our data set. And we have the different recipes here. We have a muffin recipe that has so much flour. I'm not sure what measurement 55 is in, but it has 55, maybe it's ounces, but it has a certain amount of flour, certain amount of milk, sugar, butter, egg, baking powder, vanilla, and salt. And so based on these measurements, we want to guess whether we're making a muffin or a cupcake. And you can see in this one, we don't have just two features. We don't just have height and weight as we did before between the male and female. In here, we have a number of features. In fact, in this we're looking at eight different features to guess whether it's a muffin or a cupcake. What's the difference between a muffin and a cupcake? Turns out muffins have more flour. Well, cupcakes have more butter and sugar. So, basically the cupcakes a little bit more of a dessert where the muffin's a little bit more of a fancy bread. But how do we do that in Python? How do we code that to go through recipes and figure out what the recipe is? And I really just want to say cupcakes versus muffins like some big professional wrestling thing. Before we start in our cupcakes versus muffins, we are going to be working in Python. There's many versions of Python, many different editors. 
That is one of the strengths and weaknesses of Python: it just has so much stuff attached to it. It's one of the more popular programming languages you can use for data science. In this case, we're going to go ahead and use Anaconda and Jupyter Notebook. The Anaconda Navigator has all kinds of fun tools. Once you're in the Anaconda Navigator, you can change environments. I actually have a number of environments on here. We'll be using the Python 3.6 environment, so this is in Python version 3.6, although it doesn't matter too much which version you use. I usually try to stay with 3.x because it's current, unless you have a project that's very specifically in version 2.x; 2.7, I think, is what most people use in version 2. And then once we're in our Jupyter Notebook editor, I can go up and create a new file, and we'll just jump in here. In this case, we're doing SVM muffin versus cupcake. Let's start with our packages for data analysis. There are a few very standard packages we almost always use. We import numpy, that's for numerical Python, and it's usually denoted as np; that's very common. Then we're going to import pandas as pd. NumPy deals with number arrays, and there are a lot of cool things you can do with a NumPy array, such as multiplying all the values in the array at once. As for pandas, I can't remember if we're actually using it on this data set; I think we do, for the import. It makes a nice data frame, and the difference between a data frame and a NumPy array is that a data frame is more like your Excel spreadsheet: you have columns and you have indexes, so you have different ways of referencing it and easily viewing it, and there are additional features you can run on a data frame. Pandas kind of sits on NumPy, so you need them both in there. And then finally, we're working with the support vector machine.
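As a rough illustration of the NumPy array versus pandas data frame distinction described here, the sketch below multiplies every value in an array at once and then builds a small data frame with named columns and an index. The recipe numbers, the reduced set of columns, and the index labels are hypothetical placeholders, not the video's data set.

```python
import numpy as np
import pandas as pd

# NumPy: a plain numeric array; arithmetic applies to every element at once.
amounts = np.array([55, 28, 3, 7])
doubled = amounts * 2
print(doubled)

# pandas: a DataFrame adds named columns and an index on top of NumPy arrays.
# These recipe values are hypothetical placeholders, not the video's data.
recipes = pd.DataFrame(
    {
        "Type": ["Muffin", "Cupcake"],
        "Flour": [55, 39],
        "Sugar": [3, 26],
        "Butter": [7, 19],
    },
    index=["recipe_a", "recipe_b"],
)

print(recipes)                    # spreadsheet-like view
print(recipes["Flour"])           # select a column by name
print(recipes.loc["recipe_b"])    # select a row by its index label

# The numeric columns convert back to a NumPy array when a model needs one.
features = recipes[["Flour", "Sugar", "Butter"]].to_numpy()
print(features.shape)
```

Keeping the labels in the data frame and handing only the numeric columns to the model is a common split, which is why both packages usually get imported together.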
So from sklearn, we're going to import svm, the support vector machine. And then as a data scientist, you should always try to visualize your data. Some data is obviously too complicated or doesn't make any sense to a human, but if it's possible, it's good to take a second look at it so you can actually see what you're doing. For that, we're going to use two packages. We're going to import matplotlib.pyplot as plt, again very common, and we're going to import seaborn as sns. And we'll go ahead and set the font scale in sns right in our import line; that's what this semicolon followed by another statement is doing, setting sns. These are great because seaborn sits on top of matplotlib just like pandas sits on NumPy, so it adds a lot more features, uses, and control. We're obviously not going to get into matplotlib and seaborn; that would be its own tutorial. We're really just focusing on the SVM, the support vector machine from sklearn. And since we're in Jupyter Notebook, we have to add a special line in here for matplotlib. And that's your percent sign matplo

Original Description

🔥Professional Certificate in AI and Machine Learning - https://www.simplilearn.com/professional-aiml-program?utm_campaign=SQkaBIP2JoA&utm_medium=DescriptionFirstFold&utm_source=Youtube
🔥IITK - Professional Certificate Course in Generative AI and Machine Learning (India Only) - https://www.simplilearn.com/iitk-professional-certificate-course-ai-machine-learning?utm_campaign=SQkaBIP2JoA&utm_medium=DescriptionFirstFold&utm_source=Youtube
🔥IITG - Professional Certificate Program in Generative AI and Machine Learning (India Only) - https://www.simplilearn.com/applied-generative-ai-course?utm_campaign=SQkaBIP2JoA&utm_medium=DescriptionFirstFold&utm_source=Youtube

The Machine Learning Full Course 2025 begins with the fundamentals of Probability, Statistics, and Mathematics for Machine Learning, establishing a strong theoretical base. It then introduces the core concepts and roadmap of Machine Learning, along with its applications in the defense sector. Learners explore key algorithms, including Decision Trees, KNN, and RNN, and gain clarity on types of machine learning as well as ensemble methods like Bagging and Boosting. The course also covers practical aspects such as Confusion Matrix interpretation and a Fake News Detection project, concluding with a set of Machine Learning interview questions to strengthen professional readiness.

Following are the topics covered in the Machine Learning Full Course 2025:
0:00:00 - Introduction to Machine Learning Full Course 2025
0:02:04 - Probability and Statistics
0:48:23 - Mathematics for machine learning
2:38:47 - What is Machine Learning
2:47:26 - Use of AI in Defense Sector
2:47:26 - Machine Learning Roadmap in 2025
2:59:52 - Machine learning Basics
4:01:52 - Machine Learning Algorithms
4:58:24 - Types Of Machine Learning
4:58:24 - Bagg
