Data Science 101: CRISP-DM - Data Mining / Data Science in 6 Steps

Data Professor · Beginner · 📄 Research Papers Explained · 6y ago

Key Takeaways

The video discusses the CRISP-DM framework, a 6-step process for data mining and data science projects, and its application in a pediatric research project. The steps include Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.

Full Transcript

And it would also be nice to say something about CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining. It consists of six major steps, and you can find it on Google by searching for CRISP-DM. It is a process diagram and a best practice that you can follow when you want to build your first data mining project.

In the CRISP-DM framework, business understanding is the first step. This is your domain understanding; in your case, it is your understanding of the children's data that you collected. The thing here is the objective of your project. You have already told us that you want to correlate some parameters relating to cognitive function and executive function, and I see you are also looking into height, weight, and BMI [inaudible]. So you want to see whether being obese or overweight has any influence on their ability in class, right? [inaudible] So that's the business understanding.

The next step is data understanding. In the data understanding context, we have to get to know your data. We lack your domain knowledge, so we have to talk to you. It would be nice if you had a definition of each variable, and, if possible, if you could group them: which variables are for cognitive function and which variables are for executive function. If we analyze them as groups, we might be able to compare executive function versus cognitive function and see whether they differ [inaudible].

The next thing will be to do the data exploration. Will you be using SPSS [inaudible]? Yes. You compute the mean, count, median, percentage, and standard deviation of each variable. And remember the classes that you mentioned before, the ones you separated according to BMI (slim, overweight, obese, normal), right? For each of them we do a stratification of the data, and for each of the four groups you calculate the mean value and standard deviation of each variable [inaudible]. For the parameters related to executive function and cognitive function, stratified by class, whether they are normal, slim, or obese, we will be able to see whether there is any difference in executive function and cognitive function.

So we will learn a lot from the data exploration. That is the first thing, and then we can decide how to create our model. Maybe we want to put everything together and build one model, or maybe we want to selectively take out some variables, or compare and contrast subsets of them. Maybe we want to compare executive function versus cognitive function and see whether they have any influence on the ability to learn.

When doing the analysis, maybe if you make a bar plot [inaudible], we can color the groups: the normal group, the obese group, the slim group. Then for each school and for each variable we will be able to see the relative distribution. We can even use machine learning, like principal component analysis, to see the general distribution of the students [inaudible]: students from the overweight group and students in the normal group, and whether their distributions are similar in terms of their ability to learn, as measured by executive function versus cognitive function [inaudible]. So there are a lot of things that we can do.

Then there is data preparation. Data preparation deals with things like missing values, or maybe some inaccuracies. Maybe some features will be combined, and some features will be transformed. We will look at the distribution of each feature, and if a feature is not normally distributed [inaudible], we might do a log transformation so that we can make the distribution closer to normal, right?

The next thing will be to actually create the model, right? [inaudible] If you have qualitative data, we can do classification [inaudible]. You could use something like a decision tree, or we could use other, black-box algorithms like [inaudible] or a neural network. That's a good question. But the thing I like about decision trees and random forests is that they allow us to interpret the model, so we can understand it. You get an understanding of the inside of the prediction model: how it works, which variables are important, and whether a high or low value of the cognitive function would make the learning of a group of students better or worse.

Then there is the evaluation [inaudible]. Once we have built the model, you look at the features [inaudible], and that will help us [inaudible]. If we agree that the model is satisfactory, then we will deploy the model.

For the deployment, we could develop a web application and give the web application to the teachers at school. The teachers can then measure or evaluate a group of students based on the survey, right? So you give them the survey that you used to collect the data [inaudible], already made into a web application, and they just enter the answers and get immediate feedback: what is the student's cognitive function or learning capability score [inaudible]. It could also suggest which parameters the teachers or the parents should put an emphasis on, so that the cognitive skills would be enhanced. It could also help identify students who score low on cognitive skills [inaudible], so that we can target the individuals who need assistance [inaudible]. The development of each child should be customized; not all children have the same cognitive capability, right?

There might be other confounding factors that influence the results [inaudible]. Maybe a student is distracted, so there are other issues that might influence the score, right? These might be some of the limitations of this study. The thing is, how can we find other variables to measure this accurately outside the test? Maybe observation over the entire semester, rather than only one snapshot in time. [inaudible] If a student goes into the test and doesn't perform well, that doesn't mean he is not performing well in general. It might mean that he's exhausted, and just because of that he underperformed.

So there are other factors that determine the result. [inaudible] The teacher could measure on a continual basis, as a progression over time, or maybe we could take an average over an entire semester, something that captures a student's cognitive function or executive function over a longer time span. A single test is only a small fragment of what actually happened [inaudible]. Yes, so a follow-up would be nice, and if there's a possibility [inaudible], long-term measurements, you know, to get a baseline.

It is like a noisy place: if you just measure in one time frame, you are not going to capture all of the sound in the room, or all of the signal, right? The signal could happen at any time. If we are capturing at time 1 to 5, maybe the signal will appear at time 6 to 10. So if we capture only a small part [inaudible], we might miss out on some of the signal. [inaudible] So I think that we should repeat the measurements [inaudible].

Thank you for watching. Please like, subscribe, and share, and I'll see you in the next one. In the meantime, please check out these videos.
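The stratified summary described in the transcript (mean and standard deviation of each variable within each BMI group) can be sketched in a few lines. This is only an illustration under stated assumptions: the records, group labels, and score columns below are hypothetical and are not the study's data, and the speakers mention SPSS rather than Python for this step.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (bmi_group, executive_function_score, cognitive_function_score).
# The group labels and scores are illustrative, not the study's real data.
records = [
    ("slim", 48.0, 95.0),
    ("slim", 52.0, 101.0),
    ("normal", 50.0, 100.0),
    ("normal", 54.0, 104.0),
    ("overweight", 47.0, 98.0),
    ("overweight", 51.0, 96.0),
    ("obese", 45.0, 93.0),
    ("obese", 49.0, 97.0),
]


def stratified_summary(rows):
    """Return {group: {variable: (mean, sample standard deviation)}}."""
    by_group = defaultdict(lambda: {"executive": [], "cognitive": []})
    for group, executive, cognitive in rows:
        by_group[group]["executive"].append(executive)
        by_group[group]["cognitive"].append(cognitive)
    return {
        group: {name: (mean(values), stdev(values)) for name, values in variables.items()}
        for group, variables in by_group.items()
    }


summary = stratified_summary(records)
for group, variables in summary.items():
    for name, (avg, sd) in variables.items():
        print(f"{group:<11} {name:<10} mean={avg:.1f} sd={sd:.2f}")
```

Comparing the per-group rows side by side is the kind of first look the speaker suggests before deciding how to build a model. Note that `stdev` needs at least two observations per group.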

Original Description

In this video, we discuss the 6 steps of a Data Mining / Data Science project with Dr. Sarun Kunwittaya, in the context of his pediatric research. Joining in on the discussion is Mohammad Rafiq Malik, a Data Science Intern in the research group.


The CRISP-DM framework is a 6-step process for data mining and data science projects. It includes Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The video presents it as a best practice to follow when building a first data mining project.
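For the Data Preparation step, the discussion mentions handling missing values and applying a log transformation to features that are not normally distributed. A minimal sketch of both follows. The feature values are hypothetical, and median imputation and `log(1 + x)` are assumptions chosen for illustration; the video does not say which imputation or which exact transform would be used.

```python
import math
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Apply log(1 + x), which compresses a long right tail; requires x >= 0."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature with two missing entries.
raw = [1.0, 2.0, None, 3.0, 4.0, None, 120.0]
filled = impute_median(raw)
transformed = log_transform(filled)

print(filled)
print([round(v, 2) for v in transformed])
```

The median is used here because it is less sensitive than the mean to the single large value in the example. Whether a log transform is appropriate depends on the actual distribution of each feature, which is what the data exploration step is meant to reveal.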

Key Takeaways
  1. Define the objective of the project
  2. Identify the variables and get to know the data
  3. Clean, transform, and format the data for analysis
  4. Create a model to solve the problem
  5. Test and validate the model
  6. Deploy the model
💡 Long-term, repeated measurements capture a student's ability better than a single snapshot: a signal can appear at any time and is missed if only one time frame is recorded.
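The point about measurement windows can be made concrete with a small numeric sketch. The weekly scores below are hypothetical and constructed so that the change appears only in weeks 6 to 10, mirroring the "time 1 to 5" versus "time 6 to 10" example in the transcript.

```python
from statistics import mean

# Hypothetical weekly scores for one student across a 10-week semester.
# Weeks 1 to 5 are flat; the change only shows up in weeks 6 to 10.
weekly_scores = [50, 51, 49, 50, 50, 62, 64, 63, 65, 66]


def window_mean(scores, start_week, end_week):
    """Mean score over an inclusive, 1-based range of weeks."""
    return mean(scores[start_week - 1:end_week])


snapshot = weekly_scores[0]                         # a single test in week 1
early_window = window_mean(weekly_scores, 1, 5)     # observing weeks 1 to 5 only
late_window = window_mean(weekly_scores, 6, 10)     # the window where the signal appears
semester_mean = window_mean(weekly_scores, 1, 10)   # average over the whole semester

print(snapshot, early_window, late_window, semester_mean)
```

A single test or the early window alone would suggest nothing changed, while the semester-long series shows the shift. This is the reasoning behind the suggestion to average over a semester or to follow students over time.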
