Data Noise | Introduction to Data Mining | Part 7
Key Takeaways
The video discusses data noise in the context of data mining, including its causes, effects, and measurement, with examples and visual illustrations to help understand the concept.
Full Transcript
So, those of you who have a scientific or signal-processing background are probably familiar with the term noise. Noise, in the data science context, is when we have an invalid signal of some sort that overlaps valid data. This obscures our actual attribute values, and fundamentally what it means is that some of our data objects have invalid values in some of their attributes: they hold inaccurate values there rather than the real ones.

Examples of this in real life: we have the distortion of a person's voice over the phone, and snow on old television screens, old CRT television screens. Noise can also appear because of human inconsistency in labeling. You see this a lot in sports that require human judging, for instance; there's a lot of inconsistency in how people get labeled there. And in general, if you're trying to, say, rank websites, human inconsistency in labeling can be a real problem.

As a practical example of what noise can do when there's a lot of it, this is a pretty straightforward signal. We've got two sine waves here with different frequencies but the same amplitude; there's a blue one and a green one. When we generate the sine waves, they look very clean, very pretty, and we can even distinguish the two different sine waves. If we add those two waves together and then throw noise at it, just basic white noise like you might see in any kind of randomization, we end up with something that looks like this. So the noise has completely obscured our actual signal.

So noise is, again, fundamentally invalid data points that are obscuring our signals. There's always some noise in any system; it's just the nature of the universe, sadly. But understanding where your noise is at its worst and how you can deal with it is very important, and even recognizing that it's there is the first step: recognizing which of your attributes are more noisy versus which of them are less noisy.

The complementary problem to noise is the problem of outliers. Outliers often look like noise at first. They're data objects that have characteristics that are considerably different from most of the other objects in the data set. If we look at the visual here, we've got a two-dimensional graph of our data, and each dot, each point, represents a data object that's been plotted on the graph. We've got four nicely defined clusters, and then we've got these three other points just kind of hanging out in the middle of nowhere, far away from all of the other data.

So the big distinction between outliers and noise is that outliers are actually valid values. The data was collected properly, it's clean, but it's outside of the normal range; the data object for some reason doesn't look like a normal object.

All right, so that's outliers and noise. Those are the first category of data quality problems that get encountered a lot.
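
The two-sine-wave example described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the code used in the video: the frequencies (5 Hz and 12 Hz), the sample count, the noise level, and the helper names `make_signal`, `add_white_noise`, and `rms` are all assumptions chosen for the sketch. The signal-to-noise ratio at the end is one common way to quantify how badly noise obscures a signal; the transcript itself does not name a specific measure.

```python
import math
import random


def make_signal(n_samples=1000, duration=1.0, freq_a=5.0, freq_b=12.0, amplitude=1.0):
    """Sum of two sine waves with the same amplitude but different frequencies.

    The frequencies and sample count are illustrative assumptions.
    """
    times = [i * duration / n_samples for i in range(n_samples)]
    signal = [
        amplitude * math.sin(2 * math.pi * freq_a * t)
        + amplitude * math.sin(2 * math.pi * freq_b * t)
        for t in times
    ]
    return times, signal


def add_white_noise(signal, noise_std, seed=0):
    """Add zero-mean Gaussian white noise to every sample (seeded for repeatability)."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def rms(values):
    """Root-mean-square magnitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


times, clean = make_signal()
# A noise level well above the signal amplitude, so the sine waves get buried.
noisy = add_white_noise(clean, noise_std=3.0)

# In this synthetic setup we know the clean signal, so we can recover the noise
# exactly and compare its power to the signal's power.
noise = [n - c for n, c in zip(noisy, clean)]
snr = (rms(clean) / rms(noise)) ** 2

print(f"clean RMS: {rms(clean):.2f}, noise RMS: {rms(noise):.2f}, SNR: {snr:.2f}")
```

A signal-to-noise ratio below 1 means the noise carries more power than the signal, which matches the situation in the video where the two sine waves can no longer be seen. With real data the clean signal is usually unknown, so the noise has to be estimated rather than subtracted out like this.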
Original Description
In this data mining fundamentals tutorial, we discuss data noise, which can overlap valid data, and outliers. Data noise can appear because of human inconsistency in labeling. We will provide you with several examples of data noise, and how data noise can be measured and recorded.
Table of Contents:
0:00 Introduction
0:10 Noise
2:52 Outliers