This Will Change Data Science as We Know It (ChatGPT)

Dave Ebbelaar · Beginner · 🧠 Large Language Models · 3y ago

Key Takeaways

The video demonstrates how ChatGPT, a large language model trained by OpenAI, can assist with data science tasks. The specific task is detecting outliers in sensor data with Python, first with the interquartile range (IQR) method and then with the local outlier factor (LOF) method.
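The IQR method flags a value as an outlier when it falls below Q1 - 1.5 · IQR or above Q3 + 1.5 · IQR, where IQR = Q3 - Q1. The following is a minimal sketch of that rule, not the code ChatGPT produced in the video. It assumes pandas is installed, and the function name `mark_outliers_iqr`, the `factor` parameter, and the `acc_x` column are assumptions chosen for illustration.

```python
import pandas as pd


def mark_outliers_iqr(df: pd.DataFrame, col: str, factor: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with a boolean '<col>_outlier' column.

    A value is flagged when it is below Q1 - factor * IQR or above
    Q3 + factor * IQR, where IQR = Q3 - Q1.
    """
    df = df.copy()  # keep the input DataFrame unchanged
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # True for outliers, False otherwise (comparisons with NaN are False)
    df[col + "_outlier"] = (df[col] < lower) | (df[col] > upper)
    return df


# Usage on a toy column where 100.0 is an obvious outlier
data = pd.DataFrame({"acc_x": [0.1, 0.2, 0.15, 0.12, 0.18, 100.0]})
result = mark_outliers_iqr(data, "acc_x")
print(result)
```

The transcript describes the generated column as holding either True or NaN. This sketch writes an explicit False for non-outliers instead, which keeps the column boolean and easy to filter with `result[result["acc_x_outlier"]]`.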

Full Transcript

This is so good, it's even specifying the types over here. It actually did the job.

Unless you've been living under a rock, this thing has been popping up everywhere, and for a good reason. Having a background in artificial intelligence, I'm naturally interested in new developments like this, and I've been putting it to the test, mainly just to play around with it and see its capabilities. But today I was recording a new video for my YouTube series, where we cover a complete machine learning project from start to finish using Python, and I came up with the idea to put ChatGPT to the test with an actual data science problem. The part of the series I was recording today was all about outlier detection: how to detect outliers in sensor data. The results I got were so good that I thought, okay, I have to stop what I'm doing right now and make a video about this, because you have to see this before ChatGPT turns into a paid model or something like that. Right now, as you can see here, it's a free research preview, and it's free for everyone. So let me show you what I did this morning and how awesome this is.

In the YouTube series I'm working on, we're working with sensor data from an accelerometer and a gyroscope. That's not really important for this video, but we want to create an outlier detection algorithm that goes over the first six columns, the ones you're seeing right here, loops over all these numerical values, identifies whether there are any outliers, and marks them as either True or False. This is basically the request that I gave to GPT-3, so let's see how it works. Here we go: "Create a Python function that can mark columns from a pandas DataFrame as outliers using the IQR method." Let's see what we get. Okay, it's thinking. Here it's starting: "To mark columns as outliers using the IQR method..." Now it's actually creating code for us: a function, mark outliers IQR. Really good name. We have a pandas DataFrame and a column, and then what does it do? Q1, Q3, looking good, IQR. Okay, what else have you got? Oh, we even get an example. Wow.

So let's check this out. Without changing anything, we copy the code and come back over here. Let me clear this up. This function takes a DataFrame as input, so that is our df, and let's say we want to take the first column in our DataFrame. Let's run this. Wow, okay, so we're getting an outlier column. It's all NaNs, is that correct? There are actually True values in here, so it actually did the job. We have a function that takes a DataFrame and a column as input and outputs that same DataFrame with a new column called outlier. We can make this even better: instead of calling it outlier, we use col plus an underscore. Let me check. Now we have it, so we know which column it is. Okay, this is really awesome. Now what if we say for col in outlier columns, we do this, and then we change it to that? Let's start with a fresh DataFrame again and check the result. For all six numerical columns within our DataFrame, we now have a series indicating with either True or NaN whether the value is an outlier. Wow, that is actually amazing. This is so cool.

Let's do one more test: "Now create the same function but with the local outlier factor, or LOF, method." Let's see if this actually works. Here it goes. Yes, it's using scikit-learn's LocalOutlierFactor. This is so good, it's even specifying the types over here: the DataFrame is supposed to be a pandas DataFrame, the column is supposed to be a string, and the output is a pandas DataFrame. This is such a proper way of writing a Python function, and I typically never do that because I'm too lazy to write it out. I also like the name of this function: first mark outliers IQR, and now LOF, using the abbreviation of local outlier factor. How can it make sense of all of that? I find it so interesting. This is legitimately a better function than I would be able to come up with on my own. It's also properly commented. It's beautiful.

All I can say is that I am impressed, for real. Artificial intelligence is here, right in front of us, we can use it, and it's free for everyone. Just go to chat.openai.com and play around with it. I will definitely be putting this more to the test, also for my data science project, but just from this test alone there are so many possibilities. Of course, you still need your expert judgment and experience as a data scientist to determine whether this code is actually useful and can be applied to the problem at hand. But it can save you so much time looking up syntax, writing out the structure of Python functions, and commenting everything. I think this can really help people who are new to data science as well, just by looking at examples: how do I do this, how do I structure a function, how do I comment code? You basically have a mentor you can look up to, see how your mentor, in this case the AI, writes this code, and then learn from it and apply it to your own problems. It is really fascinating, and this is going to change data science as we know it, and not just data science but coding, and I would even go as far as saying the world in general. This is such a huge leap in technological improvement. I'm really impressed. So go ahead and try it out; it's actually really fun.

That's what I wanted to show you in today's quick video, and I should probably get back to recording the video that I was supposed to be recording today. If you find this video helpful, please consider subscribing to the channel. For those of you who are new here, my name is Dave. I work as a freelance data scientist, and I'm also the founder of Datalumina, a coaching business for data professionals who want to learn how to start a business. On this YouTube channel we make videos about machine learning, Python, data science, and also freelancing, so if that's your thing, consider subscribing. Please like the video, and I'll see you in the next one. Wow, what the actual...
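The LOF variant at the end of the transcript can be sketched with scikit-learn's `LocalOutlierFactor`, whose `fit_predict` labels outliers as -1 and inliers as 1. This is a minimal sketch under stated assumptions, not the generated code from the video. It assumes scikit-learn is installed and that the column has no missing values, and the function name, output column names, and `acc_y` data are hypothetical. The type hints mirror the signature described in the transcript: a pandas DataFrame and a string column in, a pandas DataFrame out.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor


def mark_outliers_lof(df: pd.DataFrame, col: str, n_neighbors: int = 20) -> pd.DataFrame:
    """Return a copy of df with LOF outlier flags and scores for one column.

    Assumes df[col] contains no missing values (LOF cannot handle NaN).
    """
    df = df.copy()  # keep the input DataFrame unchanged
    values = df[[col]].to_numpy()  # LOF expects a 2-D array: (n_samples, n_features)
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    labels = lof.fit_predict(values)  # -1 = outlier, 1 = inlier
    df[col + "_outlier_lof"] = labels == -1
    # negative_outlier_factor_ stores -LOF, so negate it to get the LOF score
    df[col + "_lof_score"] = -lof.negative_outlier_factor_
    return df


# Usage: an evenly spaced signal plus one isolated value at 15.0
data = pd.DataFrame({"acc_y": np.append(np.linspace(0.0, 1.0, 50), 15.0)})
result = mark_outliers_lof(data, "acc_y")
print(result.tail(3))
```

Unlike the IQR rule, LOF compares each point's local density with that of its `n_neighbors` nearest neighbors, so it reacts to isolated points rather than to a fixed spread around the quartiles. With the default `contamination="auto"`, scikit-learn flags points whose LOF score exceeds 1.5.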

Original Description

In this video, I'll be showing you how you can use ChatGPT for data science and why it's such an awesome tool. ChatGPT is a large language model trained by OpenAI that can generate human-like text based on the input it receives. This makes it a powerful tool for data science, as it can help you generate insights and even create entire Python functions based on your input. Check out the machine learning series here: https://youtube.com/playlist?list=PL-Y17yukoyy0sT2hoSQxn1TdV0J7-MX4K If you like data and entrepreneurship, then check out https://datalumina.io If you find these videos helpful, consider subscribing @daveebbelaar

Playlist

Uploads from Dave Ebbelaar · 25 of 60

1 How to Install Homebrew on Mac (Getting Started)
2 How to Install Python on Mac (Homebrew)
3 How to Install Anaconda on Mac (Getting Started)
4 How to Set up VS Code for Data Science & AI
5 How to Use Git in VS Code for Data Science
6 Data Science Desk Setup to Maximize Productivity
7 THIS Is How I Write Clean Data Science Code EVERY TIME
8 Data Science Tutorial - Project Structure
9 Changing rcParams for Better Data Science Plots | Matplotlib Tutorial
10 How to Read Excel Files with Python (Pandas Tutorial)
11 My Data Science Journey (Zero to Freelance)
12 How I Automate Data Visualization in Python
13 16 Apps I Use Daily as a Data Scientist
14 How to Manage Conda Environments for Data Science
15 How to Export Machine Learning Models in Python
16 VS Code Speed Hack for Data Science
17 17 VS Code Tips That Will Change Your Data Science Workflow
18 How to Predict the Future with Python (Forecasting Tutorial)
19 How to Use Python Environment Variables
20 7 Data Science Tips for Beginners in 2023
21 How to Effectively Use the Data Science Lifecycle
22 Full Machine Learning Project — Coding a Fitness Tracker with Python (Part 1)
23 Full Machine Learning Project — Processing Raw Data (Part 2)
24 Full Machine Learning Project — Data Visualization with Matplotlib (Part 3)
25 This Will Change Data Science as We Know It (ChatGPT)
26 Full Machine Learning Project — Detecting Outliers in Sensor Data (Part 4)
27 Full Machine Learning Project — Low-pass Filter & Principal Component Analysis (Part 5a)
28 Full Machine Learning Project — Fourier Transformation & Clustering (Part 5b)
29 Full Machine Learning Project — Predictive Modelling (Part 6)
30 Automate Machine Learning with ChatGPT
31 Scraping Web Datasets for Data Science Projects
32 Full Machine Learning Project — Counting Repetitions (Part 7)
33 How to Use GitHub Copilot for Data Science (Python + VS Code)
34 Every Beginner Data Scientist Should Understand This
35 Revealing My New AI-Powered Data Science Workflow
36 Auto-GPT Tutorial - Create Your Personal AI Assistant 🦾
37 Build Your Own Auto-GPT Apps with LangChain (Python Tutorial)
38 Building Slack AI Assistants with Python & LangChain
39 ChatGPT Code Interpreter - Goodbye Data Analysts?
40 How to Deploy AI Apps to the Cloud with Flask & Azure
41 How to Build an AI Document Chatbot in 10 Minutes
42 Is Falcon LLM the OpenAI Alternative? An Experimental Setup with LangChain
43 GPT Engineer... Generate an entire codebase with one prompt
44 Pandas DataFrame Agent... the future of data analysis?
45 OpenAI Function Calling - Full Beginner Tutorial
46 How to use ChatGPT's new “Code Interpreter” feature
47 LangChain just launched their new "LangSmith" platform
48 How I'd Learn AI (if I could start over)
49 I Used AI To Scrape The Web & Write PDF Reports
50 LangSmith Tutorial - LLM Evaluation for Beginners
51 7 Lessons for New AI Engineers - Beginner’s Guide
52 The Rise of the "New-Age" Machine Learning Engineer
53 OpenAI Assistants Tutorial for Beginners
54 How To Connect OpenAI To WhatsApp (Python Tutorial)
55 How to Build Chatbot Interfaces with Python
56 PostgreSQL as VectorDB - Beginner Tutorial
57 My MacBook Setup (as a coder & business owner)
58 Easiest Way to Connect AI Chatbots to WhatsApp
59 ClickUp Tutorial - What Is ClickUp Brain? 🧠
60 My Development Workflow for Data & AI Projects

The video showcases ChatGPT's ability to generate Python code for outlier detection in sensor data. Used with care, it can save data scientists time on syntax lookups, function structure, and comments, although the generated code still needs expert review before it is applied to a real problem.

Key Takeaways
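  1. Prompt ChatGPT to create a Python function that marks outliers in a pandas DataFrame column using the IQR method
  2. Copy the generated function and run it unchanged on the sensor data
  3. Check the new outlier column for flagged (True) values
  4. Ask ChatGPT for the same function using the local outlier factor (LOF) method from scikit-learn
  5. Apply the function to all six numerical sensor columns, naming each result column after its source column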
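The last takeaway applies the outlier logic to the whole dataset. In the video this is done with a loop over the six numerical columns. The sketch below is a vectorized variant of the IQR rule that computes every column's quartiles in one call. The column names in `SENSOR_COLUMNS`, the function name `mark_outliers_iqr_all`, and the random data with an injected spike are hypothetical, chosen only to make the example runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical names for the six accelerometer and gyroscope columns
SENSOR_COLUMNS = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]


def mark_outliers_iqr_all(
    df: pd.DataFrame, columns: list[str], factor: float = 1.5
) -> pd.DataFrame:
    """Return a copy of df with a boolean '<col>_outlier' column per column."""
    df = df.copy()  # keep the input DataFrame unchanged
    q1 = df[columns].quantile(0.25)  # Series: one Q1 per column
    q3 = df[columns].quantile(0.75)  # Series: one Q3 per column
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # Comparing a DataFrame with a Series aligns the Series index to the columns
    mask = df[columns].lt(lower) | df[columns].gt(upper)
    for col in columns:
        df[col + "_outlier"] = mask[col]
    return df


# Usage: synthetic sensor data with one injected spike in acc_x
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(0.0, 1.0, size=(200, 6)), columns=SENSOR_COLUMNS)
data.loc[10, "acc_x"] = 25.0
result = mark_outliers_iqr_all(data, SENSOR_COLUMNS)
print(result.filter(like="_outlier").sum())  # flagged count per column
```

Because `quantile` on a DataFrame returns one value per column, `lt` and `gt` align each bound with its own column, so the quartiles need no per-column loop. The remaining loop only copies the mask into the `<col>_outlier` columns.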
💡 ChatGPT can generate well-structured Python code for data science tasks, but you still need your own expertise to judge whether that code fits the problem at hand
