Accelerating Pandas with NVIDIA's cuDF: Basic Statistical Analysis and Data Cleaning (Ep. 06)
In this episode, we continue our series on NVIDIA's cuDF, a GPU-accelerated DataFrame library with a pandas-like API. We'll focus on performing basic statistical analysis on a large dataset of 4.3 million newspaper articles, demonstrating the advantages of GPU acceleration. By comparing CPU and GPU performance, we show how tasks like word counts and text length calculations can be sped up dramatically on the NVIDIA RTX 5000 Ada GPU. We'll also walk through essential data cleaning techniques to improve data quality.
00:00 Introduction to cuDF and Video Overview
00:47 Exciting Hardware Setup for the Series
02:06 Loading and Preparing the Dataset
03:55 Performing Statistical Analysis on CPU
05:20 Accelerating Analysis with GPU
08:54 Identifying and Cleaning Bad Data
14:31 Conclusion and Next Steps
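For readers who want a feel for the operations covered in this episode before opening the notebook, here is a minimal sketch of the text-length, word-count, and cleaning steps. It is not the notebook's code: the `text` column name and the four sample rows are made up for illustration, and it runs on plain pandas. On a machine with RAPIDS installed, loading the `cudf.pandas` accelerator before importing pandas is intended to run the same code on the GPU where operations are supported.

```python
import pandas as pd

# Hypothetical stand-in for the 4.3 million newspaper articles;
# the "text" column name is an assumption, not taken from the notebook.
df = pd.DataFrame(
    {"text": ["GPU acceleration is fast", "", None, "Short one"]}
)

# Basic cleaning: treat missing values as empty strings, trim whitespace,
# then drop rows that have no text left.
df["text"] = df["text"].fillna("").str.strip()
clean = df[df["text"].str.len() > 0].copy()

# Basic statistics: characters per article and whitespace-separated words.
clean["text_length"] = clean["text"].str.len()
clean["word_count"] = clean["text"].str.split().str.len()

print(clean[["text_length", "word_count"]].describe())
```

With only a handful of rows the CPU and GPU will look the same; the speedup discussed in the video only becomes visible at the scale of millions of rows.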
Notebook: https://github.com/wjbmattingly/cuda-python/blob/main/notebooks/cudf/05_manipulating_data.ipynb
Dell Workstation: https://www.dell.com/en-us/dt/ai-technologies/index.htm?utm_source=William&utm_medium=Content&utm_campaign=Polars
RTX 5000 Ada: https://nvda.ws/4gPZUpQ
RAPIDS: https://nvda.ws/3CwwcHK
Join this channel to get access to perks:
https://www.youtube.com/channel/UC5vr5PwcXiKX_-6NTteAlXw/join
If you enjoy this video, please subscribe.
✅Be my Patron: https://www.patreon.com/WJBMattingly
✅PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=AZ73QW52SUX8N&currency_code=USD&source=url
If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
You can follow me at:
https://twitter.com/wjb_mattingly