RAPIDS: GPU-Accelerated Data Analytics & Machine Learning

NVIDIA Developer · Advanced · 📊 Data Analytics & Business Intelligence · 5y ago

Key Takeaways

The video demonstrates the use of RAPIDS, a suite of GPU-accelerated data analytics and machine learning libraries, to analyze and predict taxi fares using the Kaggle New York City taxicab data set.

Full Transcript

A data set that's very familiar to data scientists is the Kaggle New York City taxicab data set. We brought this data set into the OmniSci GPU-accelerated analytics platform, where we can visualize and manipulate the tabular data in real time as well as quickly explore the data. This data set contains records of taxi fares for a 12-month period and has information such as time, passenger count, and geographic locations.

The dots on the map are drop-off locations, where the size of the dot indicates how long the trip was and the color is the tip amount, zero being blue and the max value being yellow. You can see there's a lot of blue, indicating zero tip, which is a bit odd. But as you explore the data, you'll realize that tips are only tracked through credit card transactions, which is not a true indicator of what tips truly are and is skewing the average tip amount. I can show only credit card transactions by clicking on "credit" within this pie chart. Notice how the average tip changed from 11 to 20 percent.

The OmniSci GPU-accelerated analytics platform is based upon a GPU-accelerated database that runs entirely in GPU memory. It allows me, as a business intelligence analyst, to quickly gain insights and visualize large amounts of data. I can build charts based upon this data so that I can analyze it further. Maybe I want to show which taxi fares required larger vehicles. Here are the fares that had nine passengers, for example. I can add trips that had eight or seven passengers as well. This allows me to understand where larger vehicles were requested. It's very fast and a great way to allow data exploration across the enterprise.

This next part is where the data scientist comes into play. What if you wanted to predict taxi fares? Maybe when somebody submits a taxi fare request on an application, like an app on their phone, you want to give them an estimate, and you want these estimates to be based upon real taxi fares that happened at the beginning of the month. That's where a data scientist comes in, and you can use RAPIDS to do those predictions.

RAPIDS is an end-to-end open source ecosystem of GPU-accelerated data science libraries that run on NVIDIA GPUs. A data scientist interacts with the RAPIDS libraries using a Jupyter notebook, and OmniSci has Jupyter notebook integration in their application. The advantage of using RAPIDS here is that the data comes directly from the OmniSci GPU-accelerated database, with no need to go back to CSVs. Once the data is in the data frame, we can use RAPIDS in an ETL workflow that imports, inspects, and cleans the data, and selects a training set used for training a model to predict taxi fares. This end-to-end workflow of data processing and model training is faster and uses less hardware than a CPU-only approach.

So let's begin the RAPIDS workflow. Here I go in and import the libraries. I connect to the data source, and this brings in the data. Here is the data. The next thing I'll do is inspect and clean up the data. For example, I have columns from 2014 that were named a certain way, but they did not map the same in 2015. Well, through RAPIDS you can concatenate fields and do data cleanup. That's what we're doing here. This is the data cleanup phase.

Let's go on to the next part of the RAPIDS workflow. I can add new columns or new interesting features. For example, if I didn't have a way to understand whether a taxi fare was on a weekend, or maybe I wanted to know what day of the week it was without having to read it from a date field, I can create that data. In this case I did, and I added it to the end of the data set. I also used a distance calculator so I could see the distance of the trip itself. As you can see, this data has been added and exists as new columns within the data set.

The next part of the workflow is to pick a training set. Here I will use 75 percent of the data for training, and then I will use the remaining 25 percent to test my model against. So I create that training set and we do the training. It uses XGBoost to do the training. Lastly, we test our model against that remaining 25 percent. We then calculate a root mean square error, which is basically a standard deviation of the prediction errors, to see how close we can get between the predictions and the ground truth. The end result was that the taxi fare was within a two-dollar range, which I think is a pretty fair estimate to give to taxi riders.

Now that we have predictions, we can bring this data back into the OmniSci dashboard so that we can visualize and compare predictions against actual data. The graph of predicted amount versus actual fare provides this visual comparison. Blue points are more accurate predictions, whereas red points are less accurate.
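The feature-engineering step described in the transcript (a day-of-week column, a weekend flag, and a trip distance) can be sketched in plain Python. This is only an illustration under stated assumptions: the video performs this work on the GPU with RAPIDS, while the sketch below uses just the Python standard library so it runs anywhere. The field names, the sample record, and the choice of the haversine formula are assumptions; the video only mentions "a distance calculator" without saying which one.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_features(trip):
    """Return a copy of a trip record with derived columns appended."""
    pickup = datetime.fromisoformat(trip["pickup_datetime"])
    enriched = dict(trip)  # leave the input record untouched
    enriched["day_of_week"] = pickup.weekday()      # Monday=0 ... Sunday=6
    enriched["is_weekend"] = pickup.weekday() >= 5  # Saturday or Sunday
    enriched["trip_distance_km"] = haversine_km(
        trip["pickup_latitude"], trip["pickup_longitude"],
        trip["dropoff_latitude"], trip["dropoff_longitude"],
    )
    return enriched


# Hypothetical record, made up for illustration (not taken from the data set).
trip = {
    "pickup_datetime": "2015-01-03 14:30:00",
    "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.7128, "dropoff_longitude": -74.0060,
}
features = add_features(trip)
print(features["day_of_week"], features["is_weekend"],
      round(features["trip_distance_km"], 2))
```

In a data-frame setting the same idea applies column-wise rather than record by record, which is what lets a GPU library process many rows in parallel.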

Original Description

The RAPIDS suite of software libraries, built on CUDA-X AI, gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes. This demonstration uses RAPIDS and OmniSci's GPU-accelerated analytics platform to quickly visualize and run queries on the 1.1 billion New York City taxi ride dataset. To learn more about RAPIDS and to try GPU-accelerated analytics using an open-source version of OmniSci (including sample data), please visit https://developer.nvidia.com/rapids https://rapids.ai/ https://ngc.nvidia.com/catalog/containers/partners:omnisci-os

The video showcases the use of RAPIDS for end-to-end data science and analytics pipelines on GPUs, demonstrating its capabilities in data visualization, exploration, and machine learning model training. The RAPIDS workflow is used to predict taxi fares, and the results are visualized and compared against actual data.

Key Takeaways
  1. Import data into OmniSci
  2. Visualize and explore data using OmniSci
  3. Import RAPIDS libraries and connect to data source
  4. Inspect and clean data
  5. Add new columns and features
  6. Pick a training set and train a model
  7. Test model performance and calculate root mean square error
  8. Visualize and compare predictions against actual data
💡 The use of GPU-accelerated data analytics and machine learning libraries like RAPIDS can significantly speed up data processing and model training tasks, enabling faster insights and predictions.
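Steps 6 and 7 above (a 75/25 split, training, and a root mean square error check) can be sketched as follows. This is a minimal sketch under loud assumptions: the video trains with XGBoost on real taxi data, whereas this example swaps in a simple least-squares line of fare against distance and uses synthetic data, so that it runs with only the Python standard library. The function names, the fare formula, and the noise level are all hypothetical.

```python
import math
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle rows reproducibly and split them into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for fare = slope * distance + intercept.

    Stand-in model only: the video uses XGBoost, not a linear fit.
    """
    n = len(rows)
    mean_x = sum(d for d, _ in rows) / n
    mean_y = sum(f for _, f in rows) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in rows)
    sxy = sum((d - mean_x) * (f - mean_y) for d, f in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predictions, actuals):
    """Root mean square error between predictions and ground truth."""
    squared = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared / len(actuals))


# Synthetic (distance_km, fare_usd) pairs; the formula below is invented.
rng = random.Random(42)
rows = []
for _ in range(1000):
    distance = rng.uniform(0.5, 20.0)
    fare = 2.50 + 1.75 * distance + rng.gauss(0.0, 1.0)
    rows.append((distance, fare))

train, test = train_test_split(rows)          # 75 percent train, 25 percent test
slope, intercept = fit_line(train)            # "training" on the 75 percent
predictions = [slope * d + intercept for d, _ in test]
error = rmse(predictions, [f for _, f in test])
print(f"train={len(train)} test={len(test)} rmse=${error:.2f}")
```

Because the error is measured only on rows the model never saw during fitting, the RMSE gives a fairer picture of how far a fare estimate is likely to be from the actual fare, in dollars.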
