RAPIDS: GPU-Accelerated Data Analytics & Machine Learning

NVIDIA Developer · Advanced · 📊 Data Analytics & Business Intelligence · 5y ago

Skills: ML Pipelines 90%, Data Literacy 80%, Supervised Learning 80%

Key Takeaways

The video demonstrates the use of RAPIDS, a suite of GPU-accelerated data analytics and machine learning libraries, to analyze and predict taxi fares using the Kaggle New York City taxicab data set.

Full Transcript

A data set that's very familiar to data scientists is the Kaggle New York City taxicab data set. We brought this data set into the OmniSci GPU-accelerated analytics platform, where we can visualize and manipulate the tabular data in real time as well as quickly explore the data. This data set contains records of taxi fares for a 12-month period and has information such as time, passenger count, and geographic locations. The dots on the map are drop-off locations, where the size of the dot indicates how long the trip was and the color is the tip amount, zero being blue and the max value being yellow. You can see there's a lot of blue, indicating zero tip, which is a bit odd. But as you explore the data, you'll realize that the only way tips are being tracked is through credit card transactions, which is not a true indicator of what tips truly are and is skewing the average tip amount. I can show only credit card transactions by clicking on "credit" within this pie chart. Notice how the average tip changed from 11 to 20 percent.

The OmniSci GPU-accelerated analytics platform is based upon a GPU-accelerated database that runs entirely in GPU memory. It allows me, as a business intelligence analyst, to quickly gain insights and visualize large amounts of data. I can build charts based upon this data so that I can analyze it further. Maybe I wanted to show which taxi fares required larger vehicles. Here are the fares that had nine passengers, for example. I can add trips that had eight or seven passengers as well. This allows me to gain an understanding of where larger vehicles were requested. It's very fast and a great way to allow data exploration across the enterprise.

This next part is where the data scientist comes into play. What if you wanted to predict taxi fares? Maybe when somebody submitted a taxi fare request on an application, like an app on their phone, you wanted to give them an estimate, and you wanted these estimates to be based upon real taxi fares that happened at the beginning of the month. That's where a data scientist comes in, and you can use RAPIDS to do those predictions. RAPIDS is an end-to-end open source ecosystem of GPU-accelerated data science libraries that run on NVIDIA GPUs. A data scientist interacts with the RAPIDS libraries using a Jupyter notebook, and OmniSci has Jupyter notebook integration in their application. The advantage of using RAPIDS is that the data is coming directly from the OmniSci GPU-accelerated database; there's no need to go back to CSVs. Once the data is in the data frame, we can use RAPIDS in an ETL workflow, which imports, inspects, and cleans the data, as well as selects a training set, which is used for training a model to predict taxi fares. This end-to-end workflow of data processing and model training is faster and uses less hardware than CPU only.

So let's begin the RAPIDS workflow. Here I go in and import the libraries. I connect to the data source, and this brings in the data. Here is the data. The next thing I'll do is inspect and clean up the data. For example, I have columns from 2014 that were named a certain way, but they did not map the same way in 2015. Through RAPIDS you can concatenate fields and do data cleanup; that's what we're doing here. This is the data cleanup phase.

Let's go on to the next part of the RAPIDS workflow. I can add new columns or new interesting features. For example, if I didn't have a way to understand whether a taxi fare was on a weekend, or maybe I wanted to know what day of the week it was without having to read it from a date field, I can create that data, and in this case I did. I added it to the end of the data set. I also used a distance calculator so I could see the distance of the trip itself. As you can see, this data has been added and exists as new columns within the data set.

The next part of the workflow is to pick a training set. Here I will use 75 percent of the data to do training on, and then I will use the remaining 25 percent to test my model against. So I create that training set and we do the training. It uses XGBoost to do the training. Lastly, we test our model against that remaining 25 percent. We then calculate a root mean square error, which is basically a standard deviation, to see how close we can get between the predictions and the ground truth. The end result was that the taxi fare was within a two-dollar range, which I think is a pretty fair estimate to give to taxi riders.

Now that we have predictions, we can bring this data back into the OmniSci dashboard so that we can visualize and compare predictions against actual data. The graph "Predicted Amount versus Actual Fare" provides this visual comparison. Blue are more accurate predictions, whereas red are less accurate.
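
The feature-engineering step described in the transcript (a day-of-week column, a weekend flag, and a trip-distance column) can be sketched roughly as follows. This is a minimal plain-Python illustration, not the notebook from the video: the field names (`pickup_datetime`, `pickup_latitude`, and so on) are hypothetical, and the haversine great-circle formula is an assumption, since the transcript only mentions "a distance calculator". In the demo these operations would run on the GPU through RAPIDS data frames.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_features(trip):
    """Return a copy of the trip record with derived columns appended."""
    pickup = datetime.fromisoformat(trip["pickup_datetime"])
    enriched = dict(trip)  # copy so the input record is left untouched
    enriched["day_of_week"] = pickup.weekday()      # Monday = 0 ... Sunday = 6
    enriched["is_weekend"] = pickup.weekday() >= 5  # Saturday or Sunday
    enriched["trip_distance_km"] = haversine_km(
        trip["pickup_latitude"], trip["pickup_longitude"],
        trip["dropoff_latitude"], trip["dropoff_longitude"],
    )
    return enriched


# Hypothetical record; the values are illustrative, not taken from the data set.
trip = {
    "pickup_datetime": "2015-01-03 14:30:00",
    "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.6413, "dropoff_longitude": -73.7781,
}
enriched = add_features(trip)
print(enriched["day_of_week"], enriched["is_weekend"], round(enriched["trip_distance_km"], 1))
```

Deriving these columns once, up front, means the model does not have to learn calendar or geometry rules from raw timestamps and coordinates.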

Original Description

The RAPIDS suite of software libraries, built on CUDA-X AI, gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes. This demonstration uses RAPIDS and OmniSci’s GPU-accelerated analytics platform to quickly visualize and run queries on the dataset of 1.1 billion New York City taxi rides. To learn more about RAPIDS and to try GPU-accelerated analytics using an open-source version of OmniSci (including sample data), please visit https://developer.nvidia.com/rapids https://rapids.ai/ https://ngc.nvidia.com/catalog/containers/partners:omnisci-os

The video showcases the use of RAPIDS for end-to-end data science and analytics pipelines on GPUs, demonstrating its capabilities in data visualization, exploration, and machine learning model training. The RAPIDS workflow is used to predict taxi fares, and the results are visualized and compared against actual data.

Key Takeaways

Import data into OmniSci
Visualize and explore data using OmniSci
Import RAPIDS libraries and connect to data source
Inspect and clean data
Add new columns and features
Pick a training set and train a model
Test model performance and calculate root mean square error
Visualize and compare predictions against actual data
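
The split, train, and evaluate steps in the list above can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the video trains an XGBoost model on GPU, whereas this sketch swaps in a one-variable least-squares line as a stand-in so it runs without extra libraries. The data is synthetic (the base fare, per-distance rate, and noise level are made up for illustration), and the helper names `train_test_split`, `fit_line`, and `rmse` are hypothetical.

```python
import math
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Ordinary least squares for fare = intercept + slope * distance."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(predicted, actual):
    """Root mean square error between predictions and ground truth."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic (distance, fare) pairs; these numbers are NOT from the taxi data set.
rng = random.Random(42)
rows = []
for _ in range(200):
    distance = rng.uniform(0.5, 20.0)
    fare = 2.50 + 1.75 * distance + rng.gauss(0, 1.0)
    rows.append((distance, fare))

train, test = train_test_split(rows)             # 75 percent train, 25 percent test
intercept, slope = fit_line(train)               # stand-in for the XGBoost training step
predictions = [intercept + slope * d for d, _ in test]
error = rmse(predictions, [f for _, f in test])  # evaluated on held-out rows only
print(f"train={len(train)} test={len(test)} rmse={error:.2f}")
```

The error is computed only on the held-out 25 percent, so it reflects how the model does on trips it has not seen. RMSE is in the same units as the fare, and it equals the standard deviation of the prediction errors when their mean is zero, which is the sense in which the transcript calls it "basically a standard deviation".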

💡 The use of GPU-accelerated data analytics and machine learning libraries like RAPIDS can significantly speed up data processing and model training tasks, enabling faster insights and predictions.
