Model Tradeoffs and the Future of Computer Vision

Roboflow · Intermediate · Computer Vision · 5y ago

Skills: Modern CV Models (90%), CV Basics (80%)

Key Takeaways

The video discusses model trade-offs in computer vision, including considerations such as model performance, speed, and size, as well as deployment environments, and explores the future of computer vision with advancements in compute capabilities and frameworks like OpenVINO and TensorRT.

Full Transcript

Hey there, it's Joseph from Roboflow. In today's discussion we're going to focus on evaluating different model architectures: why is it that you choose one model over another, and what sort of trade-offs do you have to consider? Do you want to open this up with what you should be thinking about when comparing one machine learning model versus another?

Yeah, so I think there are a few broad categories, and then we'll dive in and take apart each one of these categories in isolation. The main things you need to consider are: how fast is your model going to train, how fast is it going to run inference, and how well is your model going to perform? After you've weighed all of those, you need to start thinking about deployment. What's the size of your model in terms of raw storage? And where can you deploy your model? In computer vision that becomes really relevant: do you need real-time object detection, or is some latency okay? Can you do it on the server side? Where can you actually put your model once you're done? Those, I think, are the main components you're weighing as you think about different model architectures.

Yeah, I think that makes a ton of sense. The components there that are interrelated are really model performance, in terms of its speed and its accuracy, and model size, and how that determines, as you're mentioning, the deployment environment. If you have a large model, in terms of the number of parameters and therefore in terms of the number of megabytes it takes up, you would generally expect that model to be more accurate, because it has greater granularity and can be fine-tuned to be a bit more performant. But it's also likely slower: inference takes a bit longer and it takes more computational power. On that side of things you're probably going to be considering server-side deployment. I don't know of large models that you're going to be able to get away with deploying on device, but maybe I'm wrong. If we're thinking about that as one of the trade-offs, where is the line? Which architectures might fall into one corner versus another? Is this going to change over time? How do you think about that?

Yeah, certainly. It's definitely an exciting time to be in computer vision from that standpoint, in that the compute capabilities of different compute engines, like GPUs and Movidius VPUs and TPUs and all the different processing units you can now deploy things to, are vastly increasing. So the size of the models that we're able to actually deploy to the edge is ever increasing. And it's not only on the hardware side but also on the software side: there are frameworks like OpenVINO and NVIDIA's TensorRT, which are speeding up computations on these devices. So you're actually starting to get even more performant, larger models that you can deploy at faster speeds. That's definitely changing the game and reawakening architectures that previously were kind of monolithic and unable to be wielded. But certainly the dichotomy that Joseph has been elaborating on, that a larger model is going to perform better and is going to infer slower, is just always going to be a brute fact, and state of the art for a lot of these tasks will always be with these extremely large models that are trained on many, many TPUs.

Yeah, EfficientDet comes to mind there, right? That's why EfficientDet was released as D0 through D7. And of course, if you have the resources of a Google, it's just fundamentally different than elsewhere. I'm going to ask sort of a question: if you think about this and extend out 10, 20, 30 years, what does vision look like as a result of these considerations, as models become more performant, as size becomes less of an issue, as we can start to do real time with higher quality? Today that looks like the warring frameworks, if you will, of TensorRT and OpenVINO from NVIDIA and Intel respectively. But ignore those details, paper over them, and just imagine that we're 10, 20, 30 years in the future. What do you think the state of vision looks like?

Yeah, I think I'll start by reflecting on the present day, on the few years that I've seen these things evolving. I think it's just incredible that today there are GPU resources available on, say, Google Colab, where for only ten dollars a month you can be accessing a Tesla V100, and that's just given to you by default now. It's insane that that kind of compute is out there and available, and I think naturally that's only going to increase as time goes by. So the size of the models that we're going to be able to deploy, the speed at which they're going to be able to train, and their performance will just rapidly increase over, say, 20 to 30 years. That means the amount of data required to learn certain tasks, the size of the dataset, will be able to shrink, so you'll be able to bootstrap a lot quicker into new areas and new domains, which I think is exciting. And certain problems that we thought were unreachable before, like self-driving cars for example, will become reality; we'll be able to achieve those things. I don't know what you think.

Yeah, I think that's entirely accurate. The cost of training is continuing to fall, and the amount of data that's collected and stored, what is it, doubles every couple of years. But think about the implications, given current circumstances surrounding the rise and inevitability of remote work, not just for office jobs but for remote sensing jobs, like sending drones forward. In ten years we'll probably have our first broad-scale augmented reality use cases. Right now vision is kind of a separate component: okay, I've got to get my phone out and start to do vision stuff. But in a very short period of time it'll just be an ingrained, enhancing part of the way that we experience the world, and the way that businesses and products can experience the world. So predicting the real-time understanding of the real world around us, together with the context that a computer can provide, is in a lot of ways like trying to predict Uber when there first was a backlink in the 90s, right? When you had ARPANET, who would have thought about Uber? It's the same way that we're talking about vision now versus what's capable. It makes for some really, really exciting capabilities, and I'm really excited to be a part of building the future.

Yeah, it's really an amazing time to be part of things.

Awesome. So, from evaluating model trade-offs to the future of vision, I really enjoyed this one. Thanks so much.

Yeah, thanks for being here, guys. Don't forget to like and subscribe below for more videos like this, and we'll see you on the next video.

Original Description

Choosing a model depends on a number of factors, which are discussed by the fireside in this video. Naturally, these considerations lead one to think about the future of computer vision. Let us know your thoughts below!


The video discusses model trade-offs in computer vision and explores the future of the field with advancements in compute capabilities and frameworks. Viewers can learn to evaluate model trade-offs and understand deployment considerations for computer vision models.

Key Takeaways

Evaluate model performance and size trade-offs
Consider deployment environments for computer vision models
Explore advancements in compute capabilities and frameworks like OpenVINO and TensorRT
Optimize model performance and size for deployment
Choose appropriate model architectures for computer vision tasks

💡 The size and performance of computer vision models are increasingly important considerations for deployment, and advancements in compute capabilities and frameworks are changing the game for model development and deployment.
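
The link the speakers draw between parameter count, storage size, and inference speed can be made concrete with a small sketch. The following is only an illustration under stated assumptions, not a method from the video: the helper names (model_size_mb, median_latency_ms, fits_budget) are hypothetical, the size estimate counts weights only and ignores file-format overhead, and predict stands in for whatever inference call a given framework exposes.

```python
import time
from statistics import median

# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def model_size_mb(num_params: int, dtype: str = "float32") -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def median_latency_ms(predict, sample, warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of predict(sample) in milliseconds.

    predict is a placeholder for any inference callable (assumption).
    Warm-up calls are discarded so one-time setup cost is not measured.
    """
    for _ in range(warmup):
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)


def fits_budget(size_mb: float, latency_ms: float,
                max_size_mb: float, max_latency_ms: float) -> bool:
    """True if a model meets both the storage and the latency budget."""
    return size_mb <= max_size_mb and latency_ms <= max_latency_ms
```

As a usage example, a hypothetical 10-million-parameter model comes to roughly 40 MB at float32 and 10 MB at int8 by this estimate. Whether a measured latency is acceptable depends on the deployment target: a real-time budget on an edge device is much tighter than one for server-side batch processing.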
