Machine learning for Accessibility | Session

Google for Developers · Advanced · 📐 ML Fundamentals · 5y ago

Key Takeaways

Explores three case studies on machine learning for Accessibility, including Voice Access, Lookout, and Live Transcribe

Full Transcript

Hi everyone, my name is Tom Hume. I'm a product manager in Google Research. I'm here today with my colleagues Scott and Sagar to talk about the intersection of machine learning and accessibility.

Around the world, one billion people currently experience some form of disability, which means that people with disabilities are using your app right now, and in larger numbers than you might expect. Thanks to advances in machine learning, today we can help people in ways that just weren't possible a few years ago. So today we'll show you a few examples of this help, tell you how we're thinking about accessibility and machine learning, and give you a few hints for how you might apply it in your own apps.

As it happens, I'm the product manager for Voice Access, so let's start there. What's Voice Access? It's an Android app that lets you control your phone using your voice. We designed it for and with people with manual dexterity challenges, but we think it would benefit everyone who wants a hands-free experience. Once you've downloaded Voice Access from the Play Store and set it up, you can give your phone simple commands, commands like "open Maps", "search for Stinson Beach", "tap Directions".

We launched a big redesign of Voice Access last year with a huge set of improvements. We rethought the user interface and we added some new capabilities, but today I'd like to talk to you about how we use machine learning to make Voice Access easier and faster.

So let's step back for a second and remind ourselves how accessibility works on Android. When you're writing an app, all the different components of your user interface have some text associated with them. This text is used in different ways: a screen reader for the blind might read it out; Voice Access uses it to provide labels on screen and to understand commands. If you're using a good user interface toolkit, this text and some other important information is automatically set up. But for some elements, like photos and icons, developers need to add the labels themselves. And I'm sad to say that many developers don't add these labels, or use toolkits that don't support accessibility well, which means that important parts of their apps are just invisible to their users. There have been some studies into this; if you Google "inaccessible button disease" you'll find a good one. And sometimes, even when developers put in the work, they focus on screen readers, and text that works there isn't so good for Voice Access.

But what if Voice Access could look at images on screen in the same way a sighted person would, recognize icons, and give them labels? Well, this would have two benefits. Firstly, where an app developer hasn't given an icon a label, Voice Access could add one. But also, it would mean that users could refer to the same icon with the same name consistently across apps. A classic three-dot overflow icon, for instance, might be labeled "menu" by some apps, "overflow" or "options" by others. Now, all these apps are trying to do the right thing by their users, but we shouldn't expect users to learn different names for the same icon in different apps.

So that's exactly what we did. Android R added a new screenshot API for accessibility services to use. Voice Access uses this new API to take a screenshot and passes that screenshot into a machine learning model we call IconNet. IconNet gives precise information about which icons are on screen and where, giving them labels, and then Voice Access takes those labels, plus the ones an app has provided, and uses them. And here's my favorite part: it does all of this locally, without your screen ever leaving your Android device. We think this is a great use for machine learning: to fill in some of the gaps in Android apps for users with accessibility needs, and to do so quickly and privately. Voice Access detected about 30 icons when we launched, we added another 40 in February, and there's more intelligence to come.

So I'll end with a plea to developers: please download Voice Access from the Play Store and use it to test your applications. There are literally millions of people worldwide who have manual dexterity issues. Voice Access gives them full use of their Android device and your apps. Testing your app with Voice Access is good for these people, and it's good for you. You can download it at g.co/voiceaccess. And now my colleague Scott is going to talk to you about Lookout. Scott?

Thanks, Tom. I'm Scott Adams, and I work in research as the product manager on Lookout. I'm here to talk to you about a case study in ML applications for people with visual impairments. Lookout is an app that uses the smartphone's camera to recognize objects and text for users who are blind or low vision. We use some cool technology that's available to you through ML Kit APIs, but there's more to it than connecting a camera to a classifier. Especially in accessibility, it's critical to design with your users and not just for them. There are two questions to ask: one, how well do I understand what the user needs, and two, how can I fit the technology to those needs?

At Google, this is codified in our AI principles, such as "be socially responsible" and "be built and tested for safety". For example, Lookout went through proactive adversarial testing to guard against unfair bias. This kind of sophisticated evaluation is critical, but the path is much easier if you build with your users from the very beginning.

For instance, Lookout uses several different image classifiers. Here's how we used user feedback to tune for precision and recall. As a refresher, imagine we have a mix of apples and oranges and an apple detector. If we have a high-precision apple detector, then the detector is taking no chances: it's only going to say "apple" if it's certain I have an apple in my hand. On the negative side, if I have an apple that's, let's say, oddly shaped or a different color than usual, it's going to say nothing. So we'll have false negatives, where I do have an apple in my hand but the detector is silent.

Contrast that with high recall. In this case, the detector may be so sensitive that anything that is round and about the size of my hand is an apple. So every time I show an apple, it's saying "apple" every time. That's terrific. The downside is I might show it an orange and it's going to say "apple", and that's a false positive, where it's saying "apple" but there is no apple in my hand. Ideally we have both high precision and high recall, but that's often not possible. So how do we tune that? Well, it may come down to the user. If I really prefer an apple to an orange, I may prefer a high-precision detector.

So with that in mind, here's what we learned from our users on Lookout. I'll talk about two cases, one on currency and one on objects. For currency, some types, like US dollars, are the same size, color, and texture regardless of their value, which makes it impossible to distinguish a $1 bill from a $100 bill if I can't see the bill. So our goal is to identify the value of the bill. Now for objects, imagine that I can't see and I'm going into a room that I'm unfamiliar with. I might want to know what's in that room, what kind of furniture, so I could understand what kind of room it is, like a living room versus an office.

So with that in mind, what did users tell us? For currency it was fairly intuitive. They said: listen, never confuse a $1 bill for a $100 bill. Don't guess. If you're not certain, say nothing. So in this case, this is a pretty clear signal for precision. Now, the easy thing to do at first glance is to take that lesson and carry it over to objects, but there's some nuance here. Imagine we have an object and it could be an armchair or it could be a couch. If we're very high precision, we may say neither, because we're not sure. And from users we learned that actually either answer has some value, even if it's imprecise. Whether it's an armchair or it's a couch, I know that this is a pretty big object and this room might be a living room. So we actually want to balance between the two here.

So, taking a big step back: getting this kind of user feedback throughout development is critical. Do it during design, during development, during testing, not after release. It takes longer, but you make a better product. Okay, if you're excited to start writing your own accessibility apps with computer vision, then dive into ML Kit. We have barcode scanning, OCR, and object detection. And if you or someone you know is blind or low vision, please consider Lookout. Next, I'm pleased to introduce my colleague Sagar.

Hi, I'm Sagar. I work on the machine perception team in Google Research. My team's mission is to help machines understand the world like humans do. We work on Live Transcribe and Sound Notifications. Live Transcribe is an app for deaf and hard of hearing people to get captions for real-time conversations in over 80 different languages. Live Transcribe is a great example of using ML technologies together with good UX research to provide meaningful experiences. It uses automatic speech recognition, of course, as well as models to detect speech versus other audio, and also a sound event detection model to provide sound chips to the user.

I want to share the story of how Live Transcribe came about. Meet Dimitri. He's a computer science researcher who has been working on speech recognition for over three decades. Dimitri typically relied on lip reading and a professional captioner for communications. As good as he is with lip reading, we would often struggle to have impromptu conversations around the water cooler. Lip reading alone is only about 50% accurate, and professional captioners are not always present. As ASR technology improved and became more pervasive, Dimitri launched an experiment with the team to create an app for real-time captioning. That Android app uses Google's Cloud Speech-to-Text API to turn that water cooler chat into captions on the screen of his phone. Together with Dimitri's decades of experience in speech technology, the team iterated on the app and improved it to a point where he started using it daily for personal and professional conversations.

Dimitri was excited to try more things in the app. Motivated by his insights as well as enthusiasm, we started looking at what else we could do to make the app better. What if the app could interpret more than just spoken conversation? A few years ago we released a dataset called AudioSet, which allows developers to recognize over 600 different sound event classes in their applications. The easiest way to get started with AudioSet is to start with a pre-trained model from that dataset. For that, you can directly integrate our open source pre-trained model called YAMNet to understand different sound events.

Here is a use case demonstrating how critical this non-speech information can be. Dimitri recalls a story of how one day he was asleep and the smoke alarm was active. He could not hear it, but luckily his neighbor came in and woke him up. As our researchers talked to more potential users, we heard countless stories like this, and this led us to use haptic alerts, vibrations delivered to your smartwatch, to notify users about important sounds in their home. So a few months ago we launched Sound Notifications. It tells you when it hears sounds around your home, like sirens, dogs barking, babies crying, water running, or if somebody knocks on your door. We also open sourced Live Transcribe's Android app engine, so developers can customize it for their own use cases by integrating it within a bigger existing app or porting it to other platforms.

Another such experiment of using ML for accessibility is Project Shuwa. Shuwa, which means "sign language" in Japanese, is a project centered around sign language detection and understanding. Together with the Nippon Foundation and the Chinese University of Hong Kong, we created a web game for people to learn a bit of Japanese and Hong Kong sign languages. And since there are over 150 different sign languages in the world, including American Sign Language, Indian Sign Language, and many, many more, to better help developers create their own gesture and sign language understanding systems, we have open sourced a gesture detection toolkit for developers. You can check out demos of Project Shuwa at the I/O sandbox.

Thank you for watching. Just as Dimitri and our team prototyped Live Transcribe and its extensions with many existing bits and pieces, we welcome developers to experiment with such different machine learning technologies to help the world become more accessible. Please reach out if you'd like to learn more.
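
The apple-detector refresher in the Lookout section can be made concrete with a little arithmetic. The sketch below is not Lookout's code: the scores, the `Prediction` type, and the two thresholds are made-up values chosen only to show how moving a confidence threshold trades precision against recall.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    score: float    # hypothetical detector confidence that the item is an apple
    is_apple: bool  # ground truth


def precision_recall(predictions, threshold):
    """Return (precision, recall) when the detector says "apple" for score >= threshold."""
    tp = sum(1 for p in predictions if p.score >= threshold and p.is_apple)
    fp = sum(1 for p in predictions if p.score >= threshold and not p.is_apple)
    fn = sum(1 for p in predictions if p.score < threshold and p.is_apple)
    # When a denominator is zero, this sketch reports 1.0 (a convention, not a rule).
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up evaluation set: four apples and three oranges.
SAMPLES = [
    Prediction(0.95, True),
    Prediction(0.90, True),
    Prediction(0.60, True),   # oddly shaped apple
    Prediction(0.40, True),   # unusually colored apple
    Prediction(0.70, False),  # orange that looks apple-like
    Prediction(0.30, False),
    Prediction(0.10, False),
]

# High threshold: never wrong when it speaks, but silent on two real apples.
strict = precision_recall(SAMPLES, 0.8)    # (1.0, 0.5)
# Low threshold: finds every apple, but also calls one orange an apple.
lenient = precision_recall(SAMPLES, 0.35)  # (0.8, 1.0)

print("strict:", strict)
print("lenient:", lenient)
```

In the talk's terms, the strict setting matches what users asked for with currency (say nothing rather than guess), while the object case sits somewhere between the two thresholds.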

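The Voice Access section describes combining labels detected by IconNet with the labels an app provides, so that the same icon can be called by the same name across apps. The talk does not describe how that merge is implemented, so the following is only an illustrative sketch: the element ids, the `CANONICAL` synonym table, and the `merge_labels` function are all hypothetical.

```python
# Hypothetical synonym table: different apps' names for one icon map to one spoken label.
CANONICAL = {"overflow": "menu", "options": "menu", "more options": "menu"}


def normalize(label):
    """Lower-case a label and map known synonyms to a single canonical name."""
    key = label.strip().lower()
    return CANONICAL.get(key, key)


def merge_labels(app_labels, detected_icons):
    """Combine app-provided labels with detected icon labels.

    app_labels:     dict of element id -> label, or None if the developer set none
    detected_icons: dict of element id -> label produced by an icon detector
    Returns a dict of element id -> sorted list of names a user could say.
    """
    merged = {}
    for element_id in sorted(set(app_labels) | set(detected_icons)):
        names = set()
        app_label = app_labels.get(element_id)
        if app_label:
            names.add(normalize(app_label))
        icon_label = detected_icons.get(element_id)
        if icon_label:
            names.add(normalize(icon_label))
        merged[element_id] = sorted(names)
    return merged


# Made-up screen: one labeled button, one inconsistently named icon, one unlabeled icon.
app_labels = {"btn_send": "Send", "btn_more": "Overflow", "btn_cart": None}
detected = {"btn_more": "menu", "btn_cart": "shopping cart"}

result = merge_labels(app_labels, detected)
print(result)
```

Here the unlabeled cart icon gains a name from detection, and the app's "Overflow" label collapses to the same "menu" name a user would say in any other app.
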
Original Description

Explore three case studies covering Voice Access, Lookout, and Live Transcribe along with Sound Notifications. We also look at the intersection between Google’s machine learning research and Accessibility with takeaways for all Android developers.

Resources:
Voice Access → https://g.co/voiceaccess
Lookout → https://goo.gle/lookout
Google Developers ML Kit → https://goo.gle/3xyC3F6

Speakers: Tom Hume, Scott Adams, Sagar Savla

Watch more:
Google Developers at Google I/O 2021 Playlist → https://goo.gle/io21-GoogleDevelopers
All Google I/O 2021 Technical Sessions → https://goo.gle/io21-technicalsessions
All Google I/O 2021 Sessions → https://goo.gle/io21-allsessions
Subscribe to Google Developers → https://goo.gle/developers

#GoogleIO #Accessibility #ML/AI

product: Cloud - AI and Machine Learning - AI Platform; event: Google I/O 2021; fullname: Tom Hume, Scott Adams, Sagar Savla; re_ty: Premiere;
