Machine learning for Accessibility | Session

Google for Developers · Advanced ·📐 ML Fundamentals ·5y ago

Key Takeaways

Explores three case studies on machine learning for Accessibility, including Voice Access, Lookout, and Live Transcribe

Full Transcript

[Music]

Hi everyone, my name is Tom Hume. I'm a product manager in Google Research. I'm here today with my colleagues Scott and Sagar to talk about the intersection of machine learning and accessibility.

Around the world, one billion people currently experience some form of disability, which means that people with disabilities are using your app right now, and in larger numbers than you might expect. Thanks to advances in machine learning, today we can help people in ways that just weren't possible a few years ago. So today we'll show you a few examples of this help, tell you how we're thinking about accessibility and machine learning, and give you a few hints for how you might apply it in your own apps.

As it happens, I'm the product manager for Voice Access, so let's start there. What's Voice Access? It's an Android app that lets you control your phone using your voice. We designed it for and with people with manual dexterity challenges, but we think it would benefit everyone who wants a hands-free experience. Once you've downloaded Voice Access from the Play Store and set it up, you can give your phone simple commands, commands like "open Maps", "search for Stinson Beach", "tap Directions".

We launched a big redesign of Voice Access last year with a huge set of improvements. We rethought the user interface and we added some new capabilities, but today I'd like to talk to you about how we use machine learning to make Voice Access easier and faster.

So let's step back for a second and remind ourselves how accessibility works on Android. When you're writing an app, all the different components of your user interface have some text associated with them. This text is used in different ways: a screen reader for the blind might read it out, and Voice Access uses it to provide labels on screen and to understand commands. If you're using a good user interface toolkit, this text and some other important information is automatically set up. But for some elements, like photos and icons, developers need to add the labels themselves. And I'm sad to say that many developers don't add these labels, or use toolkits that don't support accessibility well, which means that important parts of their apps are just invisible to their users. There have been some studies into this. If you Google "inaccessible button disease", you'll find a good one. And sometimes, even when developers put in the work, they focus on screen readers, and text that works there isn't so good for Voice Access.

But what if Voice Access could look at images on screen in the same way a sighted person would, recognize icons, and give them labels? Well, this would have two benefits. Firstly, where an app developer hasn't given an icon a label, Voice Access could add one. But also, it would mean that users could refer to the same icon with the same name consistently across apps. A classic three-dot overflow icon, for instance, might be labeled "menu" by some apps and "overflow" or "options" by others. Now, all these apps are trying to do the right thing by their users, but we shouldn't expect users to learn different names for the same icon in different apps.

So that's exactly what we did. Android R added a new screenshot API for accessibility services to use. Voice Access uses this new API to take a screenshot and passes that screenshot into a machine learning model we call IconNet. IconNet gives precise information about which icons are on screen and where, giving them labels, and then Voice Access takes those labels, plus the ones an app has provided, and uses them. And here's my favorite part: it does all of this locally, without your screen ever leaving your Android device. We think this is a great use for machine learning: to fill in some of the gaps in Android apps for users with accessibility needs, and to do so quickly and privately. Voice Access detected about 30 icons when we launched, we added another 40 in February, and there's more intelligence to come.

So I'll end with a plea to developers: please download Voice Access from the Play Store and use it to test your applications. There are literally millions of people worldwide who have manual dexterity issues. Voice Access gives them full use of their Android device and your apps. Testing your app with Voice Access is good for these people, and it's good for you. You can download it at g.co/voiceaccess. And now my colleague Scott is going to talk to you about Lookout. Scott?

Thanks, Tom. I'm Scott Adams, and I work in research as the product manager on Lookout. I'm here to talk to you about a case study in ML applications for people with visual impairments. Lookout is an app that uses the smartphone's camera to recognize objects and text for users who are blind or low vision. We use some cool technology that's available to you through ML Kit APIs, but there's more to it than connecting a camera to a classifier.

Especially in accessibility, it's critical to design with your users and not just for them. There are two questions to ask. One, how well do I understand what the user needs? And two, how can I fit the technology to those needs? At Google this is codified in our AI Principles, such as "be socially responsible" and "be built and tested for safety". For example, Lookout went through proactive adversarial testing to guard against unfair bias. This kind of sophisticated evaluation is critical, but the path is much easier if you build with your users from the very beginning.

For instance, Lookout uses several different image classifiers. Here's how we used user feedback to tune for precision and recall. As a refresher, imagine we have a mix of apples and oranges and an apple detector. If we have a high-precision apple detector, then the detector is taking no chances. It's only going to say "apple" if it's certain I have an apple in my hand. On the negative side, if I have an apple that's, let's say, oddly shaped or a different color than usual, it's going to say nothing. So we'll have false negatives, where I do have an apple in my hand but the detector is silent.

Contrast that with high recall. In this case the detector may be so sensitive that anything that is round and about the size of my hand is an apple. So every time I show an apple, it's saying "apple" every time. That's terrific. The downside is I might show it an orange and it's going to say "apple", and that's a false positive, where it's saying "apple" but there is no apple in my hand. Ideally we have both high precision and high recall, but that's often not possible. So how do we tune that? Well, it may come down to the user. If I really prefer an apple to an orange, I may prefer a high-precision detector.

So with that in mind, here's what we learned from our users on Lookout. I'll talk about two cases, one on currency and one on objects. For currency, some types, like US dollars, are the same size, color, and texture regardless of their value, which makes it impossible to distinguish a $1 bill from a $100 bill if I can't see the bill. So our goal is to identify the value of the bill. Now for objects, imagine that I can't see and I'm going into a room that I'm unfamiliar with. I might want to know what's in that room, what kind of furniture, so I could understand what kind of room it is, like a living room versus an office.

So with that in mind, what did users tell us? For currency it was fairly intuitive. They said: listen, never confuse a $1 bill for a $100 bill. Don't guess. If you're not certain, say nothing. So in this case, this is a pretty clear signal for precision.

Now, the easy thing to do at first glance is to take that lesson and carry it over to objects, but there's some nuance here. Imagine we have an object, and it could be an armchair or it could be a couch. If we're very high precision, we may say neither, because we're not sure. And from users we learned that actually either answer has some value, even if it's imprecise. Whether it's an armchair or a couch, I know that this is a pretty big object and this room might be a living room. So we actually want to balance between the two here.

So, taking a big step back: getting this kind of user feedback throughout development is critical. Do it during design, during development, during testing, not after release. It takes longer, but you make a better product.

Okay, if you're excited to start writing your own accessibility apps with computer vision, then dive into ML Kit. We have barcode scanning, OCR, and object detection. And if you or someone you know is blind or low vision, please consider Lookout. Next, I'm pleased to introduce my colleague Sagar.

Hi, I'm Sagar. I work on the Machine Perception team in Google Research. My team's mission is to help machines understand the world like humans do. We work on Live Transcribe and Sound Notifications. Live Transcribe is an app for deaf and hard of hearing people to get captions for real-time conversations in over 80 different languages. Live Transcribe is a great example of using ML technologies together with good UX research to provide meaningful experiences. It uses automatic speech recognition, of course, as well as models to detect speech versus other audio, and also a sound event detection model to provide sound chips to the user.

I want to share the story of how Live Transcribe came about. Meet Dimitri. He's a computer science researcher who has been working on speech recognition for over three decades. Dimitri typically relied on lip reading and a professional captioner for communications. As good as he is with lip reading, we would often struggle to have impromptu conversations around the water cooler. Lip reading alone is only about 50% accurate, and professional captioners are not always present.

As ASR technology improved and became more pervasive, Dimitri launched an experiment with the team to create an app for real-time captioning. That Android app uses Google's Cloud Speech-to-Text API to turn that water cooler chat into captions on the screen of his phone. Together with Dimitri's decades of experience in speech technology, the team iterated on the app and improved it to a point where he started using it daily for personal and professional conversations.

Dimitri was excited to try more things in the app. Motivated by his insights as well as his enthusiasm, we started looking at what else we could do to make the app better. What if the app could interpret more than just spoken conversation? A few years ago we released a dataset called AudioSet, which allows developers to recognize over 600 different sound event classes in their applications. The easiest way to get started with AudioSet is to start with a pre-trained model from that dataset. For that, you can directly integrate our open source pre-trained model, called YAMNet, to understand different sound events.

Here is a use case demonstrating how critical this non-speech information can be. Dimitri recalls a story of how one day he was asleep and the smoke alarm was active. He could not hear it, but luckily his neighbor came in and woke him up. As our researchers talked to more potential users, we heard countless stories like this, and this led us to use haptic alerts, vibrations delivered to your smartwatch, to notify users about important sounds in their home. So a few months ago we launched Sound Notifications. It tells you when it hears sounds around your home, like sirens, dogs barking, babies crying, water running, or if somebody knocks on your door. We also open sourced Live Transcribe's Android app engine, so developers can customize it for their own use cases by integrating it within a bigger existing app or porting it to other platforms.

Another such experiment in using ML for accessibility is Project Shuwa. Shuwa, which means "sign language" in Japanese, is a project centered around sign language detection and understanding. Together with the Nippon Foundation and the Chinese University of Hong Kong, we created a web game for people to learn a bit of Japanese and Hong Kong sign languages. And since there are over 150 different sign languages in the world, including American Sign Language, Indian Sign Language, and many, many more, to better help developers create their own gesture and sign language understanding systems, we have open sourced a gesture detection toolkit for developers. You can check out demos of Project Shuwa at the I/O sandbox.

Thank you for watching. Just as Dimitri and our team prototyped Live Transcribe and its extensions with many existing bits and pieces, we welcome developers to experiment with such different machine learning technologies to help the world become more accessible. Please reach out if you'd like to learn more.

[Music]
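
Scott's apples-and-oranges explanation of precision and recall can be made concrete with a small worked example. The sketch below is illustrative only and is not Lookout's code: the scores, the labels, and the helper names `precision_recall` and `pick_threshold` are all hypothetical. It assumes a detector that outputs a confidence score per image and announces "apple" only when the score reaches a threshold, and it shows how raising that threshold trades recall for precision.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) if "apple" is announced when score >= threshold.

    labels[i] is True when image i really contains an apple.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)      # said apple, was apple
    fp = sum(1 for s, y in pairs if s >= threshold and not y)  # said apple, was not
    fn = sum(1 for s, y in pairs if s < threshold and y)       # stayed silent on an apple
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pick_threshold(scores, labels, min_precision):
    """Return the lowest observed score that meets min_precision when used as
    the threshold, or None if no threshold does.

    Recall never increases as the threshold rises, so the lowest qualifying
    threshold is also the one with the highest recall.
    """
    for threshold in sorted(set(scores)):
        precision, _ = precision_recall(scores, labels, threshold)
        if precision >= min_precision:
            return threshold
    return None


# Hypothetical detector scores and ground truth (True = really an apple).
SCORES = [0.95, 0.90, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
LABELS = [True, True, True, False, True, False, False, False]

if __name__ == "__main__":
    for t in (0.9, 0.5):
        p, r = precision_recall(SCORES, LABELS, t)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
    # Currency-style requirement: never announce a wrong answer on this data.
    print("strict threshold:", pick_threshold(SCORES, LABELS, 1.0))
    # Object-style requirement: an occasional near miss is acceptable.
    print("relaxed threshold:", pick_threshold(SCORES, LABELS, 0.8))
```

In this toy data, a strict precision requirement (the currency case) pushes the threshold up and leaves some real apples unannounced, while a relaxed requirement (the armchair-or-couch case) lets the detector speak more often at the cost of an occasional wrong answer. Which setting is right is a question for user feedback, as the talk describes, not something the numbers decide on their own.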

Original Description

Explore three case studies covering Voice Access, Lookout, and Live Transcribe along with Sound Notifications. We also look at the intersection between Google’s machine learning research and Accessibility with takeaways for all Android developers.

Resources:
Voice Access → https://g.co/voiceaccess
Lookout → https://goo.gle/lookout
Google Developers ML Kit → https://goo.gle/3xyC3F6

Speakers: Tom Hume, Scott Adams, Sagar Savla

Event: Google I/O 2021

Watch more:
Google Developers at Google I/O 2021 Playlist → https://goo.gle/io21-GoogleDevelopers
All Google I/O 2021 Technical Sessions → https://goo.gle/io21-technicalsessions
All Google I/O 2021 Sessions → https://goo.gle/io21-allsessions
Subscribe to Google Developers → https://goo.gle/developers

#GoogleIO #Accessibility #ML/AI