PyTorch at Dolby Labs - Vivek Kumar, Dolby Labs

PyTorch · Beginner ·🧬 Deep Learning ·6y ago

Key Takeaways

Dolby Labs uses PyTorch for deep learning in audio processing, addressing challenges such as high dimensionality and multi-scale temporal dependency, and achieving breakthroughs in speech coding and voice conversion. The solutions involve spectrogram-based representations, audio-specific networks, and models like WaveRNN and SampleRNN.

Full Transcript

[Music] Hi, my name is Vivek, and I lead the AI team at Dolby. In the last ten years we have had great success in applying deep learning to audio, but I'm more excited about the fact that this is just the beginning of what's possible. I plan to go over some of the challenges of using deep learning for audio, recent breakthroughs, and some applications we are working on. I hope that after the presentation you'll find yourself curious about audio AI.

Before I go into the rest of my presentation, I wanted to acknowledge how helpful PyTorch has been in our journey. My team loves it because it's easy to use, dynamic graphs make it easy to iterate over architectures, and the support is excellent. In the rare case we found a bug, the patch we provided was merged in days. More personally, my framework of choice used to be Torch, but when PyTorch came along I ended up learning Python just so that I could use PyTorch. That's me. Really glad I did that. I also wanted to give a quick shout-out to the SpeechBrain project, which Dolby is sponsoring. The team at Mila is developing a toolkit which would simplify doing speech research on top of PyTorch. Check out their website for details.

Here's a brief history of Dolby's innovation in the audio space. For over 50 years we have created solutions which enhance the audio experience, starting with noise reduction in the 60s, to creating technologies like Dolby Digital Plus and Dolby Atmos, which are now standard for high-quality audio. There are over 11 billion devices with Dolby audio. Anytime you're listening to high-quality audio, you're likely using Dolby's technology. And in the last few years, as deep learning is fundamentally changing how audio processing is done, we are combining our audio expertise to create new state-of-the-art technologies.

Talking about challenges: a significant strength of deep learning is that it works with raw samples, without any handcrafted features. But this gets very challenging with audio. The first difficulty is dimensions. Consider a 64 by 64 pixel image. It contains a lot of information: you can identify the celebrity, guess their age and their ethnicity. But the equivalent number of bytes of uncompressed audio is just enough for one word. Secondly, audio has structure at multiple time scales, ranging from milliseconds to minutes. Each sample of audio is dependent on the sample preceding it, but on a larger time scale it is also dependent on the note being played or the phoneme being spoken. Modeling all these temporal dependencies becomes challenging. Thirdly, perception. Which of these sound different? In audio, perception matters a lot. Even though these waveforms look very different, they sound exactly the same. In most deep learning applications, L1 or L2 losses are usually good enough, but they're very brittle when it comes to audio. Things like phase shifts, alignment errors, or clock drift make these measures completely break down.

So to deal with these challenges, there are two basic approaches. One is to use a spectrogram-based representation, so that audio is transformed into an image-like representation and we can use image-inspired networks. The other option is to use networks designed specifically for audio, which is what I'm going to focus on in the next few slides.

Three years ago there was a breakthrough in speech generation, or audio generation. Two autoregressive models were developed which generated audio on a sample-by-sample basis. These models use slightly different approaches: WaveNet used dilated convolutions, whereas SampleRNN from Mila used a multi-rate RNN. But what's important here is that both these architectures were designed specifically for audio and handled the high dimensionality and the multi-level temporal dependency of audio. More recently we have had models like WaveRNN and WaveGlow, which also generate audio on a sample-by-sample basis. All these models were able to achieve a naturalness which was significantly better than all the prior approaches.

In fact, these approaches were so powerful they led to a breakthrough in speech coding, and by speech coding I mean speech compression. In the last two years both Google and Dolby have published works that drastically improve speech coding. While Google's focus has been on low bitrate, our focus has been on high-quality audio.

Describing what we do in audio coding is always challenging, so I'm borrowing an analogy that our partners at Netflix used to describe video coding. This is Marie Kondo, the author of The Life-Changing Magic of Tidying Up. She has a decluttering show on Netflix, and the approach she uses for decluttering is to pick up each item and discard everything which does not give joy. And after you have discarded most of your possessions, she has a great method of folding everything into squares so that they can be efficiently packed. We do something similar in speech coding. We analyze the speech to identify what is essential, discarding everything else. Then we pack these bits in the way which is the most efficient, and on the decoder side we unpack the bits and reconstruct the speech. This way of encoding and decoding has been used for decades, but at really low bitrates, when we have discarded a lot of information, it's hard to synthesize speech which is high quality. But now, with deep learning, we have powerful generative models which can generate high-quality speech which is natural sounding, giving your joy back.

Getting a bit deeper: the first layer of SampleRNN is an MLP, which is then connected to a stack of GRU RNNs running at different time resolutions. The lowest layer is running at a sample resolution, whereas the topmost layer is running at a 10 millisecond, or 160 sample, resolution. The idea is that these RNNs focus on different levels of abstraction, phoneme identity at the top and fine details at the bottom, and this is the way it is able to manage the multi-level temporal dependency of audio. Without conditioning, SampleRNN babbles, producing sounds which vaguely sound like speech but do not make any sense.
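
As a rough illustration of the multi-rate idea described above (not code from the talk), the sketch below shows only the clocking of such a stack: which tiers would refresh their state at a given sample index. The 1-sample and 160-sample resolutions come from the talk, and the 16 kHz sample rate follows from 160 samples spanning 10 ms. The middle tier of 8 samples and every function name are hypothetical, chosen purely for illustration; the actual RNN computation is omitted.

```python
# Clock schedule for a multi-rate stack, slowest tier first.
# Only the 160-sample (10 ms) and 1-sample resolutions are from the talk;
# the middle tier of 8 samples is a made-up value for illustration.
SAMPLE_RATE_HZ = 16_000          # implied by "10 ms or 160 samples"
TIER_FRAME_SIZES = [160, 8, 1]   # samples per update, top to bottom


def frame_duration_ms(frame_size, sample_rate=SAMPLE_RATE_HZ):
    """Duration covered by one update of a tier, in milliseconds."""
    return 1000.0 * frame_size / sample_rate


def updates_per_second(frame_size, sample_rate=SAMPLE_RATE_HZ):
    """How many times per second a tier refreshes its state."""
    return sample_rate // frame_size


def tiers_updating_at(sample_index, frame_sizes=TIER_FRAME_SIZES):
    """Indices of the tiers that refresh their state at this sample."""
    return [i for i, size in enumerate(frame_sizes) if sample_index % size == 0]


for size in TIER_FRAME_SIZES:
    print(size, "samples:", frame_duration_ms(size), "ms,",
          updates_per_second(size), "updates/s")
```

Under these assumptions the top tier updates 100 times per second while the bottom tier updates 16,000 times per second, which is one way to see why the slow tier can track phoneme-level context while the fast tier handles fine detail.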
To control SampleRNN, we condition it using quantized vocoder parameters from the bitstream. The bitstream is generated using an internal vocoder which is able to capture the essence of speech at a really low bitrate. If you're interested in learning more, we have a poster. Please check it out.

Here are the listening test results. AMR Wideband is the current codec which is being used in our cell phones. SILK is the current state-of-the-art codec, which at a lower bitrate is able to generate quality better than AMR Wideband. With our solution, SampleRNN, even at 6.4 kilobits per second we were able to achieve quality which was comparable to or better than SILK at 16 kilobits per second. Just to give you an idea of how significant this is: the last breakthrough which happened in speech coding was over 30 years ago, when CELP came out. CELP reduced the bitrate by approximately 20 to 30 percent. This is a 2.5 times improvement. This work, and similar work done by Google, is the biggest step function speech coding has ever seen.

Now let's talk about a completely different application: voice conversion. Voice conversion is a technique where we can make somebody's speech sound like that of a target speaker. The way we achieved it was by using an architecture similar to audio coding, but instead of conditioning it on codec parameters we conditioned it on content and target speaker embeddings. These target speaker embeddings end up learning the style of the target speaker, like how they pronounce their phonemes, their fundamental frequency, their accent. Our quality was much better than conventional voice conversion techniques, and the results were published at Interspeech 2018.

Let me show you a quick demo. The first audio is the source speaker, from which we derive the content. The next is the target, whose style we are trying to emulate. And finally there is the synthesized speech, which should sound like the target speaker.

So, the input speech: "Those who hold the property think so too, and so far it is fortunate." The target: "His flatteries delude, and his professions of affection gratify you." The synthesized speech: "Those who hold the property think so too, and so far it is fortunate."

Amazing, isn't it? We are very excited about the potential here. So hopefully this has provided some context on using deep learning for audio, some challenges, and some recent developments. And thank you, PyTorch, for being awesome partners along the way. Hopefully I have inspired some of you to be more curious and excited about the work happening in this area. Personally, I'm really excited by the progress the community has made, but I'm more amazed by the fact that this is just the beginning, and it's up to us to define where this technology takes us. Thank you. If you are interested in learning more, I will be hanging out next to our poster. Most of my team will be there as well. Also, feel free to contact me on Twitter. [Music] [Applause]
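
The talk's point that L1 or L2 losses on raw waveforms break down under phase shifts can be illustrated with a small sketch (an illustration under stated assumptions, not the speaker's code). It compares a sine tone with a polarity-inverted copy, which is a 180-degree phase shift and is generally perceived as the same sound when heard in isolation. The 400 Hz tone, the 160-sample length, and all function names are assumptions chosen for the example, and a plain DFT magnitude stands in for a spectrogram frame.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000   # assumed; matches 160 samples = 10 ms
NUM_SAMPLES = 160         # one 10 ms frame, exactly 4 periods of 400 Hz


def sine(freq_hz, num_samples, sample_rate, phase=0.0):
    """Sampled sine tone with an optional phase offset in radians."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(num_samples)]


def l1_loss(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def magnitude_spectrum(signal):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(N^2) DFT)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


reference = sine(400.0, NUM_SAMPLES, SAMPLE_RATE_HZ)
inverted = sine(400.0, NUM_SAMPLES, SAMPLE_RATE_HZ, phase=math.pi)

# Large: the waveforms are sample-by-sample opposites.
waveform_l1 = l1_loss(reference, inverted)
# Near zero: the magnitude spectra agree up to floating-point error.
spectral_l1 = l1_loss(magnitude_spectrum(reference),
                      magnitude_spectrum(inverted))

print("waveform L1:", waveform_l1)
print("magnitude-spectrum L1:", spectral_l1)
```

The waveform loss is large even though nothing audible changed, while the magnitude-domain loss is essentially zero. Discarding phase is only one possible remedy and has its own costs, so this should be read as a demonstration of the brittleness rather than a recommended training loss.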

Original Description

Hear how Dolby Labs is using PyTorch to develop deep learning for audio, and learn about the challenges that audio AI presents and the breakthroughs and applications they’ve built at Dolby to push the field forward.

This video discusses Dolby Labs' use of PyTorch for deep learning in audio processing, including challenges, solutions, and breakthroughs in speech coding and voice conversion. Viewers can learn about the applications of deep learning in audio and the techniques used to overcome these challenges. The video also covers spectrogram-based representations, audio-specific networks, and models like WaveRNN and SampleRNN.

Key Takeaways
  1. Understand the challenges of audio processing
  2. Learn about spectrogram-based representation and audio-specific networks
  3. Study the WaveRNN and SampleRNN models
  4. Explore applications of deep learning in audio
  5. Implement deep learning pipelines for audio processing
💡 Deep learning can be used to improve audio processing by addressing challenges such as high dimensionality and temporal dependency, and achieving breakthroughs in speech coding and voice conversion.
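
To make the dimensionality challenge concrete, the talk's comparison between a 64 by 64 pixel image and the same number of bytes of uncompressed audio can be worked out as below. This is a back-of-the-envelope sketch: the 8-bit RGB image format and the 16-bit mono PCM audio at 16 kHz are assumptions (the talk does not state them), and the function names are hypothetical.

```python
def image_bytes(width, height, channels=3, bytes_per_channel=1):
    """Uncompressed image size in bytes (assumes 8-bit RGB by default)."""
    return width * height * channels * bytes_per_channel


def audio_seconds(num_bytes, sample_rate_hz=16_000, bytes_per_sample=2,
                  channels=1):
    """Duration of uncompressed PCM audio that fits in num_bytes
    (assumes 16-bit mono at 16 kHz by default)."""
    return num_bytes / (sample_rate_hz * bytes_per_sample * channels)


byte_budget = image_bytes(64, 64)        # 64 * 64 * 3 bytes
duration_s = audio_seconds(byte_budget)  # seconds of audio in that budget

print(byte_budget, "bytes of image ~", duration_s, "s of audio")
```

Under these assumptions the image's byte budget holds well under half a second of audio, which is consistent with the talk's remark that it is "just enough for one word".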
