Project Common Voice

Data Skeptic · Intermediate · 📄 Research Papers Explained · 8y ago

Key Takeaways

The video discusses Project Common Voice, Mozilla's open-source initiative to crowdsource voice recordings, and its potential to democratize speech recognition technology. The project aims to provide a large open dataset of audio samples and corresponding text, allowing researchers and developers to build and improve speech recognition models.

Full Transcript

Springboard: the aptly named Springboard might be the company that helps you take a big jump in your career. They're not just an online learning system where you get access to a curriculum and some quizzes and examples and things like that. One of the things that's special about Springboard is that you're assigned a one-on-one mentor who is going to meet just with you on a weekly basis. How often have you struggled to understand something, only to have a friend tell you in a very simple way how to do it? Take a look at all the mentors available on the Springboard website. There are people from all sorts of different backgrounds to connect with, so you can find the mentor who's right for you. They've got financing available, and they make a guarantee on getting you a job; check out their site for the specifics on that. Springboard's really serious about helping people move ahead. Their placement rates have been sky-high so far, and in addition to your mentor, you'll have career coaches helping you figure out how to navigate the job market and find a place that's right for you.

I'm going to give you a Bitly link, and I'm only going to say it once, so grab a pen or be quick with your thumbs to type this into your phone. Pause if you need to. Head over to bit.ly/datasciencelearn, all one word, all lowercase: datasciencelearn. People finish the program by doing a capstone. We've got a special offer here: for the first two people who make it through to their capstone, I'm going to offer a personal hour of my time to sit with you. Have you present your capstone to me if you'd like; I'll give you feedback and any advice. We can spend the hour however you'd like, but I'd love to see what your capstones turn out to be. Springboard alums have gone on to work at Uber, LinkedIn, Facebook, Amazon, and many other places. Don't forget that job guarantee I mentioned. To learn more about Springboard, visit that Bitly link I mentioned, which will be in the show notes.

Data Skeptic is the official podcast of dataskeptic.com,
bringing you stories, interviews, and mini-episodes on topics in data science, machine learning, statistics, and artificial intelligence. [Music]

So, Linh Da, today on Data Skeptic we've got one of the people involved in something called Project Common Voice. They're building out a system that's going to make available lots of different audio samples of people talking, and also the text of what they said. That'll help researchers build better speech recognition tools. Now I want to ask you something: when you were a kid, was there any speech recognition in any of your toys or devices or anything?

No.

So what changed within your lifetime, and when did it change?

Well, I imagine the algorithms for understanding our speech have gotten better. I actually think back in my day there was something where you could speak into a microphone and try to have it type, but we didn't use it because we had heard it had gotten bad reviews. And I was pretty good at typing. We had Mavis Beacon.

Oh yeah, you did Mavis Beacon.

So I'm a very adept typer. No bug splats on the windshield.

What does that mean?

Well, there's a game where a car is driving, and every time you have a typo it splatters a bug on the windshield.

I'm sorry, you took a lot for granted, assuming we would know that. But okay.

Yeah, and if you type faster, it's like a car driving and your speed goes up.

That's cool.

Yes, it's like you're racing. If you type faster... it's like a game. That was the best part of Mavis Beacon; everything else was not as fun.

So, speech recognition, though. Obviously everyone's still going to need to type; that's going to be a common input. But voice as an input is becoming absolutely prolific, right? And I see you doing it with your phone quite a bit.

Yeah, sometimes it freezes, though.

Yeah, well, that's a longer discussion about why that is. The point is, you sort of said it correctly. You were like, "Oh, I guess the algorithms have gotten better." That's essentially true. The algorithms have sort of
gotten better, but more than that, the people who are successful with this stuff have massive amounts of training data and the computational power to build these deep learning models that are capable of recognizing speech.

So is that why it's gotten better, because they have more data?

Well, I would say there are like 20 good reasons why it's gotten better. More data and the capacity to process more data are two of the big ones.

Oh, capacity, okay.

Can you build a model on 10,000 hours of speech and text? That was unheard of in the '90s, but today it's practical, although I must say it's something that only large companies can really do right now.

Yes.

But things like Project Common Voice are turning that around, so that everyone can do some speech recognition stuff.

Where is Common Voice based?

They're part of Mozilla.

Oh yeah.

So here's something that's interesting to me: the speech recognition you see in the market today is all proprietary. Now, you can get it through APIs; people will expose it. But you can't have their actual algorithm, nor can you improve it. So for certain languages we might be running behind. It works really well in English, I can testify to that. And I bet it works well in Chinese, because Baidu is one of the big companies that made a lot of progress in speech recognition.

You mean Mandarin?

Yeah. What did I say?

You said Chinese.

Oh yeah, Mandarin. Well, what's the other language that a lot of Chinese people speak?

Cantonese.

Oh yeah, there's Cantonese, Mandarin, and I don't know the others. Probably more. So I would bet those probably work well, but something like Tagalog, or I don't know, some more exotic language, I bet the speech recognition systems don't do well. Do you know anything about tonal languages?

Well, Vietnamese is tonal. I can't really speak it well enough to even try using it.

So what I like about Project Common Voice is that it kind of democratizes this. It makes that data set available to lots of researchers, so if maybe you
were someone who wanted to build a good technology that would interpret Vietnamese as a language, and maybe the commercial offerings are running behind, you have a decent opportunity to work on that, given both the models they've released and this code base that they're building up. Now, the code base is in lots of languages; maybe it doesn't have a lot of Vietnamese in it, but you could basically train the initial model and then maybe do some transfer learning. So regardless, I think this is a great tool that I'm glad to see is becoming open source. Thanks to things like this, in a very short time we'll just consider speech recognition to be like a utility. It'll just be available for free in software development.

Amazing.

Yeah, it really is. It's a very transformative thing that's happening in our lifetime. It's finally, slowly happening here. So with that, let's get into the interview and learn a little bit more about what Project Common Voice is doing.

Let's hear it.

Andre Natal is a software engineer at Mozilla with over 17 years of experience in the development, architecture, and management of software projects. He has extensive experience in software engineering, having worked in various industries such as internet, legal, government, media, Wall Street, microelectronics, and mobile. Andre is passionate about speech and audio technologies and has deployed a large number of award-winning applications across different platforms, including mobile, IVR, and IoT. He is particularly interested in building conversational agents, speech-enabled devices, and voice-controlled VR and AR experiences. Andre holds a bachelor's degree in information systems from the Universidade Anhembi Morumbi in São Paulo, Brazil.

Andre, welcome to Data Skeptic. To begin with, I'd love to know a little bit about the projects you've been working on.

Right now, as you guys know, Mozilla is a non-profit, open-source company. We develop a bunch of things besides Firefox, and right now I'm part of the
speech recognition team. One of the projects that I'm working on is Voice Fill. It is an experiment to speech-enable search engines on Firefox. And also Common Voice, an initiative that asks the community to donate speech recordings, so we can build our own speech recognition models and also give that data back to the community of researchers, so they can build their own speech models, et cetera. Those are the current ones that we're working on. But in the future we also have plans to ship a speech API, for Mozilla to have its own speech API, both as a Web Speech API in Firefox and also as a separate Node module that you could use in your own project. So those are our current speech initiatives inside Mozilla. We also have something called Deep Speech. It is a research project in which we are building our own speech decoder using TensorFlow. We also have a Web of Things gateway; it's going to be an IoT platform that is going to be speech-enabled as well. So yeah, those are our ongoing speech initiatives. Quite a lot.

Yeah, definitely very exciting too, and I think cutting-edge. Voice has been becoming ubiquitous in the last few years, and I see a lot of success in this space. That was one of the things that attracted me to Project Common Voice, because there are some challenges if maybe I wanted to take on my own speech project; I'm not a big company. Before Common Voice, was there any way an independent researcher could really get involved in any sort of interesting machine learning on speech?

Well, there is a project called VoxForge, not sure if you've heard about it. They tried to do the same and got some traction. As I said, for an independent engineer it's hard to get data, right? So that's our goal: not only the data collected from Common Voice, but also the models that we build are going to be, you
know, openly available. So not only the data itself: we're going to train on the data ourselves and ship the models. We also faced that same situation, where it's always hard and you need to rely on certain companies. For independent developers it's super hard: you need to either license it from Nuance or from some other third-party company, or you need to build it yourself. So we are here to make that better, to help researchers and applications spread. That's our ultimate goal.

Well, that's a very interesting idea, all of it actually, but especially shipping the model. So I, as a data scientist, might not need to train my own model. I could start with yours, and maybe that's sufficient for me, or maybe I use it as the basis for transfer learning. What do you mean when you say shipping the model? Am I getting your TensorFlow code and objects?

That's exactly the goal. For example, for Deep Speech specifically, the code is completely, 100 percent on GitHub. But in the future we're also going to train those models and ship them; we're going to make them available, so you can just download our model and do, as you said, your transfer learning and improve the models. We're going to share the checkpoints, and you just use them from there. In the case of Voice Fill, we are using Kaldi. Kaldi is a speech toolkit that also uses deep neural nets. We started from a pre-trained model, and we are going to enhance that model and also make the models available for download.

That's unique in the data science world, especially for speech.

Yeah, that's our goal: to make it better and contribute back to the community. At Mozilla we are 100 percent open source. We are made from the public, for the public. So that's our goal: everything we build here is going to be openly available. Everything open.

That's really exciting. I think it's going to
open up a lot of opportunities for voice, and applications I can't even predict yet. I'm curious to hear more about how you get the corpus that is Project Common Voice. How do you get all the recordings in and make them available?

The data itself, the transcriptions, the corpus of text that users speak, is inside our repo. Maybe you could put the link to the repo in the show notes; it's github.com/mozilla/voice-web. That's the name for Common Voice. I know it can sound weird, but that was our name before we named it Common Voice, so voice-web is actually the name of the repo. The corpus is all there. If you want to contribute text, send a pull request with the corpus of text, and then we merge it, and that's it; that makes it available to the users. The recorded data is basically users just saying the sentences that we ask them to say, and then we store it on our servers. At some point toward the end of this year we're going to make all the data collected available to download for the community. That's the ultimate goal, and also to add other languages as well. Right now we are using only English, but for sure we're going to support Brazilian Portuguese, Spanish, French, German. That's the ultimate goal. That's one of the problems with speech: it's possible to get a corpus of speech in English already, you can get LibriSpeech or some others, but for other languages it's really hard. So that's the goal.

Something that's really novel and interesting to me: I didn't even realize I could send you a pull request. If maybe I had sentences that I thought were somehow phonetically interesting and not in the corpus, you might accept my pull request, and then someone would go out and say the thing I asked them to say, and that would enhance the data set.

Exactly like that. People can contribute those different sentences. That's the goal.

Oh, that's
outstanding, and kind of an opportunity for, like, a linguist to maybe make some contribution.

That's it.

If I wanted to contribute just my voice, how would I go about doing that?

It is voice.mozilla.org. Just go there; it runs right in the browser.

So someone who maybe doesn't even know how to code really well, but would be willing to read some sentences, could be an official open-source contributor?

Exactly. You don't need to be a coder at all. Just browse to voice.mozilla.org and start reading. You don't even need to read sentences: if you just want to listen and validate, you can do just that. We have both: the contributing platform, so people can donate data, and also the validating. We need to validate that the recordings are properly annotated, so you can listen to sentences and just check whether each one is correct or not. So when we make this data available, we're also going to provide it annotated, right? You're going to have properly validated data from the community as well. We use crowdsourcing both to record and to validate, so both validation and recording are crowdsourced.

Oh, very neat. It would be kind of surprising to me if someone wanted to waste a lot of time and go and just make nonsense recordings or something like that to sabotage the project. I bet everyone is pretty well intentioned in the recordings they provide.

Almost 100% of the sentences are properly read. People read exactly what we asked for. There's no kind of sabotage, people saying nonsense. Almost 100% are accurate readings of what we asked for. It is awesome.

That's really good. Oh wow, yeah, I thought you'd at least have some errors just from people fiddling with their equipment or something. But it sounds like, I guess, maybe the easiness of the interface I experienced when I gave it a shot just makes it so simple for people.

We are also probably going to ship a
new version and enhance the current version with some gamification capabilities. We also have a lot of kids reading. It's an early project, you know; we launched only about one month ago. There is a lot going on: we're going to support other languages. We are getting a lot of contributions, actually. I don't know the exact numbers right now, but I can say it was a big hit.

Since you're so broadly accepting of who can contribute, you're probably getting a wide variance of different types of voices, which is really good: old, young, male, female, deep, high, all these sorts of things reflecting the real world. But you're also getting a variety of different technology and background noises and recording equipment of various quality and things like that. That's probably a strength, because a machine learning researcher needs a lot of good example cases. But can you talk a little bit about the variance you see in the recordings so far? Is there a big difference in noise quality and things like that?

Yes: microphones, background noises, accents, gender. That's a huge opportunity, right? We need that. When we talk about speech, we need the noise; we need to generalize whether there is noise or not. So yes, absolutely. We're having contributions from the whole world.

I know in the natural language processing world we notice that the larger the corpus, the better the models get. I assume there are some similar results here, that the bigger the amount of recordings you get, the more robust the models will become. Do you have any estimates or goals about how much recorded time the project hopes to get?

Yes, we are planning to gather 10,000 hours to build production-ready models. So that's our ultimate goal: 10,000 hours.

Why 10,000?

Because that's what we found in the deep learning literature, the papers that we read, and also, as I remember, in Baidu's Deep Speech work they mention ten thousand hours. So we ran
some experiments with less than that, two thousand or three thousand, and we found you just need more. So we set this goal of ten thousand based on the literature.

Well, let's see if that keeps going. The more you get, the better, right?

Yeah, absolutely. The more the better, for sure.

Getting back to the variety of different types of voices you get: even within a language, we generally see different accents or different sorts of styles of speech. Do you have a sense of whether researchers who end up using this data set will maybe first need to tag things, you know, this is accent A and this is accent B? Or do the deep learning approaches to training seem to just kind of normalize across all of those things?

Yes, I believe with deep learning we don't need to tag accents right now. I think in the future it's going to be useful if you want to build models localized for accents, for the areas of the country you are serving. But right now we don't have that yet. I agree that's something that can be taken into account in the future: maybe tag that and build models for specific accents. I'm sure that can get better, but right now we're not doing that.

Yeah, I've been debating with myself whether or not... obviously it sounds like it's very useful, but I feel like a lot of these deep learning systems would just learn different representations that are somewhat close, you know, and just account for the accent. So it's sort of a neat way that the technology is doing automated feature recognition, potentially.

Yes. In the tests that we did, we found that the more variety you have, the better. If you have my accent, my Brazilian accent, mixed with, say, American accents, it works well, you know what I mean? I think that's handled automatically. We are also asking the contributors to give us their own profile, but it's not mandatory, and that for sure will
help, but it's not mandatory. Let's see; that's something that we need to take into account, for sure.

And when it comes time that the entire corpus becomes open sourced as well, how do you see that happening? Will there be some zip file that I can download, or what would be the process for me to start researching with it?

Access to the raw data. First we're going to do some polishing, you know, to extract the valid data from the corpus. But yeah, something like that: you're going to download a big tarball from somewhere with all the data inside, after we do some reviewing of the data. That's the goal. You're just going to download the raw files, the PCM files with the transcriptions, so you can train your own models in TensorFlow or Kaldi or something else. We also have importers. So far we have this Deep Speech project that we are running, and it has some importers to train with LibriSpeech data, with Fisher data. We're going to write an importer for our own Common Voice data, and also for Kaldi; that's something that I'm going to be working on, creating an importer for Kaldi, or a recipe, as they call it in Kaldi. So you can just download the data and train your models.

What is that system, Kaldi? I'm not familiar with that.

Yes, Kaldi. It is an open-source toolkit for speech recognition. It is actually a speech recognition decoder that uses deep neural nets. It is a project that is maintained by Johns Hopkins University, if I recall correctly. But yeah, it is an open-source speech toolkit, so you can build your own speech service using it. It's really very powerful, and we shipped Voice Fill using it. Definitely take a look.

Yeah, absolutely, and I'll have a link to that in the show notes as well. Can you tell me a little bit more about Deep Speech? I understand that you're using the raw audio data as the input. What are the outputs? What is that model
trained to recognize or to predict?

Just basically a string. After you train and you get your model, you're going to input your own raw data, your PCM file, and it's going to give you back a string with what was decoded. Basically just that: you put in the data and get back a string.

Raw audio to text, then?

Right, yes: raw audio in, and it gives you back the transcription. That's basically just that.

One of the things that's exciting to me about Deep Speech and the whole openness of the project is that I've noticed some of these systems, and I expect Deep Speech is one of them, are very good at everyday conversation. But if you and I had a very technical discussion about pharmaceuticals or something like that, there would be all these words that aren't necessarily common in a lot of examples, so the transcription might not work as well when it comes to jargon, so to speak. But if that's important to me, if I'm going to study text to speech for pharmaceuticals, how much does a project like Deep Speech bootstrap that effort?

In language modeling... there are two different approaches to do speech recognition, well, a bunch, actually: hidden Markov models, then hidden Markov models plus deep neural nets, and then pure deep neural nets, right? You're going to need a representation of those sentences when you're doing pure deep neural nets, because when you're using hidden Markov models plus deep neural nets, you have the phonemes mapped as well. When you're doing pure end-to-end deep neural nets, you don't have phones at all. You're going to need the language model covering the sentences that you're trying to decode, for example in these pharmaceutical situations. Even if you have the data, I mean the audio, in the acoustic model training, you're going to need that language model as well. That's the difference between using a hidden Markov model with deep neural nets and a pure deep neural net, you know, a pure end-to-end
recurrent neural system: you still need the language model, right? If we're talking about training, even if you have that huge corpus of speech data, you still need to cover new sentences. You don't need to record them: if you have the phones covered, or, in the deep neural net case, if the model has enough data to generalize already, you just need to introduce your new sentences to the language model, and that's basically just text. You don't need to record the phrases, the sentences that you want to support. That's the beauty of speech recognition, right? You can just keep adding text. For sure, if you have the audio representation of those sentences as well, it's going to be better, but if you have the phones covered, in the case of a hidden Markov model that depends on them, it is enough. So that's how you can improve the engine. You can either record, have more sentences added to the acoustic model, I mean recordings, and retrain the model, keep adapting it with new sentences, or you can basically just import text into the language model covering those new sentences. So you have two different ways to keep making the model better, to make the engine itself better.

So, I appreciate that you took the time out of a busy schedule to meet with me today for the interview, and a lot of that centers around some major announcements Mozilla has just had. Can you tell me about some of the exciting new features that went out recently?

So yeah, we have this Voice Fill extension that we just shipped. We have this Test Pilot program in Mozilla Firefox, and Voice Fill is basically an experiment that you install as a web extension on your browser. It's going to speech-enable search engines, basically Google, DuckDuckGo, and Yahoo: you're going to be able to do web search using your voice. That's the first one. In the future we're going to be
shipping the API as well, the Web Speech API inside the browser, so you can use JavaScript to speech-enable your web pages or website, and also as a separate Node module, so you could use it in your own projects that are not browser-based. So those are the current ones. Also Common Voice, which we just shipped, Deep Speech, our research project, and also our Web of Things gateway: we're going to ship a platform for IoT, so you could also use speech if you want to have your own IoT, your own speech-enabled speaker at home. You can just use a Raspberry Pi and a microphone, and you're going to have your own IoT platform at home. So yeah, those are the current projects that we have so far.

So I imagine the choice of Node is that it's a great language that'll run in the browser and server side, and you can transpile in many different ways, so it's very robust. I like the idea of using the Node extension maybe in the back end of my website if I'm doing some offline processing, but also making it available in the future, where people who come to my website can take advantage of that API to record in the browser and things like that. With that in mind, can you tell me a little bit about how you connect your TensorFlow model via Node?

Actually, we are talking about online speech recognition. We're going to have an endpoint, a web service endpoint, that is running inside Mozilla's own cloud, and you talk with this service. You just push raw PCM or, in the case that we are supporting right now, Opus data: you do a POST passing the audio data, and it gives you back the results. So there's no model that you are running; it just encapsulates the call to this web service that we have installed on our cloud premises. The decoder, in this specific case for Voice Fill, is Kaldi, and you communicate with it through this web service using Node. But in the
future, when we ship Deep Speech, you're basically going to be able to run offline as well. We also have plans to ship these services, which are running online, as offline versions. In the future we're going to have both: offline and online. Trying to answer your specific question: right now we're not using TensorFlow for Voice Fill. For this endpoint that we have online, we are using Kaldi. Basically, we serve Kaldi with our own models in our own infrastructure, and we built a web service endpoint, so you can just connect to it and post your data. Basically that simple.

Was that very easy?

Very easy for this specific implementation. For Voice Fill we use a pre-trained model from api.ai; they made it available inside the Kaldi repo. We started from that model, and we are going to increment the model and keep improving it. The point here is that everything is available on Mozilla's GitHub. You can go there and have access; everything we are doing is 100% open source.

That's excellent. Voice, in my opinion... we're right in the sort of Industrial Revolution of it. You know, the technology is finally good enough that it works with deep learning, and we're starting to have people figuring out the right use cases. But one thing that's a little unclear to me is, you know, where does that service come from? Is it a part of an operating system? Do I build it myself? And clearly Common Voice is going to change some of the game there, because now it's democratized; it's open, anyone can use it. Is there a vision Mozilla has for how this technology should exist? Is it going to be part of the browser? Is voice recognition essentially going to be a service?

So that's a good question. Right now we are using the browser, right? Because it's our flagship product; we have that already, so it was way easier to just put it in the browser. But I agree with you, it's a bigger space, right? So we're going to have
mobile. We're also going to have smart devices at home; we also have this boom of smart speakers at home. I mean, if you look at the new generation, the way they're interacting with devices, they are growing up with speech, right? They just expect it to work, doesn't matter where. So I have a clear vision that in the future every device is going to be speech-enabled: your TV, your watch, everything, every device, your refrigerator. But that's going to be transparent to the user, right? And also, for the user it doesn't matter if it's online or offline; they just expect it to work. Well, you need to take privacy into account. You need to be sure that you are not exploiting users and that you respect their right not to share this data. But they just expect it to work. So we're starting with the browser because it's our flagship project. We are trying to explore new ideas in the near future, but that's our goal: speech is going to be ubiquitous. Now it is the browser, but the way you interact with the web is going to change; it's changing, right? So conversational interfaces, I think it's pretty clear that's going to be the future, right? You just chat with your agent, and it doesn't matter if it's going to be the browser or a virtual assistant or something in your TV or your car. So we have some experimental ideas for delivering a more conversational interface, and that starts with speech, right? For sure you can use text, but the front end for a conversational interface, if you can call it that, is going to be speech. So we start with speech, but in the future, for sure, you can start adding conversational interfaces and natural language understanding. So those are the next milestones. Speech is the first step; it's the front door for that.

Excellent. Yeah, I share your vision of the future, and I'm so grateful to see that there's a fully open project that's going to enable just about anybody who wants to work with speech to
do so yeah so in terms of timelines people can obviously make contributions already in the ways we've talked about I'd encourage everyone to go at least record a few sentences get your voice represented in the corpus on top of that when do you think timeline wise researchers should start checking the site and looking for how they might benefit from the project for komal voice that deadlines the end of this year around the same we're gonna be you know sharing data I start sharing data that we captured that we know the user is donated actually sorry for voice few is already there you just you know encourage everyone to go to test peridot from the Fox calm and install voice feel and give a shot why not give a try and see how it works but those are open you know just to contribute and also deep speech if you are more data scientist you know more into deep learning and training models and have some GPUs you know to spare and test just go to github.com flash Moses led a deep speech and start contribute your finding issues so those are the three ones I have so far excellent we'll have links to all those in shownotes and Andre give me a heads up later in the year when that launch happens I'll make an announcement on the show so people can head over and possibly start consuming that absolutely I'm happy I'm very happy for for that having you back so yeah well thank you so much for taking the time to come on and share a little bit about all the exciting projects going on data skeptic is a listener-supported program to support the show visit data skeptic comm and click on the membership tab [Music]

Original Description

Thanks to our sponsor Springboard. In this week's episode, guest Andre Natal from Mozilla joins our host, Kyle Polich, to discuss a couple exciting new developments in open source speech recognition systems, which include Project Common Voice. In June 2017, Mozilla launched a new open source project, Common Voice, a novel complementary project to the TensorFlow-based DeepSpeech implementation. DeepSpeech is a deep learning-based voice recognition system that was designed by Baidu, which they describe in greater detail in their research paper. DeepSpeech is a speech-to-text engine, and Mozilla hopes that, in the future, they can use Common Voice data to train their DeepSpeech engine.



The video discusses Project Common Voice, Mozilla's open-source effort to crowdsource a public voice dataset, and its potential to democratize speech recognition technology. The project collects audio samples with their corresponding text so that researchers and developers can build and improve speech recognition models. The video also covers technical aspects of the related work, including the Kaldi-based service behind Voice Fill and the TensorFlow-based DeepSpeech engine.

Key Takeaways

Browse to voice.mozilla.org to contribute to the project
Validate recorded sentences even if you do not record any yourself
Contribute by both recording and validating data
Use the project's dataset, once released, to train and fine-tune speech recognition models
Access the project's code base on GitHub
Use the DeepSpeech project and its importers for training and data preparation
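
As a rough illustration of the data preparation step mentioned above, the sketch below pairs audio clips with their transcripts from a tab-separated manifest. Everything about the file layout is an assumption made for illustration: the column names `path`, `sentence`, and `votes`, the sample rows, and the `load_pairs` helper are hypothetical and are not taken from the Common Voice release or the DeepSpeech importers.

```python
import csv
import io

# Hypothetical manifest. The column names and rows are assumptions for
# illustration only; a real dataset release may use a different schema.
SAMPLE_MANIFEST = (
    "path\tsentence\tvotes\n"
    "clip_0001.mp3\tThe quick brown fox.\t2\n"
    "clip_0002.mp3\t\t1\n"
    "clip_0003.mp3\tHello world\t0\n"
)


def load_pairs(handle, min_votes=1):
    """Return (audio_path, lowercased_text) pairs that pass simple filters."""
    pairs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        text = (row.get("sentence") or "").strip().lower()
        votes = int(row.get("votes") or 0)
        # Skip clips with no transcript or too few validation votes.
        if not text or votes < min_votes:
            continue
        pairs.append((row["path"], text))
    return pairs


pairs = load_pairs(io.StringIO(SAMPLE_MANIFEST))
for audio_path, text in pairs:
    print(audio_path, "->", text)
```

Filtering on a vote count mirrors the idea that recordings are validated by other contributors before they are used for training; the threshold here is arbitrary.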

💡 The project's goal is to make speech recognition a utility that is available for free in software development, and to make it easier for independent researchers and developers to work on speech recognition projects.
