[MINI] Logistic Regression on Audio Data

Data Skeptic · Intermediate · ⚡ Algorithms & Data Structures · 9y ago

Skills: Supervised Learning 80% · ML Maths Basics 70% · Data Literacy 60%

Key Takeaways

The video discusses the application of logistic regression to audio data analysis, specifically for speaker recognition, using techniques such as Fast Fourier Transform (FFT) and feature engineering.

Full Transcript

[Music]

Data Skeptic features interviews with experts on topics related to data science, all through the eye of scientific skepticism. Our topic for today is logistic regression.

All right, quick update. Actually, it's a correction. Well, sort of a correction. Before we start: Linh Da, do you recall a little while ago, when I was talking about the Poisson distribution, you said, "Oh, that means fish"?

Yeah. You told me that someone sent in a note, and of course I was right, because I took French.

Yes, we got a correction. So thanks to at allori MD, who wrote in and let me know that "poisson" does mean fish in French. But in my defense, the Poisson distribution is named for Siméon Poisson, a guy. So his last name is Poisson. I don't think that makes his last name translate to fish. Maybe he was a fisher person. But I was pretty indignant in telling you that it wasn't fish.

I know. I didn't want to correct you on your own show, so I left it to your listeners to correct.

Oh, great. So thanks for letting me put it out there.

Yeah.

Well, anyway, last week we were talking about dropout, and we're going to have a lot of deep learning topics in the coming weeks and months. That's going to be a big focus of the mini episodes in 2017. So in order to get into deep learning, I thought we should take a step back and talk about a much simpler method that helps provide some of the baseline for what deep learning is all about. You know what deep learning is?

You know what, Kyle has mentioned this topic to me many times, and he actually has a book right now on one of our side tables that says "Deep Learning."

Yeah. Have you been reading it?

I don't know what deep learning is. I just saw the title.

Well, we're going to get into it bit by bit, because it's hard to jump into all at once, and we're going to start by talking about logistic regression. Logistic regression is very similar to linear regression, which we've talked about before, so I want to start there and talk about home prices, also because our big episode on the
OpenHouse project should be coming out in just about a month's time, so we're going to be doing some regression on that data. But if you had to guess how one arrives at the price of a home, what are some of the things that go into it?

Well, how big is the home, how many bedrooms, bathrooms, what area. Location is key. How old is the home.

Right. So these are all good features that we'd like to use. Let's stick to the three basic ones: number of beds, number of baths, and square footage.

Well, location.

Yeah, that's important. You have to quantify it in some way, though, and that's a little tricky. It definitely matters, obviously, but let's just say home buying was very simple. Oh, and you also throw in an intercept, which is sort of the baseline price of any home. The point being, we have a model of how this should work: it's W1 times the number of bedrooms, plus W2 times the number of bathrooms, plus W3, and you just keep adding weights to all these different features you have, right? Now, coming up with the best weights, the best parameter values, that's what logistic regression is going to help you do.

We're now actually at the point where I think this is as deep as we go on Data Skeptic mini episodes. I would love to talk about Wilks' theorem and the quasi-Newton method and some stuff like that, but I don't know that that goes great over the air. The objective we want to get into is what logistic regression is, not how you do it. There's good material already in existence on how. So what is it? It's the process of doing linear regression: finding, through maximum likelihood, the best values of those parameters to suit your model. And so what the algorithm will do for you is find the best possible values given your input data. But that would be for linear regression, if you wanted to determine the price of the home. Logistic regression takes that one step further by applying your output to a logistic function, which is a special type of curve. I sent you a picture of it, if you want to pull up that
email. This will be in the show notes too. It's basically just a way of mapping any data, any numeric value, into a confined range between zero and one. You see that curve?

Mhm. Looks like an S.

Can you tell what its minimum and maximum values are?

The y-axis goes from 0 to 1.

Yep. And the x-axis goes from -6 to positive 6, and that's just on the one I showed you there. Actually, it could go from negative anything to positive anything. So the nice part about the logistic function is that it maps any numeric value to that range of zero to one, and it maintains the ordering and stuff like that. It also kind of spreads out the data that are near zero. So the difference between two really big numbers will be very small, whereas the difference between two small numbers will be kind of big. That's very useful to help the machine learning algorithms find the solution faster.

So I wanted to do a project here. I wanted to kick something off that we're going to work on probably this whole year and make little improvements to. We're an audio podcast, so it's hard to do visual stuff, so I thought, wouldn't it be cool if I built something that listens to this podcast and decides who's speaking, me or you?

Okay.

Well, I expected a bigger reaction out of you. Can you maybe act really impressed?

Wow. I had problems distinguishing our voices. I didn't know that when you were talking, it was actually me.

Yeah, you know, a lot of people write in. They say it's very confusing, they can't tell who's talking. So this is going to be a really great project, clearly. I took a first stab at it this afternoon. Now let's talk through how I did this. I sent you another image there. Can you describe what you see? Maybe say the name of that thing, if you know what it is.

I don't know, but it looks like little wavelengths. You know when an earthquake happens? It looks like that.

Yes, so you're describing waveforms. Can you tell, in the picture you see there (and I'll include this in the show notes too), which one of us
is speaking, the first line or the second line? Which is you?

Well, I don't know. They don't give it X and Y axes. How am I supposed to know what it means?

Oh, good question. The x-axis is time and the y-axis is amplitude, meaning how loud the sound was at that moment.

Well, on average you're louder than me, so I feel like maybe you're the top. Is that true? Or you're the bottom. The bottom has more range, which to me signifies volume. So the bottom person's max volume was greater than the top person's, but the top person looks like, on average, they were louder generally.

So what's your final answer here?

I can't tell. I don't know.

Just guess. It really doesn't matter.

It matters to me.

You don't have to overthink this. It doesn't matter.

It matters to me.

Oh, we're keeping score, then. I'll tally it up at the end of the year, how many of these arbitrary guesses you get correct.

I want to be smart, so I managed to... Okay, okay. You're going to be the bottom one. Final answer.

Fine.

Man, I had it right the first time, but then you gave me this suspicious look and I... Kyle tricked me.

Yeah, well, it's actually a little bit unfair of a question, because the reality is you can't really tell who's talking from a waveform. I mean, you might in a tricky way, like, "Oh, Kyle talks more, so that one must be him," or "He shouts a lot, so that must be him." There could be some clue like that. But in general, when you talk pretty evenly and you normalize it, as I do in post-production, people's waveforms pretty much look the same. However, as you were pointing out, people can tell the difference between our voices, right?

Oh, yeah.

Yeah. So what is it that the human ear is doing that maybe we can mimic in a machine learning algorithm?

I don't know. What's the difference between your voice and mine? Well, ours is connected to our brain, so our brain is actually interpreting. So it's like picking up the emotion.

Mhm.

Just your exact mood and tone and all those things that humans are actually genetically programmed to
pick up.

Yeah, so I like where you're going with this, mentioning the brain, because ultimately, yeah, I'm going to solve this with deep learning, and we're going to get to that in future episodes. But I wanted to do a quick and dirty thing, because we just had yesterday afternoon to put together the show. I wanted to use logistic regression. So what I did: I used the fast Fourier transform. Look, we can't do a mini episode on everything, okay? So if you want to learn about the FFT, go check the dataskeptic.com blog, because this week I'll put up a bunch of stuff for anyone who doesn't know what it is. But basically, the FFT converts a signal from the time domain, as we see here (these are the amplitudes of our speaking through time), into the frequency domain, so we can see the different tones your voice uses. Okay, what's your understanding of the word frequency in this situation?

Is that volume?

No, it's distinctly not volume.

What is it, then?

Frequency is like pitch.

So even though we're not singing, you know, we're not being like... or whatever.

Yeah. Well, even without tuning and singing and all that, we're still saying pitches, right? Our voices are still at certain levels.

Sure.

And mine is, you know, a very masculine and pretty tough and awesome voice.

And I'm an alto.

Now, voice is also very complicated. If you scroll down the other thing (and again, this will be in the show notes), on the left are some more waveforms, and on the right are their frequency spectra. The top one there is you, that's your frequency spectrum, and the bottom one is me. Now, can you tell much difference between those two, the top red one and the bottom red one?

Well, since I've been taking vocal lessons, it sounds like my voice has more dynamics. I'm just kidding.

Maybe. There are, yeah, visible differences in the frequencies of our voices. So I thought, hey, without looking into the real things that make us different, like the timbre of our voice or our cadence
and different things like that, can I just do kind of a cheat? Can I do this quick and dirty: predict, is it Linh Da or is it not Linh Da (meaning, is it Kyle), and can I do that just by frequencies alone? So what do you think? I mean, we're talking about it, so obviously I did it, but how successful was I?

I mean, I don't know. There isn't enough data for me to feel like I know what's going on.

Well, guess what kind of accuracy logistic regression might give us.

It looks like at least 50%.

Better than that. I got 84% accuracy.

Wow.

Now, there are a couple of caveats here, because I'm not yet convinced. I'm still a little skeptical. I may have overfit this data, if the regression is picking up on some artifact of our recording. Like, we use different mics, you and I, and maybe there's something about the frequency spectra where it's sort of cheating, in a way. It's picking up an artifact, because this should work no matter what microphones we use. In theory, at least, that's the goal.

Mhm.

I don't want to just detect the hardware, I want to detect our actual voices. So I've got a little bit more digging to do. Keep your eye on the dataskeptic.com blog. I'm going to post a lot of cool stuff about how I did this, feature engineering and stuff like that.

When we started talking about regression, it was in terms of houses, right? And there were obvious things like the number of bedrooms and number of bathrooms. What do you think I used as my features to describe our voices?

I thought you said pitches.

You're right, I did use pitches. But do you have a sense of how I used them in a formula?

No.

Well, in the first example it was beds times a weight plus baths times a weight. I ended up breaking down that frequency spectrum into little buckets. I said, from this hertz to some upper hertz, that's the first bucket, and then I made the second bucket, and so on and so forth.

So how did you pick these buckets?

So the first thing I did was trim down the whole frequency spectrum to anything under 1,000 hertz. Here, I'll
put in a tone right here of what 1,000 Hz sounds like. So that's a little bit high-pitched, right?

Yeah.

And our voices, even though they do resonate above that, we don't talk at that high of a pitch. So I knew all that information was not going to be very useful, so I trimmed it down. Then I made some little buckets, and I experimented with this, and I ended up coming down to 10 buckets. So, 10 little bands that are just calculating how much of that frequency appears in each of our voices. Each of those bands is then a numeric value that describes our voices, and what I'm asking the logistic regression to do is to say, hey, if I tell you the values of how much of these frequency bands were being used, can you tell me who the speaker is?

Of who's most commonly in that bucket?

Yeah, pretty much.

Oh, okay.

So then what it does is, each of those gets a weight. In the buckets where it's most characteristic of my voice, since I'm trying to predict whether it's Linh Da's voice or not, it goes negative, right? Because it's like, no, this is very un-Linh-Da-like. And in the buckets where your voice typically shows up, it goes positive, because they're very Linh-Da-like.

Mhm.

And the parts of the spectrum where neither of us speaks, or we both speak, end up around zero, typically. So if you go to the show notes at the site, you can see one of those bands is very characteristic of saying, this is probably Linh Da's voice. So that's what it latched on to. So how logistic regression solves this is, it finds the optimal weight for each of those bands, and then it maps the result through the logistic function, that S curve, to say whether it's towards a one or a zero. And you just select a breaking point that says, above this, it looks like it's Linh Da; below it, it looks like it's Kyle.

So now I just think I should start talking on your podcast really funny. Like, "Hello," like that.

Yeah, and that would screw it up pretty bad.

Yeah.

And then I could confirm it was an overfit,
or maybe we can try and trick it. Maybe you talk like me and I talk like you. Let's do that for a minute. You pretend to be me.

I'm Kyle.

That didn't work. Hi, I'm Linh Da. This is how I talk.

This is Kyle P.

I'm going to go do yoga and sit with the bird.

Sit with the bird? You're not doing a very good impression of me. I'm not hearing myself in you.

All right, well, I'm going to work on my impression of you, and you work on your impression of me. Everyone who's listening should go to dataskeptic.com and check out the show notes for more details on this, because this is going to be a little project we're going to tinker with throughout the year. I'm going to try different algorithms to see what works better, and we're going to mess around with the features. One other thing to note that I didn't do here: I only looked at single samples in time and said, what are the frequencies here? I didn't look at anything that's a chain of how the frequencies flow together. That'll potentially help capture how our voice evolves. Like, I tend to maybe get excited at the end of a sentence, and there are features of our speech like that which you can pick up on with something like a recurrent neural network, which will be a future topic for us.

So we covered a lot of ground here tonight, and mainly talked about the project. Let's summarize logistic regression, since that was supposed to be our purpose here. It's basically linear regression, but it also involves this logistic curve, so you can do a classification, typically a binary classification.

What's a classification?

Ah, classification is when your output isn't just a number, like in the case of the house, where it's "what's the best price for the house?" Classification is like true or false, or "is Linh Da" versus "is not Linh Da."

So, binary.

Usually binary, but you can do some tricks to manage things that are not binary. If it's true that you supply a data set that has good information content in it, then the algorithm should be able to find the best weights that it can multiply by that
input data and run it through that logistic mapping to get this value between zero and one that says how likely it is to belong to the class we're interested in.

So logistic regression always runs it between zero and one?

Yep, and that's important for classification, because ultimately you have to say it's in bucket A or bucket B.

Okay.

And that's why it's slightly different from just a vanilla regression, where you'd get a number out. Logistic regression is kind of like the first algorithm you learn when you start doing machine learning, which, funnily enough, we've never covered on the show yet. But there are a lot of topics I haven't touched on. We're almost four years in here. We can't do everything, but all things in time, maybe.

Okay, so logistic regression is: you run it through an equation and you get a number between zero and one, and generally it can tell you if it's yes or no, an answer to your question.

Yep. It's one of the simplest classification algorithms, but a very effective one too. And it's nice because, in theory, those coefficients are interpretable, which means that in the case of our project here of determining who's speaking, I know what the coefficients represent. They represent how much it's relying on each of those frequency bands to predict Linh Da or Kyle. In an early attempt when I was working on this, the bands it was picking and how it weighted them were all over the board. There were some low, some high. They were really noisy, which told me that I had actually overfit the data in that case. So I had to keep working, because the values I got intuitively didn't make sense to me. I'll talk more about this in some blog post, but I ended up trimming it down. That's where I got to the bucketing idea, and that helped a little bit. Now, it's still only 84% accurate. It's not a great fit, but I didn't necessarily expect a great fit, because this is just a simple approach, and it's our baseline now, so we can compare against it as we
try other methods and stuff.

So what's another method that you're going to try?

Well, I've already tried XGBoost, which is another mini episode we're going to have to do at some point. That's a good one. It did better. But mostly I want to get into using deep learning. In particular, I want to try out recurrent neural networks, because those are good for data that has some sort of time component to it.

So wait, is logistic regression the beginning of neural networks, or no?

No. Logistic regression technically doesn't really have anything to do with neural networks, aside from the fact that it's this basic linear optimization using the logistic function.

Is it deep learning?

No, but there are some pieces of it that get used in deep learning. So stay tuned, everybody. I'm going to try and theme almost all the mini episodes for the next couple of months around this project. In fact, next time you and I record, Linh Da, I'm going to have some samples of the mistakes that it makes. So we'll listen to audio where it correctly said, "Oh, this is Linh Da talking," and then we'll listen to some of the audio where you are talking and it thinks it's me. Maybe we can learn some cool stuff that way.

Yeah, I should throw my voice more. What do you think it'll do with the impressions?

We've got to get good impressions and see if we can fool the algo.

I've got to practice.

All right, so we'll go practice, everyone else will check out the site, and we'll see you next Friday.

Before we go, I want to share a quick word from our sponsor this week, which is the Data Science Association. I'm here with Serene, who's going to tell us about an exciting upcoming conference in Dallas in February. Serene, welcome to Data Skeptic.

Hi, Kyle. Thank you so much for having me on the show. We are the Data Science Association, which is a nonprofit organization working to accelerate growth in the data science community. Last year we hosted the first SoCal Data Science Conference, and it was extremely successful in bringing together
20 guest speakers and 600 data science enthusiasts. This year we are proud to kick off our data science conference series, starting in Dallas. It will be held at the University of Texas at Dallas on February 18th. We have 200 participants already signed up to learn more about the cutting-edge technologies and developments in data science. We will also be hosting a panel to discuss the talent gap between academia and industry. We have panelists from consulting firms and data science boot camps such as Galvanize and Metis to share their points of view on the essential skills for success in this field. We feel that this subject will be extremely valuable for career changers and students seeking data-driven careers. Right now we are offering our early bird tickets for $40, only until January 29th. So if you're interested in joining, you may register at dallasdatascience.eventbrite.com.

As an added bonus, the first Data Skeptic listener to email me a copy of their Eventbrite confirmation will get a free Data Skeptic t-shirt, so hurry on over and make that happen. Once again, that's dallasdatascience.eventbrite.com. You can find that in the show notes as well, or at dataskeptic.com. The conference, once again, is Saturday, February 18th, at the University of Texas at Dallas.
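
The properties of the logistic function described in the episode (any number maps into the range zero to one, ordering is preserved, and equal steps matter less the further they are from zero) can be checked numerically. The following is a minimal sketch in Python, not code from the show; the function name `logistic` and the sample inputs are chosen here purely for illustration.

```python
import math


def logistic(z):
    """Map any real number into the open interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Illustrative inputs spanning the -6 to 6 range mentioned in the episode.
values = [-6, -1, 0, 1, 6]
outputs = [logistic(z) for z in values]

# A step of 1 near zero moves the output far more than a step of 1 far out.
gap_near_zero = logistic(1) - logistic(0)
gap_far_out = logistic(6) - logistic(5)

for z, p in zip(values, outputs):
    print(f"logistic({z:+d}) = {p:.4f}")
print(f"gap near zero: {gap_near_zero:.4f}, gap far out: {gap_far_out:.4f}")
```

Because the outputs stay in the same order as the inputs, picking a cutoff on the output is equivalent to picking a cutoff on the weighted sum that feeds the function.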

Original Description

Logistic Regression is a popular classification algorithm. In this episode we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes an output variable (isLinhda) is a linear combination of available features, which are spectral bands in the discussion on this episode.

Keep an eye on the dataskeptic.com blog this week as we post more details about this project.

Thanks to our sponsor this week, the Data Science Association. Please check out their upcoming conference in Dallas on Saturday, February 18th, 2017 via the link below.

dallasdatascience.eventbrite.com

The figures below are referenced during the episode. The top waveform is Linh Da, the bottom is Kyle. We use the same order below.
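
The model assumption in the description can be written out explicitly. This is the standard logistic regression form rather than a formula quoted from the episode. The symbols are notation introduced here: x_i is the energy in the i-th of the 10 spectral bands, w_i is its learned weight, and w_0 is the intercept. Strictly speaking, it is the log-odds of isLinhda, not the probability itself, that is a linear combination of the features.

```latex
P(\text{isLinhda} = 1 \mid x) = \sigma\!\left(w_0 + \sum_{i=1}^{10} w_i x_i\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

\log \frac{P(\text{isLinhda} = 1 \mid x)}{1 - P(\text{isLinhda} = 1 \mid x)} = w_0 + \sum_{i=1}^{10} w_i x_i
```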

This video teaches how to apply logistic regression to audio data analysis for speaker recognition, covering techniques such as FFT and feature engineering. The speaker shares their experience with the project, including the challenges and results. The video is useful for those interested in machine learning and audio data analysis.

Key Takeaways

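Use the Fast Fourier Transform (FFT) to convert the audio signal from the time domain to the frequency domain
Apply logistic regression to predict speaker identity from the frequency spectra
Trim the spectrum to frequencies under 1,000 Hz and break it down into 10 buckets (bands)
Let the regression learn a weight for each bucket: negative where Kyle's voice dominates, positive where Linh Da's does, near zero otherwise
Map the weighted sum of the features through the logistic function and apply a cutoff to predict the speaker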
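
The feature extraction steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the episode: the FFT is a small pure-Python radix-2 implementation, the 10 bands are assumed to be equal-width slices of 0 to 1,000 Hz, the band values are normalised to sum to one, and the input is a synthetic 240 Hz tone. The function and variable names (`fft`, `band_features`, `SAMPLE_RATE`) are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def band_features(samples, sample_rate, max_hz=1000.0, n_bands=10):
    """Sum spectral magnitude below max_hz into n_bands equal-width bands."""
    n = len(samples)
    spectrum = fft(samples)
    bands = [0.0] * n_bands
    width = max_hz / n_bands
    for k in range(n // 2):  # only the non-negative frequencies
        freq = k * sample_rate / n
        if freq >= max_hz:
            break  # everything above the cutoff is discarded
        bands[int(freq // width)] += abs(spectrum[k])
    total = sum(bands) or 1.0
    # Assumption: normalise so each value is a share of the energy kept.
    return [b / total for b in bands]


# Hypothetical test signal: a pure 240 Hz tone, 1024 samples at 4096 Hz.
SAMPLE_RATE = 4096
tone = [math.sin(2 * math.pi * 240 * t / SAMPLE_RATE) for t in range(1024)]
features = band_features(tone, SAMPLE_RATE)
print([round(f, 3) for f in features])
```

With real speech, each short clip would yield one such 10-value vector, and those vectors are the inputs the logistic regression is trained on. In practice a library FFT would replace the hand-written one; it is written out here only to keep the sketch self-contained.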

💡 Logistic regression can be used for speaker recognition by applying it to audio data analysis, specifically by using frequency spectra as features.
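
To make the classification step concrete, here is a small sketch of fitting a logistic regression by maximum likelihood and applying a 0.5 cutoff. It is not the code from the show. It uses plain gradient ascent instead of the quasi-Newton methods mentioned in the episode, and the data are synthetic: three made-up band energies per clip instead of ten, with one band favouring each speaker. All names (`fit`, `predict_proba`, `make_clip`) are hypothetical.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that the clip is Linh Da (class 1)."""
    return logistic(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit(X, y, lr=0.5, epochs=500):
    """Maximise the log-likelihood with batch gradient ascent."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(X, y):
            err = label - predict_proba(weights, bias, x)
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias += lr * grad_b / len(X)
        weights = [w + lr * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias


rng = random.Random(0)


def make_clip(is_linhda):
    """Synthetic band energies: band 2 favours Linh Da, band 0 favours Kyle."""
    base = [0.2, 0.3, 0.5] if is_linhda else [0.5, 0.3, 0.2]
    return [b + rng.gauss(0, 0.05) for b in base]


y = [i % 2 for i in range(200)]  # 1 = Linh Da, 0 = Kyle
X = [make_clip(label) for label in y]

weights, bias = fit(X, y)
preds = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print("weights:", [round(w, 2) for w in weights])
print("training accuracy:", accuracy)
```

The learned weights mirror the interpretation given in the episode: the band that favours Kyle gets a negative weight and the band that favours Linh Da gets a positive one. Accuracy is measured here on the training data only, so, as the episode cautions about its own 84% figure, it says nothing about overfitting.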
