Multiple Comparisons and Conversion Optimization

Data Skeptic · Advanced ·🔐 Cybersecurity ·9y ago

Key Takeaways

Multiple comparisons and conversion optimization are discussed with a focus on p-hacking and statistical analysis best practices

Full Transcript

[Music]

Data Skeptic features interviews with experts on topics related to data science, all through the eye of scientific skepticism.

Host: Chris Stucchio is a former high-frequency trader, physicist, startup founder, and bodyguard. He's currently a gambler and the director of data science at Wingify. He's a strong believer in automated reasoning, formal methods, and the power of computers to liberate us from the tyranny of humans making decisions. You might like his blog at chrisstucchio.com, which will be in the show notes. Chris, welcome to Data Skeptic.

Chris: Hey, how's it going?

Host: Very good, happy to have you here. So I caught your talk "Multiple Comparisons," which has the subtitle I like the most, "Make Your Boss Happy with False Positives, Guaranteed," and I thought it'd be really interesting to explore some of the topics you covered there about how people can abuse their data. But before we get into that, I thought it might be good to start the discussion with the proper ways of doing these sorts of things. Could you give us a definition of what conversion rate optimization is to you?

Chris: Okay, so conversion rate optimization is typically done on the web. You're running a website, let's say you're selling beer on the internet. You have some cohort, let's say 20,000 people visited your site. You want each of these people to purchase beer, but not everyone's going to actually purchase beer via your website. Some of them get bored, some of them don't trust you, some of them just can't figure out your checkout form. There are various reasons why people are not buying beer on your site. Some do, however. The number of people who buy beer divided by the number of people who visit your site, that's your conversion rate. It's basically the fraction of visitors who actually purchase.

Host: Somewhere between zero and 100, I presume?

Chris: One would hope. Conversion rate optimization is the process of making changes to your site to increase this number. So typically it'll be
things like making a better user experience, coming up with a compelling product image, that kind of thing.

Host: How do you know the breadth of options you might want to test for? I could think of anything from colors to font size to page layout. It seems sort of endless.

Chris: Generally speaking, the best way to do this is to actually try to do some user research. For instance, you may want to ask customers or a focus group what they think of your site. If they say something like "it's hard to use," that'll give you different ideas of what to change than if they say "you seem really shady." So that's user research. You can also watch visitor behavior. Here's the thing: a lot of times people aren't going to say something mean like "your site's really hard to use," but if you watch them going through the checkout form, you might observe that it takes them seven minutes to actually enter a credit card. For instance, at Wingify one of our products is visitor recording, so you can actually watch people out in the wild and see what they do, and sometimes what they do is going to surprise you. They'll just not be going through the flow you wanted or expected. Here's a concrete example: you see people hunting around a page, clicking everywhere, trying really hard to actually purchase your product, but they're just not finding that checkout button. That gives you the hint that you should make that checkout button big and red, point arrows at it, make it flash, etc. Whereas if people are jumping away really fast, maybe they just don't trust you that much, or your product offering doesn't seem so compelling, so that's the thing you need to change.

Host: Yeah, I'm personally frustrated when I'm at a site and I'm trying to give them money and they're making that difficult for some reason. Can you tell me a little bit about some of the products and services Wingify has that can help people do site optimization of this nature?

Chris: So our primary product is the A/B testing
product. Once you have an idea, you can use our product to run an A/B test, which is to say you show half of your visitors the new modified version and the other half the old version, and then we'll measure the conversion rate on each version of the page and tell you which one has a higher conversion rate.

Host: You were mentioning there setting up a proper control and test group. I've also seen people who have just rolled out a blanket of changes. Let's say we want to color a button, so you want to try five different colors all at once. How important is it to keep that control group, that original example, to compare against?

Chris: Ultimately, in principle, it's not important. What's important is that you include all the things you're actively considering choosing. So if you're 100% certain you're not going to run the control, then go ahead, exclude it, it doesn't matter. Whereas if you have a site that's working for you and you want to know "should I make this change to an already good site," then the question is: are my changes actually better than what I already have? I mean, look, there are some websites that are just broken. They're a disaster, they're terrible, and you know for certain you have to get rid of them. If that's the case, don't bother including it as the control, since you know it's gone anyway.

Host: Good point. So when it comes to the A/B test you're going to conduct, I've seen some A/B tests where there's an obvious winner, something is just, you know, 50% better, it went from broken to fixed. But a lot of times these changes are subtle and might be a single-digit or fractional percent improvement, and then of course we've got to use statistics to do a test. Generally I see people using this alpha value of 0.05. Can you share your perspective on what that is and how a non-statistician should interpret that value?

Chris: That's a tricky question. For the most part I have not successfully explained that number to
non-statisticians. I'll tell you what that number means. Imagine, hypothetically, you weren't running an A/B test; you're actually running an A/A test, which means that you have two groups, red A and green A, where the web page is exactly the same. Now, you just ran a real A/B test which had, I don't know, 5,000 samples in each variation: 5,000 samples for the control, 5,000 samples for the variation. What the p-value tells you is this: imagine you had actually run an A/A test where both web pages were identical. The p-value is the probability of seeing a result at least as extreme as what you just saw. If you observe that B was, I don't know, 2.5% better than A, and that this 2.5% had, let's say, a 5% p-value, then essentially, had you run the A/A test, the odds of seeing a result like this, or maybe more extreme, like 6 or 7%, would have been only 5%. It's definitely a mouthful, and it's kind of backwards from what you actually want.

Host: Yeah, I think you were right earlier in saying it is hard to explain to non-statisticians. Someone once told me that by the time they finished explaining it, the person's a qualified statistician.

Chris: Yeah, so it's tricky because what people actually want is the probability that B is better than A, and unfortunately the frequentist p-value is only indirectly related to that. It's certainly the case that a smaller p-value means it's more likely to be true, but nevertheless a 5% p-value doesn't mean a 95% chance of that thing being true.

Host: So there's a famous quote I think everyone knows. I had to Google its origin; it's by a man named William Edward Hickson, and it says, "If at first you don't succeed, try, try again." So perhaps if I was doing the A/B test you described, of the red A and the green A, and I didn't find any effect and accepted the null hypothesis, you know, maybe I wasn't looking at the right group. Perhaps I should segment that by age or gender or income or any of the other variables I have.
After all, we've got really fast computers that can slice and dice this whole data set. Why shouldn't I just test every possible segment I can conceive of?

Chris: So here's where it gets difficult. Had you been running a classical A/A test, there was a 5% chance of seeing a result at least this crazy. Remember, this is 5% for that one result. Now suppose you slice and dice your data, so you have the data sliced two ways. If you run that exact same test, you'll have a 5% chance of an error in group A and then another 5% chance of an error in group B, so the odds of you finding something interesting in one of these groups are now roughly 10%. And if you want to slice and dice it further, it'll go up to 15, 20, 25%, depending on how many different ways you slice it. So the issue with slicing is that each time you run the test there's a chance you're making an error, and if you do it too many times that chance just goes up and up.

Host: Seems like eventually, yeah, I would almost certainly be able to find something. So I know common techniques to deal with that situation might be to adjust that p-value with something like the Bonferroni or the Šidák correction. What's your perspective on using techniques like that?

Chris: So let me explain basically what Bonferroni is.

Host: Sure, that'd be great.

Chris: Let's say you have a p-value cutoff of 5%, and let's say you were running 10 tests. You now have, roughly speaking, a 50% chance of seeing a false positive in at least one of these tests. What Bonferroni does is say, "Aha, let's start with a lower p-value cutoff." So instead of having a 5% cutoff, Bonferroni says let's have a 0.5% cutoff. Then if you add this 0.5% chance of an error across 10 tests, it adds up to 5%. Now the odds of having a false positive in one of the tests add up to 5%, because any individual test has only a 0.5% chance. The problem with this is that to get a 0.5% chance of having a false positive, you then need to use a lot more samples per variation. Let's say it took you 5,000 samples per variation
to run a two-way test. Previously, if you ran one two-way test, A versus B, you needed 10,000 samples: 5,000 for A and 5,000 for B. Now suppose you're running five of these: A versus B, A versus C, all the way up to A versus F. If you were to just do things naively, without running Bonferroni, you would need 30,000 samples, which is to say you've got six groups, each of which needs 5,000. However, because you're imposing this sharper p-value cutoff of 0.5% instead of 5%, you now need a lot more samples per variation. Instead of being 5,000, it's more like 8,000 or something of that nature. So now instead of needing 6 × 5,000 samples, you suddenly need 6 × 8,000, that's 48,000, which is obviously a lot higher. If you're Google, I highly recommend you do this kind of thing, but lots of other people don't have quite that scale of traffic, where 48,000 people is just something you can throw away on a whim. If you're one of those people, you've got to be a little smarter about it.

Host: In that case, where I'm testing, let's say, the color of a button, I might in my experiment need to accept the null hypothesis, and if I've listened to all the advice you've just given, I know not to go in there and just segment my data until I find something. What about declaring the button color a failure and conducting a brand new analysis on a new data set that tests, let's say, fonts this time, or size or spacing or something like that? Do you think I'm still running into the possibility of too many comparisons even when I generate an entirely new experiment?

Chris: You're always running into that possibility. Essentially, every time you run a test, assuming the null hypothesis were true, there's a 5% chance of an error. Each time you run a test there's always a chance you're going to get erroneous results. I mean, this is an important thing to recognize with all testing programs: the goal is not to make
the right decision every time. The goal is, in a sequence of 20 or 30 decisions, to choose the right one most of the time. Put it this way: let's say you had a 50% false positive rate, half of them were false positives and the other half were true positives, and you deployed all these variations. That would be a perfectly acceptable outcome. The reason is that the false positives aren't costing you money and the true positives are making you money. So ultimately the goal is less to avoid false positives, that's not even possible. The goal is fundamentally to come up with a decision-making procedure that gets the right answer most of the time and just increases your money in the long run.

Host: So I myself have been thinking a lot about network effects recently, a network effect meaning how much the people in a system value it. If we think of Airbnb on the day it launched, when it had, let's say, the very first listing, that's not a very great site. Probably a low conversion rate, because all the visitors won't necessarily want that exact one offering they have. At some point they hit a critical mass where now, most of the time, you can probably find what you're looking for. I don't necessarily have any inside information about Airbnb, but I would guess it ramped up, and that the more places there were available to stay, the better the site became, and the conversion rate was kind of rising. So if a site like that was doing an A/B test while they were growing, how can they be careful that they're measuring an actual improvement from, let's say, the size of that button, and not just measuring that a rising tide lifts all ships?

Chris: So ultimately, a network effect of this sort, I mean, this is a tricky question.

Host: Oh, absolutely.

Chris: And ultimately, I guess, to set up a good experiment you have to make stronger assumptions and just hope that the product manager who's coming up with these assumptions is both correct and not deluding himself and not p-hacking. First
of all, a rising tide in general is going to be okay. The rising tide of having just more selection in the market should affect A and B equally, in principle. What'll actually happen, let's say you've made a change that causes this rising tide, is that you're going to undercount the effects of it in a test like this. Let's say you make a change that causes listings to go up. Listings will go up in both version A and version B. Now, because they've gone up on both sides, A is seeing an increase because of B, so the difference between A and B is going to be a lot smaller than it otherwise would have been had these two systems been completely isolated. That actually just makes the effects you see in the tests an underestimate of the true effect.

Host: Makes sense. You mentioned a key term here; I was hoping maybe you could define what p-hacking is.

Chris: So p-hacking is the problem that arises when, essentially, humans get money from having a p-value less than 0.05, so they play games until they get one. Here's one example. You've got scientists, and they are studying the effect of some policy or some medical treatment. They may look at 20 outcomes: they may look at whether this medicine reduces heart palpitations, whether it reduces, I don't know, cardiac stress. I'm not a doctor, so I don't know many of these.

Host: Sure.

Chris: And then across all of these, let's say two of them have a positive result. So then what they report is a study saying, "Guess what, we tested this medicine against these two effects and it had positive effects on both of them." They don't actually report the fact that they ran the same test on 18 other categories and got nothing.

Host: Mm-hm.

Chris: So this kind of thing is p-hacking. So is setting up the methodology of a test in such a way that it gives an advantage to the version you want to win, that type of thing. Here's actually a big issue that was pretty prevalent in the CRO industry and actually still is. A fun fact about this conversion rate
optimization industry is that you have agencies. These will be some experts in conversion rate optimization. So let's say I'm running a website. I'm really good at making chairs, and I run a website to sell my chairs. I don't know a lot about conversion rate optimization, so I hire an agency.

Host: Sure.

Chris: The agencies claim to be good at this. They'll typically have some model where if they get me X lift, I have to pay them Y money. What this means is that when the agency can report lift to me with a statistically significant p-value less than 0.05, then I have to pay them a certain amount of money. So it's in the agency's best interest to p-hack and do whatever they can to get me that number with that 5% p-value.

Host: Yeah, I would be quite afraid myself if I were to hire such a consultant, because ultimately it's tough for someone to come back and say, "Well, I didn't make any improvement, and please pay the really big invoice I'm about to submit for all the time I spent." So it seems like there's a certain amount of pressure here to deliver results, and that people are very susceptible to p-hacking. Now, from my feedback about the podcast, I know that while I have some really technical, smart data scientists among my listeners, I also have a lot of listeners who aren't necessarily that technical, but they're business consumers who leverage data scientists, and maybe would even be hiring a consultant like that. How can someone who's in that position be appropriately skeptical of the results, or ask the right questions to determine whether the consultant has p-hacked or not?

Chris: Ultimately you have to be really careful, and you kind of have to understand at least the basics of statistical methodology yourself. For instance, when your consultant starts setting up a test for you, you ask them, before the test even starts, "When are you going to end this, and why?" If they come to you halfway through and say, "Look, it's winning, we should declare a winner now," I know you want to deploy that because that
winner looks great, but you have to say, "No, run this to the end, and let's do what we said at the beginning." Similarly, if the thing they gave you didn't win, but then they come back and say, "Well, look, it won for left-handed people on the West Coast on warm days," ignoring the fact that you can't actually target that, you should say, "But you didn't come to me early on and say it was going to win for this group." If you want to do segmentation, you really have to choose your segments before you start the test rather than at the end.

Host: Yeah, good advice. So I will link in the show notes to the video of your talk at Crunch Conference 2015 that I really enjoyed. I think everyone should go check that out as a good complement to this interview. One of the techniques you showed there, which is something I do a lot and I appreciate that it was included in your presentation, was to generate a random data set and then see how your methods work on it, because by definition, if you generated it randomly, there should be no pattern. I was curious if it was for demonstrative purposes or if you find that's a common approach in your typical workflow.

Chris: That's for demonstrative purposes. But if you have some methodology, here's an example: we have some CRO agency who doesn't actually follow the stats. They consider them informative, but they also apply their own human layers on top of that. Generating random data is a great way to test a methodology, whether that methodology is automated or involves humans. What you would do is show these humans randomly generated graphs where you know there's no result, or you know there is a result, and see if their purported methodology actually works. If you generate a bunch of data where you know there's no difference and they say, "Aha, we would declare a winner right here," you know that they're not being all that accurate.

Host: So I know you've done a lot of work in high-frequency trading. So far we've talked about A/B tests in this sort of, I don't want to
use the word simple, because that makes it sound unvaluable, and it's definitely valuable, but "test this color versus that color" context. I could see where these sorts of tests are much more complicated in the stock field, because you're probably looking for certain stocks that correlate. You know, maybe all stocks in a certain industry rise and fall together, and you're looking for how one predicts the other. I could certainly do an exhaustive search of all possible pairs of all stocks in the stock market and look for a correlation, which would be wrong. How do I decide where to draw a line, and what's an appropriate comparison in that context?

Chris: I mean, now you're getting into a really tricky case. The share markets are a lot different from conversion rate optimization. First of all, in the CRO space, in almost all cases, I shouldn't say all, but in most cases, you get the actual ingredients to do a statistical test: you get independent, identically distributed random variables. What this means is that I'm browsing the website from my house, you're browsing the website from your house, so what I do will not affect what you do. You can see that's a little bit broken in a case like Airbnb, but generally speaking it's true. In the stock market you don't have that. If I am purchasing Google, Google I believe is included in SPY, so I've affected the share price of SPY. Now, SPY is a broad index of 500 or so securities, so what I just purchased now affects the market via other people purchasing those other securities. You get this cascade effect, and this is why, if you look at the stock market, there's a tremendous correlation between almost all securities. Essentially the whole market moves up and down, and sometimes sectors move up and down. So essentially you've given up independent and identically distributed. In the stock market you basically have to be a lot smarter and have a real opinion about what's going on.

Host: I know that was a tricky question, so I'll give you one
more tricky one, just so we've got a pair. When it comes to time series analysis, like looking at stock market data, how do you actually separate out a test and a training set? It seems like you're ultimately wanting to make some prediction about the stock, but that involves something that's sequential in nature, which is different from something like conversion optimization, which kind of has a funnel to an endpoint. So how do you decide how to build a test and training set when you've got time series data?

Chris: The simple way to do it is to take all the data up to a year ago, let's say. Maybe not a year ago, but ballpark. Then you fit the model to that history; you don't run paper trading on that, you extract model parameters from it. Then you run the strategy against the current year, strictly paper trading, and you see if you made money. Most places will then have a policy, and this depends on where you're trading, of course, but they'll say, "Okay, after this you now have a thing that you think works, so now we're going to do paper trading, but on the future instead." So you backtest: you had a strategy, it seems to work, you've backtested it against a year of data. Now they're going to run the strategy against new data over the next month, and if it makes paper money over the next month, then they'll say, "Okay, let's deploy this to production, and we'll allocate a small amount of capital to it." Then you see if it actually works with real money. So you might be a trading desk at JP Morgan or Morgan Stanley, and you'll have this strategy running with, I don't know, $200,000, an amount of money that no one really cares about. You check if it actually makes real money, do that for a couple of months, and then you slowly ramp up the amount of money in the strategy. This is one way to go. The other thing to do is run stress tests against simulated data. You'll consider various things that are kind of
plausible but haven't actually happened yet. These are essentially guesses about tail risk. You'll want to see if, I don't know, a 2018 recession will totally destroy your model. Third thing: this is an economics concept, but it's tremendously relevant to trading. It's called the Lucas critique. What the Lucas critique says in economics is: let's say you have a macro model. This macro model should be derivable from a micro model. When I say a macro model, I mean something like a model of unemployment and inflation in the broader economy. This should be derivable from some microeconomics model of employment for software engineers and employment for construction workers and purchase of homes and so on. So essentially what you want to determine is not only "have I fit two graphs together," but "is there a reason why these graphs should fit together?" And then you ask yourself, "Okay, if this reason is true, the model should be valid. What are the circumstances in which this reason will stop being true?"

Host: Ah, that's really insightful. So maybe I should bring it back to the conversion rate optimization topic we were touching on. How do you deal with a circumstance in which there may be multiple things changing on a site all at once, perhaps by design, in which case maybe we can talk about whether that's a good design or not, or perhaps outside of the control of the person who's managing the actual A/B test?

Chris: Ultimately these are all a judgment call. If you're running two simultaneous tests, you should have compelling reasons to believe that they won't interact with each other, and ultimately the value of your test is only as good as the value of that judgment. If you're really not sure, you should just delay them. Better to get solid results than to get wrong results quickly. Similarly, things outside of your control can totally tweak things. Possibly you started running a test, and then on Wednesday you got featured on Oprah. Suddenly your traffic spiked, and all your traffic for the
entire week is the Oprah crowd on Wednesday. In this case you've just got to throw it away and start over, because fundamentally speaking, people referred to your website by Oprah on a Wednesday are not representative of your normal traffic. So in that case you've just got to accept that your test is ruined. But you got a lot of traffic, which is awesome. Start over, try again.

Host: You mentioned earlier a key point when we were talking about setting standards with that consultant: you ask them how long they intend to run a test for, and how they know when it completes. It seems like time can play an important role in this. Take an e-commerce site. I suspect that weekend purchases are behaviorally different from weekday purchases. How do you balance those sorts of effects that might come in by day of week, or perhaps even, you know, day of the month?

Chris: For these kinds of things, like day of the week, you should always run your tests for an integer number of weeks. If you start a test on Friday, you should end it exactly seven days later, and if you don't have enough visitors at that time, then run it for one more week, even if you get enough visitors the next day. You're spending extra time, but the fact is you do need to have the same number of Fridays and Thursdays in the test.

Host: One of the topics I wanted to ask you about was hierarchical models and how they can be applied in conversion rate optimization.

Chris: So hierarchical models are a fairly advanced technique. Most people, I would say, should probably not use them unless they really know what they're doing. Basically what a hierarchical model does is say, essentially, the following: I have a major effect. This could be variation A versus variation B. Now I also have sub-effects that are caused by, let's say, five segment categories. So I may have, I don't know, mobile, desktop, East Coast, West Coast, foreign. These might be my groups. They're overlapping groups, so I guess it would be mobile foreign, desktop foreign,
and so on. So what I'm saying is that the effect that variation A or variation B has on one of these groups is driven by a general factor, which is specific only to A and B, and then a special factor, which is very specific to that group. You then impose a Bayesian prior that says most of the special factors are zero or close to it. Then you'll typically run your Markov chain Monte Carlo. Most of the time, what these results will say is that the groups don't matter; all there is is this general effect. In the presence of strong evidence otherwise, they may show that this special effect actually dominates. So essentially hierarchical models are another way of saying, "Go ahead and segment, but we're going to correct for the multiple comparison issue with these segments. We're only going to accept that a segment has a real effect if there's strong evidence in favor of that, much stronger than we would accept if you were just looking for the general effect."

Host: So it seems to me that while conversion rate optimization is an important, critical step in any business, it's one that needs to be done with a lot of precision. I appreciate you sharing all your insights on common mistakes people can make and how to avoid them. I think doing proper CRO probably requires a good suite of tools. We touched earlier on some of the things Wingify has. Could you maybe regroup, or go over any we didn't touch on, and mention how some of your company's products could help a person who wants to get involved in CRO?

Chris: Okay, well, I'll mention one thing we're doing which is sort of an opinionated choice within the industry, and it's actually lost us quite a few agency customers, but we're sticking to it. So here's a very important fact. Let's say I ran an experiment 10 times and I had one success. The empirical conversion rate, that is to say, what happened during the test, is a 10% conversion rate. Now let's say I were to run the experiment a million more times. Are you absolutely certain that out of the
next million I'm going to get exactly 100,000 successes? The answer is of course not, that would be crazy. It could easily be 120,000 or even 200,000. What I'm trying to distinguish between here is the empirical conversion rate, which is what happened during the test, and the true conversion rate, which is what will happen if I run a million more experiments. The actual answer you want out of a test is the true conversion rate, not the empirical one. The empirical one is evidence that the true one is a certain number, but it's just evidence, it's not an exact number. What this means is that when you do the statistics, there is absolutely uncertainty in the true conversion rates, and uncertainty in the lift of variation A versus B or whatever. So it's vitally important that your tool should report this uncertainty. This was a very opinionated choice we made, where in our tool, every place you would expect to see a number, let's say "B was 10% better than A," we're actually giving you credible intervals, which are the Bayesian version of confidence intervals. We're saying B is between half a percent and 27% better than A. This has actually kind of pissed off several of our agency customers, because their clients don't want to see uncertainty, their clients want to see an exact number. And it's actually making it a lot harder for them to p-hack, because sometimes agencies have been reporting scores when the actual numbers were like minus 10% to plus 11%, which is actually not much of a win. But before we were displaying that uncertainty, the agency was just going to the client and saying, "Here you go, good number, pay me now." So reporting of uncertainty is truly, in my view, the most important thing with any statistics you do.

Host: Where can people follow you online? We'll of course mention your blog, but Twitter, anywhere else people can keep up with you?

Chris: Yeah, I'm on Twitter; mostly that's just tweeting when I write a blog post. So basically just my blog.

Host: Sure. If you would
mind give us the URL and tell us a little bit about what people will find in your posts sure it's www. Chris sto.com that's CH r i s St uou CIO and it's basically a mix of programming mathematics and whatever else I happen to find interesting at the time excellent yeah I'm a frequent reader and really enjoy it so thanks for coming on the show thanks for having me and until next time I want to remind everyone to keep thinking skeptically of and with data for more on this episode visit datas skeptic.com if you enjoyed the show please give us a review on iTunes or Stitcher
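
The credible intervals for lift that Chris describes in the interview can be illustrated with a short sketch. This is not the method used in Wingify's tool, which the transcript does not spell out. It assumes a uniform Beta(1, 1) prior on each variation's true conversion rate and estimates the interval by Monte Carlo sampling. The function name `lift_credible_interval` and the visitor counts are hypothetical, chosen only for illustration.

```python
import random


def lift_credible_interval(conv_a, n_a, conv_b, n_b,
                           level=0.95, draws=20000, seed=0):
    """Monte Carlo credible interval for the relative lift of B over A.

    Assumption (not from the interview): a uniform Beta(1, 1) prior on each
    true conversion rate, so each posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    lifts = []
    for _ in range(draws):
        # Draw one plausible "true" conversion rate per variation.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        # Relative lift: how much better B is than A, as a fraction of A.
        lifts.append(rate_b / rate_a - 1.0)
    lifts.sort()
    tail = (1.0 - level) / 2.0
    lower = lifts[int(tail * draws)]
    upper = lifts[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical test: A converts 1,000 of 10,000 visitors, B converts 1,100.
# The empirical lift is exactly 10%, but the interval shows how uncertain
# the true lift still is.
lower, upper = lift_credible_interval(1000, 10000, 1100, 10000)
print(f"B is between {lower:.1%} and {upper:.1%} better than A")
```

Reporting the pair `(lower, upper)` instead of the single empirical lift is the design choice discussed in the interview: the point estimate alone hides how wide the range of plausible true lifts is.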

Original Description

I'm joined by Chris Stucchio this week to discuss how deliberate or uninformed statistical practitioners can derive spurious and arbitrary results via multiple comparisons. We discuss p-hacking and a variety of other important lessons and tips for proper analysis. You can enjoy Chris's writing on his blog at chrisstucchio.com and you may also like his recent talk Multiple Comparisons: Make Your Boss Happy with False Positives, Guaranteed.

This lesson covers the importance of proper statistical analysis in data-driven decision making, with a focus on avoiding p-hacking and false positives in multiple comparisons. It provides valuable insights for data analysts and cybersecurity professionals.

Key Takeaways

Understand the concept of multiple comparisons and its implications on statistical analysis
Learn to identify and avoid p-hacking in data analysis
Apply proper statistical techniques to avoid false positives
Consider the impact of statistical analysis on cybersecurity decisions

💡 Proper statistical analysis is crucial in avoiding false positives and ensuring reliable decision making in data-driven fields, including cybersecurity.
