Model Explainability Forum

The TWIML AI Podcast with Sam Charrington · Advanced · 📐 ML Fundamentals · 5y ago

Skills: Research Methods 90% · ML Maths Basics 80% · Supervised Learning 70% · Unsupervised Learning 60%

Key Takeaways

The video discusses the importance of model explainability in machine learning, particularly in critical environments, and explores various techniques and tools for achieving explainability, including counterfactual explanations, feature importance, and open-source toolkits like IBM's AI Fairness 360 and AI Explainability 360.

Full Transcript

all right everyone welcome to the broadcast we will be starting very shortly uh sit tight as we allow folks to join and uh get ready for a very exciting and interesting conversation alrighty hey everyone and welcome to the broadcast i'm sam charrington host of the twiml ai podcast i am super excited to be joined by an amazing panel to take on the topic of model explainability as the use of machine learning in critical environments like government and jurisprudence business and other settings has exploded over the past few years the requirement to understand the decisions that machine learning models are making has exploded as well this is accentuated by the increasing popularity of opaque models like deep learning and all of this has set the stage for a really thriving field of model explainability in both research and practice and in this panel discussion we're excited once again to bring together experts researchers and practitioners to share their unique perspectives and contributions in this field so thanks so much for joining us we'll be exploring a bunch of really interesting topics but before i introduce our panelists i want to send a huge thanks to our friends at ibm for supporting this discussion ibm is committed to educating and supporting data scientists and bringing them together to explore technical societal and career challenges through the ibm data science community site which has over 13 000 members they provide a place for data scientists to connect collaborate and empower one another ibm's data science community is a great place to engage with other practitioners and access information and resources that inspire creativity and innovation go to twimlai.com ibm community to join and when you're there and when you join you get a free month of select ibm programs on coursera so at this point i would like to introduce our panel you may recognize some of them from previous podcast interviews and if you are following along on youtube we will be dropping 
links to those uh in the chat but please join me in welcoming them first up is uh rayid ghani rayid is a professor in the machine learning department in the school of computer science uh and the heinz college of information systems and public policy at carnegie mellon university rayid thanks so much for joining us great i think we fought over the mute button right okay um are you introducing everyone or i am introducing everyone i just wanted to say hi to you next up is uh solon barocas solon is an assistant professor in the department of information science at cornell university as well as a principal researcher in the new york city lab of microsoft research and he'll be speaking on hidden assumptions behind counterfactual explanations i think i missed rayid's topic which is explainability use cases in public policy and beyond uh next up is kush varshney kush is a distinguished research staff member and manager at ibm's tj watson research center focused on artificial intelligence and kush will be speaking on model explainability as a communications challenge next up we've got alessya labzhinova alessya is the cto of a stealth startup and former cto in residence at the allen institute for artificial intelligence and she'll be speaking on stakeholder driven explainability and last but certainly not least is hima lakkaraju hima is an assistant professor at harvard university with appointments in the business school and the department of computer science and she'll be speaking on adversarial attacks misleading explanations and solutions so a quick note about our format today each panelist will be presenting for about five to eight minutes on their topic following these presentations we'll be opening up the floor for discussion and audience questions and it's really my sincere hope for this session as with all of our discussion events that you the audience drive uh a good part of our discussion today we should have about 40 
minutes for the audience driven segment of today's event so please be sure to note your questions in the youtube chat so that i can relay them to our panelists finally we are looking forward to bringing you more discussions like this on a wide range of topics to be notified when we schedule future discussions subscribe to our newsletter at twimlai.com newsletter so let's get started uh we'll be kicking things off with rayid ghani thank you sam and thanks to all of you who are here um hopefully this will be an interesting discussion so i'm gonna sort of quickly frame some of my thoughts and then um leave a lot more time for the more interesting conversations um so as sam mentioned i'm at carnegie mellon in the public policy college and in the computer science machine learning department and a lot of the work that i've been focused on the last several years is really looking at how do we build um pick your favorite buzzword machine learning ai whatever is trendy and ibm wants to sell today uh and human collaborative systems right how do we build systems that work together with humans and machines to solve social and policy problems that end up with um fair and equitable outcomes for people so that's kind of the framing um of a lot of the projects that i work on in healthcare in education criminal justice policing economic development workforce and over the last many many years of working on this what keeps coming up is this issue of explainability um across different types of problems across different types of users um and if you sort of you know look at a lot of the research that has been done we think of sort of explainability as this monolith concept where we're trying to explain what some ml ai model is doing um in reality in actual you know you're trying to solve a specific problem you've got a lot of different use cases like there's a whole you know taxonomy of use cases that comes up um especially in public policy problems right so most public 
policy problems are luckily not automated um but they're also not just humans because you know humans are not the greatest decision makers um neither are machines so in most public policy problems you're sort of working interactively collaboratively and so what we've been doing over the last few years is looking at you know specific use cases that show up and developing a taxonomy um that consists of what the use case for explainability is who the users are and based on that what these methods need to do and the idea behind that is that we can use this taxonomy to evaluate the applicability of existing methods so when a new method comes up we can kind of figure out does it even apply to these types of problems uh and if it doesn't what are the gaps that we need to focus on to kind of give the structure to this type of work so i'll kind of give you some of these use cases and there are kind of five big use cases that involve different types of users and goals for each of them so the first one really is um people who build machine learning ai data science systems or people like us who are building them the use case there is really debugging um where i'm building a system and before i even show it to somebody else i want a sanity check i want to debug and a really common example is detecting leakage right we build a model and we look at some sort of explainability thing and we see ah there's a feature that's really important and it turns out it's a proxy for the outcome variable and we're not going to show this model to somebody because it would be really embarrassing uh we've all been there um so that's a very simple use case where we want to debug the model that we're building um and we have to kind of deal with that a second use case is we actually need to instill trust and credibility in the people who are making decisions with the model we need to kind 
of make sure they trust it in order for them to use it and there the users are the policy makers often or the people who are managing teams who are going to use this so they need to trust it and in order for them to trust it um it needs to sort of give them um comfort um and some sort of understanding of what it's doing and why it's doing it it doesn't need to do things around you know individual explainability and local explainability it's kind of more of a macro view and the reason we want to make sure we do that is if they don't trust the system they're not going to use the system and if they're not going to use the system who cares how accurate it is it's not going to have any impact um so it's a waste of time building one that doesn't get used right um a third use case is in order to improve the performance of the system we need explainability and let me sort of describe what that means right um a lot of these systems for example one of the systems we were building a couple of years back was working with the health department and working with a hospital system to identify which hiv positive patients are not gonna come back for their next appointment if they don't come back and get their prescription um they uh become likely to spread uh and that's a problem and so in this case there's a person in the middle who was getting a recommendation saying this person is unlikely to come back um now in a lot of these real problems the computer is mostly wrong it's better than a human it's better than random but it's still 30 40 percent let's say correct and so in that case we want the person who's taking this recommendation to sometimes override the system and sometimes agree with the system and ideally an explanation can help them figure out oh this explanation sounds a little fishy if it says you know this person is not likely to come back because they were born on a monday uh i don't 
don't really buy that that might be accurate to what the model thinks but the model is probably wrong in this case so if i can use this type of hints to override the model then the performance of the overall system increases and that's a different use case than the other two the fourth use case is very similar the same user but different goals the first use case is i i'm predicting what if somebody is going to come back to the doctor or not but my goal is not the prediction my goal is to assign them one of many interventions are they not going to come back because of transportation issues are they not going to come back because of um they they forget or because they don't think it's helping them or because they're just tired of coming back each of those reasons for not coming back has a different intervention and we need to help this person in the middle uh figure out which intervention is going to be most effective and an explanation is a really good way of doing that but the method you need to develop for that explanation is very different than the one for debugging because it's a different type of user and for trust because we're not just asking them to trust the system we're asking them to to assign an intervention um a fifth use case is recourse and that's been one that's been studied quite a bit where i'm not trying to to to help somebody make a decision i'm telling you you were denied a loan because of a b and c and if you could only change b if you can increase your income then we will be able to give you the loan and that's um the user there is the person who's being affected by the the ml decision um and and we need to develop something that can you know that's actually uh that can help with with the recourse there are other users as well these are the big ones other ones you know for example looking at bias and fairness can an explanation help us better understand what's going on so we can be more confident about fairness things we can figure out another 
couple things around how useful is this particular data source um but the idea sort of behind these five use cases is that each of these use cases has a different set of users different goals um and that means different methods for explainability need to be developed for each of those use cases the problem is today a lot of the work is you know very generic where the people are looking at sort of you know it's mostly not tested on real data it's sort of standardized you know some standard data set it's not really tested on real problems it's made up problems it's not tested on real people it's mostly mechanical turkers who are kind of real but they're not a proxy for social workers physicians um you know um anybody sort of doing real things and then they're not tested on real metrics they're tested on some useless metric like auc which nobody cares about in the real world um i'm being provocative on purpose uh just to poke uh people who are gonna then complain and hopefully the idea behind doing this is that what we're trying to do is set up um this taxonomy so that people who are working in this area start with a real problem if they have a real problem a real use case figure out which goal you're trying to achieve and build explanation methods for that goal or if you're starting from a method identify which use cases it fits into and then partner with people who have that type of a problem work with people problems and data that match that and then sort of hopefully set up a test bed where we can collaboratively work on these types of problems because these require real partnerships these require access to real problems real people and real data and it's not a give me some data and work on it it requires access to people interactively and so i think what we need to do here is to sort of build a collaborative um setup where people interested in these areas have access to these things um and in return they have to build 
something that is useful for the organizations that are helping them so yeah so that's kind of my quick uh overview um and looking forward to hearing from the other people and having a good discussion awesome uh rayid there is a bunch of interest in learning more about your use case taxonomy is there a uh paper or a link that we can we are actually writing something up right now so it's almost done so we'll post that in probably the next couple of weeks okay so uh rayid will keep us posted and we will do our best to keep you posted uh following this event we will be posting up a notes page with a bunch of resources and um we'll send you an update as to where to find that and just keep an eye on that all right uh next up is solon barocas solon thanks sam um i'm really pleased to be able to follow on from rayid's excellent taxonomy um what i'll talk about today um i think most closely relates to what rayid had characterized as this recourse type of explanation um and i'll present some ideas from joint work that i've done with andrew selbst who's a professor in the law school at ucla and manish raghavan who's a cs phd student at cornell um so in particular um we looked at this kind of increasingly popular style of explanations known as counterfactual explanations um and for those who are unfamiliar this is explaining a decision by sort of saying along the lines of what you've described you know what would have to have been different for you to have achieved a different outcome so if only you had made you know two thousand dollars more maybe you would have been able to uh obtain this loan an explanation of that style um and although this is a sort of recent wave of work within computer science this style of explanation has a much longer history actually in credit where at least in the united states there are um a set of laws that require that when people are subject to an adverse decision like a loan rejection uh that creditors actually need 
to give explanation for why that person was denied the loan um and these explanations take the style of offering so-called principal reasons so specific reasons for the denial um and that might be not specifically that you should have made more money but perhaps that you know your income was too low or maybe that you're you know you weren't at your job long enough for things of this sort um this particular style of explanation has become very popular for a number of reasons so first of all it sort of seems to sidestep what had at least for a couple years seemed like a potential catastrophe for machine learning where legal requirements for explanations were potentially seen as hard constraints on model complexity and if model complexity is one of the sources of model performance there seems to be a kind of strong tension between the requirement to explain and the possibility of developing highly accurate models and with this style explanation there is no constraint on model complexity because the model itself can be arbitrarily complex all that really matters is they'll be able to provide these specific kinds of explanations similarly there's a sense in which these styles of explanations can help protect intellectual property and limit gaming because there's no disclosure of the of the underlying model and so instead they sort of piecemeal small explanations of a particular decision not the model overall it could also provide what feels like a justification for an adverse decision or in the case of counterfactual explanations uh concrete instructions for what you would have to do to achieve a different outcome so not just an explanation but an explanation that gives you instructions for trying to change your your chances of succeeding in the future and finally it can automate the process of generating these explanations so rather than being something that someone has to work out manually there's now an automatic procedure that people using machine learning models 
can employ to produce these explanations and in some cases comply with the law and so in this paper we try to point out that while this is a very attractive proposition and there are good reasons to actually consider this given these desirable properties there are a number of challenges here that actually make it difficult to work in practice and that we really need to think carefully about these challenges before we adopt this style of explanation with the expectation that it will serve the kinds of goals i just mentioned so i'll just talk a little bit um about those assumptions um and maybe touch at the end on um some ways the law thinks about this as well okay um so the most basic point is that um these counterfactual explanations or principal reason explanations are essentially saying that if some feature value had been different then you would have received a different outcome um and what's important to recognize here is that there's an infinite number of ways to possibly generate explanations of that sort there's no right answer right you can give many possible explanations of that type which just happen to highlight a different feature that would need to change and so there's a lot of latitude on the part of the decision maker when it comes to which features to actually highlight and which ones to kind of instruct people to change and here the thinking is that we might want to choose those features which are sort of easiest for the person to change and in fact many of the papers that have proposed um this kind of counterfactual explanation method have tended to adopt the heuristic which tries to find features that would be easiest to change um but the challenge here is that often the kind of feature values that would have to change don't necessarily map on to discrete actions in the real world um so for example along the lines of the example i gave earlier let's say that a credit decision an adverse credit decision is explained by saying that you need to 
increase your income in order to obtain the loan in the future um but actually there's sort of you know multiple real world ways what you can go about increasing your income so one way might be getting a new higher paying job um or another way might be that you sort of stay on your job but wait for a raise you know work hard in order to get a raise the problem is that the model itself might actually be looking not just at your income but actually might also be looking at length of employment and in fact this is a common feature in credit models and it turns out that the actions i just described affect not only your income but also simultaneously affect your length of employment um and so on the one hand what this is showing is that somehow sometimes features might not be independent and explanations that choose to highlight one feature rather than another might not recognize that these features are in fact related to each other but more than that what this is showing is that in the real world the actions available for people to take to change a particular feature might not narrowly line up with the specific specific feature that is being highlighted and so this is actually a pretty tricky problem in part because we often try to kind of give these discrete subset of features that are kind of most easy for people to change without thinking about this possibility the second challenge we talk about in this paper is that again we might think that uh you know it would make most sense to tell people to change features which in some sense we think of as being the easiest change for them to make um but the way that we try to determine what would be easiest is often by just looking at the training data itself uh so we say you know it seems to be the case that um if we normalize the features these are the dimensions along which you in your particular case would have to make the smallest amount of movement in order to get to the other side of the decision boundary for example 
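The heuristic described above (normalize each feature by its spread in the training data, then pick the feature that needs the smallest normalized movement to cross the decision boundary) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any specific paper discussed in the talk: the linear scoring model, the feature names `income_k` and `years_employed`, the weights, and the training rows are all hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Counterfactual:
    feature: str            # the single feature the explanation highlights
    change: float           # required change in the feature's raw units
    normalized_cost: float  # |change| divided by the training-data std dev


def score(weights, bias, x):
    """Linear decision score; a score >= 0 means the loan is approved."""
    return bias + sum(weights[name] * x[name] for name in weights)


def cheapest_single_feature_change(weights, bias, x, train_rows):
    """Return the one-feature change with the smallest normalized cost.

    "Cost" here is measured only against the training distribution,
    which is exactly the assumption the talk goes on to question.
    """
    s = score(weights, bias, x)
    if s >= 0:
        return None  # already approved, nothing to explain
    best = None
    for name, w in weights.items():
        sigma = pstdev([row[name] for row in train_rows])
        if w == 0 or sigma == 0:
            continue  # changing this feature cannot move the score
        change = -s / w  # movement that lands exactly on the boundary
        cost = abs(change) / sigma
        if best is None or cost < best.normalized_cost:
            best = Counterfactual(name, change, cost)
    return best


# Hypothetical credit model and data (illustrative numbers only).
weights = {"income_k": 0.1, "years_employed": 0.5}
bias = -8.0
train_rows = [
    {"income_k": 30, "years_employed": 1},
    {"income_k": 50, "years_employed": 3},
    {"income_k": 70, "years_employed": 5},
    {"income_k": 90, "years_employed": 7},
]
applicant = {"income_k": 40, "years_employed": 2}

result = cheapest_single_feature_change(weights, bias, applicant, train_rows)
print(result)
```

With these made-up numbers the sketch highlights income, because a 30-unit income change is smaller relative to the training spread than six more years of employment. The sketch treats each feature as independently changeable and assigns the same cost to every applicant, so it does not capture that a new job would change income and length of employment together.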
um but the problem here is that we're sort of normalizing these features according to the distribution of those features in the training data not according to actually in the real world what would make certain feature changes more or less costly or difficult and i think what's particularly challenging here is that although we might have some intuitions for certain kinds of feature changes that would be more difficult than others so for example getting a new job might seem to be more challenging than getting a raise perhaps in practice those costs of changes might vary considerably by person right so it's not just that these costs are sort of fixed and the same across all people that you might be subjecting to this decision costs are going to be relative given people's individual circumstances and it's going to be very difficult for any explanation to take that into account um and so while we might be uh sort of aiming to give explanations which seem easiest for a particular person often what we're doing is sort of giving explanations that seem easiest in general for the population even though we recognize probably if we think about it that circumstances and costs are going to vary dramatically by person and this again is a very serious challenge the third point we raise is that often the features that figure into a model might be relevant not just to the decision that is being explained to the particular person but to many other decisions that person might care about so for example income might be an input to a credit model but income is often something that's relevant to many other domains in your life and so you might think that trying to explain to someone that in this kind of hypothetical example a credit model might say that actually you would be successful in your application had you earned less money right this is unlikely to really exist it's probably the case that most people constrain their model not to learn such a lesson but if it were the case 
that a credit model expected you to earn less in order to be effective in your application um it might suggest that this is the explanation that you receive you know earn a thousand dollars less and then you would have been successful the problem with this of course is that income as i was mentioning is relevant to many other goals people might have in their life so it would often be irrational for a person to sort of narrowly choose to make less money in order to get credit even though they know that income is valuable with respect to many other goals they have in their life and so choosing what explanations to give might be really challenging when we don't have a full view into the way that these features actually are important to other goals and other decisions in people's lives and the final point we raise um has to do with monotonicity as well as a number of other model properties but i'll focus on monotonicity so here um one of the challenges is that you might say like okay make more money uh stay in your job a longer amount of time and so on and so you give some directionality for the changes that someone should be making and sometimes you even give a specific amount of change they should be making um and i think the kind of intuitive understanding of such an explanation will be that if i'm incrementally advancing the value of this feature toward the goal that you've given me i expect that my chances of success are going to increase but if it's not monotonic it's very possible that actually as you kind of increase the value of this feature maybe your chances get worse right and similarly it's possible to imagine that people struggle to kind of meet the mark that you've given them or maybe they even overshoot the mark because it's not possible to so precisely control the value of a certain feature and if they overshoot the mark maybe actually that's going to be worse for them than if they had you know kind of undershot it or something like that and so 
here um in the absence of these kind of monotonicity constraints uh you might have people engaging in what seems like the rational behavior that you've instructed them to take but they end up being in a worse position than they would have been had they taken no action at all um and so in this paper which i hope there's a link to i think there is yep uh we try to explain how these are quite serious challenges those being asked to give explanations of these decisions will lack the necessary information to take these kinds of factors into account and there's no easy fix for these there's no easy way to avoid them and i'm happy hopefully to talk more about that in the q a thanks a lot yes all right um okay yeah thanks sam and uh yeah to start off um let me just say that i'm gonna repeat a few things that rayid said but maybe with a different perspective so firstly why is trust becoming a top priority for companies so it's because there's an increasing need to deal with more and more regulation companies are having to deal with the increased complexity of their machine learning deployments they want to maintain their brand reputation and in the current moment they're also focusing on social justice but when i talk about trust what is trust uh what does it take for someone or something to be trustworthy um so think about it for a moment everyone what does trust mean to you so it turns out that in the organizational management literature um they've identified four attributes of being trustworthy um the first is being competent so you do the thing that you're supposed to do well the second is being reliable so that competence sticks around in different conditions and so forth the third attribute is being open intimate or communicative and the fourth is being selfless so working towards goals that go beyond yourself and uh all four of these attributes are more or less the same whether you're talking about trustworthy people or trustworthy 
machines um so for machine learning systems um what does that mean so we want high accuracy that's going to contribute to the competence we want fairness as well as robustness to distribution shifts and adversarial attacks and that leads to the reliability we want interpretable and explainable machine learning systems along with transparency achieved by mechanisms such as fact sheets for the openness and finally we want machine learning to be developed hand-in-hand with its use for uplifting humanity so one of ibm's early ceos thomas j watson senior had this quote that the toughest thing about the power of trust is that it's very difficult to build and very easy to destroy and i'm not going to talk about destroying trust here but let's think about why it is difficult to build up the trust especially for a machine learning system right um so focusing on that third attribute of trustworthiness which was um the openness and communicativity so what we need is actually a strong relationship between the human and the machine and we're really dealing with a communication problem aimed at some sort of mutual understanding that we're trying to develop so we want the machine to understand us and for us to understand the machine and explainability helps us to understand the machine um and as rayid was saying i mean there's different use cases different personas um and everyone has different needs there's many ways to explain and one size doesn't fit all so repeating a couple of his examples right so if you're an affected user about whom decisions are being made you're going to care about your own decision and so local post-hoc explanations tend to work well and the sort of uh contrastive explanations that solon talked about um are relevant here but regulators are in a different world right so they need to understand the entire system and not leave opportunities for corner cases to wreak havoc so they would actually prefer to work with directly interpretable 
global models um so that uh everything is apparent in front of them and this is the approach that our group at ibm research actually took in winning the fico explainable machine learning challenge the directly interpretable approach and uh more importantly and most importantly i would say it's the decision makers themselves who are being supported by the machine so folks like doctors loan officers judges and so forth that have to understand how it is that the machine predictions came about so that they can assimilate those predictions with their own independent assessment to reach a final decision and it's the need for all these different ways of explaining which is why our group has created the open source ai explainability 360 toolkit as well so coming back to this decision maker persona so we should always remember that we're dealing with a communication problem okay so the individual accuracy of the machine learning model is only one part of the overall accuracy of the decision making system which involves both the machine and the human decision maker so in some of our past work we set up this explainability problem abstractly as a two node distributed detection problem with the machine communicating to the human right so we mathematically proved using chernoff information that more explanation yields better overall system accuracy so this so-called accuracy explainability trade-off that some people refer to is actually false right so um in some even more recent work um we've been looking at things again abstractly through this chernoff information and have been able to show that fairness can be actually improved by giving more explanation for members of unprivileged groups this leads me to the final point that i want to make which is that the community has started thinking in terms of different worlds or different spaces so there's a construct space which is a pristine world free of biases and then we have the observed space in which we get the data and 
it can have many different biases and lastly we have a prediction space which is where machine learning model outputs live and most of the work that deals with the first two attributes of trust including accuracy fairness and robustness tries to make sure that the mappings between these spaces are mitigated or defended against biases but i would argue that there is a fourth space that i would add to the end and that's a perceived space of the human decision maker so the receiver of the model outputs is a person and that person has inherent cognitive biases and limitations and the information that they perceive or receive from the machine is colored by anchoring bias confirmation bias weak evidence bias and so on and we actually should be compensating for it as we work on explainable machine learning so this is a sort of last mile communication problem bridging the channel between the machine output and the human perception and for intimacy this third attribute of trustworthiness we also have to go beyond looking at explanation as simply a math problem and really work on how to get machines to speak to people in our language to speak to us and the language of causality might help in this regard and just to wrap up let me just say again repeating some things that rayid had mentioned that having this broad-based perspective that touches on different ways of explaining different attributes of trust and different spaces needs to also be coupled with a tight loop of serving non-profit and for-profit partners conducting fundamental research and creating open source toolkits because i think that really is the best way to make progress towards machine learning systems that we as people can really work with am i still muted was i still muted wow so i was saying kush thank you for adding that i think the idea of the perceived space is an interesting one and going beyond explanations as math as well and i think both of those are
a perfect cue and tee up for the topic that alyssa is going to be elaborating on and that is a stakeholder driven approach to explainability alrighty alyssa thank you for the opportunity to participate in the discussion today it's a pleasure to share this virtual stage with you and other distinguished members of the ai community and let me preface by saying that a lot of the things that i would like to say are very much dovetailing on what rayid and kush already said so we're very well aligned even though we didn't really prepare so my name is alyssa lapinova i'm the co-founder and ceo of a company currently in stealth that builds monitoring and debugging tools for enterprise ai applications and today i'll talk about why it is crucial to define your stakeholders before you embark on designing and implementing explainability tools for enterprise ai applications so a little bit about where i'm coming from throughout my career i've been on a mission to build bridges between machine learning scientists engineers and executives i did this in numerous capacities at amazon and the allen institute for ai now doing this at the startup i spend essentially my days talking to engineers scientists and executives about deploying models to production and i encounter challenges with model explainability deployments very often and as you've heard from other panelists today there are many nuances to model explainability endeavors let's focus on how to think about stakeholder personas when you're embarking on an explainability endeavor and which personas can benefit from the techniques available to us today so who are these stakeholders well there are many types and the ones that come up more often in enterprise projects are builders researchers executives regulators domain experts and end users other panelists have touched on some of these stakeholders already a great example of a taxonomy of how to think about these different stakeholders is available in a paper called explainable machine
learning in deployment by umang bhatt and his colleagues at cambridge and partnership on ai today i will briefly focus on three stakeholder personas that come up very often in enterprise ai and these are builders regulators and domain experts these three groups are broadly compatible they all deal with ai deployment in the wild however they present very different requirements for explanations moreover the explainability technology that is available today offers very different degrees of benefits and limitations to each of these stakeholder requirements so let's dive into each of these stakeholders very briefly first builders builders are the data scientists and engineers building ai applications in organizations that's one way to define them they look to explanations to aid them in validating testing debugging and monitoring models and predictions there are many open source and enterprise explainability tools built specifically with builder stakeholders in mind the ibm ai fairness 360 and ai explainability 360 toolkits were already mentioned one limitation in the existing techniques for builder stakeholders that comes up very often is that explainability techniques typically have really high computational requirements and are infeasible to deploy at scale in real time systems so when you're building explainability tools for builders one thing to keep in mind is that you need to design to avoid real-time requirements however that's just one limitation and based on various industry surveys we see that organizations are quite successful at deploying explainability tools for these stakeholders so if you are considering an explainability initiative at your organization the builder stakeholders are a good starting point so the next group of stakeholders i mentioned are regulators regulators are either internal or external parties involved in audit and compliance activities for ai applications in highly regulated industries such as finance
insurance and health in many cases regulators look to explainability to understand whether a model presents bias towards a specific group this requirement however poses a challenge because explainability techniques so far have proven unreliable in explaining a model's fairness and bias this limitation is really well demonstrated in a recent paper titled you shouldn't trust me very fitting by botty dimanov and his cambridge colleagues so that's something to keep in mind when you're deploying explainability for regulator stakeholders another challenge for this group of stakeholders is that techniques that rely on input perturbations are extremely vulnerable to adversarial attacks this limitation has been exposed by hima lakkaraju and her colleagues in the paper called fooling lime and shap hima is here today and i'm excited to hear directly from her on this topic so given these limitations specifically for regulator stakeholders an organization might decide that an explainability project for these stakeholders is not a feasible endeavor at the moment there are however many excellent research initiatives to address both of these shortcomings and i'm hopeful that we'll see solutions soon so the final group i'll talk about briefly are domain experts domain experts are individuals who are tasked with auditing the model behavior and ensuring that predictions align with expert intuition these stakeholders are most prevalent in human in the loop ai applications as rayid mentioned earlier for domain experts a core challenge with explainability techniques that i see is that most explainability techniques do not have causality and uncertainty underpinnings this limitation makes it really hard to align explanations produced by explainability techniques with human intuition because it's hard for humans to think without causality and uncertainty so when building explanations for domain experts it's important to educate them about the lack of causality and
as far as uncertainty goes there are some emerging techniques such as clue that incorporate uncertainty you can check out a paper called getting a clue very fitting by javier antoran for more details on that technique and i'm hopeful that we'll see more explainability techniques that are intuitive and have causality and uncertainty underpinnings so in conclusion we live in an age of tremendous progress in ai we have incredible researchers working on moving the state of the art and new explainability tools and techniques are being developed every month however i think there's a disconnect between what existing techniques are capable of and what the practitioners require so in order for the state of the art to move in the right direction i believe organizations which are implementing explainability techniques internally need to define stakeholders carefully outline requirements and identify limitations with existing explainability techniques these requirements and limitations then can be shared with the research community and by facilitating such community engagements we can push the state of the art forward and better address the problems that are faced by practitioners today and i'm excited to talk more about this in the q a thank you awesome thanks so much alyssa i think that was a great talk and also a great summary i think of some of what we've talked about thus far and as well a great introduction to hima's talk who is up next yeah thank you sam hi everyone thank you so much for having me here i'm super excited to be part of this panel with other amazing guests here and i'm so glad that i'm going after alyssa because she's given as sam pointed out a very nice introduction to some of the problems that i'll be talking about today so over the past year or so me and my group have been spending a lot of time thinking about understanding the vulnerabilities of post hoc explanation techniques and how these vulnerabilities
can be exploited by adversarial entities to generate misleading explanations and how to mitigate these vulnerabilities alongside these we are also thinking a lot about what are the different ways in which domain experts in different domains might actually interpret explanations and does that match a designer's understanding of how an explanation would be interpreted so i'll just give a quick summary of some of the recent work that we did in this space and i think the links that i've already provided to sam should hopefully be up and you can know more about these topics through those links and documents okay so let me start with one of the pieces that alyssa pointed out which is a lot of the explanations that we have today are not necessarily causal and they're counter factual that really affects how domain experts think about explanations and there is a risk that the explanations might mislead them to trust a model that they should not have trusted to begin with right is this the case or are domain experts really understanding the models or the explanations the way they are supposed to be understood we were very curious about this question and we did a user study with law school students about 50 to 60 of them what we did at a high level was we constructed a classifier which is a rule based classifier which uses race and gender as predominant attributes when making decisions as to if somebody should be released on bail or not right so clearly we explicitly tried to design a biased classifier and then what we did was we constructed an explanation which is also the same kind of rule based model which means the functional forms are the same between the explanation and the actual biased classifier and this rule based explanation while it mimics the predictions of the biased classifier it basically visibly removes race and gender as the features in the explanation so potentially we had replaced those race and gender features with their correlates which means
the explanation is doing exactly the same thing as what the model is doing in terms of predictions it's just that visibly the explanation does not show features like race and gender thus making it look innocuous then what we did was we recruited these 50 to 60 law school students and we basically randomly assigned each candidate or each user either to this actual true rule-based classifier which has race and gender in it and on the other side we assigned some users to the explanation which looks innocuous but basically does the same thing as the classifier right and we asked these people a question which reads like here is an explanation provided by state of the art machine learning looking at this explanation would you trust the underlying model enough to deploy it when we asked this question to them of course people who saw the true biased classifier with race and gender as predominant attributes said this is not trustworthy we don't want to deploy it anywhere whereas people who looked at the explanation where race and gender were removed and in fact we had also added a couple of other attributes like prior arrests and so on which were often things that domain experts thought should appear in these kinds of models when making bail decisions in that case the overwhelming majority said oh this looks innocuous this seems to have a high accuracy this is mimicking the underlying model well so yeah we can trust it right so the simple kind of visual distortion has already confused people into trusting and potentially agreeing to deploy a model that should not have been deployed in the first place and the issue is as alyssa pointed out that explanations are not causal or at least most of the explanations generated by techniques today are not causal and that needs to be communicated appropriately to the end users and domain experts because for them the notion of explanation seems to be strongly coupled with thinking causally so that is one piece
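The surrogate-explanation setup in this user study can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the study's actual rules or data: the feature names (`race`, `neighborhood`), the 95% proxy strength, and the single-rule classifiers are all hypothetical. The point is only that a rule which never mentions the sensitive attribute can still agree with a biased rule almost everywhere when it uses a correlated feature.

```python
import random


def biased_classifier(person):
    # Hypothetical biased rule: the decision depends directly on a sensitive attribute.
    return "deny" if person["race"] == 1 else "release"


def innocuous_explanation(person):
    # Hypothetical surrogate rule: never mentions race, only a correlated proxy.
    return "deny" if person["neighborhood"] == 1 else "release"


def make_population(n, proxy_strength=0.95, seed=0):
    # Synthetic people; the proxy equals the sensitive attribute
    # with probability proxy_strength (an assumed value).
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        race = rng.randint(0, 1)
        neighborhood = race if rng.random() < proxy_strength else 1 - race
        people.append({"race": race, "neighborhood": neighborhood})
    return people


def fidelity(people):
    # Fraction of people on whom the explanation agrees with the model.
    agree = sum(biased_classifier(p) == innocuous_explanation(p) for p in people)
    return agree / len(people)


population = make_population(5000)
surrogate_fidelity = fidelity(population)
print(f"fidelity of the innocuous-looking explanation: {surrogate_fidelity:.2f}")
```

High fidelity here says nothing about whether the underlying model is fair, which is the gap the study participants fell into.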
and then the second piece that i'm super excited about which alyssa again gave a quick peek into which is we also thought a lot about designing adversarial attacks for existing post hoc explanation techniques for example i'm sure a lot of you are already familiar with techniques like lime and shap so one of our works basically deals with how can an adversary attack methods like lime and shap in order to generate explanations which look innocuous but actually are misleading so the attack is designed to fool the explanation method into generating a misleading explanation and at a very high level the intuition for this work also comes from some of the vulnerabilities of these methods the main one being methods like lime and shap rely on what are called perturbations which means methods like lime and shap typically try to explain individual predictions of a classifier so what these methods do is you pick a point that you're trying to explain perturb the point generate a bunch of instances and then build a linear model for example on top of this which is interpretable enough so that you can read the coefficients of the features and understand which feature is important and by how much and so on so that's the rough method that they use what we found during some of the experimentation was that these perturbations that are being generated by these methods are actually out of sample they don't correspond to the distribution of the actual data instances which means an adversary can potentially exploit this and basically design a simple wrapper which does the following to fool this explanation method what could the adversary do the adversary could basically say that if a data point is likely to have come from the distribution of the data then you know do something unfair with it like for example maybe use race or gender or another sensitive attribute in determining its
label right or its prediction on the other hand if the data point is a perturbation which is kind of out of sample then what you do is you basically try to keep it very fair or like you know use some very innocuous attribute to give its prediction by doing this what happens is you are creating a classifier that is very fair on the perturbed points but very unfair on the actual instances since lime and shap rely a lot on perturbations to generate explanations this kind of drowns out completely the fact that the model is being unfair towards the actual data instances and what you see in the explanation is the innocuous variables that are being used to make predictions on the perturbations or the perturbed data points so that's another very simple attack which breaks these kinds of methods and lastly we have also been thinking a lot about solutions to these problems and recently we have another work at icml where we basically thought about or rather proposed to bring ideas from adversarially robust classification to explanation methods and use those ideas and of course adversarial training and so on to come up with explanations which are generally robust to shifts in the data and are also more stable so these are two problems as kush mentioned for the current explanation techniques and to alleviate those we also proposed using adversarially robust classification techniques so with that i think i would like to stop there and i'll give it back to sam alrighty thanks so much hima all right so with that we are ready to transition to the discussion segment of our broadcast today please take this opportunity to get your questions into the chat you're probably watching this on youtube if you would take a moment to hit the like and subscribe button down below we would certainly appreciate that and to my fellow panelists this is an open discussion you're welcome to ask questions as well of
your fellow panelists i will be relaying questions in from our audience and to get things started i wanted to maybe just kind of recap a little bit i think the key takeaway for me from kind of the sum total of all of your talks was that you know while it can be tempting to kind of take an explainability algorithm off the shelf and think we're just going to sprinkle it on our algorithm there's a lot of work and thinking that needs to go into making this process work in the real world we need to be thinking about the use cases which rayid provided a taxonomy for we need to be thinking about who the algorithms are for and alyssa's personas were very interesting there several of you touched on the forms of these explanations whether causal or otherwise the receiver of the explanations came up a couple of times the personas again as well as this idea of the perception so getting in the head of the person who's seeing the explanations and of course the issues around trust and vulnerability trust came up several times and of course hima focused on that yeah one question that comes up quite a bit in this conversation around explainability is the idea of explainability versus interpretability so kind of an inherent interpretability of the models versus being able to explain them and i'd love for you all to speak to that issue there's a bunch of questions there questions like should we be you know when and where should we be using models that aren't explainable or aren't interpretable inherently but i imagine this is something that you all have takes on so i'll just go around the horn with this one starting with hima oh great thanks sam so i remember having this discussion with you it's like a deja vu but yeah so yes i think i mean both interpretability and explainability today clearly have a place in our literature because we see you know multiple research articles and
papers being published on both these topics and i think my take on it and i would love to hear what others are thinking about this is i think the talk itself kind of exposed a bunch of you know vulnerabilities or things that we need to be mindful of when thinking about typically post hoc explanations right so because you know we are not exactly sure or rather we are sure that most of the current algorithms are generating more correlational explanations and not causal ones whereas people are interpreting an explanation to be causal and not correlational so there are a bunch of different kinds of gaps there so if given a chance and if your domain is permitting you to come up with an interpretable model from scratch and you have the infrastructure to do so you have the data to do so your model is accurate as well as interpretable that is probably the best and you know safest route to go on the other hand that is also probably somewhat idealistic given the various you know different aspects in the setting like for example maybe you don't have enough data to train a model that is accurate or you're seeing that there is a trade-off between accuracy and interpretability and you're coming up with a complex model because you can't afford to lose the accuracy right and in some cases what you have is like probably a company or you know another organization has handed to you this black box that for whatever reason you know a doctor in a hospital is being asked to use because you know the organization has made that decision for instance and in this case also you will not have access to the internals of the black box and most likely the black box might also not be interpretable on its own right so in all these cases where you can't afford to build or train your interpretable models from scratch then the only option that is left with us is to fall back to post hoc explanations any reactions to that
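One of the vulnerabilities of post hoc explanations mentioned here, the wrapper attack on perturbation-based explainers described earlier in the talk, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method from the fooling lime and shap paper and not the real LIME or SHAP libraries: the features are binary, the hand-written `looks_real` check stands in for a learned out-of-distribution detector, and `perturbation_importance` is a hypothetical LIME-style estimate that samples features independently and ignores LIME's locality kernel.

```python
import random

FEATURES = ["race", "income_high", "c1", "c2", "c3", "c4"]


def looks_real(x):
    # Hypothetical stand-in for a learned out-of-distribution detector:
    # we assume that in real data the columns c1..c4 always agree.
    return x["c1"] == x["c2"] == x["c3"] == x["c4"]


def biased_model(x):
    return x["race"]  # unfair: uses the sensitive attribute directly


def innocuous_model(x):
    return x["income_high"]  # looks harmless


def adversarial_model(x):
    # The wrapper: unfair on realistic inputs, innocuous on perturbations.
    return biased_model(x) if looks_real(x) else innocuous_model(x)


def perturbation_importance(model, n_samples=20000, seed=0):
    # Simplified LIME-style estimate: sample every feature independently,
    # then compare the mean prediction with each feature on versus off.
    rng = random.Random(seed)
    samples = [{f: rng.randint(0, 1) for f in FEATURES} for _ in range(n_samples)]
    preds = [model(s) for s in samples]
    importance = {}
    for f in FEATURES:
        on = [p for s, p in zip(samples, preds) if s[f] == 1]
        off = [p for s, p in zip(samples, preds) if s[f] == 0]
        importance[f] = sum(on) / len(on) - sum(off) / len(off)
    return importance


importance = perturbation_importance(adversarial_model)

# On realistic points (c1..c4 equal) the wrapper's output is exactly the sensitive attribute.
real_points = [
    {"race": r, "income_high": i, "c1": c, "c2": c, "c3": c, "c4": c}
    for r in (0, 1) for i in (0, 1) for c in (0, 1)
]
unfair_on_real = all(adversarial_model(x) == x["race"] for x in real_points)

print({f: round(v, 2) for f, v in importance.items()})
print("decisions on real data follow race:", unfair_on_real)
```

Because most independently sampled perturbations break the c1..c4 pattern, the estimate is dominated by the innocuous branch, so `income_high` looks important while `race` nearly vanishes from the explanation.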
unmute if you've got something to tag on i'd like to plus one this a few times and one thing that i may add is when working with organizations and practitioners that are you know kind of asking themselves the same question very frequently what i feel is missing is exactly what hima said written up in a very accessible manner as some kind of guide of how to think about these trade-offs i feel like a lot of people start with this question and go on kind of this endeavor reinventing the wheel and looking for the solution and this happens over and over in every organization i think it will be beneficial to have a resource for that i mean i think right so one question is we talk about these hypothetical trade-offs yet there's no good study that actually looks at a real problem real data looks at 20 problems and says here is the actual trade-off every paper that i've seen on sparse models is using some dummy data set on some fake problem and some fake metric so i think before we even talk about a trade-off we kind of need to figure out whether there is one and if there is one then we should figure out how to balance it right now we're sort of assuming that there exists some sort of a trade-off and the other sort of misconception that i run into a lot is at least i don't trust my ability to interpret models even if i see a regularized regression with six features i have no idea how to play the mental gymnastics of well this one is controlling for that and this one is controlling for that i have no idea even if i see a decision tree with you know more than three deep with more than six seven eight variables it's very hard to really understand what's going on and then how it generalizes over time like you can do it for one time period but so i think it's kind of hard i would claim no actual practical model using even the
basic machine learning is actually interpretable so you need tools to give people to help them understand what it's going to do when is it going to do it what are the you know to alyssa's point the confidence around it the uncertainty all sorts of things and it's a toolkit right it's not one thing feature importance is not an explanation it's feature importances so i think we kind of need to step back a little bit and think about what are we trying to do we've got a lot of research that people spend years doing maybe we need to kind of think about okay you know this problem is solved let's stop working on this let's go to this problem so i think we sort of need to think about what do we need to achieve in these problems and then kind of focus the research on filling those gaps right sorry sam i was going to say that it was funny because what rayid said is exactly what happened when a few colleagues and i a couple years ago were trying to look for papers that sort of showed this trade off empirically because it's sort of taken for granted and i think there is some basis in reality right like practitioners are certainly encountering this trade-off but the idea that there's like a formal proof that shows this is like not the case and there isn't any systematic empirical study that shows it which is very surprising and i think one of the challenges is like yeah having real cases to do this on but another point that i think hima made that i just want to emphasize is like there also isn't a clear definition of interpretability so you know we have like again these heuristics that say like okay sparse models simplicity some measure of simplicity maybe decision trees these things strike us as being interpretable but again there isn't some formal way to actually define this and often we use some measure of sparsity or
simplicity but that's just a choice you know we could do it some other way and it was very surprising to kind of discover that even interpretability which has a much longer history actually than explainability doesn't really have that as well so i think that really does suggest that the way forward here is to think concretely about real world cases where people have very specific domain specific concerns and then try to do some of this empirical evaluation because to answer this in the abstract just basically seems impossible yeah speaking of the concerns of stakeholders there's a question from the audience do stakeholders have reasonable expectations when it comes to explainability alyssa you spoke a bit about kind of the stakeholder perspective and your experience building these systems for folks what's your take on that do stakeholders have reasonable expectations when it comes to explainability i would say confidently no partly because of what we just heard from both solon and rayid that you know the definition isn't even clear even if you're talking about a very simple model a linear model if it takes in thousands of features even hundreds of features what does an explanation mean there how can you kind of comprehend with your human mind what's happening and what's important in the prediction of that model so i think the expectations are not very clear and it's in part the responsibility of you know us as a community to set these expectations and i think media kind of makes it slightly difficult because whenever there's a new explainability technique that's super exciting we see articles saying like whoa everything is explainable hooray let's go and that makes the whole expectation setting process quite a bit of a challenge i'd love to hear how others in the panel think of how we can go about setting the right expectations kush any takes on that i think you're frozen anyone else on the panel
have a take on that one i mean one thing to keep in mind i think on the trust use case is that a trivial way to get trust is to tell the user what they already believe is true and you pass the trust test but that's clearly not what we want right because i mean then you're a politician basically and we can do that so that coupled with what are we comparing these systems against right what is our baseline in terms of explainability is it the human decision maker that we're comparing against a human makes a decision we ask them you know why did you make that decision they give an explanation and then the computer makes the recommendation and we compare them and again it's unclear whether in most machine learning problems an existing human baseline is a good baseline and in terms of interpretability or explainability i'm not sure if that's the right baseline i think that's the other piece another thing we do with baselines and evaluation is we ask a person did you like the explanation i don't care if you liked the explanation did you make a better decision using it in some use cases right in the recourse use case i don't care if you made a better decision in that case the evaluation is if you modify this variable tomorrow will the decision change right so i think even the evaluation piece in terms of each of these they're all different metrics that we have to use to kind of see how well something works and in most prediction problems and this is something you know i think in the podcast we had over a year ago or so we talked about if we're dealing with kind of the classification problems humans are good at some of those you know image classification and text and all those things and so an explanation can be evaluated by a human as yeah i kind of agree with you but in prediction humans are pretty bad and that's the reason
we're using ml is because humans are pretty bad we're not using it to scale the human we're using it to improve over the human in which case the human may not be able to judge the explanation as being good or bad it's the decision they make that we judge the system by does the overall human computer system do better than just the computer or just the human and i think it's a nuanced difference that you can't churn through overnight on your computer and run experiments and produce results right it's a much more complicated process and i think by defining it really well you know the new people who are getting into this area the new researchers i think that's where we need to kind of help guide and say okay if you're going to work in this area then it's not just a data set and run models and generate results and explanations you're going to have to work with these people all right we had a couple of questions directed at you so i'm looking for additional examples one was use cases in which automatically generated counterfactual explanations were acceptable under the law the person thought you mentioned that and wanted some examples and the other was an example of a causal-ish explanation without an explicit causal model i think also referring to something you mentioned sure so in this paper actually that i drew on to present today we talk about two different areas of law where it seems like counterfactual explanations might satisfy the requirements so recently there's been a lot of interest in the european general data protection regulation's apparent right to an explanation and there's a lot of debate about whether that right exists and what form it would take but generally speaking there is kind of formal guidance by regulators that gives the sense that a counterfactual explanation would likely satisfy that
requirement and so here the law applies quite broadly so if there's an automated decision that kind of reaches some threshold of significance then you are entitled to such an explanation it's not yet clear where those methods are being used and what regulators will really think about them it's still early days but that's the general thinking but actually in the u.s as i mentioned there are credit laws that are now over 40 years old that have so-called adverse action notice requirements i sometimes when presenting ask has anyone ever received one of these and i learned the hard way that no one wants to admit to being rejected when applying for a loan but if you are what you do get is a kind of statement of reasons and they'll often say things like i mentioned you know insufficient income or maybe there's some mark against you in your credit file like bankruptcy in the past or something like that and there's some thinking that perhaps counterfactual explanations can be used to generate these so-called adverse action notices right identify the specific reasons why you've been rejected the one thing i'll note very quickly is that it's funny because as i mentioned this law this requirement is quite old and predates the use of machine learning in many of these industries and the method that the regulator has proposed for generating these reasons is completely different actually than counterfactuals what they actually tell people they could do is find those features where for the particular person the value of that feature is furthest from the average value that feature takes in the entire population or in the population that would otherwise receive credit which is almost the flip of what you would have in counterfactual explanations where it's often trying to point out the things where you're kind of nearest the easiest change you can make and what this suggests is that it's almost serving a different purpose
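The contrast between the two ways of picking a reason can be made concrete with a small sketch. All numbers, feature names, and the linear scoring rule below are hypothetical assumptions for illustration; they are not taken from any regulation or real credit model. The sketch also simplifies "furthest from average" to mean furthest below the approved-population average, since every feature here is assumed to help the applicant when it is higher.

```python
# Hypothetical statistics of the approved population and hypothetical model weights.
APPROVED_MEAN = {"income": 60000.0, "history_years": 10.0, "savings": 8000.0}
APPROVED_STD = {"income": 10000.0, "history_years": 4.0, "savings": 5000.0}
WEIGHTS = {"income": 0.2, "history_years": 1.0, "savings": 0.5}


def z_scores(applicant):
    # Distance of each feature from the approved-population average, in standard deviations.
    return {k: (applicant[k] - APPROVED_MEAN[k]) / APPROVED_STD[k] for k in WEIGHTS}


def score(applicant):
    # Assumed linear credit score; the applicant is approved when the score is >= 0.
    z = z_scores(applicant)
    return sum(WEIGHTS[k] * z[k] for k in WEIGHTS)


def furthest_from_average_reason(applicant):
    # Older style of reason: the feature furthest below the approved average.
    z = z_scores(applicant)
    return min(z, key=z.get)


def nearest_counterfactual_reason(applicant):
    # Counterfactual style: the single feature needing the smallest change
    # (in standard deviations) to lift the score to the approval threshold.
    gap = -score(applicant)
    needed = {k: gap / WEIGHTS[k] for k in WEIGHTS}
    return min(needed, key=needed.get)


applicant = {"income": 30000.0, "history_years": 8.0, "savings": 7000.0}
is_denied = score(applicant) < 0.0
far_reason = furthest_from_average_reason(applicant)
near_reason = nearest_counterfactual_reason(applicant)

print("denied:", is_denied)
print("furthest-from-average reason:", far_reason)
print("nearest-counterfactual reason:", near_reason)
```

For this made-up applicant the two methods name different features: income is the furthest from the approved average, while credit history is the cheapest thing to change because the assumed model weights it most heavily.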
and this maybe builds on what Rayid was just mentioning, right? It's almost trying to get people to accept their fate: you were so far away from the value that feature needed to take that that's it, that's why we've ruled you out. That is a completely different way of thinking about the purpose of explanations than the way counterfactual explanations, at least more recently, have formulated it, which is often: here's the simplest and easiest thing for you to do. And so there are some interesting questions about whether these existing laws can in fact be satisfied by this method, given that the method proposed many years ago is quite different.

Awesome. A couple of questions for you, Hima. The first is: is there some way to evaluate the synthetic neighborhood generated during post hoc explanations?

Yeah, that's a very good question. There is no paper or anything like that that I can immediately point to, but yes. For example, let's say your data has infinite samples and you know the complete distribution and the associated likelihood of a point being generated from that distribution; then clearly you know the answer for whether a perturbation is valid or not, whether it comes from the data distribution or not. But that is often not the case, so we'll have to work with finite samples and come up with approximations for figuring out the likelihood that a point comes from the distribution or not, for which there are ways to approximate it. While I can't point you to one paper, there are ways to do it this way.

Great. And then the other question is: in addition to what you said about the out-of-distribution points, are perturbation-based methods also vulnerable to less meaningful explanations for certain regions of the data set, such as protected classes?

That's a good question. I don't have an answer off the top of my
head, but I guess perturbation-based methods do have some of these instability problems and so on as well, and that could potentially cause issues for subgroups with very small representation in the data. So potentially minorities could be associated with such subgroups, which are much smaller and much less represented in the data. So there is that chance, but I don't have a very strong answer to that question yet. But that's a great question.

Let's see if we can get a quick question in for Kush while his internet is somewhat stable. You mentioned explainability increasing fairness. Can you elaborate on that, and is this a possible solution to the impossibility results in observation-based fairness?

Yeah, so apologies again; I think it's remnants from last week's tropical storm that are still affecting some infrastructure here. But related to that point, what we were able to show, again talking more in an abstract sense, not necessarily about specific algorithms, is this. We have the general idea that the more separation you have among the predicted labels, the easier it is to perform the prediction task. And when that's different across groups, let's say there's a privileged and an unprivileged group, that separation is different across those groups. What we were able to show was that in fact adding extra explanations, and also actually adding extra features, for the unprivileged groups increases their separation and allows you to have better prediction quality for the unprivileged group. So that's what we had been talking about in the work that we did. And I think this points to a lot of different issues. What are we hoping to gain? First of all, we have to make sure that we're talking about a setting where there's a human who's the final arbiter of the decision, because if it's a machine acting autonomously,
this isn't really an issue that should be coming up. But when a machine is taking somewhat of a decision, maybe it's a soft decision, and passing it on to a human, that's where there's more of a need for this sort of explanation, and that's where these sorts of things help.

Right. And then another question for you, Kush: can you elaborate a bit more on the link between explanations in those four spaces that you identified? And in particular, are you suggesting that the explanations should contain elemental components from each of those spaces?

Yeah, so just to repeat: we had the construct space, we had the observed space, we had the prediction space, and then a perceived space. There's not really much to be done in terms of explainability, in my opinion, going from the construct space to the observed space. Those are data sorts of questions, which there are many other ways to look at. But once you have your observed space, that's the dataset you have to work with; then you move into the prediction and the perceived spaces, where, again, the perceived space is where the human is going to be looking at things and making their final decision. So the human gets some observation directly from the observed space, and then they get the output from the machine learning model, which is in the prediction space. So there are roles for explanation in both of those arrows: going from the observed space to the perceived space, and from the prediction space to the perceived space. And there are different ways of doing things, as I said before. Model explanations are clearly useful for going from the prediction space to the perceived space, but when you're going from the observed space to the perceived space, that's where data explanations are just as useful. And then, what are specific methods for doing this? As Hima was
touching on in the very first question, I think there are different ways of explaining that are more or less relevant depending on what you have the ability to touch. Not every situation allows the user to actually touch the training data or touch the model or other parts of the life cycle. So I think it's the holistic question: what can we change, what can we affect? And having the big picture lets us make more reasoned and sensible judgments on that.

Right, great. So you've all to some degree touched on the vagaries around how we measure the performance of these explanations, whether we're asking humans to rate them or otherwise. An interesting question here asks what we can do to push this forward. Is there an ImageNet moment for explainability, where we really nail the benchmark, and if so, what might that be? Any takers on that? And for that matter, how close are we? What are the metrics that folks are using, what are the standard benchmarks, what are they lacking, that kind of thing?

So I can start, and I think the best way to start is actually to point to a paper from 2017 by Been Kim and Finale Doshi-Velez, because they really go through different levels of how to judge interpretability and explainability: starting with just assuming that something simple is interpretable, going up to some proxy measures that you compute, and then finally having the human in the loop to actually give you user studies and judgments. As you go up this ladder, there's more and more cost involved; it's much harder to do. But I think at the end of the day, ultimately it is the final human judgment, and you have to have the right population to be doing this judgment as well. So if you want a medical diagnosis system and you look at Mechanical Turkers
as your population, that also isn't going to work, right? So really the ideal situation is to get to the user population that are the true consumers, in the context in which they're going to be making those decisions. But again, you have to step back as needed to make progress.

Just continuing on that, I'd like to add a couple of points. I think the three levels that the paper mentions, and what Kush is talking about, are these. One is you just have quantitative metrics, like, let's say, the number of rules, or the number of features with non-zero weights in a linear model, and so on. These are metrics that are super easily computable. The next level is proxy tasks. Let's say I'm building an application for a judicial system; it is hard for the system to be deployed in that setting and get tested, so you come up with some proxy tasks. For example, if I use a Turker, would the Turker be able to simulate a prediction using the explanation we have, or not? So that's the second level, which is proxy tasks for the actual task. And then the third level, which is the most difficult to execute, is deployment: actually seeing how interpretability is improving the efficiency and accuracy of the decisions. While the third level is the most ideal, I can see that getting there can be extremely difficult, especially for academic labs: reaching, for example, courts in the U.S., or getting a hospital to deploy your system and then studying what its effects are. These are all much more challenging things to achieve in practice. So people typically resort to either the proxy evaluations or these quantifiable metrics, like the complexity or the size of a model, and things like that.

Great. Solon, you had a bit more to add on the topic of interpretability; chime in on that.

You know,
I think I share Hima's view on this, which is that ideally these would all be context-specific evaluations. And what Kush was describing is obviously perhaps the gold standard; the challenge is being able to accomplish that in practice. I think the harsh reality here is that there is no perfect answer, right? As a field of study, this is maybe unlike other things where you can just max out accuracy. We actually have to think very carefully about the domain we're working on, and that's just the nature of the problem here. So, I think what I was saying is there's never going to be a universal definition of interpretable, in the same way that there's never going to be a universally good explanation. These are only things that can be defined by looking at the context of use.

It's about time for us to wind down, so I will take that as your parting thoughts on the comments, and I'll go around the horn and get everyone's final word and thoughts on the discussion today. Rayid?

Let's see. So, two final thoughts. One is, I think like any other component of the overall system, interpretability is a piece that we have to evaluate and think of in the context of how it helps us improve the overall goal of the system we're building. It's not a thing by itself, just like the model is not the thing by itself, and just like the data is not living by itself. So I think that's one thing: getting stuck on interpretability without that overall context doesn't lead to productive outcomes. The second thing, which worries me a lot about this field right now, is I don't want this to become an elitist field where only rich universities that have access to people and problems and data, or large companies that have these systems, can do this research. Because we're all saying it needs to be situated in
context, with people, with problems, in high-stakes situations. Well, then you become, you know, economists who have privileged access to data, and it's a totally unfair field, and we don't want to get there. In many ways, right? I'm going to get hate mail now from economists. So I think we have to figure out as a field, to your point, Sam, and to Hima's: how do we create some of these proxy tasks that are correlated with good performance on the real task, so that people are not completely wasting their time? And then once they pass certain tests, how do we give them a collaborative infrastructure where they can test out some of these things on low-risk real problems and low-risk real users? I don't have an answer, but I'm hoping people listening, and colleagues here on the panel, have ideas on how to achieve that.

Great. Hima?

Okay, let's see. I think my parting thought would be that in terms of the research and the challenging problems that are still out there to solve, explainability is, I think, one of the fields that's right up there at this point. So that's a call for all the young researchers out there, and anyone who is thinking about whether they should venture into it or not: I just want to encourage them to be a part of it. That's firstly. The second thing is for practitioners and folks who are also researchers: I have a few nuggets of wisdom, probably, that I got to learn through trial and error in my own research, and seeing myself fail in certain experiments and so on. One is that I would encourage practitioners to engage more actively with the researchers, and vice versa, when designing tools that are, so to say, interpretable or explainable. And the other is not to take anything at face value; this is for both the practitioners and the researchers. If a method
claims to be explainable, don't just accept that it is going to do what it says it will do. Be somewhat skeptical and have your own checks and balances in place. But that said, I think we do have some challenges. I think it came out through all the questions that there is still a long way to go in terms of coming up with the ideal view of what interpretability or explainability could be, but I think we are slowly getting there, hopefully through trial and error.

Great.

Yeah, so I think I'll echo many of the things that Solon, Rayid, and Hima have already said, and say I think we're at a stage where we have a good set of methods that fit a lot of the taxonomic sort of...

I thought we were good with Kush to close out.

...subcategories. Did I break up?

You did.

Oh. Am I on? Can you hear me now?

Yes.

Okay, great. So very quickly, so that I don't drop again: we need that next level up, to have some sort of meta-algorithms that really help us as people understand the context better and help us figure out what the best approach is. That's my parting thing, and I won't go into more detail so that I don't drop off again.

Got it: meta-algorithms to help us figure out what the best approach is. Alessya?

I think this panel is ending at a beautiful moment. The amazing researchers that are here in the room are all looking for ways to engage more with practitioners, to design experiments that actually touch on real-world deployments, to figure out how to get to that third stage of evaluation, of actually seeing how explanations are being used in deployment. And from what I see from my side, talking with practitioners every day, they want the same thing. So I think the problem that we've got to solve is how we all work together. And I will volunteer some of my time and
say that I talk to a lot of organizations literally every day; that's my job. And I'd love to help facilitate some of these discussions, given that we have good requirements for what type of organizations can participate in moving this research forward. Both the panelists and the listeners, please feel free to reach out to me. I promise I will actually follow up, and I would love to connect you to practitioners and organizations to facilitate moving the state of the art forward.

Awesome. Well, thank you all for participating in this panel; a really great discussion. Thank you to our audience as well for your amazing questions, and thanks once again to IBM for sponsoring this discussion. Be sure to subscribe and visit our newsletter page to sign up for future notifications. This video will be immediately available on YouTube, so feel free to review it for any references you missed or share it with your friends. Thanks so much, everyone.

Thank you. Thanks. Thank you, Sam. Thank you, everyone. Bye.
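
The contrast drawn earlier in the transcript, between the regulator-suggested method for adverse action reasons (report the features furthest from the average) and counterfactual explanations (report the nearest change that flips the decision), can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the linear scoring rule, the feature names, weights, threshold, and population statistics are invented for the example and are not taken from the panel, from any regulation, or from a real credit model. Measuring distance in standard deviations, and only in the unfavourable direction, is also an assumption.

```python
# Hypothetical linear credit score: approve when score(applicant) >= THRESHOLD.
# All names and numbers below are made up for illustration.
WEIGHTS = {"income": 0.25, "credit_history_years": 3.0, "debt": -0.5}
THRESHOLD = 33.0

# Assumed mean and standard deviation of each feature among approved applicants.
APPROVED_MEAN = {"income": 60.0, "credit_history_years": 10.0, "debt": 15.0}
APPROVED_STD = {"income": 20.0, "credit_history_years": 5.0, "debt": 10.0}


def score(applicant):
    """Weighted sum of the applicant's features."""
    return sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def reasons_furthest_from_average(applicant):
    """Rank features by how far the applicant sits from the approved-population
    average, in standard deviations, counting only the unfavourable direction."""
    shortfall = {}
    for name, weight in WEIGHTS.items():
        z = (applicant[name] - APPROVED_MEAN[name]) / APPROVED_STD[name]
        # For a positive weight, being below average hurts; for a negative
        # weight, being above average hurts.
        shortfall[name] = -z if weight > 0 else z
    return sorted(shortfall, key=shortfall.get, reverse=True)


def nearest_single_feature_change(applicant):
    """Counterfactual-style reason for a rejected applicant: the single feature
    whose change (in standard deviations) most cheaply reaches the threshold."""
    gap = THRESHOLD - score(applicant)
    best_name, best_delta, best_cost = None, None, float("inf")
    for name, weight in WEIGHTS.items():
        delta = gap / weight                     # raw change needed in this feature
        cost = abs(delta) / APPROVED_STD[name]   # same change in std units
        if cost < best_cost:
            best_name, best_delta, best_cost = name, delta, cost
    return best_name, best_delta


applicant = {"income": 32.0, "credit_history_years": 9.0, "debt": 10.0}
print(score(applicant))                          # 30.0, below the threshold of 33.0
print(reasons_furthest_from_average(applicant))  # income ranks first
print(nearest_single_feature_change(applicant))  # ('credit_history_years', 1.0)
```

In this made-up case the two methods point at different features: income is furthest from the approved average, while one more year of credit history is the smallest change that reaches the threshold, which mirrors the "accept your fate" versus "easiest thing for you to do" distinction in the discussion.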

Original Description

The use of machine learning in business, government, and other settings that require users to understand the model’s predictions has exploded in recent years. This growth, combined with the increased popularity of opaque ML models like deep learning, has led to the development of a thriving field of model explainability research and practice. In this panel discussion, we bring together experts and researchers to explore the current state of explainability and some of the key emerging ideas shaping the field. Each guest will share their unique perspective and contributions to thinking about model explainability in a practical way. Join us as we explore concepts like stakeholder-driven explainability, adversarial attacks on explainability methods, counterfactual explanations, legal and policy implications, and more.

Apple Podcasts: https://tinyurl.com/twimlapplepodcast
Spotify: https://tinyurl.com/twimlspotify
Google Podcasts: https://podcasts.google.com/?feed=aHR0cHM6Ly90d2ltbGFpLmxpYnN5bi5jb20vcnNz
RSS: https://twimlai.libsyn.com/rss
Full episodes playlist: https://www.youtube.com/playlist?list=PLILZm3MRkvH83C46bZ4rPmB-jKWBltWkP
Subscribe to our YouTube Channel: https://www.youtube.com/channel/UC7kjWIK1H8tfmFlzZO-wHMw?sub_confirmation=1
Podcast website: https://twimlai.com
Sign up for our newsletter: https://twimlai.com/newsletter
Check out our blog: https://twimlai.com/blog
Follow us on Twitter: https://twitter.com/twimlai
Follow us on Facebook: https://facebook.com/twimlai
Follow us on Instagram: https://instagram.com/twimlai



This video covers why model explainability matters in machine learning and explores techniques and tools for achieving it. It discusses open challenges, including explanations that are not causal and the handling of uncertainty, and the need for emerging techniques such as CLUE that incorporate uncertainty.

Key Takeaways

Understand the concept of model explainability
Apply counterfactual explanations
Use open-source toolkits like IBM's AI Fairness 360 and AI Explainability 360
Evaluate model performance using metrics like accuracy and fairness
Design studies for explainability and analyze results for model interpretability
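
The first two evaluation levels described in the transcript (easily computed quantitative metrics, then proxy tasks in which a person tries to simulate the model's prediction from its explanation) can be illustrated with a small Python sketch. The function names, the example weights, and the recorded guesses are hypothetical; a real proxy study would collect the guesses from participants.

```python
def count_nonzero_weights(weights, tol=1e-9):
    """Level-one metric: number of features a linear model actually uses."""
    return sum(1 for w in weights.values() if abs(w) > tol)


def simulatability(model_predictions, human_guesses):
    """Proxy-task metric: fraction of cases in which a person, shown only the
    explanation, guessed the same label that the model predicted."""
    if len(model_predictions) != len(human_guesses):
        raise ValueError("need one guess per model prediction")
    if not model_predictions:
        raise ValueError("need at least one prediction")
    matches = sum(m == h for m, h in zip(model_predictions, human_guesses))
    return matches / len(model_predictions)


# Hypothetical sparse linear model and hypothetical study results.
weights = {"feature_a": 0.0, "feature_b": 1.3, "feature_c": -0.7, "feature_d": 0.0}
model_predictions = [1, 0, 1, 1, 0]
human_guesses = [1, 0, 0, 1, 0]

print(count_nonzero_weights(weights))                    # 2
print(simulatability(model_predictions, human_guesses))  # 0.8
```

Neither number says whether the explanation improves real decisions; as the panel notes, that third level needs evaluation with the actual users in deployment.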

💡 Model explainability is crucial in machine learning, particularly in critical environments, and requires a combination of techniques and tools to achieve.
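
One audience question in the transcript asks how to judge whether the synthetic neighborhood produced by a perturbation-based explainer resembles real data; the answer given is that, with finite samples, the likelihood of a point under the data distribution has to be approximated. A minimal sketch of one such approximation follows, using a Gaussian kernel density estimate. This is a swapped-in illustration, not a method named by the panel: the bandwidth, the 5% cutoff, the synthetic training data, and the function names are all assumptions.

```python
import math
import random


def kde_log_density(point, data, bandwidth):
    """Log of an isotropic Gaussian kernel density estimate at `point`."""
    dim = len(point)
    log_norm = -0.5 * dim * math.log(2 * math.pi * bandwidth ** 2)
    log_terms = []
    for row in data:
        sq_dist = sum((p - x) ** 2 for p, x in zip(point, row))
        log_terms.append(log_norm - sq_dist / (2 * bandwidth ** 2))
    # log-sum-exp keeps far-away points from underflowing to log(0)
    top = max(log_terms)
    total = sum(math.exp(t - top) for t in log_terms)
    return top + math.log(total) - math.log(len(data))


def plausibility_cutoff(data, bandwidth, quantile=0.05):
    """Log-density below which a point is treated as out of distribution.
    Uses in-sample densities, so the cutoff is somewhat optimistic."""
    scores = sorted(kde_log_density(row, data, bandwidth) for row in data)
    return scores[int(quantile * len(scores))]


random.seed(0)
# Hypothetical training data: 200 points from a 2-D standard normal.
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
bandwidth = 0.5
cutoff = plausibility_cutoff(train, bandwidth)

near = (0.1, -0.2)   # perturbation close to the bulk of the data
far = (8.0, 8.0)     # perturbation far outside the data

for name, point in [("near", near), ("far", far)]:
    log_p = kde_log_density(point, train, bandwidth)
    print(name, "plausible" if log_p >= cutoff else "out of distribution")
```

Kernel density estimates degrade quickly as the number of features grows, so for realistic tabular or image data this check would likely need a different density model; the sketch only shows the shape of the idea.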
