Clear Boxes over Black Boxes
When Exactly to Prioritize Interpretability in Machine Learning
As an emerging data scientist, I feel I am getting well-acquainted with the little jolt of adrenaline that comes with seeing an increase in a model’s performance. I now understand how easy it is to, for example, add just a couple more features to eke a little bit more out of my R-squared score. Tweaking a model to make accuracy metrics tick up higher and higher can get addictive!
Of course, increasing performance almost inevitably means increasing the complexity of my model, often dramatically. It’s somewhat of a buzzkill realization, but the fancier and more accurate a model gets, the more challenging it becomes to explain its inner workings.
This “explainability” factor is known as “interpretability,” and it is one of those concepts I heard mentioned a lot as I was starting to build my first few models. As the visualization below shows, there is often a trade-off between accuracy and interpretability in model building.
As a beginner Data Science student, I struggled with this quite a bit. Intuitively, accuracy seems to be the thing that should trump all other qualities. There’s a reason why neural nets and XGBoost ensembles win Kaggle competitions, right? If a model can predict with close to 100% accuracy, who cares about being able to explain how it works?
The concept of interpretability can be a bit fuzzy, and it can also be highly subjective. When I first learned of the concept, I thought, “Interpretable for whom?” After all, someone with a Ph.D. in computational science will find a K-Nearest Neighbors model far more “interpretable” than a marketing executive who’s hired you to build a one-off recommendation system.
However, I came to realize that “interpretability” has less to do with understandability and a lot more to do with trust. At its most basic level, interpretability means being able to see why a model made a certain prediction. Interpretability isn’t synonymous with simplicity — an “interpretable” model doesn’t necessarily mean that you’re able to grasp 100% of the mathematical theory contained within its algorithm.
Distinct from the “black box” world of neural nets, an interpretable model is instead a “clear” (or at least a translucent) box. When you “interpret” a model, you use validation steps to justify that your model is making decisions in the “right” way. By “right,” I mean that the model is placing appropriate weight on the right variables when it makes decisions, so you can confidently say that your model is unbiased, practical, and yes, truly accurate.
Interpretability is manifested throughout the modeling process:
- Explaining the preprocessing steps you took to build your model
- Showing the top features that impact the model’s outcome, allowing you to explain how your model came to its conclusions and which features were the most (and least) impactful
- Comparing predicted and actual values from your model, and being able to explain where the model fits well and where it misses (this point and the previous one are sketched in code after this list)
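To make the last two points concrete, here is a minimal sketch in Python. It assumes scikit-learn and uses its built-in diabetes dataset with a random forest purely as a stand-in model: permutation importance surfaces the features the fitted model leans on most, and a quick residual check compares predictions against actual held-out values.

```python
# Minimal sketch: inspect which features a fitted model relies on,
# then compare its predictions against actual held-out values.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Example data and model only -- substitute your own.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Which features is the model actually leaning on?
result = permutation_importance(model, X_test, y_test, random_state=42)
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{name:>8}: {score:.3f}")

# How well do predictions line up with actual values, and how far off is the worst miss?
y_pred = model.predict(X_test)
print("R-squared on held-out data:", round(r2_score(y_test, y_pred), 3))
print("Largest residual:", round(abs(y_test - y_pred).max(), 1))
```

None of this requires a simple model; the point is that even a fairly complex model can be probed after the fact so you can explain what it is paying attention to and where it goes wrong.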
After a bit more reading and experience, I’ve come to realize that in the real world, and to most of the clients and stakeholders you will serve as a working data scientist, interpretability should be thought of less as something to “balance” against accuracy and more as something to simply prioritize.
The call to prioritize interpretability is nothing new; Yves Kodratoff drafted a “Comprehensibility Manifesto” as far back as 1994. The need for interpretable models has been reaffirmed up to the present day, with computer scientist Cynthia Rudin writing, “The way forward is to design models that are inherently interpretable […] trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practice and can potentially cause great harm to society.” In late 2019, Andreas Müller, a core developer of scikit-learn, said in an interview, “Your goal is never accuracy and it’s never ROC-AUC. That’s not what your application is about. You should think about what does [your model] mean in the context of your application to make a particular outcome.” That view is perhaps unsurprising coming from someone so steeped in scikit-learn, but I think Müller has a point.
Why is interpretability so important? Most fundamentally, Data Science is, after all, a science. The goal of scientific disciplines (biology, sociology, psychology) is to find patterns in naturally occurring data, in order to form and test hypotheses about why these patterns occur. How lame would it have been if Isaac Newton had predicted that an apple, like all the apples before it, would fall downward from an apple tree — and then called it quits?
In addition to its importance for the basic principles of scientific pursuit, interpretability also carries ethical implications. In medicine, generalizability is critical: a model can’t be trained on every person in the population (at least for now), so you need to understand which factors drive its predictions to trust that it will hold up beyond its training data. Furthermore, being able to distinguish the importance of various risk factors for the onset of a specific disease is arguably more valuable than nudging up a model’s accuracy in predicting that disease. Telling a person you are 65% vs. 70% vs. 75% sure that they will be diagnosed with cancer at age 60 makes little practical difference, and brings little comfort. But what if you could instead tell them, with confidence, the three most impactful lifestyle changes they could make to significantly lower that risk?
Given these scenarios, it’s easy to grasp the importance of interpretability. In his book Interpretable Machine Learning, Christoph Molnar outlines only three situations in which one does not need interpretability in their model:
- If the analysis has no significant practical impact (e.g., a model that you’re building for fun).
- If the problem is well-studied and no additional insights need to be gleaned (Molnar gives the example of a model that extracts addresses from images of handwritten envelopes).
- If additional interpretability might lead people to (inadvertently or on purpose) “game” the system. As a somewhat innocuous example, online content creators might take advantage of the way a social media platform’s algorithm weighs certain features (e.g. keywords, hashtags) to enhance their visibility.
End of list! For all other scenarios, interpretability should be at the forefront. As tempting as it is to zero in on accuracy scores as the be-all and end-all, I’ve learned that raw accuracy should rarely be the #1 thing you aim to improve in your model. As Haebichan Jung so aptly points out, real life is not a Kaggle competition. In prioritizing interpretability, you ensure that your models can be used ethically and legally, and that they can be trusted by clients and those whom you seek to serve.