Explainability, Interpretability and Observability in Machine Learning | by Jason Zhong

Explainability has no standard definition; it is generally accepted to refer to “the movement, initiatives, and efforts made in response to AI transparency and trust concerns” (Adadi & Berrada, 2018). Bibal et al. (2021) aimed to produce a guideline on the legal requirements, concluding that an explainable model must be able to “(i) [provide] the main features used to make a decision, (ii) [provide] all the processed features, (iii) [provide] a comprehensive explanation of the decision and (iv) [provide] an understandable representation of the whole model”. They defined explainability as providing “meaningful insights on how a particular decision is made” which requires “a train of thought that can make the decision meaningful for a user (i.e. so that the decision makes sense to him)”. Explainability therefore refers to understanding the internal logic and mechanics of a model that underpin a decision.
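As a rough illustration of requirement (i), providing the main features used to make a decision, the sketch below uses a hand-written linear scoring model. The feature names, weights and bias are hypothetical values invented for this example and do not come from any of the cited papers. For a linear model, each feature's contribution is simply its weight multiplied by its value, so the features can be ranked by how strongly they pushed the score up or down.

```python
# Hypothetical linear risk score. Feature names, weights and bias are
# made-up values for illustration only.
WEIGHTS = {"age": 0.03, "tumour_size_mm": 0.08, "smoker": 0.5}
BIAS = -3.0


def predict_with_explanation(features):
    """Return the raw score and per-feature contributions, largest magnitude first."""
    # In a linear model, each feature contributes weight * value to the score.
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    score = BIAS + sum(contributions.values())
    # Rank features by the absolute size of their contribution.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


patient = {"age": 60, "tumour_size_mm": 25, "smoker": 1}
score, ranked = predict_with_explanation(patient)

for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
print(f"score: {score:+.2f}")
```

This kind of direct readout is only possible because the model is linear. A deep network such as the one behind AlphaGo offers no equally simple decomposition, which is part of why explainability is hard.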

A well-known example of why explainability matters is the Go match between AlphaGo, an algorithm, and Lee Sedol, considered one of the best Go players of all time. In game 2, AlphaGo’s 19th move was widely regarded by experts and the creators alike as “so surprising, [overturning] hundreds of years of received wisdom” (Coppey, 2018). This move was extremely ‘unhuman’, yet was the decisive move that allowed the algorithm to eventually win the game. Whilst humans were able to determine the motive behind the move afterward, they could not explain why the model chose that move over others, because they lacked an internal understanding of the model’s logic. This demonstrates the extraordinary ability of machine learning to calculate far beyond human ability, yet raises the question: is this enough for us to blindly trust its decisions?

Whilst accuracy is a crucial factor behind the adoption of machine learning, in many cases, explainability is valued even above accuracy.

Doctors are unwilling, and rightfully so, to accept a model’s recommendation not to remove a cancerous tumour if the model cannot produce the internal logic behind that decision, even if following the recommendation would be better for the patient in the long run. This is one of the major reasons why machine learning, despite its immense potential, has not been fully utilised in many sectors.