Campus Event Calendar

Event Entry

What and Who

Towards Designing Inherently Interpretable Deep Neural Networks for Image Classification

Moritz Böhle
Max-Planck-Institut für Informatik - D2
AG 1, INET, AG 5, RG1, SWS, AG 2, AG 4, D6, AG 3  
Public Audience

Date, Time and Location

Friday, 3 May 2024
60 Minutes
E 1.4


Over the last decade, Deep Neural Networks (DNNs) have proven successful in a wide range of applications and hold the promise of a positive impact on our lives. However, especially in high-stakes situations in which a wrong decision can be disastrous, it is imperative that we can understand and obtain an explanation for a model’s ‘decision’. This thesis studies this problem for image classification models from three directions. First, we evaluate methods that explain DNNs in a post-hoc fashion and highlight the promises and shortcomings of existing approaches. Second, we study how to design inherently interpretable DNNs. In contrast to explaining models post hoc, this approach not only takes the training procedure and the DNN architecture into account, but also modifies them so that the decision process becomes inherently more transparent. In particular, two novel DNN architectures are introduced: the CoDA and the B-cos Networks. For every prediction, the computations of these models can be expressed by an equivalent linear transformation. As the corresponding linear matrix is optimised during training to align with task-relevant input patterns, it is shown to localise relevant input features well and thus lends itself to being used as an explanation for humans. Finally, we investigate how to leverage explanations to guide models during training, e.g., to suppress reliance on spuriously correlated features or to increase the fidelity of knowledge distillation approaches.
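The dynamic-linear view mentioned in the abstract can be illustrated with a small sketch. The snippet below is a hedged toy example, not the thesis implementation: the function names, the choice B=2, and the toy dimensions are assumptions. It shows a single B-cos-style unit, which scales its linear response by a power of the cosine between input and weight, and checks that its output equals an input-dependent linear map applied to the input:

```python
import numpy as np

def b_cos_unit(x, w, B=2):
    # B-cos-style unit (sketch): scale the linear response w_hat @ x
    # by |cos(x, w)|^(B-1), which rewards weight-input alignment.
    w_hat = w / np.linalg.norm(w)            # unit-norm weight
    lin = w_hat @ x                          # linear response
    cos = lin / (np.linalg.norm(x) + 1e-12)  # cosine similarity
    return np.abs(cos) ** (B - 1) * lin

def effective_weight(x, w, B=2):
    # The same computation written as an input-dependent linear map:
    # output = w_eff(x) @ x, where w_eff absorbs the cosine scaling.
    w_hat = w / np.linalg.norm(w)
    cos = (w_hat @ x) / (np.linalg.norm(x) + 1e-12)
    return np.abs(cos) ** (B - 1) * w_hat

rng = np.random.default_rng(0)
x, w = rng.normal(size=8), rng.normal(size=8)
out = b_cos_unit(x, w)
w_eff = effective_weight(x, w)
# The prediction equals the dynamic linear transform applied to the input.
assert np.isclose(out, w_eff @ x)
```

In a full network, stacking such units yields one equivalent linear matrix per prediction, which is what the abstract proposes to read out as an explanation.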


Connie Balzert
+49 681 9325 2000

Virtual Meeting Details

632 3152 1065

Connie Balzert, 04/23/2024 09:48 -- Created document.