world and have potential impact on people's lives: autonomous cars,
assisted medical diagnosis, or social scoring. Owing to training
increasingly complex models on increasingly large datasets, these
applications have become useful since they are trained to be accurate
prediction machines. Typically, this is achieved by optimizing for
accuracy alone while disregarding two critical weaknesses of deep
learning that have persisted since its inception. Firstly, the complexity
of the models used makes it difficult to explain and understand the
causes of incorrect predictions – the so-called black-box property. Secondly, models
are susceptible to adversarial examples – slight input perturbations,
otherwise imperceptible to humans – that can result in dramatic changes
in predictions. While mitigation approaches exist, these are often
expensive to train and hence are not deployed by default in practice.
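To make the adversarial-example phenomenon concrete, here is a minimal numpy sketch on a toy linear classifier. It uses an FGSM-style sign-of-gradient step – one classic way such perturbations are crafted, not necessarily the attack studied in the thesis – and the model and numbers are purely illustrative.

```python
import numpy as np

# Toy linear "classifier": score(x) = w @ x. For a linear model the
# gradient of the score w.r.t. the input is exactly w, so the
# FGSM-style step along sign(gradient) maximally changes the score
# per unit of L-infinity perturbation budget.
rng = np.random.default_rng(0)

w = rng.normal(size=8)          # weights of the toy classifier
x = rng.normal(size=8)          # clean input
eps = 0.1                       # tiny, "imperceptible" budget

score_clean = w @ x             # score on the clean input

x_adv = x - eps * np.sign(w)    # perturb to push the score down
score_adv = w @ x_adv

print(np.max(np.abs(x_adv - x)))   # per-coordinate change: exactly eps
print(score_adv < score_clean)     # the score dropped
```

Even though every input coordinate moves by at most `eps`, the score shifts by `eps * sum(|w_i|)`, which grows with the input dimension – a hint at why high-dimensional models are so vulnerable.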
Both issues reduce the trustworthiness of deep learning and could dampen
further adoption for real-world problems. In this thesis defense, I
discuss mitigations for both issues in two parts. In the first part, I
discuss our proposed Semantic Bottlenecks, which explicitly align
intermediate representations with human-meaningful concepts such as feet,
leg, or wood while reducing dimensionality, thereby addressing the black-box
issue, and I show that these bottlenecks can be useful for error analysis.
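The core idea can be sketched as follows, assuming a semantic bottleneck is a low-dimensional layer whose units are tied to named concepts; the concept names, dimensions, and weight matrices below are illustrative stand-ins, not the thesis's actual architecture.

```python
import numpy as np

# Sketch of a semantic bottleneck: the classifier head sees ONLY a few
# named concept scores, never the raw high-dimensional features.
rng = np.random.default_rng(1)

concepts = ["feet", "leg", "wood"]        # human-meaningful unit names
features = rng.normal(size=512)           # backbone activations (toy)

W_bottleneck = rng.normal(size=(3, 512))  # features -> concept scores
W_head = rng.normal(size=(5, 3))          # concept scores -> 5 class logits

concept_scores = W_bottleneck @ features  # one score per named concept
logits = W_head @ concept_scores          # prediction flows through concepts

# Because every downstream decision passes through named units, an
# incorrect prediction can be traced back to the concept scores that
# drove it - the hook for error analysis.
for name, score in zip(concepts, concept_scores):
    print(f"{name}: {score:+.2f}")
print("predicted class:", int(np.argmax(logits)))
```

The dimensionality reduction (512 features down to 3 concepts here) is what makes the inspection tractable: a human can read three labeled scores, not five hundred anonymous ones.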
In the second part, I discuss two ways to mitigate the risk of
adversarial examples, with a focus on reducing the computational
overhead of conventionally used adversarial training:
(i) training on data subsets and
(ii) utilizing Lipschitz bounds to enable certification.
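To illustrate the second direction, here is a sketch of Lipschitz-based certification on a toy linear classifier, where the Lipschitz constant of each logit difference is exact; the thesis applies such bounds to deep networks, where the constants must be bounded rather than computed exactly.

```python
import numpy as np

# Certification idea: if the margin function logit_top - logit_j is
# Lipschitz with constant L_j, then no L2 perturbation smaller than
# margin_j / L_j can flip the prediction from class top to class j.
# For a linear classifier W, L_j = ||W[top] - W[j]||_2 exactly.
rng = np.random.default_rng(2)

W = rng.normal(size=(5, 8))     # 5 classes, 8 input dimensions (toy)
x = rng.normal(size=8)

logits = W @ x
top = int(np.argmax(logits))

radii = [
    (logits[top] - logits[j]) / np.linalg.norm(W[top] - W[j])
    for j in range(W.shape[0]) if j != top
]
certified_radius = min(radii)   # every perturbation inside this L2 ball
                                # provably leaves the prediction unchanged
print(f"certified L2 radius: {certified_radius:.4f}")
```

Unlike adversarial training, this certificate requires no attack iterations at all: one forward pass plus the (pre-computed) Lipschitz constants yields a provable guarantee, which is the computational appeal of the approach.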