In this talk, I will give an overview of the principles underlying my work on building robust deep neural networks for computer vision. My working hypothesis is that vision systems need a causal 3D understanding of images, obtained by following an analysis-by-synthesis approach. I will discuss a new type of neural network architecture that implements such an approach, and I will show that these generative neural network models are vastly superior to traditional models in terms of robustness, learning efficiency, and the ability to solve many vision tasks at once. Finally, I will give a brief outlook on my current projects and future research directions.