How can discrete mathematics and theoretical computer science be used to understand neural networks? Guided by this question, I will focus on neural networks with rectified linear unit (ReLU) activations, a standard model and an important building block in modern machine learning pipelines. The functions represented by such networks are continuous and piecewise linear. But how does the set of representable functions depend on the architecture? And how difficult is it to train such networks to optimality? In my talk, I will answer fundamental questions like these using methods from polyhedral geometry, combinatorial optimization, and computational complexity.
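The claim that ReLU networks represent continuous piecewise linear functions can be made concrete with a small sketch. The following Python snippet (assuming NumPy is available; the weights are hand-picked for illustration and are not taken from the talk) evaluates a one-hidden-layer ReLU network that computes the absolute value function, a continuous piecewise linear function with a single breakpoint at zero:

```python
import numpy as np

def relu(z):
    # ReLU activation: componentwise maximum with zero.
    return np.maximum(z, 0.0)

def f(x):
    # Hidden layer: two ReLU units computing relu(x) and relu(-x).
    hidden = relu(np.array([x, -x]))
    # Output layer: sum the two units, so f(x) = relu(x) + relu(-x) = |x|.
    return hidden @ np.array([1.0, 1.0])

# The network agrees with |x| everywhere, linearly on each side of 0.
for x in [-2.0, -0.5, 0.0, 1.5]:
    print(x, f(x))
```

Every function computed by such a network is continuous and piecewise linear, since it is a composition of affine maps with the piecewise linear ReLU; which piecewise linear functions a given depth and width can represent is exactly the architecture question raised above.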