Measurements, predictions, and the puzzle of machine learning: what data from 10 million hosts can teach us about security
Tudor Dumitras
University of Maryland
SWS Colloquium
Tudor Dumitraș is an Assistant Professor in the Electrical & Computer Engineering Department at the University of Maryland, College Park.
His research focuses on data-driven security: he studies real-world adversaries empirically, he builds machine learning systems for detecting attacks
and predicting security incidents, and he investigates the security of machine learning in adversarial environments. In his previous role at
Symantec Research Labs, he built the Worldwide Intelligence Network Environment (WINE), a data analytics platform for security research.
His work on the effectiveness of certificate revocations in the Web PKI was featured in the Research Highlights of the Communications of the
ACM in 2018, and his measurement of the duration and prevalence of zero-day attacks received an Honorable Mention in the NSA competition
for the Best Scientific Cybersecurity Paper of 2012. He also received the 2011 A. G. Jordan Award from the ECE Department at
Carnegie Mellon University, the 2009 John Vlissides Award from ACM SIGPLAN, and the Best Paper Award at ASP-DAC'03.
Tudor holds a Ph.D. degree from Carnegie Mellon University.
What are the odds that you will get hacked tomorrow? To answer this question, it is not enough to reason about the state of your host -- we must also understand how easy it is for adversaries to exploit software vulnerabilities and what helps them distribute malware around the world. Moreover, the machine learning techniques that drive the success of such prediction tasks in non-adversarial domains, like computer vision or autonomous driving, face new challenges in security.
In this talk, I will discuss my work combining machine learning with global-scale measurements, which has exposed critical security threats and guided industrial practices. First, I will present the Worldwide Intelligence Network Environment (WINE), an analytics platform that has enabled systematic security measurements across more than 10 million hosts from around the world. Second, I will use WINE as a vehicle for exploring open research questions, such as the duration and impact of zero-day attacks, the weaknesses in public key infrastructures (PKIs) that allow malware to masquerade as reputable software, and the use of machine learning to predict certain security incidents. I will conclude by discussing the impact of these predictions on the emerging cyber insurance industry and the lessons we learned about using machine learning in the security domain.