Campus Event Calendar

Event Entry

What and Who

Protecting Correctness in Adaptive Data Analysis

Dr. Moritz Hardt
IBM Almaden Research Center, San Jose, California, USA
MPI Colloquium Series Distinguished Speaker

Moritz Hardt is a research staff member at the IBM Almaden
Research Center. He received his PhD in 2011 from Princeton University
with a dissertation on differential privacy. His research interests
include algorithms for machine learning, privacy-preserving data
analysis, social questions in computation, and the role computational
complexity plays in the sciences. He served on the program committees
of STOC 2013, FOCS 2013 and STOC 2014. When not on program committees,
he enjoys a diverse set of hobbies including road cycling, mountain
biking and occasional cyclocross.
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience

Date, Time and Location

Monday, 30 June 2014
60 Minutes
E1 4


False discovery is a growing problem in scientific research. Despite
sophisticated statistical techniques for avoiding overfitting and
controlling the false discovery rate, there is significant evidence
that many published scientific papers contain incorrect conclusions.

In this talk we consider the role that adaptivity has in this problem.
A fundamental disconnect between the theory of controlling false
discovery and the practice of science is that most theorems assume a
fixed collection of hypotheses to be tested, selected non-adaptively
before the data is gathered. Science, however, is an adaptive process,
in which data is shared and re-used, while hypotheses are formed after
seeing previous results.

We show that, remarkably, there is a general approach that allows an
adaptive analyst to evaluate a large number of statistics on a single
data set, while guaranteeing that with high probability, all of the
conclusions she draws generalize to the underlying distribution. This
technique counter-intuitively involves actively perturbing the
statistics using techniques developed for privacy preservation---but
in our application, the perturbations are added entirely to increase
the utility of the data. We also present a new computational hardness
result that closely characterizes the limits of our algorithmic

Based on joint works with Cynthia Dwork, Vitaly Feldman, Omer
Reingold, Aaron Roth and Jon Ullman.
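
As a rough illustration of the perturbation idea mentioned in the abstract,
the sketch below answers statistical queries on a data set with a small
amount of random noise added to each answer. It is not the speakers'
algorithm: it uses the standard Laplace mechanism from differential privacy
as a stand-in, and the function name noisy_mean, the parameter epsilon and
the synthetic data are all assumptions made for this example.

```python
import numpy as np


def noisy_mean(values, epsilon, rng):
    """Return the empirical mean of values in [0, 1] plus Laplace noise.

    Hypothetical helper for illustration only. Changing one of the n
    entries moves the mean by at most 1/n, so the noise scale is
    1 / (n * epsilon), as in the standard Laplace mechanism.
    """
    n = len(values)
    scale = 1.0 / (n * epsilon)
    return float(np.mean(values) + rng.laplace(0.0, scale))


rng = np.random.default_rng(0)

# Assumed synthetic data set: 1000 samples with 5 features in [0, 1].
data = rng.random((1000, 5))

# The analyst sees only the perturbed answers, never the exact statistics.
answers = [noisy_mean(data[:, j], epsilon=0.5, rng=rng) for j in range(5)]
```

With 1000 samples and epsilon = 0.5 the noise scale is 0.002, so each
perturbed answer stays close to the exact mean while hiding its fine detail
from the analyst.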


Jennifer Müller

Jennifer Müller, 06/23/2014 13:19 -- Created document.