Exploratory Data Analysis

Jilles Vreeken
Wednesday, 4 June 2014

The goal of exploratory data analysis, or data mining, is to make sense of data.

We develop theory and algorithms that help you understand your data better, with
the lofty goal that this helps you formulate (better) hypotheses. More specifically, our
methods give detailed insight into how data is structured: characterising distributions
in easily understandable terms, showing the most informative patterns, associations,
correlations, etc.

My talk will consist of three parts. I will start with a quick overview of the
techniques we have on offer, in terms of data analysis problems and data types, as well
as my general approach of using information theory to formalise informativeness.

In the second part, I will discuss how to find the most significant patterns in event sequence
data. In short, our goal is mining a small set of patterns (serial episodes) that together
characterise the data well. I will go into the formulation of the problem, as well as the fast
heuristics we developed to find good solutions.

Third, I will discuss work in progress: can we derive a causal inference rule to tell
whether X causes Y or whether they are merely correlated? Can we do so in a principled
manner, without parameters, and without having to assume distributions? Can we do so
when X and Y are not i.i.d. sets of observations, but objects in general? At the risk of
spoiling the excitement: so far, the answer seems to be yes.

Besides all this, I will also answer the frequently asked question of how to pronounce my name.


