Campus Event Calendar

Event Entry

What and Who

Discovering Robust Dependencies from Data

Panagiotis Mandros
Cluster of Excellence - Multimodal Computing and Interaction - MMCI
Promotionskolloquium
AG 1, AG 2, AG 3, INET, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Thursday, 4 March 2021
11:00
90 Minutes
Virtual talk
Saarbrücken

Abstract

Nowadays, scientific discoveries are fueled by data-driven approaches. As a motivating example from Materials Science, with appropriate feature selection techniques, the physicochemical process of compound semiconductor crystallization can now be described with only a few atomic material properties. This dissertation investigates this direction further: we aim to develop interpretable knowledge discovery techniques that can uncover and meaningfully summarize complex phenomena from data.

Such feature selection tasks are very demanding. The dependencies in data can be of any type (e.g., linear, non-linear, multivariate), and so can the data itself (e.g., mixtures of discrete and continuous attributes). We have to accommodate all types in our analysis, because otherwise we can miss important interactions (e.g., linear methods cannot find non-linear relationships). For interpretability, the degree of dependence should be meaningfully summarized (e.g., “target Y depends 50% on X”, or “the variables in set X exhibit 80% dependence”). From a statistical perspective, the degree of dependence should be robustly measured from data samples of any size to minimize spurious discoveries. Lastly, the combinatorial optimization problem of discovering the top dependencies out of all possible dependencies is NP-hard, and hence we need efficient exhaustive algorithms. For situations where exhaustive search is prohibitive (e.g., high-dimensional data), we need approximate algorithms with guarantees, so that we can interpret the quality of the solution.
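To make a statement such as “target Y depends 50% on X” concrete, one possible reading is mutual information normalized by the entropy of the target, I(X;Y) / H(Y). The sketch below is a minimal illustration for discrete data. It is not the estimator developed in the dissertation, and the function names `entropy` and `fraction_of_information` are chosen here for illustration only. It uses naive plug-in (empirical frequency) estimates, which are known to overestimate dependence on small samples. That bias is the robustness issue the abstract refers to.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in (empirical) Shannon entropy in bits of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def fraction_of_information(xs, ys):
    """Plug-in estimate of I(X;Y) / H(Y), a score in [0, 1].

    Illustrative assumption: this normalization is one way to read
    "Y depends p% on X"; no correction for sample size is applied.
    """
    h_y = entropy(ys)
    if h_y == 0:
        return 0.0  # a constant target carries no information to explain
    mutual_information = entropy(xs) + h_y - entropy(list(zip(xs, ys)))
    return mutual_information / h_y


# Toy data: Y takes four equally likely values (2 bits), X reveals one of the two bits.
xs = [0, 0, 1, 1]
ys = [0, 1, 2, 3]
half = fraction_of_information(xs, ys)
```

In this toy sample, X explains one of the two bits of Y, so the score is 0.5. A copy of Y would score 1.0, and an independent variable would score 0.0.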

In this dissertation, we address the aforementioned challenges and propose effective algorithms that discover dependencies in both supervised and unsupervised scenarios. In particular, we employ notions from Information Theory (e.g., Mutual Information and Total Correlation) to meaningfully quantify all types of dependencies, which we then couple with notions from statistical learning theory to estimate these quantities robustly from data. Lastly, we derive tight and efficient bounding functions that can be used by branch-and-bound, enabling exact solutions in large search spaces. Case studies on Materials Science data confirm that we can indeed discover high-quality dependencies from complex data.
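The role of a bounding function in branch-and-bound can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the objective (plug-in normalized mutual information minus a hypothetical per-feature `penalty`), the function names, and the deliberately loose bound. The bounding functions derived in the dissertation are not given in this abstract and are presumably much tighter. The search stays exact because a branch is skipped only when no superset of the current feature set could beat the best score found so far.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Plug-in Shannon entropy in bits of a sequence of hashable values."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def score(columns, subset, target, penalty):
    """Hypothetical objective: plug-in I(X_S; Y) / H(Y) minus penalty * |S|.

    Assumes the target is not constant, so that H(Y) > 0.
    """
    xs = list(zip(*(columns[i] for i in subset)))
    mi = entropy(xs) + entropy(target) - entropy(list(zip(xs, target)))
    return mi / entropy(target) - penalty * len(subset)


def branch_and_bound(columns, target, penalty=0.1):
    """Exact search over feature subsets with pruning by an optimistic bound."""
    best_subset, best_score = (), 0.0  # the empty set scores 0
    evaluated = 0
    stack = [()]
    while stack:
        subset = stack.pop()
        # Optimistic bound for every strict superset of `subset`: the
        # normalized dependence is at most 1 and the size is at least |S| + 1.
        if 1.0 - penalty * (len(subset) + 1) <= best_score:
            continue  # no extension can improve on the incumbent
        start = subset[-1] + 1 if subset else 0
        for i in range(start, len(columns)):
            child = subset + (i,)
            evaluated += 1
            s = score(columns, child, target, penalty)
            if s > best_score:
                best_subset, best_score = child, s
            stack.append(child)
    return best_subset, best_score, evaluated


# Toy data: the target is the XOR of features 0 and 1; feature 2 is irrelevant.
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
target = [0, 1, 1, 0, 0, 1, 1, 0]
best_subset, best_score, evaluated = branch_and_bound(columns, target)
```

On this toy data, neither feature 0 nor feature 1 is informative on its own, but together they determine the target, so the search returns the pair. It also skips the full three-feature set, because the bound shows that set cannot beat the pair.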

-------------------------------------------------------------------------------------------------------------------------

Zoom link:

https://cispa-de.zoom.us/j/94010943330?pwd=VkcvMS93Y0V0eGxHb2oxMTNzZjI5Zz09
Meeting ID: 940 1094 3330
Passcode: Dr-1ng

Contact

Petra Schaaf
+49 681 9325 5000

Petra Schaaf, 02/19/2021 11:46
Petra Schaaf, 02/19/2021 11:45 -- Created document.