In this work we study subgroup discovery, a theoretical framework for finding subgroups in data—i.e., named sub-populations—whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset.
This is a powerful tool that conveys crucial information to a human audience, but despite past advances has been limited to simple target concepts. In this work we propose algo-rithms that bring this framework to novel application domains.
Namely, we show how to efficiently discover representative subgroups, that ensure en-sure fairness of the sub-population with regard to a sensitive trait; for entities with addi-tional pairwise relational information we introduce a novel measure of
connectedness beyond established measures, and use it to discover named robustly con-nected sub-populations. Further, we introduce kernelised subgroup discovery, which gen-eralises this task to entities of arbitrary structure and with the additional flexibility of only requiring a Gramian appropriate for the given entities.
To use within kernelised subgroup discovery, but also on any other kind of kernel method, we additionally introduce a novel random walk graph kernel, which allows the fine tuning of the alignment between the vertices of the two compared graphs, during the count of the random walks, and propose a meaningful structure-aware vertex labels to utilise this new capability.
With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel-based method.