Campus Event Calendar

Event Entry

What and Who

Parameter-free Clustering

Claudia Plant
UMIT - Private Universität für Gesundheitswissenschaften, Medizinische Informatik und Technik, Hall i. Tirol
Talk
AG 1, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Tuesday, 10 March 2009
11:15
45 Minutes
E1 4
024
Saarbrücken

Abstract

Technological progress opens up novel possibilities in many applications. In biology and medicine, for example, it is now possible to study cells, tissues and organisms with unprecedented accuracy using high-throughput mass spectrometry or high-resolution imaging. To make the best use of the information potentially contained in such data, effective and efficient data mining methods are required. Clustering is among the most important tasks in unsupervised data mining.

Clustering algorithms generate a partitioning of the data into groups, or clusters, such that the data objects assigned to a common cluster are as similar as possible and the data objects assigned to different clusters differ as much as possible. By performing a cluster analysis, the user can ideally gain an overview of the major characteristics of a data set without any previous knowledge. In practice, however, performing a cluster analysis is often not easy, since most clustering algorithms require numerous input parameters. Without background knowledge on the data, it is often difficult to find a suitable parameterization: parameters typically need to be adjusted in a time-consuming trial-and-error procedure, and even then a useful parameterization is not guaranteed. Outliers and noise points in real-world data complicate the search further.
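The parameterization problem described above shows up in even the simplest clustering algorithms. As a hypothetical illustration (not code from the talk), here is a minimal pure-Python k-means sketch: the number of clusters k must be supplied up front, which is exactly the kind of input parameter that is hard to choose without background knowledge on the data.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means sketch: the user must supply k up front,
    illustrating the parameterization problem discussed in the talk."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        # Recompute each center as its cluster's mean
        # (keep the old center if the cluster became empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two well-separated 1-D groups; k=2 recovers them, but in general
# a good value of k is unknown in advance.
data = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]
centers, clusters = kmeans(data, k=2)
```

With k=2 the two groups around 1 and 10 are recovered; with a poorly chosen k the result can be arbitrarily misleading, which motivates the parameter-free approaches below.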

In this talk, I will discuss some novel approaches which are important milestones on the way to parameter-free clustering. The basic idea of these techniques is to relate clustering to data compression: a good clustering summarizes the major characteristics of the data and thus allows the data to be compressed effectively. Based on this principle, also known as the Minimum Description Length (MDL) principle, the algorithm RIC (Robust Information-theoretic Clustering) introduces a quality criterion for clustering and uses it to improve an arbitrary initial clustering, for example an imperfect clustering obtained with an inappropriate parameterization. In addition, RIC provides effective and efficient algorithms for identifying noise points and outliers. The algorithm OCI (Outlier-robust Clustering using Independent Components) is a standalone algorithm for parameter-free clustering. OCI relies on a very general cluster notion supported by the Exponential Power Distribution and Independent Component Analysis, and provides effective clustering of non-Gaussian data.
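The compression idea behind the MDL principle can be sketched in a few lines. The following toy score (an illustrative assumption, not the actual RIC criterion) charges a fixed model cost per cluster center plus a Gaussian coding cost for the residuals; a clustering that captures the structure of the data yields a lower total description length.

```python
import math

def description_length(clusters, bits_per_center=8.0):
    """Toy MDL-style score for a 1-D clustering (a sketch, not RIC):
    model cost for the cluster centers plus a Gaussian coding cost
    for the residuals within each cluster. Lower is better."""
    total = len(clusters) * bits_per_center  # cost of describing the model
    for c in clusters:
        mean = sum(c) / len(c)
        var = sum((x - mean) ** 2 for x in c) / len(c)
        var = max(var, 1e-6)  # guard against log(0) for degenerate clusters
        # Differential-entropy coding cost of a Gaussian, per point.
        total += len(c) * 0.5 * math.log2(2 * math.pi * math.e * var)
    return total

data = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]
one_cluster = [data]
two_clusters = [data[:3], data[3:]]
# The two-cluster model pays for an extra center but codes the
# residuals far more cheaply, so its total description length is lower.
```

In this sketch the two-cluster model wins because its per-cluster variances are tiny, so the data compress well; `bits_per_center` is an assumed stand-in for the cost of encoding the model parameters, which the actual MDL-based criteria handle more carefully.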

A brief survey on my further research areas including semi‐supervised and supervised learning concludes this talk.

Contact

Conny Liegl
302-70150

Conny Liegl, 03/09/2009 13:42 -- Created document.