Campus Event Calendar

Event Entry

What and Who

Smoothed Analysis of k-Means Clustering

Bodo Manthey
Max-Planck-Institut für Informatik - D1
AG1 Mittagsseminar (own work)
AG 1, AG 3, AG 5, SWS, AG 2, AG 4, RG1, MMCI  
AG Audience
English

Date, Time and Location

Tuesday, 2 December 2008
14:00
45 Minutes
E1 4
024
Saarbrücken

Abstract

The k-means method is one of the most popular algorithms for clustering
high-dimensional data into a given number k of classes. However, its good
performance in practice stands in stark contrast to its theoretical
performance: in the worst case, its running time can be exponential. The goal
of smoothed analysis is to explain the good practical performance of
algorithms with poor worst-case performance.
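
For concreteness, here is a minimal Python sketch of the k-means method
(Lloyd's algorithm); the random initialization and the stopping test are
illustrative assumptions, not part of the analysis.

    import numpy as np

    def kmeans(points, k, rng=None):
        # The k-means method alternates two steps until the clustering
        # stabilizes. The uniform random initialization below is an
        # illustrative assumption.
        rng = np.random.default_rng() if rng is None else rng
        centers = points[rng.choice(len(points), size=k, replace=False)]
        while True:
            # Assignment step: each point joins its nearest center
            # under squared Euclidean distance.
            dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Update step: each center moves to the mean of its cluster
            # (empty clusters keep their old center in this sketch).
            new_centers = np.array([points[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            # Terminate once no center moves; the running time in question
            # is the number of iterations of this loop.
            if np.allclose(new_centers, centers):
                return labels, centers
            centers = new_centers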

We present a smoothed analysis of the k-means method: an adversary specifies a
point set, and then each point is perturbed by Gaussian noise with standard
deviation \sigma. We prove two bounds on the smoothed running time of the
k-means algorithm. Our first bound is polynomial in n^{\sqrt{k}} and 1/\sigma,
where n is the number of points. Our second bound is k^{kd} poly(n, 1/\sigma),
where d is the dimension of the instance; this yields a polynomial bound if
both k and d are small compared to n.
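
As a sketch of this input model (all names are illustrative): the smoothed
running time is the expected number of k-means iterations on the perturbed
instance, maximized over the adversary's choice of points.

    import numpy as np

    def smoothed_instance(adversarial_points, sigma, rng=None):
        # Smoothed-analysis model: the adversary fixes the point set, then
        # every coordinate receives independent N(0, sigma^2) noise.
        rng = np.random.default_rng() if rng is None else rng
        return adversarial_points + rng.normal(scale=sigma,
                                               size=adversarial_points.shape)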

Finally, we generalize the analysis of k-means to a large class of other
distance measures. The exponential lower bound for squared Euclidean distances
(Arthur, Vassilvitskii 2006) can be transferred to most Bregman divergences,
and the smoothed analysis can be adapted to (almost) arbitrary Bregman
divergences. Examples of Bregman divergences are the Kullback-Leibler
divergence and Mahalanobis distances; the latter include squared Euclidean
distances.
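
A short sketch of these two examples, using the general form
D_\phi(x, y) = \phi(x) - \phi(y) - <\nabla\phi(y), x - y> of a Bregman
divergence (function names are illustrative):

    import numpy as np

    def mahalanobis(x, y, A):
        # Bregman divergence of \phi(z) = z^T A z for positive definite A;
        # A = identity matrix gives the squared Euclidean distance.
        diff = x - y
        return diff @ A @ diff

    def kullback_leibler(x, y):
        # Bregman divergence of the negative entropy \phi(z) = sum_i z_i log z_i,
        # restricted to probability vectors (x and y each sum to 1).
        return float(np.sum(x * np.log(x / y)))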

Based on joint work with Heiko Röglin.

Contact

Khaled Elbassioni
