Campus Event Calendar: Frans Schalekamp (02/05/2010 in E1 4/024)


Event Entry


What and Who

Clustering with or without the Approximation

Frans Schalekamp

Institute for Theoretical Computer Science, Tsinghua University, Beijing

AG1 Mittagsseminar (own work)

AG 1, AG 4, RG1, MMCI, AG 3, AG 5, SWS

AG Audience

English


Date, Time and Location

Date: Friday, 5 February 2010
Time: 13:00
Duration: 45 minutes
Building: E1 4
Room: 024
Location: Saarbrücken

Abstract

We study clustering algorithms that were recently proposed by Balcan, Blum and Gupta at SODA'09 ([BBG'09]) and that have already given rise to two follow-up papers. The input for the clustering problem consists of points in a metric space and a number $k$, specifying the desired number of clusters. The algorithms find a clustering that is provably close to a target clustering, provided that the instance has the ``$( 1+\alpha, \epsilon )$-property''. This property means that every solution to the $k$-median problem whose objective value is at most $(1+\alpha)$ times the optimal objective value corresponds to a clustering that misclassifies at most an $\epsilon$ fraction of the points with respect to the target clustering. We investigate the theoretical and practical implications of their results.
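To make the definition of the $( 1+\alpha, \epsilon )$-property concrete, here is a minimal brute-force sketch in Python. It is only an illustration and not the method of [BBG'09]: it assumes that centers are chosen among the input points, that a solution's clustering assigns each point to its nearest center, and that the misclassified fraction is taken under the best matching of cluster labels. All function names are hypothetical, and the enumeration is exponential in $k$, so it is usable on tiny instances only.

```python
from itertools import combinations, permutations


def kmedian_cost(points, centers, dist):
    """k-median objective: sum of distances from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def induced_clustering(points, centers, dist):
    """Label each point with the index of its nearest center (ties go to the first)."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def misclassified_fraction(labels, target, k):
    """Fraction of points on which the two clusterings disagree,
    minimized over all relabelings of the k clusters (assumed interpretation)."""
    n = len(labels)
    best = n
    for perm in permutations(range(k)):
        errors = sum(1 for a, b in zip(labels, target) if perm[a] != b)
        best = min(best, errors)
    return best / n


def has_property(points, k, target, alpha, eps, dist):
    """Brute-force check: every set of k centers (drawn from the input points)
    with cost <= (1 + alpha) * OPT must induce a clustering that is eps-close
    to the target clustering."""
    solutions = [(kmedian_cost(points, c, dist), c)
                 for c in combinations(points, k)]
    opt = min(cost for cost, _ in solutions)
    for cost, centers in solutions:
        if cost <= (1 + alpha) * opt:
            labels = induced_clustering(points, centers, dist)
            if misclassified_fraction(labels, target, k) > eps:
                return False
    return True


# Toy instance: four points on a line forming two well-separated pairs.
points = [0, 1, 10, 11]
target = [0, 0, 1, 1]


def line_dist(a, b):
    return abs(a - b)


print(has_property(points, 2, target, alpha=0.5, eps=0.0, dist=line_dist))  # True
print(has_property(points, 2, target, alpha=9.0, eps=0.0, dist=line_dist))  # False
```

On this toy instance the optimal cost is 2. With $\alpha = 0.5$ only the four optimal center pairs qualify, and all of them recover the target clustering. With $\alpha = 9$ the pair of centers $\{0, 1\}$ (cost 19) also qualifies and misclassifies one of the four points, so the property fails for $\epsilon = 0$ but holds for $\epsilon = 0.25$.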

Our main contributions are as follows. First, we show that instances that have the $( 1+\alpha, \epsilon )$-property and for which, additionally, the clusters in the target clustering are large, are easier than general instances: the algorithm proposed in [BBG'09] is a constant factor approximation algorithm with an approximation guarantee that is better than the known hardness of approximation for general instances. Further, we show that it is NP-hard to check whether an instance satisfies the $( 1+\alpha, \epsilon )$-property for a given $(\alpha, \epsilon)$, even though the algorithms in [BBG'09] need such $\alpha$ and $\epsilon$ as input parameters. Second, we show how to use their algorithms even if we do not know values of $\alpha$ and $\epsilon$ for which the assumption holds. Finally, we implement these methods and other popular methods, and test them on real-world data sets. We find that on these data sets there are no $\alpha$ and $\epsilon$ such that the data set has both the $( 1+\alpha, \epsilon )$-property and sufficiently large clusters in the target solution. For the general case, we show that on our data sets the performance guarantee proved in [BBG'09] is meaningless for the values of $\alpha, \epsilon$ such that the data set has the $( 1+\alpha, \epsilon )$-property. The algorithm nonetheless gives reasonable results, although it is outperformed by other methods.

Joint work with Michael Yu and Anke van Zuylen.

Contact

Julian Mestre


Julian Mestre, 02/04/2010 15:14 -- Created document.
