MPI-INF Logo
Campus Event Calendar

Event Entry

What and Who

Multiple pass streaming algorithms for clustering and machine learning

Kevin Chang
Max-Planck-Institut für Informatik - D 1
AG1 Mittagsseminar (own work)

Newly arrived postdoc in the algorithms and complexity group.
AG 1, AG 2, AG 3, AG 4, AG 5, SWS  
AG Audience
English

Date, Time and Location

Thursday, 21 September 2006
13:30
30 Minutes
E1 4
024
Saarbrücken

Abstract

Streaming algorithms are designed to model the rigors of computation over massive data sets. The input to a streaming algorithm is a read-only array that may only be accessed sequentially in a small, constant number of passes. The algorithm may use a small, sublinear amount of extra memory to perform its calculations. The resource requirements to be optimized are the number of passes and the amount of extra memory.

In this talk, we will first review theoretical models for computation over massive data sets. We will then discuss small space, multiple pass algorithms for a class of clustering problems from machine learning, called learning mixtures of distributions, and corresponding nearly-tight lower bounds on resource usage. We will briefly sketch the general framework of these clustering algorithms, which can also be adapted to solve clustering problems in combinatorial optimization, such as facility location and k-median.

Lastly, we will explore a new problem in convex/concave optimization over massive data sets, motivated by the problem of parameter estimation of general statistical models.

Contact

Kevin Chang
--email hidden
passcode not visible
logged in users only

Tags, Category, Keywords and additional notes

streaming algorithms; clustering; massive data sets; statistics

Kevin Chang, 09/15/2006 11:13 -- Created document.