MPI-INF Logo
Publications

Server    domino.mpi-inf.mpg.de

Proceedings Article, Paper
@InProceedings
Beitrag in Tagungsband, Workshop

Author, Editor
Author(s):
Chang, Kevin L.
Kannan, Ravi
dblp
dblp
Not MPG Author(s):
Kannan, Ravi
Editor(s):
BibTeX cite key*:
Chang2006
Title, Booktitle
Title*:
The Space Complexity of Pass-Efficient Algorithms for Clustering
census-soda.ps (423.74 KB)
Booktitle*:
Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'06
Event, URLs
Conference URL::
http://www.siam.org/meetings/da06/
Downloading URL:
Event Address*:
Miami, USA
Language:
English
Event Date*
(no longer used):
Organization:
Event Start Date:
21 February 2007
Event End Date:
21 February 2007
Publisher
Name*:
ACM / Siam
URL:
Address*:
New York, USA
Type:
Vol, No, Year, pp.
Series:
Volume:
Number:
Month:
January
Pages:
1157-1166
Year*:
2006
VG Wort Pages:
ISBN/ISSN:
0-89871-605-5
Sequence Number:
DOI:
Note, Abstract, ©
Note:
This is the conference version. Full version under submission to journal.
(LaTeX) Abstract:
We present multiple pass streaming algorithms for a basic clustering problem for massive data sets. If our algorithm is allotted 2l passes, it will produce an approximation with error at most ε using Õ(k3/ε2/l) bits of memory, the most critical resource for streaming computation. We demonstrate that this tradeoff between passes and memory allotted is intrinsic to the problem and model of computation by proving lower bounds on the memory requirements of any l pass randomized algorithm that are nearly matched by our upper bounds. To the best of our knowledge, this is the first time nearly matching bounds have been proved for such an exponential tradeoff for randomized computation.In this problem, we are given a set of n points drawn randomly according to a mixture of k uniform distributions and wish to approximate the density function of the mixture. The points are placed in a datastream (possibly in adversarial order), which may only be read sequentially by the algorithm. We argue that this models, among others, the datastream produced by a national census of the incomes of all citizens.
Keywords:
algorithms, streaming, massive data set, clustering, statistics
Copyright Message:
Copyright 2006 ACM. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Download
Access Level:
Public

Correlation
MPG Unit:
Max-Planck-Institut für Informatik
MPG Subunit:
Algorithms and Complexity Group
Appearance:
MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort



BibTeX Entry:

@INPROCEEDINGS{Chang2006,
AUTHOR = {Chang, Kevin L. and Kannan, Ravi},
TITLE = {The Space Complexity of Pass-Efficient Algorithms for Clustering},
BOOKTITLE = {Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'06},
PUBLISHER = {ACM / Siam},
YEAR = {2006},
PAGES = {1157--1166},
ADDRESS = {Miami, USA},
MONTH = {January},
ISBN = {0-89871-605-5},
NOTE = {This is the conference version. Full version under submission to journal.},
}


Entry last modified by Christine Kiesel, 04/30/2007
Hide details for Edit History (please click the blue arrow to see the details)Edit History (please click the blue arrow to see the details)

Editor(s)
Kevin Chang
Created
02/21/2007 12:08:17 PM
Revisions
4.
3.
2.
1.
0.
Editor(s)
Christine Kiesel
Regina Kraemer
Christine Kiesel
Kevin Chang
Kevin Chang
Edit Dates
30.04.2007 18:26:08
04/16/2007 10:03:37 AM
10.03.2007 07:58:18
03/01/2007 05:12:38 PM
02/21/2007 12:08:17 PM


File Attachment Icon
census-soda.ps