Publications


Proceedings Article, Paper
@InProceedings
Contribution to Conference Proceedings, Workshop

Author, Editor
Author(s):
Bast, Holger
Majumdar, Debapriyo
Schenkel, Ralf
Theobald, Martin
Weikum, Gerhard
Editor(s):
Dayal, Umeshwar
Whang, Kyu-Young
Lomet, David B.
Alonso, Gustavo
Lohman, Guy M.
Kersten, Martin L.
Cha, Sang Kyun
Kim, Young-Kuk
BibTeX cite key*:
BastMSTW2006a
Title, Booktitle
Title*:
IO-Top-k: Index-Access Optimized Top-k Query Processing
Booktitle*:
Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006
Event, URLs
Conference URL:
http://aitrc.kaist.ac.kr/~vldb06/
Downloading URL:
http://www.vldb.org/conf/2006/p475-bast.pdf
http://www.cs.uiuc.edu/homes/hanj/refs/vldb06/contents/p475-bast.pdf
Event Address*:
Seoul, Korea
Language:
English
Event Start Date:
12 September 2006
Event End Date:
15 September 2006
Publisher
Name*:
ACM
Address*:
New York, USA
Vol, No, Year, pp.
Pages:
475-486
Year*:
2006
ISBN/ISSN:
1-59593-385-9
Note, Abstract, ©
(LaTeX) Abstract:
Top-$k$ query processing is an important building block for ranked retrieval,
with applications ranging from text and data integration to distributed
aggregation of network logs and sensor data.
Top-$k$ queries operate on index lists for a query's elementary conditions
and aggregate scores for result candidates. One of the best implementation
methods in this setting is the family of threshold algorithms, which aim
to terminate the index scans as early as possible based on lower and upper
bounds for the final scores of result candidates. This procedure
performs sequential disk accesses for sorted index scans, but also has the option
of performing random accesses to resolve score uncertainty. This entails
scheduling for the two kinds of accesses: 1) the prioritization of different
index lists in the sequential accesses, and 2) the decision on when to perform
random accesses and for which candidates.
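
The bound-based early termination described above can be illustrated with a minimal sketch. This is not the paper's IO-Top-k algorithm: it is a generic threshold-style scan using sorted accesses only, in round-robin order, with no scheduling and no random accesses. The function name `threshold_top_k`, the sum aggregation, and the assumption that scores are non-negative and that an item missing from a list contributes 0 are all illustrative choices, not taken from the source.

```python
def threshold_top_k(index_lists, k):
    """Sorted-access-only threshold scan (illustrative sketch).

    index_lists: one list per query condition, each a list of
    (item, score) pairs sorted by descending score, scores >= 0.
    Returns (top-k items by lower bound, scan depth reached).
    """
    m = len(index_lists)
    seen = {}                      # item -> {list index: score seen there}
    high = [0.0] * m               # last score read per list (upper bound for unread entries)
    top = []
    depth = 0
    max_depth = max((len(lst) for lst in index_lists), default=0)

    while depth < max_depth:
        # One round of sorted accesses: read the next entry of every list.
        for i, lst in enumerate(index_lists):
            if depth < len(lst):
                item, score = lst[depth]
                seen.setdefault(item, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0      # list exhausted: nothing more to gain here
        depth += 1

        # Lower bound: sum of the scores already seen for the item.
        worst = {d: sum(s.values()) for d, s in seen.items()}
        top = sorted(worst, key=lambda d: worst[d], reverse=True)[:k]
        if len(top) < k:
            continue
        min_k = worst[top[-1]]

        # Upper bound: lower bound plus high[i] for every list not yet seen.
        def best(d):
            return worst[d] + sum(high[i] for i in range(m) if i not in seen[d])

        rivals_best = max((best(d) for d in seen if d not in top), default=0.0)
        unseen_best = sum(high)    # best possible score of a never-seen item

        # Stop as soon as no item outside the current top-k can overtake it.
        if max(rivals_best, unseen_best) <= min_k:
            break
    return top, depth


index_lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.1), ("d", 0.05)],
    [("b", 0.9), ("a", 0.7), ("d", 0.2), ("c", 0.1)],
]
print(threshold_top_k(index_lists, k=1))  # (['b'], 2)
```

In this toy input the scan stops after reading two entries per list instead of four. Choosing which list to read next, and when to resolve a candidate by random access instead, are the scheduling decisions the abstract refers to; the sketch leaves both out.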

The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation.
The current paper takes an integrated view of the scheduling issues and develops
novel strategies that outperform prior proposals by a large margin.
Our main contributions are new, principled scheduling methods based on a Knapsack-related
optimization for sequential accesses and a cost model for random accesses.
The methods can be further boosted by harnessing probabilistic estimators for scores,
selectivities, and index list correlations.
In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB),
our methods achieved significant performance gains compared to the best previously known methods.
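
As a rough illustration of what a Knapsack-related optimization for sequential accesses could look like, the sketch below splits a fixed budget of sorted accesses across the index lists so that the total drop in the lists' upper bounds is maximized. This is a toy formulation under stated assumptions, not the paper's method: the function name `allocate_sorted_accesses`, the objective, and the assumption that the upcoming scores of each list are known exactly are all hypothetical. In a real system the upcoming scores would have to be estimated, for instance with the kind of probabilistic score estimators the abstract mentions.

```python
def allocate_sorted_accesses(upcoming, budget):
    """Split `budget` sorted accesses across lists (illustrative sketch).

    upcoming[i] is a non-empty, descending list of scores for list i:
    upcoming[i][0] is the current upper bound of the list, and
    upcoming[i][x] is its upper bound after x further sorted accesses.
    Returns the allocation [x_0, ..., x_{m-1}] with sum(x) <= budget
    that maximizes the total reduction of the upper bounds.
    """
    # best[b] = (total reduction, allocation) using exactly b accesses
    # over the lists processed so far.
    best = {0: (0.0, [])}
    for scores in upcoming:
        nxt = {}
        for used, (gain, alloc) in best.items():
            max_x = min(len(scores) - 1, budget - used)
            for x in range(max_x + 1):
                cand = (gain + scores[0] - scores[x], alloc + [x])
                if used + x not in nxt or cand[0] > nxt[used + x][0]:
                    nxt[used + x] = cand
        best = nxt
    return max(best.values(), key=lambda entry: entry[0])[1]


upcoming = [
    [0.9, 0.85, 0.8, 0.75],   # flat list: reading further barely lowers the bound
    [0.9, 0.5, 0.4, 0.39],    # steep list: the first reads lower the bound a lot
]
print(allocate_sorted_accesses(upcoming, budget=3))  # [1, 2]
```

With three accesses available, the sketch spends two on the steep list and one on the flat list, whereas a round-robin schedule would split them without regard to how quickly each list's bound falls.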
Download
Access Level:
Internal

Correlation
MPG Unit:
Max-Planck-Institut für Informatik
MPG Subunit:
Databases and Information Systems Group
Appearance:
MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort



BibTeX Entry:

@INPROCEEDINGS{BastMSTW2006a,
AUTHOR = {Bast, Holger and Majumdar, Debapriyo and Schenkel, Ralf and Theobald, Martin and Weikum, Gerhard},
EDITOR = {Dayal, Umeshwar and Whang, Kyu-Young and Lomet, David B. and Alonso, Gustavo and Lohman, Guy M. and Kersten, Martin L. and Cha, Sang Kyun and Kim, Young-Kuk},
TITLE = {{IO-Top-k}: Index-Access Optimized Top-k Query Processing},
BOOKTITLE = {Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006},
PUBLISHER = {ACM},
YEAR = {2006},
PAGES = {475--486},
ADDRESS = {Seoul, Korea},
ISBN = {1-59593-385-9},
}


Entry last modified by Uwe Brahm, 07/07/2007
Edit History

Created by: Ralf Schenkel, 06/09/2006 15:35:14

Revision  Editor            Edit Date
11.       Uwe Brahm         07/07/2007 00:46:41
10.       Regina Kraemer    04/16/2007 09:59:02 AM
9.        Christine Kiesel  12.02.2007 09:55:54
8.        Christine Kiesel  09.02.2007 15:55:48
7.        Christine Kiesel  08.02.2007 10:24:39