Technical, Research Report
@TechReport


Author, Editor
Author(s):
Bast, Holger
Majumdar, Debapriyo
Schenkel, Ralf
Theobald, Martin
Weikum, Gerhard
Editor(s):

BibTeX Citekey*:

TechReportGDF2006

Language:

English

Title, Institution

Title*:

IO-Top-k: Index-Access Optimized Top-k Query Processing

Institution*:

Max-Planck-Institut für Informatik

Publishers or Institutions Address*:

Saarbrücken, Germany

Type:

Research Report

No, Year, pp.,

Number*:

MPI-I-2006-5-002

Pages*:

43

Month:

March


Year*:

2006

ISBN/ISSN:

0946-011X





Note, Abstract, ©

(LaTeX) Abstract:

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates.

The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. We also discuss efficient implementation techniques for the underlying data structures.

In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods: a factor of up to 3 in terms of execution costs, and a factor of 5 in terms of absolute run-times of our implementation. Our best techniques are close to a lower bound for the execution cost of the considered class of threshold algorithms.
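
As a rough illustration of the threshold-algorithm family the abstract refers to, the following Python sketch scans score-sorted index lists and stops once lower and upper score bounds show that no other candidate can enter the top k. It is not the IO-Top-k method from the report: it uses plain round-robin sequential accesses and no random accesses, and it assumes non-negative scores aggregated by summation. The function name `nra_top_k` and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

Posting = Tuple[str, float]  # (document id, score); each list is sorted by score, descending


def nra_top_k(index_lists: List[List[Posting]], k: int) -> List[Tuple[str, float]]:
    """Baseline threshold-style top-k with sequential accesses only.

    Assumptions (not from the report): scores are non-negative, the
    aggregation function is a sum, and lists are scanned round-robin.
    """
    m = len(index_lists)
    pos = [0] * m  # next scan position per list
    # high[i]: upper bound on the score of any not-yet-seen entry in list i
    high = [lst[0][1] if lst else 0.0 for lst in index_lists]
    seen: Dict[str, Dict[int, float]] = {}  # doc -> {list index: score}

    def worst(doc: str) -> float:
        # Lower bound: sum of the scores seen so far.
        return sum(seen[doc].values())

    def best(doc: str) -> float:
        # Upper bound: seen scores plus high[i] for every list not yet resolved.
        return worst(doc) + sum(high[i] for i in range(m) if i not in seen[doc])

    while True:
        progressed = False
        for i, lst in enumerate(index_lists):
            if pos[i] < len(lst):
                doc, score = lst[pos[i]]
                pos[i] += 1
                seen.setdefault(doc, {})[i] = score
                high[i] = score
                progressed = True
            if pos[i] >= len(lst):
                high[i] = 0.0  # exhausted list: unseen docs contribute nothing

        ranked = sorted(seen, key=lambda d: (-worst(d), d))
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            min_k = worst(top[-1])
            # Stop when neither an unseen document nor any other candidate
            # can still exceed the k-th lower bound.
            if sum(high) <= min_k and all(best(d) <= min_k for d in rest):
                return [(d, worst(d)) for d in top]
        if not progressed:  # all lists exhausted
            return [(d, worst(d)) for d in top]


example_lists = [
    [("a", 8.0), ("b", 6.0), ("c", 1.0)],
    [("b", 7.0), ("a", 3.0), ("d", 2.0)],
]
print(nra_top_k(example_lists, k=1))
```

In this example the scan can stop after two entries per list, so the tail entries for `c` and `d` are never read. Note that with sequential accesses only, the returned scores are lower bounds and may be incomplete for some results; resolving them is one purpose of the random accesses the abstract mentions.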

Access Level:

MPG

Correlation
MPG Unit:
Max-Planck-Institut für Informatik
MPG Subunit:
Databases and Information Systems Group
Audience:
popular
Appearance:
MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort


BibTeX Entry:
@TECHREPORT{TechReportGDF2006,
AUTHOR = {Bast, Holger and Majumdar, Debapriyo and Schenkel, Ralf and Theobald, Martin and Weikum, Gerhard},
TITLE = {{IO-Top-k}: Index-Access Optimized Top-k Query Processing},
YEAR = {2006},
TYPE = {Research Report},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
NUMBER = {MPI-I-2006-5-002},
PAGES = {43},
ADDRESS = {Saarbr{\"u}cken, Germany},
MONTH = {March},
ISSN = {0946-011X},
}


Entry last modified by Holger Bast, 09/19/2006
Edit History

Editor(s): Matthias Bender
Created: 06/09/2006 15:35:14

Revision   Editor(s)       Edit Date
5.         Holger Bast     09/19/2006 04:20:22 AM
4.         Ralf Schenkel   05/31/2006 09:09:06 AM
3.         Petra Schaaf    30.03.2006 09:08:49
2.         Petra Schaaf    27.03.2006 12:00:21
1.         Petra Schaaf    27.03.2006 11:59:06