Technical, Research Report
@TechReport


Author, Editor

Author(s):

Bender, Matthias
Michel, Sebastian
Triantafillou, Peter
Weikum, Gerhard

Not MPG Author(s):

Triantafillou, Peter


BibTeX Citekey*:

TechReportGDF2006

Language:

English

Title, Institution

Title*:

Overlap-Aware Global df Estimation in Distributed Information Retrieval Systems

Institution*:

Max-Planck-Institut für Informatik

Publisher's or Institution's Address*:

Saarbrücken, Germany

Type:

Research Report

No, Year, pp.,

Number*:

MPI-I-2006-5-001

Pages*:

28

Month:

January

VG Wort
Pages*:

55

Year*:

2006

ISBN/ISSN:

0946-011X





Note, Abstract, ©

(LaTeX) Abstract:

Peer-to-peer (P2P) search engines and other forms of distributed information retrieval (IR)
are gaining momentum. Unlike in centralized IR, computing statistical measures over the entire
document collection is difficult and expensive, because the collection is widely
distributed across many computers in a highly dynamic network.
Yet such network-wide statistics, most notably the global document frequencies of individual terms,
would be highly beneficial for ranking global search results that are compiled from different
peers. This paper develops an efficient and scalable method for estimating global document frequencies in
a large-scale, highly dynamic P2P network with autonomous peers.
The main difficulty addressed is that the local collections of different peers
may overlap arbitrarily, as many peers may choose to gather popular documents that
fall into their specific interest profiles.
Our method uses hash sketches as the underlying technique for compact data synopses
and exploits specific properties of hash sketches to eliminate duplicates in the counting process.
We report on experiments with real Web data that demonstrate the accuracy of our estimation method
and its benefit for search result ranking.
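
The duplicate-elimination property mentioned in the abstract can be illustrated with a small example. The following is a minimal sketch, assuming Flajolet-Martin style hash sketches with stochastic averaging; the class name `HashSketch`, its parameters, and the choice of SHA-1 as hash function are hypothetical and are not taken from the report. Each peer would summarize the IDs of its documents containing a term in such a sketch, and sketches from different peers are merged with a bitwise OR, so a document held by several peers sets the same bit everywhere and is counted only once.

```python
import hashlib

PHI = 0.77351  # correction constant from Flajolet and Martin's analysis


class HashSketch:
    """Hypothetical hash sketch (probabilistic counting, stochastic averaging)."""

    def __init__(self, num_bitmaps=64, bits=32):
        self.num_bitmaps = num_bitmaps
        self.bits = bits
        self.bitmaps = [0] * num_bitmaps

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        bucket = h % self.num_bitmaps   # which bitmap this item updates
        rest = h // self.num_bitmaps    # remaining bits pick the position
        # Index of the least-significant 1-bit; bit k is hit with prob. 2^-(k+1).
        rho = (rest & -rest).bit_length() - 1 if rest else self.bits - 1
        self.bitmaps[bucket] |= 1 << min(rho, self.bits - 1)

    def union(self, other):
        # Bitwise OR: identical items set identical bits, so overlap
        # between the two underlying collections is not double counted.
        merged = HashSketch(self.num_bitmaps, self.bits)
        merged.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self):
        # R = position of the lowest unset bit, averaged over all bitmaps.
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while bitmap & (1 << r):
                r += 1
            total += r
        return self.num_bitmaps / PHI * 2 ** (total / self.num_bitmaps)


# Two peers whose collections for one term overlap in 3000 documents.
peer_a, peer_b = HashSketch(), HashSketch()
for i in range(6000):
    peer_a.add(f"doc{i}")
for i in range(3000, 9000):
    peer_b.add(f"doc{i}")

global_df = peer_a.union(peer_b).estimate()  # estimate of 9000 distinct documents
```

The merged sketch is bit-for-bit identical to a sketch built over the union of both collections, which is what makes the estimate overlap-aware. The estimate itself is approximate; more bitmaps reduce the error at the cost of a larger synopsis.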

Categories / Keywords:

peer-to-peer, distributed information retrieval, global document frequency estimation

Access Level:

MPG

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Databases and Information Systems Group

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Scientific Advisory Board (Fachbeirat), VG Wort


BibTeX Entry:
@TECHREPORT{TechReportGDF2006,
AUTHOR = {Bender, Matthias and Michel, Sebastian and Triantafillou, Peter and Weikum, Gerhard},
TITLE = {Overlap-Aware Global df Estimation in Distributed Information Retrieval Systems},
YEAR = {2006},
TYPE = {Research Report},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
NUMBER = {MPI-I-2006-5-001},
PAGES = {28},
ADDRESS = {Saarbr{\"u}cken, Germany},
MONTH = {January},
ISSN = {0946-011X},
}


Entry last modified by Adriana Davidescu, 01/19/2007
Edit History:

Created by Matthias Bender, 2006-02-21 11:05:29

Revision  Editor             Edit Date
4         Adriana Davidescu  2007-01-19 18:33:17
3         Uwe Brahm          2007-01-19 11:25:45
2         Adriana Davidescu  2007-01-17 15:53:50
1         Adriana Davidescu  2006-08-21 12:29:11
0         Adriana Davidescu  2006-02-21 11:05:30