Campus Event Calendar

Event Entry

What and Who

Highly Discriminative Keys for Collection Selection in Distributed Retrieval

Djoerd Hiemstra
University of Twente
Talk
AG 1, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Tuesday, 17 March 2009
15:00
60 Minutes
E1 4
024
Saarbrücken

Abstract

Distributed information retrieval is a well-researched subfield of
information retrieval, but it has not yielded practical solutions for
large-scale search problems, for two reasons: setting up large numbers
of installations carries high administration costs, and it turns out to
be hard in practice to direct queries to the appropriate local search
systems. However, large-scale distributed search will solve many
scalability problems of today's search engines, for instance by
providing an infrastructure that does not need to crawl web pages to
keep the index up to date. I investigate the following distributed
information retrieval scenario: suppose every web server on the World
Wide Web has its own site search engine that returns local search
engine results pages in some standard (XML) format, and a search
provider adds a layer on top of these, offering distributed search over
the local search engines. I present Sophos, a prototype search provider
that uses so-called highly discriminative keys for database selection
in distributed search. Sophos was evaluated on several aspects,
including collection selection performance and index size. Its
performance is compared to a baseline that applies a standard language
modeling approach to the merged documents of each collection. The
results show that Sophos can outperform the baseline on the TREC web
track data set. So-called query-driven indexing substantially reduces
the index size of Sophos at the cost of a small loss in collection
selection performance. We believe the approach followed by Sophos shows
potential, and we will pursue the idea further in a new project on
Distributed Search that we have just started (see our recent job
advertisements).
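To make the baseline concrete, here is a minimal sketch of collection
selection with a language model over merged documents: each collection's
documents are concatenated into one large "document", and collections
are ranked by the smoothed query likelihood. The abstract does not say
which smoothing method the baseline uses; Jelinek-Mercer smoothing with
lambda = 0.5 is an assumption here, as are the tokenized input format
and the hypothetical function names. This is an illustration of the
general technique, not the Sophos baseline itself.

```python
import math
from collections import Counter


def build_collection_models(collections):
    """Merge every collection into one big 'document' of term counts.

    collections: dict mapping a collection name to a list of documents,
    each document a list of tokens (assumed input format).
    Returns per-collection term counts and global term counts.
    """
    models = {}
    global_counts = Counter()
    for name, docs in collections.items():
        counts = Counter()
        for doc in docs:
            counts.update(doc)
        models[name] = counts
        global_counts.update(counts)
    return models, global_counts


def rank_collections(query, models, global_counts, lam=0.5):
    """Rank collections by log P(query | collection).

    Uses Jelinek-Mercer smoothing (an assumption):
    P(t | C) = lam * tf(t, C) / |C| + (1 - lam) * tf(t, G) / |G|,
    where G is the union of all collections.
    """
    global_total = sum(global_counts.values())
    scores = {}
    for name, counts in models.items():
        total = sum(counts.values())
        score = 0.0
        for term in query:
            p_coll = counts[term] / total if total else 0.0
            p_glob = global_counts[term] / global_total
            p = lam * p_coll + (1 - lam) * p_glob
            if p == 0.0:
                # Term unseen in every collection: it would penalize all
                # collections equally, so it is skipped.
                continue
            score += math.log(p)
        scores[name] = score
    # Highest query likelihood first.
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with a hypothetical "cooking" collection containing
`["pasta", "recipe", "tomato"]` and `["bread", "recipe"]`, and a
"sports" collection containing `["football", "match", "score"]` and
`["tennis", "match"]`, the query `["pasta", "recipe"]` ranks "cooking"
first. Merging documents this way loses document boundaries, which is
one reason an approach that indexes more discriminative keys may select
collections more accurately.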

Contact

Gerhard Weikum

Petra Schaaf, 03/13/2009 09:35
Petra Schaaf, 03/11/2009 09:36 -- Created document.