Campus Event Calendar

Event Entry

What and Who

Result Merging and Index Processing in Peer-to-Peer Web search engine

Sergey Chernov
IMPRS
Masters' Lunch
AG 1, AG 2, AG 3, AG 4, AG 5  
MPI Audience

Date, Time and Location

Tuesday, 23 November 2004
12:59
-- Not specified --
46.1 - MPII
024
Saarbrücken

Abstract

The tremendous amount of information on the Internet has made
search engines the most widely used tools for finding it.
Currently, only commercial, centralized search engines
such as Google can process terabytes of web documents.
Even so, this approach fails to index the "hidden web"
located in intranets and local databases.

Scalability, self-organization, and fault tolerance are important
properties of popular Peer-to-Peer systems, and we want to exploit them.
The Minerva project is a collaboration of web search engines based
on a Peer-to-Peer architecture.

Search engines on several selected peers process their inverted
indexes with Fagin's threshold algorithm to obtain the top-k
highest-ranked documents for the current query. The query initiator
collects the top-k results from these peers and merges them into a
single top-k list; this problem is known as the result merging task.
The quality of the final top-k list depends heavily on the scoring
functions used on the peers and on the merging algorithm, whereas
speed depends mostly on the local index processing scheme.
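
The two steps described above can be illustrated with a minimal sketch. It is not the Minerva implementation: the function names, the sum-of-scores aggregation, and the assumption that scores from different peers are directly comparable are all introduced here for illustration only. In practice, making scores comparable across peers is the hard part of result merging.

```python
import heapq


def threshold_algorithm(index_lists, k):
    """Illustrative sketch of Fagin's threshold algorithm (TA).

    index_lists: dict mapping a query term to a list of (doc_id, score)
    pairs sorted by descending score. Assumption: a document's total
    score is the sum of its per-term scores, with 0 for missing terms.
    """
    # Random-access tables: term -> {doc_id: score}.
    lookup = {term: dict(postings) for term, postings in index_lists.items()}
    seen = {}  # doc_id -> aggregated score
    max_depth = max((len(p) for p in index_lists.values()), default=0)

    for depth in range(max_depth):
        threshold = 0.0
        # Sorted access: read one entry from every list in parallel.
        for postings in index_lists.values():
            if depth < len(postings):
                doc_id, score = postings[depth]
                threshold += score
                if doc_id not in seen:
                    # Random access: fetch the scores from all lists.
                    seen[doc_id] = sum(
                        table.get(doc_id, 0.0) for table in lookup.values()
                    )
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop once no unseen document can beat the current k-th score.
        if len(best) == k and best[-1][1] >= threshold:
            return best

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


def merge_results(peer_lists, k):
    """Merge per-peer top-k lists into one top-k list.

    Assumption: scores are comparable across peers. A document returned
    by several peers keeps its highest score.
    """
    merged = {}
    for results in peer_lists:
        for doc_id, score in results:
            merged[doc_id] = max(score, merged.get(doc_id, float("-inf")))
    return heapq.nlargest(k, merged.items(), key=lambda kv: kv[1])
```

Each selected peer would run something like `threshold_algorithm` on its local index, and the query initiator would then pass the returned lists to `merge_results`.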

To address the issue of quality, we experimented with several known
scoring functions in the Minerva system and proposed a new
preference-based language modeling scoring scheme.
We also considered the index processing problem and described a modified
version of Fagin's threshold algorithm with communication between peers.
The new algorithm accelerates index processing on some of the selected
peers by using additional information about index processing on other peers.

Contact

Kerstin Kathy Meyer-Ross
226

Kerstin Meyer-Ross, 11/16/2004 16:07
Kerstin Meyer-Ross, 11/16/2004 16:06 -- Created document.