The tremendous amount of information on the Internet has made
search engines some of the most widely used tools.
Currently, only commercial, centralized search engines
such as Google can process terabytes of web documents.
Even so, this approach fails to index the "hidden web"
located in intranets and local databases.
Scalability, self-organization, and fault tolerance are important
properties of popular Peer-to-Peer systems, and we want to exploit them.
The Minerva project is a collaboration of web search engines based
on a Peer-to-Peer architecture.
Search engines on several selected peers process their inverted
indexes with Fagin's threshold algorithm to obtain the top-k highest
ranked documents for the current query. The best top-k results from these
peers are collected by the query initiator and merged into a single top-k list;
this problem is known as the result merging task. The quality of the final top-k
list depends heavily on the scoring function used on the peers and on the merging
algorithm, whereas speed depends mostly on the local index processing scheme.
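
The two steps above, local top-k retrieval and result merging, can be
illustrated with a minimal sketch. It is not the Minerva implementation:
the function names threshold_top_k and merge_results are hypothetical,
scores are assumed to be non-negative and aggregated by summation, and
the merge assumes that scores from different peers are directly
comparable, which is exactly the assumption that makes result merging
hard in practice.

```python
import heapq
from typing import Dict, List, Tuple

Posting = Tuple[str, float]  # (document id, per-term score)


def threshold_top_k(index_lists: List[List[Posting]], k: int) -> List[Posting]:
    """Sketch of Fagin's threshold algorithm with sum aggregation.

    index_lists holds one posting list per query term, each sorted by
    descending score. Scores are assumed non-negative (illustrative choice).
    """
    # Random-access lookup tables, one per index list.
    lookups: List[Dict[str, float]] = [dict(lst) for lst in index_lists]
    totals: Dict[str, float] = {}
    max_depth = max((len(lst) for lst in index_lists), default=0)

    for depth in range(max_depth):
        threshold = 0.0
        # Sorted access: read one posting from every list at this depth.
        for lst in index_lists:
            if depth < len(lst):
                doc, score = lst[depth]
                threshold += score
                if doc not in totals:
                    # Random access: fetch the document's score in all lists.
                    totals[doc] = sum(lk.get(doc, 0.0) for lk in lookups)
        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # Stop once k documents score at least as high as the threshold,
        # since no unseen document can exceed it.
        if len(best) == k and best[-1][1] >= threshold:
            return best
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


def merge_results(peer_results: List[List[Posting]], k: int) -> List[Posting]:
    """Naive result merging: keep each document's best score across peers."""
    merged: Dict[str, float] = {}
    for results in peer_results:
        for doc, score in results:
            merged[doc] = max(score, merged.get(doc, float("-inf")))
    return heapq.nlargest(k, merged.items(), key=lambda kv: kv[1])


# Toy data: one peer with two query terms, plus a second peer's result list.
peer_a_index = [
    [("d1", 0.75), ("d2", 0.5), ("d3", 0.125)],
    [("d2", 0.75), ("d3", 0.5), ("d1", 0.25)],
]
peer_a_top = threshold_top_k(peer_a_index, k=2)
peer_b_top = [("d7", 1.5), ("d2", 1.0)]
final_top = merge_results([peer_a_top, peer_b_top], k=2)
```

In this toy run the threshold algorithm stops after reading two postings
per list, without scanning the lists to the end. Taking the maximum score
for documents returned by several peers is only one possible merging
policy, chosen here for brevity.
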
To address the issue of quality, we experimented with several known
scoring functions in the Minerva system. We also propose a new
preference-based language modeling scoring scheme.
We also consider the index processing problem and describe a modified
version of Fagin's threshold algorithm with communication between peers.
The new algorithm accelerates index processing on some of the selected
peers by using additional information about index processing on other peers.