Campus Event Calendar

Event Entry

What and Who

Index-based Snippet Generation

Gabriel Manolache
International Max Planck Research School for Computer Science
Talk
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, RG2  
MPI Audience
English

Date, Time and Location

Tuesday, 8 July 2008
10:00
120 Minutes
E1 1 - Informatik
407
Saarbrücken

Abstract

Ranked result lists with query-dependent snippets have become state of
the art in text search. They are typically implemented by searching,
at query time, for occurrences of the query words in the top-ranked
documents. This document-based approach has three inherent problems:
(i) when a document is indexed by terms which it does not contain
literally (e.g., related words or spelling variants), localization of
the corresponding snippets becomes problematic; (ii) each query
operator (e.g., phrase or proximity search) has to be implemented
twice, on the index side in order to compute the correct result set,
and on the snippet generation side to generate the appropriate
snippets; and (iii) in the worst case, the whole document needs to be
scanned for occurrences of the query words, which is problematic for
very long documents.
We present an alternative index-based approach that localizes snippets
using information computed solely from the index, and that overcomes all
three problems.
We show how to achieve this at essentially no extra cost in query
processing time, by a technique we call query rotation. We also show
how the index-based approach allows the caching of individual segments
instead of complete documents, which yields a significantly higher
cache hit ratio than the document-based approach. We have
fully integrated our implementation with the CompleteSearch engine.
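
To illustrate problem (i) and the general idea of localizing snippets from
the index, here is a minimal sketch in Python. It is not the implementation
presented in the talk and does not cover query rotation or segment caching.
It assumes a toy in-memory positional index, and all function names
(build_positional_index, add_variant, snippet_positions_from_index,
render_snippet) are hypothetical. A document is indexed under a spelling
variant it does not contain literally, and the snippet position is still
found from the postings alone, without scanning the document text.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Toy positional index (assumption): term -> {doc_id: [token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def add_variant(index, variant, term):
    """Index every occurrence of `term` under `variant` as well.

    Models a document being indexed by a word it does not contain literally.
    """
    for doc_id, positions in index[term].items():
        index[variant][doc_id].extend(positions)


def snippet_positions_from_index(index, doc_id, query_terms):
    """Return sorted token positions of the query terms in one document.

    Only the index is consulted; the document text is never scanned.
    """
    positions = []
    for term in query_terms:
        positions.extend(index.get(term, {}).get(doc_id, []))
    return sorted(positions)


def render_snippet(tokens, position, radius=2):
    """Cut a window of `radius` tokens on each side of `position`."""
    start = max(0, position - radius)
    return " ".join(tokens[start:position + radius + 1])
```

With a document "the color of the sky is blue today" and "colour" added as
a variant of "color", a query for "colour" is localized at token position 1
even though the literal word never occurs in the text. A document-based
scan for "colour" would find nothing.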

Contact

imprs
225

Jennifer Gerling, 07/02/2008 11:25
Jennifer Gerling, 07/01/2008 15:34 -- Created document.