the art in text search. They are typically implemented by searching,
at query time, for occurrences of the query words in the top-ranked
documents. This document-based approach has three inherent problems:
(i) when a document is indexed by terms which it does not contain
literally (e.g., related words or spelling variants), localization of
the corresponding snippets becomes problematic; (ii) each query
operator (e.g., phrase or proximity search) has to be implemented
twice: once on the index side to compute the correct result set,
and once on the snippet generation side to generate the appropriate
snippets; and (iii) in the worst case, the whole document needs to be
scanned for occurrences of the query words, which is problematic for
very long documents.
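
To make problems (i) and (iii) concrete, the following is a minimal sketch of document-based snippet generation. It is an illustration only, not the implementation of any particular engine; the function name `document_based_snippets`, the `context` parameter, and the sample document are assumptions introduced here.

```python
import re

def document_based_snippets(text, query_words, context=3):
    """Scan the full document text for literal occurrences of the query words.

    Hypothetical helper for illustration: returns one snippet (a window of
    `context` tokens on each side) per matching token.
    """
    tokens = re.findall(r"\w+", text.lower())
    wanted = {w.lower() for w in query_words}
    snippets = []
    # Problem (iii): in the worst case every token of the document is inspected.
    for i, tok in enumerate(tokens):
        if tok in wanted:
            lo = max(0, i - context)
            hi = min(len(tokens), i + context + 1)
            snippets.append(" ".join(tokens[lo:hi]))
    return snippets

doc = "The colour of the sky changes at dusk"
print(document_based_snippets(doc, ["sky"], context=1))
# Problem (i): if the index matched this document for the spelling variant
# "color", a literal scan finds no occurrence to build a snippet from.
print(document_based_snippets(doc, ["color"], context=1))
```

The scan only sees the literal document text, so any term the index associates with the document but that does not appear verbatim yields no snippet.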
We present an alternative index-based approach that localizes snippets
using information computed solely from the index, and that overcomes
all three problems.
We show how to achieve this at essentially no extra cost in query
processing time, by a technique we call query rotation. We also show
how the index-based approach allows the caching of individual segments
instead of complete documents, which enables a significantly higher
cache hit ratio than the document-based approach. We have
fully integrated our implementation with the CompleteSearch engine.
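
The general idea of localizing snippets from index information and caching individual segments can be sketched as follows. This is a simplified illustration under stated assumptions, not the CompleteSearch implementation, and it does not attempt to reproduce query rotation. The names `build_index`, `load_segment`, `index_based_snippets`, the `VARIANTS` table, and the fixed `SEGMENT_SIZE` are all hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

SEGMENT_SIZE = 4  # tokens per segment; an arbitrary choice for this sketch

# Toy collection: document id -> token list (assumed for illustration).
DOCS = {1: "the colour of the sky changes at dusk".split()}

# Hypothetical table of spelling variants under which a token is also indexed.
VARIANTS = {"colour": ["color"]}

def build_index(docs):
    """Positional index: term -> list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            index[tok].append((doc_id, pos))
            # The variant shares the position of the literal token, so the
            # index can still localize it even though the text never contains it.
            for variant in VARIANTS.get(tok, []):
                index[variant].append((doc_id, pos))
    return index

@lru_cache(maxsize=1024)
def load_segment(doc_id, seg_id):
    """Fetch one segment; the cache holds segments, not whole documents."""
    start = seg_id * SEGMENT_SIZE
    return " ".join(DOCS[doc_id][start:start + SEGMENT_SIZE])

def index_based_snippets(index, query_words):
    """Localize snippets from postings alone, without scanning documents."""
    segments = sorted({
        (doc_id, pos // SEGMENT_SIZE)
        for word in query_words
        for doc_id, pos in index.get(word, [])
    })
    return [load_segment(doc_id, seg_id) for doc_id, seg_id in segments]

index = build_index(DOCS)
print(index_based_snippets(index, ["color"]))
print(index_based_snippets(index, ["sky"]))
```

Because positions come from the postings, a term that matches only through the index still maps to a segment, and only the segments that contain matches are loaded and cached.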