More precisely, we make the following contributions:
* We present a comprehensive comparative analysis of proximity score models and a rigorous analysis of the potential of phrases and adapt a leading proximity score model for XML data
* We discuss the feasibility of all presented proximity score models for top-k query processing and present a novel index combining a content and proximity score that helps to accelerate top-k query processing and improves result quality.
* We present a novel, distributed index tuning framework for term and term pair index lists that optimizes pruning parameters by means of well-defined optimization criteria under disk space constraints. Indexes can be tuned with emphasis on efficiency or effectiveness: the resulting indexes yield fast processing at high result quality.
* We show that pruned index lists processed with a merge join outperform top-k query processing with unpruned lists at a high result quality.
* Moreover, we present a hybrid index structure for improved cold cache run times.