The thesis begins by investigating different statistical assumptions about term distribution, testing both the goodness of fit and the stability of the distribution parameters. We compare the goodness of fit of different distributions to determine which assumption best describes the actual term frequency distribution. Moreover, we test the stability of the parameters when the distributions are fitted to different datasets, which may indicate the robustness of our proposed models.
Based on the findings of this investigation, a variety of weighting models, called NG (no global statistics) models, are derived from the Divergence from Randomness framework; these models use only within-document statistics in the relevance weighting. We further simplify the proposed models to achieve better performance and robustness. In extensive experiments on various standard TREC test collections, compared against state-of-the-art weighting models, our proposed NG models provide acceptable retrieval performance in ad-hoc search without the use of global statistics.