Campus Event Calendar

Event Entry

What and Who

Relevance Weighting in Information Retrieval using Within-document Term Statistics

Kai Hui
University of Chinese Academy of Sciences – China
PhD Application Talk
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Monday, 25 February 2013
11:00
90 Minutes
E1 4
R024
Saarbrücken

Abstract

Global corpus statistics play an important role in the probabilistic weighting models used in information retrieval. These models have achieved robust retrieval effectiveness in previous work. However, with the rapid development of information technology, it is difficult or even infeasible to obtain or maintain the global statistics needed to deploy state-of-the-art probabilistic models in environments such as peer-to-peer networks and pervasive computing. Such models include BM25, the KL-divergence language model, and PL2 from the Divergence from Randomness framework. The novel models proposed in our thesis are derived within the Divergence from Randomness framework and use no global statistics. They are based on an investigation into the validity of different statistical assumptions about term distributions.
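
To make concrete what "global statistics" means here, the following is a minimal Python sketch of the standard PL2 term weight: a Poisson randomness model, Laplace after-effect, and term frequency normalisation 2. This is the baseline model named above, not one of the NG models, and the function name, parameter names and the default c = 1.0 are illustrative assumptions. The arguments avg_doc_len, coll_freq and num_docs are the collection-wide statistics that a model without global statistics has to do without.

```python
import math


def pl2_weight(tf: float, doc_len: float, avg_doc_len: float,
               coll_freq: float, num_docs: float, c: float = 1.0) -> float:
    """Standard PL2 weight of one query term in one document (sketch).

    Within-document statistics: tf, doc_len.
    Global (collection-wide) statistics: avg_doc_len, coll_freq, num_docs.
    The name and signature are hypothetical, chosen for illustration.
    """
    if tf <= 0:
        return 0.0
    # Normalisation 2: rescale tf relative to the average document length (global).
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)
    # Poisson mean of the term per document, estimated from collection statistics (global).
    lam = coll_freq / num_docs
    # Informative content under the Poisson model P, using Stirling's approximation.
    inf1 = (tfn * math.log2(tfn / lam)
            + (lam - tfn) * math.log2(math.e)
            + 0.5 * math.log2(2.0 * math.pi * tfn))
    # First normalisation L (Laplace after-effect).
    return inf1 / (tfn + 1.0)


if __name__ == "__main__":
    # Same document, same tf; only the collection statistics differ.
    print(pl2_weight(3, 100, 100, coll_freq=50, num_docs=1_000))
    print(pl2_weight(3, 100, 100, coll_freq=50, num_docs=100_000))
```

Because lam and tfn depend on coll_freq, num_docs and avg_doc_len, the weight of the same term in the same document changes whenever the collection changes. Keeping these values current is what becomes hard in a peer-to-peer or pervasive setting.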

The thesis begins by investigating different statistical assumptions about term distributions, testing both the fit of each distribution and the stability of its parameters. We compare the goodness of fit of the different distributions to determine which assumption best describes the actual term frequency distribution. We also test the stability of the parameters when the distributions are fitted on different datasets, which may indicate how robust our proposed models are.
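
The abstract does not say which distributions were compared or how fit was measured. As a hedged illustration only, the sketch below fits two one-parameter candidates, a Poisson and a geometric distribution, to a list of term frequency counts by maximum likelihood and compares their log-likelihoods. Refitting on a second dataset and comparing the estimated parameters is one simple way to probe stability. The choice of distributions, the function names and the sample counts are all hypothetical.

```python
import math
from typing import Dict, List


def poisson_fit(xs: List[int]) -> Dict[str, float]:
    """Maximum-likelihood Poisson fit: lambda is the sample mean."""
    lam = sum(xs) / len(xs)
    loglik = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)
    return {"param": lam, "loglik": loglik}


def geometric_fit(xs: List[int]) -> Dict[str, float]:
    """Maximum-likelihood geometric fit on {0, 1, 2, ...}: P(k) = (1 - p)^k * p."""
    p = 1.0 / (1.0 + sum(xs) / len(xs))
    loglik = sum(x * math.log(1.0 - p) + math.log(p) for x in xs)
    return {"param": p, "loglik": loglik}


def compare_fits(xs: List[int]) -> Dict[str, Dict[str, float]]:
    """Fit both hypothetical candidates to the same counts.

    With one parameter each, the higher log-likelihood is the better fit.
    """
    if not xs or min(xs) < 0 or sum(xs) == 0:
        raise ValueError("need non-negative counts with at least one positive value")
    return {"poisson": poisson_fit(xs), "geometric": geometric_fit(xs)}


if __name__ == "__main__":
    # Hypothetical counts of one term across documents (bursty: many zeros, a few large values).
    dataset_a = [0, 0, 0, 1, 1, 2, 0, 5, 0, 1, 0, 0, 8, 1, 0]
    dataset_b = [0, 1, 0, 0, 3, 0, 0, 6, 1, 0, 0, 2, 0, 0, 1]
    for name, xs in [("A", dataset_a), ("B", dataset_b)]:
        fits = compare_fits(xs)
        best = max(fits, key=lambda k: fits[k]["loglik"])
        # Similar parameter estimates across datasets suggest a stable fit.
        print(name, best, {k: round(v["param"], 3) for k, v in fits.items()})
```

Because both candidates have a single parameter, comparing log-likelihoods here is equivalent to comparing AIC. A fuller study would add multi-parameter candidates and use a penalised criterion.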

Based on the findings of this investigation, we derive a variety of weighting models from the Divergence from Randomness framework. These are called NG models (NG stands for "no global statistics") and use only within-document statistics for relevance weighting. We further simplify the proposed models to improve performance and robustness. In extensive experiments on various standard TREC test collections, our NG models provide acceptable ad-hoc retrieval performance compared with state-of-the-art weighting models, without using any global statistics.

Contact

IMPRS Office Team
0681 93251800

Tags, Category, Keywords and additional notes

Stephanie Jörg, 02/22/2013 12:17
Stephanie Jörg, 02/22/2013 12:13 -- Created document.