Campus Event Calendar: Kai Hui (02/25/2013 in E1 4/R024)

Event Entry

What and Who

Relevance Weighting in Information Retrieval using Within-document Term Statistics

Kai Hui

University of Chinese Academy of Sciences – China

PhD Application Talk

AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI

Public Audience

English

Date, Time and Location

Monday, 25 February 2013

11:00

90 Minutes

E1 4

R024

Saarbrücken

Abstract

Global corpus statistics play an important role in probabilistic weighting models for information retrieval. Although such models have achieved robust retrieval effectiveness in previous work, the rapid development of information technology has produced environments, such as peer-to-peer networks and pervasive computing, in which it is difficult or even infeasible to obtain or maintain the global statistics required by state-of-the-art probabilistic models such as BM25, the KL-divergence language model, and PL2 from the Divergence from Randomness framework. The novel models in our thesis are derived within the Divergence from Randomness framework, use no global statistics, and are based on an investigation of the validity of different statistical assumptions about term distributions.
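To make the dependence on global statistics concrete, here is a minimal Python sketch of the per-term Okapi BM25 weight. It is not code from the thesis: the function and parameter names are hypothetical, `k1=1.2` and `b=0.75` are common default values rather than settings from the talk, and the `+ 1` inside the IDF logarithm follows the Lucene-style variant (our assumption) so that the weight stays positive for very common terms.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Okapi BM25 weight of one query term in one document (hypothetical helper).

    Within-document statistics: tf, doc_len.
    Global (collection-wide) statistics: avg_doc_len, num_docs, doc_freq.
    """
    # IDF component: needs the collection size and the number of documents
    # containing the term, both of which are global statistics.
    idf = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Length normalisation: needs the average document length (global).
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating term-frequency component.
    return idf * tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    # Same document, same tf; only the global document frequency differs.
    print(bm25_term_weight(tf=3, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=10))
    print(bm25_term_weight(tf=3, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=500))
```

Of the five statistics passed in, only `tf` and `doc_len` can be computed from the document itself; `num_docs`, `doc_freq` and `avg_doc_len` require knowledge of the whole collection, which is what becomes hard to maintain in a peer-to-peer or pervasive setting.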

The thesis begins by investigating different statistical assumptions about term distributions, testing both the goodness of fit of each distribution and the stability of its parameters. We compare how well different distributions fit the data to determine which assumption best describes the actual term frequency distribution. We also test the stability of the parameters when the distributions are fitted on different datasets, which may indicate how robust our proposed models are.
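The fitness comparison described above can be illustrated with a small sketch. Purely as an assumption for illustration, it fits two candidate distributions (Poisson and geometric) to a list of within-document term frequencies by maximum likelihood and compares their log-likelihoods; the abstract does not say which distributions or fitness measures the thesis actually uses, and the sample data below is made up.

```python
import math


def fit_poisson(tfs):
    """MLE Poisson fit: lambda is the sample mean. Returns (lambda, log-likelihood)."""
    lam = sum(tfs) / len(tfs)
    if lam == 0:
        raise ValueError("all term frequencies are zero; the fit is degenerate")
    ll = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in tfs)
    return lam, ll


def fit_geometric(tfs):
    """MLE geometric fit on {0, 1, 2, ...} with P(k) = (1 - p)^k * p, so p = 1 / (1 + mean)."""
    p = 1.0 / (1.0 + sum(tfs) / len(tfs))
    if p == 1.0:
        raise ValueError("all term frequencies are zero; the fit is degenerate")
    ll = sum(k * math.log(1.0 - p) + math.log(p) for k in tfs)
    return p, ll


def best_fit(tfs):
    """Return the name of the better-fitting distribution and all fitted (param, loglik) pairs.

    Both candidates have one free parameter, so comparing log-likelihoods
    is equivalent to comparing AIC.
    """
    fits = {"poisson": fit_poisson(tfs), "geometric": fit_geometric(tfs)}
    name = max(fits, key=lambda n: fits[n][1])
    return name, fits


if __name__ == "__main__":
    # Hypothetical term frequencies of one term in two document samples.
    sample_a = [0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 0, 3, 0, 0, 8]
    sample_b = [0, 1, 0, 0, 4, 0, 0, 2, 0, 0, 0, 6, 1, 0, 0, 3]
    for sample in (sample_a, sample_b):
        # Comparing fitted parameters across samples is a crude stability check.
        print(best_fit(sample))
```

A "bursty" sample with many zeros and a few large counts is better described by the geometric fit, while a sample with nearly constant counts favours the Poisson fit; comparing the fitted parameters across the two samples mirrors, in a very simplified way, the stability test mentioned above.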

Based on the findings of this investigation, we derive a variety of weighting models from the Divergence from Randomness framework, called NG (no global statistics) models, in which only within-document statistics are used for relevance weighting. We further simplify these models to improve their performance and robustness. In extensive experiments on various standard TREC test collections, our proposed NG models provide acceptable retrieval performance in ad-hoc search compared to state-of-the-art weighting models, without using any global statistics.
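For contrast with the NG models, the sketch below shows where global statistics enter the standard PL2 model of the Divergence from Randomness framework (Poisson randomness model, Laplace after-effect, normalisation 2). It does not reproduce any NG model, whose formulas the abstract does not give; the function and parameter names are hypothetical, and `c=1.0` is an assumed default for the normalisation parameter.

```python
import math

LOG2_E = math.log2(math.e)


def pl2_term_weight(tf, doc_len, avg_doc_len, coll_freq, num_docs, c=1.0):
    """Standard PL2 weight of one query term in one document (hypothetical helper).

    Global statistics: avg_doc_len, coll_freq (total occurrences of the term
    in the collection) and num_docs.
    """
    if tf <= 0:
        raise ValueError("PL2 is only defined for terms occurring in the document")
    # Normalisation 2: rescale tf to the average document length (global).
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)
    # Poisson mean of the term per document: coll_freq / num_docs (global).
    lam = coll_freq / num_docs
    # Informative content -log2 P(tfn) under the Poisson model (Stirling approximation).
    info = (tfn * math.log2(tfn / lam)
            + (lam - tfn) * LOG2_E
            + 0.5 * math.log2(2.0 * math.pi * tfn))
    # Laplace after-effect: weight the information gain by 1 / (tfn + 1).
    return info / (tfn + 1.0)


if __name__ == "__main__":
    print(pl2_term_weight(tf=3, doc_len=100, avg_doc_len=100, coll_freq=10, num_docs=1000))
    print(pl2_term_weight(tf=3, doc_len=100, avg_doc_len=100, coll_freq=500, num_docs=1000))
```

Removing the collection-level inputs (`coll_freq`, `num_docs`, `avg_doc_len`) from a model of this shape is the constraint the NG models work under; how the thesis replaces them is not described in the abstract.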

Contact

IMPRS Office Team

0681 93251800

Stephanie Jörg, 02/22/2013 12:17
Stephanie Jörg, 02/22/2013 12:13 -- Created document.
