Campus Event Calendar

Event Entry

What and Who

Unsupervised Query Segmentation: Algorithms and Evaluation

Dr. Rishiraj Saha Roy
Adobe Research Lab India and formerly IIT Kharagpur
AG5 Talk
AG 5  
AG Audience
English

Date, Time and Location

Monday, 22 June 2015
11:00
60 Minutes
E1 4
533
Saarbrücken

Abstract

Query segmentation is one of the first steps towards query understanding: a complex query is partitioned into semantically coherent word sequences. In this talk, we will first discuss our proposed query segmentation algorithm, which uses query logs as its only input resource. Next, we will show how the segmentation can be enhanced using Wikipedia titles and part-of-speech sequence information.

In the past, segmentation strategies were mainly validated against manual annotations. We will present the first evaluation framework for Web search query segmentation based directly on IR performance. Our work shows that the quality of a segmentation algorithm, as judged against a handful of human-annotated segmentations, hardly reflects its effectiveness in an IR-based setup. We then highlight the challenges of granularity in traditional segmentation and the consequent difficulties in IR applications.

Subsequently, we explore nested (hierarchical) query segmentation, in which segments are defined recursively as contiguous sequences of segments or query words, as an effective and powerful alternative representation of a query. We design a lightweight, unsupervised nested segmentation scheme and propose how to use the tree arising from the nested representation of a query to improve IR performance. We examine several aspects of the IR application framework and show that nested segmentation can be exploited for re-ranking documents, leading to significant gains over baselines that include state-of-the-art "flat" segmentation strategies.
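
To make the recursive definition of nested segmentation concrete, the following is a minimal sketch of the data structure only, not the speaker's algorithm. The example query, the chosen segment boundaries, and the helper names `leaves` and `depth` are all assumptions invented for illustration. A flat segmentation is a list of word sequences; a nested segmentation is a tree whose leaves, read left to right, give back the original query.

```python
from typing import List, Union

# A nested segment is either a single query word (str)
# or a contiguous sequence of nested segments (list).
Nested = Union[str, List["Nested"]]


def leaves(node: Nested) -> List[str]:
    """Return the query words under a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words: List[str] = []
    for child in node:
        words.extend(leaves(child))
    return words


def depth(node: Nested) -> int:
    """Height of the segmentation tree; a bare word has depth 0."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node)


# Hypothetical query and segmentations, chosen by hand for illustration.
query = "windows xp home edition hd video playback".split()

# Flat segmentation: one level of non-overlapping segments.
flat: Nested = [["windows", "xp", "home", "edition"], ["hd", "video"], ["playback"]]

# Nested segmentation: segments may themselves contain segments.
nested: Nested = [
    [["windows", "xp"], ["home", "edition"]],
    [["hd", "video"], "playback"],
]
```

Both representations cover the same words in the same order; the nested form additionally records how closely words are grouped, which is the kind of tree structure the abstract proposes to exploit for re-ranking.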

Contact

Petra Schaaf
5000

Petra Schaaf, 06/18/2015 10:44 -- Created document.