Max-Planck-Institut für Informatik


What and Who
Title:Unsupervised Query Segmentation: Algorithms and Evaluation
Speaker:Dr. Rishiraj Saha Roy
coming from:Adobe Research Lab India and formerly IIT Kharagpur
Event Type:AG5 Talk
Level:AG Audience
Date, Time and Location
Date:Monday, 22 June 2015
Duration:60 Minutes
Building:E1 4
Query segmentation is one of the first steps towards query understanding: a complex query is partitioned into semantically coherent word sequences. In this talk, we will first discuss our proposed query segmentation algorithm, which uses query logs as its only input resource. Next, we will show how the segmentation can be enhanced using Wikipedia titles and part-of-speech sequence information. In the past, segmentation strategies were mainly validated against manual annotations. We will present the first evaluation framework for Web search query segmentation based directly on IR performance. Our work shows that the quality of a segmentation algorithm, as judged by evaluation against a handful of human-annotated segmentations, hardly reflects its effectiveness in an IR setting.

We then highlight the granularity challenges of traditional segmentation and the resulting difficulties in IR applications. Subsequently, we explore nested (hierarchical) query segmentation, in which segments are defined recursively as contiguous sequences of segments or query words, as an effective and powerful alternative representation of a query. We design a lightweight, unsupervised nested segmentation scheme and propose how to use the tree arising from the nested representation of a query to improve IR performance. We examine several aspects of the IR application framework and show that nested segmentation can be exploited to re-rank documents, yielding significant gains over baselines that include state-of-the-art "flat" segmentation strategies.
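The abstract does not spell out the algorithm itself. To make the terms "flat" and "nested" segmentation from query logs concrete, here is a minimal sketch built on a common generic baseline: pointwise mutual information (PMI) between adjacent words, estimated from a query log. This is an illustrative stand-in and not necessarily the speaker's method. The toy query log, the threshold value, and all function names are hypothetical. The flat variant breaks the query wherever adjacent-word PMI falls below a threshold. The nested variant recursively splits at the weakest adjacent pair, which yields a binary tree; the talk's nested segments need not be binary.

```python
import math
from collections import Counter

# Hypothetical toy query log; a real system would mine millions of logged queries.
QUERY_LOG = [
    "new york times",
    "new york hotels",
    "cheap hotels new york",
    "times square hotels",
    "new york times crossword",
    "cheap flights",
]


def count_ngrams(queries):
    """Count unigrams and adjacent-word bigrams over all queries in the log."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        words = q.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """PMI of an adjacent word pair; -inf if the pair never occurs in the log."""
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_joint = joint / n_bi
    p1 = unigrams[w1] / n_uni
    p2 = unigrams[w2] / n_uni
    return math.log(p_joint / (p1 * p2))


def flat_segment(query, unigrams, bigrams, threshold=0.0):
    """Flat segmentation: start a new segment wherever adjacent PMI < threshold."""
    words = query.split()
    segments, current = [], [words[0]]
    for w1, w2 in zip(words, words[1:]):
        if pmi(w1, w2, unigrams, bigrams) >= threshold:
            current.append(w2)  # strongly associated: extend the current segment
        else:
            segments.append(current)  # weak association: close the segment
            current = [w2]
    segments.append(current)
    return [" ".join(s) for s in segments]


def nested_segment(words, unigrams, bigrams):
    """Nested segmentation: split at the weakest adjacent pair, then recurse.

    Returns a single word for a leaf, or a (left, right) tuple for an inner node.
    Assumes a non-empty list of words.
    """
    if len(words) == 1:
        return words[0]
    scores = [pmi(a, b, unigrams, bigrams) for a, b in zip(words, words[1:])]
    cut = scores.index(min(scores)) + 1  # split just after the weakest link
    return (
        nested_segment(words[:cut], unigrams, bigrams),
        nested_segment(words[cut:], unigrams, bigrams),
    )


if __name__ == "__main__":
    uni, bi = count_ngrams(QUERY_LOG)
    print(flat_segment("cheap new york hotels", uni, bi, threshold=1.0))
    # ['cheap', 'new york', 'hotels']
    print(nested_segment("cheap new york hotels".split(), uni, bi))
    # ('cheap', (('new', 'york'), 'hotels'))
```

The flat output has to commit to a single granularity ("new york" and "hotels" end up as separate segments at this threshold), whereas the tree keeps both levels: "new york" as a tight unit nested inside "new york hotels". This is the kind of structure the abstract proposes to exploit for re-ranking.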
Name(s):Petra Schaaf
Video Broadcast
Video Broadcast:No
Tags, Category, Keywords and additional notes
  • Petra Schaaf, 06/18/2015 10:44 AM -- Created document.