Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Entity Linking with Document Retrieval and Vice Versa
Speaker:Laura Dietz
coming from:UMass
Speakers Bio:

Laura Dietz is a post-doctoral researcher / research scientist working
with Bruce Croft at the Center for Intelligent Information Retrieval
(CIIR) at the University of Massachusetts. Before that she worked
with Andrew McCallum. She obtained her doctoral degree in early 2011 from
the Max Planck Institute for Informatics with a thesis on topic models for
networked data, supervised by Tobias Scheffer and Gerhard Weikum.

Event Type:Talk
Visibility:D5, MMCI
Level:AG Audience
Date, Time and Location
Date:Monday, 5 May 2014
Duration:60 Minutes
Building:E1 4
I will discuss the connection between two seemingly unrelated tasks,
entity linking and ad hoc document retrieval, which turn out to
benefit each other. Entity linking is the task of
disambiguating entity mentions against a knowledge base -- a task typically
discussed in the NLP literature. In contrast, ad hoc document retrieval is
the task of ranking documents by relevance to a given keyword query,
which is a core task in IR and search engines.

In the first half of the talk I will focus on how to solve the entity
linking task. A well-known approach is joint inference, which is
usually carried out on a heuristically selected subset of the knowledge
base. In work with Jeff Dalton, I showed how an approximation to the
joint inference approach can be implemented with IR to optimize over the
whole knowledge base at once -- no heuristics necessary!

In the second half I will explore the inverse direction and show that
entity links in the corpus can be effectively leveraged to solve the
document retrieval problem. This is harder than generally believed in
NLP: simply using the disambiguated entities is not sufficient. We use a
combination of different approaches that are based on pseudo-relevance
feedback to inspect first-pass rankings from the text corpus and the
knowledge base. From these feedback sets we derive a relevance-indicating
distribution over entities, types, categories, and keyword
terms that allow us to significantly improve on two IR benchmark
Name(s):Petra Schaaf
EMail:--email address not disclosed on the web
Video Broadcast
Video Broadcast:No
Tags, Category, Keywords and additional notes
Attachments, File(s):
  • Petra Schaaf, 04/30/2014 10:32 AM -- Created document.