Campus Event Calendar

Event Entry

What and Who

Entity Linking with Document Retrieval and Vice Versa

Laura Dietz
UMass
Talk

http://ciir.cs.umass.edu/~dietz/
Laura Dietz is a post-doctoral researcher / research scientist working
with Bruce Croft at the Center for Intelligent Information Retrieval
(CIIR) at the University of Massachusetts. Before that, she worked with
Andrew McCallum. She obtained her doctoral degree from the Max Planck
Institute for Informatics in early 2011 with a thesis on topic models
for networked data, supervised by Tobias Scheffer and Gerhard Weikum.
AG 5, MMCI  
AG Audience
English

Date, Time and Location

Monday, 5 May 2014
11:00
60 Minutes
E1 4
433
Saarbrücken

Abstract

I will discuss the connection between two seemingly unrelated tasks,
entity linking and ad hoc document retrieval, which turn out to benefit
each other. Entity linking is the task of disambiguating entity mentions
against a knowledge base -- a task typically discussed in the NLP
literature. In contrast, ad hoc document retrieval is the task of
ranking documents by relevance to a given keyword query, which is a core
task in IR and search engines.

In the first half of the talk I will focus on how to solve the entity
linking task. A well-known approach is joint inference, which is usually
carried out on a heuristically selected subset of the knowledge base. In
work with Jeff Dalton, I showed how an approximation to the joint
inference approach can be implemented with IR to optimize over the whole
knowledge base at once -- no heuristics necessary!

In the second half I will explore the inverse direction and show that
entity links in the corpus can be effectively leveraged to solve the
document retrieval problem. This is harder than generally believed in
NLP: simply using the disambiguated entities is not sufficient. We use a
combination of different approaches based on pseudo-relevance feedback
to inspect first-pass rankings from the text corpus and the knowledge
base. From these feedback sets we derive a relevance-indicating
distribution over entities, types, categories, and keyword terms that
allows us to significantly improve results on two IR benchmark
collections.
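
As a rough illustration of the feedback idea only, and not the method
presented in the talk, the following Python sketch derives a
distribution over entities from a first-pass ranking in the style of a
generic relevance model. The function name, the input layout (a list of
scored document ids plus a mapping from document id to linked entity
ids), and the cutoff k are assumptions made for this example.

```python
from collections import Counter, defaultdict


def entity_feedback_distribution(ranking, entity_links, k=10):
    """Estimate a distribution over entities from the top-k feedback documents.

    ranking:      list of (doc_id, score), best first, scores non-negative
                  (hypothetical input format).
    entity_links: dict mapping doc_id to the list of entity ids linked in
                  that document (hypothetical input format).
    """
    top = ranking[:k]
    total_score = sum(score for _, score in top)
    if total_score <= 0:
        return {}

    weights = defaultdict(float)
    for doc_id, score in top:
        links = entity_links.get(doc_id, [])
        if not links:
            continue  # documents without entity links contribute nothing
        doc_weight = score / total_score  # normalized retrieval score
        for entity, count in Counter(links).items():
            # weight each entity by its relative frequency in the document
            weights[entity] += doc_weight * count / len(links)

    norm = sum(weights.values())
    if norm <= 0:
        return {}
    # renormalize so the result sums to 1 even if some documents had no links
    return {entity: weight / norm for entity, weight in weights.items()}
```

The same weighting could in principle be applied to types, categories,
or keyword terms by swapping what is counted per document; how the
resulting distributions are combined into a final ranking is left open
here.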

Contact

Petra Schaaf
5000

Petra Schaaf, 04/30/2014 10:32 -- Created document.