The task of Named Entity Disambiguation (NED), which maps mentions of
ambiguous names in natural language onto a set of known entities, has been
an important issue in many areas including machine translation and
information extraction. When working with a huge amount of data (e.g. more
than three million entities in Yago), the components of an NED system that
estimate the probability of a mention matching an entity, the similarity
between a mention and an entity, and the coherence among the entity
candidates of all mentions can become bottlenecks. It is therefore
challenging for an interactive NED system to achieve both high accuracy
and efficiency.
This talk presents an efficient way of disambiguating named entities by
similarity hashing. Our framework is integrated with AIDA, an online tool
for entity detection and disambiguation developed at the Max Planck
Institute for Informatics. We apply state-of-the-art hashing techniques,
such as Locality Sensitive Hashing (LSH) and Spectral Hashing, to several
forms of the similarity search problem, including near-duplicate search
for mention-entity matching and, in particular, related-pair detection for
entity-entity mapping. The latter is not a typical application of hashing
techniques, because the similarities between entities are usually low.
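
To make the idea of near-duplicate search with LSH concrete, the following
is a minimal sketch in Python. It is not the AIDA implementation: the
character-shingle representation, the MinHash signature scheme, the banding
parameters, and all function names (shingles, minhash_signature,
candidate_pairs) are assumptions chosen for illustration only. Strings whose
shingle sets have high Jaccard similarity are likely to share at least one
band bucket, so only those pairs need an exact comparison.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (illustrative)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(signatures, bands=16):
    """LSH banding: items that agree on every row of some band share a bucket."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(key)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify the candidates LSH returns."""
    return len(a & b) / len(a | b)
```

A possible use is to compute one signature per mention string and per entity
name, call candidate_pairs on the combined dictionary, and then verify each
returned pair with jaccard. Using more rows per band makes the filter
stricter, which suits near-duplicate search. Detecting related pairs with low
similarity, as in the entity-entity case, would call for fewer rows per band
and more bands, at the cost of more candidates to verify.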