MPI-INF Logo
Campus Event Calendar

Event Entry

New for: D1, D2, D3, D4, D5

What and Who

Automatic Population of Knowledge Bases with Multimodal Data about Named Entities

Bilyana Taneva
Max-Planck-Institut für Informatik - D5
Promotionskolloquium
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Monday, 12 August 2013
14:00
60 Minutes
E1 4
024
Saarbrücken

Abstract

Knowledge bases are of great importance for Web search, recommendations, and many Information Retrieval tasks. However, maintaining them for not so popular entities is often a bottleneck. Typically, such entities have limited textual coverage and only a few ontological facts. Moreover, these entities are not well populated with multimodal data, such as images, videos, or audio recordings.


The goals in this thesis are (1) to populate a given knowledge base with multimodal data about entities, such as images or audio recordings, and (2) to ease the task of maintaining and expanding the textual knowledge about a given entity, by recommending valuable text excerpts to the contributors of knowledge bases.

The thesis makes three main contributions. The first two contributions concentrate on finding images of named entities with high precision, high recall, and high visual diversity. Our main focus are less popular entities, for which the image search engines fail to retrieve good results. Our methods utilize background knowledge about the entity, such as ontological facts or a short description, and a visual-based image similarity to rank and diversify a set of candidate images.

Our third contribution is an approach for extracting text contents related to a given entity. It leverages a language-model-based similarity between a short description of the entity and the text sources, and solves a budget-constraint optimization program without any assumptions on the text structure. Moreover, our approach is also able to reliably extract entity related audio excerpts from news podcasts. We derive the time boundaries from the usually very noisy audio transcriptions.

Contact

5000
--email hidden
passcode not visible
logged in users only

Petra Schaaf, 08/28/2013 09:50 -- Created document.