Max-Planck-Institut für Informatik
max planck institut
mpii logo Minerva of the Max Planck Society

MPI-INF or MPI-SWS or Local Campus Event Calendar

<< Previous Entry Next Entry >> New Event Entry Edit this Entry Login to DB (to update, delete)
What and Who
Title:Generating and Grounding of Natural Language Descriptions for Visual Data
Speaker:Anna Rohrbach
coming from:Max-Planck-Institut für Informatik - D2
Speakers Bio:
Event Type:Promotionskolloquium
Visibility:D1, D2, D3, D4, D5, RG1, SWS, MMCI
We use this to send out email in the morning.
Level:Public Audience
Date, Time and Location
Date:Monday, 15 May 2017
Duration:60 Minutes
Building:E1 4
Generating natural language descriptions for visual data links computer vision and computational linguistics. Being able to generate a concise and human-readable description of a video is a step towards visual understanding. At the same time, grounding natural language in visual data provides disambiguation for the linguistic concepts, necessary for many applications. This thesis focuses on both directions and tackles three specific problems. First, we develop recognition approaches to understand video of complex cooking activities. We propose an approach to generate coherent multi-sentence descriptions for our videos. Furthermore, we tackle the new task of describing videos at variable level of detail. Second, we present a large-scale dataset of movies and aligned professional descriptions. We propose an approach, which learns from videos and sentences to describe movie clips relying on robust recognition of visual semantic concepts. Third, we propose an approach to ground textual phrases in images with little or no localization supervision, which we further improve by introducing Multimodal Compact Bilinear Pooling for combining language and vision representations. Finally, we jointly address the task of describing videos and grounding the described people. To summarize, this thesis advances the state-of-the-art in automatic video description and visual grounding and also contributes large datasets for studying the intersection of computer vision and

computational linguistics.

Name(s):Connie Balzert
Video Broadcast
Video Broadcast:NoTo Location:
Tags, Category, Keywords and additional notes
Attachments, File(s):

Connie Balzert/MPI-INF, 05/04/2017 03:11 PM
Last modified:
Uwe Brahm/MPII/DE, 05/15/2017 07:01 AM
  • Connie Balzert, 05/04/2017 03:12 PM
  • Connie Balzert, 05/04/2017 03:11 PM -- Created document.