Campus Event Calendar: Anna Rohrbach (05/15/2017 in E1 4/024)


Event Entry

What and Who

Generating and Grounding of Natural Language Descriptions for Visual Data

Anna Rohrbach

Max-Planck-Institut für Informatik - D2

Doctoral Colloquium (Promotionskolloquium)

AG 1, AG 2, AG 3, AG 4, AG 5, RG1, SWS, MMCI

Public Audience

English


Date, Time and Location

Monday, 15 May 2017

16:00

60 Minutes

E1 4

024

Saarbrücken

Abstract

Generating natural language descriptions for visual data links computer vision and computational linguistics. Being able to generate a concise and human-readable description of a video is a step towards visual understanding. At the same time, grounding natural language in visual data provides disambiguation for linguistic concepts, which is necessary for many applications. This thesis focuses on both directions and tackles three specific problems. First, we develop recognition approaches to understand videos of complex cooking activities. We propose an approach to generate coherent multi-sentence descriptions for our videos. Furthermore, we tackle the new task of describing videos at a variable level of detail. Second, we present a large-scale dataset of movies and aligned professional descriptions. We propose an approach that learns from videos and sentences to describe movie clips, relying on robust recognition of visual semantic concepts. Third, we propose an approach to ground textual phrases in images with little or no localization supervision, which we further improve by introducing Multimodal Compact Bilinear Pooling for combining language and vision representations. Finally, we jointly address the task of describing videos and grounding the described people. To summarize, this thesis advances the state of the art in automatic video description and visual grounding and also contributes large datasets for studying the intersection of computer vision and computational linguistics.

Contact

Connie Balzert

9325-2000


Connie Balzert, 05/04/2017 15:11 -- Created document.
