Campus Event Calendar

Event Entry

What and Who

*Remote Talk* Learning efficient representations for image and video understanding

Yannis Kalantidis
Facebook AI
SWS Colloquium

Yannis Kalantidis has been a research scientist at Facebook AI in California for the last three years. He received his PhD on large-scale visual search and clustering from the National Technical University of Athens in 2014. From 2015 until 2017 he was a postdoc and research scientist at Yahoo Research in San Francisco, leading the visual similarity search project at Flickr and participating in the Visual Genome dataset efforts with Stanford. At Facebook Research he has been part of the video understanding group, conducting research on representation learning, video understanding, and modeling of vision and language. He also leads the Computer Vision for Global Challenges Initiative (cv4gc.org), which has organized impactful workshops at top venues such as CVPR and ICLR. Personal website: https://www.skamalas.com/
AG 1, AG 2, AG 3, INET, AG 4, AG 5, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Wednesday, 18 March 2020
10:00
60 Minutes
E1 5
029
Saarbrücken

Abstract

Two important challenges in image and video understanding are designing more efficient deep convolutional neural networks and learning models that can achieve higher-level understanding. In this talk, I will present some of my recent work towards tackling these challenges. Specifically, I will introduce the Octave Convolution [ICCV 2019], a plug-and-play replacement for the convolution operator that exploits the spatial redundancy of CNN activations and can be used without any adjustments to the network architecture. I will also present Global Reasoning Networks [CVPR 2019], a new approach for reasoning over arbitrary sets of input features by projecting them from a coordinate space into an interaction space where relational reasoning can be computed efficiently. The two methods are complementary and achieve state-of-the-art performance on both image and video tasks. Aiming for higher-level understanding, I will also present our recent work on vision and language modeling, specifically on learning state-of-the-art image and video captioning models that can also better visually ground the generated sentences, with [CVPR 2019] or without [arXiv 2019] explicit localization supervision. The talk will conclude with current research and a brief vision for the future.

Contact

Annika Meiser
93039105

Video Broadcast

Yes
Kaiserslautern
G26
111
SWS Space 2 (6312)

Annika Meiser, 03/13/2020 09:30
Annika Meiser, 03/10/2020 11:37 -- Created document.