Campus Event Calendar

Event Entry

What and Who

Leveraging Vision-Language Models for Efficient Task-Specific Knowledge Distillation

Arda Baris Basaran
École polytechnique fédérale de Lausanne (EPFL)
PhD Application Talk
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Wednesday, 29 January 2025
12:30
30 Minutes
Virtual talk
Zoom

Abstract

Recent advances in large vision-language models (VLMs), such as CLIP, have transformed the field with their ability to generalize across diverse visual tasks. However, their computational overhead and rigidity pose challenges for real-world deployment in resource-constrained environments that require real-time processing. This research introduces a novel model distillation framework that addresses these challenges by combining category expansion with learned image augmentation to transfer the capabilities of large-scale VLMs into compact student models, all without relying on human-labeled data. By harnessing the text encoder of VLMs, our method broadens the set of task-relevant categories, enabling the student model to represent a richer set of visual concepts. Simultaneously, we learn an image augmentation policy that aligns with these expanded categories. The similarities between augmented images and expanded categories are then distilled into the student model through a trainable projection head. Extensive evaluation on small-scale datasets demonstrates that our approach achieves competitive or superior performance compared to existing distillation and self-supervised techniques. This presentation will focus on the methodology behind this label-free distillation framework, emphasizing how linguistic guidance from VLMs improves the efficiency and effectiveness of knowledge transfer to student models in low-data scenarios.
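
To make the pipeline concrete, below is a minimal PyTorch sketch of one distillation step, assuming a CLIP-style teacher: text-derived embeddings for an expanded category set, an augmented unlabeled batch, and a student whose trainable projection head is trained to match the teacher's image-category similarity distribution. All module names, dimensions, the random category embeddings, and the horizontal-flip stand-in for the learned augmentation policy are illustrative assumptions, not the speaker's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # --- Stand-ins for the frozen VLM teacher (e.g., CLIP). A real run would
    # load pretrained image/text encoders; dimensions are illustrative only. ---
    EMB_DIM, NUM_EXPANDED = 512, 64           # hypothetical expanded-category count

    teacher_image_encoder = nn.Sequential(    # placeholder for the VLM image tower
        nn.Flatten(), nn.Linear(3 * 32 * 32, EMB_DIM))
    for p in teacher_image_encoder.parameters():
        p.requires_grad_(False)

    # Expanded category embeddings: in the described framework these would come
    # from the VLM text encoder applied to an enlarged set of task-relevant
    # prompts; random unit vectors here keep the sketch self-contained.
    category_embeds = F.normalize(torch.randn(NUM_EXPANDED, EMB_DIM), dim=-1)

    class Student(nn.Module):
        """Compact student: small backbone plus a trainable projection head
        mapping features into the teacher's joint image-text embedding space."""
        def __init__(self, emb_dim):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.proj_head = nn.Linear(64, emb_dim)  # the trainable projection head

        def forward(self, x):
            return self.proj_head(self.backbone(x))

    def category_logits(feats, cats, tau=0.07):
        """Temperature-scaled cosine similarities to the expanded categories."""
        return F.normalize(feats, dim=-1) @ cats.t() / tau

    student = Student(EMB_DIM)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    images = torch.rand(8, 3, 32, 32)          # an unlabeled batch
    augmented = torch.flip(images, dims=[-1])  # horizontal flip as a stand-in
                                               # for the learned augmentation policy

    with torch.no_grad():                      # teacher targets, no human labels
        t_logits = category_logits(teacher_image_encoder(augmented), category_embeds)
        targets = F.softmax(t_logits, dim=-1)

    opt.zero_grad()
    log_preds = F.log_softmax(
        category_logits(student(augmented), category_embeds), dim=-1)
    loss = F.kl_div(log_preds, targets, reduction="batchmean")  # match the
    loss.backward()                            # teacher's similarity distribution
    opt.step()
    print(f"distillation loss: {loss.item():.4f}")

The key design point the sketch illustrates is that the distillation target is a soft similarity distribution over the expanded categories rather than hard labels, which is what lets the student train on unlabeled data while inheriting the teacher's broadened concept vocabulary.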

Contact

Ina Geisler
+49 681 9325 1802

Virtual Meeting Details

Zoom
