concepts after observing only a few or zero examples of them. Deep
learning, however, often requires a large amount of labeled data to
achieve good performance. Labeled instances are expensive, difficult,
or even infeasible to obtain because the distribution of training
instances among labels naturally exhibits a long tail. Therefore, it is
of great interest to investigate how to learn efficiently from limited
labeled data. This thesis concerns an important subfield of learning
from limited labeled data, namely, low-shot learning. The setting
assumes the availability of many labeled examples from known classes and
the goal is to learn novel classes from only a few~(few-shot learning)
or zero~(zero-shot learning) training examples of them. To this end, we have developed a
series of multi-modal learning approaches to facilitate the knowledge
transfer from known classes to novel classes for a wide range of visual
recognition tasks including image classification, semantic
image segmentation and video action recognition. More specifically, this
thesis mainly makes the following contributions. First, as there is no
agreed-upon zero-shot image classification benchmark, we define a new
benchmark by unifying both the evaluation protocols and data
splits of publicly available datasets. Second, to tackle the scarcity of
labeled data, we propose feature generation frameworks that
synthesize data in the visual feature space for novel classes. Third, we
extend zero-shot learning and few-shot learning to the semantic
segmentation task and propose a challenging benchmark for it. We show
that incorporating semantic information into a semantic segmentation
network is effective in segmenting novel classes. Finally, we develop
better video representations for the few-shot video classification task
and leverage weakly labeled videos via an efficient retrieval method.