Campus Event Calendar

Event Entry

What and Who

Statistical Learning Techniques for Text Categorization with Sparse Labeled Data

Georgiana Ifrim
Max-Planck-Institut für Informatik - D5
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience

Date, Time and Location

Friday, 27 February 2009
60 Minutes
E1 1 - Informatik


Many applications involve learning a supervised classifier from very few explicitly labeled training examples, because the cost of manually labeling training data is often prohibitively high. For instance, we would like a good classifier to learn our interests from a few example books or movies we like and to recommend similar ones in the future.
In this talk, we present two approaches for overcoming the bottleneck of sparse labeled data. First, we discuss a new probabilistic model for text documents designed to facilitate the integration of background knowledge (e.g., unlabeled documents, ontologies of concepts, encyclopedias) into the process of learning from small training data. Second, we present a new coordinate-wise gradient ascent technique for learning logistic regression models in the space of all (word or character) sequences in the training data.
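The announcement does not describe the algorithm itself. As a rough illustration only, the Python sketch below shows one form coordinate-wise gradient ascent for logistic regression over sequence features can take. It assumes binary presence features, character n-grams up to a fixed length `max_n` (a simplification: the talk's method covers all sequences in the training data, which this sketch does not attempt), no regularization, and a greedy rule that updates the single coordinate with the largest gradient magnitude. All function names (`ngram_features`, `train`, `predict`, `log_likelihood`) are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch, NOT the speaker's method: binary character n-gram
# features up to a fixed max_n, unregularized logistic regression, and a
# greedy coordinate-wise gradient ascent on the log-likelihood.


def ngram_features(text, max_n=3):
    """Binary features: the set of all character n-grams of length 1..max_n."""
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            feats.add(text[i:i + n])
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z):
    # Stable log(sigmoid(z)); note log(1 - sigmoid(z)) == log_sigmoid(-z).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def score(w, feats):
    return sum(w.get(f, 0.0) for f in feats)


def log_likelihood(w, X, labels):
    """Training log-likelihood for labels in {0, 1}."""
    return sum(log_sigmoid(score(w, x) if y == 1 else -score(w, x))
               for x, y in zip(X, labels))


def train(docs, labels, max_n=3, iters=50):
    """Greedy coordinate-wise gradient ascent on the logistic log-likelihood."""
    X = [ngram_features(d, max_n) for d in docs]
    w = {}  # sparse weights: only coordinates that were actually updated
    for _ in range(iters):
        # Residuals y_i - p_i under the current weights.
        resid = [y - sigmoid(score(w, x)) for x, y in zip(X, labels)]
        # Partial derivative of the log-likelihood for each feature seen in
        # training, and the number of documents containing that feature.
        grad = defaultdict(float)
        count = defaultdict(int)
        for x, r in zip(X, resid):
            for f in x:
                grad[f] += r
                count[f] += 1
        # Update only the coordinate with the steepest gradient.
        best = max(grad, key=lambda f: abs(grad[f]))
        if abs(grad[best]) < 1e-8:
            break
        # Step 1/L with L = count/4, an upper bound on the curvature along
        # this coordinate, so no update decreases the log-likelihood.
        w[best] = w.get(best, 0.0) + 4.0 * grad[best] / count[best]
    return w


def predict(w, text, max_n=3):
    """Probability that `text` belongs to the positive class."""
    return sigmoid(score(w, ngram_features(text, max_n)))


if __name__ == "__main__":
    docs = ["good movie", "great film", "bad movie", "awful film"]
    labels = [1, 1, 0, 0]
    w = train(docs, labels)
    for d in docs:
        print(d, round(predict(w, d), 3))
```

The step size 4/count follows from the fact that each term p(1 - p) of the per-coordinate second derivative is at most 1/4, so every update increases the training log-likelihood. Enumerating every n-gram explicitly, as done here, only works because `max_n` is small; it does not carry over to the space of all sequences of arbitrary length.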

Our experimental study shows the advantages of these techniques compared to other state-of-the-art approaches.


Petra Schaaf

Petra Schaaf, 02/11/2009 10:53
Petra Schaaf, 02/09/2009 12:36 -- Created document.