Much of the research in supervised machine learning has focused on
learning scenarios in which the response variable to be predicted is
either a simple real-valued output or a class variable, often binary. In
contrast, structured classification deals with a family of
problems where the goal is to predict outputs possessing some
non-trivial internal structure. This includes prediction problems
involving strings, trees, or graphs as well as collective
classification problems, i.e., problems in which multiple correlated
outputs have to be jointly predicted. To address the challenges of
structured classification, we propose a general framework that combines
the effectiveness of discriminative methods such as Support Vector
Machines with the elegance and benefits of probabilistic graphical
models for representing structural dependencies. In particular, we have
developed a generic sparse approximation algorithm, called SVMstruct,
that is based on an efficient variable selection strategy and that
depends on a specific application only through the choice of an
appropriate (black-box) variable selection mechanism. Whereas a naïve
formulation leads to an exponential blow-up of the problem size,
SVMstruct requires solving a sequence of Quadratic Programs in which
the number of variables grows only quadratically in some tolerance
parameter. Our framework significantly extends the standard
classification setting, leading to learning algorithms that have
numerous interesting applications, ranging from information retrieval,
natural language processing, and speech recognition to computer vision
and bioinformatics.
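The core pattern described above can be illustrated with a toy sketch: a joint feature map Psi(x, y), argmax decoding over the output space, and a "most violated output" oracle playing the role of SVMstruct's black-box variable selection mechanism. This is not the authors' implementation; multiclass classification stands in as the simplest structured output space, a perceptron-style update replaces the quadratic program, and all names (`psi`, `most_violated`, and so on) are illustrative assumptions.

```python
import numpy as np

N_CLASSES, N_FEATS = 3, 4

def psi(x, y):
    """Joint feature map: the input x placed in the block for output y."""
    out = np.zeros(N_CLASSES * N_FEATS)
    out[y * N_FEATS:(y + 1) * N_FEATS] = x
    return out

def predict(w, x):
    """Argmax decoding over the (here: small, finite) output space."""
    return max(range(N_CLASSES), key=lambda y: w @ psi(x, y))

def most_violated(w, x, y_true):
    """Loss-augmented inference with 0/1 loss: the output whose margin
    constraint is most violated -- the candidate a variable selection
    strategy would add to its working set."""
    return max(range(N_CLASSES),
               key=lambda y: (y != y_true) + w @ psi(x, y))

def train(data, epochs=20, lr=0.1):
    """Perceptron-style updates driven by the violation oracle
    (standing in for the quadratic program solves, for brevity)."""
    w = np.zeros(N_CLASSES * N_FEATS)
    for _ in range(epochs):
        for x, y in data:
            y_hat = most_violated(w, x, y)
            if y_hat != y:
                w += lr * (psi(x, y) - psi(x, y_hat))
    return w

# Toy, well-separated data: each class mean is a unit vector plus noise.
rng = np.random.default_rng(0)
data = [(rng.normal(0.0, 0.1, N_FEATS) + np.eye(N_CLASSES, N_FEATS)[y], y)
        for y in range(N_CLASSES) for _ in range(20)]
w = train(data)
acc = float(np.mean([predict(w, x) == y for x, y in data]))
print(f"training accuracy: {acc:.2f}")
```

In a genuine structured setting (strings, trees, collective labels) the output space is exponentially large, so `most_violated` must be an efficient combinatorial procedure (e.g. Viterbi-style dynamic programming) rather than an enumeration; that oracle is the only application-specific component, which is exactly the modularity the framework is built around.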