segmentation. The training images are labeled only by the
classes they contain, not by their location in the image. On
test images instead, the method must predict a class label
for every pixel. Our goal is to enable segmentation algorithms
to use multiple visual cues in this weakly supervised
setting, analogous to what is achieved by fully supervised
methods. However, it is difficult to assess the relative usefulness
of different visual cues from weakly supervised training
data. We define a parametric family of structured models,
where each model weighs visual cues in a different way. We
propose a Maximum Expected Agreement model selection
principle that evaluates the quality of a model from the family
without looking at superpixel labels. Searching for the
best model is a hard optimization problem, which has no
analytic gradient and multiple local optima. We cast it as
a Bayesian optimization problem and propose an algorithm
based on Gaussian processes to efficiently solve it. Our second
contribution is an Extremely Randomized Hashing Forest
that represents diverse superpixel features as a sparse
binary vector. It enables using appearance models of visual
classes that are fast at training and testing and yet accurate.