The main contributions are:
(1) Statistical generalization error bounds that scale *logarithmically* in the number of classes (the previously best known bounds scale linearly), which allows learning even when #classes > #instances.
(2) Optimization algorithms that distribute the training of all-in-one multi-class SVMs over the classes, making them appealing for use in extreme classification.
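To illustrate why all-in-one multi-class SVM training can decompose over classes, the following is a minimal sketch (not the paper's algorithm) of a subgradient step for the Crammer-Singer multi-class hinge loss: each step touches only the row of the true class and the row of the single most-violating class, so the per-class weight vectors can in principle be held and updated by different workers. The function name and hyperparameters are illustrative choices, not taken from the source.

```python
import numpy as np

def cs_subgradient_step(W, x, y, lr=0.1, reg=1e-3):
    """One subgradient step of the Crammer-Singer multi-class hinge loss.

    W: (num_classes, dim) weight matrix, one row per class.
    x: (dim,) feature vector; y: index of the true class.

    Only rows y and the most-violating class receive a data-dependent
    update, which is what makes a per-class distribution of the rows
    plausible in an extreme-classification setting.
    """
    scores = W @ x
    margins = 1.0 + scores - scores[y]   # hinge margins against class y
    margins[y] = 0.0                     # the true class never violates itself
    c = int(np.argmax(margins))          # most-violating class
    W *= (1.0 - lr * reg)                # L2 shrinkage on every row
    if margins[c] > 0.0:                 # hinge is active
        W[c] -= lr * x                   # push the violator's score down
        W[y] += lr * x                   # push the true class's score up
    return W
```

On a toy separable problem (three classes at the standard basis vectors of R^3), cycling this step over the data quickly yields a matrix whose row-wise argmax classifies all three points correctly.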