Automatic document classification and clustering are useful for a wide
range of applications, such as organizing Web, intranet, or portal pages
into topic directories, filtering news feeds or mail, and focused crawling
on the Web or in intranets.
This talk presents restrictive methods and ensemble-based meta methods for
supervised learning (i.e., classification based on a small number of
hand-annotated training documents). In addition,
we show how these techniques can be carried over to clustering based on
unsupervised learning (i.e., automatic structuring of document corpora
without training data).