Max-Planck-Institut für Informatik


Goal-oriented methods and meta methods for document classification and their parameter tuning

Siersdorfer, Stefan and Sizov, Sergej and Weikum, Gerhard

MPI-I-2004-5-001. 2004, 32 pages. Status: available (back from printing).

Abstract:
Automatic text classification methods come with various
calibration parameters such as thresholds for probabilities in
Bayesian classifiers or for hyperplane distances in SVM
classifiers. In a given application context these parameters
should be set so as to reflect the relative importance of various
result quality metrics such as precision versus recall. In this
work we consider classifiers that can accept a document for a
topic, reject it, or abstain. We aim to meet the application's
goals in terms of accuracy (i.e., avoid false acceptances or
rejections) and loss (i.e., limit the fraction of documents for which no
decision is made). To this end we investigate restrictive forms
of SVM classifiers and we develop meta
methods that split the training data into subsets for
independently trained classifiers and then combine the results of
these classifiers. These techniques tend to improve accuracy at
the expense of document loss. We develop estimators that help to
predict the accuracy and loss for a given setting of the methods'
tuning parameters, and a methodology for efficiently deriving
a setting that meets the application's goals. Our experiments
confirm the practical viability of the approach.
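The abstract describes classifiers that accept a document, reject it, or abstain, calibrated by thresholds such as limits on SVM hyperplane distances. The Python sketch below illustrates that idea under stated assumptions: the decision values are taken as given (for example, signed distances produced by some already-trained linear classifier), a document is left undecided whenever its score falls strictly inside the band (-threshold, threshold), accuracy is measured over the decided documents only, and loss is the fraction of undecided documents. The names `restrictive_decide`, `accuracy_and_loss`, and `tune_threshold` are hypothetical, and the brute-force sweep over candidate thresholds on validation data is a simple stand-in for illustration, not the estimators developed in the report.

```python
from typing import Optional, Sequence, Tuple


def restrictive_decide(score: float, threshold: float) -> Optional[int]:
    """Map a decision value (e.g. a signed hyperplane distance) to
    accept (+1), reject (-1), or abstain (None)."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return None  # inside the band (-threshold, threshold): abstain


def accuracy_and_loss(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[Optional[float], float]:
    """Return (accuracy over decided documents, loss = fraction undecided)."""
    decisions = [restrictive_decide(s, threshold) for s in scores]
    decided = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    loss = (len(labels) - len(decided)) / len(labels)
    if not decided:
        return None, loss  # accuracy is undefined when nothing is decided
    accuracy = sum(1 for d, y in decided if d == y) / len(decided)
    return accuracy, loss


def tune_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Sequence[float],
    min_accuracy: float,
) -> Optional[Tuple[float, float, float]]:
    """Hypothetical tuning step: among the candidate thresholds, return
    (threshold, accuracy, loss) with the lowest loss whose accuracy on this
    validation data reaches min_accuracy, or None if no candidate does."""
    best: Optional[Tuple[float, float, float]] = None
    for t in sorted(candidates):
        acc, loss = accuracy_and_loss(scores, labels, t)
        if acc is not None and acc >= min_accuracy:
            if best is None or loss < best[2]:
                best = (t, acc, loss)
    return best


if __name__ == "__main__":
    # Toy validation data: decision values and true labels (+1 / -1).
    scores = [2.1, 1.4, 0.3, -0.2, -1.5, 0.9, -0.6, -2.0]
    labels = [1, 1, -1, 1, -1, 1, 1, -1]
    for t in (0.0, 0.5, 1.0):
        print(t, accuracy_and_loss(scores, labels, t))
    print(tune_threshold(scores, labels, [0.0, 0.5, 1.0, 1.5, 2.5], 0.9))
```

Raising the threshold widens the abstention band, so loss can only grow while accuracy on the remaining documents typically improves; the sweep simply picks the cheapest setting (lowest loss) that still meets a required accuracy.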
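The meta methods in the abstract split the training data into subsets, train a classifier on each subset independently, and then combine their results. A minimal Python sketch of that pattern follows, under several assumptions that are not taken from the report: a toy nearest-centroid linear classifier is swapped in as the base learner (the abstract discusses SVMs; the centroid rule only keeps the example dependency-free), the split is a stratified round-robin that assumes every subset receives both classes, and the combination rule is unanimous voting, where the meta classifier accepts or rejects only if all members make that same restrictive decision and abstains otherwise. The names `train_centroid`, `split_stratified`, `train_meta`, and `meta_decide` are hypothetical, and the report's actual combination rule may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights w, bias b); score(x) = w.x + b


def train_centroid(xs: Sequence[Vector], ys: Sequence[int]) -> Model:
    """Toy base learner (stand-in for an SVM): the nearest-centroid rule
    written as a linear function. Assumes both classes (+1, -1) are present."""
    dim = len(xs[0])
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    mu_p = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mu_n = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    # Score > 0 iff x is closer to mu_p than to mu_n (squared Euclidean distance).
    b = -(sum(p * p for p in mu_p) - sum(n * n for n in mu_n)) / 2
    return w, b


def split_stratified(
    xs: Sequence[Vector], ys: Sequence[int], k: int
) -> List[Tuple[List[Vector], List[int]]]:
    """Deal the examples of each class round-robin into k disjoint subsets."""
    parts: List[Tuple[List[Vector], List[int]]] = [([], []) for _ in range(k)]
    seen: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        i = seen.get(y, 0)
        parts[i % k][0].append(x)
        parts[i % k][1].append(y)
        seen[y] = i + 1
    return parts


def train_meta(xs: Sequence[Vector], ys: Sequence[int], k: int) -> List[Model]:
    """Train one independent classifier per training subset."""
    return [train_centroid(px, py) for px, py in split_stratified(xs, ys, k)]


def meta_decide(
    models: Sequence[Model], x: Vector, threshold: float = 0.0
) -> Optional[int]:
    """Unanimous combination: +1 or -1 only if every member makes that same
    restrictive decision; any abstention or disagreement yields None."""
    votes = set()
    for w, b in models:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if score >= threshold:
            votes.add(1)
        elif score <= -threshold:
            votes.add(-1)
        else:
            return None  # this member abstains, so the meta classifier does too
    return votes.pop() if len(votes) == 1 else None


if __name__ == "__main__":
    xs = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
    ys = [1, 1, 1, 1, -1, -1, -1, -1]
    models = train_meta(xs, ys, k=2)
    for doc in [(2, 1), (-1, -2), (-1, 1)]:
        print(doc, meta_decide(models, doc))
```

Under unanimous voting, each member is one more chance to disagree or abstain, so for a fixed set of trained members adding another can only shrink the set of decided documents. This mirrors the abstract's observation that such techniques tend to improve accuracy at the expense of document loss.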

PostScript version: MPI-I-2004-5-001.ps (29924 KBytes)
BibTeX:
  AUTHOR = {Siersdorfer, Stefan and Sizov, Sergej and Weikum, Gerhard},
  TITLE = {Goal-oriented methods and meta methods for document classification and their parameter tuning},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-2004-5-001},
  YEAR = {2004},
  ISSN = {0946-011X},