MPI-INF/SWS Research Reports 1991-2021


MPI-I-2006-5-005

A neighborhood-based approach for clustering of linked document collections

Angelova, Ralitsa and Siersdorfer, Stefan

September 2006, 32 pages.

Status: available - back from printing

This technical report addresses the problem of automatically structuring linked document collections by means of clustering. Unlike traditional clustering, we study the clustering problem in light of the link structure available for the data set (e.g., hyperlinks among web documents or co-authorship among bibliographic data entries). Our approach is based on iterative relaxation of cluster assignments and can be built on top of any clustering algorithm (e.g., k-means or DBSCAN). These techniques yield higher cluster purity and better overall accuracy, and make self-organization more robust. Our comprehensive experiments on three different real-world corpora demonstrate the benefits of our approach.
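The abstract does not spell out the relaxation rule. Purely as an illustration, the following Python sketch shows one common form of iterative relaxation layered on an arbitrary base clusterer: every document's label is repeatedly re-chosen from a weighted mix of its own content score and the current labels of its linked neighbours, until the labels stop changing. The function name `relax_assignments`, the weight `alpha`, the majority-vote neighbour term, and the input format are all assumptions made for this example, not the method described in the report.

```python
from typing import Dict, List, Sequence


def relax_assignments(
    content_scores: Sequence[Sequence[float]],
    links: Dict[int, List[int]],
    alpha: float = 0.5,
    max_iter: int = 20,
) -> List[int]:
    """Hypothetical neighbour-based refinement of a base clustering.

    content_scores[d][c]: fit of document d to cluster c according to the
        base clusterer (e.g. k-means), assumed to lie in [0, 1].
    links[d]: documents linked to d (hyperlinks, co-authorship, ...).
    alpha: assumed weight of content evidence versus neighbour votes.
    """
    n = len(content_scores)
    k = len(content_scores[0])
    # Start from the base clusterer's own choice for every document.
    labels = [max(range(k), key=lambda c: content_scores[d][c]) for d in range(n)]

    for _ in range(max_iter):
        new_labels = []
        for d in range(n):
            neighbours = links.get(d, [])
            # Fraction of d's neighbours currently assigned to each cluster.
            votes = [0.0] * k
            for m in neighbours:
                votes[labels[m]] += 1.0
            if neighbours:
                votes = [v / len(neighbours) for v in votes]
            scores = [alpha * content_scores[d][c] + (1 - alpha) * votes[c]
                      for c in range(k)]
            new_labels.append(max(range(k), key=lambda c: scores[c]))
        if new_labels == labels:  # fixed point: no label changed
            break
        labels = new_labels  # synchronous update for the next round
    return labels


# Toy data: document 2 leans slightly towards cluster 1 by content,
# but both of its linked neighbours sit firmly in cluster 0.
content = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.1, 0.9]]
links = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}

print(relax_assignments(content, links))             # [0, 0, 0, 1]
print(relax_assignments(content, links, alpha=1.0))  # [0, 0, 1, 1]
```

With `alpha=1.0` the neighbour term vanishes and the base assignment is returned unchanged; smaller values let the link structure override weak content evidence, as happens for document 2 here. Documents without links simply keep their content-based label.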

  • Attachment: MPI-I-2006-5-005.pdf (548 KBytes)

URL to this document: https://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/2006-5-005

BibTeX
@TECHREPORT{AngelovaSiersdorfer2006,
  AUTHOR = {Angelova, Ralitsa and Siersdorfer, Stefan},
  TITLE = {A neighborhood-based approach for clustering of linked document collections},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-2006-5-005},
  MONTH = {September},
  YEAR = {2006},
  ISSN = {0946-011X},
}