Max-Planck-Institut für Informatik


A neighborhood-based approach for clustering of linked document collections

Angelova, Ralitsa and Siersdorfer, Stefan

MPI-I-2006-5-005. September 2006, 32 pages. | Status: available - back from printing

Abstract in LaTeX format:

This technical report addresses the problem of automatically structuring
linked document collections by using clustering. In contrast to
traditional clustering, we study the clustering problem in the light of
available link structure information for the data set
(e.g., hyperlinks among web documents or co-authorship among
bibliographic data entries).
Our approach is based on iterative relaxation of cluster assignments,
and can be built on top of any clustering algorithm (e.g., k-means or
DBSCAN). These techniques yield higher cluster purity and better
overall accuracy, and make self-organization more robust. Our
comprehensive experiments on three different real-world corpora
demonstrate the benefits of our approach.
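
The abstract does not spell out the update rule, so the following is only a minimal sketch of what iterative relaxation of cluster assignments over a link graph could look like, not the method from the report. It assumes that a base clusterer has already produced a content affinity score per document and cluster, and it mixes that score with the share of linked neighbors currently assigned to each cluster. The function name `relax_assignments`, the mixing weight `alpha`, and the input layout are all hypothetical.

```python
from collections import Counter


def relax_assignments(content_scores, neighbors, alpha=0.5, max_iters=20):
    """Re-assign documents to clusters using their linked neighbors.

    content_scores[d][c]: affinity of document d to cluster c, taken from
        any base clusterer (assumed input, e.g. similarity to a centroid).
    neighbors[d]: indices of the documents linked to document d.
    alpha: hypothetical weight of the link evidence, between 0 and 1.
    """
    n_clusters = len(content_scores[0])
    clusters = range(n_clusters)

    # Start from the purely content-based assignment.
    labels = [max(clusters, key=lambda c: row[c]) for row in content_scores]

    for _ in range(max_iters):
        new_labels = []
        for d, row in enumerate(content_scores):
            # Count how many neighbors currently sit in each cluster.
            counts = Counter(labels[n] for n in neighbors[d])
            total = len(neighbors[d])

            def combined(c):
                # Documents without links fall back to content evidence only.
                link_part = counts[c] / total if total else 0.0
                return (1 - alpha) * row[c] + alpha * link_part

            new_labels.append(max(clusters, key=combined))

        # Stop as soon as no document changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels

    return labels
```

In this sketch all documents are updated synchronously from the previous round's labels, which keeps the result independent of document order. The iteration cap guards against assignments that oscillate instead of settling.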


Attachment: MPI-I-2006-5-005.pdf (548 KBytes)
BibTeX:
  AUTHOR = {Angelova, Ralitsa and Siersdorfer, Stefan},
  TITLE = {A neighborhood-based approach for clustering of linked document collections},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-2006-5-005},
  MONTH = {September},
  YEAR = {2006},
  ISSN = {0946-011X},