Campus Event Calendar

Event Entry

What and Who

Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge

Xia Hu
Beihang University, China
PhD Application Talk
AG 1, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Monday, 19 October 2009
09:00
60 Minutes
E1 4, Room 024
Saarbrücken

Abstract

Clustering short texts, such as snippets, poses great challenges for existing aggregated search techniques because of data sparseness and the complex semantics of natural language. Because short texts do not provide sufficient term co-occurrence information, traditional text representation methods, such as the "bag of words" model, have several limitations when applied directly to short-text tasks. In this paper, we propose a novel framework that improves short-text clustering by exploiting both the internal semantics of the original text and external concepts from world knowledge. The proposed method employs a hierarchical three-level structure to tackle the data sparseness of the original short texts, and it reconstructs the corresponding feature space by integrating multiple semantic knowledge bases: Wikipedia and WordNet. Empirical evaluations on the Reuters dataset and a real web dataset demonstrate that our approach achieves significant improvements over state-of-the-art methods.
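To make the sparseness problem concrete, here is a toy sketch in Python. It is not the framework described in the talk: the concept map `TOY_CONCEPTS`, the `concept:` feature prefix, and the down-weighting factor `weight=0.5` are hypothetical stand-ins for real lookups into Wikipedia or WordNet. It only illustrates that two snippets sharing no terms get zero cosine similarity under a plain bag-of-words model, while appending external concept features lets them overlap.

```python
import math
from collections import Counter

# Hypothetical toy "world knowledge": maps a term to related concepts.
# It merely stands in for Wikipedia/WordNet lookups and is NOT the
# authors' three-level method.
TOY_CONCEPTS = {
    "jaguar": ["car", "vehicle"],
    "sedan": ["car", "vehicle"],
    "apple": ["company", "technology"],
    "microsoft": ["company", "technology"],
}


def bag_of_words(text):
    """Plain bag-of-words representation: lowercase term counts."""
    return Counter(text.lower().split())


def enrich(bow, concepts=TOY_CONCEPTS, weight=0.5):
    """Append external concept features, down-weighted relative to original terms."""
    enriched = Counter(bow)  # copy so the original vector is untouched
    for term, count in bow.items():
        for concept in concepts.get(term, []):
            enriched["concept:" + concept] += weight * count
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    s1, s2 = "new jaguar model", "cheap sedan deals"
    # No shared terms, so plain bag-of-words similarity is zero.
    print(round(cosine(bag_of_words(s1), bag_of_words(s2)), 3))  # 0.0
    # Shared external concepts ("car", "vehicle") create overlap.
    print(round(cosine(enrich(bag_of_words(s1)), enrich(bag_of_words(s2))), 3))  # 0.143
```

Down-weighting the added concepts keeps the original terms dominant, which is one plausible design choice; the actual weighting scheme and the hierarchical three-level structure of the proposed framework are not reproduced here.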

Jennifer Gerling, 10/07/2009 17:07
Heike Przybyl, 10/07/2009 16:32 -- Created document.