Campus Event Calendar

Event Entry


What and Who

Exploiting FrameNet Annotations for Automatic Text Classification

Georgiana Ifrim
IMPRS Masters' Lunch
Audience: AG 1, AG 2, AG 3, AG 4, AG 5

Date, Time and Location

Tuesday, 15 June 2004
60 Minutes
46.1 - MPII


For both classification and retrieval of natural language text documents, the standard document representation is a term vector,
where a term is simply the morphological normal form of the corresponding word. A potentially better approach is to map every word
onto a concept, i.e. its proper word sense, based on the word's context in the document and on an ontological knowledge base with
concept descriptions and semantic relationships among concepts.
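To make the standard representation concrete, the following Python sketch builds a term vector as a bag of normal forms. The `normal_form` function is a hypothetical, deliberately crude suffix stripper standing in for a real morphological analyzer, and the example sentence is invented. Note that every occurrence of "bank" collapses into a single term, whichever sense is meant, which is the limitation the concept-based approach aims to address.

```python
import re
from collections import Counter

# Hypothetical, very crude normalizer standing in for a real morphological
# analyzer or stemmer: lowercases and strips a few common English suffixes,
# keeping at least three characters of the stem.
SUFFIXES = ("ing", "ed", "es", "s")

def normal_form(word: str) -> str:
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def term_vector(text: str) -> Counter:
    """Bag-of-words term vector mapping each normal form to its frequency."""
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(normal_form(w) for w in words)

vec = term_vector("The bank banks on interest; the river bank floods.")
# "bank", "banks" and the river "bank" all end up in the same dimension.
print(vec["bank"])
```

With this input the financial, verbal and river senses of "bank" are indistinguishable in the resulting vector.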

The key problem to be solved in this approach is the disambiguation of polysemous words, i.e. words that have multiple meanings.
To this end, several approaches can be pursued at different levels of modeling and computational complexity.
The simplest is to construct feature vectors for both the word context and the potential target concepts, and to use vector
similarity measures to select the most suitable concept. A more refined approach is to use supervised or semi-supervised
learning techniques based on hand-annotated training data. Even more ambitiously, linguistic techniques could be used to extract
a more richly annotated word context, e.g. identifying, for a noun that is to be mapped onto the ontology, the corresponding verb
or even that verb's FrameNet class.
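The simplest approach above, comparing a vector of the word's context with vectors of the candidate concepts, can be sketched as follows using cosine similarity. The two-entry concept inventory, its glosses, the stop-word list and the function names are hypothetical illustrations, not part of the thesis; a real system would draw concept descriptions from the ontology and would use normal forms rather than raw tokens.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; function words would otherwise add noise.
STOP = {"the", "a", "an", "and", "or", "of", "to", "at", "that", "where", "on"}

def bag(text: str) -> Counter:
    """Lowercased bag of tokens with stop words removed."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy concept descriptions for the noun "bank".
CONCEPTS = {
    "bank#financial_institution": "institution that accepts deposits, lends money and pays interest",
    "bank#river_side": "sloping land beside a river or stream where water flows",
}

def disambiguate(context: str, concepts: dict = CONCEPTS) -> str:
    """Return the concept whose description vector is most similar to the context vector."""
    ctx = bag(context)
    return max(concepts, key=lambda c: cosine(ctx, bag(concepts[c])))

print(disambiguate("she deposited money at the bank to earn interest"))
print(disambiguate("the water rose over the river bank"))
```

Here the first context selects the financial sense because "money" and "interest" overlap with its description, while the second selects the river sense through "water" and "river". Such surface overlap is brittle, which motivates the supervised and linguistically richer variants.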

The goal of this thesis is to develop a practically viable method for exploiting linguistic features in the disambiguation
and mapping of words onto concepts, and to systematically compare its performance with that of other approaches.


Kerstin Meyer-Ross
0681 - 9325 226
