a term is simply a morphological normal form of the corresponding word. A potentially better approach would be to map every word
onto a concept, i.e. its proper word sense, based on the word's context in the document and on an ontological knowledge base
containing concept descriptions and semantic relationships among concepts.
The key problem to be solved in this approach is the disambiguation of polysemes, words that have multiple meanings.
To this end, several approaches can be pursued at different levels of modeling and computational complexity.
The simplest is to construct feature vectors for both the word context and the potential target concepts, and to use vector
similarity measures to select the most suitable concept. A more refined approach would be to use supervised or semi-supervised
learning techniques trained on hand-annotated data. Even more ambitiously, linguistic techniques could be used to extract
a more richly annotated word context, e.g. identifying, for a noun that is to be mapped onto the ontology, its governing verb
or even that verb's FrameNet class.
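The simplest of these approaches can be sketched in a few lines. The following is a minimal illustration, not part of the thesis itself: the word context and each candidate concept description (gloss) are turned into bag-of-words feature vectors, and cosine similarity selects the best-matching sense. The mini-ontology for the word "bank" and the sense identifiers are purely hypothetical.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, normalized by the vector lengths.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(context_words, senses):
    """Pick the sense whose gloss vector is most similar to the context.

    `senses` maps a sense identifier to a textual concept description.
    """
    ctx = Counter(w.lower() for w in context_words)
    scores = {sid: cosine(ctx, Counter(gloss.lower().split()))
              for sid, gloss in senses.items()}
    return max(scores, key=scores.get)

# Hypothetical mini-ontology for the polysemous word "bank".
senses = {
    "bank/finance": "financial institution that accepts deposits and lends money",
    "bank/river":   "sloping land along the edge of a river or lake",
}
context = "the boat drifted toward the muddy bank of the river".split()
print(disambiguate(context, senses))
```

In practice the glosses would come from the ontological knowledge base, the vectors would use weighted terms (e.g. tf-idf) rather than raw counts, and stop words such as "the" would be filtered out; this sketch only shows the core similarity-based selection step.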
The goal of this thesis is to develop a practically viable method that exploits linguistic features for the disambiguation
and mapping of words onto concepts, and to study its performance systematically in comparison with the other approaches.