Campus Event Calendar

Event Entry

What and Who

Dictionary-Based Named Entity Recognition

Artem Boldyrev
Fachrichtung Informatik - Saarbrücken
PhD Application Talk

Masters Student
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Monday, 10 February 2014
08:50
90 Minutes
E1 4
024
Saarbrücken

Abstract

An important task in information extraction is named entity recognition (NER): the recognition of named entities in natural language texts. A named entity is a phrase that denotes an instance of a class. In this talk I will present a dictionary-based NER framework. It uses multiple dictionaries that are freely available on the Web, where a dictionary is a collection of phrases that describe named entities. The framework consists of two stages: (1) detection of named entity candidates using dictionary lookups and (2) filtering of false positives based on a part-of-speech tagger. Dictionary lookups are performed using an efficient prefix-tree data structure. Optionally, additional filters based on word-form evidence can be applied to increase the precision and recall of the recognition. Most existing approaches to NER use machine learning techniques. The main drawback of these systems is the manual effort needed to create labeled training data. Our dictionary-based recognizer does not need labeled text as training data. Furthermore, the framework can be applied to any language that is supported by a part-of-speech tagger. On German, our dictionary-based recognizer achieves up to 89.01% precision at 77.64% recall and an F1 score of 81.60%, improving on Stanford’s NER by five percentage points in precision, recall, and F1 score.
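
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation. The names `TokenTrie`, `find_candidates`, and `filter_by_pos` are hypothetical, as are the greedy longest-match strategy, the tiny dictionary, and the filter rule (keep a candidate only if at least one of its tokens is tagged `NE`, the proper-noun tag of the German STTS tagset). The part-of-speech tags are supplied by hand here; in practice they would come from a tagger.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, class label)


class TokenTrie:
    """Prefix tree over token sequences; each dictionary phrase is one path."""

    def __init__(self) -> None:
        self.children: Dict[str, "TokenTrie"] = {}
        self.label: Optional[str] = None  # set on the node that ends a phrase

    def add(self, phrase: str, label: str) -> None:
        node = self
        for token in phrase.split():
            node = node.children.setdefault(token, TokenTrie())
        node.label = label

    def longest_match(self, tokens: Sequence[str], start: int) -> Optional[Tuple[int, str]]:
        """Return (end, label) of the longest phrase beginning at `start`, if any."""
        node: Optional[TokenTrie] = self
        best = None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.label is not None:
                best = (i + 1, node.label)
        return best


def find_candidates(trie: TokenTrie, tokens: Sequence[str]) -> List[Span]:
    """Stage 1: scan left to right, taking the longest dictionary match at each position."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = trie.longest_match(tokens, i)
        if match is None:
            i += 1
        else:
            end, label = match
            spans.append((i, end, label))
            i = end
    return spans


def filter_by_pos(spans: List[Span], pos_tags: Sequence[str],
                  allowed: Tuple[str, ...] = ("NE",)) -> List[Span]:
    """Stage 2 (assumed rule): keep a span only if some token in it has an allowed tag."""
    return [s for s in spans if any(t in allowed for t in pos_tags[s[0]:s[1]])]


# Hypothetical dictionary; "Die" is also the name of a French town.
trie = TokenTrie()
trie.add("Die", "LOC")
trie.add("Angela Merkel", "PER")
trie.add("Saarbrücken", "LOC")

tokens = "Die Kanzlerin Angela Merkel besucht Saarbrücken".split()
pos_tags = ["ART", "NN", "NE", "NE", "VVFIN", "NE"]  # hand-assigned STTS tags

candidates = find_candidates(trie, tokens)
print(candidates)                           # [(0, 1, 'LOC'), (2, 4, 'PER'), (5, 6, 'LOC')]
print(filter_by_pos(candidates, pos_tags))  # [(2, 4, 'PER'), (5, 6, 'LOC')]
```

In this example the article "Die" matches a dictionary entry in stage 1 but is discarded in stage 2 because it is tagged as an article rather than a proper noun. Storing phrases token by token in the prefix tree means each lookup costs time proportional to the length of the matched phrase, independent of the dictionary size.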


Aaron Alsancak, 02/06/2014 10:24 -- Created document.