Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title: Dictionary-Based Named Entity Recognition
Speaker: Artem Boldyrev
coming from: Fachrichtung Informatik - Saarbrücken
Speaker's Bio: Master's student
Event Type: PhD Application Talk
Visibility: D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level: Public Audience
Language: English
Date, Time and Location
Date: Monday, 10 February 2014
Time: 08:50
Duration: 90 Minutes
Location: Saarbrücken
Building: E1 4
Room: 024
Abstract
An important task in information extraction is named entity recognition (NER): the recognition of named entities in natural-language texts. A named entity is a phrase that denotes an instance of a class, such as a person, an organization, or a location. In this talk, I will present a dictionary-based NER framework that uses multiple dictionaries freely available on the Web, where a dictionary is a collection of phrases that describe named entities. The framework consists of two stages: (1) detection of named entity candidates through dictionary lookups and (2) filtering of false positives based on a part-of-speech tagger. Dictionary lookups are performed using an efficient prefix-tree data structure. Optionally, additional filters based on word-form evidence can be applied to increase the precision and recall of the recognition. Most existing NER approaches use machine learning techniques, whose main drawback is the manual effort needed to create labeled training data. Our dictionary-based recognizer does not need labeled text as training data. Furthermore, the framework can be applied to any language supported by a part-of-speech tagger. On German, our recognizer achieves up to 89.01% precision at 77.64% recall and an 81.60% F1 score, improving on Stanford’s NER by five percentage points in precision, recall, and F1 score.
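The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation. It assumes whitespace-tokenized input, a token-level prefix tree (trie) with greedy longest-match lookup, and German STTS part-of-speech tags supplied by hand in place of a real tagger. The names `DictionaryTrie`, `find_candidates`, and `pos_filter`, and the filtering rule "keep a candidate only if it contains a proper-noun (`NE`) token", are hypothetical choices made for this example.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_entry = False  # True if a dictionary phrase ends at this node


class DictionaryTrie:
    """Token-level prefix tree: each edge is one token of a dictionary phrase."""

    def __init__(self, phrases: Iterable[str]) -> None:
        self.root = TrieNode()
        for phrase in phrases:
            self.add(phrase.split())

    def add(self, tokens: Sequence[str]) -> None:
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.is_entry = True

    def longest_match(self, tokens: Sequence[str], start: int) -> Optional[int]:
        """End index of the longest dictionary phrase starting at `start`, or None."""
        node = self.root
        best: Optional[int] = None
        for i in range(start, len(tokens)):
            nxt = node.children.get(tokens[i])
            if nxt is None:
                break  # no dictionary phrase continues with this token
            node = nxt
            if node.is_entry:
                best = i + 1
        return best


def find_candidates(trie: DictionaryTrie, tokens: Sequence[str]) -> List[Span]:
    """Stage 1: scan left to right, taking the longest dictionary match at each position."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        end = trie.longest_match(tokens, i)
        if end is None:
            i += 1
        else:
            spans.append((i, end))
            i = end  # skip past the match, so candidates never overlap
    return spans


def pos_filter(spans: List[Span], tags: Sequence[str],
               allowed: frozenset = frozenset({"NE"})) -> List[Span]:
    """Stage 2 (hypothetical rule): keep spans containing at least one proper-noun tag."""
    return [(s, e) for s, e in spans if any(tags[k] in allowed for k in range(s, e))]


if __name__ == "__main__":
    trie = DictionaryTrie(["Essen", "Max Planck", "Max Planck Institut", "Saarbrücken"])
    # "The meal in Essen was good"; STTS tags written by hand instead of a tagger.
    tokens = ["Das", "Essen", "in", "Essen", "war", "gut"]
    tags = ["ART", "NN", "APPR", "NE", "VAFIN", "ADJD"]
    candidates = find_candidates(trie, tokens)
    print(candidates)                    # [(1, 2), (3, 4)]
    print(pos_filter(candidates, tags))  # [(3, 4)]
```

In the demo, the dictionary entry "Essen" (the city) matches twice, but only the occurrence tagged as a proper noun survives the filter; the common noun "das Essen" ("the meal") is discarded as a false positive. Because lookup is greedy longest-match, a sentence containing "Max Planck Institut" yields that full phrase rather than its dictionary prefix "Max Planck".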
Contact
Name(s):
Video Broadcast
Video Broadcast: No
To Location:
Tags, Category, Keywords and additional notes
Note:
Attachments, File(s):
  • Aaron Alsancak, 02/06/2014 10:24 AM -- Created document.