Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Adapting Named Entity Disambiguation for Arabic Text
Speaker:Mohamed Gad-Elrab
coming from:International Max Planck Research School for Computer Science - IMPRS
Speakers Bio:
Event Type:PhD Application Talk
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level:Public Audience
Date, Time and Location
Date:Monday, 4 May 2015
Duration:75 Minutes
Building:E1 4
Named Entity Disambiguation (NED) is the problem of mapping mentions of ambiguous names in natural language text onto canonical entities, such as people or places, registered in a knowledge base. Recent advances in this field enable semantic understanding of content in different types of text. While the problem has been studied extensively for English text, support for other languages, and for Arabic in particular, is still in its infancy. At the same time, Arabic Web content (e.g. in social media) has increased dramatically over the last few years. We therefore see great potential for efforts that support entity-level analytics of these data. AIDArabic is the first work in that direction: it uses evidence from both the English and Arabic Wikipedia to enrich the existing AIDA system, allowing Arabic content to be disambiguated against a knowledge base generated automatically from Wikipedia.

The contributions of this work are threefold: 1) We introduce techniques for automatically augmenting AIDArabic’s entity catalog and disambiguation ingredients using information beyond interwiki links. We achieve this by fusing the output of lightweight machine translation, transliteration, and external web sources. 2) We introduce a language-specific input processing module to handle the particularities of the Arabic language. 3) We automatically build a test corpus from existing parallel corpora to overcome the absence of standard benchmarks for Arabic NED systems. We evaluated individual components as well as the full pipeline using a mix of manual and automatic assessment. Initial enrichment statistics show that our system can disambiguate mentions to one of 2.4M entities, instead of only 140K in the original AIDArabic.

Name(s):Jennifer Gerling
Video Broadcast
Video Broadcast:No
Tags, Category, Keywords and additional notes
Attachments, File(s):
  • Jennifer Gerling, 05/02/2015 06:55 PM -- Created document.