Campus Event Calendar: Mohamed Gad-Elrab (05/04/2015 in E1 4/024)

Event Entry

What and Who

Adapting Named Entity Disambiguation for Arabic Text

Mohamed Gad-Elrab

International Max Planck Research School for Computer Science - IMPRS

PhD Application Talk

AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI

Public Audience

English

Date, Time and Location

Monday, 4 May 2015

09:15

75 Minutes

E1 4

024

Saarbrücken

Abstract

Named Entity Disambiguation (NED) is the problem of mapping mentions of ambiguous names in natural language text onto canonical entities, such as people or places, registered in a knowledge base. Recent advances in this field enable semantic understanding of content in different types of text. While the problem has been studied extensively for English text, support for other languages, and Arabic in particular, is still in its infancy. At the same time, Arabic Web content (e.g. in social media) has been increasing dramatically over the last few years. We therefore see great potential for endeavors that support entity-level analytics of these data. AIDArabic is the first work in that direction; it uses evidence from both the English and Arabic Wikipedia to enrich the existing AIDA system and allows the disambiguation of Arabic content against a knowledge base generated automatically from Wikipedia.

The contributions of this work are threefold: 1) We introduce techniques for automatically augmenting AIDArabic’s entity catalog and disambiguation ingredients using information beyond interwiki links. We achieve this by fusing the output of lightweight machine translation, transliteration, and external web sources. 2) We introduce a language-specific input processing module to handle language-specific differences in Arabic. 3) We automatically build a test corpus from other parallel corpora to overcome the absence of standard benchmarks for Arabic NED systems. We evaluated single components as well as the full pipeline using a mix of manual and automatic assessment. Initial enrichment statistics show that our system can disambiguate mentions to one of 2.4M entities instead of only 140K in the original AIDArabic.

Contact

Jennifer Gerling

1800

Jennifer Gerling, 05/02/2015 18:55 -- Created document.
