Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Knowledge-driven Entity Recognition and Disambiguation in Biomedical Text
Speaker:Amy Siu
coming from:Max-Planck-Institut für Informatik - D5
Speakers Bio:
Event Type:Promotionskolloquium (doctoral colloquium)
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level:Public Audience
Language:English
Date, Time and Location
Date:Monday, 4 September 2017
Time:16:00
Duration:60 Minutes
Location:Saarbrücken
Building:E1 4
Room:024
Abstract
Entity recognition and disambiguation (ERD) for the biomedical domain
are notoriously difficult problems due to the variety of entities and
their often long names in many variations. Existing work focuses heavily
on the molecular level in two ways. First, they target scientific
literature as the input text genre. Second, they target single, highly
specialized entity types such as chemicals, genes, and proteins.
However, a wealth of biomedical information is also buried in the vast
universe of Web content. In order to fully utilize all the information
available, there is a need to tap into Web content as an additional
input. Moreover, there is a need to cater for other entity types such as
symptoms and risk factors since Web content focuses on consumer health.
The goal of this thesis is to investigate ERD methods that are
applicable to all entity types in scientific literature as well as Web
content. In addition, we focus on under-explored aspects of the
biomedical ERD problems: scalability, long noun phrases, and
out-of-knowledge-base (OOKB) entities.
This thesis makes four main contributions, all of which leverage
knowledge in UMLS (Unified Medical Language System), the largest and
most authoritative knowledge base (KB) of the biomedical domain. The
first contribution is a fast dictionary lookup method for entity
recognition that maximizes throughput while balancing the loss of
precision and recall. The second contribution is a semantic type
classification method targeting common words in long noun phrases. We
develop a custom set of semantic types to capture word usages; besides
biomedical usage, these types also cope with non-biomedical usage and
the case of generic, non-informative usage. The third contribution is a
fast heuristics method for entity disambiguation in MEDLINE abstracts,
again maximizing throughput but this time maintaining accuracy. The
fourth contribution is a corpus-driven entity disambiguation method that
addresses OOKB entities. The method first captures the entities
expressed in a corpus as latent representations that comprise in-KB and
OOKB entities alike before performing entity disambiguation.
Contact
Name(s):Daniela Alessi
Phone:5000
EMail:--email address not disclosed on the web
Video Broadcast
Video Broadcast:No
To Location:
Created:
Daniela Alessi/MPI-INF, 08/25/2017 10:03 AM
Last modified:
halma/MPII/DE, 05/31/2018 12:00 AM
  • Daniela Alessi, 08/25/2017 10:06 AM -- Created document.