Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text
Speaker:Luciano Del Corro
coming from:Max-Planck-Institut für Informatik - D5
Speaker's Bio:
Event Type:Promotionskolloquium
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level:Public Audience
Language:English
Date, Time and Location
Date:Monday, 11 January 2016
Time:16:15
Duration:90 Minutes
Location:Saarbrücken
Building:E1 5
Room:029
Abstract
Natural language text has been the main and most comprehensive way of expressing and storing
knowledge. A long-standing goal in computer science is to develop systems that automatically understand
textual data, making this knowledge accessible to computers and humans alike. We conceive automatic text
understanding as a bottom-up approach, in which a series of interleaved tasks build upon each other. Each
task achieves a deeper understanding of the text than the previous one. In this regard, we present three
methods that aim to contribute to the primary stages of this setting.

Our first contribution, ClausIE, is an open information extraction method intended to recognize textual
expressions of potential facts (e.g., “Dante wrote the Divine Comedy”) and represent them in a structure
amenable to computers [(“Dante”, “wrote”, “the Divine Comedy”)]. Unlike previous
approaches, ClausIE separates the recognition of information from its representation, treating the former
as universal (i.e., domain-independent) and the latter as application-dependent.
ClausIE is a principled method that relies on properties of the English language and thereby avoids the
use of manually or automatically generated training data.

Once the information in a text has been correctly identified, arguably the most important element of a structured
fact is the relation that links its arguments, a relation whose main component is usually a verb phrase.
Our second contribution, Werdy, is a word entry recognition and disambiguation method.
It aims to recognize words and multi-word expressions (e.g., “Divine Comedy” is a multi-word expression)
in a fact and to disambiguate verbs (e.g., what does “write” mean?). Werdy is also an unsupervised approach,
relying mainly on the semantic relation established between a verb sense and its arguments.

The other key components of a structured fact are the named entities (e.g., “Dante”) that often appear in
the arguments. FINET, our last contribution, is a named entity typing method. It aims to determine the types
or classes of those named entities (e.g., “Dante” refers to a writer). FINET focuses on typing named entities
in short inputs (such as facts). Unlike previous systems, it is designed to find the types that match the context
of the entity mention (e.g., the fact in which it appears). It uses the most comprehensive type system of any
entity typing method to date, with more than 16k classes for persons, organizations and locations.

These contributions are intended to constitute constructive building blocks for deeper understanding tasks in a
bottom-up automatic text understanding setting.
Contact
Name(s):Daniela Alessi
Phone:5000
EMail:--email address not disclosed on the web
Video Broadcast
Video Broadcast:No
Created by:Daniela Alessi/MPI-INF, 01/05/2016 01:14 PM
Last modified by:Uwe Brahm/MPII/DE, 11/24/2016 04:13 PM
  • Daniela Alessi, 01/05/2016 01:26 PM -- Created document.