Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:IBEX: Id-Based Entity Extraction
Speaker:Aliaksandr Talaika
coming from:Fachrichtung Informatik - Saarbrücken
Speakers Bio:Master of Science
Event Type:PhD Application Talk
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level:Public Audience
Language:English
Date, Time and Location
Date:Monday, 10 February 2014
Time:10:50
Duration:90 Minutes
Location:Saarbrücken
Building:E1 4
Room:024
Abstract
Several academic and industrial projects have started extracting entities from the Web. In this thesis, we show that a certain subclass of entities, namely those that have unique identifiers, can be extracted at large scale and with high precision from Web data. This applies most notably to commercial products, but also to email addresses, scientific publications, chemical substances, and a wide variety of other entities. By making systematic use of the identifiers, our algorithm can leapfrog page segmentation, complex named entity recognition, and table alignment. Our method can extract millions of items, each disambiguated to a canonical entity, with a precision of 73-96%. This yields a database of unique entities at Web scale. It also allows us to compute detailed statistics on the presence of commercial products, people, and other objects on the Internet.
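
To illustrate why unique identifiers make high-precision extraction plausible, here is a minimal sketch for one identifier type, the 13-digit product code (GTIN-13/EAN-13). It is not the method presented in the talk; the names is_valid_gtin13 and extract_gtin13 are hypothetical, and the only assumption is the standard GTIN-13 check-digit rule. Candidates are found with a simple pattern and kept only if their check digit validates, so no page segmentation or table alignment is needed.

```python
import re

# Hypothetical helper (not from the talk): validates the standard
# GTIN-13 / EAN-13 check digit.
def is_valid_gtin13(code: str) -> bool:
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The 13th digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[12]

# A run of exactly 13 digits that is not part of a longer digit run.
CANDIDATE = re.compile(r"(?<!\d)\d{13}(?!\d)")

def extract_gtin13(text: str) -> list[str]:
    """Return all 13-digit runs in text whose check digit is valid."""
    return [m for m in CANDIDATE.findall(text) if is_valid_gtin13(m)]
```

The check digit filters out most arbitrary 13-digit numbers, but a real system would still need to map each validated code to a canonical entity, which this sketch does not attempt.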
Contact
Name(s):
Video Broadcast
Video Broadcast:No
To Location:
Tags, Category, Keywords and additional notes
Note:
Attachments, File(s):
  • Aaron Alsancak, 02/06/2014 10:38 AM -- Created document.