Title:IBEX: Id-Based Entity Extraction
Speaker:Aliaksandr Talaika
coming from:Fachrichtung Informatik - Saarbrücken
Speakers Bio:Master of Science
Event Type:PhD Application Talk
Level:Public Audience
Date:Monday, 10 February 2014
Duration:90 Minutes
Building:E1 4
Several academic and industrial projects have started extracting entities from the Web. In this thesis, we show that a certain subclass of entities, namely those that have unique identifiers, can be extracted at large scale and with high precision from Web data. This applies most notably to commercial products, but also to email addresses, scientific publications, chemical substances, and a wide variety of other entities. By making systematic use of the identifiers, our algorithm can bypass page segmentation, complex named entity recognition, and table alignment. Our method can extract millions of items, each disambiguated to a canonical entity, with a precision of 73-96%. This yields a database of unique entities at Web scale and allows us to compute detailed statistics on the presence of commercial products, people, and other objects on the Internet.
Video Broadcast:No
