Campus Event Calendar

Event Entry

New for: D1, D2, D3, INET, D4, D5

What and Who

Understanding Quantities in Web Tables and Text

Yusra Ibrahim
MMCI
Promotionskolloquium
AG 1, AG 2, AG 3, INET, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Tuesday, 8 October 2019
15:00
60 Minutes
E1 4
024
Saarbrücken

Abstract

There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, little research harnesses it by semantically understanding the information that the tables convey, often ambiguously. This information can be disambiguated only with the help of the accompanying text.


In the process of understanding quantity mentions in tables and text, we are faced with the following challenges. First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc, without a standard schema and with ambiguous header names; table cells also usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure; for example, “48 km/h” is approximately equivalent to “30 mph”. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales.
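The third challenge can be illustrated with a minimal sketch that maps speed mentions onto one canonical unit and compares them with a tolerance. The conversion table, the function names, and the 1% tolerance are assumptions made for this illustration; they are not taken from the thesis.

```python
import math

# Hypothetical conversion table: unit -> factor to metres per second.
TO_METRES_PER_SECOND = {
    "km/h": 1000.0 / 3600.0,
    "mph": 1609.344 / 3600.0,
    "m/s": 1.0,
}


def canonicalize(mention: str) -> float:
    """Parse a mention such as '48 km/h' and return its value in m/s."""
    value, unit = mention.split()
    return float(value.replace(",", "")) * TO_METRES_PER_SECOND[unit]


def same_quantity(a: str, b: str, rel_tol: float = 0.01) -> bool:
    """Treat two mentions as equivalent if they agree within rel_tol."""
    return math.isclose(canonicalize(a), canonicalize(b), rel_tol=rel_tol)


print(same_quantity("48 km/h", "30 mph"))
print(same_quantity("50 km/h", "30 mph"))
```

The tolerance is needed because 30 mph is about 48.28 km/h, so the two mentions match only approximately.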

In this thesis, we target these challenges through the following contributions:

• We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources.

• We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs.

• We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities.

• We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.
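To make the quantity alignment problem concrete, the following toy sketch links a number mentioned in text to table cells that match it approximately, or to a column whose sum matches it. This is not BriQ's method; the table layout, the `align` function, and the 5% tolerance are hypothetical choices made only for illustration.

```python
import math


def align(mention_value, table, rel_tol=0.05):
    """Return candidate (kind, column, row) links for one text quantity.

    `table` maps a column name to a list of numeric cell values.
    """
    links = []
    for column, cells in table.items():
        # Approximate match against individual cells.
        for row, cell in enumerate(cells):
            if math.isclose(mention_value, cell, rel_tol=rel_tol):
                links.append(("cell", column, row))
        # Aggregated match against the column total.
        if len(cells) > 1 and math.isclose(mention_value, sum(cells), rel_tol=rel_tol):
            links.append(("sum", column, None))
    return links


table = {"sales": [120.0, 95.0, 88.0], "staff": [12.0, 9.0, 8.0]}
print(align(300, table))  # "about 300 in total sales"
print(align(96, table))   # "roughly 96 in sales"
```

A realistic system would also have to handle other aggregations, calculated quantities, and scale words such as "million", and would need to rank competing candidates rather than return all of them.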

Contact

Petra Schaaf
5000

Petra Schaaf, 10/08/2019 09:42 -- Created document.