What and Who
Title:Series Discovery with Missing and Erroneous Values
Speaker:Dr. Pei Li
coming from:University of Zurich
Speakers Bio:
Event Type:AG5 Talk
Level:AG Audience
Date, Time and Location
Date:Thursday, 9 April 2015
Duration:60 Minutes
Building:E1 4
Real-world data series, such as a series of music records, are often generated with order
dependency semantics; for example, music records in a series with larger catalog numbers are
usually released in later years. In this talk, I will discuss how order dependencies can be
exploited to discover series as well as to repair missing and erroneous values of ordered
attributes in a dataset. The problem is challenging in two respects. First, order
dependency mechanisms are unknown a priori and can vary among series. For example, a series can
assign catalog numbers to records in either increasing or decreasing order over time. Second,
order dependencies are often not satisfied by every record pair in a real-world series. There can
be a substantial number of records that slightly violate an order dependency. Existing ordering
integrity constraints would treat such records as exceptions and blindly label them as
outliers. These two factors make our goal of "one shot, two kills" (series discovery together
with error detection) extremely challenging.
To make order dependencies applicable to real-world series, we propose the notion of longest
monotonic bands, which characterize series while distinguishing slight violations of
order dependencies from local outliers in a series. We also provide an efficient framework for
discovering series that are approximated by longest monotonic bands. In this talk, I will present
analyses of our proposed algorithms and show the effectiveness of our framework with preliminary
results on real-world datasets.
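As a rough illustration of the underlying idea, and not the algorithm presented in the talk, the Python sketch below orders records by catalog number, finds the longest subsequence of release years that is monotonic in either direction, and flags the remaining records as candidates for repair. It uses a plain longest monotonic subsequence as a simplified stand-in for a longest monotonic band, so it does not tolerate slight violations the way the abstract describes. The function names and sample records are hypothetical.

```python
from bisect import bisect_right


def longest_monotonic_subsequence(values, increasing=True):
    """Return indices of one longest non-decreasing (or non-increasing) subsequence."""
    # Negating the values turns the non-increasing case into the non-decreasing one.
    seq = values if increasing else [-v for v in values]
    tails = []      # tails[k]: smallest last value of a subsequence of length k + 1
    tails_idx = []  # position in seq of that last value
    prev = [-1] * len(seq)  # predecessor links used to rebuild the subsequence
    for i, v in enumerate(seq):
        k = bisect_right(tails, v)  # bisect_right allows equal neighbours
        if k == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[k] = v
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    indices = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        indices.append(i)
        i = prev[i]
    return indices[::-1]


def split_series(records):
    """Split (catalog_number, year) pairs into order-consistent records and suspects.

    Assumption for illustration only: the direction (increasing or decreasing)
    is chosen as whichever yields the longer monotonic subsequence.
    """
    ordered = sorted(records)  # order by catalog number
    years = [year for _, year in ordered]
    best = max(
        (longest_monotonic_subsequence(years, inc) for inc in (True, False)),
        key=len,
    )
    kept = set(best)
    consistent = [ordered[i] for i in best]
    suspects = [rec for i, rec in enumerate(ordered) if i not in kept]
    return consistent, suspects


# Hypothetical sample: catalog number 103 carries an implausible year.
sample = [(101, 1990), (102, 1991), (103, 1975), (104, 1992), (105, 1992)]
consistent, suspects = split_series(sample)
print(suspects)  # [(103, 1975)]
```

In this simplified form, every record outside the longest monotonic subsequence is a suspect; a band-based approach as described above would instead accept records that deviate only slightly from the trend.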
Video Broadcast
Video Broadcast:No
