MPI-INF Logo
Campus Event Calendar

Event Entry

What and Who

Algorithms for Finding Orders and Analyzing Sets of Chains

Dr. Antti Ukkonen
Helsinki University
Talk
AG 5  
AG Audience
English

Date, Time and Location

Monday, 19 January 2009
11:00
60 Minutes
E1 4
433
Saarbrücken

Abstract

Rankings of items are a useful concept in a variety of applications,
such as clickstream analysis, some voting methods, bioinformatics, and
other fields of science such as paleontology. This thesis addresses
two problems related to such data. The first problem is about finding
orders, while the second one is about analyzing sets of orders.

We address two different tasks in the problem of finding
orders.  We can find
orders either by computing an aggregate of a set
of known orders, or by constructing an order for a previously
unordered data set. For the first task we show that bucket orders, a
subclass of partial orders, are a useful structure for summarizing sets
of orders. We formulate an optimization problem for finding such
partial orders, show that it is NP-hard, and give an efficient
randomized algorithm for finding approximate solutions to
it. Moreover, we show that the expected cost of a solution found by
the randomized algorithm differs from the optimal solution only by a
constant factor.  For the second approach we propose a simple method for
sampling orders for 0--1 vectors that is based on the consecutive ones
property.

For analyzing orders, we discuss three different methods. First, we
give an algorithm for clustering sets of orders. The algorithm
is a variant of Lloyd's iteration for solving the k-means problem.  We
also give two different approaches for mapping orders to vectors in a
high-dimensional Euclidean space. These mappings are used on one hand
for clustering, and on the other hand for creating two dimensional
visualizations (scatterplots) for sets of orders. Finally, we discuss
randomization testing in case of orders. To this end we propose an
MCMC algorithm for creating random sets of orders that preserve
certain well defined properties of a given set of orders. The random
data sets can be used to assess the statistical significance of the
results obtained e.g. by clustering.

Contact

Petra Schaaf
500
--email hidden
passcode not visible
logged in users only

Petra Schaaf, 01/13/2009 12:46
Petra Schaaf, 01/13/2009 12:24 -- Created document.