Campus Event Calendar: Christian Böhm (05/30/2001 in 45

Campus Event Calendar

Campus Event Calendar:
- All Upcoming:
  - only for D1
  - only for D2
  - only for INET
  - only for D4
  - only for D5
  - only for D6
  - only for RG1
  - Mailing Lists
  - by Speaker
  - by Type
  - by Category
  - by Title
  - Calendar
  - RSS Feed
- History of Events:

Event Entry

New for: D3

What and Who

The Similarity Join: A Powerful Database Primitive for High Performance Data Mining

Christian Böhm

LMU München

MPI-Kolloquium

AG 1, AG 2, AG 3, AG 4

AG Audience

Note: We use this to send email in the morning.

Date, Time and Location

Wednesday, 30 May 2001

17:40

45 Minutes

45 - FR 6.2

Saarbrücken

Abstract

Larger and larger amounts of data are collected and stored in

databases, increasing the need for efficient and effective
analysis methods to make use of the information contained
implicitly in the data. The extraction of such potentially useful
information is called data mining.

In the presentation, it is shown that numerous data mining methods
such as density based clustering, k-means clustering, outlier
detection, or k-nearest neighbor classification can be based on
the similarity join as a database primitive. By such a
reformulation, the identical result can be achieved at a
drastically improved efficiency.

Therefore, the similarity join becomes an important basic
operation of multimedia databases. For a given set of feature
vectors, the similairty join determines those object pairs which
are most similar according to some appropriate similairty measure,
in SQL style SELECT * FROM P1,P2 WHERE distance (P1,P2) <= epsilon

The structure of the talk is as follows: In the first part we show
how typical algorithms of data analysis and data mining can be
reformulated such that they are exclusively based on the
similarity join. According to several example applications, we
demonstrate the enormous performance potential of this database
primitive.

The second part of the presentation is dedicated to the develop-
ment of efficient algorithms for the computation of the similarity
join. First we introduce a new cost model for index based
similairty join algorithms. Starting from this cost model, we
develop an innovative index architecture which takes into account
that similarity join algorithms require a separate optimization of
CPU and I/O cost.

Next, we develop a similarity join algorithm which is particularly
suited for massive data sets. It is based on a particular sort
order for high dimensional data. Finally, we propose a technique
for the reduction of CPU cost which is universally applicable in
index based and non-index-based similarity join methods.

A perspective on future research directions in the area of
database primitives for similarity search, data analysis, and data
mining concludes the talk.

Contact

Christoph Storb

-900

--email hidden

System used:

Meeting URL:

Meeting ID:

Passcode:

passcode not visible

Code Visible for:

logged in users only

Tags, Category, Keywords and additional notes

Note:

Applicant for group leader of independant research group

Imprint / Impressum | Data Protection / Datenschutzhinweis