Campus Event Calendar

Event Entry

What and Who

Distributed Similarity Search in High Dimensions

Sebastian Michel
École Polytechnique Fédérale de Lausanne
Talk
AG 1, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Tuesday, 10 March 2009
14:00
45 Minutes
E1 4
024
Saarbrücken

Abstract

In this talk we will have a look at similarity search techniques for high-dimensional data.

We present a novel approach for distributed K-Nearest Neighbor (KNN) search and range query processing. Our approach is based on Locality Sensitive Hashing (LSH), which has proven very efficient in answering KNN queries in centralized settings. We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of nodes in a network that jointly maintain the indexed data, and we derive the requirements such mappings must meet to achieve high-quality search results while limiting the number of network accesses.

We put forward two such mappings that share two salient properties: they are locality preserving, so that buckets likely to hold similar data are stored on the same or neighboring peers, and they have a predictable output distribution, which ensures fair load balancing.
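To make the idea of mapping LSH buckets onto a linear order of peers concrete, here is a minimal sketch. It is an illustration under stated assumptions, not necessarily either of the two mappings presented in the talk: it assumes a p-stable LSH scheme with Gaussian projections, and it collapses each k-dimensional bucket label to a single key by summing its coordinates, so buckets that differ by one step in one coordinate receive adjacent keys. The names make_lsh, linear_key, peer_for, and the boundaries list are hypothetical.

```python
import math
import random
from bisect import bisect_right


def make_lsh(dim, k, width, seed=0):
    """Build k hash functions of the form floor((a . v + b) / width).

    Assumption: Gaussian (p-stable) projections; the talk does not
    specify which LSH family is used.
    """
    rng = random.Random(seed)
    params = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(k)
    ]

    def bucket(v):
        # One integer per hash function; together they label the bucket.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / width)
            for a, b in params
        )

    return bucket


def linear_key(bucket_label):
    """Hypothetical mapping: collapse a bucket label to one integer key."""
    return sum(bucket_label)


def peer_for(key, boundaries):
    """Return the index of the peer responsible for `key`.

    `boundaries` is a sorted list of split points on the key axis;
    n split points define n + 1 consecutive peers.
    """
    return bisect_right(boundaries, key)
```

With a mapping of this kind, the split points in boundaries could be chosen from the expected distribution of the keys so that each peer receives a similar share of the data; how the talk's mappings achieve this is not stated in the abstract.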

We show how to leverage the linearly aligned data for efficient KNN search and how to efficiently process range queries, which is, to the best of our knowledge, not possible in existing LSH schemes.
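The following sketch illustrates how a linear ordering of keys can serve both query types. It is a simplified, single-process stand-in and not the algorithm from the talk: it assumes every indexed point already carries an integer key on the linear axis, collects candidates from a window of keys around the query's key (which in a network would correspond to the responsible peer and its neighbors), and then filters or ranks the candidates by exact Euclidean distance. The function names and the spread parameter are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def candidates_in_key_window(index, center_key, spread):
    """Return points whose key lies in [center_key - spread, center_key + spread].

    `index` is a list of (key, point) pairs sorted by key.
    """
    keys = [k for k, _ in index]
    lo = bisect_left(keys, center_key - spread)
    hi = bisect_right(keys, center_key + spread)
    return [p for _, p in index[lo:hi]]


def knn(index, query, query_key, k, spread):
    """Approximate KNN: rank the candidates from the key window by distance."""
    cands = candidates_in_key_window(index, query_key, spread)
    return sorted(cands, key=lambda p: math.dist(p, query))[:k]


def range_query(index, query, query_key, radius, spread):
    """Approximate range query: keep candidates within `radius` of the query."""
    cands = candidates_in_key_window(index, query_key, spread)
    return [p for p in cands if math.dist(p, query) <= radius]
```

A larger spread inspects more keys (and, in a distributed setting, more peers), trading network accesses for result quality.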

We will conclude the talk by reporting on a comprehensive performance evaluation using real-world data, which shows that our approach brings major performance and accuracy gains compared to the state of the art.

Contact

Conny Liegl
302-70150

Conny Liegl, 03/09/2009 13:46 -- Created document.