Campus Event Calendar

Event Entry

What and Who

Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search

Christina Teflioudi
MMCI
Promotionskolloquium
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Thursday, 14 April 2016
16:00
60 Minutes
E1 4
0.24
Saarbrücken

Abstract

In this thesis, we propose efficient and scalable algorithms for shared-memory matrix factorization and maximum inner product search. Matrix factorization is a popular tool in the data mining community due to its ability to
quantify the interactions between different types of entities. It typically maps the (potentially noisy) original
representations of the entities into a lower dimensional space, where the "true" structure of the dataset is revealed.
Inner products of the new vector representations are then used to measure the interactions between different
entities. The strongest of these interactions are usually of particular interest in applications and can be retrieved
by solving a maximum inner product search problem.
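The connection between the two problems can be illustrated with a toy example (all dimensions and data here are invented for illustration): after factorization, each entity is a short vector, interactions are inner products of these vectors, and the strongest interaction for a given entity is found by maximum inner product search.

```python
import numpy as np

# Toy rank-2 factor matrices for 4 "users" and 5 "items"
# (purely illustrative; not data from the thesis).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))   # user factor vectors
H = rng.standard_normal((5, 2))   # item factor vectors

# The interaction between user u and item i is the inner product W[u] @ H[i].
scores = W @ H.T                  # all pairwise interactions, shape (4, 5)

# Maximum inner product search for user 0: the item with the largest score.
best_item = int(np.argmax(scores[0]))
```

Computing the full score matrix, as above, is the naive baseline; the point of dedicated maximum inner product search methods is to retrieve `best_item` without evaluating every inner product.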

For large real-life problem instances of matrix factorization and maximum inner product search, efficient and
scalable methods are necessary. We first study large-scale matrix factorization in a shared-memory setting and
we propose a cache-aware, parallel method that avoids fine-grained synchronization or locking. In more detail,
our approach partitions the initial large problem into small, cache-fitting sub-problems that can be solved
independently using stochastic gradient descent. Due to the low cache-miss rate and the absence of any locking
or synchronization, our method outperforms previously proposed techniques in both speed (up to 60% faster) and scalability.
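The block-partitioning idea can be sketched as follows. This is a simplified, single-threaded illustration of the stratification principle, not the thesis implementation: blocks within one stratum share no rows or columns, so they update disjoint factor rows and could run on separate threads without any locking.

```python
import numpy as np

# Illustrative sketch (assumed structure, not the thesis code): split the
# m x n matrix into a b x b grid of blocks and process them stratum by
# stratum, where a stratum contains one block per block-row and block-column.
rng = np.random.default_rng(1)
m, n, r, b = 6, 6, 2, 3
V = rng.random((m, n))                  # toy dense "ratings"
W = rng.standard_normal((m, r)) * 0.1   # row factors
H = rng.standard_normal((n, r)) * 0.1   # column factors
lr = 0.05                               # SGD step size

def sgd_block(bi, bj):
    """One SGD pass over the entries of block (bi, bj)."""
    rs, cs = m // b, n // b
    for u in range(bi * rs, (bi + 1) * rs):
        for i in range(bj * cs, (bj + 1) * cs):
            err = V[u, i] - W[u] @ H[i]
            wu = W[u].copy()            # use the old value in both updates
            W[u] += lr * err * H[i]
            H[i] += lr * err * wu

loss_before = float(np.square(V - W @ H.T).sum())

# Stratum s = {(i, (i + s) % b)}: its blocks touch disjoint block-rows and
# block-columns, so in a parallel setting each could go to its own thread.
for s in range(b):
    for i in range(b):
        sgd_block(i, (i + s) % b)

loss_after = float(np.square(V - W @ H.T).sum())
```

Because each block also fits in cache in the real method, the threads avoid both cache misses and synchronization, which is where the reported speedup comes from.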

We then investigate the problem of maximum inner product search and design a cache-friendly framework that allows for both exact and approximate search. Our approach reduces the original maximum inner product search problem to a set of smaller cosine similarity search problems that can be solved using existing
cosine similarity search techniques or our novel algorithms tailored for use within our framework. Experimental
results show that our approach is multiple orders of magnitude faster than naive search, consistently faster than
alternative methods for exact search, and achieves a better quality-speedup tradeoff (up to 3.9x faster for similar
recall levels) than state-of-the-art approximate techniques.
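The reduction rests on the identity q · p = ||q|| ||p|| cos(q, p): once items are ordered by norm, the search can stop as soon as even a perfectly aligned item (cosine = 1) could no longer beat the best inner product found so far. Below is a simplified exact-search sketch of this idea, not the thesis framework itself:

```python
import numpy as np

# Toy data for illustration (not from the thesis).
rng = np.random.default_rng(2)
P = rng.standard_normal((100, 4))   # item vectors
q = rng.standard_normal(4)          # query vector

norms = np.linalg.norm(P, axis=1)
order = np.argsort(-norms)          # scan items in decreasing-norm order
qn = np.linalg.norm(q)

best_i, best_ip = -1, -np.inf
for i in order:
    # Upper bound: q . P[i] <= ||q|| ||P[i]||. If even that cannot beat the
    # current best, no remaining (smaller-norm) item can, so we stop early.
    if norms[i] * qn <= best_ip:
        break
    ip = P[i] @ q                   # = ||q|| ||P[i]|| * cosine
    if ip > best_ip:
        best_i, best_ip = i, ip
```

The pruning is exact: every skipped item is provably dominated, so the result matches a brute-force scan while inspecting only a prefix of the norm-sorted list.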

Contact

Daniela Alessi
5000

Daniela Alessi, 04/11/2016 14:20 -- Created document.