Matrix factorization is a popular tool in the data mining community due to its ability to
quantify the interactions between different types of entities. It typically maps the (potentially noisy) original
representations of the entities into a lower dimensional space, where the "true" structure of the dataset is revealed.
Inner products of the new vector representations are then used to measure the interactions between different
entities. The strongest of these interactions are usually of particular interest in applications and can be retrieved
by solving a maximum inner product search problem.
For large real-life problem instances of matrix factorization and maximum inner product search, efficient and
scalable methods are necessary. We first study large-scale matrix factorization in a shared-memory setting and
we propose a cache-aware, parallel method that avoids fine-grained synchronization or locking. In more detail,
our approach partitions the initial large problem into small, cache-fitting sub-problems that can be solved
independently using stochastic gradient descent. Due to the low cache-miss rate and the absence of any locking
or synchronization, our method outperforms previously proposed techniques in both speed (up to 60% faster)
and scalability.
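The partitioning idea can be illustrated with a minimal sketch. The code below is not the actual method from this work: it splits the rating matrix into a grid of row/column blocks and, in each sub-epoch, runs plain SGD on a "diagonal" of blocks whose row and column index sets are pairwise disjoint, so the worker threads never touch the same factor rows and need no locks. Block sizing for cache residency, data layout, and scheduling details are omitted; all function names and parameters here are illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def sgd_block(R, U, V, rows, cols, lr, reg):
    """SGD over the observed entries of one block. Safe to run in parallel
    with other blocks whose row/column index sets are disjoint from this one."""
    for i in rows:
        for j in cols:
            r = R[i, j]
            if r == 0.0:            # 0 marks an unobserved entry in this toy setup
                continue
            e = r - U[i] @ V[j]
            U[i] += lr * (e * V[j] - reg * U[i])
            V[j] += lr * (e * U[i] - reg * V[j])

def factorize(R, k=4, blocks=2, epochs=300, lr=0.01, reg=0.01, seed=0):
    """Block-partitioned SGD: each sub-epoch processes one diagonal of
    mutually disjoint blocks in parallel, without fine-grained locking."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    row_parts = np.array_split(np.arange(m), blocks)
    col_parts = np.array_split(np.arange(n), blocks)
    with ThreadPoolExecutor(max_workers=blocks) as pool:
        for _ in range(epochs):
            for shift in range(blocks):
                # blocks (b, (b + shift) % blocks) share no rows or columns
                jobs = [pool.submit(sgd_block, R, U, V,
                                    row_parts[b], col_parts[(b + shift) % blocks],
                                    lr, reg)
                        for b in range(blocks)]
                for job in jobs:
                    job.result()
    return U, V
```

In a real implementation each block would be sized to fit in cache, which is what keeps the cache-miss rate low while the disjointness of the diagonal removes the need for synchronization.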
We then investigate the problem of maximum inner product search and design a cache-friendly
framework that allows for both exact and approximate search. Our approach reduces the original maximum inner
product search problem into a set of smaller cosine similarity search problems that can be solved using existing
cosine similarity search techniques or our novel algorithms tailored for use within our framework. Experimental
results show that our approach is multiple orders of magnitude faster than naive search, consistently faster than
alternative methods for exact search, and achieves a better quality-speedup tradeoff (up to 3.9x faster for similar
recall levels) than state-of-the-art approximate techniques.
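To make the reduction concrete, here is a sketch of one standard way to turn maximum inner product search into cosine similarity search (this is the well-known norm-augmentation trick, shown for illustration, not necessarily the exact transformation used in our framework). Each database vector is padded with one extra coordinate so that all augmented vectors share the same norm; the query is padded with a zero. Ranking by cosine similarity over the augmented vectors then coincides with ranking by inner product.

```python
import numpy as np

def augment_database(X):
    """Append one coordinate so every database vector has norm M = max ||x||.
    For a row x, the extra coordinate is sqrt(M^2 - ||x||^2)."""
    norms = np.linalg.norm(X, axis=1)
    M = norms.max()
    extra = np.sqrt(np.maximum(M**2 - norms**2, 0.0))
    return np.hstack([X, extra[:, None]]), M

def mips_via_cosine(X_aug, q):
    """Answer a max inner product query by brute-force cosine search over the
    augmented database. Since every augmented vector has norm M, the cosine
    score is (x . q) / (M ||q||), so the argmax matches the inner-product argmax."""
    q_aug = np.append(q, 0.0)
    sims = (X_aug @ q_aug) / (np.linalg.norm(X_aug, axis=1) * np.linalg.norm(q_aug))
    return int(np.argmax(sims))
```

In the framework described above, the brute-force cosine scan in `mips_via_cosine` would be replaced by an indexed cosine similarity search over each sub-problem, which is where the speedups over naive search come from.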