Sampling algorithms and coresets for Lp regression
Petros Drineas
Rensselaer Polytechnic Institute, New York, USA
AG1 Mittagsseminar (own work)
Petros Drineas is an assistant professor in the computer science department of Rensselaer Polytechnic Institute. His main research interests are the design and analysis of randomized and approximation algorithms for linear algebraic problems, and their applications to information retrieval and data mining.
L2 regression, also known as the least squares approximation problem, and more generally Lp regression, are fundamental problems with numerous applications in mathematics and statistical data analysis. Recall that the over-constrained Lp regression problem takes as input an $n \times d$ matrix $A$ with $n > d$, an $n$-vector $b$, and a number $p$, and returns as output a $d$-vector $x$ that minimizes $\|Ax - b\|_p$. Several randomized algorithms for this problem are described, each of which provides a relative-error approximation to the exact solution. The first algorithm applies novel ``subspace sampling'' probabilities to randomly sample constraints and thus construct a coreset for the L2 regression problem. The second algorithm (devised by T. Sarlos) applies a random projection and uses a ``fast'' version of the Johnson-Lindenstrauss lemma to obtain an efficient approximation to the L2 regression problem. The third algorithm generalizes the concepts of a ``well-conditioned'' basis and of ``subspace-preserving sampling'' to efficiently obtain a coreset for Lp regression, for all $p\in[1,\infty)$.
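The first two approaches can be illustrated in a few lines of NumPy. The sketch below is an illustrative assumption, not the talk's actual construction: it uses exact leverage scores (squared row norms of an orthonormal basis for the column space of $A$) as the subspace-sampling probabilities, and a dense Gaussian sketch as a stand-in for the fast Johnson-Lindenstrauss transform of Sarlos's algorithm. The problem sizes, the sample count `r`, and the random data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical over-constrained L2 regression instance (n >> d).
n, d = 2000, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least-squares solution, for reference.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Leverage scores: squared row norms of an orthonormal basis U for
# range(A); they sum to d and define the sampling probabilities.
U, _ = np.linalg.qr(A)
lev = np.sum(U**2, axis=1)
probs = lev / lev.sum()

# Sample r constraints (rows) with these probabilities, rescale each
# sampled row by 1/sqrt(r * p_i), and solve the small coreset problem.
r = 200
idx = rng.choice(n, size=r, p=probs, replace=True)
scale = 1.0 / np.sqrt(r * probs[idx])
x_sample, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale,
                               rcond=None)

# Random-projection alternative: a dense Gaussian sketch S (a slow
# stand-in for the fast JL transform) applied to both A and b.
S = rng.standard_normal((r, n)) / np.sqrt(r)
x_proj, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Residual of the sampled solution relative to the optimum (>= 1,
# and close to 1 when the coreset preserves the subspace well).
rel_err = np.linalg.norm(A @ x_sample - b) / np.linalg.norm(A @ x_exact - b)
```

Both reduced problems have only $r \ll n$ rows, so solving them is much cheaper than the original regression, while the relative-error guarantees discussed in the talk bound how far their residuals can be from the optimum.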