The k-means++ seeding algorithm is a simple sampling algorithm defined as follows: Pick the first center randomly from among the given points. For i>1, pick a given point to be the i^{th} center with probability proportional to the square of the distance of this point from the nearest previously chosen (i-1) centers. Practitioners feel good about this algorithm since it is simple to implement while theorists feel comfortable because Arthur and Vassilvitskii showed that this algorithm gives an O(log k) approximation in expectation. In this talk, we will explore different aspects of this simple sampling technique. We will see how it can be used obtain a PTAS, observe conditions under which it gives a constant approximation, see some of the limits of this technique, and conclude with some open questions.
This is a joint work with various colleagues and students at CSE, IIT Delhi.