New for: D3
Our main contributions are as follows. First, we show that instances that satisfy the $(1+\alpha, \epsilon)$-property and whose target clusters are additionally large are easier than general instances: on such instances, the algorithm proposed in [BBG'09] is a constant-factor approximation algorithm whose guarantee is better than the known hardness of approximation for general instances. We further show that it is $NP$-hard to decide whether an instance satisfies the $(1+\alpha, \epsilon)$-property for a given $(\alpha, \epsilon)$; the algorithms in [BBG'09], however, require $\alpha$ and $\epsilon$ as input parameters. Second, we show how to use their algorithms even when no values of $\alpha$ and $\epsilon$ for which the assumption holds are known. Finally, we implement these methods together with other popular methods and test them on real-world data sets. We find that on these data sets there are no $\alpha$ and $\epsilon$ for which the data set has both the $(1+\alpha, \epsilon)$-property and sufficiently large target clusters. For the general case, we show that on our data sets the performance guarantee proved in [BBG'09] is vacuous for the values of $\alpha$ and $\epsilon$ for which the data set has the $(1+\alpha, \epsilon)$-property. The algorithm nonetheless gives reasonable results, although it is outperformed by other methods.
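To make the stability notion concrete: an instance has the $(1+\alpha, \epsilon)$-property when every clustering whose objective cost is within a factor $(1+\alpha)$ of optimal misclassifies at most an $\epsilon$ fraction of points relative to the target clustering. The sketch below is a hypothetical brute-force check of this condition on a toy one-dimensional $k$-median instance; all names and the instance itself are illustrative assumptions, and since deciding the property is NP-hard in general, exhaustive enumeration like this is only feasible on tiny examples.

```python
# Illustrative brute-force check of the (1+alpha, eps)-property on a toy
# 1-D k-median instance. Hypothetical sketch, not the paper's algorithm:
# the property is NP-hard to decide in general, so enumeration only works
# on very small instances.
from itertools import product, permutations

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated groups
k = 2
target = (0, 0, 0, 1, 1, 1)               # the intended target clustering
n = len(points)

def cost(labels):
    """k-median cost: each nonempty cluster is served by its best member."""
    total = 0.0
    for c in range(k):
        members = [points[i] for i in range(n) if labels[i] == c]
        if members:
            total += min(sum(abs(p - m) for p in members) for m in members)
    return total

def distance(a, b):
    """Fraction of misclassified points, minimized over cluster relabelings."""
    best = n
    for perm in permutations(range(k)):
        best = min(best, sum(1 for x, y in zip(a, b) if perm[x] != y))
    return best / n

def has_property(alpha, eps):
    """True iff every (1+alpha)-approximate clustering is eps-close to target."""
    clusterings = list(product(range(k), repeat=n))
    opt = min(cost(c) for c in clusterings)
    return all(distance(c, target) <= eps
               for c in clusterings if cost(c) <= (1 + alpha) * opt)
```

On this toy instance every near-optimal clustering coincides with the target up to relabeling, so `has_property(0.5, 0.0)` holds, whereas for a large enough $\alpha$ the approximation budget admits clusterings far from the target and the property fails.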
Joint work with Michael Yu and Anke van Zuylen.