In this talk I will discuss algorithms and lower bounds for the classical set cover problem, in streaming settings.
The problem is that of a stream of subsets of an underlying universe of size $n$, each set given by the list of elements it includes. The goal of the algorithm is that of finding (approximations of) minimum covers of the universe, or merely estimating the size of minimum covers, under the constraint that the algorithm can use only limited space.
This space is typically limited by $O(n {polylog}n)$ bits, a constraint referred to as semi-streaming.
I will concentrate on the basic setting of a single-pass semi-streaming algorithm, and the basic problems of finding a family of sets of minimum cardinality (or cost, in the weighted case), that covers the whole universe, or only a $(1-\epsilon)$-fraction of the universe, for some $0 \leq \epsilon < 1$. For this problem we give an asymptotically tight trade-off between $\epsilon$ and the approximation ratio: We design a semi-streaming algorithm that constructs a succinct data structure $D$ from which one can extract, for every $0 \leq \epsilon < 1$, a $(1 - \epsilon)$-cover that approximates the optimal 1-cover within a factor of $f(\epsilon, n)$, where $f(\epsilon, n)= O(1/\epsilon)$ if $ \epsilon > 1 / \sqrt{n}$, and $f(\epsilon, n)=O (\sqrt{n})$ otherwise. In particular for the traditional set cover problem of covering the whole universe we obtain an $O(\sqrt{n})$-approximation. This algorithm is proved to be best possible for any randomized algorithm by establishing a family (parameterized by $\epsilon$) of matching lower bounds.
I will then shortly review several subsequent results that later extended our work to the case of multiple passes on the stream, and to the problem of only estimating the size of the cover, in particular giving lower bounds on the decision-problem variant. I will further point out a number of open questions about the set cover problem in streaming settings.
Joint work with Yuval Emek. |