We study the succinct data structure problem of locally decoding short substrings of a given text under its compressed BWT, i.e., with small additive redundancy r over the Move-To-Front compression. The celebrated BWT-based FM-index yield a trade-off of r = O~(n/\sqrt{t}) bits, to decode a single character in O(t) time. We give a near-quadratic improvement r = O~(n\lg(t)/t). As a by-product, we obtain an exponential (in t) improvement on the redundancy of the FM-index for counting pattern-matches on compressed text. In the interesting regime where the text compresses to n^{1-o(1)} bits, these results provide an exp(t) overall space reduction. For the local decoding problem of BWT, we also prove an \Omega(n/t^2) cell-probe lower bound for "symmetric" data structures.
We achieve our main result by designing a compressed partial-sums (Rank) data structure over BWT. The key component is a locally-decodable Move-to-Front (MTF) code: with only O(1) extra bits per block of length n^{\Omega(1)}, the decoding time of a single character can be decreased from \Omega(n) to O(\lg n). This result is of independent interest in algorithmic information theory.
This is joint work with Omri Weinstein.
Note: This talk will take 45min.