MPI-INF Logo
Campus Event Calendar

Event Entry

New for: D3

What and Who

Opportumistic data structures

Paolo Ferragina
Dipartimento di Informatica, Universita' di Pisa, Italy
Talk
AG 1, AG 2, AG 3, AG 4  
AG Audience
English

Date, Time and Location

Thursday, 3 August 2000
13:30
30 Minutes
46
024
Saarbrücken

Abstract

There is an upsurging interest in designing succinct data structures

for basic searching problems. The motivation has to be found in the exponential increase
of electronic data nowadays available which is even surpassing the
significant increase in memory and disk storage capacities of current
computers. Space reduction is an attractive issue because it is also
intimately related to performance improvements as noted by several
authors (e.g. Knuth [vol. 3], Bentley [Progr. Pearls]). In
designing these implicit data structures the goal is to reduce
as much as possible the auxiliary information kept together with the
input data without introducing a significant slowdown in the final
query performance. Yet input data are represented in their entirety
thus taking no advantage of possible repetitiveness into them. The
importance of those issues is well known to programmers who typically
use various tricks to squeeze data as much as possible and still
achieve good query performance. Their approaches, though, boil down
to heuristics whose effectiveness is witnessed only by
experimentation.

In this paper, we address the issue of compressing and indexing data
by studying it in a theoretical framework. We devise a novel data
structure for indexing and searching whose space occupancy is a
function of the entropy of the underlying data set. We call the data
structure opportunistic
since its space occupancy is decreased when the input is compressible
at no significant slowdown in the query performance. More precisely,
its space occupancy is optimal in an information-content sense
because a text T[1,u] is stored using O(Hk(T)) + o(1) bits
per input symbol, where Hk(T) is the k-th order entropy of T
(the bound holds for any fixed k). Given an arbitrary string
P[1,p], the opportunistic data structure allows to search for the
occ occurrences of P in T requiring O(p + occ log^c u)$ time
complexity (for any fixed c >0). If data are
uncompressible we achieve the best space bound currently
known [Grossi-Vitter, STOC 00]; on compressible data our solution
improves the succinct suffix array of [Grossi-Vitter, STOC 00] and
the classical suffix tree and suffix array data structures either
in space or in query time complexity or both.

Finally we will discuss some applications of our opportunistic data
structure and some preliminary experimental results.

Contact

--email hidden
passcode not visible
logged in users only