For such applications, we propose a novel data structure, SEManTIKs, for space-efficient storage and management of large text and sequence databases. The structure uses bit vectors that reuse the storage space for common triplets and therefore has lower space requirements than existing trie-based structures. In addition to exact string search, SEManTIKs efficiently handles prefix and suffix search queries. We also propose an extension of the structure that handles substring searches, albeit at the cost of increased storage. This extension is significant because trie-like and compressed dictionary-based methods cannot handle such queries efficiently.
Our experiments show that SEManTIKs outperforms the existing structures by nearly a factor of two in terms of space requirements, while its query times are either better or comparable.
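To give some intuition for why sharing storage across common triplets saves space, the following is a minimal sketch of a bit vector indexed by character triplets. It is not the SEManTIKs layout, which the text above does not specify; the class `TripletBitVector`, its methods, and the DNA alphabet are hypothetical names chosen for illustration. Each distinct triplet occupies a single bit no matter how many stored strings contain it, and a substring query can be rejected early if any of its triplets is absent.

```python
class TripletBitVector:
    """Toy presence filter over character triplets.

    Illustrative only: an assumed, simplified structure, not SEManTIKs itself.
    """

    def __init__(self, alphabet):
        # Map each symbol to a small integer code.
        self.index = {ch: i for i, ch in enumerate(alphabet)}
        self.sigma = len(alphabet)
        # One bit per possible triplet, packed eight to a byte.
        self.bits = bytearray((self.sigma ** 3 + 7) // 8)

    def _slot(self, triplet):
        # Interpret the triplet as a 3-digit number in base sigma.
        a, b, c = (self.index[ch] for ch in triplet)
        return (a * self.sigma + b) * self.sigma + c

    def add(self, text):
        # Set the bit of every triplet in text; repeated triplets cost nothing extra.
        # Raises KeyError if text contains a symbol outside the alphabet.
        for i in range(len(text) - 2):
            slot = self._slot(text[i:i + 3])
            self.bits[slot >> 3] |= 1 << (slot & 7)

    def may_contain(self, query):
        # False: no stored string contains query as a substring.
        # True: query is only *possibly* present (false positives can occur).
        if any(ch not in self.index for ch in query):
            return False
        for i in range(len(query) - 2):
            slot = self._slot(query[i:i + 3])
            if not self.bits[slot >> 3] & (1 << (slot & 7)):
                return False
        return True


if __name__ == "__main__":
    tv = TripletBitVector("ACGT")
    tv.add("ACGTAC")
    tv.add("CGTACG")  # shares all of its triplets with the first string
    print(tv.may_contain("GTAC"))   # a real substring of both strings
    print(tv.may_contain("ACGTT"))  # triplet GTT was never stored
```

In this sketch the two stored strings set only four bits in total, since they consist of the same four triplets. The filter answers "possibly present" rather than "present": a query such as `ACGTACGT` passes the check even though neither stored string contains it, so a complete index would need additional positional information to confirm a match.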