Campus Event Calendar: Sourav Dutta (10/08/2012 in E1 4/024)

Campus Event Calendar

Campus Event Calendar:
- All Upcoming:
  - only for D1
  - only for D2
  - only for INET
  - only for D4
  - only for D5
  - only for D6
  - only for RG1
  - Mailing Lists
  - by Speaker
  - by Type
  - by Category
  - by Title
  - Calendar
  - RSS Feed
- History of Events:

Event Entry

What and Who

Space-Efficient Management of String Databases by Reusing Common Characters

Sourav Dutta

International Max Planck Research School for Computer Science - IMPRS

PhD Application Talk

AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI

Public Audience

English

Note: We use this to send email in the morning.

Date, Time and Location

Monday, 8 October 2012

09:00

60 Minutes

E1 4

024

Saarbrücken

Abstract

With the advent of scanning devices and optical character recognition tools, efficient storage and management of large sets of text data has emerged as a fundamental requirement for a large number of modern day applications including dictionary and thesaurus storage, indexing of sequence databases, search engine applications, and XML information mining.

To such applications, we propose a novel data structure, SEManTIKs, for space efficient storage and management of large text and sequence databases. The structure uses bit vectors that reuse the storage space for common triplets and, therefore, has low space requirements as compared to the existing trie-based structures. In addition to exact string search operation, SEManTIKs also efficiently handles prefix and suffix search queries. We also propose an extension of the structure for handling substring searches, albeit with an increase in the storage requirements. This extension is important in comparison to the trie-like and compressed dictionary-based methods that are unable to handle such queries efficiently.
We perform several experiments to show that SEManTIKs outperforms the existing structures by nearly a factor of two in terms of space requirements, while the various query times are either better or comparable.

Contact

--email hidden

System used:

Meeting URL:

Meeting ID:

Passcode:

passcode not visible

Code Visible for:

logged in users only

Marc Schmitt, 10/05/2012 16:08 -- Created document.

Imprint / Impressum | Data Protection / Datenschutzhinweis