Campus Event Calendar

Event Entry

New for: D2, D3

What and Who

Implementation of a File-Based Indexing Framework for the TopX Search Engine

Levan Kasradze
Saarland University
PhD Application Talk
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Friday, 15 October 2010
11:00
90 Minutes
E1 4
024
Saarbrücken

Abstract

Full-text indices provide fast string search over huge text collections. The main challenges with these indices have traditionally been their space consumption and construction time.

This thesis implements a file-based indexing framework for the TopX search engine. The framework constructs an inverted index. We have implemented a parallelized indexer that scales to huge text collections. The framework supports several index layouts with content-based and proximity scoring functions. To reduce the required disk space, we employ static index pruning techniques with quality guarantees. For a given keyword, we can fetch the corresponding inverted list with only two sequential disk I/O operations.
Our experiments on the TREC Terabyte Track “GOV2” collection (426 GB) showed that indices with BM25 and proximity scores can be constructed in 190 hours, using disk space amounting to only about 31% of the original collection size.
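
The two-read lookup mentioned in the abstract can be illustrated with a small sketch. This is not the layout used in the thesis, which the abstract does not describe. It assumes a hypothetical design with a fixed-width header file (one record per term, holding the byte offset and length of that term's list) and a separate postings file, plus an in-memory mapping from terms to term ids. The file names, record formats, and the functions `build_index` and `fetch_list` are all invented for illustration.

```python
import os
import struct
import tempfile

# Hypothetical record layouts (assumptions, not taken from the thesis).
HEADER = struct.Struct("<QI")   # (byte offset, posting count) per term
POSTING = struct.Struct("<If")  # (document id, score) per posting


def build_index(postings_by_term, directory):
    """Write a postings file and a fixed-width header file.

    Returns the in-memory term-id mapping and the two file paths.
    """
    term_ids = {}
    lists_path = os.path.join(directory, "lists.bin")
    header_path = os.path.join(directory, "header.bin")
    with open(lists_path, "wb") as lists, open(header_path, "wb") as header:
        for term_id, term in enumerate(sorted(postings_by_term)):
            term_ids[term] = term_id
            entries = postings_by_term[term]
            # The header record points at the start of this term's list.
            header.write(HEADER.pack(lists.tell(), len(entries)))
            for doc_id, score in entries:
                lists.write(POSTING.pack(doc_id, score))
    return term_ids, header_path, lists_path


def fetch_list(term, term_ids, header_path, lists_path):
    """Return (postings, number of read calls) for one keyword."""
    reads = 0
    # Read 1: the fixed-width header record for this term.
    with open(header_path, "rb") as header:
        header.seek(term_ids[term] * HEADER.size)
        offset, count = HEADER.unpack(header.read(HEADER.size))
        reads += 1
    # Read 2: the whole inverted list in one contiguous read.
    with open(lists_path, "rb") as lists:
        lists.seek(offset)
        raw = lists.read(count * POSTING.size)
        reads += 1
    postings = [POSTING.unpack_from(raw, i * POSTING.size) for i in range(count)]
    return postings, reads


with tempfile.TemporaryDirectory() as tmp:
    ids, header_path, lists_path = build_index(
        {"index": [(1, 2.0), (7, 0.5)], "search": [(3, 1.5)]}, tmp
    )
    postings, reads = fetch_list("index", ids, header_path, lists_path)
    print(postings, reads)
```

Because every header record has the same size, the record for a term can be located by arithmetic on its id, so no search over the dictionary is needed on disk. Keeping each list contiguous is what allows the second read to return it in a single pass.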

Contact

imprs office
0681 93 25 225

Stephanie Jörg, 10/14/2010 14:24
Stephanie Jörg, 10/14/2010 14:13 -- Created document.