Thesis - Masters thesis
@MastersThesis
Diplomarbeit



Author

Author(s)*:

Kasradze, Levan

BibTeX citekey*:

Kasradze2008

Language:

English

Title, School

Title*:

Implementation of a File-based Indexing Framework for the TopX Search Engine

School*:

Universität des Saarlandes

Type of Thesis*:

Masters thesis

Month:

May

Year*:

2008

Pages:


Publisher

Publisher's Name:


Publisher's Address:


Note, Abstract, ©

Note:


LaTeX Abstract:

Full-text indices provide fast string search over huge text collections. The most challenging issues with these indices have traditionally been their space consumption and construction time.
This thesis implements a file-based indexing framework for the TopX search engine. The framework constructs an inverted index, using a parallelized indexer that copes with huge text collections.
The framework supports several index layouts with content-based and proximity scoring functions. To reduce the required disk space, we employ static index pruning techniques with quality guarantees. For a given keyword, the corresponding inverted list can be fetched with only two sequential disk I/O operations.
Our experiments on the TREC Terabyte Track "GOV2" collection (426 GB) showed that indices with BM25 and proximity scores can be constructed in 190 hours, using disk space of only about 31% of the original collection size.
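The two-read lookup mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual file layout, which the abstract does not describe. It assumes a hypothetical design with a hashed dictionary file of fixed-size buckets and a postings file that stores each inverted list contiguously. All names (build_index, lookup, ENTRY, POSTING, the bucket sizes) are invented for illustration.

```python
import struct
import zlib

# Hypothetical on-disk layout, assumed for illustration only.
ENTRY = struct.Struct("<24sQI")      # term (zero-padded), list offset, list byte length
SLOTS = 4                            # dictionary entries per bucket
BUCKET_SIZE = ENTRY.size * SLOTS     # one bucket is fetched with a single read
POSTING = struct.Struct("<If")       # document id, score (float32)


def _bucket(term, n_buckets):
    """Map a term to a bucket number with a hash that is stable across runs."""
    return zlib.crc32(term.encode("utf-8")) % n_buckets


def build_index(index, dict_path, post_path, n_buckets=64):
    """Write each inverted list contiguously, then write the bucketed dictionary."""
    buckets = [[] for _ in range(n_buckets)]
    with open(post_path, "wb") as post:
        for term, postings in sorted(index.items()):
            key = term.encode("utf-8")
            if not key or len(key) > 24:
                raise ValueError("term must encode to 1..24 bytes in this sketch")
            offset = post.tell()
            for doc_id, score in postings:
                post.write(POSTING.pack(doc_id, score))
            slot_list = buckets[_bucket(term, n_buckets)]
            if len(slot_list) == SLOTS:
                raise OverflowError("bucket full; this sketch has no overflow chain")
            slot_list.append(ENTRY.pack(key, offset, post.tell() - offset))
    with open(dict_path, "wb") as dictionary:
        for slot_list in buckets:
            # Pad every bucket to a fixed size so its position is computable.
            dictionary.write(b"".join(slot_list).ljust(BUCKET_SIZE, b"\0"))


def lookup(term, dict_path, post_path, n_buckets=64):
    """Return the inverted list for a term using two contiguous reads."""
    key = term.encode("utf-8")
    with open(dict_path, "rb") as dictionary:
        dictionary.seek(_bucket(term, n_buckets) * BUCKET_SIZE)
        block = dictionary.read(BUCKET_SIZE)          # read 1: dictionary bucket
    for raw_key, offset, length in ENTRY.iter_unpack(block):
        if key and raw_key.rstrip(b"\0") == key:
            with open(post_path, "rb") as post:
                post.seek(offset)
                data = post.read(length)              # read 2: the whole inverted list
            return list(POSTING.iter_unpack(data))
    return []                                         # term not in the index
```

In this sketch, the bucket position follows from the term's hash alone, so the first read needs no search, and the second read fetches the list in one piece because it was written contiguously. A real index would also need an overflow strategy for full buckets and a compressed posting format, both omitted here.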

Keywords:


HyperLinks / References / URLs:


Personal Comments:


Download
Access Level:

Internal

Referee, Status

1. Referee:

Gerhard Weikum

2. Referee:

Holger Bast

Supervisor:

Ralf Schenkel

Status:

Completed

First Lecture Title:


Location of Lecture:


Date of the Colloquium:

16 January 2009

Chair of the Colloquium:


Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

IMPRS-CS

Audience:

Expert

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort

BibTeX Entry:
@MASTERSTHESIS{Kasradze2008,
AUTHOR = {Kasradze, Levan},
TITLE = {Implementation of a File-based Indexing Framework for the TopX Search Engine},
SCHOOL = {Universit{\"a}t des Saarlandes},
YEAR = {2008},
MONTH = {May},
}

Entry last modified by Stephanie Jörg, 01/16/2009