Journal Article
@Article
Artikel in Fachzeitschrift


Show entries of:

this year (2018) | last year (2017) | two years ago (2016) | Notes URL

Action:

login to update

Options:








Author, Editor(s)

Author(s):

Crauser, Andreas
Ferragina, Paolo

dblp
dblp



BibTeX cite key*:

Crauser-Ferragina2001

Title

Title*:

A theoretical and experimental study on the construction of suffix arrays in external memory

Journal

Journal Title*:

Algorithmica

Journal's URL:

http://link.springer.de/link/service/journals/00453/

Download URL
for the article:

http://link.springer.de/link/service/journals/00453/papers/2032001/20320001.pdf

Language:

English

Publisher

Publisher's
Name:

Springer

Publisher's URL:

http://www.springer-ny.com/

Publisher's
Address:

New York, USA

ISSN:

0178-4617

Vol, No, pp, Date

Volume*:

32

Number:

1

Publishing Date:

January 2002

Pages*:

1-35

Number of
VG Pages:

34

Page Start:


Page End:


Sequence Number:


DOI:


Note, Abstract, ©

Note:


(LaTeX) Abstract:

The construction of full-text indexes on very large text collections
is nowadays a hot problem. The suffix array [Manber-Myers,~1993] is
one of the most attractive full-text indexing data structures due to
its simplicity, space efficiency and powerful/fast search operations
supported. In this paper we analyze, both theoretically and
experimentally, the I/O-complexity and the working space of six
algorithms for constructing large suffix arrays. Three of them are
the state-of-the-art, the other three algorithms are our new
proposals. We perform a set of experiments based on three different
data sets (English texts, Amino-acid sequences and random texts) and
give a precise hierarchy of these algorithms according to their
working-space vs. construction-time tradeoff. Given the current
trends in model design~\cite{Farach-et-al,Vitter} and disk
technology~\cite{quantum,Ruemmler-Wilkes}, we will pose particular
attention to differentiate between ``random'' and ``contiguous''
disk accesses, in order to reasonably explain some practical
I/O-phenomena which are related to the experimental behavior of
these algorithms and that would be otherwise meaningless in the
light of other simpler external-memory models.

At the best of our knowledge, this is the first study which provides
a wide spectrum of possible approaches to the construction of suffix
arrays in external memory, and thus it should be helpful to anyone
who is interested in building full-text indexes on very large text
collections.

Finally, we conclude our paper by addressing two other issues. The
former concerns with the problem of building word-indexes; we show
that our results can be successfully applied to this case too,
without any loss in efficiency and without compromising the
simplicity of programming so to achieve a uniform, simple and
efficient approach to both the two indexing models. The latter issue
is related to the intriguing and apparently counterintuitive
``contradiction'' between the effective practical performance of the
well-known BaezaYates-Gonnet-Snider's algorithm~\cite{book-info},
verified in our experiments, and its unappealing (i.e., cubic)
worst-case behavior. We devise a new external-memory algorithm that
follows the basic philosophy underlying that algorithm but in a
significantly different manner, thus resulting in a novel approach
which combines good worst-case bounds with efficient practical
performance.

URL for the Abstract:


Categories,
Keywords:

external memory, indexing

HyperLinks / References / URLs:


Copyright Message:


Personal Comments:


Download
Access Level:

Public

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Algorithms and Complexity Group

Audience:

Expert

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort


BibTeX Entry:

@ARTICLE{Crauser-Ferragina2001,
AUTHOR = {Crauser, Andreas and Ferragina, Paolo},
TITLE = {A theoretical and experimental study on the construction of suffix arrays in external memory},
JOURNAL = {Algorithmica},
PUBLISHER = {Springer},
YEAR = {2002},
NUMBER = {1},
VOLUME = {32},
PAGES = {1--35},
ADDRESS = {New York, USA},
MONTH = {January},
ISBN = {0178-4617},
}


Entry last modified by Christine Kiesel, 03/02/2010
Show details for Edit History (please click the blue arrow to see the details)Edit History (please click the blue arrow to see the details)
Hide details for Edit History (please click the blue arrow to see the details)Edit History (please click the blue arrow to see the details)

Editor(s)
Andreas Crauser
Created
02/23/2001 04:48:00 PM
Revisions
9.
8.
7.
6.
5.
Editor(s)
Christine Kiesel
Christine Kiesel
Christine Kiesel
Uwe Brahm
Uwe Brahm
Edit Dates
30.05.2005 16:46:47
08.09.2003 16:44:48
07.05.2003 17:35:06
03.04.2002 22:18:47
03.04.2002 22:17:35
Show details for Attachment SectionAttachment Section
Hide details for Attachment SectionAttachment Section