Max-Planck-Institut für Informatik
max planck institut
informatik
mpii logo Minerva of the Max Planck Society
 

MPI-I-98-1-024

$q$-gram based database searching using a suffix array (QUASAR)

Burkhardt, Stefan and Crauser, Andreas and Ferragina, Paolo and Lenhof, Hans-Peter and Rivals, Eric and Vingron, Martin

MPI-I-98-1-024. October 1998, 11 pages. | Status: available - back from printing | Next --> Entry | Previous <-- Entry

Abstract in LaTeX format:
With the increasing amount of DNA sequence information deposited in
our databases searching for similarity to a query sequence
has become a basic operation in molecular biology.
But even today's fast algorithms reach their limits when
applied to all-versus-all comparisons of large databases.
Here we present a new data base searching
algorithm dubbed QUASAR (Q-gram Alignment based on Suffix ARrays)
which was designed to quickly detect sequences with strong
similarity to the query in a context where many searches are
conducted on one database. Our algorithm applies a modification of
$q$-tuple filtering implemented on top of a suffix array. Two
versions were developed, one for a RAM resident suffix array and one
for access to the suffix array on disk. We compared our implementation
with BLAST and found that our approach is an order of magnitude faster.
It is, however, restricted to the search for strongly similar DNA
sequences as is typically required, e.g., in the context of clustering
expressed sequence tags (ESTs).

Acknowledgement:
References to related material:

To download this research report, please select the type of document that fits best your needs.Attachement Size(s):
MPI-I-98-1-024.ps229 KBytes
Please note: If you don't have a viewer for PostScript on your platform, try to install GhostScript and GhostView
URL to this document: http://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/1998-1-024
Hide details for BibTeXBibTeX
@TECHREPORT{BurkhardtCrauserFerraginaLenhofRivalsVingron98,
  AUTHOR = {Burkhardt, Stefan and Crauser, Andreas and Ferragina, Paolo and Lenhof, Hans-Peter and Rivals, Eric and Vingron, Martin},
  TITLE = {$q$-gram based database searching using a suffix array (QUASAR)},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Im Stadtwald, D-66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-98-1-024},
  MONTH = {October},
  YEAR = {1998},
  ISSN = {0946-011X},
}