Publications

Server    domino.mpi-inf.mpg.de

Proceedings Article, Paper
@InProceedings
Beitrag in Tagungsband, Workshop

Author, Editor
Author(s):
Burkhardt, Stefan
Kärkkäinen, Juha
Editor(s):
Amir, Amihood
Landau, Gadi
BibTeX cite key*:
Burkhardt2001
Title, Booktitle
Title*:
Better Filtering with Gapped q-Grams
Booktitle*:
Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Event, URLs
Conference URL:
http://csweb.haifa.ac.il/cpm/
Downloading URL:
http://www.mpi-sb.mpg.de/~stburk/gapped-q.ps
Event Address*:
Jerusalem, Israel
Language:
English
Event Date*
(no longer used):
July 2001
Organization:
Event Start Date:
3 December 2023
Event End Date:
3 December 2023
Publisher
Name*:
Springer
URL:
http://www.springer.de/
Address*:
Berlin, Germany
Type:
Vol, No, Year, pp.
Series:
Lecture Notes in Computer Science
Volume:
2089
Number:
Month:
July
Pages:
73-85
Year*:
2001
VG Wort Pages:
ISBN/ISSN:
3-540-42271-4
Sequence Number:
DOI:
Note, Abstract, ©
(LaTeX) Abstract:
The q-gram filter is a popular filtering method for approximate
string matching. It compares substrings of length q (the q-grams)
in the pattern and the text to identify the text areas that might
contain a match. A generalization of the method is to use gapped
q-grams, subsets of q characters in some fixed non-contiguous
shape, instead of contiguous substrings. Although mentioned a few
times in the literature, this generalization has never been studied
in any depth. In this paper, we report the first results from a
study on gapped q-grams. We show that gapped q-grams can provide
orders of magnitude faster and/or more efficient filtering than
contiguous q-grams. The performance, however, depends on the shape
of the q-grams. The best shapes are rare and often possess no
apparent regularity. We show how to recognize good shapes and
demonstrate with experiments their advantage over both contiguous
and average shapes. We concentrate here on the k mismatches
problem, but also outline an approach for extending the results
to the more common k differences problem.
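The filtering idea in the abstract can be illustrated with a minimal sketch for the k mismatches case. It is not the authors' implementation: the function names (`gapped_qgrams`, `threshold`, `passes_filter`), the representation of a shape as a tuple of sorted offsets, and the brute-force threshold computation are assumptions made for illustration, and q-grams are compared position by position rather than through an index.

```python
from itertools import combinations


def gapped_qgrams(s, shape):
    """Return the gapped q-grams of s for a shape given as sorted offsets.

    Example shape (0, 1, 3) picks characters i, i+1 and i+3 for each start i.
    A contiguous 3-gram is the shape (0, 1, 2).
    """
    span = shape[-1] + 1
    return ["".join(s[i + off] for off in shape)
            for i in range(len(s) - span + 1)]


def threshold(shape, m, k):
    """Minimum number of q-grams that survive any placement of k mismatches
    in a string of length m (brute force, only feasible for small m and k)."""
    span = shape[-1] + 1
    starts = range(m - span + 1)
    k = min(k, m)
    best = len(starts)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        # A q-gram survives if none of its sampled positions is a mismatch.
        surviving = sum(
            1 for i in starts if all(i + off not in bad for off in shape)
        )
        best = min(best, surviving)
    return best


def passes_filter(pattern, window, shape, k):
    """Keep a text window of the same length as the pattern only if it shares
    at least `threshold` position-aligned q-grams with the pattern."""
    shared = sum(
        1
        for a, b in zip(gapped_qgrams(pattern, shape),
                        gapped_qgrams(window, shape))
        if a == b
    )
    return shared >= threshold(shape, len(pattern), k)
```

A window that fails `passes_filter` cannot match the pattern with at most k mismatches, so only the surviving windows need to be verified. How much a given shape helps depends on its threshold and on how selective its q-grams are, which is the trade-off the abstract refers to.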
Keywords:
approximate string matching, filter algorithm, gapped q-grams, k mismatches problem
Download
Access Level:

Correlation
MPG Unit:
Max-Planck-Institut für Informatik
MPG Subunit:
Algorithms and Complexity Group
Audience:
experts only
Appearance:
MPII WWW Server, university publications list, MPII FTP Server, working group publication list, VG Wort, MPG publications list, Fachbeirat



BibTeX Entry:

@INPROCEEDINGS{Burkhardt2001,
AUTHOR = {Burkhardt, Stefan and K{\"a}rkk{\"a}inen, Juha},
EDITOR = {Amir, Amihood and Landau, Gadi},
TITLE = {Better Filtering with Gapped q-Grams},
BOOKTITLE = {Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching},
PUBLISHER = {Springer},
YEAR = {2001},
VOLUME = {2089},
PAGES = {73--85},
SERIES = {Lecture Notes in Computer Science},
ADDRESS = {Jerusalem, Israel},
MONTH = {July},
ISBN = {3-540-42271-4},
}


Entry last modified by Stefan Burkhardt, 03/02/2010
Edit History

Editor(s)
Stefan Burkhardt
Created
07/20/2001 14:01:34
Revisions   Editor(s)          Edit Dates
2.          Stefan Burkhardt   04/16/2003 04:11:35 PM
1.          Anja Becker        08.04.2002 12:48:04
0.          Stefan Burkhardt   20/07/2001 14:01:34