Publications

Server    domino.mpi-inf.mpg.de

Proceedings Article, Paper
@InProceedings
Contribution to proceedings, workshop

Author, Editor
Author(s):
Celikik, Marjan
Bast, Holger
Editor(s):
Shin, Dongwan
Not MPII Editor(s):
Shin, Dongwan
BibTeX cite key*:
bastcelikik2009sac
Title, Booktitle
Title*:
Fast Error-Tolerant Search on Very Large Texts
spelling-variants.pdf (304.6 KB)
Booktitle*:
The 24th Annual ACM Symposium on Applied Computing
Event, URLs
Conference URL:
http://www.acm.org/conferences/sac/sac2009/
Downloading URL:
http://doi.acm.org/10.1145/1529282.1529669
Event Address*:
Honolulu, Hawaii, USA
Language:
English
Event Date*
(no longer used):
Organization:
Event Start Date:
8 March 2009
Event End Date:
12 March 2009
Publisher
Name*:
ACM
URL:
http://www.acm.org/
Address*:
New York, NY
Type:
Vol, No, Year, pp.
Series:
Volume:
Number:
Month:
March
Pages:
1724-1731
Year*:
2009
VG Wort Pages:
ISBN/ISSN:
978-1-60558-166-8
Sequence Number:
DOI:
10.1145/1529282.1529669
Note, Abstract, ©
(LaTeX) Abstract:
We consider the following spelling variants clustering problem: Given a list
of distinct words, called lexicon, compute (possibly overlapping) clusters of
words which are spelling variants of each other. This problem naturally arises
in the context of error-tolerant full-text search of the following kind: For
a given query, return not only documents matching the query words exactly but
also those matching their spelling variants. This is the inverse of the
well-known "Did you mean: ... ?" web search engine feature, where the error
tolerance is on the side of the query, and not on the side of the documents.

We combine various ideas from the large body of literature on approximate
string searching and spelling correction techniques into a new algorithm for the
spelling variants clustering problem that is both accurate and very efficient
in time and space. Our largest lexicon, containing roughly 10 million words,
can be processed in about 16 minutes on a standard PC using 10 MB of
additional space. This beats the previously best scheme by a factor of two in
running time and by a factor of more than ten in space usage. We have
integrated our algorithms into the CompleteSearch engine in a way that
achieves error-tolerant search without significant blowup in either index size
or query processing time.
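
To make the problem statement in the abstract concrete, here is a minimal Python sketch of a naive baseline. It is not the algorithm of the paper: it assumes that "spelling variant" means "within a small Levenshtein distance", compares all word pairs (quadratic time), and would not scale to a lexicon of 10 million words. The names `edit_distance`, `naive_variant_clusters`, `expand_query`, and the `max_distance` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def naive_variant_clusters(lexicon, max_distance=1):
    """Map each word to the set of lexicon words within max_distance edits.

    Assumption: closeness in edit distance stands in for "spelling variant".
    Clusters may overlap, as the problem statement allows.
    """
    clusters = defaultdict(set)
    words = sorted(set(lexicon))
    for i, w in enumerate(words):
        clusters[w].add(w)
        for v in words[i + 1:]:
            # The length difference is a lower bound on the edit distance.
            if abs(len(w) - len(v)) > max_distance:
                continue
            if edit_distance(w, v) <= max_distance:
                clusters[w].add(v)
                clusters[v].add(w)
    return dict(clusters)


def expand_query(query_words, clusters):
    """Replace each query word by its cluster, or by itself if unknown."""
    return [sorted(clusters.get(w, {w})) for w in query_words]


lexicon = ["algorithm", "algoritm", "algorithms", "search"]
clusters = naive_variant_clusters(lexicon)
expanded = expand_query(["algoritm", "search"], clusters)
```

Here the cluster of "algorithm" contains both "algoritm" and "algorithms", while those two words are not in each other's cluster (their distance is 2), which shows why clusters can overlap. Expanding query words by their clusters puts the error tolerance on the side of the documents, as described in the abstract.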
Keywords:
Spelling variants, approximate string matching, error-tolerant search
Personal Comments:
Publisher's version? ATTENTION Copyright: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC’09 March 8-12, 2009, Honolulu, Hawaii, U.S.A.
Copyright 2009 ACM 978-1-60558-166-8/09/03 ...$5.00.
Download
Access Level:
Public

Correlation
MPG Unit:
Max-Planck-Institut für Informatik
MPG Subunit:
Algorithms and Complexity Group
Appearance:
MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Scientific Advisory Board (Fachbeirat), VG Wort



BibTeX Entry:

@INPROCEEDINGS{bastcelikik2009sac,
AUTHOR = {Celikik, Marjan and Bast, Holger},
EDITOR = {Shin, Dongwan},
TITLE = {Fast Error-Tolerant Search on Very Large Texts},
BOOKTITLE = {The 24th Annual ACM Symposium on Applied Computing},
PUBLISHER = {ACM},
YEAR = {2009},
PAGES = {1724--1731},
ADDRESS = {Honolulu, Hawaii, USA},
MONTH = {March},
ISBN = {978-1-60558-166-8},
DOI = {10.1145/1529282.1529669},
}


Entry last modified by Anja Becker, 03/04/2010
Edit History

Editor(s): [Library]
Created: 02/03/2009 03:57:27 PM
Revision 3: Anja Becker, 04.03.2010 11:11:04
Revision 2: Anja Becker, 01.03.2010 14:24:50
Revision 1: Marjan Celikik, 03/27/2009 02:50:45 PM
Revision 0: Marjan Celikik, 02/03/2009 03:57:27 PM


spelling-variants.pdf