Proceedings Article, Paper
@InProceedings
Contribution to proceedings, workshop






Author, Editor

Author(s):

Celikik, Marjan
Bast, Holger




Editor(s):

Shin, Dongwan


Not MPII Editor(s):

Shin, Dongwan

BibTeX cite key*:

bastcelikik2009sac

Title, Booktitle

Title*:

Fast Error-Tolerant Search on Very Large Texts


spelling-variants.pdf (304.6 KB)

Booktitle*:

The 24th Annual ACM Symposium on Applied Computing

Event, URLs

URL of the conference:

http://www.acm.org/conferences/sac/sac2009/

URL for downloading the paper:

http://doi.acm.org/10.1145/1529282.1529669

Event Address*:

Honolulu, Hawaii, USA

Language:

English

Event Date*
(no longer used):


Organization:


Event Start Date:

8 March 2009

Event End Date:

12 March 2009

Publisher

Name*:

ACM

URL:

http://www.acm.org/

Address*:

New York, NY

Type:


Vol, No, Year, pp.

Series:


Volume:


Number:


Month:

March

Pages:

1724-1731

Year*:

2009

VG Wort Pages:


ISBN/ISSN:

978-1-60558-166-8

Sequence Number:


DOI:

10.1145/1529282.1529669



Note, Abstract, ©


(LaTeX) Abstract:

We consider the following spelling variants clustering problem: Given a list
of distinct words, called lexicon, compute (possibly overlapping) clusters of
words which are spelling variants of each other. This problem naturally arises
in the context of error-tolerant full-text search of the following kind: For
a given query, return not only documents matching the query words exactly but
also those matching their spelling variants. This is the inverse of the
well-known "Did you mean: ... ?" web search engine feature, where the error
tolerance is on the side of the query, and not on the side of the documents.

We combine various ideas from the large body of literature on approximate
string searching and spelling correction techniques into a new algorithm for the
spelling variants clustering problem that is both accurate and very efficient
in time and space. Our largest lexicon, containing roughly 10 million words,
can be processed in about 16 minutes on a standard PC using 10 MB of
additional space. This beats the previously best scheme by a factor of two in
running time and by a factor of more than ten in space usage. We have
integrated our algorithms into the CompleteSearch engine in a way that
achieves error-tolerant search without a significant blowup in either index size
or query processing time.
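The abstract defines the spelling variants clustering problem and the error-tolerant search it enables. The following is a minimal Python sketch of that problem statement only. It assumes Levenshtein distance with a hypothetical threshold `max_dist` as the notion of "spelling variant", and it is a naive all-pairs baseline, not the paper's algorithm, which is designed to avoid this quadratic cost on lexicons of about 10 million words. All function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def spelling_variant_clusters(lexicon: list[str], max_dist: int = 1) -> dict[str, set[str]]:
    """Naive baseline (assumption): the cluster of w is every lexicon word within
    max_dist of w. Clusters may overlap, as the problem statement allows."""
    clusters = {w: {w} for w in lexicon}
    for i, u in enumerate(lexicon):
        for v in lexicon[i + 1:]:
            if edit_distance(u, v) <= max_dist:
                clusters[u].add(v)
                clusters[v].add(u)
    return clusters


def error_tolerant_search(query_word: str,
                          inverted_index: dict[str, set[int]],
                          clusters: dict[str, set[str]]) -> set[int]:
    """Return ids of documents containing query_word or any of its spelling
    variants, i.e. error tolerance on the document side."""
    variants = clusters.get(query_word, {query_word})
    hits: set[int] = set()
    for w in variants:
        hits |= inverted_index.get(w, set())
    return hits


if __name__ == "__main__":
    docs = {1: "the algorithm is fast", 2: "the algoritm is slow", 3: "a fast search"}
    inverted_index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            inverted_index[word].add(doc_id)
    clusters = spelling_variant_clusters(sorted(inverted_index), max_dist=1)
    print(sorted(error_tolerant_search("algorithm", inverted_index, clusters)))  # prints [1, 2]
```

The query "algorithm" also returns document 2, which contains only the misspelling "algoritm". Clustering is done once, offline, over the lexicon, so query time only pays for a lookup of the precomputed variants.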

Keywords:

Spelling variants, approximate string matching, error-tolerant search


Personal Comments:

Publisher's version? CAUTION, copyright: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC’09 March 8-12, 2009, Honolulu, Hawaii, U.S.A.
Copyright 2009 ACM 978-1-60558-166-8/09/03 ...$5.00.

Download
Access Level:

Public

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Algorithms and Complexity Group

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Scientific Advisory Board (Fachbeirat), VG Wort



BibTeX Entry:

@INPROCEEDINGS{bastcelikik2009sac,
AUTHOR = {Celikik, Marjan and Bast, Holger},
EDITOR = {Shin, Dongwan},
TITLE = {Fast Error-Tolerant Search on Very Large Texts},
BOOKTITLE = {The 24th Annual ACM Symposium on Applied Computing},
PUBLISHER = {ACM},
YEAR = {2009},
PAGES = {1724--1731},
ADDRESS = {Honolulu, Hawaii, USA},
MONTH = {March},
ISBN = {978-1-60558-166-8},
DOI = {10.1145/1529282.1529669},
}


Entry last modified by Anja Becker, 03/04/2010
Edit History:

Editor(s): [Library]
Created: 02/03/2009 03:57:27 PM

Revision  Editor(s)       Edit Date
3.        Anja Becker     04.03.2010 11:11:04
2.        Anja Becker     01.03.2010 14:24:50
1.        Marjan Celikik  03/27/2009 02:50:45 PM
0.        Marjan Celikik  02/03/2009 03:57:27 PM
Attachment:
spelling-variants.pdf