Campus Event Calendar: Marjan Celikik (06/18/2008 in E1 4/022)

Event Entry

What and Who

Fast Error-Tolerant Search on Very Large Texts

Marjan Celikik

Max-Planck-Institut für Informatik - D1

AG1 Mittagsseminar (own work)

AG 1

AG Audience

English

Date, Time and Location

Wednesday, 18 June 2008

14:00

30 Minutes

E1 4

022

Saarbrücken

Abstract

We consider the following spelling variants clustering problem:

Given a list of distinct words, called the lexicon, compute
(possibly overlapping) clusters of words which are spelling
variants of each other. This problem naturally arises in
the context of error-tolerant full-text search of the following
kind: for a given query, return not only documents matching
the query words exactly but also those matching their
spelling variants. This is the inverse of the well-known "Did
you mean: ... ?" feature of web search engines, where the error
tolerance is on the side of the query, and not on the side of
the documents.
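
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It assumes that "spelling variant" means "within a fixed Levenshtein (edit) distance", which the abstract does not specify. The function names and the word-to-document-IDs inverted index are hypothetical. Because it compares every pair of words, it is quadratic in the lexicon size. It only illustrates what is computed, not how the talk computes it efficiently.

```python
from typing import Dict, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def spelling_variant_clusters(lexicon: List[str], max_dist: int = 1) -> Dict[str, List[str]]:
    """Map each word to all lexicon words within max_dist edits (including itself).

    Clusters may overlap. Brute force: O(n^2) distance computations.
    """
    return {w: [v for v in lexicon if edit_distance(w, v) <= max_dist] for w in lexicon}


def error_tolerant_lookup(word: str, clusters: Dict[str, List[str]],
                          index: Dict[str, Set[int]]) -> Set[int]:
    """Return IDs of documents containing `word` or any of its spelling variants.

    `index` is a hypothetical inverted index: word -> set of document IDs.
    """
    docs: Set[int] = set()
    for variant in clusters.get(word, [word]):
        docs |= index.get(variant, set())
    return docs


lexicon = ["algorithm", "algoritm", "algorithms", "logarithm"]
clusters = spelling_variant_clusters(lexicon, max_dist=1)
index = {"algorithm": {1}, "algoritm": {2}, "algorithms": {4}, "logarithm": {3}}
print(clusters["algorithm"])  # ['algorithm', 'algoritm', 'algorithms']
print(sorted(error_tolerant_lookup("algorithm", clusters, index)))  # [1, 2, 4]
```

"algorithm" appears in the clusters of both "algoritm" and "algorithms", while those two are not in each other's clusters (distance 2). This is the overlap the problem statement allows. The lookup places error tolerance on the document side: the query word is expanded to variants that actually occur in the collection.
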
We combine various ideas from the large body of literature
on approximate string searching and spelling correction
problems into a new algorithm for the spelling variants
clustering problem that is both accurate and very efficient
in time and space. Our largest lexicon, containing roughly
25 million words, can be processed in about 18 minutes on
a standard PC using 10 MB of additional space. This beats
the previous best scheme by a factor of two in running time
and by a factor of more than ten in space usage. We have
integrated our algorithms into the CompleteSearch engine in
a way that achieves error-tolerant search without significant
blowup in either index size or query processing time.
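
The abstract does not say which ideas are combined. The following Python sketch illustrates one classic technique from the approximate-string-searching literature, q-gram filtering. This is an assumption for illustration, not the algorithm presented in the talk. An inverted index from q-grams to words proposes candidates. A length filter and the q-gram count filter then discard pairs that cannot be within `max_dist` edits, and only the survivors are verified with an exact edit-distance computation. The count filter rests on the fact that one edit destroys at most q of a string's q-grams. The function names and the defaults `q = 2`, `max_dist = 1` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def qgrams(word: str, q: int) -> Counter:
    """Multiset of the len(word) - q + 1 overlapping q-grams (no padding)."""
    return Counter(word[i:i + q] for i in range(len(word) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (two-row dynamic program), used for verification."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def clusters_with_qgram_filter(lexicon: List[str], max_dist: int = 1,
                               q: int = 2) -> Dict[str, List[str]]:
    """Same clusters as brute force, but most pairs are pruned before verification."""
    grams = [qgrams(w, q) for w in lexicon]
    index = defaultdict(set)  # q-gram -> IDs of words containing it
    for wid, g in enumerate(grams):
        for gram in g:
            index[gram].add(wid)

    clusters: Dict[str, List[str]] = {}
    for wid, w in enumerate(lexicon):
        if len(w) - q + 1 - max_dist * q >= 1:
            # Every true variant shares at least one q-gram with w,
            # so the inverted index yields a complete candidate set.
            candidates = set().union(*(index[g] for g in grams[wid]))
        else:
            # w is too short for the filter to be safe: scan everything.
            candidates = set(range(len(lexicon)))
        members = []
        for cid in sorted(candidates):
            v = lexicon[cid]
            if abs(len(v) - len(w)) > max_dist:
                continue  # length filter
            # Count filter: within max_dist edits, the strings share at least
            # max(|w|, |v|) - q + 1 - max_dist * q q-grams (as multisets).
            need = max(len(v), len(w)) - q + 1 - max_dist * q
            if sum((grams[wid] & grams[cid]).values()) < need:
                continue
            if edit_distance(w, v) <= max_dist:  # exact verification
                members.append(v)
        clusters[w] = members
    return clusters


lexicon = ["algorithm", "algoritm", "algorithms", "logarithm"]
print(clusters_with_qgram_filter(lexicon)["algorithm"])
# ['algorithm', 'algoritm', 'algorithms']
```

Both filters only discard pairs that provably exceed `max_dist`, so the output matches the brute-force clusters. In this example, "logarithm" shares just 4 bigrams with "algorithm" where 6 are required, so its edit distance is never computed. The running-time and space figures quoted above refer to the authors' system, not to this sketch.
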

Contact

Marjan Celikik

Tags, Category, Keywords and additional notes

Keywords, Tags:

Spelling Variants, Approximate String Matching, Error-Tolerant Text Search

Marjan Celikik, 06/16/2008 19:17
Marjan Celikik, 06/09/2008 10:41 -- Created document.
