Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Estimating the Selectivity of Approximate String Queries
Speaker:Arturas Mazeika
coming from:Uni Bozen
Speakers Bio:
Event Type:Talk
Visibility:D5, RG2
Level:AG Audience
Date, Time and Location
Date:Thursday, 29 November 2007
Duration:60 Minutes
Building:E1 4
Approximate queries on string data are important, due to the prevalence
of such data in databases and various conventions and errors in string
data. We present the VSol estimator, a novel technique for estimating
the selectivity of approximate string queries. The VSol estimator is
based on inverse strings and makes the performance of the selectivity
estimator independent of the number of strings.  To get inverse strings,
we decompose all database strings into overlapping substrings of length
q (q-grams) and then associate each q-gram with its inverse string:
the IDs of all strings that contain the q-gram.  We use signatures to
compress inverse strings, and clustering to group similar signatures.
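
The decomposition step described above can be illustrated with a short
Python sketch. It is not the VSol implementation: the function names
qgrams and inverse_strings are hypothetical, strings are not padded at
their ends (the paper's q-gram definition may differ), and the signature
and clustering steps are left out.

```python
from collections import defaultdict


def qgrams(s, q=3):
    """Return the overlapping substrings of length q of s (no padding)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def inverse_strings(strings, q=3):
    """Map each q-gram to the sorted IDs of the strings that contain it.

    A string's ID is assumed to be its position in the input list.
    """
    index = defaultdict(set)
    for string_id, s in enumerate(strings):
        for gram in qgrams(s, q):
            index[gram].add(string_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


database = ["string", "strong", "spring"]
inv = inverse_strings(database, q=2)
print(inv["ri"])  # IDs of the strings that contain the 2-gram "ri"
print(inv["ng"])  # IDs of the strings that contain the 2-gram "ng"
```

In this toy database the 2-gram "ri" occurs in "string" and "spring",
so its inverse string is the ID list [0, 2], while "ng" occurs in all
three strings.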

We study our technique analytically and experimentally.  The space
complexity of our estimator only depends on the number of
in the database and the desired estimation error.  The time to estimate
the selectivity is independent of the number of database strings
and linear with respect to the length of the query string.  We give a
detailed empirical performance evaluation of our solution for synthetic
and real-world datasets. We show that VSol is effective for large skewed
databases of short strings.

The talk is based on the paper that Divesh Srivastava, Nick Koudas,
Mike Boehlen and I published this year in TODS.
Name(s):Gerhard Weikum
Video Broadcast
Video Broadcast:No
To Location:
Tags, Category, Keywords and additional notes
Attachments, File(s):
  • Petra Schaaf, 11/23/2007 01:50 PM
  • Petra Schaaf, 11/23/2007 01:46 PM -- Created document.