Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Estimating the Selectivity of Approximate String Queries
Speaker:Arturas Mazeika
coming from:Uni Bozen
Speakers Bio:
Event Type:Talk
Visibility:D5, RG2
Level:AG Audience
Language:English
Date, Time and Location
Date:Thursday, 29 November 2007
Time:11:30
Duration:60 Minutes
Location:Saarbrücken
Building:E1 4
Room:433
Abstract
Approximate queries on string data are important, due to the prevalence
of such data in databases and various conventions and errors in string
data. We present the VSol estimator, a novel technique for estimating
the selectivity of approximate string queries. The VSol estimator is
based on inverse strings and makes the performance of the selectivity
estimator independent of the number of strings. To get inverse strings
we decompose all database strings into overlapping substrings of length
q (q-grams) and then associate each q-gram with its inverse string:
the IDs of all strings that contain the q-gram. We use signatures to
compress inverse strings, and clustering to group similar signatures.
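
The decomposition described above can be sketched in a few lines of
Python. This is only an illustration of q-grams and inverse strings as
the abstract defines them, not the VSol estimator itself: the function
names, the sample strings, the choice q=3 and the absence of padding
are all assumptions, and the signature and clustering steps are left
out.

```python
from collections import defaultdict


def qgrams(s, q=3):
    """Return the overlapping substrings of length q of s (no padding; an assumption)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_inverse_strings(strings, q=3):
    """Map each q-gram to the sorted IDs of all strings that contain it.

    The ID of a string is simply its position in the input list
    (a hypothetical choice made for this sketch).
    """
    inverse = defaultdict(set)
    for string_id, s in enumerate(strings):
        for gram in qgrams(s, q):
            inverse[gram].add(string_id)
    return {gram: sorted(ids) for gram, ids in inverse.items()}


# Hypothetical three-string database, used only for illustration.
database = ["saarland", "saarbruecken", "brueckner"]
inverse = build_inverse_strings(database, q=3)

print(inverse["saa"])  # IDs of the strings containing "saa": [0, 1]
print(inverse["rue"])  # IDs of the strings containing "rue": [1, 2]
```

The number of inverse strings is bounded by the number of distinct
q-grams, while the length of each one grows with the number of database
strings; this is why the abstract goes on to compress them.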

We study our technique analytically and experimentally. The space
complexity of our estimator depends only on the number of neighborhoods
in the database and the desired estimation error. The time to estimate
the selectivity is independent of the number of database strings
and linear with respect to the length of the query string. We give a
detailed empirical performance evaluation of our solution for synthetic
and real-world datasets. We show that VSol is effective for large skewed
databases of short strings.

The talk is based on the paper that Divesh Srivastava, Nick Koudas,
Mike Boehlen and I published this year in TODS.
Contact
Name(s):Gerhard Weikum
Phone:500
Video Broadcast
Video Broadcast:No
To Location:
Tags, Category, Keywords and additional notes
Note:
Attachments, File(s):
  • Petra Schaaf, 11/23/2007 01:50 PM
  • Petra Schaaf, 11/23/2007 01:46 PM -- Created document.