Campus Event Calendar: Arturas Mazeika (11/29/2007 in E1 4/433)

Event Entry

What and Who

Estimating the Selectivity of Approximate String Queries

Arturas Mazeika

Uni Bozen

Talk

AG 5, RG2

AG Audience

English

Date, Time and Location

Thursday, 29 November 2007

11:30

60 Minutes

E1 4

433

Saarbrücken

Abstract

Approximate queries on string data are important because such data is prevalent in databases and is subject to varying conventions and errors. We present the VSol estimator, a novel technique for estimating the selectivity of approximate string queries. The VSol estimator is based on inverse strings and makes the performance of the selectivity estimator independent of the number of strings. To obtain inverse strings, we decompose all database strings into overlapping substrings of length q (q-grams) and then associate each q-gram with its inverse string: the IDs of all strings that contain the q-gram. We use signatures to compress inverse strings and clustering to group similar signatures. We study our technique analytically and experimentally. The space complexity of our estimator depends only on the number of neighborhoods in the database and the desired estimation error. The time to estimate the selectivity is independent of the number of database strings and linear in the length of the query string. We give a detailed empirical performance evaluation of our solution on synthetic and real-world datasets. We show that VSol is effective for large, skewed databases of short strings. The talk is based on the paper that Divesh Srivastava, Nick Koudas, Mike Boehlen, and I published this year in TODS.
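
The q-gram decomposition and the construction of inverse strings described in the abstract can be illustrated with a minimal Python sketch. This is not the VSol implementation: the function names `qgrams` and `build_inverse_strings`, the sample strings, and the choice q = 2 are assumptions made for illustration. The sketch uses list positions as string IDs, does not pad the strings, and omits the signature compression and clustering steps.

```python
from collections import defaultdict


def qgrams(s, q):
    """Return all overlapping substrings of length q of s (hypothetical helper)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_inverse_strings(strings, q):
    """Map each q-gram to the set of IDs of the strings that contain it.

    The ID of a string is assumed to be its position in the input list.
    """
    inverse = defaultdict(set)
    for string_id, s in enumerate(strings):
        for gram in qgrams(s, q):
            inverse[gram].add(string_id)
    return dict(inverse)


# Illustrative data only, not taken from the talk or the paper.
strings = ["smith", "smyth", "smithe"]
inverse = build_inverse_strings(strings, q=2)

print(qgrams("smith", 2))     # ['sm', 'mi', 'it', 'th']
print(sorted(inverse["sm"]))  # [0, 1, 2]
print(sorted(inverse["mi"]))  # [0, 2]
```

In this sketch the q-gram "sm" occurs in all three strings, while "mi" occurs only in "smith" and "smithe", so the inverse string of a q-gram directly records which database strings share it with a query.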

Contact

Gerhard Weikum

500

Petra Schaaf, 11/23/2007 13:50
Petra Schaaf, 11/23/2007 13:46 -- Created document.
