Campus Event Calendar

Event Entry

What and Who

Models and indices for integrating unstructured data with a relational database

Sunita Sarawagi
IIT Bombay
Talk
AG 1, AG 2, AG 3, AG 4, AG 5  
MPI Audience

Date, Time and Location

Monday, 13 September 2004
15:00
46.1 - MPII
Rotunde 4th floor
Saarbrücken

Abstract

Database systems are islands of structure in a sea of unstructured
data sources.  Several real-world applications now need to create
bridges for smooth integration of semi-structured sources with
existing structured databases for seamless querying and mining.  This
integration requires extracting structured column values from the
unstructured source and mapping them to known database entities.
Existing methods of data integration do not effectively exploit the
wealth of information available in multi-relational entities.

We present statistical models for co-reference resolution and
information extraction in a database setting.  We then go over the
performance challenges of training and applying these models
efficiently over very large databases.  This requires us to break open
a black-box statistical model and extract predicates over indexable
attributes of the database.  We show how to extract such predicates for
several classification models, including naive Bayes classifiers and
support vector machines.  We extend these indexing methods to support
the similarity predicates needed during data integration.

Contact

Gerhard Weikum
0681/9325-500

Tags, Category, Keywords and additional notes

Homepage:
http://www.it.iitb.ac.in/~sunita/

Biography:

Sunita Sarawagi works in the fields of databases, data mining, and
machine learning.  She is an associate professor at IIT Bombay.  Prior
to that she was a research staff member at IBM Almaden Research Center.
She received her PhD in databases from the University of California at
Berkeley and a bachelor's degree from IIT Kharagpur.  She was a visiting
associate professor at CMU from January to May 2004.  She has several
publications in international conferences on databases and data mining
and several patents.  She has served as a program committee member for
the ACM SIGMOD, VLDB, ACM SIGKDD, IEEE ICDE and ICML conferences and is
editor-in-chief of the ACM SIGKDD newsletter.

Uwe Brahm, 09/12/2004 01:08
Uwe Brahm, 09/12/2004 01:07 -- Created document.