Campus Event Calendar

Event Entry

What and Who

Information Discovery in Large Complex Datasets

Julia Stoyanovich
University of Pennsylvania
SWS Colloquium

Julia Stoyanovich is a Visiting Scholar at the University of Pennsylvania. Julia holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst. After receiving her B.S., Julia went on to work for two start-ups and one real company in New York City, where she interacted with, and was puzzled by, a variety of massive datasets. Julia's research focuses on modeling and exploring large datasets in the presence of rich semantic and statistical structure. She has recently worked on personalized search and ranking in social content sites; rank-aware clustering in large structured datasets, such as online dating profiles and restaurant reviews; data exploration in repositories of biological objects as diverse as scientific publications, functional genomics experiments, and scientific workflows; and representation and inference in large datasets with missing values.
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Expert Audience
English

Date, Time and Location

Thursday, 15 March 2012
10:30
60 Minutes
G26
206
Kaiserslautern

Abstract

The focus of my research is on enabling novel kinds of interaction between the user and the information in a variety of digital environments, ranging from social content sites, to digital libraries, to the Web. In this talk, I will give an overview of my research, and will then present two recent lines of work that focus on information discovery in two important application domains.

In the first part of this talk, I will present an approach for tracking and querying fine-grained provenance in data-intensive workflows. A workflow is an encoding of a sequence of steps that progressively transform data products. Workflows help make experiments reproducible, and may be used to answer questions about data provenance: the dependencies between input, intermediate, and output data. I will describe a declarative framework that captures fine-grained dependencies, enabling novel kinds of analytic queries, and will demonstrate that careful design and the use of distributed processing make tracking and querying fine-grained provenance feasible.
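To make "fine-grained provenance" concrete, here is a minimal sketch, assuming a toy in-memory setting: each workflow step records, for every output item, exactly which input items it was derived from, and a lineage query walks those dependencies transitively. The class `ProvenanceGraph`, the function `run_workflow`, and the two-step normalize/sum workflow are hypothetical illustrations, not the declarative framework described in the talk.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Fine-grained provenance: maps each data item to the items it was derived from."""

    def __init__(self):
        self.parents = defaultdict(set)

    def record(self, output_id, input_ids):
        self.parents[output_id].update(input_ids)

    def lineage(self, item_id):
        """Return all upstream items that item_id transitively depends on."""
        seen, stack = set(), [item_id]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def run_workflow(raw, prov):
    """Hypothetical two-step workflow: normalize each reading, then sum per group."""
    # Step 1: one-to-one transformation; each output depends on exactly one input.
    normalized = {}
    for rid, value in raw.items():
        nid = f"norm:{rid}"
        normalized[nid] = value / 100.0
        prov.record(nid, [rid])
    # Step 2: many-to-one aggregation, grouped by the first character of the raw id.
    sums = defaultdict(float)
    members = defaultdict(list)
    for nid, value in normalized.items():
        key = f"sum:{nid[len('norm:')]}"
        sums[key] += value
        members[key].append(nid)
    for key, inputs in members.items():
        prov.record(key, inputs)
    return dict(sums)


prov = ProvenanceGraph()
out = run_workflow({"a1": 50, "a2": 150, "b1": 300}, prov)
print(out)                            # → {'sum:a': 2.0, 'sum:b': 3.0}
print(sorted(prov.lineage("sum:a")))  # → ['a1', 'a2', 'norm:a1', 'norm:a2']
```

Because dependencies are kept per item rather than per step, the lineage of `sum:a` excludes `b1`, which a coarse, step-level record could not distinguish.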

In the second part of this talk, I will discuss personalized search and ranking on the Social Web. Social Web users provide information about themselves in stored profiles, register their relationships with other users, and express their preferences with respect to information and products. I will argue that information discovery should account for a user's social context, and will present network-aware search, a novel search paradigm in which result relevance is computed with respect to a user's social network. I will describe efficient algorithms appropriate for this setting, and will show how social similarities between users may be leveraged to make processing more efficient.
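As an illustration of the network-aware idea, the following is a minimal sketch, assuming one simple relevance function chosen for clarity: an item's score for a given user and tag is the number of that user's network neighbors who applied the tag to the item. The function name, the input formats, and the scoring rule are assumptions for illustration, not the algorithms presented in the talk.

```python
import heapq
from collections import Counter


def network_aware_search(user, tag, network, taggings, k=2):
    """Rank items for `user` by how many of the user's neighbors tagged them with `tag`.

    network:  dict mapping user -> set of neighbor users (assumed format)
    taggings: list of (tagger, item, tag) triples (assumed format)
    """
    neighbors = network.get(user, set())
    # Relevance is computed only from taggings made within the user's network.
    scores = Counter(
        item for tagger, item, t in taggings if t == tag and tagger in neighbors
    )
    # Top-k by descending score; ties broken by item name for deterministic output.
    return heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))


network = {"ann": {"bob", "cat"}, "dan": {"eve"}}
taggings = [
    ("bob", "cafe_x", "coffee"),
    ("cat", "cafe_x", "coffee"),
    ("cat", "cafe_y", "coffee"),
    ("eve", "cafe_z", "coffee"),
    ("eve", "cafe_y", "coffee"),
    ("bob", "cafe_z", "tea"),
]
print(network_aware_search("ann", "coffee", network, taggings))  # → [('cafe_x', 2), ('cafe_y', 1)]
print(network_aware_search("dan", "coffee", network, taggings))  # → [('cafe_y', 1), ('cafe_z', 1)]
```

The same query returns different rankings for `ann` and `dan` because each user's results are scored against their own network. Naively, this requires a separate score computation per user, which is why efficient processing matters in this setting.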

Contact

Brigitta Hansen
0681 93039102

Video Broadcast

Yes
Saarbrücken
E1 5
5th floor

Uwe Brahm, 03/01/2012 11:43
Brigitta Hansen, 02/29/2012 13:45 -- Created document.