Mirek Riedewald received his PhD from the University of California at Santa Barbara, USA. He is currently an Associate Professor in the College of Computer and Information Science at Northeastern University in Boston, USA. Before joining Northeastern University, he was a Research Associate at Cornell University. He has also held visiting research positions at Microsoft Research in Redmond and at the Max Planck Institute for Informatics in Germany. Prof. Riedewald's research interests are in databases and data mining, with an emphasis on designing scalable analysis techniques for data-driven science. He has collaborated successfully with scientists from a range of domains, including ornithology, physics, mechanical and aerospace engineering, and astronomy. This work has resulted in novel approaches for data warehousing, data stream processing, prediction, and parallel data processing on computer clusters. He is now focusing on exploratory analysis of massive observational data and on techniques for automatically reconstructing the structure and dynamics of neural circuits, a crucial step toward understanding how the brain functions. Prof. Riedewald's work has been published in premier peer-reviewed data management venues such as ACM SIGMOD, VLDB, IEEE ICDE, and IEEE TKDE, as well as in domain science journals.
It all started with a seemingly simple request for help from the Cornell Lab of Ornithology, one of the world's leading institutions for research on birds and the environment. To reach beyond the thousands of regular contributors to its citizen-science programs, the Lab wanted to leverage its vast collections of bird observation data to help less experienced users identify the species of a bird they had observed. This quickly turned into a challenging problem at the intersection of big-data management and machine learning.
The result is Merlin, a system for exploratory search in large databases. The user interacts with it by specifying probability distributions over attributes, which express imprecise conditions about the entities of interest. Merlin helps the user home in on the right query conditions by addressing three key challenges: (1) efficiently computing results for an imprecise query, (2) providing feedback about the sensitivity of the result to changes in individual conditions, and (3) suggesting new conditions. We provide an overview of Merlin, formally introduce the notion of sensitivity, and present novel algorithms for quantifying the effect of uncertainty in user-specified conditions. To support interactive responses, we also develop techniques that can deliver probability estimates within a given real-time limit. Finally, we discuss the challenges in accurately estimating probabilities, e.g., the value of P in "The bird you are looking for is species S with probability P," and how Merlin addresses them in an interactive environment with hard real-time constraints.
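The abstract does not spell out how Merlin evaluates an imprecise query or defines sensitivity, so the following is only a minimal sketch under loudly stated assumptions, not Merlin's algorithms. It assumes that each stored observation is weighted by the product of the probabilities the user assigns to its attribute values (attributes treated as independent), that P for species S is the normalized total weight of S's observations, and that the sensitivity to one condition is the total variation distance between the results with and without that condition. The toy data and the names `species_distribution` and `sensitivity` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each observation records attribute values and a species.
OBSERVATIONS = [
    {"color": "red", "size": "small", "species": "Cardinal"},
    {"color": "red", "size": "small", "species": "Cardinal"},
    {"color": "red", "size": "medium", "species": "Robin"},
    {"color": "blue", "size": "small", "species": "Blue Jay"},
    {"color": "blue", "size": "medium", "species": "Blue Jay"},
]


def species_distribution(observations, conditions):
    """Estimate P(species | imprecise conditions) under the assumed weighting.

    conditions maps attribute -> {value: probability}. Each observation is
    weighted by the product of the probabilities the user assigns to its
    attribute values; values the user did not mention get probability 0.
    """
    weights = defaultdict(float)
    for obs in observations:
        w = 1.0
        for attr, dist in conditions.items():
            w *= dist.get(obs[attr], 0.0)
        weights[obs["species"]] += w
    total = sum(weights.values())
    if total == 0:
        return {}  # no observation is compatible with the conditions
    return {s: w / total for s, w in weights.items() if w > 0}


def sensitivity(observations, conditions, attr):
    """Assumed sensitivity measure: total variation distance between the
    species distributions computed with and without the condition on attr."""
    full = species_distribution(observations, conditions)
    reduced = {a: d for a, d in conditions.items() if a != attr}
    partial = species_distribution(observations, reduced)
    species = set(full) | set(partial)
    return 0.5 * sum(abs(full.get(s, 0.0) - partial.get(s, 0.0)) for s in species)


if __name__ == "__main__":
    conds = {"color": {"red": 0.8, "blue": 0.2}, "size": {"small": 0.5, "medium": 0.5}}
    print(species_distribution(OBSERVATIONS, conds))
    print("color:", sensitivity(OBSERVATIONS, conds, "color"))
    print("size:", sensitivity(OBSERVATIONS, conds, "size"))
```

In this toy example the size condition is uniform over the recorded sizes, so dropping it leaves the result unchanged (sensitivity 0), while dropping the color condition shifts the species distribution noticeably. Under this assumed measure, that is the kind of signal that would tell a user which conditions deserve more care. A naive scan like this one touches every observation, which is why real-time limits on large databases call for more specialized techniques.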