Campus Event Calendar

Event Entry

What and Who

UDAO: A Next-Generation Cloud Data Analytics Optimizer via Large-Scale Machine Learning

Yanlei Diao
Ecole Polytechnique Paris and UMass Amherst
INF Distinguished Lecture Series

Yanlei Diao is Professor of Computer Science at the University of Massachusetts Amherst, USA and Ecole Polytechnique, France. Her research interests lie in big data analytics and scalable intelligent information systems, with a focus on optimization in cloud analytics, data stream analytics, explanation discovery, interactive data exploration, and uncertain data management. She received her PhD in Computer Science from the University of California, Berkeley in 2005.

Prof. Diao is a recipient of the 2016 ERC Consolidator Award, 2013 CRA-W Borg Early Career Award (one female computer scientist selected each year for outstanding contributions), IBM Scalable Innovation Faculty Award, and NSF Career Award. She has given keynote speeches at the ACM DEBS Conference, the ExploreDB workshop, and the Distinguished Lecture Series at the IBM Almaden Research Center, the University of Texas at Austin and Technische Universitaet Darmstadt. She has served as Editor-in-Chief of the ACM SIGMOD Record, Associate Editor of ACM TODS, Chair of the ACM SIGMOD Research Highlight Award Committee, and member of the SIGMOD and PVLDB Executive Committees. She was PC Co-Chair of IEEE ICDE 2017 and ACM SoCC 2016, and served on the organizing committees of SIGMOD, PVLDB, and CIDR, as well as on the program committees of many international conferences and workshops.


http://www.lix.polytechnique.fr/~yanlei.diao/
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
MPI Audience
English

Date, Time and Location

Thursday, 16 December 2021
12:00
60 Minutes
Virtual talk
Saarbrücken

Abstract

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take task objectives such as user performance goals and budgetary constraints and automatically configure an analytical job to achieve these objectives. This talk presents UDAO, a Unified Data Analytics Optimizer, that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meets the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of configurations to reveal tradeoffs between different objectives, recommends a new cluster configuration that best explores such tradeoffs, and employs novel optimizations to produce such recommendations within a few seconds. This optimization is further enabled by a deep learning-based modeling approach that can learn a model for each user objective that is as complex as the underlying computing environment requires. Detailed experiments using a Spark-based prototype and benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods while offering good coverage of the Pareto frontier. Compared to OtterTune, a state-of-the-art performance tuning system, UDAO recommends Spark configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different user preferences on multiple objectives. The talk ends by outlining remaining research challenges in automated resource management and performance optimization for cloud data analytics.
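To make the idea of a Pareto optimal set of configurations concrete, here is a minimal sketch with two minimized objectives, latency and cost. It is not UDAO's algorithm. The candidate configurations and their latency/cost numbers are hypothetical, the Pareto filter is a brute-force O(n^2) dominance scan, and the recommendation rule (smallest weighted squared distance to the utopia point after min-max normalization) is just one simple, assumed way to pick a point on the frontier according to user preferences.

```python
from typing import List, Tuple

# Hypothetical candidate configurations: (name, latency_seconds, cost_dollars).
# Both objectives are to be minimized.
Config = Tuple[str, float, float]

candidates: List[Config] = [
    ("4 cores", 120.0, 0.8),
    ("8 cores", 70.0, 1.1),
    ("16 cores", 45.0, 1.9),
    ("32 cores", 40.0, 3.6),
    ("8 cores, poor params", 95.0, 1.3),  # dominated by "8 cores"
]


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_set(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other candidate dominates (O(n^2) scan)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def recommend(front: List[Config], weights: Tuple[float, float] = (0.5, 0.5)) -> Config:
    """Pick the frontier point closest to the utopia point (best value of each
    objective), using min-max normalized objectives and user-given weights."""
    lats = [c[1] for c in front]
    costs = [c[2] for c in front]

    def norm(v: float, lo: float, hi: float) -> float:
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    def score(c: Config) -> float:
        d_lat = norm(c[1], min(lats), max(lats))
        d_cost = norm(c[2], min(costs), max(costs))
        return weights[0] * d_lat ** 2 + weights[1] * d_cost ** 2

    return min(front, key=score)


if __name__ == "__main__":
    front = pareto_set(candidates)
    print([c[0] for c in front])
    print(recommend(front)[0])              # balanced preferences
    print(recommend(front, (0.9, 0.1))[0])  # latency matters most
```

Changing the weights moves the recommendation along the same frontier without recomputing it: with these made-up numbers, balanced weights favor "8 cores", while a latency-heavy preference favors "16 cores". This mirrors, in a very simplified form, the abstract's point that a Pareto set reveals tradeoffs from which a configuration can be chosen to match user preferences.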

Contact

Gerhard Weikum
+49 681 9325 5000

Virtual Meeting Details

Zoom
927 4617 9996
Passcode visible to logged-in users only

Petra Schaaf, 12/09/2021 13:53 -- Created document.