Max-Planck-Institut für Informatik
max planck institut
informatik
mpii logo Minerva of the Max Planck Society
 

MPI-INF or MPI-SWS or Local Campus Event Calendar

<< Previous Entry Next Entry >> New Event Entry Edit this Entry Login to DB (to update, delete)
What and Who
Title:New algorithms for map-reduce as well as SQL join and group-by
Speaker:Goetz Graefe
coming from:Google Madison
Speakers Bio:Goetz Graefe is a world-leading database researcher best known for his pioneering contributions on query processing and optimization. The Volcano 

system that he developed in the early 1990s has been a blueprint for the query execution engines of widely deployed database systems, 
both academic and commercial ones. 

Goetz recently joined Google as a senior researcher. 
Previously, he was with Hewlett-Packard Laboratories since 2007, where he led a research group focused on transactional indexing and file storage. 
Prior to his time at HP Labs, he spent 12 years as a software architect 
in product development at Microsoft. Both query optimization and query execution of Microsoft's SQL Server are based on his designs. 
Even earlier, he was a professor at Oregon State University and Portland State University. 
Goetz obtained his doctoral degree from the University of Wisconsin and his diploma degree from the University of Braunschweig. 

Goetz is the author of numerous influential papers and surveys as well as patents. He received the ACM SIGMOD Test-of-Time Award in 2000 for work on parallel  query execution, the IEEE ICDE Influential Paper Award in 2005 for extensible query processing, and the ACM Software Systems Award in 2009 for his contributions to the Gamma database machine project. Goetz has been an  HP Fellow.

Event Type:MPI-Kolloquium
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
We use this to send out email in the morning.
Level:Expert Audience
Language:English
Date, Time and Location
Date:Friday, 12 August 2016
Time:15:00
Duration:60 Minutes
Location:Saarbrücken
Building:E1 4
Room:024
Abstract
The core of map-reduce systems are shuffle operations, sorting, and matching algorithms. Our focus here is on the algorithms that perform local map and reduce operations after shuffling data across the cluster interconnect. These local operations require algorithms quite similar to SQL join and group-by operations.

Database query processing traditionally relies on three types of algorithms for join and grouping operations. For joins, index nested loops join exploits an index on its inner input, merge join exploits sorted inputs, and hash join exploits differences in the sizes of the join inputs. For grouping, an index-based algorithm has been used in the past whereas today sort- and hash-based algorithms prevail. Cost-based query optimization chooses the most appropriate algorithm for each query and for each operation. Unfortunately, mistaken algorithm choices during compile-time query optimization are common yet expensive to investigate and to resolve.

Our goal is to end mistaken choices among join algorithms and among grouping algorithms by replacing the three traditional types of algorithms with a single one. Like merge join, this new join algorithm exploits sorted inputs. Like hash join, it exploits different input sizes for unsorted inputs. In fact, for unsorted inputs, the cost functions for recursive hash join and for hybrid hash join have guided our search for the new join algorithm. In consequence, the new join algorithm can replace both merge join and hash join in a database management system.

Similarly, in map-reduce systems, the new algorithm can replace all existing local algorithms for map and reduce operations.

Results from an implementation of the core algorithm are reported.

Bio: Goetz Graefe recently joined Google as a Principal Researcher. Previously, he had been an HP Fellow in HP Labs, an architect within Microsoft SQL Server, and a professor in Boulder CO and Portland OR. He has worked on academic research, technology transfer, product design, and release management; his areas of interest include database query optimization, dataflow query execution, indexing, transactions, concurrency control, logging, and recovery.
Contact
Name(s):Petra Schaaf
Phone:5000
EMail:--email address not disclosed on the web
Video Broadcast
Video Broadcast:NoTo Location:
Tags, Category, Keywords and additional notes
Note:
Attachments, File(s):

Created by:Petra Schaaf/AG5/MPII/DE, 08/11/2016 11:11 AMLast modified by:Uwe Brahm/MPII/DE, 11/24/2016 04:13 PM
  • Petra Schaaf, 08/11/2016 11:37 AM -- Created document.