Campus Event Calendar

Event Entry

What and Who

New algorithms for map-reduce as well as SQL join and group-by

Goetz Graefe
Google Madison

Goetz Graefe is a world-leading database researcher best known for his pioneering contributions on query processing and optimization. The Volcano 
system that he developed in the early 1990s has been a blueprint for the query execution engines of widely deployed database systems, 
both academic and commercial ones. 

Goetz recently joined Google as a senior researcher. 
Previously, he was with Hewlett-Packard Laboratories since 2007, where he led a research group focused on transactional indexing and file storage. 
Prior to his time at HP Labs, he spent 12 years as a software architect 
in product development at Microsoft. Both query optimization and query execution of Microsoft's SQL Server are based on his designs. 
Even earlier, he was a professor at Oregon State University and Portland State University. 
Goetz obtained his doctoral degree from the University of Wisconsin and his diploma degree from the University of Braunschweig. 

Goetz is the author of numerous influential papers and surveys as well as patents. He received the ACM SIGMOD Test-of-Time Award in 2000 for work on parallel  query execution, the IEEE ICDE Influential Paper Award in 2005 for extensible query processing, and the ACM Software Systems Award in 2009 for his contributions to the Gamma database machine project. Goetz has been an  HP Fellow.
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Expert Audience

Date, Time and Location

Friday, 12 August 2016
60 Minutes
E1 4


The core of map-reduce systems are shuffle operations, sorting, and matching algorithms. Our focus here is on the algorithms that perform local map and reduce operations after shuffling data across the cluster interconnect. These local operations require algorithms quite similar to SQL join and group-by operations.

Database query processing traditionally relies on three types of algorithms for join and grouping operations. For joins, index nested loops join exploits an index on its inner input, merge join exploits sorted inputs, and hash join exploits differences in the sizes of the join inputs. For grouping, an index-based algorithm has been used in the past whereas today sort- and hash-based algorithms prevail. Cost-based query optimization chooses the most appropriate algorithm for each query and for each operation. Unfortunately, mistaken algorithm choices during compile-time query optimization are common yet expensive to investigate and to resolve.

Our goal is to end mistaken choices among join algorithms and among grouping algorithms by replacing the three traditional types of algorithms with a single one. Like merge join, this new join algorithm exploits sorted inputs. Like hash join, it exploits different input sizes for unsorted inputs. In fact, for unsorted inputs, the cost functions for recursive hash join and for hybrid hash join have guided our search for the new join algorithm. In consequence, the new join algorithm can replace both merge join and hash join in a database management system.

Similarly, in map-reduce systems, the new algorithm can replace all existing local algorithms for map and reduce operations.

Results from an implementation of the core algorithm are reported.

Bio: Goetz Graefe recently joined Google as a Principal Researcher. Previously, he had been an HP Fellow in HP Labs, an architect within Microsoft SQL Server, and a professor in Boulder CO and Portland OR. He has worked on academic research, technology transfer, product design, and release management; his areas of interest include database query optimization, dataflow query execution, indexing, transactions, concurrency control, logging, and recovery.


Petra Schaaf
--email hidden
passcode not visible
logged in users only

Petra Schaaf, 08/11/2016 11:37 -- Created document.