MPI-INF
Campus Event Calendar

Event Entry

What and Who

A configurable SQL Query Generator that Learns from Existing Workloads: project review

Oleksandr Marmaliuk
Odesa National Mechnikov University
PhD Application Talk
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Monday, 27 January 2025
13:00
30 Minutes
Virtual talk
Zoom

Abstract

The design of SQL is founded on three-valued logic (3VL) rather than the conventional two-valued Boolean logic (2VL). Alongside true and false, 3VL introduces a third truth value, unknown, to account for nulls. Although 3VL is considered essential to SQL’s expressiveness, it is frequently criticized for producing unintuitive query behavior and for leading to programmer errors.

We present a configurable and extensible random SQL query generator. Its main strengths are its adaptability to various SQL fragments and its ability to simulate existing query workloads. SQL syntax is modeled as a graph, which is then transformed into a Markov chain; generated queries correspond to runs of this chain. A key component of the generator is a module that trains the Markov chain on a given query workload, learning edge probabilities so that the generated queries resemble the original workload. The generator can be used to find errors in SQL implementations or to estimate the proportion of queries in a workload that could cause portability problems across different DBMSs.

Additionally, we report on testing a recent hypothesis that SQL’s three-valued logic for handling nulls can be replaced by standard Boolean logic. This hypothesis holds for the vast majority of TPC benchmark queries, and we now confirm it for a substantial set of generated queries as well.
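The abstract does not describe how the syntax graph is encoded or how the edge probabilities are learned, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a toy, hypothetical syntax graph for a tiny SELECT fragment, assumes workload queries are already available as paths through that graph (parsing real SQL into paths is omitted), and estimates edge probabilities as transition frequencies with add-one (Laplace) smoothing, a choice made here for illustration rather than taken from the talk. The names `SYNTAX_GRAPH`, `train` and `generate` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy syntax graph for a tiny SELECT fragment (an assumption,
# not the generator's real graph). Each node is a grammar element; the list
# holds the elements allowed to follow it. START and END are artificial
# entry and exit nodes.
SYNTAX_GRAPH = {
    "START": ["SELECT"],
    "SELECT": ["*", "col"],
    "*": ["FROM"],
    "col": [",", "FROM"],
    ",": ["col"],
    "FROM": ["tbl"],
    "tbl": ["WHERE", "END"],
    "WHERE": ["col IS NULL", "col = 1"],
    "col IS NULL": ["END"],
    "col = 1": ["END"],
}


def train(graph, workload_paths, smoothing=1.0):
    """Estimate edge probabilities from workload queries given as node paths.

    Counts observed transitions and adds `smoothing` to every edge, so edges
    never seen in the workload keep a small nonzero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for path in workload_paths:
        for src, dst in zip(path, path[1:]):
            if dst not in graph.get(src, []):
                raise ValueError(f"edge {src!r} -> {dst!r} is not in the syntax graph")
            counts[src][dst] += 1
    probs = {}
    for src, succs in graph.items():
        total = sum(counts[src][d] + smoothing for d in succs)
        probs[src] = {d: (counts[src][d] + smoothing) / total for d in succs}
    return probs


def generate(probs, rng, max_steps=50):
    """Run the Markov chain from START until END; return the emitted tokens."""
    node, tokens = "START", []
    for _ in range(max_steps):
        succs = list(probs[node])
        weights = [probs[node][d] for d in succs]
        node = rng.choices(succs, weights=weights)[0]
        if node == "END":
            return tokens
        tokens.append(node)
    raise RuntimeError("run did not reach END within max_steps")


# Usage: train on a two-query toy workload, then sample a few queries.
workload = [
    ["START", "SELECT", "col", "FROM", "tbl", "WHERE", "col IS NULL", "END"],
    ["START", "SELECT", "col", ",", "col", "FROM", "tbl", "END"],
]
probs = train(SYNTAX_GRAPH, workload)
rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate(probs, rng)))
```

In this sketch the trained chain prefers `SELECT col` over `SELECT *` (probability 0.75 versus 0.25) because the toy workload only uses column lists. One reason to smooth rather than use raw frequencies is that every edge of the syntax graph stays reachable, so the generator can still produce constructs that are rare or absent in the workload, which seems useful when the goal is to find implementation errors rather than only to reproduce the workload.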

Contact

Ina Geisler
+49 681 9325 1802

Virtual Meeting Details

Zoom
Passcode visible to logged-in users only

Ina Geisler, 01/27/2025 09:12
Ina Geisler, 01/24/2025 10:46 -- Created document.