Campus Event Calendar

Event Entry

What and Who

High-Throughput and Predictable VM Scheduling for High-Density Workloads

Manohar Vanga
MMCI
SWS Student Defense Talks - Thesis Defense
SWS  
Public Audience
English

Date, Time and Location

Thursday, 26 November 2020
15:30
60 Minutes
G26
111
Kaiserslautern

Abstract

In the increasingly competitive public-cloud marketplace, improving the efficiency of data centers is a major concern. One way to improve efficiency is to consolidate as many VMs onto as few physical cores as possible, provided that performance expectations are not violated. However, as a prerequisite for increased VM densities, the hypervisor’s VM scheduler must allocate processor time efficiently and in a timely fashion. As we show in this thesis, contemporary VM schedulers leave substantial room for improvement in both regards when facing challenging high-VM-density workloads that frequently trigger the VM scheduler.


As root causes, we identify (i) high runtime overheads and (ii) unpredictable scheduling heuristics. To better support high VM densities, we propose Tableau, a VM scheduler that guarantees a minimum processor share and a maximum bound on scheduling delay for every VM in the system. Tableau combines a low-overhead, core-local, table-driven dispatcher with a fast on-demand table-generation procedure (triggered on VM creation/teardown) that employs scheduling techniques typically used in hard real-time systems. In an evaluation of Tableau and three current Xen schedulers on a 16-core Intel Xeon machine, Tableau is shown to improve tail latency (e.g., a 17x reduction in maximum ping latency compared to Credit) and throughput (e.g., 1.6x peak web server throughput compared to RTDS when serving 1 KiB files with a 100 ms SLA). Further, we show that, owing to its focus on efficiency and scalability, Tableau provides comparable or better throughput than existing Xen schedulers in dedicated-core scenarios of the kind commonly employed in public clouds today.
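
To make the idea of a core-local, table-driven dispatcher concrete, the following is a minimal sketch and not Tableau's actual implementation. It assumes a hypothetical per-core table of non-overlapping time slots that repeats every `TABLE_LENGTH_US` microseconds; the names `CORE_TABLE`, `TABLE_LENGTH_US`, and `dispatch`, as well as the slot values, are invented for illustration. The point is only that a scheduling decision reduces to a lookup in a precomputed table.

```python
from bisect import bisect_right

# Hypothetical table for one core (illustrative values, not from the thesis).
# Each entry is (start_us, end_us, vm); slots are sorted and do not overlap.
TABLE_LENGTH_US = 10_000
CORE_TABLE = [
    (0, 3_000, "vm-a"),
    (3_000, 5_000, "vm-b"),
    (5_000, 8_000, "vm-a"),
    # 8_000..10_000 is left unallocated (idle) in this example
]
SLOT_STARTS = [start for start, _, _ in CORE_TABLE]


def dispatch(now_us):
    """Return the VM that owns the current slot on this core, or None if idle."""
    # The table repeats cyclically, so only the offset within it matters.
    offset = now_us % TABLE_LENGTH_US
    # Binary search for the last slot that starts at or before the offset.
    i = bisect_right(SLOT_STARTS, offset) - 1
    if i >= 0:
        start, end, vm = CORE_TABLE[i]
        if start <= offset < end:
            return vm
    return None
```

In this sketch, `dispatch(12_999)` maps to offset 2,999 and returns `"vm-a"`, while `dispatch(8_500)` falls into the unallocated gap and returns `None`. Because each core consults only its own table, no cross-core coordination is needed at dispatch time.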

Another common requirement in public clouds is the ability to use idle cycles in the system to perform low-priority background work, without affecting the performance of primary VMs (which are typically paid for by customers). The primary obstacle to achieving this is the lack of strong performance guarantees for VMs, which Tableau provides. We present the design of a background scheduler that enables a lower-priority class of VMs to use any idle cycles in the system, and present results showing that these background VMs have low impact on the performance of table-driven VMs, making the approach practical for cloud environments.
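
The two-level priority described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the design from the thesis: it assumes that "idle" covers both unallocated table time and time in which the table's VM has nothing to run, and that background VMs are served round-robin. The function `pick_next` and its parameters are hypothetical, and preemption of a running background VM when a primary VM wakes up is not modeled.

```python
from collections import deque


def pick_next(table_vm, runnable, background_queue):
    """Choose the VM to run now.

    table_vm: VM that owns the current table slot, or None if unallocated.
    runnable: set of VMs that currently have work to do.
    background_queue: deque of low-priority VMs (rotated round-robin).
    """
    # Table-driven (primary) VMs always take precedence in their own slot.
    if table_vm is not None and table_vm in runnable:
        return table_vm
    # Otherwise the time is idle: hand it to the next runnable background VM.
    for _ in range(len(background_queue)):
        candidate = background_queue[0]
        background_queue.rotate(-1)  # move the candidate to the back
        if candidate in runnable:
            return candidate
    return None  # nothing to run; the core stays idle
```

For example, with `background_queue = deque(["bg-1", "bg-2"])`, a call with a runnable table VM returns that VM and leaves the queue untouched, whereas a call with `table_vm=None` returns `"bg-1"` and the next such call returns `"bg-2"`.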

Finally, VM churn and workload variations in multi-tenant public clouds change interference patterns at runtime, which leads to performance variation. In particular, variation in last-level cache (LLC) interference has been shown to have a significant impact on virtualized application performance in cloud environments. We present a novel approach for dealing with such dynamically changing interference, which involves periodically regenerating tables that provide the same guarantees on utilization and scheduling latency for all VMs in the system, but have different LLC interference characteristics.

We present two strategies to mitigate LLC interference: a randomized approach, and one that uses performance counters to detect VMs running cache-intensive workloads and selectively mitigate interference. Our results show that randomizing tables works well for mitigating worst-case slowdowns due to cache interference, while the performance-counter-based approach requires a more robust mechanism for detecting interfering VMs in order to match the performance of the randomized approach.
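
As a rough illustration of the randomized strategy, the sketch below reshuffles the order in which VMs are served within one fixed-length frame of a core's table. It is a simplified stand-in and not the table-generation procedure from the thesis; the frame model and the names `randomize_frame` and `share` are assumptions made for this example. Each VM keeps exactly the same amount of time per frame, so its utilization is unchanged, and if every frame has length F and a VM receives budget b once per frame, the gap between two consecutive service intervals stays at most 2(F - b) regardless of the order. What changes is when each VM runs, and therefore presumably which VMs run concurrently on other cores that share the LLC.

```python
import random


def randomize_frame(frame, rng):
    """Return a reshuffled slot layout for one frame.

    frame: list of (vm, length_us) allocations within the frame.
    rng: a random.Random instance (passed in so runs can be reproduced).
    Returns a list of (start_us, end_us, vm) slots in the new order.
    """
    order = list(frame)
    rng.shuffle(order)  # only the order changes, never the lengths
    slots, t = [], 0
    for vm, length_us in order:
        slots.append((t, t + length_us, vm))
        t += length_us
    return slots


def share(slots, vm):
    """Total time (in microseconds) allocated to vm in the given slots."""
    return sum(end - start for start, end, v in slots if v == vm)
```

A table regenerated this way can be checked against the original by comparing `share(...)` for every VM before and after the shuffle; the values must match exactly.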


Maria-Louise Albrecht, 11/23/2020 15:53 -- Created document.