Campus Event Calendar

Event Entry

What and Who

Reducing Data Movement to Accelerate Machine Learning

Saurabh Agarwal
University of Wisconsin-Madison

Saurabh is a fifth-year PhD student at the University of Wisconsin-Madison. He works in the area of building systems for machine learning. His work involves building new systems for emerging machine learning workloads to make training and inference faster, more scalable, and more efficient. Several of his works have been published at MLSys, NeurIPS, ICML, SOSP, and EuroSys.
AG Audience

Date, Time and Location

Wednesday, 8 May 2024
60 Minutes
E1 5


Training and inference of machine learning jobs have become dominant workloads in data centers. In this talk, I will first show how existing system designs can make communication a bottleneck, specifically in the context of distributed training of ML models. Subsequently, I will introduce Bagpipe, a system that improves the training throughput of recommendation models by reducing the overhead of remote embedding accesses. Bagpipe builds an oracular cache with the aid of our novel lookahead algorithm and realizes up to a 5.6x improvement in training throughput while providing the same convergence and reproducibility guarantees as synchronous training. Finally, I will present CHAI (Clustered Head Attention for Inference), a new inference-time method that reduces the memory-bandwidth bottleneck of LLM inference. CHAI dynamically removes redundant heads in multi-head attention, improving LLM inference latency by up to 1.7x and reducing the size of the K,V cache by up to 20%.
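The "oracular cache with a lookahead algorithm" mentioned in the abstract can be sketched as follows. This is a minimal illustration, not Bagpipe's actual implementation: it assumes the next few training batches (and hence the embedding IDs they access) are known in advance, and applies Belady-style eviction, removing the cached embedding whose next use is farthest in the future. All names in the sketch are invented for the example.

```python
def plan_cache(batches, lookahead, capacity):
    """For each batch of embedding IDs, decide what to prefetch and what to
    evict, using knowledge of the next `lookahead` batches (Belady-style).
    Returns a list of (prefetch_ids, evict_ids) pairs, one per batch.
    Assumes capacity is at least the size of any single batch."""
    cache = set()
    plan = []
    for i, batch in enumerate(batches):
        # Flattened list of IDs accessed in the lookahead window.
        window = [e for b in batches[i + 1 : i + 1 + lookahead] for e in b]
        prefetch = set(batch) - cache
        cache |= prefetch
        evicted = set()
        while len(cache) > capacity:
            # Evict the cached ID (not needed by the current batch)
            # whose next use is farthest away, or never occurs.
            def next_use(e):
                return window.index(e) if e in window else float("inf")
            victim = max(cache - set(batch), key=next_use)
            cache.remove(victim)
            evicted.add(victim)
        plan.append((prefetch, evicted))
    return plan
```

In a real system the lookahead window would be populated by a component that runs ahead of training and inspects upcoming batches, so prefetches and evictions overlap with compute instead of blocking it.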
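The head-clustering idea behind CHAI can likewise be sketched in a few lines. This is a simplified illustration under an assumption made here for the example (heads are grouped greedily by cosine similarity of their flattened outputs, and only one representative per group would then be computed); it is not CHAI's actual algorithm, and the function and threshold are invented for the sketch.

```python
import numpy as np

def cluster_heads(head_outputs, threshold=0.9):
    """Greedily group attention heads whose outputs are highly similar.
    head_outputs: array of shape (num_heads, d), one flattened output
    vector per head. Returns a list of clusters (lists of head indices);
    computing one representative head per cluster saves attention
    compute and K,V-cache space for the rest."""
    norms = head_outputs / np.linalg.norm(head_outputs, axis=1, keepdims=True)
    sim = norms @ norms.T  # pairwise cosine similarity between heads
    clusters = []
    assigned = set()
    for h in range(len(head_outputs)):
        if h in assigned:
            continue
        members = [h] + [j for j in range(h + 1, len(head_outputs))
                         if j not in assigned and sim[h, j] >= threshold]
        assigned.update(members)
        clusters.append(members)
    return clusters
```

The size of the K,V cache then scales with the number of clusters rather than the number of heads, which is where the memory-bandwidth savings come from.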


Claudia Richter
+49 681 9303 9103

Claudia Richter, 05/07/2024 15:59
Claudia Richter, 05/06/2024 13:35 -- Created document.