Campus Event Calendar

Event Entry

What and Who

Democratizing LLM Access: Techniques for Cost-Effective and Efficient Usage

Attreyee Mukherjee
University of Mumbai
PhD Application Talk
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Tuesday, 28 January 2025
13:00
30 Minutes
Virtual talk
Zoom

Abstract

Large Language Models (LLMs) have revolutionized various domains, but their high computational costs and inefficiency on repetitive tasks pose significant challenges. My work explores two optimization techniques, batching and caching, to reduce costs and improve efficiency. Batching groups multiple queries for simultaneous processing, using methods such as independent query grouping and a single example followed by batched queries. Caching reuses previous outputs through techniques such as full-prompt or prefix caching, key-value caching, and similarity-based caching. I evaluated the batching techniques across different workloads and context sizes and found consistent monetary cost reductions of 1.5 to 4 times. Future work involves integrating advanced batching, caching, and other optimization strategies into end-to-end agentic systems: applied to LLM workflows, such as autonomous systems that generate, evaluate, and act on outputs, these approaches yield significant cost savings, bridging the gap between LLM performance and real-world constraints to enhance accessibility and efficiency.
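As a minimal illustration of the similarity-based caching idea mentioned above, the sketch below reuses a stored response when a new prompt is close enough to a previously seen one. The `SimilarityCache` class, the difflib similarity measure, and the threshold value are all illustrative assumptions, not the speaker's implementation.

```python
import difflib

class SimilarityCache:
    """Toy similarity-based cache for LLM outputs (illustrative sketch).

    Reuses a stored response when a new prompt is sufficiently similar
    to a previously cached prompt, avoiding a repeat model call.
    """

    def __init__(self, threshold=0.8):
        # Similarity threshold in [0, 1]; an assumed tunable parameter.
        self.threshold = threshold
        self.entries = {}  # cached prompt -> cached response

    def get(self, prompt):
        """Return a cached response if any stored prompt is similar enough."""
        for cached_prompt, response in self.entries.items():
            ratio = difflib.SequenceMatcher(None, prompt, cached_prompt).ratio()
            if ratio >= self.threshold:
                return response
        return None  # cache miss: caller would query the LLM and put() the result

    def put(self, prompt, response):
        self.entries[prompt] = response

cache = SimilarityCache(threshold=0.8)
cache.put("Summarize the quarterly report.", "The report shows steady growth.")
hit = cache.get("Summarize the quarterly report")      # near-duplicate prompt
miss = cache.get("Translate this sentence to French.")  # unrelated prompt
```

In practice, production systems typically compare embedding vectors rather than raw strings, but the control flow (check cache, fall back to the model, store the result) is the same.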

Contact

Ina Geisler
+49 681 9325 1802
--email hidden

Virtual Meeting Details

Zoom
Passcode visible to logged-in users only.

Ina Geisler, 01/27/2025 09:22
Ina Geisler, 01/24/2025 11:32 -- Created document.