Running Large Language Models at Scale on Groq's LPU Machine Learning Chips
Satnam Singh
Groq (Mountain View, California, USA)
Talk
Satnam Singh is a Fellow at Groq where he applies the power of functional programming languages to the design of machine learning chips and their programming models. He previously worked at Google (machine learning chips, cluster management), Facebook (Android optimization), Microsoft (parallel and concurrent programming), and Xilinx (Lava DSL for hardware design). He started his career as an academic at the University of Glasgow (FPGA-based application acceleration and functional programming).
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI
Groq's core technology comprises silicon chips designed to accelerate machine learning inference, a compiler for programming these chips, and rack-scale deployments of foundation large language models (LLMs) with public API access. This presentation gives an overview of the Groq hardware architecture, with a focus on the deterministic characteristics that prove advantageous both for achieving very low latency implementations of open-weight foundation LLMs (e.g. Llama3-70B, Gemma2, Mixtral 8x7B) and for deploying large rack-scale systems with predictable performance. An overview will also be given of the compiler we have developed for the Groq architecture: the front end is based on MLIR (consuming ONNX from PyTorch, as well as our own linear algebra representation, with some support for TensorFlow/JAX), while the back end uses a custom intermediate representation and is written in Haskell. I'll say a few words about the specific things I have worked on personally, which include the design of power management hardware features, an experimental domain-specific language (DSL) for programming our chips written in Haskell, and the formal verification of our hardware using temporal logic and model checking (SystemVerilog Assertions).
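To give a flavour of the combinator style of hardware DSL mentioned above, the sketch below is a minimal, purely illustrative Haskell example in the spirit of Lava: circuits are ordinary Haskell functions over an embedded Signal type, so they can be composed and simulated with the host language. All of the names here (Signal, and2, halfAdder, simulate) are invented for this sketch and do not describe Groq's actual DSL or compiler.

  module TinyLava where

  -- Deep embedding of a combinational signal.
  data Signal
    = Input String
    | Low
    | High
    | And Signal Signal
    | Or  Signal Signal
    | Not Signal
    deriving Show

  -- Primitive gates expressed as ordinary Haskell functions.
  and2, or2 :: (Signal, Signal) -> Signal
  and2 (a, b) = And a b
  or2  (a, b) = Or a b

  inv :: Signal -> Signal
  inv = Not

  -- A circuit built by composition: a half adder producing (sum, carry).
  halfAdder :: (Signal, Signal) -> (Signal, Signal)
  halfAdder (a, b) = (xor2 (a, b), and2 (a, b))
    where xor2 (x, y) = or2 (and2 (x, inv y), and2 (inv x, y))

  -- Interpret a signal over a boolean environment (simulation).
  simulate :: [(String, Bool)] -> Signal -> Bool
  simulate env sig = case sig of
    Input n -> maybe (error ("unbound input: " ++ n)) id (lookup n env)
    Low     -> False
    High    -> True
    And a b -> simulate env a && simulate env b
    Or  a b -> simulate env a || simulate env b
    Not a   -> not (simulate env a)

  -- Example: simulate the half adder on inputs a=1, b=1.
  main :: IO ()
  main =
    let (s, c) = halfAdder (Input "a", Input "b")
        env    = [("a", True), ("b", True)]
    in print (simulate env s, simulate env c)  -- prints (False,True)

Because the circuit description is just a data structure, the same description can be simulated (as above) or walked to emit a netlist, which is the property that makes the embedded-DSL approach attractive for hardware design.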