Virtual Event

Sharma, R. (CSE) – Automatically Evolving GPU Libraries for Performance Portable AI Kernels

Name: Sharma, R. (CSE) – Automatically Evolving GPU Libraries for Performance Portable AI Kernels
Start: 2025-12-12T09:30:00-08:00
End: 2025-12-12T11:30:00-08:00

December 12 @ 9:30 am


GPUs are the workhorses of modern AI, and they are developed and widely deployed by many vendors, including Apple, Qualcomm, Intel, AMD, and NVIDIA. While these GPUs all offer high compute potential, programming them effectively is difficult because they differ in performance-critical features such as SIMT width, cache capacity, and memory bandwidth, and each demands different optimization strategies. Tunable kernels address this by exposing parameters such as tiling dimensions and workgroup sizes, enabling per-device specialization. Yet this produces static libraries: tuned once and then frozen, they degrade as new hardware emerges. We propose automatically evolving libraries that expand their tuning knowledge as new devices appear, with minimal impact on user experience.

To build such libraries, we first need to understand the tuning landscape. We address this through GPU Goldmines, a WebGPU-based framework for exhaustively collecting tuning data across diverse devices. Our tuned matrix multiplication kernels outperform an optimized baseline by 8.4x on average, while matrix-vector kernels achieve 93% of platform bandwidth. We find that hyper-tuning for a single GPU causes a 50% performance degradation on other devices, whereas data-driven portability methods recover 88% of peak performance. These kernels are fundamental to the prefill and decode phases of LLM inference. We integrate them into llama.cpp, which serves as our evaluation platform, and there they outperform the CPU and Vulkan backends.

Building on this data, we are developing Living Libraries to improve performance continuously without disrupting users. This means choosing good parameters upfront, learning from real-world execution, and knowing when to keep searching versus when to stop. Hand-designed parameter spaces, however, remain inherently bounded. To move beyond them, we extend toward LLM-based kernel evolution, where language models propose entirely new kernel variants, opening a less structured but higher-potential search space.

Event Host: Rithik Sharma, Ph.D. Student, Computer Science and Engineering

Advisors: Tyler Sorensen & Yuanchao Xu

Zoom: https://ucsc.zoom.us/j/91880443682?pwd=BOOc90v0CKGj0ZMyBNZEFHHLBgmTgu.1

Passcode: 238807

Details

Date: December 12
Time: 9:30 am – 11:30 am
Event Category: Ph.D. Presentations

Organizers

Baskin School of Engineering
Computer Science and Engineering Department
