Chen, Z. (CSE) – GPU Subgroup Semantics for Portable High-Performance Kernels
Hybrid Event
Modern high-performance GPU kernels increasingly rely on subgroup-level features, including subgroup communication, subgroup operations, and matrix operations. These features are essential for workloads such as matrix multiplication and FlashAttention, but their language-level guarantees remain difficult to reason about. Existing programming models often leave it unclear which threads participate in subgroup operations, when subgroup threads are required […]