Wang, H. (CSE) – Accelerating RTL Simulation with Specialized Graph Partitioners

March 13 @ 2:00 pm

Register transfer level (RTL) simulation is an invaluable tool for developing, debugging, verifying, and validating hardware designs. However, the performance of RTL simulation has long been a limiting factor in industry. Despite the inherent parallelism of hardware, current RTL simulators have not achieved practical performance gains due to fundamental challenges in communication, synchronization, memory bandwidth, and architectural mapping.

This dissertation addresses the RTL simulation performance problem from three complementary perspectives: optimizing simulation latency through parallelism, improving aggregate throughput via deduplication, and enabling efficient GPU acceleration with RTL-native semantics.

First, we present RepCut, a parallel RTL simulation methodology that uses replication-aided partitioning to cut circuits into balanced partitions with minimal overlaps. By replicating the overlaps, RepCut eliminates problematic data dependences between partitions and significantly reduces synchronization overhead. RepCut achieves superlinear speedups of up to 27.10x using 24 threads with only a 3.81% replication cost.
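
The replication idea can be illustrated with a toy sketch. This is not RepCut's actual partitioner: the graph, the node names, and the sink-to-partition assignment below are hypothetical, and the assignment is taken as an input rather than computed. The sketch only shows how copying each partition's full fan-in cone makes the partitions independent within a cycle, and how the resulting replication cost can be measured.

```python
from collections import defaultdict

def fan_in_cone(node, preds):
    """Return `node` plus every node it transitively depends on."""
    cone, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in cone:
            cone.add(n)
            stack.extend(preds.get(n, ()))
    return cone

def replicate_partitions(preds, sink_to_part):
    """Give each partition the complete fan-in cone of its sinks.

    Nodes shared by several cones are copied into every partition that needs
    them, so no partition reads a value computed by another one mid-cycle.
    """
    parts = defaultdict(set)
    for sink, part in sink_to_part.items():
        parts[part] |= fan_in_cone(sink, preds)
    return dict(parts)

def replication_cost(parts):
    """Extra node evaluations caused by copying, relative to the unique node count."""
    unique = set().union(*parts.values())
    total = sum(len(p) for p in parts.values())
    return (total - len(unique)) / len(unique)

# Hypothetical circuit: registers r0 and r1 both depend on shared logic s.
preds = {
    "r0": ["a", "s"],
    "r1": ["b", "s"],
    "s": ["in"],
    "a": [], "b": [], "in": [],
}
# Assumed assignment of sinks to partitions (a real partitioner would choose this).
sink_to_part = {"r0": 0, "r1": 1}

parts = replicate_partitions(preds, sink_to_part)
cost = replication_cost(parts)
print(parts, f"replication cost = {cost:.1%}")
```

In this toy graph the shared nodes `s` and `in` are evaluated twice, so two of six unique nodes are replicated. The abstract's 3.81% figure refers to real designs, where a good partitioner keeps the overlaps small.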

Second, we introduce Simulation Deduplication, a technique that exploits the extensive reuse of building blocks in modern hardware designs. By generating shared code for duplicated instances and carefully co-scheduling their execution, we reduce the instruction cache footprint and memory bandwidth pressure. This approach achieves up to 1.95x speedup for single simulations and 2.09x improvement in overall batch simulation throughput.
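
As a rough structural sketch of the idea, assuming a hypothetical 8-bit counter module that is instantiated several times: one shared evaluation routine takes per-instance state as an argument, and all instances are stepped back to back within each cycle. The names `counter_step` and `simulate` are illustrative only, and Python shows just the shape of the generated code; the instruction cache and bandwidth benefits described above apply to compiled simulators.

```python
def counter_step(state, enable):
    """Shared evaluation code for every instance of the hypothetical counter module."""
    if enable:
        state["count"] = (state["count"] + 1) & 0xFF  # wrap at 8 bits

def simulate(num_instances, enables_per_cycle):
    """Step all instances each cycle using the single shared routine."""
    # Only the state is per-instance; the code is shared.
    states = [{"count": 0} for _ in range(num_instances)]
    for enables in enables_per_cycle:
        # Co-schedule duplicated instances: run them consecutively so the
        # shared routine stays hot while it is reused.
        for state, enable in zip(states, enables):
            counter_step(state, enable)
    return [s["count"] for s in states]

# Three instances, two cycles of enable inputs (made-up stimulus).
counts = simulate(3, [[1, 0, 1], [1, 1, 0]])
print(counts)
```

Without deduplication, a simulator would typically emit a separate copy of the evaluation code for each instance, which is what inflates the instruction footprint.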

Third, we present Toucan, a GPU-accelerated RTL simulation framework that preserves RTL semantics rather than flattening designs to gate-level netlists. By leveraging native GPU arithmetic operations and introducing warp-level micro-partitioning with shuffle-based communication, Toucan achieves efficient mapping of irregular circuit topologies to GPU SIMT architectures while maintaining fast compilation times. Toucan achieves up to 4.73x speedup over the state-of-the-art GPU RTL simulator on large multi-core designs.

Together, these three approaches address RTL simulation performance from complementary angles, demonstrating significant improvements over state-of-the-art commercial and open-source simulators across multiple hardware platforms and design scales.

Event Host: Haoyuan Wang, Ph.D. Candidate, Computer Science and Engineering

Advisor: Jose Renau

Zoom: https://ucsc.zoom.us/j/94044618343?pwd=xZkK8GmD28P2Vf8pbyl6aoOaNxxhya.1

Passcode: 574772

Details

Room Number: BE-318

Venue