Nov 18 – 22, 2024
America/New_York timezone

GPU Direct RDMA using SLAC’s open source DMA engine

Nov 21, 2024, 2:30 PM
15m
262A (Student Union)

Parallel Presentation
RDC5: Trigger and DAQ
RDC 05 - Trigger and DAQ Parallel Session

Speaker

Ryan Herbst (SLAC National Accelerator Laboratory)

Description

To support detectors with ever-higher data bandwidth, systems must be able to move data directly to the processing elements with minimal software intervention. For example, LCLS-II operation of the ePixUHR 35K detector will generate data on the order of 250 GB/s at 35 kHz, far more than the existing CPU-based DAQ setup can handle. Using NVIDIA’s GPUDirect RDMA technology, we implemented a low-latency, high-throughput data flow that allows acquired data to be compressed and processed on the GPU with minimal CPU involvement. Our test setup consists of an AMD Kintex KCU1500 FPGA card and an NVIDIA RTX A5000 GPU on the same PCIe root complex. RDMA allows the KCU1500’s custom firmware to transfer data directly to GPU memory, skipping the intermediate DMA transfer to main memory that would otherwise be required. We use CUDA device-launchable graphs to initiate DMA transfers and process the incoming data, so that control flow and data processing take place exclusively on the GPU while the host processor takes a supervisory role. Confining control flow to the GPU using CUDA graphs resulted in a significant reduction in measured latency. This approach has the potential to support the next-generation detectors required for future High Energy Physics experiments.
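
To give a sense of scale for the rates quoted above, the following is a rough back-of-envelope sketch, not a figure from the presentation. It assumes that 35 kHz is the detector frame rate, that 250 GB/s is a sustained aggregate rate in decimal gigabytes, and that the data is spread evenly across frames. The function and variable names are illustrative only.

```python
# Back-of-envelope budget from the rates quoted in the abstract.
# Assumptions (not stated in the source): 35 kHz is the frame rate,
# 250 GB/s is a sustained aggregate rate, and GB means 10**9 bytes.
DATA_RATE_BYTES_PER_S = 250e9
FRAME_RATE_HZ = 35e3


def per_frame_budget(data_rate_bytes_per_s, frame_rate_hz):
    """Return (bytes per frame, seconds between frames)."""
    bytes_per_frame = data_rate_bytes_per_s / frame_rate_hz
    seconds_per_frame = 1.0 / frame_rate_hz
    return bytes_per_frame, seconds_per_frame


frame_bytes, frame_period = per_frame_budget(DATA_RATE_BYTES_PER_S, FRAME_RATE_HZ)
print(f"{frame_bytes / 1e6:.2f} MB per frame")        # 7.14 MB per frame
print(f"{frame_period * 1e6:.2f} us between frames")  # 28.57 us between frames
```

Under these assumptions, roughly 7 MB of data arrives every 29 microseconds or so, which illustrates why any per-frame round trip through host software is costly at this rate.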

Primary author

Ryan Herbst (SLAC National Accelerator Laboratory)

Co-authors

Jeremy Lorelli (SLAC National Accelerator Laboratory)
Larry Ruckman (SLAC National Accelerator Laboratory)
Mudit Mishra (SLAC National Accelerator Laboratory)
