Author: Denis Avetisyan
A new hardware generator efficiently compresses sparse data streams, unlocking the potential of graph neural networks for high-speed data analysis in particle physics and beyond.

This work details a configurable FPGA-based architecture for low-latency stream compaction, specifically targeting real-time processing in applications like the Belle II experiment.
Increasing data rates and algorithmic complexity in high-energy physics experiments pose significant challenges for real-time data processing. The paper ‘Real-Time Stream Compaction for Sparse Machine Learning on FPGAs’ introduces a novel hardware architecture designed to address these demands by efficiently preprocessing sparse data streams for accelerator-based machine learning. The approach uses a configurable, hierarchical compression pipeline to reduce data volume and optimize throughput, enabling low-latency inference with Graph Neural Networks in first-level trigger systems, as demonstrated within the Belle II experiment. Will this scalable dataflow architecture unlock new possibilities for real-time data analysis across a broader range of large-scale scientific applications?
Unveiling Patterns in the Data Deluge
The Belle II experiment, operating at the SuperKEKB accelerator, produces a data stream of unprecedented volume – roughly 40 terabytes per second. This immense rate stems from the high luminosity of SuperKEKB, designed to maximize particle interactions, and the complex detectors employed to record the resulting particles’ trajectories and energies. Simply storing this raw data is impractical; the resulting digital archive would quickly become unmanageable and prohibitively expensive. Consequently, a sophisticated real-time data reduction system is essential, acting as a critical first step in the analysis pipeline. This system must swiftly identify and retain only those events likely to contain valuable physics information, effectively filtering out the vast majority of uninteresting collisions before they overwhelm the storage and processing capabilities. The challenge lies not only in handling the sheer data volume but in reducing it without introducing significant biases or losing rare, yet crucial, signals.
The sheer volume of data produced by modern particle physics experiments, like Belle II, presents a significant hurdle to uncovering new physics. Conventional event selection techniques, reliant on sequentially analyzing each collision, are increasingly unable to keep pace with data rates that now reach terabytes per second. This inability to process data quickly enough doesn’t simply mean discarding information; it directly diminishes the experiment’s sensitivity to rare and subtle signals indicative of new particles or phenomena. As interesting events become a smaller fraction of the total data stream, the probability of missing them increases, effectively obscuring the very discoveries the experiment aims to make. Consequently, innovative approaches to real-time data reduction are not merely a technical necessity, but a fundamental requirement for maximizing the scientific return of these ambitious endeavors.
Such data rates demand exceptionally efficient first-level trigger systems. These systems function as a critical gatekeeper, sifting through millions of potential events per second to identify those most likely to contain new physics. Without this initial, rapid selection, valuable signals would be irrevocably lost within the overwhelming background noise. Traditional methods, relying on fixed criteria, are increasingly unable to cope with the data rate, necessitating intelligent triggers capable of dynamically adapting to event characteristics. This pre-selection process drastically reduces the amount of data that needs to be stored and analyzed, preserving computational resources and ultimately maximizing the experiment’s sensitivity to rare and subtle phenomena.
Coping with this deluge also demands a fundamental shift in how events are selected for further analysis. Traditional trigger systems, designed to quickly discard uninteresting data, are increasingly unable to keep pace with rising data rates, potentially obscuring rare and significant physics signals. This limitation has spurred investigation into novel trigger architectures, moving beyond fixed, pre-programmed criteria toward systems capable of ‘intelligent’ decision-making. These advanced systems leverage machine learning algorithms and customizable hardware to adapt to changing experimental conditions and identify subtle patterns indicative of new physics, effectively sifting through the data deluge to prioritize the most promising events for detailed reconstruction and analysis. The development of such adaptable and efficient triggers is not merely a technological challenge, but a crucial step in maximizing the scientific return from these complex experiments.

Harnessing Relational Data with Dynamic Graph Neural Networks
Dynamic Graph Neural Networks (Dynamic GNNs) address the challenges of real-time event processing in high-energy physics by directly incorporating the relational information inherent in particle interactions. Traditional event selection relies on pre-defined criteria, whereas Dynamic GNNs operate on event data represented as graphs, where particles are nodes and their interactions are edges. This graph-based approach allows the network to learn and exploit complex correlations between particles without requiring explicit feature engineering. The ‘dynamic’ aspect refers to the network’s ability to process variable-sized graphs representing events with differing numbers of particles and interactions, and to adapt its processing based on the specific topology of each event. This capability is crucial for identifying rare or unusual events amidst a high background rate, as the network can prioritize and focus on the most relevant relationships between particles in real-time.
Traditional event selection in particle physics relies on pre-defined criteria, which may be suboptimal for events deviating from expected patterns. Dynamic Graph Neural Networks (Dynamic GNNs) address this limitation by constructing a graph representation of each individual event and processing it independently. This allows the network to learn event-specific features and adapt its selection criteria accordingly. Consequently, Dynamic GNNs can improve selection efficiency by identifying interesting events that might be missed by fixed algorithms, and reduce false positive rates by better characterizing background noise within each event’s unique context. The adaptability stems from the network’s ability to weigh the importance of different particles and interactions based on the event’s topology, rather than applying a uniform standard.
GravNet, a Dynamic Graph Neural Network architecture, facilitates efficient trigger data processing through its inherent design. It employs a message-passing scheme in which nodes represent particles: node features are projected into a learned latent space, each node is connected to its nearest neighbours in that space, and neighbour features are aggregated with distance-based weights. Because the graph is rebuilt per event rather than fixed in advance, the network’s receptive field adapts dynamically to the event topology, in contrast to static graph neural networks with fixed graph structures. This flexibility allows the architecture to process graphs of varying sizes and connectivities, making it well-suited to the irregular and complex patterns found in high-energy physics event data, and it permits parallelized, scalable processing of large datasets – crucial for real-time trigger systems.
Representing event data as a graph allows the system to model particle interactions and their relationships, facilitating the identification of complex patterns beyond the scope of traditional methods. Nodes in the graph represent individual particles or detectors, while edges define the interactions between them; this structure enables the network to learn features based on the connectivity and properties of these interactions. By propagating information across the graph, the system can assess the significance of each event and prioritize those with characteristics indicative of interesting physics, resulting in improved accuracy in identifying potentially valuable data for further analysis.
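To make the graph-based processing concrete, here is a minimal sketch, not the paper’s actual network: a toy event is built as a k-nearest-neighbour graph over hit positions, and one message-passing step averages neighbour features. The simple spatial kNN and mean aggregation are stand-ins for the learned latent-space neighbourhoods and distance-weighted aggregation described above.

```python
import math

# Illustrative only: one message-passing step over a toy event graph.
# Each "hit" is a node with a 2-D position and a scalar energy feature;
# edges connect every node to its k nearest neighbours in space.

def knn_edges(pos, k=2):
    """Return a neighbour list: for each node, its k nearest other nodes."""
    edges = []
    for i, p in enumerate(pos):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(pos) if j != i
        )
        edges.append([j for _, j in dists[:k]])
    return edges

def message_pass(feat, edges):
    """New node feature: (own feature, mean of neighbour features)."""
    return [
        (f, sum(feat[j] for j in nbrs) / len(nbrs))
        for f, nbrs in zip(feat, edges)
    ]

# Toy event: four calorimeter hits, the last one spatially isolated.
pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
feat = [1.0, 2.0, 3.0, 4.0]
out = message_pass(feat, knn_edges(pos, k=2))
print(out[0])  # (1.0, 2.5): node 0 keeps its feature, averages neighbours 1 and 2
```

Because the graph is rebuilt per event from the hit positions, events with different numbers of hits and different topologies flow through the same code – the property that makes this family of networks "dynamic".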

Optimizing for Speed: Hardware Implementation and Data Compression
Hardware implementation of Dynamic Graph Neural Networks (GNNs) necessitates the use of Field Programmable Gate Arrays (FPGAs) to meet the demands of real-time processing and high throughput. The AMD Virtex UltraScale XCVU190 FPGA is utilized as a platform for accelerating GNN computations due to its balance of logic resources and power efficiency. Software-defined implementations on CPUs and GPUs often lack the performance required for complex dynamic graphs, whereas FPGAs enable parallel processing and custom data paths optimized for graph algorithms. This hardware acceleration is particularly crucial for applications involving rapidly changing graph structures and large datasets, where maintaining low latency is paramount.
Sparsity compression techniques capitalize on the fact that trigger data in dynamic graph neural networks often contains a high proportion of zero-valued elements. By identifying and eliminating redundant data transfers and computations associated with these zero values, the overall computational load is substantially reduced. Specifically, implementation of this compression module achieves a 324x reduction in computational load when compared to processing the full, uncompressed dataset, representing a significant optimization for hardware acceleration of dynamic GNNs.
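The compaction operation itself can be sketched in software. The paper’s hardware module is not reproduced here; the following is a minimal Python model of the standard prefix-sum formulation of stream compaction, which maps naturally onto parallel hardware because every surviving element’s output address can be computed independently.

```python
# Software sketch of stream compaction (not the paper's RTL):
# an exclusive prefix sum over per-element valid flags gives each
# surviving element its slot in the compacted output, so in hardware
# all elements can be routed to their destinations in parallel.

def compact(stream, threshold=0.0):
    """Keep (index, value) pairs for entries above threshold."""
    valid = [v > threshold for v in stream]
    # Exclusive prefix sum of the valid flags = destination addresses.
    addr, total = [], 0
    for f in valid:
        addr.append(total)
        total += f
    out = [None] * total
    for i, (f, a) in enumerate(zip(valid, addr)):
        if f:
            out[a] = (i, stream[i])
    return out

stream = [0.0, 3.2, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0]
compacted = compact(stream)
print(compacted)                     # [(1, 3.2), (4, 1.7)]
print(len(stream) / len(compacted))  # 4.0x fewer elements to process
```

Keeping the original index alongside each value preserves the spatial information that the downstream graph network needs, while the zero-valued entries – the bulk of the trigger data – never reach the accelerator.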
Chisel, a hardware construction language embedded in Scala, is used to build the sparsity compression module because it expresses hardware structure and behavior concisely. This allows for the creation of a customized dataflow architecture optimized for the specific characteristics of trigger data. The language’s support for parameterized modules and automatic generation of Verilog facilitates efficient design exploration and rapid prototyping. This implementation enables the reduction of data volume through the identification and elimination of redundant information before processing, directly contributing to the observed 324x reduction in computational load. The resulting hardware module is then integrated into the larger system for synthesis and verification using tools like Vivado and ModelSim.
The hardware design underwent synthesis, verification, and validation using Vivado and ModelSim to ensure correct functionality and performance. The design targets a 500 MHz clock; most configurations reach this target, while the most demanding configuration closes timing at 277 MHz. Throughout testing, the system consistently maintained a latency overhead of under 60 ns, indicating efficient data processing within the defined performance parameters.
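A quick sanity check of these figures, under the illustrative assumption that latency is simply pipeline depth times clock period:

```python
# Back-of-the-envelope check of the latency figures quoted above,
# assuming latency = pipeline depth (cycles) x clock period.

def latency_ns(cycles, freq_mhz):
    """Wall-clock latency in ns; clock period in ns is 1000 / f_MHz."""
    return cycles * 1e3 / freq_mhz

# At the 500 MHz target, a 60 ns budget accommodates up to 30 pipeline stages.
print(latency_ns(30, 500))  # 60.0 ns
# At the lowest achieved 277 MHz, a 16-stage pipeline still fits the budget.
print(latency_ns(16, 277))  # ~57.8 ns
```

The exact pipeline depths are not stated in this summary; the point is that the sub-60 ns overhead is consistent with a shallow pipeline at either clock frequency.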

GNN-ETM: A Leap Forward in Event Selection
The Belle II experiment benefits from a novel event selection module, the Graph Neural Network ECL Trigger Module (GNN-ETM), which substantially improves data filtering efficiency. This system harnesses recent progress in Dynamic Graph Neural Networks, allowing it to adapt to the complex patterns within particle collision data. Crucially, the GNN-ETM isn’t solely a software innovation; it’s coupled with dedicated hardware optimization, enabling real-time processing at the experiment’s high luminosity of 5.1 × 10³⁴ cm⁻²s⁻¹. By representing electromagnetic calorimeter signals as nodes within a graph, the network efficiently identifies potentially interesting events, holding the DAQ event readout rate to 30 kHz while meeting the critical latency deadline of 4.4 µs – a feat previously unattainable with traditional methods. This refined event selection ultimately allows physicists to more effectively sift through vast amounts of data, enhancing the search for rare and elusive new physics.
The Graph Neural Network ECL Trigger Module (GNN-ETM) begins its event selection process with data derived from Trigger Cells – meticulously preprocessed signals originating from the Electromagnetic Calorimeter. These Trigger Cells represent a condensed form of the raw calorimeter data, effectively summarizing energy deposits and spatial information. This pre-processing step is crucial, as it reduces the computational burden on the graph neural network while retaining the essential characteristics needed to identify potential physics signals. The GNN then leverages this summarized information, treating each Trigger Cell as a node within a graph, and establishes connections between neighboring cells to capture the spatial relationships crucial for distinguishing genuine events from background noise. This graph-based approach allows the GNN-ETM to efficiently analyze complex patterns within the calorimeter data, ultimately enhancing the selection of meaningful events for further investigation.
The Belle II experiment operates at an exceptionally high instantaneous luminosity of 5.1 × 10³⁴ cm⁻²s⁻¹, generating a substantial amount of data that requires efficient filtering. Integrating the Graph Neural Network ECL Trigger Module (GNN-ETM) into the first-level trigger system addresses this challenge by drastically reducing the data rate while crucially preserving the integrity of physics signals. The system successfully maintains a Data Acquisition (DAQ) event readout rate of 30 kHz, effectively sifting through the immense data stream. Importantly, this processing occurs within a strict time constraint, satisfying the demanding 4.4 µs first-level trigger latency deadline, which is essential for real-time data analysis and event selection at such high collision rates.
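Combining the numbers quoted in this section gives a sense of the margins involved; the 500 MHz clock is taken from the hardware section above.

```python
# How much of the 4.4 us first-level trigger budget does a 60 ns
# compaction stage consume? Figures taken from the text above;
# the 500 MHz clock is the target clock of the hardware design.

trigger_budget_us = 4.4
compaction_ns = 60
clock_mhz = 500

cycles_in_budget = trigger_budget_us * clock_mhz   # us * MHz = cycles
share = compaction_ns / (trigger_budget_us * 1e3)  # fraction of budget

print(int(cycles_in_budget))  # 2200 clock cycles available at 500 MHz
print(f"{share:.1%}")         # compaction overhead is ~1.4% of the budget
```

In other words, the compaction stage consumes only a small slice of the trigger latency budget, leaving the overwhelming majority of the 2200-cycle window for the GNN inference itself.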
The enhanced event selection capabilities facilitated by the GNN-ETM directly translate to a heightened ability to probe the fundamental constituents of the universe. By significantly reducing irrelevant background noise while meticulously preserving signals indicative of new physics, researchers can now delve into datasets with unprecedented clarity. This improved signal-to-noise ratio is crucial for detecting exceedingly rare decay processes and subtle deviations from established theoretical models. Consequently, investigations into areas like CP violation, searches for dark matter candidates, and precision measurements of Standard Model parameters benefit from a substantially increased sensitivity, potentially revealing previously inaccessible insights into the workings of nature and pushing the boundaries of particle physics knowledge.
The presented work focuses on extracting meaningful information from data streams, a challenge elegantly addressed through configurable hardware. This approach mirrors a fundamental Stoic principle: understanding the natural order of things to navigate them effectively. As Marcus Aurelius observed, “The impediment to action advances action. What stands in the way becomes the way.” Similarly, the inherent sparsity of the data, initially appearing as an obstacle, becomes the driving force behind the compression technique. By recognizing and leveraging this pattern, the system achieves low-latency, real-time processing, effectively turning a limitation into an advantage for graph neural network applications within the Belle II experiment. The careful design of the dataflow architecture is, in essence, a logical response to the observed constraints of the system.
Where Does the Stream Lead?
The demonstrated capacity to dynamically compact sparse data streams offers a tempting illusion of control – the ability to sculpt information flow to the exigencies of real-time analysis. However, the inherent limitations of any hardware-centric approach remain. This work, while successful within the Belle II context, highlights the persistent tension between configurable acceleration and adaptability. Future explorations must address the cost of reconfiguration – the energy and latency overhead of shifting the compression strategy to accommodate evolving data characteristics or novel graph neural network architectures. Simply achieving low latency is insufficient; maintaining predictable low latency under changing conditions is the true challenge.
A crucial, often overlooked, aspect lies in the interpretability of the compression itself. While effective at reducing data volume, the method provides little insight into which data points are being discarded, or the potential bias introduced by the compression algorithm. A fruitful avenue for research involves developing compression schemes that offer a quantifiable measure of information loss, allowing physicists to assess the impact on downstream analysis. The pursuit of efficiency should not come at the expense of scientific rigor.
Ultimately, this work represents a localized solution to a broader problem – the relentless increase in data rates from modern collider experiments. The long-term trajectory likely involves a hybrid approach, combining configurable hardware accelerators with intelligent, software-defined data selection strategies. The goal isn’t merely to process more data, but to extract more meaning from it, a task that demands not just computational power, but also a deeper understanding of the underlying physics.
Original article: https://arxiv.org/pdf/2602.23281.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/