When Federated Learning Falls Apart: Why Data Splits Break AI Models

Author: Denis Avetisyan


New research reveals that performance drops in decentralized learning aren’t just random, but a result of internal network structures disintegrating under non-ideal data conditions.

The aggregated global model successfully replicates the precise neural network structure learned by individual clients, as demonstrated by the prevalence of shared connections, indicating effective knowledge consolidation without introducing representational drift during the learning process.

A mechanistic analysis demonstrates that ‘circuit collapse’ in federated learning, driven by data heterogeneity, can be mitigated by increasing model sparsity.

While Federated Learning promises collaborative model training on decentralized data, its performance sharply declines with non-independent and identically distributed (Non-IID) datasets, a phenomenon poorly understood at a mechanistic level. This paper, ‘Mechanistic Analysis of Circuit Preservation in Federated Learning’, investigates this failure mode through the lens of mechanistic interpretability, revealing that Non-IID data induces ‘circuit collapse’ – the destructive interference of specialized sub-networks within the global model. By tracking these circuits in sparse neural networks, we demonstrate that data heterogeneity leads to structural divergence and degradation of these functional units. Could a focus on preserving these core circuits unlock more robust and reliable Federated Learning systems?


The Fractured Consensus: Navigating Data Heterogeneity

Federated learning, a promising technique for collaborative model training while preserving data privacy, faces significant challenges when dealing with non-independent and identically distributed (non-IID) data. Unlike traditional centralized learning where data is often homogeneous, federated learning operates on datasets residing on diverse client devices – each potentially holding a drastically different distribution of information. This inherent data heterogeneity leads to substantial performance degradation as the locally trained models on each client diverge considerably from a globally optimal solution. While the premise of federated learning is to aggregate knowledge without sharing raw data, the variance in local datasets introduces biases and instabilities during the model averaging process, ultimately hindering the overall convergence and accuracy of the collaboratively built model. Consequently, addressing this sensitivity to non-IID data is crucial for realizing the full potential of federated learning in real-world applications.

Weight divergence represents a significant obstacle in federated learning systems when dealing with non-independent and identically distributed (non-IID) data. As individual client models are trained on locally held datasets, often reflecting unique distributions, the learned model weights begin to drift apart. This phenomenon isn’t merely a matter of slight variation; substantial weight divergence actively impedes the global model’s ability to converge to an optimal solution. The aggregation process, intended to create a generalized model, becomes less effective as disparate weights pull the global model in conflicting directions, potentially leading to instability or even failure to learn. Consequently, addressing weight divergence is crucial for deploying robust and reliable federated learning applications in real-world scenarios characterized by heterogeneous data landscapes.
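
As a rough illustration of how this drift can be measured, the sketch below compares each client’s flattened parameters against the aggregated global model using L2 distance and cosine similarity. It assumes PyTorch models; names such as client_models and global_model are hypothetical placeholders rather than anything taken from the paper.

```python
# Illustrative sketch (assumed setup, not the paper's code): quantify weight
# divergence by comparing each client's parameters with the global model.
import torch

def flatten_params(model):
    """Concatenate all of a model's parameters into a single 1-D tensor."""
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def weight_divergence(client_models, global_model):
    """Per-client L2 distance and cosine similarity to the aggregated model."""
    g = flatten_params(global_model)
    report = []
    for idx, m in enumerate(client_models):
        w = flatten_params(m)
        report.append({
            "client": idx,
            "l2": torch.norm(w - g).item(),
            "cosine": torch.nn.functional.cosine_similarity(w, g, dim=0).item(),
        })
    return report
```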

A robust federated learning system hinges on pinpointing the origins of weight divergence, a phenomenon where locally trained models on individual devices drift significantly from the global model. This divergence isn’t simply random noise; it’s often systematically linked to the non-independent and identically distributed (non-IID) nature of data held by each client. Investigations reveal that skewed data distributions, where certain clients possess data heavily biased towards specific classes or features, are primary drivers. Furthermore, variations in dataset size and the presence of label noise contribute to differing update magnitudes and directions during the aggregation process. Consequently, a detailed understanding of these root causes – encompassing statistical heterogeneity, systemic biases, and client-specific data characteristics – is essential for developing effective mitigation strategies and ensuring the reliable convergence of federated models across diverse and decentralized datasets.

Addressing weight divergence represents a critical advancement for federated learning systems operating in realistic, non-independent and identically distributed (non-IID) environments. When individual client models begin to drift significantly from one another during training – a phenomenon exacerbated by varying local data distributions – the global model’s performance suffers, potentially negating the benefits of decentralized training. Consequently, research is heavily focused on developing novel aggregation strategies, personalized model architectures, and data-sharing techniques that can effectively dampen these divergent tendencies. These methods aim to either constrain the individual model updates, enhance the robustness of the global aggregation process, or actively transfer knowledge between clients to promote greater consistency, ultimately enabling federated learning to function reliably even with highly heterogeneous data landscapes.

Even with non-IID data and two classes per client, specialist clients maintain structurally divergent circuits, as evidenced by consistently low average Intersection over Union (IoU) values.

Circuit Degradation: The Anatomy of Divergence

Circuit collapse, observed in federated learning, refers to the degradation of functional sub-networks within client models during distributed training. This manifests as a loss of coherent feature representation and a reduction in the ability of these sub-networks to perform their intended computations. The phenomenon is posited as a primary driver of weight divergence, where client models increasingly deviate from each other and the global model, hindering convergence and overall performance. Essentially, as circuits collapse, the learned representations become fragmented and inconsistent across clients, leading to unstable training dynamics and diminished generalization capabilities.

Circuit collapse, observed as weight divergence in federated learning, is driven by two primary mechanisms: destructive interference and structural drift. Destructive interference occurs when updates to functionally similar circuits across different clients counteract each other, reducing overall performance. Simultaneously, structural drift refers to the divergence in the topology of these circuits – specifically, changes in the connections between nodes within the network – as clients train on non-IID data. This combination leads to a degradation of functional sub-networks, as circuits become misaligned and lose their ability to effectively contribute to the global model. The effect is not simply a difference in learned weights, but a fundamental alteration in the organization of the neural network itself.
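
One crude way to make destructive interference visible is to compare the update directions that individual clients would contribute to the next aggregation round: strongly negative cosine similarities indicate updates that partially cancel when averaged. The sketch below is an illustrative measurement under that assumption, not the paper’s methodology.

```python
# Illustrative sketch: pairwise alignment of client updates relative to the
# previous global model; negative cosine values suggest destructive interference.
import itertools
import torch

def flatten_params(model):
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def client_update_vectors(client_models, previous_global):
    """Flattened parameter deltas each client would contribute this round."""
    g = flatten_params(previous_global)
    return [flatten_params(m) - g for m in client_models]

def interference_matrix(deltas):
    """Cosine similarity for every pair of client updates."""
    scores = {}
    for i, j in itertools.combinations(range(len(deltas)), 2):
        cos = torch.nn.functional.cosine_similarity(deltas[i], deltas[j], dim=0)
        scores[(i, j)] = cos.item()
    return scores
```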

Intersection-over-Union (IoU) scores provide quantitative evidence of circuit divergence in non-Independent and Identically Distributed (non-IID) federated learning scenarios. Low IoU scores indicate minimal overlap between the functional sub-networks – or ‘circuits’ – developed on different clients. This signifies that clients are learning substantially different representations despite being trained on a related task, demonstrating a lack of convergence in the learned feature spaces. The magnitude of divergence, as measured by IoU, is directly correlated with the degree of non-IID data distribution across clients; more extreme non-IID settings consistently yield lower IoU scores, confirming that data heterogeneity drives circuit divergence.
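
The IoU measure itself is straightforward once circuits are represented as binary masks over a shared parameter space, an assumption made here purely for illustration; the paper’s exact circuit-extraction procedure is not reproduced.

```python
# Illustrative sketch: IoU between circuit masks, plus the average over all
# client pairs used as a simple divergence indicator.
import torch

def circuit_iou(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Intersection-over-Union between two boolean parameter masks."""
    a, b = mask_a.bool(), mask_b.bool()
    union = (a | b).sum().item()
    return (a & b).sum().item() / union if union else 1.0

def average_pairwise_iou(client_masks):
    """Mean IoU over all client pairs; low values signal circuit divergence."""
    scores = [
        circuit_iou(client_masks[i], client_masks[j])
        for i in range(len(client_masks))
        for j in range(i + 1, len(client_masks))
    ]
    return sum(scores) / len(scores) if scores else float("nan")
```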

Maintaining circuit integrity – the preservation of functional sub-networks within a federated learning model – is critical for stable training because degradation of these circuits directly correlates with weight divergence. Observed phenomena, including circuit collapse and resulting low Intersection-over-Union (IoU) scores in non-IID settings, indicate that structural drift and destructive interference between circuits negatively impact model convergence. Preventing the loss of functional connectivity within these circuits, therefore, is a primary factor in mitigating weight divergence and ensuring consistent performance across participating clients during federated learning.

Despite a two-class-per-client non-IID setting, the observed structural collapse persists, indicating that local circuits are not preserved when aggregating models globally.

Deconstructing the Black Box: A Circuit-Level Analysis

Circuit analysis in neural networks involves identifying the smallest possible subsets of network parameters – termed ‘circuits’ – that are sufficient to perform a defined computation. This approach moves beyond treating neural networks as monolithic entities and instead focuses on decomposing them into functionally interpretable components. The core principle is that complex behaviors emerge not from the interaction of all parameters, but from the coordinated activity of these minimal circuits. Identifying these circuits requires techniques capable of isolating and characterizing parameter contributions, allowing researchers to understand how a network achieves a specific outcome rather than simply observing that it does. The size of a circuit is typically measured by the number of parameters it contains; smaller circuits indicate more efficient and potentially more robust computations.

L0 regularization and the Heaviside step function are utilized to induce sparsity in neural network activations, thereby facilitating the identification of minimal, functional circuits. L0 regularization directly penalizes the number of non-zero parameters, encouraging the network to rely on a small subset for computation. The Heaviside step function, H(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} , introduces a hard threshold, effectively binarizing activations and further promoting sparsity. By applying these methods, researchers aim to isolate the most critical connections and neurons responsible for specific computations within the network, simplifying analysis and improving interpretability.

The Straight-Through Estimator (STE) addresses the challenge of training neural networks with discontinuous functions, such as those introduced by binarization or thresholding used in circuit discovery. During backpropagation, the STE approximates the derivative of a discontinuous function as if it were simply the identity function. This allows gradients to flow through the non-differentiable operation, enabling optimization via gradient descent. While mathematically inaccurate, the STE provides a computationally efficient and practically effective method for learning sparse circuits defined by these discontinuous functions, despite the lack of a true gradient signal.
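
Taken together, the Heaviside gate, the L0-style penalty, and the straight-through estimator can be sketched in a few lines of PyTorch. The snippet below is one plausible reading of the technique, using a sigmoid surrogate for both the gradient path and the expected-L0 penalty; it is not the paper’s implementation.

```python
# Illustrative sketch: per-weight binary gates with a Heaviside forward pass,
# a straight-through (sigmoid) backward pass, and a differentiable L0 surrogate.
import torch
import torch.nn as nn

class HardGate(nn.Module):
    def __init__(self, shape):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(shape))

    def forward(self):
        hard = (self.logits >= 0).float()   # Heaviside step H(logits)
        soft = torch.sigmoid(self.logits)   # differentiable surrogate
        # Straight-through trick: forward value equals `hard`, gradients flow
        # through `soft` as if the step function were smooth.
        return hard + (soft - soft.detach())

    def l0_penalty(self):
        """Expected number of open gates, used as a differentiable L0 term."""
        return torch.sigmoid(self.logits).sum()

# Hypothetical usage: mask a weight tensor and add the penalty to the task loss.
gate = HardGate((64, 128))
weight = nn.Parameter(torch.randn(64, 128))
masked_weight = weight * gate()
# loss = task_loss + lambda_l0 * gate.l0_penalty()
```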

Circuit analysis techniques, when applied to federated learning models, enable the monitoring of individual circuit integrity across decentralized clients. This involves identifying and tracking the minimal parameter sets responsible for specific computations on each client. Deviations from expected circuit behavior – such as parameter drift, functional redundancy, or the emergence of non-functional parameters – can indicate circuit collapse or model degradation. Monitoring these metrics allows for the detection of clients exhibiting compromised circuits, potentially due to data poisoning, adversarial attacks, or local model drift, and facilitates interventions like model repair, client isolation, or re-training to maintain the overall robustness and reliability of the federated model.
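
A minimal version of such monitoring might compare each client’s circuit mask against the global circuit every round and flag clients whose overlap falls below a chosen cut-off. The sketch below assumes binary masks; the 0.3 threshold echoes the IoU levels reported later in this article but is otherwise an arbitrary choice.

```python
# Illustrative sketch: flag clients whose circuits have drifted away from the
# global circuit, as a simple proxy for detecting circuit collapse.
import torch

def mask_iou(a: torch.Tensor, b: torch.Tensor) -> float:
    a, b = a.bool(), b.bool()
    union = (a | b).sum().item()
    return (a & b).sum().item() / union if union else 1.0

def flag_collapsing_clients(client_masks, global_mask, threshold: float = 0.3):
    """Return indices of clients whose circuit overlap with the global circuit
    falls below the threshold; candidates for repair, isolation, or retraining."""
    return [
        idx for idx, mask in enumerate(client_masks)
        if mask_iou(mask, global_mask) < threshold
    ]
```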

Applying a client’s local circuit mask to the global model preserves functionality by achieving near-perfect accuracy on the client’s specialized classes, even in a non-IID, 2-class-per-client setting.

The Architecture of Efficiency: Sparsity and the Neural Network

Convolutional Neural Networks, frequently demonstrated using datasets like the ‘MNIST Dataset’ of handwritten digits, rely on specific architectural components to effectively process information. These networks commonly employ ‘Max-Pooling’ layers, which reduce the spatial size of the representation to decrease computational load and control overfitting, alongside ‘ReLU Activation’ functions – Rectified Linear Units – that introduce non-linearity allowing the network to learn complex patterns. The combination of these techniques enables convolutional networks to extract hierarchical features from input data, identifying edges, textures, and ultimately, the objects or patterns within the images. This foundational approach has proven remarkably effective in a wide range of image recognition tasks, and serves as a cornerstone of modern computer vision systems.
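
A minimal network of this kind, with convolution, ReLU activations, and max-pooling sized for MNIST’s 28x28 grayscale digits, might look like the following sketch; the layer sizes are illustrative rather than those used in the paper.

```python
# Illustrative sketch: a small MNIST-style CNN with ReLU and max-pooling.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Sanity check on a dummy batch of MNIST-sized inputs: output shape (4, 10).
logits = SmallConvNet()(torch.randn(4, 1, 28, 28))
```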

Convolutional neural networks, while powerful, often contain redundant parameters that contribute to high computational demands. Neural network pruning addresses this by systematically removing connections and neurons with minimal impact on performance, thereby achieving sparsity. This reduction in complexity isn’t merely about speed; a sparser network generally requires less data to train, improving its ability to generalize to unseen examples and reducing the risk of overfitting. The process involves identifying and eliminating less important weights, often based on magnitude, and then retraining the remaining network to recover any lost accuracy. Consequently, sparse networks offer a compelling pathway toward deploying complex models on resource-constrained devices and enhancing their robustness in real-world applications.
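
One-shot magnitude pruning, the simplest form of this idea, can be sketched as follows: rank all weights by absolute value, zero out the smallest fraction, and keep the resulting binary masks. This is a generic illustration, not the sparsification scheme used in the study.

```python
# Illustrative sketch: one-shot global magnitude pruning.
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float = 0.8):
    """Zero out the smallest-magnitude weights across the model and return the
    binary masks indicating which connections survive."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # skip biases
    all_magnitudes = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(all_magnitudes, sparsity)
    masks = []
    with torch.no_grad():
        for w in weights:
            mask = (w.abs() > threshold).float()
            w.mul_(mask)       # prune in place
            masks.append(mask)
    return masks
```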

Neural network sparsity doesn’t merely reduce computational load; it fundamentally alters the internal representations within the network, specifically diminishing the phenomenon of neuronal superposition. Superposition, where a single neuron responds to multiple, often conflicting, features, can create ambiguity and hinder reliable processing. By pruning redundant connections and activating only a select subset of neurons, sparsity forces each remaining neuron to specialize, responding to a narrower, more defined input. This specialization increases the clarity of the signal flowing through the network, making it easier to discern which features are driving a particular decision. Consequently, the resulting circuits become more interpretable – researchers can more readily understand the function of individual neurons – and more robust, as the network is less susceptible to noise or irrelevant inputs that might otherwise trigger spurious activations and degrade performance.

The Lottery Ticket Hypothesis proposes a surprising characteristic of neural network training: within a randomly initialized, densely connected network, there exist sparse sub-networks – termed ‘winning tickets’ – capable of achieving performance comparable to the original, much larger network. This isn’t simply a matter of finding a smaller network that works as well; rather, these winning tickets can be identified within the initial weights, before any training has occurred, and then trained in isolation. Researchers demonstrate this by iteratively pruning weights from a network, retraining the remaining connections, and repeating the process, ultimately revealing these highly effective sparse structures. This discovery shifts the focus from optimizing the entire network to identifying and cultivating these inherent, efficient sub-networks, suggesting a pathway toward significantly reducing computational demands and potentially unlocking more interpretable and robust artificial intelligence systems – effectively optimizing the circuit itself rather than just the weights within it.
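
The procedure described above (train, prune a fraction of the smallest surviving weights, rewind the survivors to their initial values, repeat) can be sketched as follows. The train_fn callback is a hypothetical stand-in for whatever training loop is in use; this is an outline of iterative magnitude pruning, not the paper’s experimental code.

```python
# Illustrative sketch: iterative magnitude pruning with weight rewinding, in the
# spirit of the Lottery Ticket Hypothesis.
import copy
import torch

def find_winning_ticket(model, train_fn, rounds: int = 3, prune_frac: float = 0.2):
    """Return binary masks describing a sparse sub-network of `model`.
    `train_fn(model)` is assumed to train the model in place."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name not in masks:
                    continue
                alive = p.abs()[masks[name].bool()]          # still-active weights
                cutoff = torch.quantile(alive, prune_frac)   # prune the smallest
                masks[name] *= (p.abs() > cutoff).float()
            # Rewind surviving weights to their original initialization.
            for name, p in model.named_parameters():
                if name in masks:
                    p.copy_(init_state[name] * masks[name])
    return masks
```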

Increasing sparsity during training demonstrably improves functional preservation, as evidenced by consistently higher cross-evaluation accuracy across different sparsity levels in the 2-class Non-IID experiment.

Beyond Metrics: Charting a Course for Robust Federated Learning

Intersection-over-Union (IoU) serves as a powerful quantitative tool for assessing the structural similarity between neural network circuits during federated learning. This metric, borrowed from object detection tasks, calculates the overlap between the circuits represented by individual clients and the globally aggregated model. Essentially, IoU determines the degree to which the learned features align across different participants. A high IoU score signifies strong consistency – the client’s circuit closely resembles the global one – while a low score suggests substantial divergence in the learned representations. By providing a direct measure of this structural overlap, IoU offers a valuable lens through which to understand the health and stability of a federated learning system, enabling researchers to pinpoint potential issues like model collapse and guide strategies for improved generalization and robustness.

When data is Independently and Identically Distributed (IID) across clients in a federated learning system, the resulting circuits – representing the learned model’s structure – exhibit remarkably high consistency. Quantitative analysis using Intersection-over-Union (IoU) demonstrates this, consistently yielding scores near 1.0. This signifies that the overlapping areas between the circuits representing learned features are almost complete, implying nearly identical feature learning across all clients. Essentially, each client learns a model structure almost identical to the collective knowledge, suggesting effective knowledge sharing and a stable training process. The high IoU in IID scenarios serves as a baseline for evaluating the impact of data heterogeneity and identifying potential issues, like model collapse, that may arise in more complex, non-IID settings.

When examining federated learning scenarios where data is unevenly distributed – specifically, in 1-Class Non-IID settings – a pronounced divergence in circuit structure emerges between local client models and the globally aggregated model. Quantitative analysis reveals a substantial decrease in Intersection-over-Union (IoU) scores, frequently falling below a threshold of 0.3. This signifies that the overlapping areas between the circuits representing learned features are minimal, implying substantially different feature representations. Such a low IoU indicates that individual clients are developing highly specialized, and often incompatible, feature representations, hindering the global model’s ability to effectively generalize across the entire dataset and potentially leading to performance degradation or model collapse.

Monitoring circuit integrity during federated learning offers a proactive approach to preventing model collapse, a phenomenon where individual client models diverge significantly from the global model. Utilizing metrics such as Intersection-over-Union (IoU) allows for quantitative assessment of structural similarity between client circuits and the aggregated global circuit; a declining IoU score signals increasing divergence and potential collapse. This continuous monitoring enables timely intervention, such as adjusting learning rates or implementing regularization techniques, to encourage convergence and maintain a robust global model. By actively tracking these indicators of circuit health, federated learning systems can move beyond reactive troubleshooting and embrace a preventative strategy, ensuring stable and reliable performance even in challenging, non-IID data environments.

Specialist clients, when trained on non-IID data with one class per client, learn structurally disjoint sub-networks as evidenced by their low average Intersection over Union (IoU).

The pursuit of robustness in federated learning, as demonstrated by this analysis of circuit preservation, mirrors a fundamental principle of systems exploration. The paper reveals how non-IID data induces ‘circuit collapse’ – a structural failure within the model – and proposes sparsity as a corrective measure. This echoes Grace Hopper’s sentiment: “It’s easier to ask forgiveness than it is to get permission.” One doesn’t passively accept a system’s limitations; instead, one deliberately introduces controlled disruption – in this case, sparsity – to test the boundaries and reveal underlying vulnerabilities. The resulting divergence of weights, while seemingly chaotic, ultimately provides insight into the network’s operational logic and potential for resilience. It’s a beautiful demonstration of reverse-engineering through induced failure.

Beyond the Black Box

The observation that federated learning’s vulnerabilities stem from structural disintegration – a ‘circuit collapse’ – feels less like a discovery and more like a confirmation of predictable systemic failure. Complex systems rarely degrade gracefully; they unravel. The immediate mitigation – inducing sparsity – functions as a crude form of structural engineering, bracing the network against the stresses of non-IID data. But bracing isn’t understanding. Future work must move beyond symptom management and towards a truly mechanistic interpretation of these circuits. What functional roles are lost first? What specific data pathologies trigger the initial cascade of failures?

A curious point arises from the reliance on sparsity: is the ‘ideal’ federated model simply a minimal, robust circuit, stripped of all but the essential connections? Or is this a forced adaptation, a compromise imposed by the limitations of the current paradigm? The focus should shift towards methods for preventing collapse, perhaps through novel regularization techniques that actively maintain structural integrity, or by developing training protocols that promote the emergence of inherently robust architectures.

Ultimately, the field must acknowledge the uncomfortable truth: federated learning, as currently practiced, is a process of controlled demolition. The goal isn’t to build a perfect global model, but to coax a fragile structure through a hostile environment. The real challenge lies in reverse-engineering the rules governing this structural failure – and then, perhaps, breaking them.


Original article: https://arxiv.org/pdf/2512.23043.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
