Securing the Internet of Things: A Federated Learning Defense

Author: Denis Avetisyan


This review explores how federated learning algorithms can bolster IoT security against evolving cyberattacks, even with fragmented data.

The analysis quantifies the distribution of data points across various attack types, demonstrating the relative frequency with which each vulnerability is observed within the dataset.

Performance comparisons of FedAvg, FedProx, and Scaffold reveal FedProx to be the most effective approach for intrusion detection systems operating under non-IID data distributions.

Despite the increasing sophistication of intrusion detection systems, securing the Internet of Things remains challenging due to data privacy concerns and the distributed nature of IoT devices. This paper, ‘A Robust Federated Learning Approach for Combating Attacks Against IoT Systems Under non-IID Challenges’, investigates the performance of several Federated Learning algorithms (FedAvg, FedProx, and Scaffold) in detecting IoT attacks under realistic, non-independent and identically distributed (non-IID) data conditions. Our analysis of the CICIoT2023 dataset reveals that FedProx consistently outperforms other methods when dealing with statistically heterogeneous data distributions. Will these findings pave the way for more resilient and privacy-preserving IoT security solutions?


The Escalating Cyber Threat Landscape

Contemporary digital networks are battling a relentless escalation in cyberattack complexity. While early threats often consisted of relatively simple denial-of-service (DoS) attacks – overwhelming systems with traffic – modern adversaries now deploy far more intricate strategies. The emergence of botnets like Mirai exemplifies this shift, leveraging compromised Internet of Things (IoT) devices to launch massive, distributed attacks capable of disrupting critical infrastructure and services. These intrusions are no longer limited to mere disruption; attackers now aim to steal sensitive data, hold systems for ransom, or even manipulate industrial processes. The sophistication extends to polymorphic malware, capable of changing its code to evade detection, and advanced persistent threats (APTs), which establish long-term footholds within networks to conduct espionage or sabotage. This constant evolution demands a proactive and adaptive security posture, moving beyond traditional signature-based defenses to embrace behavioral analysis and machine learning-driven threat detection.

Conventional intrusion detection systems, designed for a comparatively simpler digital landscape, are increasingly overwhelmed by the sheer scale and diversity of modern cyberattacks. These systems often rely on signature-based detection, proving ineffective against zero-day exploits and polymorphic malware that constantly alter their characteristics. Furthermore, the escalating volume of network traffic generates a high rate of false positives, requiring significant manual intervention and obscuring genuine threats. This inability to effectively differentiate between benign activity and malicious intent creates critical vulnerabilities, leaving networks exposed to prolonged attacks and potential data breaches. The limitations of these traditional approaches highlight the urgent need for more intelligent, adaptive security solutions capable of analyzing network behavior in real-time and proactively mitigating emerging threats.

The exponential growth of Internet of Things (IoT) devices introduces a dramatically expanded attack surface for malicious actors. Unlike traditional computing devices with established security protocols, many IoT devices are resource-constrained and lack robust security features, often shipping with default passwords or unpatched vulnerabilities. This creates a fertile ground for botnets, denial-of-service attacks, and data breaches, as compromised devices can be easily co-opted and utilized in large-scale intrusions. Consequently, current security solutions, designed for conventional networks, are proving inadequate; scalable and adaptive security measures, including advanced threat detection, intrusion prevention, and secure device management, are urgently needed to mitigate the escalating risks posed by this interconnected web of vulnerable devices and to ensure the continued reliability and safety of critical infrastructure and personal data.

Effective cybersecurity research and development increasingly relies on comprehensive datasets that mirror real-world network traffic, and the CICIoT2023 dataset stands as a significant contribution in this area. This large-scale compilation captures a diverse range of malicious activities targeting Internet of Things devices, offering a realistic and challenging benchmark for evaluating the performance of intrusion detection and prevention systems. Unlike synthetic or limited datasets, CICIoT2023 encompasses a broad spectrum of attack vectors – including botnet activity, data exfiltration, and denial-of-service attacks – all generated within a controlled emulation of a smart home environment. This allows security professionals and researchers to test and refine their algorithms against authentic, contemporary threats, fostering innovation in scalable and adaptable security solutions crucial for protecting the rapidly expanding landscape of connected devices. The dataset’s detailed labeling and comprehensive feature sets further enable the development of machine learning models capable of accurately identifying and mitigating these evolving cyber risks.

The number of data points varies across different attack types, indicating varying levels of data required for analysis.

Decentralized Security: The Promise of Federated Learning

Federated Learning (FL) is a distributed machine learning approach that allows for model training on a multitude of decentralized devices or servers holding local data samples, without requiring the explicit exchange of those data samples. This is achieved by training models locally on each device, then aggregating only model updates – such as gradients or model weights – to a central server. This process inherently enhances data privacy, as raw data remains on the client devices. Furthermore, reducing the need to transfer large datasets significantly lowers bandwidth requirements and associated communication costs, making FL particularly suitable for applications involving edge devices and large-scale, geographically distributed data.

Data partitioning is the initial step in federated learning, involving the division of a centralized dataset into subsets which are then distributed across multiple client devices or servers. Each client receives a portion of the data, effectively creating a decentralized data landscape. This distribution is crucial as it enables local model training directly on the client’s data, avoiding the need to transfer raw data to a central server. The specific method of partitioning – whether random, stratified, or based on data characteristics – impacts the subsequent training process and overall model performance. The size and characteristics of each partition can vary, but the collective partitions represent the complete dataset used for model development.

Federated Averaging (FedAvg) is a prevalent algorithm in federated learning that facilitates the aggregation of model updates from decentralized clients. The process involves each client training a local model on its respective dataset and subsequently transmitting the model weights or gradients to a central server. The server then computes a weighted average of these updates, where the weights are typically proportional to the size of each client’s dataset. This aggregated model represents the global model, which is then redistributed to the clients for the next round of training. By only exchanging model parameters, rather than raw data, FedAvg enhances data privacy and reduces communication costs. The efficiency of FedAvg is further improved by performing multiple local training epochs on each client before averaging, decreasing the frequency of communication with the central server.
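The weighted averaging at the heart of FedAvg can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name `fedavg_aggregate` and the flattened parameter vectors are assumptions for clarity.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client model parameters (the FedAvg server step).

    client_weights: list of flattened parameter vectors, one per client.
    client_sizes: number of local training samples held by each client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()         # weights proportional to dataset size
    stacked = np.stack(client_weights)   # shape: (n_clients, n_params)
    return (coeffs[:, None] * stacked).sum(axis=0)

# Two clients: one holding 300 samples, one holding 100, so the first
# client's parameters contribute three times as much to the global model.
w_global = fedavg_aggregate(
    [np.array([1.0, 2.0]), np.array([3.0, 6.0])],
    [300, 100],
)
# → array([1.5, 3.0])
```

In a full training loop, `w_global` would then be broadcast back to the clients for the next round of local epochs.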

The efficacy of Federated Learning (FL) is significantly impacted by the statistical properties of the data distributed across clients. When data is Independently and Identically Distributed (IID), each client’s dataset shares the same underlying distribution, simplifying model aggregation and convergence. Conversely, Non-IID data, characterized by differing distributions across clients – a common scenario in real-world applications – introduces challenges such as model drift and slower convergence rates. This data heterogeneity necessitates specialized algorithms and techniques, including client selection strategies, personalized model updates, and robust aggregation methods, to mitigate performance degradation and ensure the global model accurately represents the overall data distribution. The degree of Non-IIDness, quantified by metrics such as Dirichlet distribution parameters, directly correlates with the complexity of training and the potential for reduced model accuracy.
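A common way to simulate the Non-IID setting described above is Dirichlet-based label partitioning, where a concentration parameter controls how skewed each client's class mix is. The sketch below is a generic illustration of that technique (the helper name `dirichlet_partition` is an assumption, not an API from the paper):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Small alpha -> strongly Non-IID (each client sees few classes);
    large alpha -> approximately IID.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

labels = np.repeat([0, 1, 2], 100)          # toy dataset: 3 classes, 100 each
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
assert sum(len(p) for p in parts) == len(labels)  # every sample assigned once
```

With `alpha=0.1`, most clients end up dominated by one or two classes, reproducing the model-drift conditions discussed above; raising `alpha` toward large values recovers near-uniform partitions.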

Clients independently prepare their local data before model training.

Taming the Beast: Addressing Data Heterogeneity in FL

Many real-world datasets, including those utilized in cybersecurity applications like cyberattack analysis, exhibit Non-Independent and Identically Distributed (Non-IID) data distributions. This characteristic means that data samples across different clients or devices are not representative of a single, uniform distribution; instead, they vary significantly in terms of features, class labels, or quantity. Consequently, machine learning models trained on these Non-IID datasets can suffer from biased parameter updates during federated learning, leading to reduced generalization performance and lower overall accuracy compared to scenarios with IID data. The skew in data distribution can cause the global model to favor patterns prevalent in certain clients, hindering its ability to effectively analyze or predict outcomes across the entire dataset.

FedProx and Scaffold address statistical heterogeneity in federated learning through distinct algorithmic modifications. FedProx introduces a proximal term to the local objective function, effectively adding a regularization component that penalizes deviations from the global model, thereby constraining local updates and promoting convergence even with non-IID data. Scaffold, conversely, employs control variates – auxiliary variables used to reduce the variance of the stochastic gradients – to correct for the drift caused by client-level data distributions. Specifically, these control variates estimate the difference between local and global gradients, allowing for a more accurate aggregation of updates. Both approaches aim to stabilize the training process and improve model generalization by reducing the impact of disparate data distributions across clients.
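FedProx's modification is compact enough to show directly: the proximal term $\frac{\mu}{2}\|w - w_{global}\|^2$ adds a gradient component $\mu (w - w_{global})$ that pulls each local update back toward the global model. The following is a minimal sketch of one local SGD step under that objective, assuming flattened parameter vectors:

```python
import numpy as np

def fedprox_local_step(w_local, w_global, grad, mu, lr):
    """One SGD step on the FedProx local objective.

    Local loss: F_k(w) + (mu / 2) * ||w - w_global||^2, so the update
    adds mu * (w - w_global) to the task gradient, constraining how far
    a client's model can drift from the global model.
    """
    proximal_grad = mu * (w_local - w_global)
    return w_local - lr * (grad + proximal_grad)

# With mu = 0 this reduces to the plain local SGD step used by FedAvg.
w = fedprox_local_step(
    w_local=np.array([2.0]), w_global=np.array([0.0]),
    grad=np.array([1.0]), mu=0.04, lr=0.1,
)
# → array([1.892])
```

The `mu=0.04` here mirrors the value reported in the experiments below; larger values tighten the pull toward the global model at the cost of slower local adaptation.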

Rigorous experimentation has demonstrated the efficacy of advanced federated learning algorithms in addressing non-IID data distributions. Specifically, the Scaffold algorithm achieved an accuracy of 96.16% when trained on independently and identically distributed (IID) data. In contrast, the standard FedAvg algorithm exhibited significantly reduced performance under non-IID conditions, achieving only 28.88% accuracy. This substantial disparity highlights the detrimental impact of statistical heterogeneity on model generalization and the necessity of employing specialized algorithms like Scaffold to maintain performance in realistic, decentralized data environments.

Model training within federated learning environments employs optimization algorithms such as Stochastic Gradient Descent (SGD) and evaluates performance using loss functions, commonly Cross-Entropy Loss, to iteratively adjust model parameters. Comparative analysis demonstrates the efficacy of algorithms designed to address data heterogeneity; specifically, utilizing FedProx with a $\mu$ value of 0.04 resulted in a 71.88% accuracy rate under Non-IID data conditions, a substantial improvement over the 28.88% achieved by FedAvg under the same conditions. This performance gain is further substantiated by the associated loss values: FedProx registered a loss of 1.10, compared to the significantly higher loss of 12.54 recorded by FedAvg.
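The loss values quoted above come from Cross-Entropy Loss. As a minimal numpy sketch of how it is computed from raw model logits (a generic illustration, not the paper's training code):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy loss over a batch, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# A confident, correct prediction drives the loss toward 0, while a
# model that assigns uniform logits over n classes incurs loss log(n).
uniform = np.zeros((1, 10))
assert np.isclose(cross_entropy(uniform, np.array([0])), np.log(10))
```

Seen through this lens, FedAvg's loss of 12.54 under Non-IID data indicates predictions far worse than uniform guessing on some clients, while FedProx's 1.10 reflects a model that has actually learned the class structure.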

Our approach outperforms conventional methods when applied to non-IID data, demonstrating improved global performance.

The Future of Defense: FL and the Evolving Cybersecurity Landscape

Federated Learning represents a significant step forward in the development of intrusion detection systems, moving beyond the limitations of traditional centralized approaches. This distributed machine learning technique allows models to be trained across a network of devices or servers holding local data samples – without requiring the exchange of that data. This is particularly critical in cybersecurity, where data is often sensitive and subject to strict privacy regulations. The inherent adaptability of Federated Learning stems from its ability to continuously refine models using new, locally-sourced data, enabling the identification of emerging threat patterns and zero-day exploits. Unlike static, signature-based systems, a Federated Learning-powered intrusion detection system can evolve in real-time alongside the ever-changing landscape of cyberattacks, offering a more resilient and proactive defense against sophisticated threats. The decentralized nature also mitigates single points of failure, bolstering the overall robustness of the security infrastructure.

Federated Learning offers a novel approach to bolstering cybersecurity defenses against increasingly complex threats. Traditional intrusion detection systems often struggle with the nuances of attacks like spoofing and reconnaissance due to limitations in training data diversity; these systems are frequently trained on centralized datasets that fail to capture the full spectrum of malicious activity. However, FL circumvents this issue by training models across a decentralized network of devices and data sources, each contributing local data without directly sharing it. This distributed approach allows the system to learn from a far wider range of attack patterns and network behaviors, significantly improving its ability to identify subtle indicators of compromise. By aggregating insights from diverse environments, FL-powered systems can more accurately detect anomalous activity indicative of spoofing attempts (where attackers disguise their identity) and reconnaissance attacks (where adversaries gather information to plan further exploits), ultimately providing a more robust and adaptable defense against evolving cyber threats.

Significant advancements in federated learning (FL) for cybersecurity necessitate continued refinement of algorithms tailored to specific applications, such as intrusion detection and malware analysis. Current research focuses on minimizing communication costs and maximizing model accuracy when dealing with the unique characteristics of network data. A critical area of investigation involves bolstering data privacy; techniques like differential privacy and secure multi-party computation are being explored to prevent sensitive information leakage during model training. Furthermore, addressing the inherent vulnerabilities of FL systems themselves – including potential poisoning attacks where malicious actors manipulate local models – is paramount. Successfully navigating these challenges will not only enhance the effectiveness of FL-driven cybersecurity solutions but also build trust in their deployment across critical infrastructure and sensitive data environments.

Realizing the transformative potential of Federated Learning (FL) for cybersecurity hinges on overcoming the inherent challenges posed by data heterogeneity. Critical infrastructure and expansive networks generate data with vast differences in format, volume, and quality – stemming from diverse sensor types, varying operational conditions, and disparate logging practices. Consequently, effective FL systems require sophisticated techniques to normalize and harmonize this data without compromising privacy or introducing bias. Ongoing research focuses on developing robust aggregation algorithms and personalized model training strategies that can accommodate these variations, allowing FL to accurately identify anomalies and threats across heterogeneous environments. Successfully addressing data heterogeneity will not only improve the performance of intrusion detection systems, but also enable FL to protect a wider range of critical assets and networks against increasingly complex cyberattacks.

The pursuit of elegant solutions in distributed systems invariably collides with the messiness of production realities. This paper’s exploration of Federated Learning algorithms – FedAvg, FedProx, and Scaffold – highlights a familiar pattern. While each approach aims to overcome the challenges of non-IID data, the finding that FedProx offers the most robust performance feels less like a triumph of theory and more like a pragmatic acknowledgment of data’s inherent inconsistencies. As Barbara Liskov once stated, “It’s one thing to program something; it’s another thing to build a system that will last.” The system will always find a way to expose the cracks in even the most carefully constructed abstractions. The claim of overcoming statistical heterogeneity is ambitious; the fact that FedProx simply mitigates the impact feels… realistic. If all the tests pass, it’s probably because they aren’t testing the edge cases production will inevitably throw at it.

What’s Next?

The demonstrated advantage of FedProx under non-IID conditions feels less like a triumph and more like a temporary stay of execution. Statistical heterogeneity, after all, isn’t a bug to be solved, but a feature of every production deployment. The CICIoT2023 dataset, while valuable, represents a controlled fracture of reality. The true test will come when models encounter distributions that weren’t merely sampled differently, but actively evolve to evade detection. Everything optimized will one day be optimized back.

The aggregation strategies examined here – averaging, proximity regularization – address the symptoms of data disparity, not the underlying condition. Future work will likely focus less on clever averaging and more on mechanisms for active data sharing – or, more realistically, carefully controlled leakage – between devices. The architecture isn’t a diagram; it’s a compromise that survived deployment. Expect to see increasingly sophisticated differential privacy techniques grafted onto these frameworks, acknowledging that perfect security is a mathematical ideal, not an engineering goal.

Ultimately, the field chases a moving target. The very act of securing IoT systems creates new attack surfaces. The focus isn’t merely on building intrusion detection systems, but on constructing systems resilient enough to absorb, adapt, and even learn from inevitable compromise. The code doesn’t get refactored – it gets resuscitated.


Original article: https://arxiv.org/pdf/2511.16822.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-11-24 21:49