Securing the Algorithm: The Future of Cyber Threat Intelligence

Author: Denis Avetisyan

As artificial intelligence systems become increasingly integrated into critical infrastructure, traditional cybersecurity approaches are proving inadequate, demanding a fundamental shift in how we detect and respond to emerging threats.

Indicators of compromise are derived from NSFOCUS threat intelligence [21], providing a foundation for identifying and mitigating malicious activity.

This review examines the necessary adaptations to cyber threat intelligence practices to address the unique vulnerabilities of AI systems, including adversarial attacks, data poisoning, and the need for novel indicators of compromise.

While conventional cyber defenses struggle with the novel attack surfaces of increasingly pervasive artificial intelligence, a proactive threat intelligence approach is essential for mitigating emerging risks. This paper, ‘Cyber Threat Intelligence for Artificial Intelligence Systems’, investigates how to adapt cyber threat intelligence (CTI) practices to address the unique vulnerabilities of AI systems, outlining necessary indicators of compromise across the AI supply chain. The core finding is that a tailored CTI knowledge base-incorporating AI-specific artifacts and vulnerabilities-is crucial for supporting effective security tooling and response. Given the rapidly evolving landscape of adversarial attacks-including model poisoning and prompt injection-how can we build a robust and scalable CTI framework to proactively defend against threats targeting artificial intelligence?

The Evolving Threat Landscape: A System Under Stress

Machine learning systems, while offering transformative capabilities, are becoming focal points for increasingly complex and targeted attacks. These are not simply traditional cyberattacks adapted for a new domain; instead, adversaries are crafting exploits specifically designed to manipulate the learning process itself. Techniques range from data poisoning – subtly corrupting the training data to induce desired misclassifications – to adversarial examples, where carefully crafted inputs, imperceptible to humans, can cause a model to fail. The growing sophistication stems from the high stakes – compromised AI can impact critical infrastructure, financial markets, and even national security – and the potential for automated, scalable attacks. This shift necessitates a proactive security posture, moving beyond reactive defenses to anticipate and mitigate these novel threats before they can fully materialize.

Conventional cybersecurity strategies, designed to defend against established attack vectors, are increasingly ineffective when applied to the unique vulnerabilities of artificial intelligence systems. These systems, built on complex algorithms and vast datasets, present a moving target for malicious actors who exploit weaknesses in model training, data poisoning, or adversarial examples. Consequently, a specialized field of AI Threat Intelligence has emerged, focusing on proactively identifying, analyzing, and mitigating risks specific to machine learning models. This involves developing novel techniques for monitoring model behavior, detecting anomalous inputs, and building resilient AI systems capable of withstanding sophisticated attacks – a departure from traditional signature-based detection and firewall defenses.

The escalating sophistication of Large Language Models (LLMs), while driving innovation, simultaneously broadens the avenues for malicious actors. These models, with their intricate architectures and vast parameter spaces, present novel attack surfaces beyond those addressed by conventional cybersecurity measures. Exploits aren’t limited to data breaches; they now encompass prompt injection – manipulating the model’s output – and model stealing, where the intellectual property embedded within the LLM is compromised. Data from the AI Incident Database (AIID) underscores this growing threat; as of March 2026, the database records 5499 reported incidents, representing 1366 unique attacks against AI systems. This substantial increase demonstrates that successful exploits aren’t isolated events, but rather a consistently expanding pattern of vulnerability, demanding proactive and specialized AI threat intelligence to mitigate the risks associated with increasingly complex models.

The number of reported AI incidents in the AI Incident Database is projected to increase from 2015 to 2025 [18].

Deconstructing the Assault: Understanding Attack Vectors

Adversarial examples and data poisoning attacks represent critical vulnerabilities in machine learning systems. Adversarial examples are subtly modified inputs designed to cause a model to misclassify data, often with high confidence; these modifications are typically imperceptible to humans. Data poisoning, conversely, targets the training phase, introducing malicious or corrupted data into the training dataset. This can lead to the model learning incorrect patterns or exhibiting biased behavior. Both attack vectors exploit the reliance of AI models on statistical correlations within the data; adversarial examples manipulate inputs to leverage these correlations in unintended ways, while data poisoning alters the underlying statistical landscape of the training data itself. The impact ranges from image recognition failures to compromised decision-making in critical applications.

Prompt injection attacks represent a vulnerability specific to Large Language Models (LLMs) where malicious actors craft input prompts designed to override the model’s intended behavior and generate unintended outputs. Unlike traditional input validation failures, these attacks don’t exploit coding errors but rather leverage the LLM’s natural language processing capabilities against itself. Attackers can bypass safety protocols, extract sensitive information from the model’s training data, or compel the model to perform actions it was not designed for, such as disseminating misinformation or executing commands. Successful prompt injections often rely on techniques like prompt leaking-where the model reveals its underlying instructions-or instruction following, where the malicious prompt is disguised as a legitimate request that redirects the model’s focus.

Model backdoors represent a stealthy attack vector where malicious code is embedded directly into the parameters – the weights – of a trained AI model. This is achieved by manipulating the training data or the training process itself, creating specific patterns the model learns to associate with attacker-defined triggers. These triggers, often subtle and seemingly innocuous inputs, cause the model to deviate from its intended function and execute the attacker’s desired behavior, such as misclassification or data exfiltration. Crucially, these backdoors persist even after retraining or redeployment, remaining latent until activated by the specific trigger, and are difficult to detect through conventional security measures focused on input validation or runtime monitoring.

The AVID taxonomy's SEP and Lifecycle views provide complementary perspectives on the diverse range of risks present throughout an AI development workflow. — The AVID taxonomy’s SEP and Lifecycle views provide complementary perspectives on the diverse range of risks present throughout an AI development workflow.

Constructing the Shield: A Robust AI Threat Intelligence Infrastructure

AI incident databases and frameworks are critical for developing a comprehensive understanding of the evolving threat landscape. Resources like MITRE ATLAS provide a centralized repository for documenting AI-specific attacks, detailing tactics, techniques, and procedures (TTPs) employed by malicious actors. These platforms facilitate the sharing of threat intelligence, enabling security professionals to analyze attack patterns, identify emerging threats, and develop effective mitigation strategies. Documentation within these frameworks often includes details on the AI models targeted, the vulnerabilities exploited, and the potential impact of successful attacks, thereby supporting proactive defense measures and incident response planning.

Deep hashing and fuzzy hashing techniques are utilized to identify Indicators of Compromise (IOCs) associated with malicious artificial intelligence assets by creating unique, condensed signatures of AI models or datasets. Deep hashing generates signatures based on the internal parameters and structure of an AI model, allowing for the detection of subtle modifications or adversarial attacks. Fuzzy hashing, conversely, focuses on identifying similarities between files or data, even if they are not exact matches, which is crucial for detecting variations of malicious AI components. These methods enable security teams to proactively identify and mitigate threats by matching known malicious signatures or detecting anomalous deviations from legitimate AI assets, improving the overall resilience of AI systems.

The AI Vulnerability Database (AVID) functions as a publicly accessible, open-source repository for collecting and disseminating information regarding vulnerabilities specifically impacting Artificial Intelligence and Machine Learning systems. This collaborative effort aims to facilitate rapid response and mitigation of potential threats. Quantitative analysis demonstrates AVID’s increasing relevance; the number of documented vulnerability entries grew from 13 in 2022 to 27 in 2023, indicating a substantial expansion in both the scope of identified issues and community engagement with the platform.

The GMF Taxonomy and the CSET AI Harm Taxonomy offer standardized frameworks for categorizing and analyzing incidents involving artificial intelligence. The GMF Taxonomy focuses on classifying malicious uses of AI according to the stage of the AI lifecycle targeted – such as data, model, or deployment – and the type of harm caused. Conversely, the CSET AI Harm Taxonomy provides a more granular classification of harms, encompassing areas like manipulation, discrimination, and safety risks. Utilizing these taxonomies enables consistent incident reporting, facilitates threat analysis, and supports the development of targeted mitigation strategies by providing a shared understanding of AI-related harms and their characteristics.

The AVID taxonomy organizes robotic manipulation primitives into a matrix representing different approaches to action, vision, and interaction with the environment.

Proactive Evaluation and Future Directions: Hardening the System

The Qualifire Prompt Injections Benchmark represents a crucial step in assessing the security of Large Language Models (LLMs). This benchmark employs a carefully constructed dataset of 5000 prompts – comprising both harmless queries and adversarial ‘jailbreak’ attempts – to rigorously test an LLM’s susceptibility to prompt injection attacks. By subjecting models to this diverse range of inputs, researchers can quantitatively measure how effectively a model maintains its intended behavior when faced with malicious prompts designed to bypass safety protocols or extract sensitive information. The benchmark’s value lies not just in identifying vulnerabilities, but in providing a standardized method for comparing the robustness of different LLMs and tracking improvements in defense mechanisms as they are developed. This systematic evaluation is essential for building trustworthy AI systems capable of resisting manipulation and ensuring responsible deployment.

A comprehensive and systematic review of existing literature proves increasingly vital as the field of artificial intelligence rapidly advances. Such reviews are not merely summaries of past work, but actively pinpoint emerging threats and previously unaddressed vulnerabilities within AI systems. By meticulously analyzing research across disciplines – including computer science, cybersecurity, and even social sciences – these studies reveal patterns and potential weaknesses that might otherwise remain hidden. This proactive approach allows researchers to anticipate future attack vectors, understand the limitations of current defenses, and ultimately, build more robust and trustworthy AI applications before they are exploited, fostering a more secure and reliable technological landscape.

The pursuit of robust and reliable artificial intelligence necessitates sustained investigation and broadened cooperative efforts. Current defenses against adversarial attacks, including prompt injections and data poisoning, often prove brittle when confronted with novel techniques, highlighting a critical need for adaptive security measures. Future work must prioritize the development of proactive defense mechanisms, incorporating techniques like differential privacy and adversarial training, alongside standardized evaluation benchmarks-such as the Qualifire Prompt Injections Benchmark-to rigorously assess model resilience. Furthermore, fostering interdisciplinary collaboration between AI researchers, cybersecurity experts, and ethicists is paramount to anticipate emerging threats, establish shared best practices, and ultimately build AI systems deserving of public trust and widespread adoption.

The annotation process was successfully applied to a real-world AIID incident (ID 72) to facilitate analysis and understanding.

The study of cyber threat intelligence for AI systems necessitates a holistic understanding of interconnected components. It’s not merely about identifying malicious inputs – like those leveraged in prompt injection attacks – but recognizing how these attacks propagate through the entire AI infrastructure. As Andrey Kolmogorov stated, “The most important thing in science is not knowing, but knowing what to know.” This principle directly applies to securing AI; defining the crucial data points – the ‘Indicators of Compromise’ – requires a comprehensive grasp of the system’s architecture and potential vulnerabilities. A fragmented approach, focusing on isolated defenses, will ultimately prove insufficient against a determined adversary. The focus must be on understanding the entire landscape of threats and prioritizing knowledge acquisition accordingly.

The Horizon Recedes

The adaptation of cyber threat intelligence to artificial intelligence systems is not merely a technical challenge, but a conceptual one. This work highlights the necessity of novel indicators of compromise, yet it tacitly acknowledges a deeper truth: security, in these complex systems, is not a state achieved, but a continuous negotiation with emergent properties. Each mitigation, each new ‘indicator,’ introduces a new surface for attack, a new point of leverage for an adversary. The architecture is the system’s behavior over time, not a diagram on paper.

Future research will inevitably focus on automating the discovery of these adversarial patterns, employing machine learning to identify malicious inputs or model manipulations. However, a reliance solely on automated detection risks an arms race of increasing sophistication, where detection lags perpetually behind innovation in attack vectors. A more fruitful path may lie in understanding the fundamental principles governing the robustness of these systems-that is, their capacity to maintain function even in the presence of disruption.

The true limitation, perhaps, is not in the data itself, but in the models used to interpret it. Every optimization introduces new tension points. The field must move beyond simply cataloging attacks and toward a systemic understanding of how intelligence informs resilience, accepting that complete security is a fiction, and graceful degradation the only realistic goal.

Original article: https://arxiv.org/pdf/2603.05068.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

The Evolving Threat Landscape: A System Under Stress

Deconstructing the Assault: Understanding Attack Vectors

Constructing the Shield: A Robust AI Threat Intelligence Infrastructure

Proactive Evaluation and Future Directions: Hardening the System

The Horizon Recedes

See also: