Author: Denis Avetisyan
As AI agents become increasingly integrated into critical systems, a robust security framework is essential, and researchers have developed a novel platform to proactively identify vulnerabilities.

ASTRIDE combines visual analysis and large language model reasoning to automate threat modeling for agentic-AI applications, detecting both conventional and AI-specific security risks.
AI agent-based systems are increasingly sophisticated, and they introduce novel security challenges beyond the scope of traditional threat modeling. This paper presents ASTRIDE, a platform designed to address this gap by automating security analysis specifically for agentic architectures. ASTRIDE extends the established STRIDE framework with AI-specific threats and leverages a consortium of fine-tuned vision-language models alongside LLM reasoning to perform end-to-end threat modeling directly from system diagrams. Can this diagram-driven automation provide a scalable and explainable approach to securing the next generation of intelligent, autonomous systems?
Legacy Frameworks Can’t See the Forest for the Algorithms
Established threat modeling frameworks, such as STRIDE, were designed for conventional software systems and often fall short when applied to the complexities of artificial intelligence. These frameworks typically focus on well-defined inputs, outputs, and state transitions, but AI agents introduce dynamic and unpredictable behaviors. The ability of these agents to learn, adapt, and operate in ambiguous environments creates vulnerabilities that traditional methods struggle to identify or mitigate. Specifically, threats related to data poisoning, adversarial examples, and model extraction are difficult to model using established threat categories, requiring a re-evaluation of security assumptions and the development of new analytical techniques. Consequently, reliance on these legacy approaches can leave AI-driven systems exposed to novel and potentially devastating attacks.
As artificial intelligence systems grow in complexity, traditional security analysis methods are proving inadequate for identifying and mitigating novel threats. Current approaches often rely on manual inspection and predefined attack patterns, failing to keep pace with the adaptive and often unpredictable behavior of AI agents. Consequently, a paradigm shift towards automated security analysis is essential; this involves leveraging AI itself to proactively identify vulnerabilities, simulate attacks, and continuously monitor for anomalous behavior. Such automated, AI-aware analysis can examine the unique characteristics of AI systems, such as their training data, algorithms, and deployment environments, to uncover weaknesses that would otherwise remain hidden, ensuring a more robust and resilient defense against increasingly sophisticated attacks.
Current security methodologies, designed for conventional cyber threats, are increasingly inadequate when facing the novel attack vectors presented by artificial intelligence. These established frameworks often fail to account for the unique vulnerabilities inherent in AI systems – such as data poisoning, adversarial examples, and model extraction – leaving organizations exposed to attacks that bypass traditional defenses. This isn’t merely a matter of increased complexity; AI-driven attacks can be automated, polymorphic, and capable of rapidly exploiting zero-day vulnerabilities. Consequently, a failure to proactively adapt security practices to address these AI-specific risks creates significant blind spots, potentially leading to substantial breaches and systemic failures as malicious actors learn to weaponize the very intelligence designed to protect systems.

ASTRIDE: Extending the Old Guard with a Touch of Automation
ASTRIDE builds upon the established STRIDE threat modeling framework by incorporating threat categories specific to Artificial Intelligence systems. Traditional STRIDE focuses on threats like Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege; ASTRIDE extends this to include vulnerabilities arising from the unique characteristics of AI. Specifically, it addresses threats such as prompt injection, where malicious input manipulates the AI’s behavior, and context poisoning, which involves corrupting the data used to train or inform the AI model, leading to inaccurate or biased outputs. These AI-specific additions allow ASTRIDE to provide a more comprehensive threat assessment for systems utilizing machine learning and large language models, going beyond the scope of conventional security analyses.
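To make the extension concrete, here is a minimal sketch of how such a taxonomy might be represented. The two AI-specific entries mirror the threats named above; the representation itself is illustrative and not ASTRIDE's actual schema.

```python
from enum import Enum, auto

class ThreatCategory(Enum):
    # Classical STRIDE categories
    SPOOFING = auto()
    TAMPERING = auto()
    REPUDIATION = auto()
    INFORMATION_DISCLOSURE = auto()
    DENIAL_OF_SERVICE = auto()
    ELEVATION_OF_PRIVILEGE = auto()
    # AI-specific extensions discussed in this article (illustrative names)
    PROMPT_INJECTION = auto()
    CONTEXT_POISONING = auto()

AI_SPECIFIC = {ThreatCategory.PROMPT_INJECTION, ThreatCategory.CONTEXT_POISONING}

def is_ai_specific(category: ThreatCategory) -> bool:
    """True for threats that only arise in systems built around learned models."""
    return category in AI_SPECIFIC
```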
ASTRIDE employs LLM Agents to manage a collaborative analysis process between Visual Language Models (VLMs) and a dedicated reasoning LLM. These agents function as orchestrators, directing the flow of information and tasks. Specifically, VLMs analyze system architecture diagrams to visually identify potential vulnerabilities, and the reasoning LLM then processes these findings, applying security expertise to assess risk and generate threat reports. This synergistic workflow allows ASTRIDE to combine the visual pattern recognition capabilities of VLMs with the logical reasoning and knowledge base of a specialized LLM, resulting in a more comprehensive and automated threat modeling process than either model could achieve independently.
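The article does not spell out the concrete interfaces between these agents, but the overall flow can be sketched as a two-stage pipeline. In the sketch below, `describe_vulnerabilities` and `complete` are hypothetical method names standing in for whatever VLM and LLM clients are actually used.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    component: str   # element of the architecture diagram
    threat: str      # suspected threat category
    evidence: str    # what in the diagram suggested it

def analyze_diagram(image_path: str, vlms) -> list[Finding]:
    # Stage 1: each VLM inspects the diagram and flags candidate weaknesses.
    findings: list[Finding] = []
    for vlm in vlms:
        findings.extend(vlm.describe_vulnerabilities(image_path))  # hypothetical VLM call
    return findings

def assess_risk(findings: list[Finding], reasoning_llm) -> str:
    # Stage 2: a reasoning LLM consolidates the visual findings into a threat report.
    prompt = "Assess these candidate vulnerabilities and rank them by risk:\n" + "\n".join(
        f"- {f.component}: {f.threat} ({f.evidence})" for f in findings
    )
    return reasoning_llm.complete(prompt)  # hypothetical LLM client call

def threat_model(image_path: str, vlms, reasoning_llm) -> str:
    # End-to-end pass: diagram in, threat report out.
    return assess_risk(analyze_diagram(image_path, vlms), reasoning_llm)
```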
ASTRIDE employs Visual Language Models (VLMs) to analyze system architecture diagrams as a primary method for identifying potential vulnerabilities within AI systems. These diagrams, often created using tools such as Mermaid, visually represent the components and data flow of the system. The VLM processes these visual representations to detect architectural weaknesses, potential attack surfaces, and misconfigurations that could be exploited. This analysis focuses on identifying how different system elements interact and where vulnerabilities might arise from these interactions, allowing ASTRIDE to pinpoint areas requiring further investigation and mitigation strategies. The use of diagrammatic input allows the VLM to reason about the system’s structure in a way that textual descriptions alone may not facilitate.
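As a concrete illustration (not taken from the paper), a toy Mermaid description of an agentic service is shown below, embedded in Python for convenience. A data-flow edge like the one from untrusted web content into the vector store is exactly the kind of visual cue a VLM would be expected to flag.

```python
# A toy Mermaid flowchart of an agentic retrieval service; all node names are invented.
MERMAID_DIAGRAM = """
flowchart LR
    User[User] --> Agent[LLM Agent]
    Agent --> Tools[External Tool APIs]
    Agent --> VectorDB[(Vector Store)]
    Web[Untrusted Web Content] --> VectorDB
"""

# Rendered to an image and handed to a VLM, this diagram exposes an indirect
# prompt-injection path: untrusted web content reaches the vector store and,
# from there, the agent's context window.
QUESTION = "Which data flows could let untrusted input reach the agent's prompt?"
```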
Traditional threat modeling for AI systems is a labor-intensive process, requiring significant time and expertise to identify potential vulnerabilities across complex architectures. ASTRIDE addresses this challenge by automating key stages of threat modeling, including vulnerability identification from system diagrams and the generation of threat scenarios. This automation demonstrably reduces the manual effort required, with reductions estimated at 40% to 60% in initial testing, allowing security professionals to focus on validating findings and implementing mitigations. The benefit is particularly pronounced in systems with numerous components and intricate data flows, where manual analysis would be both time-consuming and prone to oversight. By streamlining the process, ASTRIDE enables more frequent and thorough threat modeling, ultimately enhancing the security posture of AI-driven applications.

Visual Threat Analysis: It’s All About the Pictures
ASTRIDE employs a multi-VLM approach to visual threat analysis, leveraging Llama-Vision, Pix2Struct, and Qwen2-VL for the specific task of identifying prompt injection vulnerabilities. These models analyze visual inputs, such as system architecture diagrams and user interface screenshots, to detect potential pathways for malicious prompts that could compromise the AI system. The integration of multiple VLMs allows ASTRIDE to benefit from the unique strengths of each model – Llama-Vision for general image understanding, Pix2Struct for extracting structured information from diagrams, and Qwen2-VL for its multimodal capabilities – resulting in a more robust and comprehensive threat detection system focused on visually conveyed attack vectors.
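The article does not describe how the three models' outputs are reconciled; a simple majority vote is one plausible aggregation, sketched below purely for illustration.

```python
from collections import Counter

def ensemble_verdict(verdicts: dict[str, bool]) -> bool:
    """Majority vote over per-model prompt-injection verdicts.

    `verdicts` maps a model name to whether that model flagged a
    prompt-injection path in the analyzed diagram.
    """
    counts = Counter(verdicts.values())
    return counts[True] > counts[False]

# Example: two of the three VLMs flag a vulnerability, so the ensemble does too.
print(ensemble_verdict({"llama-vision": True, "pix2struct": False, "qwen2-vl": True}))
```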
ASTRIDE employs Unsloth and QLoRA to make the fine-tuning and deployment of computationally intensive Visual Language Models (VLMs) practical. QLoRA combines 4-bit quantization of the frozen base model's weights with the training of small low-rank adapters, sharply reducing both memory footprint and computational demands. This allows training and inference on hardware with limited resources; during ASTRIDE's fine-tuning process, peak reserved memory reached 14.605 GB, with actual consumption of 5.853 GB, representing 39.69% of total available memory. Unsloth further optimizes the process by enabling efficient attention mechanisms and memory management, collectively making large-model deployment practical on consumer-grade systems without significant performance degradation.
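The paper's pipeline uses Unsloth; as a rough stand-in, the same QLoRA recipe looks roughly like the following with the more widely documented Hugging Face transformers, peft, and bitsandbytes stack. The model name, LoRA rank, and target modules here are assumptions for illustration, not the settings reported in the paper.

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# The "Q" in QLoRA: load the frozen base model in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",       # one of the three VLMs named above
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# The "LoRA" half: small trainable adapters on top of the quantized weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed; actual targets vary by model
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of all weights
```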
Fine-tuning of the integrated Visual Language Models (VLMs) – Llama-Vision, Pix2Struct, and Qwen2-VL – was completed in 1627 seconds, equivalent to 27.12 minutes. This process utilized a dataset consisting of 1200 records specifically curated for the identification of prompt injection vulnerabilities. The completion time represents the total duration required to adapt the pre-trained models to the specific task of visual threat analysis within the ASTRIDE framework, enabling improved accuracy and performance on the targeted security assessment.
QLoRA quantization significantly reduces the computational resources required to fine-tune and deploy large vision-language models (VLMs). During ASTRIDE's training process, peak reserved memory reached 14.605 GB, but actual memory consumption was 5.853 GB, representing 39.69% of the total available memory. The reduction comes from storing the frozen base weights at lower precision, which decreases both memory footprint and computational demands and enables effective operation on consumer-grade hardware without substantial performance degradation.
Visual Language Models (VLMs) enhance vulnerability identification by processing system architecture diagrams to detect potential weaknesses often overlooked by conventional security analysis. Traditional methods frequently rely on code review or static analysis, which may not fully capture vulnerabilities stemming from the interactions between system components as visualized in a diagram. VLMs, however, can interpret the graphical representation of the architecture, identifying risky configurations or data flows that suggest possible attack vectors. This capability provides a more comprehensive understanding of the attack surface by revealing vulnerabilities related to system design and integration, rather than solely focusing on code-level issues.
ASTRIDE’s security assessment capabilities are achieved through the integration of multiple Visual Language Models (VLMs) – Llama-Vision, Pix2Struct, and Qwen2-VL – focused on identifying prompt injection vulnerabilities within system architecture diagrams. This analysis is facilitated by the use of Unsloth and QLoRA, a quantization method that reduces the computational demands of these large models, allowing for deployment on consumer-grade hardware with a peak reserved memory of 14.605 GB during training. The system’s ability to analyze visual representations of system designs complements traditional security methods, enabling the identification of nuanced vulnerabilities and providing a more comprehensive understanding of the attack surface. Fine-tuning the VLMs on a dataset of 1200 records requires approximately 27.12 minutes, establishing a relatively efficient proactive security evaluation process for AI-driven systems.

Securing the Future: Proactive Defense is the Only Defense
ASTRIDE signifies a considerable advancement in the field of artificial intelligence security by delivering a solution capable of adapting to the constantly changing spectrum of cyber threats. Unlike traditional security measures often requiring extensive manual effort, ASTRIDE employs automation to systematically analyze AI systems, identifying potential weaknesses before they can be exploited. This scalability is particularly important given the rapid proliferation of AI across numerous industries, where the volume and complexity of models far exceed the capacity of human review alone. By proactively discovering and flagging vulnerabilities, ASTRIDE not only reduces the risk of successful attacks but also empowers developers to build more robust and trustworthy AI applications, fostering confidence in this increasingly pervasive technology.
ASTRIDE distinguishes itself by shifting the paradigm of AI security from reactive patching to proactive vulnerability discovery. Rather than waiting for exploits to emerge, the platform systematically analyzes AI models and their surrounding infrastructure to pinpoint potential weaknesses before they can be leveraged by malicious actors. This preventative approach allows organizations to address risks at the design stage, significantly reducing the attack surface and fostering the development of inherently more robust AI systems. By identifying vulnerabilities such as data poisoning, model evasion, and adversarial attacks, ASTRIDE empowers developers and security teams to build resilient AI that can withstand increasingly sophisticated threats, ensuring continued operation and safeguarding critical assets.
ASTRIDE’s core strength lies in its capacity to dissect the intricate designs of modern AI systems and pinpoint vulnerabilities unique to these technologies. Unlike traditional security tools geared towards conventional software, ASTRIDE actively searches for attacks that specifically exploit the nuances of machine learning, such as adversarial examples designed to mislead algorithms or data poisoning attempts aimed at corrupting training datasets. This detailed architectural analysis isn’t merely a passive scan; it allows the platform to understand how data flows through the AI, where sensitive information resides, and how an attacker might manipulate the system to access or compromise that data. Consequently, organizations can move beyond generic security measures and implement targeted defenses, safeguarding critical assets and preventing malicious actors from exploiting the increasing complexity of artificial intelligence.
The increasing integration of artificial intelligence into daily life – from healthcare and finance to transportation and communication – necessitates a parallel focus on its security and reliability. As AI systems become more complex and pervasive, the potential for malicious exploitation and unintended consequences grows exponentially. Tools like ASTRIDE are therefore not merely beneficial, but essential for fostering trustworthy AI. By proactively addressing vulnerabilities and enabling the development of robust defenses, platforms of this kind will be critical in ensuring that the benefits of AI are realized without compromising safety, privacy, or societal well-being. The future of AI hinges not only on innovation, but also on a commitment to responsible development, and automated security tools represent a key component of that commitment.
The pursuit of automated security, as demonstrated by ASTRIDE, feels…familiar. It’s a predictable cycle. One builds elegant systems, hoping to preemptively squash vulnerabilities with LLM reasoning and vision-language models. Yet, production, as always, will discover novel failure modes. It’s a comforting inevitability. Andrey Kolmogorov observed, “The most important thing in science is not to be afraid of making mistakes.” This rings particularly true when applying theoretical security models to the chaotic reality of agentic AI. ASTRIDE aims to identify both traditional and AI-specific threats, but the system’s ultimate test will be how it withstands the relentless creativity of attackers – and the unexpected behaviors of the agents themselves. It’s a temporary stay of execution, at best.
What’s Next?
ASTRIDE, like all attempts to formalize security analysis, delivers a snapshot of current vulnerability patterns. The platform accurately identifies threats as they are understood today. Production systems, however, have a remarkable capacity to invent novel failure modes, often in ways that circumvent even thoughtfully constructed threat models. The immediate future will inevitably involve a continuous game of catch-up, refining the fine-tuned vision-language models to recognize the emergent exploits that inevitably arise from deploying these agentic systems at scale.
The reliance on visual architecture diagrams is both a strength and a limitation. While offering a readily digestible input, it presumes an accurate and complete representation of the system. The gap between diagram and deployed reality is usually substantial. Future work should explore methods for automatically extracting architectural information directly from running systems – a task that will likely expose far more vulnerabilities than any manual modeling effort.
Ultimately, the question isn’t whether ASTRIDE – or any similar platform – can eliminate risk, but whether it can reduce the rate at which production systems outpace the ability to secure them. If code looks perfect, no one has deployed it yet, and deployment is where the real work begins. The long-term trajectory will likely be toward increasingly sophisticated automation, but also toward an acceptance that ‘security’ is a cost center perpetually playing defense against an adversary who only needs to be right once.
Original article: https://arxiv.org/pdf/2512.04785.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/