Quantifying the AI Threat Landscape

Author: Denis Avetisyan


A new methodology moves beyond qualitative assessments to model cybersecurity risks amplified by artificial intelligence.

Quantitative risk modeling offers a demonstrable advantage: it translates uncertain future events into probabilistic outcomes, enabling proactive mitigation strategies and informed decision-making under conditions of inherent financial exposure. The focus shifts from reactive crisis management to preemptive resource allocation based on calculated probabilities, a transformation crucial for sustainable portfolio health.

This review details a quantitative approach to AI-enabled threat modeling, utilizing Monte Carlo simulation and large language models to assess nine risk scenarios.

Despite increasing awareness of potential harms, cybersecurity risk assessment often remains qualitative, hindering proactive mitigation strategies. This technical report, ‘Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse’, introduces a methodology for quantifying the uplift in cyberattack efficacy driven by advances in artificial intelligence. Applying this approach to nine distinct risk scenarios, we demonstrate systematic increases in attack speed, reach, and success probability, leveraging both human expert assessments and large language model-based simulations to estimate key risk factors. Can a shift toward data-driven, quantitative risk modeling, similar to that seen in industries like nuclear power, enable more informed decision-making for cybersecurity teams, AI developers, and policymakers alike?


The Illusion of Control: Modern Threats and Reactive Security

Conventional cybersecurity measures, designed to counter predictable threat patterns, are increasingly challenged by the adaptability of modern cyberattacks. Attackers now routinely integrate artificial intelligence and machine learning to automate discovery, evade detection, and optimize the effectiveness of malicious code. This shift introduces a dynamism previously uncommon in the threat landscape, allowing attacks to morph in real time and overcome signature-based defenses. The speed and scale at which these AI-powered attacks operate often overwhelm traditional security systems, which rely on human analysis and predefined rules. Consequently, organizations find themselves in a constant state of reaction, struggling to defend against threats that evolve faster than they can be understood and neutralized. The limitations of legacy systems highlight the urgent need for proactive, AI-driven security solutions capable of anticipating and mitigating these advanced threats.

The financial and operational consequences of successful cyberattacks are no longer incremental; they are escalating rapidly. Recent analyses place the cost of a single breach or system compromise between $165,000 in expected ransom payouts and a total estimated risk of $815,000 per attack, figures that represent a substantial increase over previous years. However, these monetary losses only partially capture the full scope of the damage, as disruptions to critical infrastructure – encompassing energy grids, healthcare systems, and financial networks – pose significant threats to public safety and economic stability. Beyond immediate financial repercussions, organizations face long-term consequences including reputational damage, legal liabilities, and the erosion of customer trust. The increasing interconnectedness of digital systems and the growing sophistication of threat actors are driving this upward trend, demanding a proactive and robust approach to cybersecurity that anticipates and mitigates these expanding risks.

A robust understanding and precise quantification of evolving cyber risks are now fundamental to constructing effective defense and mitigation strategies. Traditional assessments, often reliant on historical data, struggle to anticipate the dynamic and adaptive nature of modern threats, particularly those employing artificial intelligence. Accurate risk quantification allows organizations to prioritize security investments, allocate resources efficiently, and develop proactive measures tailored to the most likely and impactful attack vectors. This moves cybersecurity beyond reactive patching and incident response towards a predictive, resilience-focused approach, minimizing potential damage and ensuring business continuity. Without a clear understanding of the threat landscape – including the probability and potential impact of various attacks – organizations operate with incomplete information, increasing their vulnerability and potentially facing catastrophic consequences.

Conventional cybersecurity risk assessments largely rely on identifying known threat vectors and vulnerabilities, a strategy proving increasingly ineffective against attacks orchestrated by artificial intelligence. These methodologies struggle to anticipate the adaptive and polymorphic nature of AI-driven malware, which can rapidly evolve to evade detection and exploit previously unknown system weaknesses. Furthermore, current frameworks often fail to account for the scale and speed at which AI can automate reconnaissance, target selection, and attack execution, leading to a significant underestimation of potential damage. The reliance on historical data and static threat models creates a reactive posture, while sophisticated AI attacks demand proactive, predictive risk assessments capable of modeling emergent threats and quantifying the potential impact of intelligent adversaries.

Current AI safety frameworks predominantly utilize if-then scenarios to govern AI behavior, reflecting typical industry practice.

From Theory to Practice: A Quantitative Approach to AI-Enabled Cyber Risk

The Quantitative Risk Modeling Methodology comprises six sequential steps: defining the scope of AI-enabled cyber offense scenarios; identifying relevant threat actors and their capabilities; utilizing the MITRE ATT&CK Framework to map tactics, techniques, and procedures (TTPs); estimating the probability of success for each TTP based on expert elicitation and Large Language Model (LLM)-simulated expert input; calculating the potential impact of a successful attack, incorporating asset value and vulnerability data; and finally, aggregating these probabilities and impacts to generate an overall quantitative risk score. This process facilitates a structured assessment of risk, moving beyond qualitative evaluations to provide a numerical representation of potential cyber threats originating from AI-enabled attacks.
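
As a concrete illustration, the six steps can be sketched as a small pipeline. Everything below is a minimal runnable skeleton under assumed names and stub values; the function signatures, the technique IDs, and the simple probability-chaining rule are illustrative choices, not the paper's implementation.

```python
# Minimal skeleton of the six-step methodology. All names and stub
# values are illustrative assumptions, not the paper's code.

def define_scope(scenario: str) -> dict:
    # Step 1: scope the AI-enabled cyber offense scenario
    return {"scenario": scenario, "assets": ["sme-network"]}

def identify_threat_actors(scope: dict) -> list:
    # Step 2: relevant threat actors and their capabilities
    return ["ransomware-group"]

def map_ttps(actors: list) -> list:
    # Step 3: MITRE ATT&CK technique IDs (hypothetical selection)
    return ["T1566", "T1059", "T1486"]

def elicit_probabilities(ttps: list) -> dict:
    # Step 4: stand-in for human + LLM-simulated expert elicitation
    return {t: 0.5 for t in ttps}

def estimate_impact(scope: dict) -> float:
    # Step 5: per-attack impact in USD (see the Monte Carlo step later)
    return 815_000.0

def aggregate_risk(probs: dict, impact: float) -> float:
    # Step 6: chain per-TTP success probabilities, then scale by impact
    p_success = 1.0
    for p in probs.values():
        p_success *= p
    return p_success * impact

scope = define_scope("OC3 ransomware")
ttps = map_ttps(identify_threat_actors(scope))
risk = aggregate_risk(elicit_probabilities(ttps), estimate_impact(scope))
print(f"Quantitative risk score: ${risk:,.0f}")  # 0.5**3 * $815,000
```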

The risk assessment methodology utilizes a combined approach to estimating critical risk factors, incorporating insights from both human cybersecurity experts and Large Language Model (LLM)-simulated experts. Human expert elicitation provides established knowledge and nuanced judgment regarding threat landscapes and vulnerability assessments. Complementing this, LLM-simulated experts, trained on extensive cybersecurity data, offer scalable and consistent estimations, particularly for rapidly evolving threat vectors. This dual approach aims to mitigate biases inherent in single-source estimations and enhance the robustness of risk factor assessments, including estimations of adversary capability and the likelihood of successful exploitation of vulnerabilities. The combined data is then used as input to the quantitative risk model.
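
One simple way to fuse human and LLM-simulated estimates is equal-weight pooling in log-odds space, which keeps the combined value a valid probability and damps single-source outliers. The paper does not prescribe this particular pooling rule, so the sketch below, with made-up estimates, is only one plausible choice.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical elicited probabilities for a single risk factor
human_estimates = [0.05, 0.08, 0.04]  # human expert panel
llm_estimates = [0.07, 0.06]          # LLM-simulated experts

# Equal-weight log-odds pooling across both sources
estimates = human_estimates + llm_estimates
pooled = inv_logit(sum(logit(p) for p in estimates) / len(estimates))
print(f"Pooled risk-factor estimate: {pooled:.3f}")
```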

The MITRE ATT&CK Framework is a knowledge base of adversary tactics and techniques based on real-world observations. It provides a standardized language to describe attacker behavior, categorizing tactics – such as initial access, execution, persistence, and privilege escalation – and detailing specific techniques within each. This framework enables a common operational picture for cybersecurity professionals, facilitating threat modeling, red team exercises, and the development of effective defenses. By mapping observed and simulated attacker behaviors to ATT&CK techniques, our quantitative risk modeling methodology benefits from a well-defined and consistently updated understanding of potential attack paths, allowing for more accurate assessment of likelihood and impact.

The methodology delivers quantitative risk assessments by moving beyond descriptive, qualitative analyses. Through the integration of human expert knowledge, Large Language Model (LLM)-simulated expertise, and the MITRE ATT&CK framework, the model generates a calculated probability of a successful cyber attack. Initial model runs, parameterized by elicited data and LLM outputs, currently estimate this probability at 0.064. This figure is derived from the aggregation of probabilities associated with individual ATT&CK techniques and tactics, weighted by their likelihood of exploitation and potential impact, offering a basis for prioritization of mitigation strategies and resource allocation.
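
Mechanically, a figure like 0.064 falls out once per-tactic probabilities are in hand; a serial kill-chain product, where the attack must succeed at every stage, is one common aggregation rule. The per-tactic values below are illustrative, chosen to land near the reported estimate rather than taken from the paper's elicitation.

```python
# Serial kill-chain aggregation: the attack must clear every stage.
# Per-tactic probabilities are illustrative, not the paper's values.
tactic_probs = {
    "initial-access": 0.55,
    "execution": 0.70,
    "persistence": 0.65,
    "privilege-escalation": 0.60,
    "impact": 0.425,
}

p_success = 1.0
for tactic, p in tactic_probs.items():
    p_success *= p

print(f"End-to-end attack success probability: {p_success:.3f}")  # ~0.064
```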

Simulations using large language models corroborate human expert estimations of a substantial multiplicative increase in risk when attackers gain access to AI systems, with error bars indicating the uncertainty inherent in these estimations.

Pinpointing the Weakness: Estimating Probability and Impact

Risk factor estimation forms the foundation of our risk quantification methodology by systematically assessing both the likelihood of a successful cyberattack and the magnitude of its resulting impact. This process involves identifying relevant threat actors, analyzing their capabilities and motivations, and evaluating the vulnerabilities present within a target system. The probability of a successful attack is not treated as a single point estimate but rather as a distribution reflecting the uncertainty inherent in predicting adversary behavior. Impact assessment encompasses a range of potential damages, including financial losses from ransom payments, data breach notification costs, legal fees, and the expenses associated with system recovery, business disruption, and reputational damage. Accurate estimation of these factors allows for the calculation of an expected loss, expressed as $Expected\,Loss = Probability\,of\,Breach \times Impact\,of\,Breach$, enabling informed decision-making regarding security investments and risk mitigation strategies.
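
A minimal sketch of that calculation, carrying uncertainty through as distributions rather than point estimates, might look as follows; the Beta and lognormal parameters are assumptions chosen to center near the figures reported elsewhere in this review.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50_000  # sampled scenarios

# P(breach): Beta(2, 28) has mean ~0.067, near the model's 0.064 estimate
p_breach = rng.beta(2, 28, N)

# Impact: lognormal with median $815,000 (the dispersion is an assumption)
impact = rng.lognormal(mean=np.log(815_000), sigma=0.4, size=N)

expected_loss = p_breach * impact
print(f"Mean expected loss per attempt: ${expected_loss.mean():,.0f}")
print(f"95th percentile:                ${np.percentile(expected_loss, 95):,.0f}")
```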

Adversary tactics, as cataloged within the MITRE ATT&CK framework, are directly correlated with breach success rates because they represent specific, observable behaviors attackers utilize throughout the cyber kill chain. The framework details techniques – such as phishing, exploitation of public-facing applications, and credential dumping – each with varying levels of complexity and associated likelihood of success. By mapping observed or anticipated attacker behaviors to ATT&CK techniques, organizations can assess the probability of a successful breach based on the prevalence and effectiveness of those specific tactics. Techniques involving lateral movement and privilege escalation, for example, demonstrate a higher probability of successful data exfiltration compared to initial reconnaissance activities. Utilizing ATT&CK allows for a granular, behavior-based assessment of risk, moving beyond generalized threat actor profiles to focus on the how of an attack.

The increasing sophistication of artificial intelligence (AI) tools directly impacts the threat landscape by enhancing attacker capabilities. Specifically, AI facilitates the automation of reconnaissance, vulnerability exploitation, and social engineering attacks, increasing both the speed and scale of potential breaches. AI-powered tools can also bypass traditional security measures, such as signature-based detection systems, through the generation of polymorphic malware and adaptive attack patterns. This ultimately lowers the barrier to entry for malicious actors and increases the probability of successful attacks against targeted systems and data.

Risk quantification utilizes Monte Carlo Simulation to address inherent uncertainties in estimating financial impact. This method models a distribution of possible outcomes, resulting in a total estimated risk per attack of $815,000. This figure is derived from two primary cost components: estimated ransom payouts and recovery expenses. Ransom payouts are calculated at $165,000, based on a 30% probability of payment given a mean payout value of $550,000. Recovery costs, sourced from Sophos 2025 data for Small and Medium Enterprises (SMEs), are estimated at $650,000. The simulation accounts for variance in these factors to provide a probabilistic range of potential financial losses.
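
The component figures combine straightforwardly: a 30% payment probability against a $550,000 mean payout contributes $165,000 in expectation, and adding the $650,000 recovery cost yields the $815,000 headline. The sketch below reproduces this with assumed distributional shapes, since the source does not specify the lognormal dispersion or the recovery-cost spread.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # simulated breach events

# Ransom: paid with 30% probability; payout drawn from a lognormal
# whose mean is $550,000 (the shape parameter is an assumption).
sigma = 0.5
mu = np.log(550_000) - sigma**2 / 2  # sets the lognormal mean to $550k
paid = rng.random(N) < 0.30
ransom = np.where(paid, rng.lognormal(mu, sigma, N), 0.0)

# Recovery: Sophos 2025 SME mean of $650,000; the spread is assumed.
recovery = np.clip(rng.normal(650_000, 150_000, N), 0, None)

total = ransom + recovery
print(f"Mean loss per attack: ${total.mean():,.0f}")   # ~$815,000
print(f"90th percentile:      ${np.percentile(total, 90):,.0f}")
```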

Shapley values reveal how saturating key risk indicators (KRIs) contributes to an increased probability of successful attack-step completion across various MITRE ATT&CK tactics.

Beyond the Benchmark: Validating and Refining the Model

The methodology’s accuracy was rigorously tested using established cybersecurity benchmarks, specifically the BountyBench Benchmark and Cybench Benchmark. These evaluations assessed the framework’s capacity to correctly identify and quantify risk factors within simulated threat landscapes. By subjecting the system to these industry-standard tests, researchers could confirm its ability to reliably gauge the potential for exploitation and prioritize vulnerabilities. The benchmarks provided a controlled environment for measuring performance against known attack vectors, ultimately demonstrating the robustness of the approach in assessing real-world cybersecurity risks and informing proactive defense strategies.

Rigorous benchmarking against platforms like BountyBench and Cybench is crucial for gauging the practical effectiveness of artificial intelligence in cybersecurity. These assessments move beyond theoretical capabilities, simulating the complex landscape of real-world vulnerabilities and attack vectors. By evaluating AI’s performance in identifying and exploiting weaknesses under realistic conditions, researchers can determine its true potential for proactive threat detection and response. The benchmarks specifically test the AI’s ability to mimic the techniques of threat actors, pinpointing vulnerabilities before malicious entities can capitalize on them. This realistic appraisal is vital for establishing trust in AI-driven security solutions and ensuring they deliver tangible benefits to organizations facing increasingly sophisticated cyber threats.

Rigorous testing of the methodology reveals its practical value for cybersecurity professionals, extending beyond theoretical risk assessment to provide quantifiable intelligence. Analysis indicates approximately ten active malicious actors currently operating, each demonstrating an average of 200 attack attempts annually. This data-driven insight allows security teams to move beyond generalized threat landscapes and focus resources on a defined set of adversaries. The framework doesn’t simply identify vulnerabilities; it estimates the frequency of exploitation attempts, enabling organizations to prioritize defenses based on realistic attack volume and proactively mitigate the most pressing threats. Such granular detail empowers a shift from reactive incident response to a more strategic, preventative cybersecurity posture.

The developed framework furnishes organizations with the capacity to strategically allocate cybersecurity resources by pinpointing the most probable and impactful threats. Rather than reacting to incidents, this proactive approach enables preemptive strengthening of defenses against anticipated attacks. By quantifying risk and estimating the scale of potential adversary activity – approximately ten active actors each launching an average of 200 attacks annually – the system facilitates informed decision-making. This allows security teams to focus on fortifying critical assets and implementing targeted mitigation strategies, ultimately reducing the organization’s overall vulnerability and minimizing potential damage from emerging cyber threats.
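
Putting the quoted parameters together gives a back-of-envelope annualized figure. This is a sketch combining numbers reported above across the whole population of targets, not the paper's full model output.

```python
# Back-of-envelope annualization from figures quoted in this review
actors = 10                 # estimated active malicious actors
attempts_per_actor = 200    # attack attempts per actor per year
p_success = 0.064           # end-to-end breach probability (model estimate)
loss_per_breach = 815_000   # total estimated risk per attack, USD

breaches_per_year = actors * attempts_per_actor * p_success
annual_loss = breaches_per_year * loss_per_breach
print(f"Expected successful breaches/year: {breaches_per_year:.0f}")  # 128
print(f"Expected annual loss:              ${annual_loss:,.0f}")      # ~$104M
```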

The fully parametrized OC3 ransomware risk model leverages evidence from BountyBench and Cybench indicators to assess potential threats.

The pursuit of quantitative modeling, as detailed in this paper, feels predictably ambitious. It’s attempting to apply rigorous mathematics to the fundamentally messy reality of human malice and emergent AI behaviors. The authors propose Monte Carlo simulations and LLM-assisted risk factor estimation – elegant solutions, certainly. But one anticipates the inevitable refinement cycles, the corner cases production systems will invariably reveal. As Carl Friedrich Gauss observed, “If I speak for my own benefit, I am ridiculous; if for the benefit of others, I am an oracle.” This research aims to be the latter, yet history suggests even the most sophisticated models will require constant recalibration when faced with real-world exploitation. The quantification of ‘AI uplift’ in attack vectors, while valuable, is simply another layer of abstraction atop an inherently unpredictable system.

What’s Next?

The pursuit of quantifying AI-enabled cybersecurity risk, as demonstrated, inevitably bumps against the age-old problem of garbage in, garbage out. Monte Carlo simulations will dutifully churn, and LLMs will confidently assign probabilities, but the underlying estimations of ‘AI uplift’ in attack efficacy remain… optimistic, at best. It’s a clever framework, certainly, and will undoubtedly produce spreadsheets that look authoritative. The real question is whether those numbers meaningfully reduce actual breaches, or merely offer a more precise accounting of disaster when it arrives.

The immediate future will likely involve a proliferation of bespoke risk models, each tailored to a specific threat landscape and boasting ever-more-granular LLM evaluations. These will, predictably, be incompatible with each other, requiring further layers of abstraction and, inevitably, introducing new sources of error. It’s a comforting illusion that a complex model is a better model, even when simpler heuristics achieved roughly the same result.

Ultimately, this work feels less like a fundamental breakthrough and more like a sophisticated reframing of existing problems. The core challenge – anticipating malicious actors – remains stubbornly resistant to quantification. One suspects that in a decade, someone will look back on this era of ‘AI risk modeling’ and wonder why everyone thought automating gut feelings was a good idea. Everything new is just the old thing with worse docs.


Original article: https://arxiv.org/pdf/2512.08864.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
