Beyond Incident Counts: Mapping the Rise of AI Harms

Author: Denis Avetisyan


A new framework classifies the evolving patterns of AI-related incidents to move beyond simple tracking and toward proactive risk mitigation.

The study demonstrates a framework for interpreting AI incident reports – sourced from the OECD AI Incidents Monitor – by disentangling trends in media attention, system deployment, and actual harm frequency to categorize emerging risks based on estimated harm and exposure, thereby moving beyond simple incident counts to a more nuanced understanding of AI safety trajectories.

This paper proposes a pragmatic classification of AI incident trajectories based on exposure and harm levels to improve governance and monitoring.

Current approaches to tracking AI harms often conflate increasing reporting with actual shifts in risk, hindering effective governance. This challenge is addressed in ‘A pragmatic classification of AI incident trajectories’, which proposes a novel framework for disentangling exposure from harm rates and categorizing incident trends. By combining structured monitoring with LLM-assisted analysis, the work delivers an interpretative classification essential for actionable risk mitigation. As AI deployments proliferate, can a more nuanced understanding of incident trajectories truly enable proactive and informed policy decisions?


Beyond Superficial Counts: Towards Rigorous AI Incident Evaluation

The prevailing methods for tracking adverse events involving artificial intelligence frequently center on tallying reported incidents, a practice that offers a deceptively simple view of actual risk. This reliance on raw numbers obscures the critical distinctions between an incident’s potential for harm and the scope of its exposure – meaning how many systems or individuals were potentially affected. Simply counting occurrences fails to reveal the severity of each incident or whether repeated, minor events indicate a systemic vulnerability. Consequently, decision-makers are left with an incomplete understanding of the true threat landscape, hindering effective mitigation strategies and proactive governance frameworks. This superficial analysis can be particularly misleading as AI systems become increasingly integrated into critical infrastructure, demanding a more sophisticated approach to incident monitoring than mere quantification.

Current AI incident monitoring frequently conflates exposure – the number of individuals or systems potentially affected – with actual hazard, the degree of potential harm. This distinction is critical because a widespread exposure doesn’t necessarily equate to significant risk; a minor glitch affecting millions may be less concerning than a severe vulnerability impacting a smaller, critical group. Without separating these factors, organizations struggle to prioritize mitigation efforts effectively, potentially diverting resources to address broadly visible but low-impact issues while overlooking hidden, high-severity threats. This flawed approach hinders proactive governance, preventing the development of targeted safety protocols and robust risk assessments necessary for responsible AI deployment, and ultimately obscures a true understanding of systemic vulnerabilities.

Responsible artificial intelligence development demands a shift beyond simply counting incidents to thoroughly assessing both the severity of harm and the breadth of its potential impact. While incident numbers offer a basic overview, they fail to differentiate between widespread, minor glitches and isolated, high-consequence failures; an AI system affecting one million users with a negligible error is fundamentally different from one impacting ten with a critical flaw. This necessitates a detailed analysis of ‘exposure’ – the number of individuals or systems potentially affected – alongside the ‘extent of harm’ – the actual damage incurred, ranging from inconvenience to financial loss or even physical danger. Without concurrently quantifying these two dimensions, organizations risk misallocating resources, overlooking systemic vulnerabilities, and ultimately, failing to build truly robust and trustworthy AI systems.

Reliance on simple incident counts offers a dangerously incomplete view of AI risk, particularly as security breaches escalate – a recent surge of 9.8% within the financial sector highlights this growing concern. These ‘raw counts’ fail to differentiate between widespread exposure and actual harm, obscuring crucial patterns that indicate systemic vulnerabilities. An incident affecting a single user differs dramatically from one impacting thousands, yet both register as a single data point in superficial tracking. This lack of granularity hinders effective mitigation strategies and proactive governance, potentially masking critical weaknesses before they are exploited on a larger scale. Consequently, organizations may misinterpret their security posture, believing they are adequately protected when, in reality, they are overlooking significant, emerging threats.

Formulating more meaningful monitoring questions for exposure estimation requires defining a precise exposed population, which increases complexity; improved data quality, however, expands the range of tractable, high-impact questions available.

Applying Epidemiological Principles to AI Safety

Applying a public health lens to AI incident monitoring involves adapting established epidemiological methodologies – traditionally used to track and mitigate disease outbreaks – to the analysis of AI system failures and unintended consequences. This approach prioritizes proactive risk identification and mitigation through systematic data collection, analysis of incident patterns, and the development of targeted interventions. Key to this methodology is the shift from reactive incident response to preventative monitoring, allowing for the identification of emerging risks before they result in widespread harm. This includes defining quantifiable metrics for incident frequency, severity, and scope, and establishing baseline data to track changes over time – mirroring public health surveillance systems. Furthermore, it emphasizes a population-level understanding of risk, considering not just individual incidents but the cumulative impact on affected groups and systems.
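
The public-health analogy becomes concrete as an incidence-rate calculation. The sketch below is a minimal Python illustration, not code from the paper; the incident records, exposure figure, and quarterly windows are invented for demonstration. It normalizes incident counts by an estimated exposed population, the analogue of cases per population at risk.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date, affected system, severity 0-1).
incidents = [
    (date(2025, 1, 14), "chatbot-A", 0.2),
    (date(2025, 2, 3),  "chatbot-A", 0.7),
    (date(2025, 2, 20), "triage-B",  0.9),
]

# Assumed exposed population per quarter (e.g. active users of monitored systems).
exposure_by_quarter = {"2025Q1": 1_200_000}

def quarter(d: date) -> str:
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

# Incidence rate: incidents per million exposed, per quarter -- the
# epidemiological analogue of cases per population at risk over time.
counts = Counter(quarter(d) for d, _, _ in incidents)
for q, n in sorted(counts.items()):
    rate = n / exposure_by_quarter[q] * 1_000_000
    print(f"{q}: {n} incidents, {rate:.2f} per million exposed")
```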

Differentiating between Exposure Estimation and Harm Estimation is crucial for effective AI incident monitoring. Exposure Estimation quantifies the number of individuals or systems affected by an AI-related incident, providing a measure of the scope of the event. This is distinct from Harm Estimation, which assesses the severity of the impact on those affected – encompassing factors like financial loss, reputational damage, or physical harm. Separating these two metrics allows for a more nuanced understanding of risk; an incident with high exposure but low harm requires a different response than one with low exposure and high harm. This disaggregation is essential for prioritizing mitigation efforts and allocating resources effectively, especially given the increasing frequency of AI-related incidents, such as the 34.5% increase in data breaches within the financial sector, where both exposure and potential harm are significant concerns.
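
A small sketch can show what keeping the two quantities separate looks like in practice. The records below are illustrative assumptions rather than data from the paper: one incident has high exposure but low harm, the other the reverse, and the two dimensions are reported side by side instead of being collapsed into a single count or score.

```python
from dataclasses import dataclass

@dataclass
class IncidentEstimate:
    """Keeps exposure and harm as separate, explicitly estimated quantities."""
    incident_id: str
    exposure: int   # estimated number of affected people or systems
    harm: float     # estimated severity per affected unit, 0 (none) to 1 (critical)

# Illustrative estimates -- figures are assumptions, not findings from the study.
estimates = [
    IncidentEstimate("ui-glitch-rollout", exposure=2_000_000, harm=0.05),
    IncidentEstimate("loan-model-bias",   exposure=800,       harm=0.90),
]

# Reporting both dimensions preserves the distinction the text calls for,
# rather than letting broad exposure mask a severe, narrow failure.
for e in sorted(estimates, key=lambda x: x.harm, reverse=True):
    print(f"{e.incident_id}: exposure={e.exposure:,}, harm={e.harm}")
```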

The SORT Framework facilitates a systematic approach to AI incident monitoring by structuring inquiries around four key components: Subject, defining the affected entities; Opportunity, identifying potential exposure vectors or vulnerabilities; Risk, characterizing the potential harms resulting from an incident; and Timeframe, establishing the relevant period for observation and analysis. Utilizing this framework allows for the development of focused monitoring questions, ensuring data collection is targeted and relevant to specific concerns. This structured methodology moves beyond reactive incident response, enabling proactive identification of emerging risks and facilitating the prioritization of mitigation efforts based on a clear understanding of potential impact and likelihood within a defined temporal context.
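
The four components translate naturally into a structured record for each monitoring question. The following minimal sketch uses the component names as described above; the example question itself is hypothetical, not one drawn from the paper.

```python
from dataclasses import dataclass

@dataclass
class MonitoringQuestion:
    """Structures a monitoring question along the SORT components."""
    subject: str      # the affected entities
    opportunity: str  # exposure vector or vulnerability
    risk: str         # the potential harm being monitored
    timeframe: str    # observation window for the analysis

# Illustrative question (wording is an assumption, not taken from the paper).
q = MonitoringQuestion(
    subject="retail banking customers",
    opportunity="LLM-based customer-service chatbots handling account queries",
    risk="unauthorized disclosure of personal financial data",
    timeframe="rolling 12 months",
)
print(q)
```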

The implementation of a structured data collection and analysis framework is increasingly vital for AI safety monitoring, shifting the focus from isolated incident reports to quantifiable, evidence-based risk assessments. This approach is particularly relevant given the observed 34.5% increase in data breaches within the financial sector, demonstrating a clear and growing need for proactive identification and mitigation of AI-related vulnerabilities. Systematic data analysis enables the tracking of incident frequency, severity, and affected populations, allowing for the development of targeted interventions and the measurement of their effectiveness, ultimately moving beyond reactive responses to a preventative, public health-oriented strategy.

Automated Analysis: Scaling Incident Evaluation

Incident Monitoring forms the foundation of the analytical process, leveraging data aggregated from sources including the AI Incident Database (AIID) and the OECD AI Incidents Monitor (OECD AIM). These sources provide a continuous stream of reports detailing incidents related to AI systems, encompassing a range of issues from performance failures and biases to security vulnerabilities and ethical concerns. Data intake prioritizes structured and semi-structured reports, facilitating automated processing and analysis. The scope of monitored incidents includes both publicly reported events and those identified through proactive scanning of relevant data sources, enabling a comprehensive view of the AI incident landscape.
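
Because the sources differ in structure, intake typically begins by normalizing reports into a common schema. The sketch below shows one way this might look; the field names and the raw-record keys are assumptions for illustration, not the actual export schemas of AIID or OECD AIM.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IncidentReport:
    """Minimal common schema for reports pulled from different trackers."""
    source: str             # e.g. "AIID" or "OECD AIM"
    source_id: str
    title: str
    date: str               # ISO date string as published by the source
    description: str
    sector: Optional[str] = None

def normalize_aiid(raw: dict) -> IncidentReport:
    # Key names here are illustrative assumptions about the raw record.
    return IncidentReport(
        source="AIID",
        source_id=str(raw["incident_id"]),
        title=raw["title"],
        date=raw["date"],
        description=raw["description"],
    )
```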

LLM-Powered Filtering is employed to process high volumes of incident reports by automatically matching them to a standardized set of monitoring questions. This process leverages Large Language Models to analyze incoming text and determine relevance to predefined inquiries, reducing the need for manual review. The system identifies key information within each report and assigns it to the appropriate monitoring question, facilitating efficient data aggregation and analysis. This automated matching enables scalable incident analysis and allows analysts to focus on reports requiring deeper investigation, rather than initial categorization.
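
A minimal version of this matching step can be sketched as a yes/no relevance check. The prompt wording is illustrative and `call_llm` stands in for whatever completion API is actually in use; neither is taken from the paper.

```python
def llm_relevance(report_text: str, question: str, call_llm) -> bool:
    """Ask an LLM whether an incident report is relevant to a monitoring question.

    `call_llm` is a placeholder callable (prompt -> response text); the prompt
    below is an assumed formulation, not the study's actual prompt.
    """
    prompt = (
        "You match AI incident reports to monitoring questions.\n"
        f"Monitoring question: {question}\n"
        f"Incident report: {report_text}\n"
        "Answer strictly YES or NO: is this report relevant to the question?"
    )
    answer = call_llm(prompt).strip().upper()
    return answer.startswith("YES")

# Usage sketch:
# relevant = [r for r in reports if llm_relevance(r.description, question, call_llm)]
```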

Incident Classification involves categorizing reported incidents according to pre-defined types – such as cyberattack, data breach, or system failure – and assigning a severity level based on potential impact, ranging from low to critical. This process utilizes structured data fields extracted during initial filtering and leverages a taxonomy of incident characteristics to ensure consistent labeling. Categorization by type and severity enables targeted analysis; for example, high-severity cyberattacks receive immediate attention from security response teams, while low-severity system failures are logged for trend analysis and preventative maintenance. The resulting classifications facilitate efficient resource allocation, improved reporting, and the development of focused mitigation strategies.
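
Expressed as code, the labeling scheme amounts to a small taxonomy plus a routing rule. The categories below mirror the examples in the text; the escalation rule and threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class IncidentType(Enum):
    CYBERATTACK = "cyberattack"
    DATA_BREACH = "data breach"
    SYSTEM_FAILURE = "system failure"
    OTHER = "other"

class Severity(Enum):   # ordered from low to critical potential impact
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4

@dataclass
class Classification:
    incident_id: str
    incident_type: IncidentType
    severity: Severity

def needs_immediate_response(c: Classification) -> bool:
    # Routing rule mirroring the text: high-severity cyberattacks are escalated,
    # while low-severity system failures are logged for trend analysis.
    return (c.incident_type is IncidentType.CYBERATTACK
            and c.severity.value >= Severity.HIGH.value)
```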

Trajectory Classification assesses incident data to detect evolving patterns indicative of increasing risks or positive developments. This process leverages Large Language Models (LLMs) to identify shifts in incident characteristics over time, allowing proactive intervention or resource allocation. Currently, the reliability of LLM-based assessments is being validated through inter-rater agreement analysis. This evaluation focuses on two key metrics: S_match, which measures the agreement on the presence or absence of specific trajectory characteristics, and R_match, which quantifies agreement on the reasoning behind the trajectory classification. Finalization of these metrics will establish a baseline for the accuracy and consistency of the automated trend identification system.
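
One simple way such agreement could be computed is as the share of items on which the LLM and a human reviewer assign the same label, calculated separately for the trajectory call and for the stated reasoning. The sketch below illustrates that idea with invented annotations; the paper's exact definitions of S_match and R_match may differ.

```python
def proportion_agreement(labels_a: list, labels_b: list) -> float:
    """Share of items on which two raters assign the same label."""
    assert labels_a and len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

# Illustrative annotations (assumed, not the paper's data).
llm_traj   = ["rising harm", "stable", "falling exposure", "stable"]
human_traj = ["rising harm", "stable", "stable",           "stable"]

llm_reason   = ["more deployments", "no change", "policy fix", "no change"]
human_reason = ["more deployments", "no change", "no change",  "no change"]

s_match = proportion_agreement(llm_traj, human_traj)        # trajectory-label agreement
r_match = proportion_agreement(llm_reason, human_reason)    # reasoning agreement
print(f"S_match={s_match:.2f}, R_match={r_match:.2f}")
```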

Tiered Estimation: Navigating Uncertainty in AI Risk Assessment

A tiered estimation approach addresses the challenges posed by incomplete or uncertain incident reporting in AI safety assessments. Rather than treating all reported incidents as equally valid, this methodology quantifies both the potential harm resulting from an incident and the exposure – how widely the incident’s effects might propagate – across multiple levels of confidence. This acknowledges that data on AI incidents is often subjective, incomplete, or reliant on self-reporting, leading to inherent uncertainties. By assigning incidents to tiers based on the strength of evidence – ranging from highly corroborated events to preliminary or anecdotal reports – a more realistic and nuanced understanding of risk emerges. This allows for a probabilistic assessment, moving beyond simple binary categorizations of ‘safe’ or ‘unsafe’, and ultimately enabling a more effective allocation of resources toward mitigating the most credible and impactful threats.

Traditional risk assessment often categorizes incidents as simply ‘safe’ or ‘unsafe’, a framework that fails to capture the spectrum of potential harm and the varying degrees of confidence in those assessments. A tiered estimation approach, however, moves beyond this binary limitation by acknowledging that not all incidents are equally understood or pose the same level of threat. This methodology allows for the categorization of risks based on the strength of evidence and the potential severity of consequences, creating a more granular and realistic picture of the overall risk landscape. Instead of a simple pass/fail, incidents can be classified into multiple tiers – for example, ‘negligible’, ‘low’, ‘moderate’, and ‘high’ – each reflecting a different combination of likelihood and impact. This nuanced understanding is crucial for effective resource allocation, enabling decision-makers to prioritize interventions based on the most pressing and well-defined risks, rather than being constrained by overly simplistic categorizations.
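
A tier assignment of this kind can be sketched as a simple mapping from likelihood and impact to one of the four categories named above. The thresholds and the product scoring below are illustrative assumptions, not calibrated values from the paper.

```python
def risk_tier(likelihood: float, impact: float) -> str:
    """Map likelihood and impact (both on a 0-1 scale) to a qualitative tier.

    Thresholds are assumed for illustration only.
    """
    score = likelihood * impact
    if score < 0.05:
        return "negligible"
    if score < 0.2:
        return "low"
    if score < 0.5:
        return "moderate"
    return "high"

print(risk_tier(likelihood=0.9, impact=0.1))   # widespread but mild -> low
print(risk_tier(likelihood=0.6, impact=0.9))   # plausible and severe -> high
```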

A tiered approach to evidence assessment enables a strategic allocation of resources, shifting focus towards incidents demanding immediate attention. Rather than treating all reported events as equal, this methodology categorizes them based on the strength and corroboration of supporting data – ranging from preliminary indications to thoroughly verified occurrences. This differentiation allows for the prioritization of investigations, mitigation efforts, and preventative measures, ensuring that limited resources are directed toward the most impactful and critical areas of concern. Consequently, organizations can move beyond reactive responses and proactively address the highest-risk scenarios, fostering a more robust and efficient AI governance framework and maximizing the effectiveness of safety protocols.

A robust, tiered estimation methodology is foundational to effective AI governance, enabling a shift from reactive responses to proactive safety measures. As reported incidents involving AI systems continue to rise, informed decision-making regarding deployment and risk mitigation becomes paramount; this approach provides the necessary granularity to assess and prioritize concerns. By systematically quantifying both potential harm and exposure levels, organizations can allocate resources strategically, focusing on the most critical areas requiring intervention. This rigorous process doesn’t simply identify risks, but empowers stakeholders to confidently navigate the complexities of AI safety, fostering responsible innovation and minimizing potential negative consequences through data-driven insights.

The pursuit of robust AI governance, as detailed in this classification of incident trajectories, demands a precision mirroring mathematical proof. The framework’s focus on exposure and harm – quantifying the extent of an incident’s reach alongside its negative effect – resonates with the need for provable correctness. Vinton Cerf aptly stated, “The Internet is not just a network of networks; it’s a network of people.” Similarly, AI incident monitoring isn’t simply counting errors; it’s mapping the propagation of harm across a complex system of interactions. A solution exhibiting merely functional behavior is insufficient; the classification must demonstrably reduce risk through verifiable reductions in both exposure and harm, akin to an asymptotic approach to zero error.

Where the Trajectory Leads

The presented classification, while a step toward quantifiable AI governance, fundamentally highlights the enduring tension between pragmatic monitoring and genuine systemic safety. Categorizing incident trajectories is not an end in itself; it is merely a refined method for counting different types of failures. The true challenge remains: to predict these trajectories before exposure escalates to harm. Reliance on post-hoc analysis, however sophisticated, treats symptoms, not causes.

Future work must confront the limitations of relying on observed incidents as the sole source of truth. The framework’s efficacy is intrinsically tied to the completeness of incident reporting – a notoriously optimistic assumption. A rigorous mathematical formalism linking model characteristics (architecture, training data, objective functions) to potential incident classes is desired, but currently absent. Until such a connection is established, the classification remains a descriptive exercise, not a predictive science.

The pursuit of ‘actionable insight’ should not be mistaken for actual problem-solving. Classifying harms, even with nuance, does not obviate the need for provably safe AI systems. One hopes that the field will eventually prioritize the elegance of a correct solution over the convenience of a merely ‘functional’ workaround. The trajectory of AI safety demands a commitment to mathematical certainty, not simply better data aggregation.


Original article: https://arxiv.org/pdf/2604.21412.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
