Beyond AI Safety: Mapping the Spectrum of Harm

Author: Denis Avetisyan


A new taxonomy aims to systematically categorize and address the ethical and security risks posed by increasingly sophisticated artificial intelligence systems.

This review introduces HARM66+, a multi-level harm taxonomy for adversarial AI, enabling improved risk assessment, ethical risk scoring, and resilience analytics.

Despite the pervasive invocation of “harm” across fields from cybersecurity to ethics, a systematic and analytically rigorous taxonomy remains conspicuously absent. This gap is directly addressed in ‘In Quest of an Extensible Multi-Level Harm Taxonomy for Adversarial AI: Heart of Security, Ethical Risk Scoring and Resilience Analytics’, which introduces HARM66+, a novel framework explicitly defining and enumerating 66+ distinct harm types grounded in contemporary ethical theory. By organizing harms into domains and categories aligned with established ethical principles, and formalizing attributes like reversibility and duration, this work transforms harm from a rhetorical concept into an operational object of analysis. Will this formalized understanding of harm enable more robust risk assessment and, ultimately, the development of safer and more ethical AI systems?


Beyond the Triad: Defining the Evolving Landscape of AI Risk

The established cybersecurity triad of Confidentiality, Integrity, and Availability – long the cornerstone of digital protection – proves increasingly inadequate when addressing the risks presented by contemporary artificial intelligence. The triad was designed to safeguard data and systems against deliberate breaches or malfunctions, focusing on discrete, identifiable threats. However, modern AI systems introduce harms that are often subtle and emergent, operating across multiple domains and extending far beyond simple data compromise or service disruption. An AI-powered disinformation campaign, for example, doesn’t necessarily steal data or break a system, but erodes trust and manipulates public opinion – a harm the CIA triad is ill-equipped to recognize or mitigate. Similarly, biases embedded within AI algorithms can perpetuate systemic discrimination without a traditional security failure occurring. This necessitates a paradigm shift, moving beyond the protection of assets to encompass the broader societal and ethical implications of increasingly pervasive AI technologies.

The emergence of adversarial AI signifies a shift in potential harms, moving beyond traditional cybersecurity concerns like data breaches or system outages. These systems can now induce harms that are diffused across multiple domains – impacting not just technical infrastructure, but also societal structures, economic stability, and even individual well-being. Unlike conventional failures stemming from coding errors or hardware malfunctions, adversarial attacks exploit the very learning processes of AI, leading to unpredictable and often subtle consequences. A compromised image recognition system, for example, could misclassify critical medical scans, while a manipulated natural language processing model could spread disinformation with unprecedented reach. Consequently, a comprehensive risk assessment must extend beyond technical vulnerabilities to encompass the broader socio-technical landscape and account for the cascading effects of AI-driven harms, requiring interdisciplinary approaches and proactive mitigation strategies.

Existing methods for evaluating AI risk often treat harms as isolated incidents, overlooking the critical ways in which multiple factors can combine to produce disproportionately severe outcomes. A seemingly minor technical flaw, for example, can be amplified by societal biases embedded within training data, exacerbated by the speed and scale of automated deployment, and further compounded by a lack of transparency in algorithmic decision-making. This interplay creates a complex web of causation that traditional risk assessments struggle to untangle, highlighting the need for a more nuanced and structured understanding of harm itself – one that moves beyond simply identifying potential failures to analyzing how and why those failures escalate into significant consequences. Such an approach demands a shift toward systems-level thinking, recognizing that the true impact of AI is rarely the result of a single cause, but rather a confluence of interconnected vulnerabilities.

A Formal Taxonomy: Classifying the Spectrum of AI-Induced Harm

The Harm Taxonomy is a systematic categorization of potential damage resulting from AI systems, currently comprising over 66 distinct harm types. These harms are organized into two primary domains – Human Harms, addressing impacts on individuals, and Non-Human Harms, concerning damage to systems and the environment. Within these domains, harms are further classified into 11 specific categories, enabling a granular assessment of potential risks. This structured approach facilitates detailed analysis and comparison of diverse harms, moving beyond broad generalizations to identify specific vulnerabilities and inform mitigation strategies.
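This hierarchical structure maps naturally onto a machine-readable form. The sketch below is a minimal illustration, using hypothetical domain, category, and harm-type labels rather than the taxonomy's actual entries, of how the domain, category, and harm-type levels could be encoded for downstream analysis:

```python
from dataclasses import dataclass
from enum import Enum

class Domain(Enum):
    HUMAN = "human"          # harms to individuals
    NON_HUMAN = "non_human"  # harms to systems, infrastructure, the environment

@dataclass(frozen=True)
class HarmType:
    """One leaf entry of the taxonomy (one of the 66+ harm types)."""
    name: str        # hypothetical label, e.g. "psychological distress"
    domain: Domain   # top-level domain
    category: str    # one of the 11 categories; placeholder names used here

# A tiny, illustrative slice of the hierarchy; names are placeholders,
# not the taxonomy's actual entries.
TAXONOMY = [
    HarmType("psychological distress", Domain.HUMAN, "psychological"),
    HarmType("financial loss", Domain.HUMAN, "economic"),
    HarmType("ecosystem disruption", Domain.NON_HUMAN, "environmental"),
    HarmType("infrastructure degradation", Domain.NON_HUMAN, "infrastructural"),
]

def harms_in_domain(domain: Domain) -> list[HarmType]:
    """Filter the catalogued harm types by top-level domain."""
    return [h for h in TAXONOMY if h.domain == domain]
```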

The Harm Taxonomy differentiates between harms affecting individuals – categorized as Human Harms – and those impacting entities beyond direct human experience, defined as Non-Human Harms. Human Harms encompass psychological, physical, social, and economic damages to persons, while Non-Human Harms address impacts on infrastructure, ecosystems, and other complex systems. This dual categorization ensures comprehensive coverage by acknowledging potential damage extending beyond immediate human well-being, including harms to critical resources, biodiversity, and the stability of engineered or natural environments. The inclusion of both categories allows for a holistic assessment of AI-driven risks, recognizing that damage can manifest across multiple domains.

The Harm Taxonomy’s foundational strength lies in its alignment with eleven contemporary ethical theories – including utilitarianism, deontology, virtue ethics, and care ethics – which provide a robust normative framework for both identifying potential harms and evaluating their significance. This grounding moves beyond subjective assessments by offering multiple, well-established ethical lenses through which to analyze each harm type, ensuring a more comprehensive and justifiable assessment of risk. By referencing these theories, the taxonomy establishes a clear rationale for classifying specific outcomes as harmful, facilitating consistent application and enabling nuanced comparisons between different harm profiles.

Quantifying Harm: Dimensions of Severity and Moral Weight

The Harm Taxonomy quantifies the severity of potential negative impacts by assessing harm along the dimensions of Irreversibility and Duration. Irreversibility refers to the extent to which a harmed entity can return to its pre-harm state; impacts classified as fully irreversible represent permanent alterations. Duration of Harm measures the length of time an entity experiences negative consequences, ranging from immediate and transient effects to chronic, long-term suffering. These dimensions are assessed independently and combined to generate a composite severity score, allowing for comparative analysis of different potential harms and facilitating a more granular ethical evaluation. The framework utilizes standardized metrics for both dimensions to ensure consistent and objective assessment.

Victim Classification is a central component of harm assessment, operating on the principle that the ethical weight of a harmful event is directly influenced by the moral status of the entity experiencing that harm. This classification isn’t simply about legal personhood, but rather a nuanced evaluation of sentience, cognitive capacity, and the capacity to experience suffering. Assigning differing levels of moral consideration allows for a more refined ethical evaluation; harm inflicted upon entities with higher moral status is considered more significant than equivalent harm to entities with lower status. This framework acknowledges that harm is not solely determined by the nature of the impact, but also by who or what is impacted, enabling a more precise quantification of severity.
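The article does not reproduce the framework's scoring formula, but the attributes described in the two preceding paragraphs suggest how one could be composed. The sketch below is a hypothetical illustration, assuming placeholder ordinal scales and weights, of how irreversibility, duration, and victim moral status might combine into a single severity score:

```python
# Hypothetical ordinal scales and weights; the framework's standardized
# metrics may differ.
IRREVERSIBILITY = {"fully_reversible": 1, "partially_reversible": 2, "irreversible": 3}
DURATION = {"transient": 1, "prolonged": 2, "chronic": 3}
VICTIM_WEIGHT = {"lower_moral_status": 0.5, "moderate_moral_status": 1.0, "higher_moral_status": 1.5}

def severity_score(irreversibility: str, duration: str, victim_class: str) -> float:
    """Composite severity: impact dimensions scaled by the victim's moral weight."""
    impact = IRREVERSIBILITY[irreversibility] * DURATION[duration]
    return impact * VICTIM_WEIGHT[victim_class]

# An irreversible, chronic harm to an entity of higher moral status
# scores far above a transient, reversible one.
print(severity_score("irreversible", "chronic", "higher_moral_status"))       # 13.5
print(severity_score("fully_reversible", "transient", "lower_moral_status"))  # 0.5
```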

The Harm Taxonomy’s validation process involved assessing inter-rater agreement on harm categorization, resulting in strong consensus – indicated by a mean consensus level of 2.0 or greater with a p-value less than 0.05 – for nine distinct harm categories. An additional set of harm categories demonstrated moderate consensus, achieving a mean consensus level of 1.5 or greater, also with a p-value less than 0.05. These statistically significant consensus levels across multiple categories support the framework’s reliability and objectivity in identifying and classifying potential harms.
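The exact statistical procedure is not restated here, but a threshold check of this kind is straightforward to reproduce. The sketch below shows one plausible way to test whether a category's mean consensus level exceeds a threshold, assuming a hypothetical per-rater agreement scale; the actual rating instrument and test used in the paper may differ:

```python
# Hypothetical per-rater agreement scores (0-3 scale) for one harm category.
import numpy as np
from scipy import stats

ratings = np.array([2, 3, 2, 2, 3, 2, 3, 2])

mean_consensus = ratings.mean()
# One-sided test: is the mean consensus significantly above the 1.5 threshold?
t_stat, p_two_sided = stats.ttest_1samp(ratings, popmean=1.5)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(f"mean consensus = {mean_consensus:.2f}, one-sided p = {p_one_sided:.4f}")
# The category would count as strong consensus if the mean is >= 2.0 and p < 0.05,
# and as moderate consensus if the mean is >= 1.5 and p < 0.05.
```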

Mapping the Spectrum: Manifestations of AI-Driven Harm

A comprehensive harm taxonomy applied to artificial intelligence reveals a surprisingly broad spectrum of potential negative consequences extending far beyond simple errors or malfunctions. These systems are now demonstrably capable of inflicting Psychological Harm, contributing to anxiety, depression, and diminished self-worth through manipulative design or biased outputs. Simultaneously, the scope of impact reaches into the physical world, with AI-driven processes contributing to Environmental Harm – from exacerbating resource depletion through optimized consumption patterns, to disrupting delicate ecosystems via autonomous robotics and predictive modeling that prioritizes short-term gains. This framework underscores that the risks associated with AI are not merely theoretical; they represent tangible threats to both individual well-being and planetary health, necessitating proactive mitigation strategies and ethical guidelines.

The increasing sophistication of artificial intelligence systems presents a subtle yet pervasive risk: the erosion of individual agency. This isn’t necessarily about malicious intent, but rather a consequence of systems designed to optimize for specific outcomes, often at the expense of human control. The harm taxonomy reveals that AI-driven recommendations, automated decision-making, and even seemingly benign assistive technologies can gradually diminish a person’s ability to make independent choices. As individuals increasingly defer to algorithms for tasks ranging from navigation and purchasing to financial planning and healthcare, their capacity for self-determination weakens. This loss of agency isn’t merely a matter of convenience; it represents a fundamental shift in the human experience, potentially leading to feelings of powerlessness, learned helplessness, and a diminished sense of self – consequences the taxonomy underscores as particularly concerning in the age of increasingly autonomous systems.

A robust harm taxonomy proves particularly effective in dissecting and addressing the pervasive issue of algorithmic bias. This framework doesn’t merely identify unfair outcomes, but categorizes the specific mechanisms driving them – be it biased training data, flawed model design, or discriminatory feature selection. By pinpointing these sources of inequity, the taxonomy facilitates targeted mitigation strategies, allowing developers and policymakers to move beyond abstract concerns about fairness towards concrete interventions. This granular approach enables a proactive assessment of AI systems, helping to prevent the perpetuation – and even amplification – of societal biases within automated decision-making processes, ultimately fostering more equitable and just technological applications.
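For illustration only, the sketch below shows how such a categorization of bias mechanisms could be wired to targeted mitigations; both the mechanism labels and the interventions are hypothetical placeholders, not remediations prescribed by the taxonomy:

```python
# Hypothetical mapping from identified bias mechanisms to candidate mitigations.
BIAS_MITIGATIONS = {
    "biased_training_data": ["rebalance or augment the dataset", "audit label provenance"],
    "flawed_model_design": ["add fairness constraints to the objective", "revisit architecture choices"],
    "discriminatory_feature_selection": ["drop or transform proxy features", "run disparate-impact tests"],
}

def recommend_mitigations(identified_mechanisms: list[str]) -> list[str]:
    """Collect candidate interventions for every bias mechanism flagged in an audit."""
    actions: list[str] = []
    for mechanism in identified_mechanisms:
        actions.extend(BIAS_MITIGATIONS.get(mechanism, [f"no catalogued mitigation for: {mechanism}"]))
    return actions

print(recommend_mitigations(["biased_training_data", "discriminatory_feature_selection"]))
```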

The pursuit of HARM66+, as detailed in the article, echoes a fundamental principle of mathematical rigor. It is not simply about identifying potential harms from adversarial AI, but about constructing a formal, extensible system for their categorization and analysis. This aligns with Andrey Kolmogorov’s assertion: “The essence of mathematics is freedom.” The taxonomy isn’t merely a descriptive list, but a framework allowing for precise definition and provable relationships between different harm types. By grounding the taxonomy in ethical theory, the article aims to move beyond ad-hoc risk assessment and toward a more principled and reliable method for evaluating the socio-technical risks posed by AI systems, mirroring the mathematical pursuit of axiomatic truth.

What’s Next?

The presentation of HARM66+ offers a structured vocabulary, yet the true test resides not in its breadth, but in its demonstrable consistency. Classifying harm is, after all, an exercise in applied epistemology – a mapping of observed phenomena onto abstract categories. The taxonomy’s utility will be determined by its capacity to resolve ambiguity, not merely enumerate potential failures. A rigorous mathematical formulation of inter-category relationships – precisely defining the boundaries and overlaps – remains a critical, and conspicuously absent, component.

Current approaches largely rely on subjective assessments, inevitably introducing bias. The field requires a move towards provable classifications. A harm, to be meaningfully addressed, must be definable with logical precision. The articulation of formal proofs – demonstrating the necessary and sufficient conditions for a given action to constitute a particular harm – should be considered the gold standard. Until then, assessments will remain, at best, educated guesses.
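As a toy illustration of the kind of formalization being called for (not drawn from the paper), each harm type could be treated as a predicate over an action and an affected entity, with classification reduced to explicitly stated conditions:

```latex
% Illustrative only: a schematic form for a provable harm classification.
% H_k is the k-th harm type; C_{k,1}, ..., C_{k,n} are its defining conditions
% over an action a and an affected entity v.
\[
  H_k(a, v) \iff C_{k,1}(a, v) \land C_{k,2}(a, v) \land \dots \land C_{k,n}(a, v)
\]
% Where two categories are claimed to be disjoint, that boundary becomes a
% provable statement rather than an editorial judgment:
\[
  \forall a, v:\ \neg\bigl(H_i(a, v) \land H_j(a, v)\bigr).
\]
```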

Further refinement necessitates a shift from descriptive categorization to predictive modeling. Can HARM66+ serve as the foundation for algorithms capable of anticipating potential harms before they manifest? Such a capacity would elevate the taxonomy from a post-hoc analysis tool to a proactive safeguard. The pursuit of such predictive power, however, demands a level of formalization currently lacking in most adversarial AI research.


Original article: https://arxiv.org/pdf/2601.16930.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

