Author: Denis Avetisyan
Understanding the complex network of components and actors behind artificial intelligence is becoming crucial for safe and reliable deployment in critical systems.
This review proposes a taxonomy to map the AI supply chain, enabling improved risk management, data provenance tracking, and model governance.
Despite growing attention to risks associated with artificial intelligence, a systematic understanding of the complex supply chain underpinning modern AI systems remains a critical gap. This paper, ‘Identifying the Supply Chain of AI for Trustworthiness and Risk Management in Critical Applications’, addresses this challenge by proposing a novel taxonomy for categorizing entities within the AI supply chain – from data sources to deployed models – to facilitate improved risk assessment and governance. Our work bridges the divide between current AI governance practices and the urgent need for actionable risk management, particularly in high-stakes applications like healthcare and finance. How can increased supply chain visibility empower organizations to build more trustworthy and resilient AI systems?
The Algorithmic Foundation: Dependencies and Interconnections
Contemporary artificial intelligence isn’t a singular entity, but rather a meticulously layered architecture. At its core lies data – vast quantities used for both training and ongoing operation – inextricably linked to the algorithms, or models, that interpret it. These models aren’t self-executing; they require supporting programs – software libraries, operating systems, and specialized code – to function. Crucially, all of this relies on physical infrastructure: servers, networking, and power supplies. This interwoven dependency means the performance of an AI system isn’t solely defined by its algorithmic sophistication; a flaw in any component – a data corruption, a software bug, a server outage – can cascade through the entire system, impacting its reliability and potentially leading to unpredictable outcomes. The very nature of these systems establishes a complex web of interconnectedness that demands careful consideration and proactive management.
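To make this layering concrete, the short Python sketch below models an AI system as a dependency graph and traces how a single failure propagates upward. The components and their relationships are hypothetical, chosen only to mirror the data, model, program, and infrastructure layers described above; they are not drawn from the paper.

```python
from collections import defaultdict

# Hypothetical dependency graph: each component lists what it depends on.
# Layers follow the article: infrastructure -> programs -> data/models -> system.
DEPENDS_ON = {
    "ai_system":     ["model", "serving_code"],
    "model":         ["training_data", "ml_library"],
    "serving_code":  ["ml_library", "server"],
    "training_data": ["data_pipeline"],
    "data_pipeline": ["server"],
    "ml_library":    ["server"],
    "server":        [],  # physical infrastructure at the base
}

def impacted_by(failed: str) -> set[str]:
    """Return every component that depends, directly or transitively,
    on the failed component."""
    # Invert the graph: map each component to its dependents.
    dependents = defaultdict(set)
    for comp, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(comp)
    # Walk upward from the failure.
    impacted, frontier = set(), [failed]
    while frontier:
        current = frontier.pop()
        for parent in dependents[current]:
            if parent not in impacted:
                impacted.add(parent)
                frontier.append(parent)
    return impacted

# A single server outage cascades all the way up to the AI system itself.
print(impacted_by("server"))
```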
An artificial intelligence system’s capabilities are inextricably tied to the quality of its underlying components. The performance of a model, for instance, is fundamentally limited by the data used in its training – biased or incomplete datasets inevitably lead to flawed outputs. Similarly, the reliability of an AI depends on the integrity of the programs and infrastructure supporting it; a software bug or a server outage can halt operations entirely. This interconnectedness means that even a seemingly minor flaw in one foundational element can cascade into significant systemic failures, highlighting the critical need for continuous monitoring and rigorous testing of all components to ensure consistent and trustworthy results. Consequently, investment in robust data pipelines, secure infrastructure, and thoroughly vetted code is not merely a technical detail, but a prerequisite for deploying dependable AI solutions.
Acknowledging the interwoven components that constitute an artificial intelligence system – encompassing data sources, algorithmic models, supporting programs, and physical infrastructure – represents a crucial initial phase in establishing effective oversight and mitigating potential risks. A comprehensive grasp of these dependencies allows for the proactive identification of vulnerabilities, enabling the development of targeted safeguards against both unintentional failures and deliberate malicious interference. This foundational understanding isn’t merely a technical exercise; it’s the bedrock upon which responsible AI governance is built, facilitating the creation of policies and protocols that ensure system reliability, data security, and ethical operation. Without this initial step, attempts at AI risk management remain superficial, leaving systems exposed to unforeseen consequences and hindering the realization of their full potential.
Artificial intelligence systems, despite their apparent sophistication, rest on a delicate foundation of interconnected components. A lack of diligent assessment regarding these dependencies introduces significant vulnerabilities, potentially leading to unpredictable system failures or enabling malicious exploitation. Compromised data pipelines, flawed model training, software bugs, or infrastructure weaknesses can all cascade into critical errors, disrupting functionality and eroding trust. This isn’t merely a technical concern; adversaries can deliberately target these dependencies – poisoning training data, injecting malicious code, or launching denial-of-service attacks – to compromise system integrity and achieve harmful outcomes. Therefore, a proactive and comprehensive understanding of these foundational dependencies is paramount for building resilient and secure AI applications.
Mapping the Ecosystem: Creators and Hosts Defined
The AI System lifecycle involves a diverse range of entities functioning as either creators or hosts. Creators – encompassing Data Creators, Model Creators, and Data Aggregators – are responsible for the origination and initial quality of the data and models utilized by the system. Conversely, hosts provide the necessary environment for operation, categorized as Data Hosts, Model Hosts, Program Hosts, and Infrastructure Hosts. These hosts maintain and deliver the data, models, and computational resources required throughout the AI system’s operational phases. This division of labor necessitates clear identification and mapping of these entities to facilitate effective risk management and establish accountability within the broader AI ecosystem.
Data Creators generate the initial datasets used for training and evaluation, bearing responsibility for accuracy, relevance, and potential biases present in the raw information. Model Creators design, develop, and train the AI models themselves, and are accountable for the model’s architecture, algorithms, and performance characteristics. Data Aggregators collect, curate, and often re-label datasets from multiple Data Creators, assuming responsibility for the integrity and consistency of the combined dataset and any transformations applied. The quality of information originating from these three entity types directly impacts the reliability, fairness, and overall performance of the AI System, necessitating clear documentation of data provenance and model development processes.
Effective operation of an AI System relies on several distinct hosting entities. Data Hosts provide storage and access to the datasets used for training and inference. Model Hosts deploy and serve the trained AI models, making them available for applications. Program Hosts execute the software code that orchestrates the AI System, including data processing, model invocation, and result handling. Finally, Infrastructure Hosts supply the underlying computing resources – servers, networking, and storage – necessary for all other hosting functions. These roles can be fulfilled by the same entity or distributed across multiple organizations, but their combined function is essential for the AI System’s lifecycle.
Precisely identifying and mapping the entities involved in the AI System lifecycle – data creators, model creators, hosts, and aggregators – is fundamental to establishing clear lines of responsibility and accountability. This process enables effective risk management by providing a framework to determine which entity is responsible for specific aspects of the system, including data quality, model performance, and operational security. The lightweight taxonomy detailed in this paper facilitates this mapping, allowing organizations to assign ownership for potential harms or failures and to implement targeted mitigation strategies. Without this clear attribution, addressing issues such as bias, inaccuracies, or security vulnerabilities becomes significantly more complex and potentially ineffective.
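To make the taxonomy concrete, the following minimal Python sketch enumerates the creator and host roles described above and maps each component of a hypothetical AI system to the entity accountable for it. The entity and component names are invented for illustration and do not come from the paper.

```python
from enum import Enum
from dataclasses import dataclass

class Role(Enum):
    # Creator roles: originate data and models.
    DATA_CREATOR = "data creator"
    MODEL_CREATOR = "model creator"
    DATA_AGGREGATOR = "data aggregator"
    # Host roles: operate data, models, programs, and infrastructure.
    DATA_HOST = "data host"
    MODEL_HOST = "model host"
    PROGRAM_HOST = "program host"
    INFRASTRUCTURE_HOST = "infrastructure host"

@dataclass(frozen=True)
class Entity:
    name: str
    role: Role

# Hypothetical supply-chain map: which entity is accountable for which component.
ACCOUNTABILITY = {
    "clinical_notes_dataset": Entity("Hospital A", Role.DATA_CREATOR),
    "merged_training_corpus": Entity("DataCo", Role.DATA_AGGREGATOR),
    "diagnosis_model":        Entity("ModelLab", Role.MODEL_CREATOR),
    "inference_endpoint":     Entity("CloudVendor", Role.MODEL_HOST),
    "orchestration_service":  Entity("CloudVendor", Role.PROGRAM_HOST),
    "gpu_cluster":            Entity("CloudVendor", Role.INFRASTRUCTURE_HOST),
}

def who_is_accountable(component: str) -> Entity:
    """Look up the entity responsible for a given component."""
    return ACCOUNTABILITY[component]

print(who_is_accountable("merged_training_corpus"))
```

A real supply-chain map would be far richer, covering contracts, jurisdictions, and service levels, but even this skeleton can answer the basic accountability question the taxonomy is designed to support.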
A Holistic Approach: Proactive Risk Management Framework
An effective AI Risk Management Framework is critical for organizations deploying AI systems, as it provides a structured methodology for identifying potential hazards, evaluating their likelihood and impact, and implementing appropriate mitigation strategies across all phases of the AI System lifecycle. This lifecycle encompasses initial planning and data acquisition, model development and training, deployment and operation, and eventual decommissioning. A robust framework facilitates the proactive management of risks related to data quality, model bias, security vulnerabilities, regulatory compliance, and unintended consequences, thereby minimizing potential harm and maximizing the benefits of AI adoption. Consistent application of such a framework enables organizations to demonstrably address AI-related risks and maintain stakeholder trust.
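As a sketch of what such a framework might record, the Python fragment below represents a single risk-register entry tied to a lifecycle phase, with likelihood and impact combined into a simple priority score. The field names and the example entry are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class RiskEntry:
    component: str        # e.g. a dataset, model, library, or server
    lifecycle_phase: str  # planning, development, deployment, operation, decommissioning
    description: str
    likelihood: int       # 1 (rare) .. 5 (almost certain)
    impact: int           # 1 (negligible) .. 5 (severe)
    mitigation: str

    @property
    def priority(self) -> int:
        """Simple risk score (likelihood x impact) used for triage."""
        return self.likelihood * self.impact

# Hypothetical entry for a training-data quality risk.
entry = RiskEntry(
    component="merged_training_corpus",
    lifecycle_phase="development",
    description="Label errors introduced during third-party aggregation",
    likelihood=3,
    impact=4,
    mitigation="Independent label audit before each retraining cycle",
)
print(entry.priority)  # -> 12
```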
An AI Risk Management Framework requires comprehensive mapping of interdependencies within the AI System. This includes the relationship between input data – its source, quality, and potential biases – and the model’s performance; the code and algorithms comprising the model itself; and the underlying infrastructure supporting both data processing and model deployment. Furthermore, the framework must delineate the responsibilities of all involved parties, including data providers, model developers, software engineers, and the organizations hosting and maintaining the system, to ensure accountability and effective risk mitigation across the entire lifecycle.
Supply chain visibility within an AI system necessitates the tracking of all constituent components – including datasets used for training, model architectures, code libraries, and the underlying infrastructure – from their origin through deployment and ongoing operation. This tracking includes documenting provenance, versioning, and any modifications made to these components. Detailed visibility allows organizations to assess the security and integrity of each element, identify potential vulnerabilities introduced through third-party dependencies, and maintain a comprehensive audit trail for compliance and incident response. Effective implementation requires robust data governance policies, automated dependency tracking tools, and clear documentation of the entire AI system lifecycle, extending beyond the immediate development team to encompass all contributing parties and hosting environments.
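A minimal illustration of such tracking, assuming a simple in-memory record rather than any particular tooling, is sketched below: each component carries its origin, version, and a content hash, and every modification appends to an audit trail.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class ComponentRecord:
    name: str
    origin: str                          # where the component came from
    version: str
    content_hash: str                    # fingerprint of the artifact as received
    audit_trail: list[str] = field(default_factory=list)

    def record_change(self, description: str, new_content: bytes, new_version: str) -> None:
        """Log a modification and update the version and fingerprint."""
        self.audit_trail.append(f"{self.version} -> {new_version}: {description}")
        self.version = new_version
        self.content_hash = hashlib.sha256(new_content).hexdigest()

# Hypothetical dataset tracked from acquisition through a cleaning step.
dataset = ComponentRecord(
    name="clinical_notes_dataset",
    origin="Hospital A data export",
    version="1.0",
    content_hash=hashlib.sha256(b"raw export").hexdigest(),
)
dataset.record_change("Removed duplicate records", b"cleaned export", "1.1")
print(dataset.audit_trail)
```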
Implementing a dedicated framework enables a proactive and comprehensive approach to AI risk management, and applying a standardized taxonomy significantly improves its effectiveness. This taxonomy facilitates consistent identification, categorization, and assessment of risks across the entire AI system lifecycle – encompassing data sources, model development, deployment infrastructure, and associated personnel. By providing a common language and structure for risk analysis, the framework ensures that potential vulnerabilities are not overlooked, and mitigation strategies can be applied systematically. This methodology moves beyond reactive responses to incidents and allows organizations to anticipate and address risks before they materialize, thereby minimizing potential negative impacts and fostering responsible AI development and deployment.
Standardizing Transparency: The Imperative of SBOMs for AI
Artificial intelligence systems, increasingly complex and interwoven with numerous software and data dependencies, demand rigorous documentation for trust and security. The implementation of Software Bill of Materials (SBOMs) standards, notably SPDX and CycloneDX, addresses this need by providing a standardized, machine-readable inventory of these components. These SBOMs aren’t merely lists; they detail the origin, version, and relationships between each element, encompassing not only code libraries but also training datasets and even hardware configurations. This detailed accounting enables organizations to thoroughly assess the integrity of the AI system, identify potential vulnerabilities stemming from compromised dependencies, and ensure compliance with emerging regulations focused on AI transparency and accountability. By adopting these standards, developers and deployers move beyond opaque “black box” models towards demonstrably trustworthy and resilient AI solutions.
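As a rough sketch of what such an inventory can look like, the fragment below builds a CycloneDX-style bill of materials for a hypothetical AI system as a plain Python dictionary. The component entries and names are illustrative, and the exact fields required by the SPDX or CycloneDX specifications should be taken from the standards themselves; this only mirrors the general shape.

```python
import json

# Illustrative, CycloneDX-flavoured inventory of an AI system's parts.
# Real SBOMs must follow the published schema; this mirrors only the outline.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "machine-learning-model", "name": "diagnosis_model",
         "version": "2.3.0", "supplier": {"name": "ModelLab"}},
        {"type": "data", "name": "merged_training_corpus",
         "version": "2025-06", "supplier": {"name": "DataCo"}},
        {"type": "library", "name": "ml_runtime",
         "version": "4.1.2", "supplier": {"name": "OpenSourceProject"}},
    ],
}

print(json.dumps(sbom, indent=2))
```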
A critical step toward trustworthy artificial intelligence lies in comprehensively documenting its constituent parts, and standardized formats like Software Bill of Materials (SBOMs) provide the necessary framework. These SBOMs detail not only the software dependencies – the libraries, frameworks, and tools used in development – but also the data components, including datasets used for training and validation. By listing these dependencies in a machine-readable, standardized format, organizations gain the ability to rigorously verify the integrity of each component, tracing its origin and confirming it hasn’t been tampered with. This provenance tracking is essential for identifying potential vulnerabilities, managing risks associated with open-source components, and ensuring adherence to evolving regulatory requirements focused on responsible AI development and deployment. The ability to reliably audit the complete lineage of an AI system, from code to data, is quickly becoming a cornerstone of trust and accountability in the field.
Software Bill of Materials (SBOMs) are rapidly becoming essential tools for proactive security and responsible AI development by enabling systematic vulnerability management and comprehensive risk assessment. Through detailed inventories of an AI system’s components – including libraries, models, and datasets – organizations can quickly identify and address potential weaknesses before they are exploited. This granular visibility extends beyond immediate vulnerabilities to encompass license compliance and supply chain risks, facilitating informed decision-making. Furthermore, the adoption of standardized SBOM formats supports adherence to emerging regulatory requirements concerning AI safety and trustworthiness, such as those focused on algorithmic transparency and data provenance, effectively demonstrating due diligence and fostering accountability within the rapidly evolving AI landscape.
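The following fragment sketches the kind of check an SBOM inventory enables, cross-referencing component names and versions against a local list of known advisories. Both the advisory list and the matching logic are simplified assumptions made for illustration; production tooling would query curated vulnerability databases instead.

```python
# Hypothetical advisory list keyed by (component name, affected version).
KNOWN_ADVISORIES = {
    ("ml_runtime", "4.1.2"): "example advisory: deserialization flaw in model loading (hypothetical)",
}

# Components as they might appear in an SBOM inventory (illustrative).
components = [
    {"name": "diagnosis_model", "version": "2.3.0"},
    {"name": "ml_runtime", "version": "4.1.2"},
]

def check_components(items: list[dict]) -> list[str]:
    """Return advisory notices for any component with a known issue."""
    findings = []
    for comp in items:
        advisory = KNOWN_ADVISORIES.get((comp["name"], comp["version"]))
        if advisory:
            findings.append(f"{comp['name']} {comp['version']}: {advisory}")
    return findings

print(check_components(components))
```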
The increasing complexity of artificial intelligence systems demands a new level of scrutiny, and the adoption of standardized documentation like Software Bill of Materials (SBOMs) promises to deliver precisely that. By mandating a clear listing of all software and data dependencies, SBOMs move the AI ecosystem towards greater transparency, allowing stakeholders to verify the origins and integrity of these powerful technologies. This heightened visibility isn’t merely about identifying potential vulnerabilities – though that is critical – but also about establishing clear lines of accountability for the AI’s behavior and impact. With standardized SBOMs, organizations can proactively manage risk, ensure compliance, and foster trust in AI systems, ultimately paving the way for responsible innovation and wider adoption.
The pursuit of a comprehensive AI supply chain taxonomy, as detailed in this work, echoes a fundamental principle of mathematical rigor. It necessitates a precise delineation of components – data providers, model developers, deployment infrastructure – and their interdependencies. This mirrors the need for axiomatic clarity. As Henri Poincaré stated, “Mathematics is the art of giving reasons.” The taxonomy proposed isn’t merely a categorization scheme; it’s a reasoned framework for establishing data provenance and enabling robust risk management, particularly vital for critical applications. The work’s emphasis on traceability and visibility within the AI lifecycle isn’t simply about avoiding failure; it’s about ensuring logical completeness and demonstrable correctness, echoing Poincaré’s conviction that a solution must be provable, not just empirically observed.
What’s Next?
The formalization of an AI supply chain, as presented, merely shifts the locus of difficulty. A taxonomy, however meticulously constructed, is a static representation of a dynamic system. The true challenge lies not in identifying the components, but in verifying their internal consistency. One anticipates a proliferation of ‘trustworthiness’ metrics, each inevitably based on assumptions that are themselves unprovable. The field risks becoming awash in certifications devoid of mathematical grounding – a digital Potemkin village of security.
Future work must prioritize the development of verifiable computation techniques applicable to each stage of the supply chain. Data provenance, for instance, is not sufficient; one requires proof that the data itself has not been maliciously altered, or subtly biased through flawed collection methodologies. Similarly, model governance demands more than simply documenting training procedures; it necessitates formal verification of model behavior under adversarial conditions. The consistency of the entire chain is only as strong as its weakest link – a fact too often ignored in the rush to deploy.
Ultimately, the pursuit of ‘trustworthy AI’ will reveal a fundamental truth: the complexity of these systems exceeds the capacity for complete assurance. The goal, therefore, should not be to eliminate risk, but to quantify it with rigor, and to design systems that degrade gracefully in the face of inevitable uncertainty. Only then will the exercise transcend mere rhetoric and approach something resembling genuine engineering.
Original article: https://arxiv.org/pdf/2511.15763.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/