Designing AI That Works: A New Framework for Agentic Systems

Author: Denis Avetisyan


A structured approach to building and governing autonomous AI projects is essential for realizing their full potential and ensuring responsible development.

This paper introduces the Agentic Automation Canvas, a tool for designing, documenting, and governing agentic AI projects with a focus on FAIR Data principles, RO-Crate packaging, and quantifiable benefits.

Despite the rapid deployment of agentic AI systems across diverse domains, a standardized methodology for their prospective design, governance, and evaluation remains absent. This paper introduces the Agentic Automation Canvas (AAC), a structured framework designed to address this gap by facilitating clearer communication and improved project documentation. The AAC captures key dimensions of automation projects, from defined scope and quantified benefits to data sensitivity and governance staging, and implements them as a FAIR-compliant, machine-readable metadata schema. By enabling the creation of versioned, shareable ‘project contracts’, can the AAC foster more transparent and interoperable agentic AI development workflows?
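To make the idea of a machine-readable project contract concrete, here is a minimal sketch in TypeScript of what such a versioned record might look like. The field names (scope, quantifiedBenefits, dataSensitivity, governanceStage) are assumptions chosen for illustration; the paper's actual schema may differ.

```typescript
// Hypothetical sketch of an AAC "project contract" record.
// Field names are illustrative assumptions, not the paper's published schema.
interface ProjectContract {
  "@context": string;           // JSON-LD context, e.g. schema.org
  "@type": string;              // entity type
  identifier: string;           // persistent identifier for the contract
  version: string;              // contracts are versioned and shareable
  scope: string;                // defined scope of the automation project
  quantifiedBenefits: string[]; // expected, measurable benefits
  dataSensitivity: "public" | "internal" | "confidential" | "regulated";
  governanceStage: "draft" | "review" | "approved" | "retired";
}

const exampleContract: ProjectContract = {
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  identifier: "urn:example:aac-contract:invoice-triage",
  version: "1.0.0",
  scope: "Automated triage of incoming supplier invoices",
  quantifiedBenefits: ["Reduce median handling time from 2 days to 4 hours"],
  dataSensitivity: "confidential",
  governanceStage: "review",
};
```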


The Inevitable Mess: Governing the Ghosts in the Machine

The rapid advancement of artificial intelligence is outpacing the development of comprehensive governance structures, creating a landscape where unpredictable outcomes are increasingly common. This absence of robust frameworks isn’t merely a technical oversight; it fundamentally erodes public trust in these powerful systems. Without clear guidelines and oversight mechanisms, AI deployments risk perpetuating biases, violating privacy, or exhibiting unintended and potentially harmful behaviors. This lack of predictability isn’t limited to edge cases; it permeates many AI applications, hindering widespread adoption and limiting the potential benefits of the technology. Consequently, stakeholders – from developers and policymakers to end-users – require more than just ethical principles; they need enforceable standards and verifiable safeguards to ensure AI systems operate responsibly and reliably.

While frameworks like the NIST AI Risk Management Framework (RMF) represent crucial initial steps toward responsible AI development, their current form relies heavily on qualitative assessments and human interpretation – essentially, checklists for auditors. This approach presents a significant bottleneck as AI systems become more complex and deployed at scale. The RMF, and similar guidelines, lack the necessary granularity and formalization to be directly implemented by machines, hindering automated monitoring and enforcement of safety protocols. Truly scalable oversight requires a shift towards machine-readable specifications – defining acceptable behaviors and constraints in a format that AI systems themselves can understand and adhere to, allowing for continuous, automated validation and proactive risk mitigation beyond manual review.
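As a rough illustration of the difference between a checklist and a machine-readable specification, the following sketch defines a behavioural constraint that a system can evaluate automatically. The metric name and threshold are invented; neither the NIST RMF nor the paper prescribes this format.

```typescript
// Minimal sketch of a machine-readable constraint and an automated check.
// Names and thresholds are illustrative assumptions, not part of any published framework.
interface BehaviouralConstraint {
  id: string;
  metric: string;            // observed quantity, e.g. "pii_leak_rate"
  operator: "lt" | "lte";
  threshold: number;
  action: "alert" | "halt";  // what to do on violation
}

function evaluate(constraint: BehaviouralConstraint, observed: number): "pass" | "alert" | "halt" {
  const ok = constraint.operator === "lt"
    ? observed < constraint.threshold
    : observed <= constraint.threshold;
  return ok ? "pass" : constraint.action;
}

// Continuous, automated validation instead of a manual audit checklist.
const constraint: BehaviouralConstraint = {
  id: "c-001",
  metric: "pii_leak_rate",
  operator: "lt",
  threshold: 0.001,
  action: "halt",
};
console.log(evaluate(constraint, 0.0004)); // "pass"
```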

The rise of agentic AI systems – those capable of independent action and goal pursuit – necessitates a fundamental shift in how artificial intelligence is governed. Traditional compliance-based approaches, focused on auditing outputs after deployment, are proving inadequate for these dynamic systems. Effective oversight now demands proactive governance frameworks built on continuous data analysis and real-time monitoring of AI behavior. This means moving beyond static checklists and towards machine-readable specifications that can automatically assess risks and ensure alignment with intended objectives. Such data-driven systems allow for the identification of emergent behaviors and potential harms before they manifest, enabling a more agile and preventative approach to AI safety and responsible innovation. Ultimately, a proactive stance is critical to fostering trust and unlocking the full potential of agentic AI.

The Agentic Automation Canvas: A (Hopefully) Standardized Approach

The Agentic Automation Canvas (AAC) is a standardized framework intended to facilitate the complete lifecycle of agentic automation projects, encompassing initial design through ongoing governance and comprehensive documentation. It offers a visual and modular format for stakeholders to collaboratively define project scope, identify key resources, and establish clear operational parameters. By providing a consistent structure, the AAC aims to reduce ambiguity, streamline development, and ensure alignment between technical implementation and business objectives. This structured approach is applicable across diverse use cases and organizational structures, promoting repeatability and scalability of agentic automation initiatives.

The Agentic Automation Canvas (AAC) builds upon the established Business Model Canvas framework by incorporating explicit considerations for governance and data security. While the Business Model Canvas focuses on value proposition, customer segments, and revenue streams, the AAC adds dedicated blocks to define operational governance policies, risk assessment protocols, and data handling procedures. This extension is critical for agentic systems, which operate with a degree of autonomy, necessitating clearly defined oversight and accountability mechanisms. Specifically, the AAC requires documentation of data lineage, access controls, and compliance with relevant data privacy regulations, ensuring responsible and secure automation practices are integrated from the outset of project design.
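A minimal sketch of how those governance and data-handling blocks might sit alongside the familiar Business Model Canvas fields is shown below; the block and field names are illustrative assumptions, not the AAC's published layout.

```typescript
// Illustrative sketch of the governance-oriented canvas blocks described above.
interface GovernanceBlock {
  oversightPolicies: string[];      // operational governance policies
  riskAssessment: string[];         // risk assessment protocols
}

interface DataHandlingBlock {
  dataLineage: string;              // where the data comes from and how it is transformed
  accessControls: string[];         // who may read or write which assets
  applicableRegulations: string[];  // e.g. GDPR, HIPAA, sector-specific rules
}

interface AgenticAutomationCanvas {
  valueProposition: string;         // carried over from the Business Model Canvas
  customerSegments: string[];
  governance: GovernanceBlock;      // AAC-specific extension
  dataHandling: DataHandlingBlock;  // AAC-specific extension
}
```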

The Agentic Automation Canvas (AAC) prioritizes the explicit definition of User Expectations through the incorporation of quantifiable benefit metrics. This process moves beyond simply stating desired outcomes to establishing measurable key performance indicators (KPIs) that demonstrate value realization. These metrics, defined during the initial stages of project design, serve as the baseline for evaluating automation success and ensuring alignment between technical implementation and user-perceived benefits. Specifically, the AAC framework encourages the specification of metrics relating to efficiency gains, cost reduction, error rate improvements, or increased revenue – all directly tied to user needs and expectations. Regular monitoring of these KPIs throughout the automation lifecycle provides data-driven insights for optimization and validates the achieved value proposition.
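The following sketch shows one way a quantifiable benefit metric with a baseline, a target, and an observed value could be tracked. The structure is an assumption; the AAC only requires that such metrics be defined and monitored.

```typescript
// Sketch of a quantifiable benefit metric with baseline and target values.
interface BenefitMetric {
  name: string;       // e.g. "invoice handling time"
  unit: string;       // e.g. "hours"
  baseline: number;   // measured before automation
  target: number;     // agreed expectation
  observed?: number;  // measured after deployment
}

// Fraction of the promised improvement actually realized so far.
function realizedBenefit(m: BenefitMetric): number | undefined {
  if (m.observed === undefined || m.baseline === m.target) return undefined;
  return (m.baseline - m.observed) / (m.baseline - m.target);
}

const handlingTime: BenefitMetric = {
  name: "invoice handling time",
  unit: "hours",
  baseline: 48,
  target: 4,
  observed: 12,
};
console.log(realizedBenefit(handlingTime)); // ≈ 0.82
```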

Developer Feasibility within the Agentic Automation Canvas (AAC) necessitates a pragmatic assessment of technical limitations and resource availability prior to project commencement. This includes evaluating the existing infrastructure’s capacity to support the proposed automation, identifying required integrations with legacy systems, and quantifying the development effort (coding, testing, and deployment) required for each agentic function. Ignoring these realities can lead to project delays, cost overruns, and ultimately, a failure to realize the anticipated benefits. The AAC framework emphasizes documentation of these technical constraints alongside the defined user expectations and governance policies, allowing for informed decision-making and iterative refinement of the automation design.
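A feasibility assessment of this kind could be captured in a small record like the one below; the field names and effort breakdown are illustrative assumptions.

```typescript
// Hypothetical record of the technical constraints the AAC asks teams to document up front.
interface FeasibilityAssessment {
  agenticFunction: string;         // the capability being automated
  requiredIntegrations: string[];  // legacy systems that must be connected
  infrastructureGaps: string[];    // capacity or tooling not yet in place
  estimatedEffortDays: { coding: number; testing: number; deployment: number };
}

function totalEffortDays(a: FeasibilityAssessment): number {
  const e = a.estimatedEffortDays;
  return e.coding + e.testing + e.deployment;
}
```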

RO-Crate and Provenance: Tracing the Ghosts

The Agentic Automation Canvas (AAC) utilizes RO-Crate, a JSON-LD based packaging format, to aggregate project data and associated metadata into a single, self-describing unit. This approach facilitates the reproducibility of research by ensuring all necessary components – datasets, code, workflows, and documentation – are bundled together and versioned. RO-Crate’s reliance on established metadata standards and its machine-readable structure enables automated validation of data integrity and simplifies data discovery and reuse. The use of persistent identifiers (PIDs) within RO-Crate further enhances transparency and allows for unambiguous referencing of research outputs, contributing to a robust audit trail and improved data provenance.
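For orientation, a minimal ro-crate-metadata.json for an AAC project package might look like the following, written here as a TypeScript object. The two entities shown (the metadata descriptor and the root Dataset) follow the RO-Crate 1.1 structure; the file names and descriptions are invented for the example.

```typescript
// Minimal RO-Crate metadata for an AAC project package (illustrative content).
const roCrateMetadata = {
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      // Descriptor entity pointing at the root of the crate.
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      conformsTo: { "@id": "https://w3id.org/ro/crate/1.1" },
      about: { "@id": "./" },
    },
    {
      // Root Dataset bundling contract, data, code and documentation.
      "@id": "./",
      "@type": "Dataset",
      name: "Invoice triage automation project",
      description: "AAC project contract, datasets, code and documentation bundled for reproducibility",
      hasPart: [{ "@id": "contract.json" }, { "@id": "workflow/" }],
    },
  ],
};
```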

The AAC utilizes standardized vocabularies, specifically Schema.org and W3C DCAT, to address data consistency and system integration challenges. Schema.org provides a common vocabulary for structuring data on the web, enabling semantic interoperability across diverse datasets. W3C DCAT (Data Catalog Vocabulary) focuses on describing and cataloging datasets, facilitating discovery and access. By adopting these established vocabularies, the AAC ensures that metadata is consistently defined and machine-readable, simplifying data integration with existing data repositories, catalogs, and analytical tools while promoting wider data sharing and reuse.
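A short sketch of a dataset description that combines W3C DCAT and schema.org terms illustrates how the two vocabularies sit together; the dataset itself is invented for illustration.

```typescript
// Sketch of a catalog entry mixing DCAT and schema.org terms (illustrative content).
const catalogEntry = {
  "@context": {
    dcat: "http://www.w3.org/ns/dcat#",
    dct: "http://purl.org/dc/terms/",
    schema: "https://schema.org/",
  },
  "@type": "dcat:Dataset",
  "dct:title": "Supplier invoice corpus (anonymised)",
  "dct:description": "Training and evaluation data for the invoice triage agent",
  "schema:creator": { "schema:name": "Example Finance Team" },
  "dcat:distribution": {
    "@type": "dcat:Distribution",
    "dcat:downloadURL": "https://example.org/data/invoices.parquet",
    "dcat:mediaType": "application/vnd.apache.parquet",
  },
};
```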

The AAC utilizes the Provenance Ontology (PROV-O) to document the lifecycle of governance actions performed on data assets. This ontology models activities, agents, and entities involved in data management, capturing relationships such as derivation, usage, and communication. Specifically, PROV-O within the AAC records details of authorization decisions, data access requests, policy enforcement, and any modifications made to data or metadata. By formally representing these governance activities and their interdependencies, a comprehensive and verifiable audit trail is established, enabling traceability and accountability for data handling practices. This detailed provenance information supports data quality assessment, compliance reporting, and the investigation of any data-related incidents.
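A single governance action, such as an access authorization, could be recorded in PROV-O roughly as follows; the identifiers and timestamp are invented for illustration.

```typescript
// Sketch of a PROV-O record for one governance action, expressed in JSON-LD.
const provenanceRecord = {
  "@context": { prov: "http://www.w3.org/ns/prov#", xsd: "http://www.w3.org/2001/XMLSchema#" },
  "@graph": [
    {
      // The governance activity: deciding on an access request.
      "@id": "urn:example:activity:authorize-access-42",
      "@type": "prov:Activity",
      "prov:used": { "@id": "urn:example:entity:access-request-42" },
      "prov:generated": { "@id": "urn:example:entity:access-decision-42" },
      "prov:wasAssociatedWith": { "@id": "urn:example:agent:data-steward" },
      "prov:endedAtTime": { "@value": "2025-11-03T14:05:00Z", "@type": "xsd:dateTime" },
    },
    { "@id": "urn:example:agent:data-steward", "@type": "prov:Agent" },
    { "@id": "urn:example:entity:access-decision-42", "@type": "prov:Entity" },
  ],
};
```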

The Data Use Ontology within the Agentic Automation Canvas (AAC) establishes a formal framework for defining and managing access restrictions and sensitivity levels associated with research data. This ontology utilizes standardized terms to categorize data based on applicable ethical guidelines, legal regulations – including GDPR and HIPAA where relevant – and funder requirements. By explicitly linking data assets to specific usage limitations, the AAC enables automated enforcement of access controls and facilitates compliance auditing. The ontology covers aspects such as data subject consent, data protection agreements, and permissible data usage scenarios, providing a machine-readable representation of data governance policies.
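One way a data asset could be linked to its use conditions is sketched below; the term identifier shown is a placeholder rather than a verified Data Use Ontology code, and the field names are assumptions.

```typescript
// Sketch of linking a data asset to machine-readable use conditions.
interface DataUseCondition {
  term: string;    // ontology term identifier (placeholder, not a verified DUO code)
  label: string;   // human-readable meaning
  basis: "consent" | "regulation" | "funder-requirement";
}

interface GovernedAsset {
  assetId: string;
  sensitivity: "public" | "internal" | "confidential" | "regulated";
  conditions: DataUseCondition[];
}

const asset: GovernedAsset = {
  assetId: "urn:example:dataset:invoices",
  sensitivity: "regulated",
  conditions: [
    { term: "DUO:placeholder", label: "use limited to the approved project purpose", basis: "consent" },
  ],
};
```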

Measuring the Inevitable: Outcome Metrics and the Funding Dance

The Agentic Automation Canvas (AAC) places significant emphasis on quantifying project achievements through rigorously tracked Outcome Metrics. These metrics extend beyond simple task completion to encompass tangible deliverables, peer-reviewed publications disseminating research findings, and comprehensive evaluation results demonstrating real-world impact. By prioritizing these measurable outcomes, the AAC establishes a clear framework for assessing project success, moving beyond anecdotal evidence to provide data-driven insights into the value and effectiveness of agentic automation initiatives. This commitment to quantifiable results allows for informed decision-making, resource allocation, and continuous improvement across projects that adopt the canvas, ultimately fostering a culture of accountability and demonstrable progress.

The core of demonstrating the impact of agentic automation lies in a robust Benefit Quantification Model. This framework moves beyond simple cost savings to comprehensively assess value across four key dimensions: time, quality, risk, and enablement. By meticulously detailing how automated agents reduce task completion times, enhance output accuracy, mitigate potential errors and associated risks, and ultimately empower human workers with new capabilities, the model provides a clear and quantifiable value proposition. This approach allows stakeholders to understand not just what agentic automation achieves, but how it delivers tangible benefits, facilitating informed decision-making and justifying continued investment in these advanced technologies. The resulting data offers a compelling narrative for showcasing the return on investment and broader strategic impact of agentic systems.
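A toy summary of the four dimensions might look like the sketch below; the fields and the way they are reported are assumptions, since the paper does not prescribe a particular scoring scheme.

```typescript
// Toy sketch of the four-dimension benefit model (time, quality, risk, enablement).
interface BenefitAssessment {
  timeSavedHoursPerMonth: number;
  qualityErrorRateBefore: number;
  qualityErrorRateAfter: number;
  riskIncidentsAvoidedPerYear: number;
  enablementNewCapabilities: string[];
}

function summarize(b: BenefitAssessment): string {
  const errorReduction = b.qualityErrorRateBefore > 0
    ? 1 - b.qualityErrorRateAfter / b.qualityErrorRateBefore
    : 0;
  return [
    `time: ${b.timeSavedHoursPerMonth} h/month saved`,
    `quality: ${(errorReduction * 100).toFixed(0)}% fewer errors`,
    `risk: ${b.riskIncidentsAvoidedPerYear} incidents/year avoided`,
    `enablement: ${b.enablementNewCapabilities.length} new capabilities`,
  ].join("; ");
}
```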

Successful agentic automation initiatives necessitate a deliberate connection between project objectives and the stipulations of funding sources. Alignment with a defined Funding Ontology – a formal representation of funding priorities and reporting demands – ensures that projects not only achieve technical milestones but also demonstrably address the concerns of stakeholders providing financial support. This structured approach facilitates transparent reporting, simplifies the evaluation process, and maximizes the potential for continued investment. By explicitly mapping project deliverables to funding requirements, organizations can proactively demonstrate value, avoid misalignment, and ultimately secure resources for sustained innovation and growth in the field of agentic technologies.
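An explicit deliverable-to-requirement mapping could be as simple as the following sketch; the requirement identifiers and field names are placeholders rather than terms from any published funding ontology.

```typescript
// Sketch of mapping project deliverables to funder reporting requirements.
interface FunderRequirement {
  id: string;          // requirement identifier in the funding ontology (placeholder)
  description: string; // what the funder expects to see
}

interface DeliverableMapping {
  deliverable: string; // project output, e.g. a report, dataset or deployed agent
  satisfies: string[]; // ids of the funder requirements it addresses
}

const mappings: DeliverableMapping[] = [
  { deliverable: "Annual evaluation report", satisfies: ["req-impact-01"] },
  { deliverable: "Published RO-Crate of project outputs", satisfies: ["req-open-data-02"] },
];
```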

The Agentic Automation Canvas (AAC) relies on a dedicated web application, currently at version 0.12.2, to facilitate the capture and management of critical data related to project outcomes and funding alignment. Built using Vue.js 3.5.27 and TypeScript 5.9.3 with the Vite 7.3.1 build tool, the application offers an interactive interface designed to streamline the governance process. This technological foundation allows for efficient tracking of deliverables, publications, and evaluation results, the core Outcome Metrics used to assess project success. By centralizing this information, the web application not only supports robust reporting against funding requirements, but also enables a clear articulation of the value proposition derived from agentic automation, encompassing improvements in time, quality, risk mitigation, and overall enablement.
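The application's actual source is not reproduced in the paper, but a Vue 3 composable for tracking Outcome Metrics, consistent with the stack described above, might look roughly like this sketch.

```typescript
// Minimal sketch of a Vue 3 composable for tracking Outcome Metrics.
// Only the Vue 3 / TypeScript stack is taken from the article; the rest is illustrative.
import { ref, computed } from "vue";

export interface OutcomeRecord {
  kind: "deliverable" | "publication" | "evaluation";
  title: string;
  completed: boolean;
}

export function useOutcomeMetrics() {
  const records = ref<OutcomeRecord[]>([]);

  // Share of tracked outcomes that have been completed.
  const completionRate = computed(() => {
    if (records.value.length === 0) return 0;
    return records.value.filter((r) => r.completed).length / records.value.length;
  });

  function addRecord(record: OutcomeRecord): void {
    records.value.push(record);
  }

  return { records, completionRate, addRecord };
}
```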

The pursuit of structured frameworks, like the Agentic Automation Canvas, often feels like building sandcastles against the tide. One anticipates inevitable entropy. As Alan Turing observed, “We can only see a short distance ahead, but we can see plenty there that needs to be done.” The canvas attempts to impose order on the chaotic emergence of agentic AI, a commendable effort. However, experience suggests that production environments will relentlessly expose the limitations of even the most meticulously designed systems. The focus on FAIR Data and RO-Crate interoperability is sound, yet the true test lies in how gracefully the framework accommodates the inevitable quirks and unforeseen consequences that always surface when theory meets reality. It’s a temporary reprieve, a useful scaffolding, but the suffering will continue, elegantly prolonged, perhaps.

What’s Next?

The Agentic Automation Canvas, as presented, offers a structured approach to a field currently characterized by enthusiastic improvisation. It attempts to impose order on what will inevitably become a complex tangle of interacting agents, data pipelines, and emergent behaviors. The reliance on FAIR Data principles and RO-Crate is… laudable, if history is any guide. It’s a robust foundation, certainly, until someone needs to scale it beyond the curated example datasets. Then the real fun begins.

The emphasis on governance and quantifiable benefits is a telling sign. It suggests even the architects of these systems anticipate the need to justify their existence when production realities diverge from initial projections. The question isn’t whether things will go wrong, but how quickly the inevitable failures will necessitate a redesign of the Canvas itself. One suspects the need for “version 2.0” is already being tacitly acknowledged.

Ultimately, this framework, like all frameworks, will become a layer of abstraction over the underlying mess. It will document how things were intended to work, rather than how they actually do. The pursuit of interoperability, while noble, will likely result in a proliferation of compatibility layers and brittle integrations. Everything new is just the old thing with worse docs.


Original article: https://arxiv.org/pdf/2602.15090.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-02-18 16:58