Author: Denis Avetisyan
As AI systems become increasingly complex and permeate safety-critical applications, researchers are exploring innovative ways to ensure their reliability and prevent potentially catastrophic failures.

This review proposes a framework leveraging large language and vision-language models to bridge the semantic gap between natural language requirements and deep neural network implementations for improved AI system assurance.
Assuring the safety of increasingly complex AI-enabled systems presents a paradox: traditional verification methods struggle with the opacity and semantic gap inherent in deep neural networks. This challenge is addressed in ‘Fighting AI with AI: Leveraging Foundation Models for Assuring AI-Enabled Safety-Critical Systems’ by introducing a novel framework, comprising REACT and SemaLens, that leverages large language and vision-language models to bridge the gap between natural language requirements and validated implementations. This approach facilitates early verification and semantic analysis of AI perception systems, offering a pathway toward more reliable and trustworthy AI. Could this paradigm of ‘fighting AI with AI’ become essential for deploying safe and robust autonomous systems across critical domains?
The Illusion of Precision: Why Natural Language Fails Machines
Traditional requirements engineering frequently depends on documenting system needs using everyday language, a practice that introduces inherent difficulties. While seemingly intuitive, natural language is riddled with ambiguity; words and phrases often possess multiple interpretations, leading to inconsistencies in understanding between stakeholders: developers, clients, and end-users alike. This reliance on imprecise communication creates a significant challenge, as subtle variations in phrasing can drastically alter the intended functionality of a system. The very flexibility that makes natural language so useful for everyday conversation becomes a liability when precise, unambiguous instructions are crucial for successful system design and implementation, potentially resulting in costly rework or, in critical applications, dangerous failures.
The inherent imprecision within natural language requirements poses significant risks, escalating from simple project delays to potentially catastrophic failures, especially within safety-critical systems. Ambiguous phrasing, such as vague quantifiers like “often” or “several,” or terms open to multiple interpretations, can result in developers implementing functionalities differing from the original intent. This discrepancy isn’t merely an inconvenience; in applications governing medical devices, aviation software, or nuclear plant controls, such misinterpretations can directly compromise safety and lead to substantial financial losses due to rework, recalls, or even legal liabilities. The cost of rectifying these errors late in the development lifecycle dwarfs the initial investment in more rigorous requirements elicitation and formalization techniques, highlighting the critical need for proactive ambiguity resolution.
The translation of natural language requirements into formal specifications remains a significant challenge in requirements engineering. Current methodologies often rely on manual interpretation, which is susceptible to subjective biases and inconsistencies, or automated tools that struggle with the nuances of human language – including context, implicit assumptions, and ambiguous phrasing. This disconnect creates a “semantic gap” where the intended meaning of a requirement can be lost or altered during the formalization process. Consequently, verification and validation efforts may focus on an incorrect interpretation, leading to systems that do not meet stakeholder needs or, in critical applications, introduce potentially hazardous flaws. Bridging this gap necessitates advancements in natural language processing, knowledge representation, and formal methods to ensure a faithful and unambiguous translation from human-readable descriptions to machine-verifiable specifications.
REACT: Forcing Language to Conform
The REACT framework utilizes Large Language Models (LLMs) to convert natural language requirements into formal specifications, expressed in a logic-based language suitable for automated reasoning. This translation process involves parsing the natural language input, identifying key entities, relationships, and constraints, and then mapping these elements to a formal representation, typically using first-order logic or a similar formalism. The LLM is trained on a corpus of both natural language requirements and their corresponding formal specifications to learn this mapping. The resulting formal specification is unambiguous and machine-readable, facilitating automated verification and validation activities, and eliminating the inherent ambiguity present in natural language descriptions.
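To make the translation step concrete, the sketch below shows one way such a mapping could be wired up in Python, assuming a hypothetical `call_llm` helper (stubbed here with a canned response) and a deliberately simplified logic vocabulary; REACT’s actual prompt design and target formalism are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical helper; a real system would call an LLM API here.
def call_llm(prompt: str) -> str:
    # Canned response standing in for the model's output.
    return ("forall t: (speed(t) > 120) -> "
            "eventually_within(2.0, warning_active(t))")

@dataclass
class FormalSpec:
    requirement_id: str
    source_text: str
    formula: str  # logic-style formula produced by the LLM

PROMPT_TEMPLATE = (
    "Translate the following requirement into a first-order logic formula.\n"
    "Use only the predicates: speed(t), warning_active(t), "
    "eventually_within(seconds, p).\n"
    "Requirement: {req}\nFormula:"
)

def formalize(req_id: str, req_text: str) -> FormalSpec:
    """Map one natural-language requirement to a formal specification."""
    formula = call_llm(PROMPT_TEMPLATE.format(req=req_text))
    return FormalSpec(requirement_id=req_id, source_text=req_text, formula=formula)

if __name__ == "__main__":
    spec = formalize(
        "REQ-001",
        "If the vehicle exceeds 120 km/h, a warning shall be shown within 2 seconds.",
    )
    print(spec.formula)
```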
The conversion of natural language requirements into formal specifications is historically a labor-intensive task performed by skilled analysts, susceptible to inconsistencies and ambiguities stemming from subjective interpretation. REACT addresses this challenge by utilizing Large Language Models (LLMs) to automate this translation process. This automation minimizes human error and ensures a more consistent application of requirements, thereby substantially decreasing the risk of misinterpretation during system development. The LLM-driven approach not only accelerates the formalization timeline but also provides a degree of verifiability absent in purely manual processes, as the LLM’s output can be reviewed and validated against the original natural language input.
The automated translation of natural language requirements into formal specifications, achieved via the REACT framework, directly facilitates test case generation by providing a structured, machine-readable input. This allows for the programmatic creation of test scenarios designed to verify adherence to the specified behavior. The generated test cases cover a broader range of potential inputs and edge cases than traditional manual methods, improving the comprehensiveness of verification. Specifically, the formal specifications define precise preconditions, actions, and expected postconditions, which are directly translated into executable test assertions. This automated pipeline reduces the time and resources required for testing while increasing confidence in system correctness and reliability.
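The following is a hedged illustration of how formal preconditions, actions, and postconditions might become executable assertions; the spec structure, the toy speed-warning controller, and all identifiers are invented for demonstration and do not reflect REACT’s internal representation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestableSpec:
    spec_id: str
    precondition: Callable[[dict], bool]   # must hold before the action
    action: Callable[[dict], dict]         # drives the system under test
    postcondition: Callable[[dict], bool]  # expected to hold afterwards

def run_spec(spec: TestableSpec, state: dict) -> bool:
    """Execute one generated test case: check pre, apply action, assert post."""
    assert spec.precondition(state), f"{spec.spec_id}: precondition not met"
    new_state = spec.action(state)
    return spec.postcondition(new_state)

# Toy system under test: a speed-warning controller (illustrative only).
def controller_step(state: dict) -> dict:
    state = dict(state)
    state["warning"] = state["speed"] > 120
    return state

overspeed_warning = TestableSpec(
    spec_id="SPEC-001",
    precondition=lambda s: s["speed"] > 120,
    action=controller_step,
    postcondition=lambda s: s["warning"] is True,
)

if __name__ == "__main__":
    assert run_spec(overspeed_warning, {"speed": 130, "warning": False})
    print("SPEC-001 passed")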
The core functionality of the REACT framework relies on establishing a direct and auditable connection between initial Natural Language Requirements and the resulting executable test cases. This traceability is achieved through an intermediate layer of formal specification generated by the Large Language Model. Each element within the formal specification is explicitly linked back to its originating requirement, and subsequently, each generated test case is linked to the specific formal specification element it verifies. This end-to-end linkage ensures that any identified issue during testing can be directly correlated to the original requirement, facilitating efficient debugging and validation of the system’s adherence to stated needs. The ability to follow this chain of evidence is critical for regulatory compliance and high-assurance systems.
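The sketch below illustrates the kind of bookkeeping such end-to-end traceability implies, using simple in-memory maps with hypothetical identifiers; a real toolchain would persist and audit these links rather than hold them in dictionaries.

```python
# Illustrative traceability links (identifiers are hypothetical).
req_to_specs = {"REQ-001": ["SPEC-001", "SPEC-002"]}
spec_to_tests = {"SPEC-001": ["TC-001", "TC-002"], "SPEC-002": ["TC-003"]}

def trace_back(test_id: str) -> tuple[str, str]:
    """Return the (spec, requirement) pair a failing test traces back to."""
    for spec, tests in spec_to_tests.items():
        if test_id in tests:
            for req, specs in req_to_specs.items():
                if spec in specs:
                    return spec, req
    raise KeyError(f"No trace recorded for {test_id}")

if __name__ == "__main__":
    print(trace_back("TC-003"))  # -> ('SPEC-002', 'REQ-001')
```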

SemaLens: Seeing What the Machine Sees
Deep Neural Networks (DNNs) present substantial verification difficulties in perception systems due to their non-deterministic nature and high dimensionality. Traditional software verification techniques, designed for deterministic, state-based systems, are inadequate for DNNs where outputs are continuous and influenced by numerous weighted parameters. The complex, layered architecture of DNNs creates a vast input space, making exhaustive testing impractical. Furthermore, the “black box” characteristic of these networks, in which internal decision-making processes are opaque, hinders the identification of the root cause of failures. This complexity is compounded when DNNs are deployed in safety-critical applications, demanding rigorous assurance of reliability and predictable behavior despite variations in input data and environmental conditions.
SemaLens employs Vision-Language Models (VLMs) as a primary mechanism for verifying Deep Neural Networks (DNNs) used in perception systems. These VLMs are utilized to generate semantic descriptions of the DNN’s inputs and outputs, enabling a comparison against formally specified requirements. This analysis goes beyond simple accuracy metrics; the VLM assesses whether the DNN’s behavior aligns with the intended meaning of the requirements, effectively translating complex perception tasks into a language understandable by both humans and machines. By leveraging the reasoning capabilities of VLMs, SemaLens can identify discrepancies between the DNN’s output and the expected semantic behavior, providing a robust method for functional verification and validation.
SemaLens employs a combined Semantic Analysis and Runtime Monitoring approach to verify AI perception systems. During operation, the system generates embeddings using Vision-Language Models (VLMs) to represent perceived scenes and associated requirements. These embeddings are then compared, and anomalies or requirement violations are flagged when the similarity score between the scene embedding and the requirement embedding falls below a threshold of 0.4. This quantitative metric allows for automated detection of discrepancies between the system’s perception and its defined operational constraints, providing a means for continuous verification and ensuring reliable performance.
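As a rough illustration of this check, the snippet below compares a scene embedding against a requirement embedding with cosine similarity and flags a violation below the 0.4 threshold quoted above; the embeddings here are random placeholders standing in for real VLM outputs, and the dimensionality is assumed.

```python
import numpy as np

THRESHOLD = 0.4  # similarity below this is flagged as a requirement violation

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_frame(scene_embedding: np.ndarray,
                requirement_embedding: np.ndarray) -> bool:
    """Return True if the perceived scene is consistent with the requirement."""
    return cosine_similarity(scene_embedding, requirement_embedding) >= THRESHOLD

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.normal(size=512)        # stand-in for a VLM scene embedding
    requirement = rng.normal(size=512)  # stand-in for an embedded requirement
    print("consistent" if check_frame(scene, requirement) else "violation flagged")
```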
SemaLens incorporates spatial-temporal reasoning to evaluate AI perception systems by analyzing sequences of observations and their relationships over time and space. This is achieved by tracking objects and their interactions within a defined environment, allowing the system to understand not just what an object is, but also where it is and how it moves relative to other elements. The tool constructs a representation of the scene’s evolution, enabling verification of behaviors that depend on dynamic context, such as predicting trajectories or responding to changing conditions. This holistic approach allows SemaLens to detect failures that might not be apparent from static image analysis alone, ensuring reliable performance in real-world, evolving scenarios.
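A minimal sketch of one such spatial-temporal check appears below, assuming a tracker has already produced per-frame object positions; the clearance requirement and all values are illustrative rather than taken from the paper.

```python
import math

# Tracked observations: frame index -> {object_id: (x, y) position in metres}.
# Values are illustrative; a real tracker would supply these.
track = {
    0: {"ego": (0.0, 0.0), "pedestrian_1": (12.0, 3.0)},
    1: {"ego": (2.0, 0.0), "pedestrian_1": (11.0, 2.5)},
    2: {"ego": (4.0, 0.0), "pedestrian_1": (10.0, 2.0)},
}

MIN_CLEARANCE_M = 2.0  # hypothetical spatial-temporal requirement

def violates_clearance(track: dict, obj: str) -> list[int]:
    """Return the frames in which the object comes too close to the ego vehicle."""
    bad_frames = []
    for frame, objects in sorted(track.items()):
        ex, ey = objects["ego"]
        ox, oy = objects[obj]
        if math.dist((ex, ey), (ox, oy)) < MIN_CLEARANCE_M:
            bad_frames.append(frame)
    return bad_frames

if __name__ == "__main__":
    print(violates_clearance(track, "pedestrian_1"))  # [] -> requirement holds
```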
The Price of Assurance: Standards as Guardrails
Verification and Validation (V&V) represent the cornerstone of dependable systems, becoming critically important when those systems operate in safety-critical contexts such as aviation, healthcare, or autonomous vehicles. V&V isn’t simply about testing; it’s a rigorous process ensuring both that a system conforms to its specified requirements (verification) and that it actually fulfills its intended use and stakeholder needs in the real world (validation). This distinction is crucial because a system can flawlessly execute its programmed instructions while still failing to address real-world needs or creating hazardous scenarios. Thorough V&V employs diverse techniques – from code reviews and static analysis to dynamic testing and formal methods – to uncover defects, inconsistencies, and vulnerabilities before deployment. Consequently, a robust V&V strategy drastically reduces the potential for catastrophic failures, protects human life, and fosters public confidence in complex technological innovations.
Rigorous adherence to established industry standards, such as DO-178C – ‘Software Considerations in Airborne Systems and Equipment Certification’ – is not merely a procedural requirement, but a fundamental pillar of ensuring system safety and demonstrating regulatory compliance. This standard provides a structured framework for the development of safety-critical software, demanding meticulous documentation, comprehensive testing, and traceable verification throughout the entire software lifecycle. By systematically addressing potential hazards and validating software behavior against defined safety requirements, DO-178C establishes a demonstrable audit trail that is crucial for certification by aviation authorities and builds confidence in the reliability of complex systems. The standard’s emphasis on objective evidence and rigorous processes minimizes the risk of software-related failures, protecting lives and assets while fostering public trust in technologically advanced applications.
The complex process of verifying and validating safety-critical AI systems benefits significantly from specialized tools designed to establish a clear chain of evidence. Platforms like REACT and SemaLens offer capabilities that move beyond simple testing, instead focusing on capturing and maintaining a traceable record of each development stage. These tools automate the documentation of requirements, design choices, and test results, linking them directly to the original specifications. This creates an audit trail that demonstrates adherence to stringent industry standards, such as DO-178C, and allows developers to pinpoint the source of any potential issues with greater efficiency. By providing a comprehensive and verifiable history of the system’s evolution, REACT and SemaLens are instrumental in building confidence in the reliability and safety of AI-powered applications.
The increasing complexity of AI systems necessitates a shift towards robust verification techniques, and automation coupled with formal methods offers a powerful pathway to minimize the potential for catastrophic failures. Traditional testing, while valuable, struggles to cover the vast state space inherent in modern AI; formal methods, employing mathematical rigor, allow developers to prove system properties rather than simply demonstrate them through testing. Automated tools further amplify this capability by streamlining the formal verification process, identifying potential flaws early in the development lifecycle, and providing traceable evidence of compliance. This proactive approach doesn’t merely detect errors; it fundamentally builds confidence in the reliability and safety of AI, fostering trust among users and regulators alike, and paving the way for wider adoption in critical domains like aerospace and autonomous vehicles.
Towards Resilient and Trustworthy AI
The inherent vagueness of natural language poses a significant challenge to the reliable operation of artificial intelligence systems. To address this, researchers are exploring the use of restricted English – a carefully controlled subset of the language with limited vocabulary and grammatical structures – as a means of specifying requirements for AI. This approach minimizes ambiguity, ensuring that AI systems interpret instructions precisely as intended. Pairing restricted English with an automated pipeline such as REACT, which translates requirements into formal specifications and test cases, produces a powerful synergy: the structured nature of restricted English complements REACT’s capabilities, enabling more robust error detection and ultimately bolstering the trustworthiness of AI-driven applications. This combination offers a pathway towards AI systems that are not only intelligent but also predictable and safe, since clear requirements form the foundation of reliable performance.
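As a toy illustration of how a restricted-English gate might work, the snippet below accepts only requirements matching a single controlled template; the template itself is hypothetical and far simpler than any practical controlled natural language.

```python
import re

# Hypothetical restricted-English template:
#   "The <component> shall <action> within <number> <unit>."
PATTERN = re.compile(
    r"^The (?P<component>[a-z ]+) shall (?P<action>[a-z ]+) "
    r"within (?P<value>\d+(\.\d+)?) (?P<unit>seconds|metres)\.$"
)

def is_restricted_english(requirement: str) -> bool:
    """Accept only requirements that match the controlled template."""
    return PATTERN.match(requirement) is not None

if __name__ == "__main__":
    print(is_restricted_english(
        "The braking system shall engage within 0.5 seconds."))  # True
    print(is_restricted_english(
        "Braking should usually happen pretty quickly."))        # False
```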
SemaLens, a tool designed to enhance the safety of AI systems, increasingly relies on the capabilities of advanced Vision-Language Models like CLIP to refine its analytical process. These models excel at establishing connections between visual inputs and natural language descriptions, allowing SemaLens to more accurately interpret the intended behavior of AI systems from image-based requirements. This improved understanding directly translates to a more efficient and precise analysis, reducing the potential for misinterpretations and ultimately bolstering the reliability of the AI under evaluation. As Vision-Language Models continue to evolve, SemaLens is poised to benefit from even greater accuracy and speed in identifying potential errors and vulnerabilities within complex AI-driven applications.
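For readers unfamiliar with CLIP-style scoring, the snippet below shows a minimal image-versus-description comparison using the openly available CLIP checkpoint through the Hugging Face transformers library; how SemaLens itself integrates CLIP is not detailed in the article, and the image path and descriptions are placeholders.

```python
# pip install transformers pillow torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame_0001.png")  # hypothetical camera frame
descriptions = [
    "a pedestrian crossing in front of the vehicle",
    "an empty road with no obstacles",
]

inputs = processor(text=descriptions, images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text match probabilities

for desc, p in zip(descriptions, probs[0].tolist()):
    print(f"{p:.2f}  {desc}")
```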
The presented research details a novel framework designed to bolster the safety and reliability of AI systems by integrating two key AI-powered components: REACT and SemaLens. This system moves beyond traditional error handling by proactively addressing potential issues through three core strategies: early error detection, which identifies flaws during the development phase; formal verification, a rigorous process ensuring the system adheres to specified requirements; and runtime monitoring, which continuously assesses performance during operation. While the framework demonstrates a promising architecture for comprehensive AI safety, current work focuses on establishing the baseline functionality and integration of these components; precise quantitative measurements of improvement in reliability and error reduction are slated for future investigation, building upon this foundational implementation.
The sustained advancement of artificial intelligence hinges significantly on bolstering formal methods and automated verification techniques. While AI systems demonstrate increasing capabilities, ensuring their reliability, safety, and trustworthiness demands rigorous mathematical proofs of correctness and automated tools to validate complex behaviors. Current approaches often rely on empirical testing, which can be insufficient to uncover subtle errors or guarantee performance in all scenarios. Investment in formal methods – encompassing techniques like model checking, theorem proving, and abstract interpretation – promises a shift towards provably correct AI systems. Automated verification, leveraging these methods, streamlines the process, enabling developers to identify and rectify flaws early in the development lifecycle and ultimately unlocking the full potential of increasingly sophisticated AI applications across critical domains.
The pursuit of safety in AI-enabled systems, as detailed in this research, isn’t about eliminating risk, but about deeply understanding the system’s vulnerabilities. It’s a process akin to reverse engineering, meticulously dissecting the translation from high-level requirements to the intricacies of deep neural network implementations. This aligns perfectly with Kernighan’s observation: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” The REACT and SemaLens framework embodies this principle; by actively probing the semantic gap, essentially ‘debugging’ the translation between intention and execution, the system strives for robustness, acknowledging that perfect foresight is an illusion. The work demonstrates that a thorough understanding requires, at times, intentionally testing the limits, much like exploiting comprehension to reveal underlying truths.
What’s Next?
The pursuit of assuring AI-enabled systems via AI itself feels, at its core, a delightful paradox. This work, by attempting to formalize the translation between intention – expressed in natural language – and the opaque logic of deep neural networks, doesn’t solve the problem of trust. Rather, it shifts the locus of that trust. One begins to trust not the DNN directly, but the fidelity of the mediating AI: a second, equally inscrutable black box. The inevitable regression beckons: assurance of the assurance.
Future work isn’t simply about scaling these techniques – applying larger models, more data, broader requirement coverage. The more pressing challenge lies in understanding failure modes of the mediating AI. Where does REACT, or a similar framework, introduce its own systematic errors? What classes of requirement are consistently misinterpreted, and why? The real insight won’t come from generating more tests, but from deliberately breaking the system – probing its boundaries with adversarial requirements designed to expose its hidden assumptions.
Ultimately, this line of inquiry forces a reconsideration of ‘safety’ itself. Is it about eliminating all risk, an asymptotic goal forever beyond reach? Or is it about building systems that fail predictably, and in ways that minimize harm? Perhaps the most fruitful avenue isn’t to make AI safer, but to make its failures more legible: to reverse-engineer not just what it does, but why it does it, even when it goes wrong.
Original article: https://arxiv.org/pdf/2511.20627.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/