Predictive Safety Nets: AI Agents for High-Risk Environments

Author: Denis Avetisyan

A new multi-agent system leverages large language models to forecast hazards and enhance safety protocols in complex operational settings.

The HARNESS system architecture embodies an approach to infrastructure where components inevitably degrade, prompting a focus on managing that decay rather than preventing it, a testament to the transient nature of all complex systems.

This paper details HARNESS, a Retrieval-Augmented Generation system for proactive risk analysis, vulnerability assessment, and auditable safety reporting in high-risk Department of Energy facilities.

Despite increasing automation, ensuring operational safety in complex, high-risk environments remains a persistent challenge. This paper introduces HARNESS (Human-Agent Risk Navigation and Event Safety System), an AI framework for proactive hazard forecasting in high-risk DOE environments, designed to identify and mitigate potential hazards before they cause incidents. By integrating large language models with structured data and expert knowledge, HARNESS delivers auditable vulnerability reports and enhances predictive safety through iterative agentic reasoning. Can this human-agent collaborative approach fundamentally reshape risk management protocols in critical operational settings?

The Inevitable and the Anticipated: Forecasting Hazard

Conventional hazard identification methods frequently depend on analyzing past incidents and established patterns, a strategy proving increasingly inadequate in rapidly evolving, high-risk settings. This reactive approach struggles to anticipate novel threats or the complex interplay of factors creating unforeseen dangers. Industries such as aerospace, deep-sea exploration, and advanced manufacturing, characterized by constant innovation and unpredictable conditions, face limitations with systems designed for static risk profiles. The reliance on historical data creates a feedback loop where known hazards are addressed, while emerging or previously unobserved risks remain undetected until an incident occurs, potentially leading to significant consequences and hindering proactive safety measures.

The shift towards proactive hazard forecasting represents a fundamental change in operational risk management, moving beyond simply reacting to incidents and instead focusing on predictive analysis. This approach leverages real-time data streams, advanced modeling techniques – including machine learning and predictive analytics – and comprehensive simulations to identify potential hazards before they escalate into costly downtime or, more critically, safety incidents. By anticipating vulnerabilities in equipment, processes, or environmental conditions, organizations can implement preventative measures, optimize maintenance schedules, and allocate resources strategically. This not only minimizes disruptions and extends asset lifecycles but also fosters a stronger safety culture, protecting personnel and the environment through preemptive risk mitigation and a commitment to consistently improving predictive capabilities.

Orchestrated Vigilance: The HARNESS Multi-Agent System

The HARNESS system utilizes a Multi-Agent System (MAS) architecture to automate and coordinate the processes of hazard identification and risk analysis. This approach decomposes the overall task into specialized agent roles, each responsible for specific functions such as data acquisition, hazard detection, risk assessment, and reporting. Agents operate autonomously but communicate and collaborate to achieve a unified understanding of potential risks. This distributed architecture allows for parallel processing, improved scalability, and increased resilience compared to traditional, monolithic systems. The MAS framework facilitates the integration of diverse data sources and knowledge bases, enabling a more comprehensive and proactive risk management capability.
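The role decomposition described above can be sketched in a few lines. This is a minimal illustration, not the HARNESS implementation: the agent names, the shared-state fields, the sample records, and the keyword rules are all hypothetical, and the agents run sequentially here rather than in parallel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedState:
    """Blackboard that every agent reads from and writes to (hypothetical design)."""
    task: str
    records: List[str] = field(default_factory=list)
    hazards: List[str] = field(default_factory=list)
    risk_levels: Dict[str, str] = field(default_factory=dict)
    report: str = ""


def acquire_data(state: SharedState) -> None:
    # Stand-in for querying facility records; the records are invented.
    state.records = [
        "Work order: replace valve near pressurized line",
        "Note: scaffold inspection overdue",
    ]


def detect_hazards(state: SharedState) -> None:
    # Toy keyword rule; a real agent would call a language model here.
    keywords = {"pressurized": "stored energy release", "scaffold": "fall from height"}
    for record in state.records:
        for keyword, hazard in keywords.items():
            if keyword in record.lower():
                state.hazards.append(hazard)


def assess_risk(state: SharedState) -> None:
    # Assumed two-level rating, purely for illustration.
    high = {"stored energy release"}
    for hazard in state.hazards:
        state.risk_levels[hazard] = "high" if hazard in high else "medium"


def write_report(state: SharedState) -> None:
    lines = [f"Task: {state.task}"]
    lines += [f"- {hazard}: {level}" for hazard, level in state.risk_levels.items()]
    state.report = "\n".join(lines)


# Each agent owns one function of the overall task.
PIPELINE: List[Callable[[SharedState], None]] = [
    acquire_data,
    detect_hazards,
    assess_risk,
    write_report,
]


def run(task: str) -> SharedState:
    state = SharedState(task=task)
    for agent in PIPELINE:
        agent(state)
    return state


print(run("Valve replacement").report)
```

Keeping each role behind the same small interface is what makes it possible to swap a keyword rule for a model call, or to run independent agents concurrently, without touching the others.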

Retrieval Augmented Generation (RAG) is a core component of the HARNESS system, functioning to mitigate the limitations of Large Language Models (LLMs) regarding access to current and specific data. Instead of relying solely on the knowledge embedded within the LLM’s parameters during training, RAG dynamically retrieves relevant information from an external Vector Database during query processing. This database stores data as vector embeddings, enabling semantic similarity searches to identify contextual information pertinent to the user’s request. The retrieved information is then incorporated as context when prompting the LLM, effectively augmenting its knowledge base and improving the accuracy, relevance, and reliability of generated responses, particularly in specialized domains or when dealing with evolving data sets.
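The retrieve-then-prompt flow can be illustrated with a toy example. The sketch below assumes an in-memory list in place of a real vector database, and the three-dimensional embeddings are made up by hand; in practice they would come from an embedding model. The function names and the prompt wording are also assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical store of (passage, embedding) pairs with invented vectors.
STORE: List[Tuple[str, List[float]]] = [
    ("Lockout/tagout is required before servicing energized equipment.", [0.9, 0.1, 0.0]),
    ("Confined space entry requires atmospheric monitoring.", [0.1, 0.9, 0.0]),
    ("Cafeteria hours are 11 am to 2 pm.", [0.0, 0.1, 0.9]),
]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], k: int = 2) -> List[str]:
    """Return the k passages whose embeddings are closest to the query."""
    ranked = sorted(STORE, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# The query vector is also invented; it leans toward the first passage.
passages = retrieve([0.8, 0.3, 0.0])
print(build_prompt("What is required before servicing a live panel?", passages))
```

The language model never sees the whole store, only the few passages that score highest, which is how retrieval keeps the prompt short while still supplying current, domain-specific facts.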

The HARNESS system utilizes both GPT-4o and Perplexity AI to address the challenges of natural language processing within hazard and risk analysis. GPT-4o functions as the primary engine for semantic interpretation of textual data and the subsequent generation of reports and insights. To improve comprehension of nuanced or technically dense language, the system incorporates Perplexity, which serves as a supplementary tool for disambiguation and enhanced understanding of complex queries. This dual-model approach allows HARNESS to more accurately extract relevant information and produce reliable outputs, even when presented with ambiguous or specialized terminology.

The HARNESS system is designed for compatibility with established Standards-Based Management Systems (SBMS), facilitating data exchange and operational integration. This integration is achieved through adherence to industry-standard data formats and APIs, allowing HARNESS to ingest data from, and output insights to, existing SBMS platforms without requiring extensive modification of current workflows. Consequently, organizations can leverage HARNESS’s proactive hazard identification and risk analysis capabilities while maintaining compliance with relevant regulations and streamlining existing processes for reporting, auditing, and risk mitigation. This avoids data silos and ensures a unified approach to safety and risk management.

Deconstructing Failure: Semantic Similarity and Predictive Accuracy

Failure Mode Analysis (FMA) within HARNESS is a systematic, bottom-up approach to identifying potential failures in a system or process. This involves analyzing each component to determine how it could fail, the resulting effects of that failure, and the severity of those effects. The process extends beyond simple fault identification to include a detailed evaluation of the failure’s impact on the overall system functionality and safety. By proactively identifying these potential issues, HARNESS enhances risk assessment by providing a comprehensive understanding of vulnerabilities, allowing for the implementation of preventative measures and mitigation strategies. This detailed analysis informs the development of more robust and reliable systems, reducing the likelihood of unexpected failures and improving overall system performance.
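A bottom-up analysis of this kind amounts to a table of components, failure modes, effects, and severities. The following sketch shows one way such a table might be represented and rolled up. The 1 to 5 severity scale, the field names, and the sample entries are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str      # how the component can fail
    effect: str    # what that failure does to the system
    severity: int  # assumed scale: 1 (minor) to 5 (catastrophic)


def worst_case_by_component(modes: List[FailureMode]) -> Dict[str, FailureMode]:
    """Keep the most severe failure mode recorded for each component."""
    worst: Dict[str, FailureMode] = {}
    for fm in modes:
        current = worst.get(fm.component)
        if current is None or fm.severity > current.severity:
            worst[fm.component] = fm
    return worst


def rank_by_severity(modes: List[FailureMode]) -> List[FailureMode]:
    """Order failure modes from most to least severe."""
    return sorted(modes, key=lambda fm: fm.severity, reverse=True)


# Invented example entries.
MODES = [
    FailureMode("relief valve", "stuck closed", "vessel overpressure", 5),
    FailureMode("relief valve", "leaks", "slow pressure loss", 2),
    FailureMode("pressure sensor", "reads low", "operator unaware of rising pressure", 4),
]

for fm in rank_by_severity(MODES):
    print(f"{fm.severity} | {fm.component}: {fm.mode} -> {fm.effect}")
```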

The system employs the Qwen3-Embedding-8B model to determine semantic similarity between documents, enabling more accurate information retrieval. This approach yielded a Retrieval-Augmented Generation Assessment (RAGAS) score of 75.3%, indicating a substantial level of answer correctness. Performance benchmarks recorded a query time of 2.0 seconds, demonstrating efficient processing and responsiveness during similarity comparisons. The embedding model converts textual data into numerical vectors, allowing for quantifiable comparison of document meaning beyond keyword matching.

Following initial document retrieval, HARNESS employs a Cross-Encoder to improve analytical precision. This component reranks the retrieved documents based on semantic relevance to the query, moving beyond simple keyword matching. The Cross-Encoder evaluates the relationship between the query and each document to determine a relevance score, effectively prioritizing documents that address the query’s meaning, even if they lack identical terminology. This refinement process ensures the subsequent analysis is informed by the most pertinent information, increasing the reliability and depth of the generated insights.
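The two-stage pattern (retrieve broadly, then rerank by scoring each query-document pair jointly) can be sketched without a neural model. In the example below, `score_pair` is a deliberately simple stand-in for a Cross-Encoder: it maps a few synonyms to a shared token and measures overlap. The synonym table, the function names, and the sample documents are hypothetical; a real Cross-Encoder would be a trained model scoring the pair.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical synonym table so the stand-in scorer can match meaning,
# not just identical words.
SYNONYMS = {"blaze": "fire", "flames": "fire", "extinguisher": "suppression"}


def normalize(text: str) -> Set[str]:
    """Lowercase, strip basic punctuation, and map synonyms to one token."""
    tokens = text.lower().replace(".", "").replace(",", "").split()
    return {SYNONYMS.get(token, token) for token in tokens}


def score_pair(query: str, doc: str) -> float:
    """Stand-in relevance score: Jaccard overlap of normalized tokens."""
    q, d = normalize(query), normalize(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(
    query: str,
    candidates: List[str],
    scorer: Callable[[str, str], float] = score_pair,
    top_n: int = 2,
) -> List[Tuple[float, str]]:
    """Score every (query, candidate) pair and keep the best top_n."""
    scored = [(scorer(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Candidates as a first-stage retriever might return them (invented).
CANDIDATES = [
    "Fire drill schedule for the quarter",
    "Blaze response and extinguisher procedure",
    "Parking permit renewal",
]

for score, doc in rerank("fire suppression procedure", CANDIDATES):
    print(f"{score:.2f}  {doc}")
```

The second candidate shares only one literal word with the query, yet it ranks first once the pair is scored jointly, which is the behavior the reranking step is meant to provide.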

Report validation within HARNESS utilizes a Large Language Model (LLM) functioning as a judge to assess factual grounding. This process yielded a perfect score of 5.0, indicating a high degree of accuracy and reliability in the generated reports. The LLM-as-Judge methodology evaluates the reports against established knowledge sources to confirm the veracity of the information presented, ensuring that conclusions are supported by evidence and free from inaccuracies. This scoring system provides a quantitative measure of report quality, directly correlating to the system’s ability to generate trustworthy and dependable analyses.
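An LLM-as-Judge loop generally has three parts: a rubric prompt, a call to the judging model, and a parser that turns the reply into a number. The sketch below assumes a 1 to 5 scale and a 'Score: <number>' reply format, neither of which is specified in this article, and it replaces the model call with `fake_judge`, a hypothetical stand-in that simply checks whether the report text appears in the evidence.

```python
import re
from statistics import mean
from typing import Callable, List, Tuple

# Assumed rubric wording and reply format.
JUDGE_TEMPLATE = (
    "Rate from 1 to 5 how well the REPORT is supported by the EVIDENCE.\n"
    "Reply as 'Score: <number>'.\n\n"
    "EVIDENCE:\n{evidence}\n\nREPORT:\n{report}"
)


def parse_score(reply: str) -> float:
    """Extract the numeric score from a judge reply."""
    match = re.search(r"Score:\s*([1-5](?:\.\d+)?)", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return float(match.group(1))


def judge_reports(
    pairs: List[Tuple[str, str]], judge: Callable[[str], str]
) -> float:
    """Average the judge's grounding score over (evidence, report) pairs."""
    scores = []
    for evidence, report in pairs:
        prompt = JUDGE_TEMPLATE.format(evidence=evidence, report=report)
        scores.append(parse_score(judge(prompt)))
    return mean(scores)


def fake_judge(prompt: str) -> str:
    """Stand-in for an LLM call: full marks only if the report is in the evidence."""
    evidence, report = prompt.split("EVIDENCE:\n")[1].split("\n\nREPORT:\n")
    return "Score: 5" if report.strip() in evidence else "Score: 2"


# Invented evidence/report pairs: one grounded, one not.
PAIRS = [
    ("Valve V-12 was replaced in March.", "Valve V-12 was replaced in March."),
    ("Scaffold inspected weekly.", "Scaffold was never inspected."),
]

print(judge_reports(PAIRS, fake_judge))
```

Passing the judge in as a callable keeps the scoring logic testable without network access and makes it straightforward to substitute a real model client later.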

From Foresight to Resilience: Operational Impact and Mitigation

The HARNESS system delivers a comprehensive Vulnerability Report, serving as a central document for proactive safety management. This report doesn’t merely list potential hazards; it synthesizes identified risks with clearly defined, actionable mitigation strategies. Each vulnerability is detailed with its potential impact, likelihood of occurrence, and a prioritized set of steps designed to reduce exposure. By consolidating these critical elements into a single, accessible resource, the system empowers stakeholders to move beyond simple identification towards effective risk resolution, ultimately strengthening system resilience and minimizing potential disruptions. The report facilitates informed decision-making, allowing for the efficient allocation of resources and a targeted approach to enhancing operational safety.
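As a rough illustration of what such a report could look like in data form, the sketch below models each vulnerability with an impact, a likelihood, and a list of mitigation steps, then renders them in priority order. The 1 to 5 scales, the impact times likelihood priority rule, the layout, and the sample entries are assumptions and do not reflect the actual HARNESS report format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vulnerability:
    description: str
    impact: int           # assumed scale: 1 (low) to 5 (high)
    likelihood: int       # assumed scale: 1 (rare) to 5 (frequent)
    mitigations: List[str]

    @property
    def priority(self) -> int:
        # Assumed prioritization rule: impact multiplied by likelihood.
        return self.impact * self.likelihood


def render_report(vulns: List[Vulnerability]) -> str:
    """Render vulnerabilities as plain text, highest priority first."""
    lines = ["VULNERABILITY REPORT"]
    ordered = sorted(vulns, key=lambda v: v.priority, reverse=True)
    for rank, v in enumerate(ordered, start=1):
        lines.append(
            f"{rank}. {v.description} "
            f"(impact={v.impact}, likelihood={v.likelihood}, priority={v.priority})"
        )
        for step in v.mitigations:
            lines.append(f"   - {step}")
    return "\n".join(lines)


# Invented example entries.
VULNS = [
    Vulnerability("Worn hoist cable", 4, 2, ["Replace cable", "Add to monthly checklist"]),
    Vulnerability("Unsecured gas cylinders", 5, 3, ["Install restraints"]),
]

print(render_report(VULNS))
```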

The Vulnerability Report is also the cornerstone of Risk Analysis within HARNESS. Because its findings are structured and prioritized, stakeholders can decide where to allocate resources and which mitigation strategies to apply first, rather than resolving problems only after they surface. Organizations can then address the most critical vulnerabilities early, minimizing potential disruptions and supporting the continued safety and efficiency of operations. This shift from identifying risks to actively managing them is the central aim of the system's predictive safety protocols.

The HARNESS system represents a step forward in predictive safety analysis, achieved through its multi-agent Retrieval-Augmented Generation (RAG) pipeline. Evaluation reported an F1 score of 0.384 for event retrieval, a metric that balances precision and recall in identifying relevant safety events from complex data. The result is presented as an improvement over prior methods, supporting more precise hazard prediction and proactive risk mitigation. The system’s architecture, leveraging multiple agents, facilitates a nuanced understanding of potential failures, contributing to greater reliability and, ultimately, a safer operational environment.
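For readers unfamiliar with the metric, F1 is the harmonic mean of precision (the share of retrieved events that are relevant) and recall (the share of relevant events that were retrieved). The short example below computes it for an invented retrieval result; the event identifiers and counts are illustrative only and are unrelated to the evaluation data behind the 0.384 figure.

```python
from typing import Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one retrieval result."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: 4 events retrieved, 6 truly relevant, 2 in common.
retrieved = {"e1", "e2", "e3", "e4"}
relevant = {"e2", "e4", "e5", "e6", "e7", "e8"}

p, r, f1 = precision_recall_f1(retrieved, relevant)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here precision is 2/4, recall is 2/6, and F1 works out to 0.4, which shows how a system can retrieve some relevant events while still missing most of them.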

The core benefit of the HARNESS system lies in its capacity to address weaknesses before they escalate into disruptive incidents. By continuously scanning for vulnerabilities and flagging them for resolution, the system is intended to reduce the likelihood of unexpected downtime, preserving operational continuity and productivity. This proactive stance does more than avoid reactive fixes; it bolsters overall operational safety by diminishing potential hazards for personnel and equipment. Organizations deploying HARNESS therefore stand to reduce associated costs, encompassing not only repair expenses and lost productivity but also potential legal ramifications and reputational damage stemming from safety breaches or system failures.

The HARNESS system, detailed in this study, embodies a recognition that even the most robust predictive models are subject to temporal decay. Any improvement in hazard forecasting, however precise, ages faster than expected, necessitating continuous refinement and adaptation. This aligns with Ada Lovelace’s observation: “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.” HARNESS doesn’t eliminate risk, but rather augments human capability in navigating it, acknowledging that the system’s predictive power is directly linked to the quality and currency of the data: a continuous feedback loop mirroring the Engine’s dependence on instruction. The framework prioritizes auditable vulnerability reports, accepting that systems, like time itself, are directional; a journey forward requires careful documentation of the path traveled.

What Lies Ahead?

The architecture presented in this work, like all systems, will inevitably exhibit decay. HARNESS offers a snapshot of predictive capability, but the very act of forecasting introduces a temporal paradox; the predicted hazards shift even as mitigation strategies are implemented. The true metric isn’t accuracy, but the rate of obsolescence: how quickly the system’s understanding of risk diverges from the evolving reality of the environment. Improvements age faster than one can fully understand them.

Future iterations will likely focus on refining the retrieval mechanisms, attempting to bridge the gap between static knowledge bases and the dynamic, often tacit, understanding held by experienced operators. However, a fundamental challenge remains: translating qualitative risk assessments – the ‘feel’ for a dangerous situation – into quantifiable data for a large language model. The system’s capacity to learn from near-miss events, those subtle indicators of systemic vulnerabilities, will prove more valuable than any initial training dataset.

Ultimately, this line of inquiry highlights a broader truth about complex systems. Every architecture lives a life, and the pursuit of perfect predictive safety is an asymptotic one. The focus should shift from eliminating risk – an impossible task – to building resilience, creating systems that can gracefully adapt to the inevitable emergence of unforeseen hazards, and learning from the patterns of their own decline.

Original article: https://arxiv.org/pdf/2511.10810.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
