Author: Denis Avetisyan
A new approach to detecting anomalies in chemical processes uses symbolic machine learning to build understandable models, offering a compelling alternative to ‘black box’ neural networks.

This review details a methodology leveraging DisPLAS and Answer Set Programming for improved failure detection in ethylene oxidation processes, demonstrating enhanced interpretability and performance compared to traditional methods like HAZOP.
Despite advances in artificial intelligence, deploying machine learning for critical process control, where transparency and reliability are paramount, remains challenging due to the ‘black box’ nature of many algorithms. This paper, ‘Failure Detection in Chemical Processes using Symbolic Machine Learning: A Case Study on Ethylene Oxidation’, investigates a symbolic machine learning approach to predict failures in chemical processes, demonstrating improved interpretability and performance compared to conventional methods. By leveraging a state-of-the-art symbolic learner and a chemical process simulator, we show that compact, rule-based models can effectively detect failures in an ethylene oxidation process. Could this approach offer a pathway to more robust and explainable AI systems for enhancing safety and decision-making in the chemical industry?
The Limits of Static Understanding in Complex Systems
Conventional process hazard analyses, such as the Hazard and Operability (HAZOP) study, are fundamentally dependent on the experience and cognitive abilities of the involved team members. While valuable, this reliance introduces limitations when assessing increasingly complex industrial processes. Subtle failure modes, those arising from intricate interactions between numerous variables or unforeseen operational states, can easily be overlooked during brainstorming sessions. The human mind, despite its power, struggles to comprehensively map all potential deviations in systems characterized by non-linear behavior and tight couplings. Consequently, even well-executed HAZOP studies may not identify all credible accident scenarios, leaving a residual risk that demands supplementary, data-driven safety measures to ensure robust and reliable process operation.
The ethylene oxidation process, a cornerstone of modern chemical manufacturing responsible for producing essential building blocks for plastics and other materials, presents a unique and escalating challenge to process safety. Though extensively researched and rigorously controlled for decades, the inherent complexity of the reaction – involving highly reactive ethylene, oxygen, and a silver catalyst – coupled with demands for increased production efficiency, necessitates increasingly sophisticated safety methodologies. Traditional hazard analyses, while valuable, struggle to fully account for the subtle interactions and potential failure modes arising from operating at larger scales and tighter margins. Consequently, maintaining safe and reliable operation requires a move beyond static assessments towards dynamic, data-driven approaches capable of anticipating and mitigating unforeseen events within this critical process.
While process simulations like those within AVEVA Process Simulation offer invaluable insights into potential operational scenarios, their predictive power is fundamentally limited by their reliance on pre-programmed models and established parameters. These simulations excel at evaluating known failure modes and optimizing performance under anticipated conditions, but struggle to adapt to unforeseen events or subtle deviations from normal operation. The inherent rigidity of static and dynamic models prevents them from learning from real-time data streams or incorporating the nuances of complex, evolving processes; consequently, they may fail to identify emergent risks or accurately predict system behavior during truly novel situations. This limitation underscores the need for more intelligent, adaptive systems capable of continuous learning and real-time risk assessment to complement traditional simulation techniques and enhance overall process safety.
DisPLAS: A Symbolic Framework for Intelligent Process Control
DisPLAS utilizes Answer Set Programming (ASP) as its foundational machine learning technique, representing process knowledge through logical rules and constraints. This allows the system to model complex relationships within industrial processes and perform reasoning about potential failure scenarios. ASP enables DisPLAS to not merely identify anomalies, but to infer the causes of those anomalies based on the encoded process knowledge. The system formulates process monitoring as a search for stable models – consistent interpretations of the ASP rules given current process data. These stable models represent potential failure states and their underlying causes, providing a symbolic, explainable approach to fault diagnosis compared to traditional statistical methods. The declarative nature of ASP facilitates knowledge acquisition and modification, allowing domain experts to readily incorporate and refine the process model without requiring extensive programming expertise.
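The stable-model search described above can be illustrated with a toy sketch. Note the caveats: the rules, atom names, and thresholds below are hypothetical, and the hand-rolled fixpoint loop merely stands in for a real ASP solver (DisPLAS would use a dedicated system), covering only the negation-free case where the stable model coincides with the least fixpoint.

```python
# Toy forward-chaining sketch of rule-based fault inference.
# NOT a real ASP solver; rule names and atoms are hypothetical
# illustrations of how encoded process knowledge lets the system
# infer causes of anomalies, not just flag them.

def derive(facts, rules):
    """Apply rules until no new atoms can be derived: the least
    fixpoint, which for a negation-free program is its unique
    stable model."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

# Hypothetical process rules: head holds if every body atom holds.
rules = [
    ("high_temperature", frozenset({"reactor_temp_above_limit"})),
    ("coolant_fault",    frozenset({"high_temperature", "coolant_flow_low"})),
    ("failure_alarm",    frozenset({"coolant_fault"})),
]

# Current observations (facts) from the process data stream.
facts = {"reactor_temp_above_limit", "coolant_flow_low"}

model = derive(facts, rules)
print("failure_alarm" in model)  # True: alarm is derived, with
# "coolant_fault" in the model explaining why
```

Because the model contains the intermediate atoms (`high_temperature`, `coolant_fault`), the diagnosis is self-explaining: the chain of rules that fired is the explanation, which is the symbolic transparency the paragraph above contrasts with statistical methods.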
DisPLAS builds upon the FastLAS learning system to achieve enhanced process control through a dual-input learning methodology. This involves utilizing both pre-existing historical process data for initial knowledge acquisition and integrating real-time observations as the process operates. The system’s ability to learn incrementally from live data allows it to adapt to changing process conditions and identify deviations from expected behavior more rapidly than systems relying solely on static datasets. This combined approach results in improved accuracy in failure prediction and a faster response time to potential process upsets, contributing to more robust and efficient control.
DisPLAS utilizes Weighted Context Dependent Partial Interpretations (WCDPI) to address the challenges of data imperfections inherent in industrial process control. WCDPI assigns weights to different interpretations of process data, reflecting the confidence in their accuracy given the available evidence and contextual information. This allows the system to continue reasoning and making inferences even when data is missing, inconsistent, or contains errors. The weighting scheme prioritizes interpretations supported by multiple data points or those consistent with known process constraints, effectively filtering out noise and mitigating the impact of incomplete data on failure prediction and control decisions. By quantifying uncertainty and focusing on the most plausible interpretations, WCDPI enhances the robustness and reliability of DisPLAS in real-world applications where data quality is often less than ideal.
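The weighting idea can be sketched as follows. This is a loose illustration in the spirit of WCDPI, not the DisPLAS formulation: the candidate interpretations, the evidence set, the mutual-exclusion constraint, and the penalty factor are all hypothetical.

```python
# Toy sketch of weighting candidate interpretations of noisy,
# partial process data. All names and weights are hypothetical.

def score(interpretation, evidence, constraints):
    """Weight an interpretation by how many observations support
    it, penalising violated process constraints more heavily."""
    support = sum(1 for obs in evidence if obs in interpretation)
    violations = sum(1 for c in constraints if not c(interpretation))
    return support - 2 * violations

# Partial, possibly inconsistent sensor readings.
evidence = {"temp_high", "pressure_normal"}

# Two candidate readings of the same raw data.
candidates = [
    {"temp_high", "pressure_normal", "flow_low"},
    {"temp_high", "pressure_normal", "pressure_high"},  # inconsistent
]

# Known constraint: pressure cannot be both normal and high.
constraints = [
    lambda i: not ({"pressure_normal", "pressure_high"} <= i),
]

best = max(candidates, key=lambda i: score(i, evidence, constraints))
print(best)  # the consistent interpretation wins
```

The second candidate matches the evidence equally well but violates the consistency constraint, so the weighting scheme discards it, mirroring how plausible interpretations are prioritised when data is incomplete or contradictory.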

Learning Failure Modes: Targeted Tasks for Dynamic Understanding
The DisPLAS framework employs targeted learning tasks to analyze process behavior, specifically utilizing Ts_static and Ts_dynamic. Ts_static focuses on establishing the causal relationships between process variables, allowing for a deterministic understanding of how changes in one variable affect others. Conversely, Ts_dynamic is designed for probabilistic failure detection, moving beyond simple causal links to assess the likelihood of failures based on observed process states. This dual approach enables both a foundational understanding of process mechanics and a proactive identification of potential anomalies, improving overall system reliability and performance.
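The contrast between the two task styles can be sketched in miniature. The rule, the threshold, and the labelled runs below are hypothetical, and the frequency estimate stands in for whatever probabilistic machinery the actual Ts_dynamic task uses.

```python
# Hypothetical contrast between the two learning-task styles:
# a static (deterministic causal) rule versus a dynamic
# (probabilistic) failure estimate over labelled historical runs.

# Ts_static style: a learned causal rule is a hard implication.
def low_coolant(state):
    # "Coolant flow below threshold drives reactor temperature up."
    return state["coolant_flow"] < 0.5

# Ts_dynamic style: estimate P(failure | condition) from history.
def failure_probability(history, condition):
    matching = [run for run in history if condition(run["state"])]
    if not matching:
        return 0.0
    return sum(run["failed"] for run in matching) / len(matching)

history = [
    {"state": {"coolant_flow": 0.30}, "failed": True},
    {"state": {"coolant_flow": 0.40}, "failed": True},
    {"state": {"coolant_flow": 0.45}, "failed": False},
    {"state": {"coolant_flow": 0.90}, "failed": False},
]

p = failure_probability(history, low_coolant)
print(round(p, 2))  # 2 of the 3 low-flow runs failed -> 0.67
```

The static rule answers "what follows from what"; the dynamic estimate answers "how likely is failure given this condition", which is the distinction between deterministic causal modelling and probabilistic failure detection drawn above.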
Evaluations demonstrate that the Ts_dynamic learning task exhibits notably strong performance in failure rule acquisition. In comparisons against Support Vector Machines, Multilayer Perceptrons, Random Forests, Histogram-based Gradient Boosting, and Adaptive Boosting, Ts_dynamic achieves comparable or superior performance, as quantified by the Area Under the Curve (AUC). This indicates its effectiveness in distinguishing between normal and anomalous process states, matching or exceeding the predictive power of these established machine learning techniques in the context of failure detection.
Receiver Operating Characteristic (ROC) curve analysis serves as a key validation method for assessing the discriminatory power of failure detection rules learned through the DisPLAS system. This analysis graphically plots the True Positive Rate against the False Positive Rate at various threshold settings, providing a comprehensive view of the model’s ability to correctly classify both normal and abnormal process states. The Area Under the Curve (AUC) derived from the ROC analysis quantifies this performance; a higher AUC indicates superior discrimination between normal and abnormal conditions. Specifically, a ROC analysis demonstrates the learned rules’ capability to minimize misclassification errors and reliably identify deviations from expected process behavior, confirming their effectiveness as a failure detection mechanism.
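The AUC quantity discussed above can be computed without any plotting at all, via the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen failure case receives a higher anomaly score than a randomly chosen normal case. The scores and labels below are invented for illustration, not taken from the study.

```python
# Minimal pure-Python AUC computation on hypothetical anomaly
# scores (higher score = more likely failure), illustrating the
# metric used to validate the learned failure-detection rules.

def roc_auc(scores, labels):
    """AUC via the rank-based formulation: the probability that a
    random positive (failure) outscores a random negative (normal),
    counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # model anomaly scores
labels = [1,   1,   0,   1,   0,   0]    # 1 = failure, 0 = normal

auc = roc_auc(scores, labels)
print(round(auc, 3))  # 8 of 9 positive/negative pairs ranked
# correctly -> 0.889
```

An AUC of 1.0 would mean every failure case outscores every normal case (perfect discrimination), while 0.5 is chance level; this pairwise-ranking view is numerically equivalent to the area under the plotted ROC curve.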
Towards Adaptive Industries: The Promise of Intelligent Systems
The integration of DisPLAS within an artificial-intelligence-in-the-loop framework represents a paradigm shift in process industry management. This synergistic approach moves beyond traditional, reactive control strategies by establishing a continuous monitoring system that leverages real-time data streams. The AI component doesn’t simply respond to anomalies; it proactively learns from process behavior, enabling adaptive control that optimizes performance under varying conditions. Critically, this loop facilitates predictive maintenance and failure prevention; the AI identifies subtle deviations indicative of potential issues before they escalate, minimizing downtime and bolstering operational safety. By continuously refining its understanding of the process, the AI ensures a resilient and self-optimizing system, fundamentally altering how industries approach efficiency and reliability.
The integration of artificial intelligence into process industries isn’t simply about automation; it represents a shift towards the core tenets of Industry 5.0, where human expertise and intelligent systems work in concert. This collaborative paradigm prioritizes the augmentation of human capabilities, allowing operators to focus on complex problem-solving and innovation while AI manages routine tasks and identifies potential issues. Furthermore, this human-centered approach inherently promotes sustainability; optimized processes, reduced waste, and proactive maintenance – all facilitated by AI – contribute to more resource-efficient and environmentally responsible industrial practices. The result is not merely smarter factories, but facilities designed to operate in harmony with both people and the planet, fostering a resilient and ethically grounded industrial future.
Process industries are undergoing a significant transformation, shifting from reliance on static simulations – essentially, snapshots of potential scenarios – towards systems capable of dynamic learning. This evolution allows facilities to move beyond predicting outcomes based on pre-programmed conditions and instead adapt in real-time to unforeseen events and changing operational parameters. By continuously analyzing data streams and refining their understanding of complex processes, these industries can proactively identify and mitigate risks, optimize resource allocation, and enhance overall system resilience. This adaptive capability isn’t merely about improving efficiency; it represents a fundamental leap in safety protocols and enables a level of operational agility previously unattainable, ultimately paving the way for more sustainable and robust industrial practices.
The pursuit of robust failure detection, as demonstrated in the study of ethylene oxidation, echoes a fundamental principle of system longevity. The authors’ exploration of symbolic machine learning, particularly DisPLAS, to build interpretable models, suggests an attempt to understand the ‘memory’ inherent in complex processes. This aligns with the observation that any simplification – a necessary step in model building – carries a future cost, potentially obscuring critical failure modes. As Carl Friedrich Gauss noted, “If other sciences are to be advanced, the study of mathematics must be advanced.” This highlights the need for rigorous foundational methods, like those employed in DisPLAS, to ensure that the models created aren’t simply black boxes, but rather, transparent representations of the underlying system dynamics, capable of graceful aging and adaptation as conditions evolve.
What Lies Ahead?
The pursuit of failure detection, as exemplified by this work, is less about achieving a static state of ‘safe’ and more about charting the inevitable decay of any complex system. Every identified anomaly, every flagged deviation, is not a problem solved, but a moment of truth in the timeline of the ethylene oxidation process. The methodology presented – leveraging symbolic machine learning – offers a step towards models that age gracefully, providing interpretable insights even as the underlying process drifts from its initial parameters. However, the true test lies not in immediate performance gains, but in sustained diagnostic capability over extended operational lifespans.
The current emphasis on model accuracy risks obscuring a fundamental truth: all models are approximations, and all approximations degrade with time. Future work should prioritize methods for quantifying and mitigating this decay, perhaps through continuous model refinement informed by historical data and expert knowledge. Consideration must also be given to the inherent limitations of relying solely on process data; external factors, unforeseen interactions, and the unpredictable nature of catalysis will always introduce noise and uncertainty.
Ultimately, the challenge is not to eliminate failure – an impossible task – but to anticipate it. Technical debt, in this context, is the past’s mortgage paid by the present. The long-term viability of any safety-critical system depends on a willingness to acknowledge this debt and invest in models that can not only detect failures, but also forecast their emergence, allowing for proactive intervention before catastrophe strikes.
Original article: https://arxiv.org/pdf/2603.06767.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/