Author: Denis Avetisyan
New research details a method for systematically discovering and modeling what artificial intelligence systems can actually do, even when their inner workings are opaque.

This paper introduces Probabilistic Capability Model Learning (PCML) to efficiently map the capabilities of black-box AI in uncertain environments, outperforming random exploration techniques.
As black-box AI (BBAI) systems are increasingly deployed for sequential decision-making, ensuring their safe and reliable operation requires interpretable representations of their capabilities. This paper, ‘Discovering and Learning Probabilistic Models of Black-Box AI Capabilities’, introduces a novel approach to systematically learn and model these capabilities in stochastic environments using PDDL-style representations and Monte-Carlo tree search. The resulting probabilistic models not only describe what a BBAI can do, but also when and with what probability, offering both soundness and completeness guarantees. Will this method pave the way for more transparent and trustworthy AI systems capable of robust performance in complex, uncertain scenarios?
The Challenge of Opacity in Artificial Intelligence
The increasing sophistication of artificial intelligence, particularly with the rise of ‘Black-Box AI Systems’ (BBAI), necessitates a critical focus on capability understanding. These systems, characterized by opaque internal workings, often achieve remarkable performance without revealing how they arrive at conclusions. While this can be advantageous, it simultaneously presents significant challenges; as AI increasingly influences critical domains – from healthcare diagnostics to autonomous vehicles – a comprehensive grasp of what these systems can and cannot do becomes paramount. The lack of transparency isn’t merely a matter of intellectual curiosity; it directly impacts safety, reliability, and the potential for unintended consequences, demanding rigorous investigation into the boundaries of BBAI competence.
The increasing sophistication of Black-Box AI systems presents a significant challenge to reliable capability assessment. Traditional evaluation methods, often relying on passive observation of outputs given known inputs, frequently fail to fully characterize the limits and potential of these complex models. This creates substantial risks when deploying BBAI in real-world applications, as unforeseen behaviors or vulnerabilities may remain undetected until after implementation. Consequently, a lack of transparency regarding what a BBAI can truly accomplish, and crucially what it cannot, erodes public and professional trust. Without robust and dependable assessment techniques, the potential benefits of advanced AI are tempered by legitimate concerns regarding safety, fairness, and accountability, hindering widespread adoption and responsible innovation.
Assessing artificial intelligence within realistic, unpredictable settings necessitates a shift from passive observation to deliberate investigation. Simply watching an AI operate provides limited insight into the full scope of its capabilities, particularly when dealing with probabilistic outcomes where chance plays a significant role. Active probing – systematically presenting the AI with carefully designed challenges and analyzing its responses – offers a more robust method for uncovering its strengths and weaknesses. This approach doesn’t merely document what an AI does, but seeks to understand why it makes certain decisions, revealing the underlying logic – or lack thereof – driving its behavior. By actively testing the boundaries of an AI’s competence, researchers can gain a far more nuanced and reliable picture of its true capabilities, crucial for safe and effective deployment in complex, real-world scenarios.
The increasing sophistication of artificial intelligence demands a shift from passive observation to proactive evaluation of its capabilities. Current assessment techniques often fall short when faced with the complexities of ‘black-box’ AI, particularly in unpredictable real-world scenarios. Consequently, researchers are developing novel methods centered around active probing – a process of systematically querying the AI with carefully designed inputs to reveal the boundaries of its knowledge and skill. This isn’t simply about determining if an AI provides the ‘correct’ answer, but rather a rigorous mapping of what the AI can reliably accomplish, when its performance might falter, and how it responds to unforeseen circumstances. Such an approach promises to move beyond opaque functionality and establish a foundation of trust and predictability in the deployment of advanced AI systems.
PCML: A Rigorous Mapping of AI Competence
Probabilistic Capability Model Learning (PCML) tackles the challenge of evaluating black-box AI (BBAI) systems by moving beyond passive observation and instead actively querying the agent within a defined ‘Stochastic Environment’. This environment introduces inherent randomness, necessitating a robust evaluation method that accounts for variability in the AI’s responses. PCML doesn’t simply assess what an AI has done, but systematically probes its potential by presenting it with a series of carefully chosen inputs. This active querying approach is crucial because relying solely on observed behavior can provide an incomplete or misleading picture of an AI’s true capabilities, particularly in complex or unpredictable scenarios. The stochastic environment simulates real-world uncertainty and forces the evaluation to consider a distribution of possible outcomes, providing a more reliable assessment of the BBAI’s robustness and generalizability.
The Probabilistic Capability Model Learning (PCML) framework employs a ‘Query Policy’ as a core component of its evaluation methodology. Unlike passive observation of an agent’s behavior, this policy actively selects specific actions to perform within a simulated ‘Stochastic Environment’. The purpose of this strategic action selection is to efficiently probe the boundaries of the agent’s capabilities and gather data that directly informs the construction of a probabilistic model. By choosing actions designed to maximize information gain regarding the agent’s potential, the Query Policy enables PCML to move beyond simply documenting observed behaviors and instead assess the full range of skills the agent can perform, even those not yet demonstrated.
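To make this concrete, a minimal sketch of what such a query-policy interface might look like appears below. The class and method names (`QueryPolicy`, `select_query`, `update`) are illustrative assumptions rather than the paper's actual API; they simply capture the idea of choosing informative queries and folding observed outcomes back into a capability belief.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class QueryPolicy(ABC):
    """Hypothetical interface: chooses which action to pose to the
    black-box agent next, given the current belief over its capabilities."""

    @abstractmethod
    def select_query(self, state: Any, belief: Dict[str, float]) -> Any:
        """Return the next action/query expected to be most informative."""

    @abstractmethod
    def update(self, state: Any, action: Any, outcome: Any) -> None:
        """Fold the observed outcome back into the capability belief."""
```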
The query policy within Probabilistic Capability Model Learning (PCML) employs Monte Carlo Tree Search (MCTS) as a decision-making algorithm to efficiently explore the space of possible agent actions. MCTS operates by constructing a search tree, where each node represents a state and each edge represents a potential action. The algorithm iteratively expands this tree through four stages: selection, expansion, simulation, and backpropagation. During selection, the algorithm traverses the tree, prioritizing actions that balance high estimated value against under-explored, high-uncertainty branches. Expansion adds new nodes representing unexplored actions. Simulation then involves running a playout from the new node to estimate the value of that action. Finally, backpropagation updates the values of nodes along the path, refining the search tree based on the simulation results. This process is repeated multiple times, allowing the policy to strategically select actions that maximize information gain regarding the agent’s capabilities, rather than randomly sampling or relying on pre-defined heuristics.
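The sketch below illustrates a generic UCT-style MCTS loop with these four stages. It is not the paper's implementation: the `simulator` interface it relies on (`legal_actions`, `step`, `is_terminal`, `reward`) is assumed for the example, and the reward here is a generic playout score rather than PCML's information-gain objective.

```python
import math
import random


class Node:
    """Search-tree node: one environment state plus visit statistics."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0


def uct_search(root_state, simulator, iterations=1000, c=1.4):
    """Generic UCT loop: selection, expansion, simulation, backpropagation."""
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded,
        #    picking the child that maximizes the UCB1 score.
        while node.children and len(node.children) == len(simulator.legal_actions(node.state)):
            node = max(node.children, key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # 2. Expansion: add one child for an action not yet tried here.
        if not simulator.is_terminal(node.state):
            tried = {child.action for child in node.children}
            untried = [a for a in simulator.legal_actions(node.state) if a not in tried]
            action = random.choice(untried)
            child = Node(simulator.step(node.state, action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to estimate the new node's value.
        state = node.state
        while not simulator.is_terminal(state):
            state = simulator.step(state, random.choice(simulator.legal_actions(state)))
        reward = simulator.reward(state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda n: n.visits).action
```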
Probabilistic Capability Model Learning (PCML) differentiates itself from passive AI evaluation methods by constructing a predictive model of an agent’s potential abilities. Traditional evaluations often rely on observing an AI’s responses to a fixed set of inputs, limiting assessment to demonstrated behaviors. PCML, however, actively queries the agent with specifically chosen actions, allowing it to infer capabilities even if those capabilities have not been previously exhibited. This approach generates a probability distribution over the agent’s potential actions and outcomes, effectively mapping the space of what the AI can do, as opposed to simply documenting what it has done, leading to a more comprehensive and robust capability assessment.

Bounding AI Behavior: Establishing Limits of Competence
PCML employs two distinct Capability Models – an Optimistic Model and a Pessimistic Model – to formally represent an agent’s potential actions and their associated outcomes. These models are not predictive of a single future, but rather define the bounds of possible agent behavior. The Optimistic Model outlines the agent’s maximum achievable capabilities, while the Pessimistic Model defines a conservative lower bound. Both models utilize Conditional Effect rules, which specify the conditions under which an action is taken and the resulting effects on the environment, providing a structured representation of the agent’s functional range.
The Pessimistic Model within the PCML framework functions as a safety mechanism by deliberately underestimating the AI agent’s capabilities. This conservative estimation is achieved through the application of Conditional Effect rules that prioritize proven outcomes and minimize the prediction of successful actions in uncertain scenarios. The resulting model ensures completeness by explicitly identifying potential failure states and bounding the agent’s behavior within safe parameters, thereby preventing unintended consequences or actions outside of defined operational limits. This approach prioritizes reliability and predictable performance over maximizing potential utility, serving as a critical component in risk mitigation and safe AI deployment.
The Optimistic Model within the PCML framework functions as a projection of the agent’s highest possible performance level. This model does not constrain action selection based on observed reliability; instead, it assumes successful completion of all possible actions defined by its ‘Conditional Effect’ rules. Consequently, the Optimistic Model prioritizes maximizing exploration and identifying potential utilities that a more conservative model might overlook. While not necessarily representative of consistently achievable outcomes, this model serves as a crucial component in broadening the scope of the agent’s capabilities assessment and discovering novel opportunities.
Both the Optimistic and Pessimistic Models within the PCML framework are implemented as instances of a ‘Capability Model’, a standardized structure for representing an agent’s potential actions. These models define possible actions and their resulting outcomes using ‘Conditional Effect’ rules, which specify that an action will produce a particular effect if certain conditions are met. These rules take the form of $condition \rightarrow effect$ pairings, allowing the system to reason about the likelihood and consequences of different actions based on the current state of the environment. The consistent use of Capability Models and Conditional Effect rules ensures a unified and predictable approach to defining and evaluating agent behavior across both optimistic and pessimistic scenarios.
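A minimal sketch of how such a capability model and its conditional-effect rules could be represented in code is given below. The field names, example predicates, and probabilities are illustrative assumptions, not the paper's PDDL-style encoding; the point is only that the optimistic and pessimistic models share one structure and differ in the rules and probabilities they admit.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ConditionalEffect:
    """A single 'condition -> effect' rule with an outcome probability."""
    condition: FrozenSet[str]   # predicates that must hold before the action
    effect: FrozenSet[str]      # predicates made true if the effect fires
    probability: float          # estimated chance the effect occurs


@dataclass
class CapabilityModel:
    """Maps each action name to the conditional effects it may produce."""
    effects: Dict[str, List[ConditionalEffect]] = field(default_factory=dict)

    def add_rule(self, action: str, rule: ConditionalEffect) -> None:
        self.effects.setdefault(action, []).append(rule)


# Optimistic and pessimistic models share the same structure; only the
# probabilities (and which rules are included) differ.
optimistic, pessimistic = CapabilityModel(), CapabilityModel()
optimistic.add_rule("pick-up", ConditionalEffect(
    condition=frozenset({"hand-empty", "reachable(obj)"}),
    effect=frozenset({"holding(obj)"}),
    probability=1.0))   # optimistic bound: assume the action succeeds
pessimistic.add_rule("pick-up", ConditionalEffect(
    condition=frozenset({"hand-empty", "reachable(obj)"}),
    effect=frozenset({"holding(obj)"}),
    probability=0.6))   # pessimistic bound: only the observed success rate
```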

Validating PCML: Quantifying the Accuracy of Capability Assessments
To rigorously evaluate how well a predictive capability model learns an agent’s true abilities, researchers employ Variational Distance as a key measurement. This metric quantifies the statistical difference between the learned, often optimistic and pessimistic, capability models and the actual behavior exhibited by the agent when operating within its environment. Essentially, it determines how closely the predicted range of an agent’s skills aligns with its demonstrated performance; a smaller distance indicates a more accurate prediction. Utilizing this approach, the system doesn’t just assess if a skill is possible, but provides a quantifiable understanding of the confidence in that assessment, offering a nuanced perspective on the agent’s capabilities and limitations. The calculation, formally expressed as $\frac{1}{2} \sum_{x} |P(x) - Q(x)|$, where $P$ represents the true agent behavior and $Q$ the learned model, provides a precise value for comparison and optimization.
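For a discrete outcome space, this calculation reduces to a few lines of code. The sketch below uses illustrative placeholder distributions; the outcome labels and probabilities are assumptions for the example, not values from the paper.

```python
def variational_distance(p, q):
    """Total variation distance between two discrete distributions,
    given as dicts mapping outcomes to probabilities."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in outcomes)


# Illustrative example: true agent behavior vs. a learned capability model.
true_behavior = {"success": 0.7, "failure": 0.3}
learned_model = {"success": 0.9, "failure": 0.1}
print(variational_distance(true_behavior, learned_model))  # 0.2
```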
A core principle of Probabilistic Capability Model Learning (PCML) lies in its ability to quantify the accuracy of its capability assessments. This is achieved through the use of Variational Distance, a metric that effectively measures the divergence between the learned models – both optimistic and pessimistic – and the actual behavior of the AI agent. A lower variational distance signifies a stronger alignment between the predicted capabilities and the agent’s true performance, indicating a more accurate and reliable model. Consequently, this precision allows for increased confidence in deploying AI systems, as it provides a robust method for understanding what an agent can and cannot reliably achieve. The smaller the distance, the more faithfully the PCML reflects the agent’s capabilities, establishing a foundation for safe and predictable AI behavior in complex environments.
A core benefit of the proposed Probabilistic Capability Model Learning (PCML) lies in its ability to significantly refine the understanding of an AI agent’s true capabilities. Through rigorous testing across varied environments, PCML consistently demonstrates a substantial reduction in uncertainty, as quantified by the Variational Distance metric. Results indicate that PCML achieves up to a 60% lower Variational Distance compared to traditional random exploration methods, meaning the learned models more closely reflect actual agent behavior. This improvement isn’t merely statistical; it translates to a more reliable and accurate assessment of what an AI can and cannot accomplish, fostering increased trust and enabling safer, more effective deployment in complex, stochastic environments.
Evaluations across complex environments demonstrate the efficacy of the proposed approach in quantifying agent capability. Specifically, when tested on the cooperative cooking game Overcooked, a 60% reduction in Variational Distance was observed, indicating substantially improved accuracy in modeling the agent’s potential actions compared to traditional random exploration methods. Furthermore, in the task-oriented environment of SayCan, where agents must reason about physical possibilities, a 20% reduction in Variational Distance was recorded. These results highlight PCML’s capacity to refine capability estimations – moving beyond the uncertainty inherent in random approaches and providing a more reliable assessment of what an AI agent can actually achieve within a given stochastic environment.
To rigorously quantify the accuracy of capability model learning, the research employs Total Variation Distance (TVD), a metric offering a more refined assessment than simpler measures. TVD calculates the maximum difference in probability that two probability distributions – in this case, the learned capability model and the true agent behavior – assign to the same event. A lower TVD indicates a stronger alignment between the predicted and actual agent performance, signifying a more accurate capability assessment. Unlike metrics susceptible to being misled by overlapping uncertainties, TVD offers a precise, statistically sound evaluation, enabling researchers to confidently determine how well PCML captures an agent’s true abilities, particularly within complex and stochastic environments. This precision is crucial for building trustworthy and reliable AI systems, as it moves beyond simply identifying if an agent can perform a task, to understanding how likely it is to succeed.
Rigorous validation of the Probabilistic Capability Model Learning (PCML) framework hinges on testing it within complex, unpredictable environments; therefore, a suite of stochastic scenarios were employed to assess its performance. Utilizing platforms like PDDLGym – a toolkit for creating and evaluating planning domains – researchers subjected PCML to diverse challenges where outcomes are not fully determined by actions alone. This approach allows for a nuanced understanding of how well PCML can accurately gauge an agent’s capabilities when faced with inherent uncertainty. The resulting data provides critical insights into the framework’s robustness and reliability, demonstrating its potential for real-world applications where environments are rarely, if ever, fully predictable and require adaptability and informed decision-making under conditions of risk.
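As a rough illustration of how such a benchmark loop might be driven, the snippet below follows the basic usage shown in the PDDLGym README. The environment id and return signatures can vary across PDDLGym releases, and the probing loop itself is a simplified stand-in for the paper's query procedure rather than the authors' evaluation code.

```python
import pddlgym

# Environment id taken from the PDDLGym README; substitute a probabilistic
# domain of your choice for stochastic experiments. Check the API of your
# installed release, as signatures have changed between versions.
env = pddlgym.make("PDDLEnvSokoban-v0")
obs, debug_info = env.reset()

# Probe the environment with sampled actions and record observed outcomes,
# the kind of interaction data a capability-learning loop would consume.
for _ in range(10):
    action = env.action_space.sample(obs)
    obs, reward, done, debug_info = env.step(action)
    if done:
        obs, debug_info = env.reset()
```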
The pursuit of understanding black-box AI, as detailed in this work, necessitates a rigorous methodology. Ada Lovelace observed, “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.” This sentiment echoes the core principle of Probabilistic Capability Model Learning (PCML); the system doesn’t inherently know its limitations, but rather reveals them through structured exploration and probabilistic inference. The paper’s focus on moving beyond random exploration to actively learn these limitations aligns directly with Lovelace’s view: the engine, or in this case the AI, executes precisely what it is instructed to do, and discerning its boundaries requires a defined, provable process, a ‘proof of correctness’ over mere empirical observation.
What Lies Ahead?
The pursuit of modeling black-box AI, as demonstrated by Probabilistic Capability Model Learning, skirts dangerously close to a fundamental paradox. One builds a model of a system whose internal workings are deliberately obscured. The efficacy of such modeling, therefore, rests not on mirroring internal mechanisms – an impossible task by design – but on accurately predicting external behaviors. It is a pragmatic, if slightly unsettling, exercise in statistical mimicry.
Future work must confront the inherent limitations of this approach. Current methods, while demonstrably superior to naive exploration, still rely on sampling and inference within a finite state space. The true complexity of these black-box systems may well reside beyond the reach of any computationally tractable model. A critical next step involves developing rigorous bounds on model accuracy and quantifying the risks associated with generalization to unseen scenarios. Optimization without such analysis remains self-deception, a trap for the unwary engineer.
Furthermore, the assumption of stationarity within stochastic environments warrants careful reconsideration. Real-world systems are rarely static; their capabilities evolve, and their limitations shift. A truly robust modeling framework must incorporate mechanisms for detecting and adapting to these dynamic changes, perhaps through continual learning or meta-modeling. The goal should not simply be to describe what an AI can do, but to anticipate what it will learn to do.
Original article: https://arxiv.org/pdf/2512.16733.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/