Can AI Know What It Doesn’t Know?
![The study compares confidence scaling in three large language models (GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508) across three distinct tasks, revealing disparities in how well their reported confidence aligns with task accuracy, as measured by metrics such as [latex]d'[/latex] and the M-ratio. Outlier data points were excluded (approximately 0.1% for Mistral-Medium-2508 in task B), and task A used twice as many trials as tasks B and C.](https://arxiv.org/html/2603.29693v1/x1.png)
New research explores whether large language models possess the capacity for metacognition – the ability to assess their own confidence and uncertainty.

As increasingly complex AI systems begin to collaborate, unforeseen and potentially harmful behaviors can arise from the interactions of individually rational agents.

A new study reveals improved methods for predicting abrupt changes in dynamic systems subjected to slow, repeating forces.
A growing body of research demonstrates that topological methods offer a powerful new lens for understanding organization and change in complex systems, moving beyond traditional approaches.

A new approach combines the power of deep learning with interpretable statistics to better predict mortgage defaults and understand the factors driving credit risk.

Artificial intelligence is transforming network defense, but its performance isn’t guaranteed in the face of evolving threats and real-world data challenges.
A new framework quantifies plasticity by linking network structure to dynamical regimes, offering a measurable way to understand a system’s responsiveness to change.
![Conditioning on variable 5 makes the distribution [latex]J^{\{5,6\}}_{6}(\cdot\,|\,x_{5};x_{\{2,4\}})[/latex] independent of variables 2 and 4, effectively nullifying the parent set of node 6. This is a consequence of the structural equilibrium model applied to the chain-connected anterial graph [latex]\mathcal{G}_{1}[/latex], and it highlights how error variable distributions propagate within the interconnected system.](https://arxiv.org/html/2603.24859v1/x4.png)
Researchers have developed a graphical approach to reliably identify causal relationships in systems where multiple factors interact and confounding variables obscure the true drivers.
![The study demonstrates variation across three distinct multimodal datasets[7], each offering unique samples reflective of inherent systemic differences in data representation.](https://arxiv.org/html/2603.25103v1/dataset_3_sample.png)
Researchers have developed a self-supervised learning framework that enhances the reliability of AI systems handling multiple data types, making them more resilient to errors and anomalies.
New research shows a deep learning model can predict health risks across multiple body systems simply by analyzing 3D skeletal motion.