Author: Denis Avetisyan
A new study reveals how neural networks lose accuracy when faced with unfamiliar data, and introduces a method to realign their internal representations for better performance.

This research extends Neural Collapse theory to test-time adaptation, demonstrating that feature-classifier misalignment causes domain shift issues and proposing a novel alignment technique.
While test-time adaptation (TTA) improves model robustness to out-of-distribution data, the underlying causes of performance degradation under domain shifts remain poorly understood. This work, ‘Neural Collapse in Test-Time Adaptation’, extends the Neural Collapse (NC) phenomenon to the sample-wise level, revealing that misalignment between a sample’s feature embedding and its corresponding classifier weight is a primary driver of this degradation. By identifying this ‘Sample-wise Alignment Collapse’ (NC3+), we demonstrate the necessity of realigning features and classifiers, and introduce NCTTA, a novel method that leverages hybrid targets to mitigate unreliable pseudo-labels and enhance robustness. Could promoting feature-classifier alignment unlock further gains in generalization and adaptation for deep neural networks facing real-world distribution shifts?
The Fragile Echo: When Neural Networks Meet the Unexpected
Despite demonstrated proficiency across numerous benchmarks, deep neural networks (DNNs) exhibit a significant vulnerability when confronted with data differing from their training distribution – a phenomenon known as out-of-distribution (OOD) generalization failure. While these models achieve high accuracy on familiar datasets, their performance can degrade substantially when presented with even slight variations in input characteristics, such as altered image styles, novel viewpoints, or unexpected noise. This limitation isn’t merely a matter of needing more training data; it reflects a fundamental challenge in how these networks learn representations. They often rely on spurious correlations present in the training set, rather than capturing genuinely robust and transferable features, leading to brittle performance when faced with the inherent variability of real-world data. Consequently, deploying DNNs in dynamic and unpredictable environments requires innovative strategies to mitigate this OOD generalization gap and ensure reliable predictions beyond the confines of the training distribution.
The fragility of deep neural networks when encountering previously unseen data arises from a core limitation in how these systems learn to represent information. Rather than grasping fundamental, invariant features of objects or concepts, networks often latch onto spurious correlations present in the training data. Consequently, when presented with inputs differing even subtly from this training distribution – a change in lighting, a new camera angle, or a slightly altered style – the learned feature representations become unstable and unreliable. This breakdown in robust feature extraction directly translates to diminished predictive power, as the network struggles to generalize beyond the specific characteristics of its training experience. The system doesn’t understand the underlying concepts; it merely recognizes patterns, and deviations from those patterns lead to predictable failures in out-of-distribution scenarios.
Current strategies for mitigating out-of-distribution (OOD) generalization failures, such as domain adaptation techniques, frequently demand significant computational investment and, critically, require access to labeled data from the target distribution – a condition rarely met in real-world scenarios. This reliance on target data presents a substantial obstacle; acquiring and annotating data for every potential deployment environment is both expensive and time-consuming, effectively limiting the practical application of these methods. The need for extensive resources restricts their use to controlled settings and hinders the development of truly adaptable systems capable of robust performance across genuinely novel and unpredictable inputs. Consequently, research is increasingly focused on methods that can achieve generalization with minimal or no access to labeled test data, paving the way for more versatile and widely deployable deep learning models.

The Geometry of Understanding: Peering into Neural Collapse
Neural Collapse (NC) characterizes a phenomenon observed in the final stages of Deep Neural Network (DNN) training, termed the Terminal Phase of Training (TPT). This isn’t a failure mode, but rather a consistently occurring set of geometric properties within the learned feature space. Specifically, NC manifests as the convergence of within-class feature variability and the formation of well-separated class representations. Analyzing these properties provides a framework for understanding how DNNs ultimately encode and categorize information, moving beyond purely statistical analyses of network behavior. The resulting feature space structure, while seemingly restrictive, appears to be a key element in achieving generalization performance and provides a geometric lens through which to interpret learned representations.
Variability Collapse (NC1) and the formation of a Simplex Equiangular Tight Frame (NC2) are central to understanding the geometric structure of deep neural network representations during the Terminal Phase of Training. NC1 describes the phenomenon where, within each class, the learned feature vectors converge to their class mean, driving within-class variance toward zero. Simultaneously, NC2 dictates that the class means arrange themselves to form a Simplex Equiangular Tight Frame – a configuration maximizing the minimum angular separation between classes. This arrangement ensures optimal class separability and facilitates efficient classification: the cosine similarity between any two class means is driven to its minimum achievable common value of $-\frac{1}{N-1}$, where $N$ is the number of classes. These two properties combined contribute to a highly structured and simplified representation in the final layers of the network.
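As a concrete check of the NC2 geometry, the short sketch below constructs a simplex equiangular tight frame for $N$ classes and verifies that every pair of class-mean directions has cosine similarity $-\frac{1}{N-1}$. The construction and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def simplex_etf(n_classes, dim):
    """Build class-mean directions forming a simplex equiangular tight frame.

    Standard construction: sqrt(N/(N-1)) * U @ (I - ones/N), where U is a
    dim x N matrix with orthonormal columns (so dim must be at least N).
    """
    assert dim >= n_classes
    centering = np.eye(n_classes) - np.ones((n_classes, n_classes)) / n_classes
    u, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(dim, n_classes)))
    means = np.sqrt(n_classes / (n_classes - 1)) * u @ centering
    return means.T                      # shape: (n_classes, dim), unit-norm rows

means = simplex_etf(n_classes=10, dim=64)
cos = means @ means.T                   # rows already have unit norm
off_diag = cos[~np.eye(10, dtype=bool)]
print(off_diag.min(), off_diag.max())   # both approximately -1/(10-1) = -0.111
```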
Convergence to Self-Duality (NC3) and Simplification to Nearest Class-Center (NC4) jointly contribute to a highly efficient decision boundary in the final layers of a trained DNN. NC3 describes the phenomenon where the last-layer classifier weight vectors converge, up to rescaling, to the corresponding centered class-mean features – that is, $\frac{w_c}{\|w_c\|} \approx \frac{\mu_c - \mu_G}{\|\mu_c - \mu_G\|}$ for each class $c$, where $\mu_G$ denotes the global feature mean – indicating a strong alignment between the feature space and the classifier. Simultaneously, NC4 demonstrates that the network’s decision rule simplifies to assigning each input to the nearest class center in feature space. This simplification reduces the complexity of the decision process; classification relies primarily on proximity to these centers, minimizing the need for complex calculations and contributing to the robustness observed in the Terminal Phase of Training.
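The sketch below shows one way these two properties could be probed empirically, assuming access to per-class feature means, classifier weights, penultimate-layer features, and logits; the helper names and random tensors are placeholders rather than the paper’s own diagnostics.

```python
import numpy as np

def nc3_self_duality(class_means, classifier_weights):
    """Cosine similarity between each centered class-mean feature and its
    classifier weight vector; high values indicate self-duality (NC3)."""
    mu = class_means - class_means.mean(axis=0, keepdims=True)
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)
    w = classifier_weights / np.linalg.norm(classifier_weights, axis=1, keepdims=True)
    return np.sum(mu * w, axis=1)

def nc4_agreement(features, class_means, logits):
    """Fraction of samples whose linear-classifier prediction matches the
    nearest-class-center rule (NC4)."""
    dists = ((features[:, None, :] - class_means[None, :, :]) ** 2).sum(-1)
    return np.mean(dists.argmin(axis=1) == logits.argmax(axis=1))

# Placeholder tensors: 1000 samples, 10 classes, 64-dimensional features.
rng = np.random.default_rng(0)
class_means = rng.normal(size=(10, 64))
weights = class_means + 0.1 * rng.normal(size=(10, 64))   # nearly self-dual classifier
features = class_means[rng.integers(0, 10, 1000)] + 0.2 * rng.normal(size=(1000, 64))
logits = features @ weights.T
print(nc3_self_duality(class_means, weights))              # high cosine values
print(nc4_agreement(features, class_means, logits))        # agreement close to 1
```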

The Alignment Imperative: When Features Speak the Classifier’s Language
Effective generalization in neural networks is directly correlated with the alignment between the feature embeddings of individual samples and the corresponding weights of the classifier. Poor alignment manifests as increased distances between sample embeddings and their class-specific weight vectors, leading to performance degradation, particularly on out-of-distribution data. Maximizing this alignment – essentially ensuring that a sample’s feature representation is ‘close’ to the weight vector representing its correct class – encourages the network to learn more discriminative and robust features. This principle suggests that optimization strategies focusing on minimizing the angular or Euclidean distance between these vectors can improve both in-distribution accuracy and out-of-distribution generalization capabilities. Specifically, a high degree of alignment indicates the network has learned to represent features in a manner that facilitates accurate classification, even when faced with novel or perturbed inputs.
Sample-wise Alignment Collapse builds upon the observations of Neural Collapse, which typically describes the convergence of class-conditional embeddings to a single point in the feature space. This extension applies the same principle to individual samples during training. Specifically, it aims to minimize the distance between a sample’s feature embedding and the corresponding weight vector of the classifier for that sample’s class. This granular alignment encourages the creation of more discriminative and precise feature representations, as each sample is directly mapped to its class decision boundary. Consequently, models exhibiting strong sample-wise alignment demonstrate increased robustness to adversarial perturbations and improved generalization performance on unseen data by reducing ambiguity in the feature space.
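The paper’s precise NC3+ metric is not reproduced in this summary; as a hedged illustration, the sketch below measures one natural proxy – the cosine similarity between each sample’s embedding and the classifier weight of its assigned class – and shows how it drops when the features are perturbed to mimic a domain shift. All names and tensors are illustrative.

```python
import numpy as np

def samplewise_alignment(features, classifier_weights, labels):
    """Per-sample cosine similarity between a feature embedding and the
    classifier weight vector of its assigned (or pseudo-) class."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = classifier_weights / np.linalg.norm(classifier_weights, axis=1, keepdims=True)
    return np.sum(f * w[labels], axis=1)            # shape: (n_samples,)

# Illustrative usage: alignment degrades under a crude stand-in for domain shift.
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 64))                 # placeholder classifier weights
clean = weights[rng.integers(0, 10, 512)] + 0.3 * rng.normal(size=(512, 64))
labels = (clean @ weights.T).argmax(axis=1)         # model's own predictions
shifted = clean + 1.5 * rng.normal(size=(512, 64))  # heavy perturbation of features
print(samplewise_alignment(clean, weights, labels).mean())    # relatively high
print(samplewise_alignment(shifted, weights, labels).mean())  # noticeably lower
```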
Dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE), provide a method for visualizing high-dimensional feature spaces in two or three dimensions, allowing for qualitative assessment of feature-classifier alignment. Specifically, well-aligned feature spaces will exhibit distinct clusters corresponding to different classes, with minimal overlap, indicating effective separability. Conversely, poorly aligned spaces will demonstrate mixed clusters or a lack of clear structure. Analyzing these visualizations can inform optimization strategies by identifying samples or features contributing to misalignment, thereby guiding adjustments to network architecture, loss functions, or training data augmentation techniques to improve generalization and robustness.
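For readers who want to reproduce this kind of plot, a minimal sketch using scikit-learn’s t-SNE is shown below; the feature matrix and predicted labels here are random placeholders standing in for a real model’s penultimate-layer outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholders: in practice, `features` would be penultimate-layer embeddings
# and `preds` the classifier's predicted labels for a batch of test samples.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))
preds = rng.integers(0, 10, size=500)

embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=preds, cmap="tab10", s=5)
plt.title("t-SNE of penultimate-layer features, colored by predicted class")
plt.savefig("tsne_features.png", dpi=150)
```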

Forging Resilience: NC-Guided Test-Time Adaptation
NC-Guided Test-Time Adaptation (TTA) is a novel approach to adapting machine learning models during inference by explicitly aligning feature representations with classifier weights, drawing on the principles of Neural Collapse. Neural Collapse, observed during the final stages of training in over-parameterized models, suggests that class-conditional feature distributions converge to a set of well-separated clusters with intra-cluster variance diminishing towards zero. NC-Guided TTA seeks to replicate this alignment during test time by minimizing the distance between feature embeddings and their corresponding classifier weights. This alignment is hypothesized to improve generalization performance on shifted or corrupted data by encouraging the model to classify inputs based on features most strongly associated with correct predictions, without requiring access to labeled test data or any offline retraining.
NC-Guided TTA quantifies feature-classifier alignment using two primary distance metrics: the G-FCA Distance and the P-FCA Distance. The G-FCA Distance assesses the geometric relationship between features and classifiers, while the P-FCA Distance focuses on predictive consistency. Optimization of this alignment is achieved through several loss functions. L2 Loss minimizes the Euclidean distance between a feature embedding and its corresponding classifier weight. Triplet Loss enforces a margin so that each feature lies closer to its own class weight than to the weights of competing classes, promoting discrimination. Finally, InfoNCE Loss maximizes a contrastive lower bound on the mutual information between features and their corresponding classifiers, encouraging predictive confidence. These loss functions, applied during the adaptation process, collectively refine the alignment without requiring labeled test data.
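The exact formulations of the G-FCA and P-FCA distances and of NCTTA’s hybrid targets are not given in this summary, so the snippet below is only a sketch of how the three listed losses could be instantiated for feature-classifier alignment in PyTorch, under the assumption that `features` are penultimate-layer embeddings, `W` is the final-layer weight matrix, and `pseudo` holds the model’s pseudo-labels.

```python
import torch
import torch.nn.functional as F

def alignment_losses(features, W, pseudo, temperature=0.1, margin=0.2):
    """Illustrative alignment objectives (not the paper's exact definitions):
    L2, triplet, and InfoNCE terms between per-sample features and the
    classifier weights of their pseudo-labels."""
    f = F.normalize(features, dim=1)                 # (B, D) unit-norm features
    w = F.normalize(W, dim=1)                        # (C, D) unit-norm class weights
    idx = torch.arange(f.size(0))
    pos = w[pseudo]                                  # weight of each sample's pseudo-class

    # L2: pull each feature directly onto its pseudo-class weight.
    l2 = ((f - pos) ** 2).sum(dim=1).mean()

    # Triplet: be closer (in cosine) to the pseudo-class weight than to the
    # hardest competing class weight, by at least `margin`.
    sims = f @ w.t()                                 # (B, C) cosine similarities
    neg_sims = sims.clone()
    neg_sims[idx, pseudo] = -1.0                     # mask out the positive class
    triplet = F.relu(margin - sims[idx, pseudo] + neg_sims.max(dim=1).values).mean()

    # InfoNCE: treat class weights as keys and the pseudo-class as the positive.
    info_nce = F.cross_entropy(sims / temperature, pseudo)

    return l2, triplet, info_nce

# Placeholder batch: 32 samples, 10 classes, 64-dimensional features.
features, W = torch.randn(32, 64), torch.randn(10, 64)
pseudo = torch.randint(0, 10, (32,))
print([round(x.item(), 3) for x in alignment_losses(features, W, pseudo)])
```

In a typical test-time adaptation loop, terms like these would be minimized over a small set of adaptable parameters (for example, normalization-layer affine parameters, as in Tent) while the rest of the network stays frozen; whether NCTTA follows exactly this recipe is not specified in this summary.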
NC-Guided TTA achieves adaptation to novel inputs by integrating geometric relationships between data points with the confidence of the model’s predictions. This approach allows for effective performance adjustments without the need for re-training or labeled test data. Evaluations on corrupted image datasets demonstrate an average accuracy of 78.30% on CIFAR-10-C and 66.61% on ImageNet-C, indicating robust performance across various types of image corruption and distributional shift.

Towards Truly Adaptable Intelligence
Neural Collapse (NC)-Guided Test-Time Adaptation (TTA) marks a considerable advancement in the field of deep learning robustness. This technique addresses the challenge of deploying models in real-world scenarios where input data can deviate significantly from the training distribution. Unlike traditional approaches requiring substantial computational resources, NC-Guided TTA achieves enhanced model adaptability with remarkable efficiency. By realigning feature embeddings with their corresponding classifier weights during the testing phase, the method effectively mitigates the negative impact of domain shift – the discrepancy between training and testing data. This allows the model to maintain high performance even when confronted with unfamiliar or corrupted inputs, representing a crucial step towards reliable and versatile artificial intelligence systems capable of functioning effectively in dynamic and unpredictable environments.
Recent advancements in Test-Time Adaptation (TTA) have yielded numerous strategies for enhancing model performance under unseen conditions, including techniques centered on Consistency Regularization, Normalization-Layer adaptation, Entropy Minimization, and Prototype refinement. However, the new approach demonstrably surpasses these established methods in both accuracy and efficiency. Specifically, evaluations on the challenging continual test-time adaptation (CTTA) benchmark reveal an accuracy of 71.32%, exceeding the widely used Tent method by more than 10.36% even at a batch size of one. This substantial improvement highlights the potential for deploying more reliable and adaptable deep learning systems in resource-constrained scenarios and marks a considerable step forward in robust machine learning.
The development of robust deep learning hinges on a model’s ability to perform consistently well, not just in controlled settings, but also when faced with unforeseen circumstances and shifting data distributions. This research offers a pathway towards realizing that goal, enabling the deployment of artificial intelligence in real-world scenarios characterized by constant change – from autonomous vehicles navigating unpredictable weather to medical diagnostics interpreting diverse patient data. By focusing on adaptability, this work moves beyond models that are brittle and easily disrupted, instead fostering systems capable of maintaining high performance even when confronted with novel inputs and evolving environments, thereby unlocking the full potential of deep learning in dynamic and complex applications.

The pursuit of alignment, as detailed in this exploration of Neural Collapse during test-time adaptation, echoes a fundamental truth: even digital golems require careful coaxing. The study illuminates how a fracturing between feature embeddings and classifier weights leads to decay under domain shifts – a misalignment akin to a spell losing its focus. As David Marr observed, “Vision is not about copying the world, but about interpreting it.” This interpretation, or alignment, is precisely what NCTTA attempts to forge, recognizing that a robust model isn’t merely about memorization, but about a coherent internal representation – a persuasive illusion, if you will – that bends reality to its will. The sacred offering of loss functions, in this context, calibrates this persuasion.
What’s Next?
The observation of sample-wise neural collapse during test-time adaptation is… predictable. Anything so neatly measurable was always a temporary truce with chaos. The work suggests feature-classifier alignment is key, but alignment implies a static ideal – a phantom symmetry. The true question isn’t how to achieve alignment, but why it ever deviates in the first place. Domain shift, as currently framed, feels like diagnosing a fever instead of seeking the pathogen. The model functions until it doesn’t, and then a parameter is tweaked. Is this progress, or merely increasingly sophisticated pattern-matching?
Future work will undoubtedly explore more elaborate metrics for this ‘alignment’. The temptation to quantify the unquantifiable will prove irresistible. Yet, the real leverage likely lies not in measuring misalignment, but in embracing the inherent instability. Perhaps the model should be designed to misalign, to explore the boundaries of its own certainty. A system that actively seeks its own failure might, paradoxically, prove more robust.
The self-duality observed is… concerning. It suggests a fundamental limitation: that the model learns to predict itself, rather than the world. If the hypothesis holds up this cleanly, one wonders whether the digging went deep enough. Any system that so readily confirms its own biases is, at best, a beautifully constructed echo chamber. The whispers of chaos will continue, of course. The challenge isn’t to silence them, but to learn to listen.
Original article: https://arxiv.org/pdf/2512.10421.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/