Inside the Black Box: How Neural Networks Reshape Information

Author: Denis Avetisyan


New research reveals how the internal dynamics of deep learning models transform data, offering crucial insights into their representational power.

NerVE reveals that feedforward network nonlinearities within GPT-2 actively reshape information flow: they inject variance to revive dormant pathways (evidenced by post-activation signal enhancement) and simultaneously flatten the eigenspectrum, diminishing the dominance of leading eigenvectors and concentrating this redistribution within specific network depths, as visualized by a localized transition band in the JS heatmap.

This paper introduces NerVE, a framework for analyzing nonlinear eigenspectrum dynamics in the feed-forward networks of large language models to understand variance reshaping and its impact on representational capacity.

Despite the dominance of feed-forward networks (FFNs) within large language models, their high-dimensional internal dynamics remain poorly understood, creating a gap in our ability to predictably optimize model architecture and training. This paper introduces NerVE, a novel eigenspectral framework for analyzing how FFNs organize and regulate information flow, revealing that nonlinearities fundamentally reshape variance across latent dimensions and impact representational capacity. By tracking spectral entropy, participation ratio, eigenvalue enrichment, and distributional shifts, NerVE consistently recovers stable signatures correlating with generalization ability across diverse architectures and optimizers, from normalization schemes to positional encodings. Can these insights move beyond trial-and-error, enabling a more principled design of future language models?


Unlocking the Transformer: Peering Behind the Curtain

The recent surge in artificial intelligence capabilities is largely driven by Large Language Models (LLMs), and at the heart of these models lies the Transformer architecture. Unlike previous sequential models, Transformers process entire input sequences in parallel, enabling significantly faster training and improved performance on tasks like text generation, translation, and question answering. This parallel processing is achieved through a mechanism called “self-attention,” allowing the model to weigh the importance of different parts of the input when making predictions. The Transformer’s ability to capture long-range dependencies within data, coupled with its scalability, has made it the dominant architecture in the field of natural language processing, consistently delivering state-of-the-art results and pushing the boundaries of what’s possible with AI.

Feed-Forward Networks (FFNs) represent a pivotal element within the Transformer architecture, serving as the primary engine for introducing nonlinearity and enabling the model to learn complex patterns from data. While the attention mechanism within Transformers excels at capturing relationships between different parts of an input sequence, it’s the FFN that allows the model to transform and refine this information. These networks, typically consisting of two linear layers with a non-linear activation function – often ReLU – in between, operate independently on each position in the sequence. This localized processing permits the FFN to increase the dimensionality of the data, effectively creating a richer representation before it’s passed on for further processing. Without these critical nonlinear transformations, the Transformer would be limited to linear operations, significantly hindering its ability to model the intricate relationships present in natural language and other complex datasets. The FFN, therefore, is not merely a component, but a fundamental building block that unlocks the full potential of the Transformer architecture.
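The position-wise FFN described above can be sketched in a few lines. The 4x expansion factor and the tanh approximation of GELU follow the standard GPT-2 architecture; the function and variable names here are illustrative, not taken from the paper:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU, as used in GPT-2."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def ffn(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: expand to the hidden width,
    apply the nonlinearity, project back. x has shape (tokens, d_model)."""
    return gelu(x @ w1 + b1) @ w2 + b2

d_model, d_ff = 64, 256  # GPT-2 uses d_ff = 4 * d_model
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.02, size=(d_model, d_ff)); b1 = np.zeros(d_ff)
w2 = rng.normal(scale=0.02, size=(d_ff, d_model)); b2 = np.zeros(d_model)
out = ffn(rng.normal(size=(10, d_model)), w1, b1, w2, b2)
```

Because the block acts on each position independently, the same weights transform every token; only the nonlinearity between the two projections makes the mapping non-linear.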

The performance of Large Language Models is deeply intertwined with the often-overlooked Feed-Forward Networks (FFNs) embedded within the Transformer architecture. These networks, while seemingly simple in concept, perform crucial nonlinear transformations of data, allowing models to learn complex relationships and generate nuanced outputs. A thorough understanding of how information flows and is processed within these FFNs – including the impact of layer size, activation functions, and initialization strategies – is paramount for optimizing LLM efficiency and scalability. Furthermore, investigating the internal dynamics of FFNs may reveal potential limitations, such as susceptibility to adversarial attacks or biases learned from training data, opening avenues for developing more robust and reliable language models. Ultimately, dissecting the FFN’s role is not merely an academic exercise; it’s a vital step toward unlocking the full potential – and mitigating the risks – of increasingly powerful LLMs.

During GPT-2 training on CodeParrot, feedforward network nonlinearities dynamically regulate information flow (as evidenced by shifts in the eigenspectrum and layer-wise distributional changes) and correlate with decreasing evaluation loss <span class="katex-eq" data-katex-display="false">(r > 0.7)</span>.

The Latent Landscape: Mapping Variance with Eigenspectra

The eigenspectrum of a Feed-Forward Network (FFN) layer, obtained from the covariance of its flattened activation matrix, represents the distribution of variance across the dimensions of the network’s latent space. Specifically, the eigenvalues derived from this covariance quantify the amount of variance explained by each corresponding eigenvector, which defines a principal axis in the latent space. A larger eigenvalue indicates a dimension with high variance, suggesting that the network utilizes this dimension extensively for representation. Conversely, small eigenvalues indicate dimensions with low variance, potentially representing redundant or less informative directions. Analyzing the full eigenspectrum, therefore, provides a complete picture of how information is structured and distributed within the FFN’s learned representations, revealing insights into the network’s capacity to capture and process data.

Eigenspectrum analysis quantifies the distribution of variance within the latent space of a Feed-Forward Network (FFN) by examining the eigenvalues of the activation covariance matrix. The resulting eigenspectrum, a plot of eigenvalues in descending order, reveals how variance is allocated across different principal components. A spectrum with a few dominant eigenvalues indicates that variance is concentrated in a low-dimensional subspace, suggesting potential redundancy or limited expressiveness. Conversely, a flatter spectrum, with more evenly distributed eigenvalues, implies a higher-dimensional representation and greater potential capacity. Analyzing the eigenspectrum allows for the assessment of an FFN’s ability to capture complex relationships in data, and provides a means to evaluate the efficiency with which it utilizes its parameters; networks with more concentrated spectra may exhibit lower effective dimensionality despite a large number of parameters.
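Concretely, an eigenspectrum of this kind can be computed from the covariance of a flattened activation matrix of shape (tokens, hidden dimension). This is a minimal sketch, not the paper's implementation; names are illustrative:

```python
import numpy as np

def activation_eigenspectrum(x):
    """Eigenvalues of the covariance of a flattened activation matrix.

    x: array of shape (B*S, D), i.e. tokens by hidden dimension,
    mirroring the flattened matrix X described in the figure caption.
    Returns eigenvalues in descending order, clamped at zero to
    absorb small negative values from floating-point error."""
    x = x - x.mean(axis=0, keepdims=True)   # center each dimension
    cov = (x.T @ x) / (x.shape[0] - 1)      # D x D covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1] # eigvalsh returns ascending
    return np.clip(eigvals, 0.0, None)

rng = np.random.default_rng(0)
acts = rng.normal(size=(1024, 64))          # stand-in activations
spec = activation_eigenspectrum(acts)
```

A concentrated spectrum shows a steep initial drop in `spec`; a flat spectrum shows nearly equal values throughout.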

Eigenvalue Early Enrichment (EEE) and the Participation Ratio (PR) serve as quantitative metrics for characterizing variance distribution within a Feed-Forward Network’s (FFN) eigenspectrum, effectively gauging the concentration and dimensionality of the latent space. Eigenvalue Early Enrichment quantifies the proportion of total variance explained by the leading eigenvalues, indicating the degree of variance concentration along principal components. The Participation Ratio, calculated as \frac{(\sum_{i=1}^{N} \lambda_i)^2}{\sum_{i=1}^{N} \lambda_i^2} where \lambda_i are the eigenvalues, directly measures the effective dimensionality: a lower PR indicates variance concentrated in fewer dimensions, while a PR approaching N indicates a flat spectrum. Analysis demonstrates a post-activation increase in PR, suggesting that the FFN nonlinearity expands the effective dimensionality of the latent space by distributing variance more broadly across the dimensions after the activation function is applied.
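Both metrics are a few lines each given an eigenvalue vector. The top-k fraction used below is one plausible reading of Eigenvalue Early Enrichment; the paper's exact definition may differ:

```python
import numpy as np

def participation_ratio(eigvals):
    """Effective dimensionality: PR = (sum lam)^2 / sum lam^2.
    Ranges from 1 (all variance in one dimension) to N (flat spectrum)."""
    s = eigvals.sum()
    return (s * s) / np.sum(eigvals ** 2)

def early_enrichment(eigvals, k=10):
    """Fraction of total variance carried by the top-k eigenvalues;
    higher values mean a more top-heavy spectrum."""
    v = np.sort(eigvals)[::-1]
    return v[:k].sum() / v.sum()

flat = np.ones(64)                       # perfectly uniform spectrum
peaked = np.array([100.0] + [1.0] * 63)  # one dominant eigenvalue
```

On these toy spectra, `flat` yields PR = 64 and low enrichment, while `peaked` yields a PR near 2.6 despite having 64 nonzero eigenvalues, illustrating how PR discounts near-dead dimensions.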

Analysis of feedforward network layers reveals insights into activation dynamics via eigenspectral metrics-spectral entropy, participation ratio, and eigenvalue early enrichment-and Jensen-Shannon divergence, which collectively characterize activation dispersion, effective dimensionality, top-heaviness, and distributional shifts caused by nonlinearities, using a flattened activation matrix <span class="katex-eq" data-katex-display="false">X \in \mathbb{R}^{(B \times S) \times D}</span> where <i>B</i> is the batch size, <i>S</i> the sequence length, and <i>D</i> the hidden dimension.

NerVE: A Framework for Dissecting Internal Geometry

NerVE is a framework designed to analyze the internal geometry of Feed-Forward Networks (FFNs) by employing eigenspectrum analysis. This involves computing the eigenvalues and eigenvectors of the covariance of the FFN’s activations, providing insight into how variance is distributed across the network’s latent dimensions. The framework systematically decomposes the variance within the network’s hidden states to reveal how information is represented and transformed. By analyzing changes in the eigenspectrum before and after nonlinear activations, NerVE quantifies the impact of these activations on the network’s internal representation. The resulting eigenspectra are then used as the basis for calculating metrics, such as Spectral Entropy and Participation Ratio, to assess the uniformity and distribution of variance, offering a rigorous, quantitative method for understanding FFN behavior.

NerVE employs Spectral Entropy (SE) and Jensen-Shannon Divergence (JSD) as quantitative metrics to assess the distribution of variance within the Feed-Forward Network (FFN) latent space and to measure the effect of the nonlinear activation function. SE calculates the entropy of the eigenvalue distribution, providing a measure of spectral flatness; higher values indicate a more uniform distribution of variance across latent dimensions. JSD quantifies the divergence between the pre-activation and post-activation eigenvalue distributions; a greater divergence signifies a substantial alteration in variance distribution due to the nonlinearity. These metrics enable a numerical assessment of how effectively the FFN redistributes variance, moving away from concentration in a few dimensions towards a more dispersed representation, and provide a standardized approach to comparing the impact of different nonlinearities.
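Both quantities have standard definitions once the eigenvalues are normalized into a probability distribution. A minimal sketch (natural-log base, with a small epsilon for numerical safety; not the paper's code):

```python
import numpy as np

def spectral_entropy(eigvals, eps=1e-12):
    """Shannon entropy of the normalized eigenvalue distribution;
    higher values indicate a flatter spectrum."""
    p = eigvals / (eigvals.sum() + eps)
    p = np.clip(p, eps, None)
    return -np.sum(p * np.log(p))

def js_divergence(ev_pre, ev_post, eps=1e-12):
    """Jensen-Shannon divergence between pre- and post-activation
    eigenvalue distributions; bounded by ln(2) in this base."""
    p = ev_pre / ev_pre.sum()
    q = ev_post / ev_post.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A JSD near zero means the nonlinearity barely moved the spectrum; values approaching the ln(2) bound indicate a substantial redistribution of variance.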

Analysis of Feedforward Network (FFN) latent spaces using NerVE demonstrates that nonlinear activations redistribute variance across dimensions, a process termed VarianceReinjection. Comparison of eigenspectra before and after activation reveals quantifiable changes: Spectral Entropy (SE) and Participation Ratio (PR) increase post-activation, indicating a more uniform distribution of variance across the latent dimensions. Simultaneously, Eigenvalue Early Enrichment (EEE) decreases, signifying a reduction in the concentration of variance within the initial few eigenvectors. These metrics collectively suggest that nonlinearities effectively inject variance into previously less-utilized dimensions, resulting in a flattening of the eigenspectrum and a more isotropic representation within the FFN latent space.
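The post-activation PR increase can be demonstrated on a toy example that makes no assumptions about the paper's setup: rank-1 pre-activations pass through a ReLU, which splits the positive and negative directions into two orthogonal post-activation components, raising the effective dimensionality:

```python
import numpy as np

def participation_ratio(eigvals):
    """PR = (sum lam)^2 / sum lam^2, the effective dimensionality."""
    s = eigvals.sum()
    return (s * s) / np.sum(eigvals ** 2)

def cov_eigvals(x):
    """Eigenvalues of the sample covariance of x (rows = samples)."""
    x = x - x.mean(axis=0, keepdims=True)
    ev = np.linalg.eigvalsh(x.T @ x / (len(x) - 1))
    return np.clip(ev, 0.0, None)

rng = np.random.default_rng(0)
v = np.array([1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5])
t = rng.normal(size=(5000, 1))
pre = t @ v[None, :]             # rank-1 activations: PR near 1
post = np.maximum(pre, 0.0)      # ReLU separates the +v and -v halves

pr_pre = participation_ratio(cov_eigvals(pre))
pr_post = participation_ratio(cov_eigvals(post))
```

Here `pr_pre` sits at essentially 1 while `pr_post` rises above it, a small-scale instance of the variance reinjection the metrics detect in trained FFNs.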

Different normalization methods (weight, spectral, and hyperspherical) applied to GPT-2’s feedforward networks induce unique internal dynamics, as evidenced by variations in Jensen-Shannon divergence (JS), latent capacity (<span class="katex-eq" data-katex-display="false">PR_{post}</span>), and spectral regularization (<span class="katex-eq" data-katex-display="false">\Delta\Delta EEE</span> and <span class="katex-eq" data-katex-display="false">EEE_{post}</span>).

Ripple Effects: Architectural Implications and Optimization Strategies

Analysis of Feedforward Networks (FFNs) reveals a surprising characteristic: a highly concentrated eigenspectrum. This means that while FFNs may have numerous parameters and representational dimensions, only a small subset of these dimensions actively contribute to the network’s learned representations. The eigenspectrum, which describes the distribution of eigenvalues, demonstrates a steep drop-off, indicating that most of the representational ‘energy’ is focused within a few dominant dimensions. Consequently, the network isn’t fully utilizing its potential capacity; a significant portion of its parameters are effectively redundant. This finding suggests that strategies to more effectively distribute information across all dimensions, or to prune redundant ones, could lead to substantial improvements in model efficiency and generalization performance.

The observed concentration of eigenvalues within the eigenspectrum of Feedforward Networks (FFNs) implies a potential for significant optimization through techniques designed to promote Spectral Flattening. This phenomenon suggests that much of the network’s representational capacity remains untapped, as information processing is dominated by a limited number of principal components. By encouraging a more uniform distribution of eigenvalues, Spectral Flattening aims to activate a broader range of network dimensions, effectively increasing model capacity without necessarily increasing the number of parameters. Consequently, this approach may lead to improved generalization performance, allowing the network to better adapt to unseen data and avoid overfitting, as the representation becomes less reliant on a few dominant features and more robust to variations in the input.
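The article does not specify a mechanism for inducing Spectral Flattening; one hypothetical approach, purely illustrative and not from the paper, is an auxiliary penalty equal to the negative spectral entropy of the activation covariance, so that minimizing the total loss pushes variance to spread more evenly:

```python
import numpy as np

def flatness_penalty(acts, eps=1e-12):
    """Hypothetical Spectral Flattening regularizer: the negative
    spectral entropy of the activation covariance. Lower values
    correspond to flatter eigenspectra, so adding this term to a
    training loss would reward a more uniform variance distribution."""
    x = acts - acts.mean(axis=0, keepdims=True)
    ev = np.linalg.eigvalsh(x.T @ x / (len(x) - 1))
    ev = np.clip(ev, eps, None)
    p = ev / ev.sum()
    return np.sum(p * np.log(p))  # equals minus the entropy
```

In a framework with automatic differentiation, the same expression written on differentiable tensors would backpropagate through the eigendecomposition; the NumPy version here only illustrates the quantity being penalized.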

The architecture of a feedforward network’s normalization layers significantly influences the distribution of its eigenvalues, thereby impacting both training stability and representational capacity. Research indicates that employing normalization techniques like RMSNorm and LayerNorm actively reshapes the network’s eigenspectrum, preventing extreme values and promoting more efficient information flow. Notably, Adafactor consistently achieves the highest post-activation Participation Ratio (PR), suggesting superior preservation of effective dimensionality after activation, while Muon demonstrates the highest pre-activation PR, indicating an enhanced ability to capture relevant features before activation. These findings highlight the critical role of normalization not merely as a training stabilizer, but as a key component in optimizing the network’s inherent representational power and ultimately, its ability to generalize to unseen data.
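The two normalizers differ in one operation: LayerNorm centers each token's features before scaling, while RMSNorm only rescales. A minimal sketch with the standard definitions (learnable gain and bias parameters omitted for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm: subtract the per-token mean, divide by the
    per-token standard deviation."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    """RMSNorm: divide by the per-token root-mean-square,
    with no centering step."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms
```

The absent centering step is what makes RMSNorm cheaper, and it is the kind of architectural difference whose spectral consequences the MLP-Mixer comparison below probes.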

Training MLP-Mixer models on CIFAR-100 reveals that while replacing LayerNorm with RMSNorm maintains stable FFN eigenspectra, LayerNorm ultimately achieves higher effective dimensionality and a flatter spectrum in later training phases.

Towards Variance-Aware Design: Charting a Course for Future Research

Researchers are increasingly focused on designing neural network architectures capable of actively managing their internal variance during the learning process. This involves explicitly controlling the eigenspectrum – the distribution of eigenvalues – of the network’s weight matrices, with the goal of improving both training stability and generalization performance. By manipulating these eigenvalues, architects aim to prevent undesirable behaviors such as exploding or vanishing gradients, and to encourage a more balanced representation of information throughout the network. Future designs will likely incorporate mechanisms for regularizing or directly shaping the eigenspectrum, potentially through novel layer constructions or adaptive learning rate schemes that respond to changes in the network’s variance profile, ultimately leading to more robust and efficient deep learning models.

Current neural network training often overlooks the critical relationship between optimization algorithms and the eigenspectral properties of weight matrices. Research suggests a strong connection: the distribution of singular values – the eigenspectrum – profoundly influences the training dynamics and generalization ability of a network. Investigating how different optimization methods – such as stochastic gradient descent or Adam – interact with and shape this eigenspectrum could unlock significantly more efficient training procedures. Specifically, tailoring optimization algorithms to explicitly control or stabilize the eigenspectrum – perhaps by encouraging a more isotropic distribution of singular values – may mitigate common issues like vanishing or exploding gradients and accelerate convergence. This approach moves beyond simply minimizing loss and towards actively sculpting the internal representation learned by the network, potentially leading to models that are both faster to train and more robust in deployment.

Extending the application of NerVE beyond current architectures represents a crucial step towards establishing universally applicable principles in neural network design. By systematically analyzing a diverse set of models – encompassing transformers, convolutional networks, and recurrent systems – researchers aim to identify consistent relationships between architectural choices, eigenspectral properties, and generalization performance. This broader investigation isn’t simply about applying a tool to new models; it’s about uncovering fundamental characteristics of robust neural representations. The goal is to move beyond ad-hoc design principles and towards a theoretically grounded understanding of how variance control can be proactively integrated into network construction, ultimately leading to more efficient, reliable, and adaptable artificial intelligence systems.

Training GPT-2 models with GELU or ReLU nonlinearities on the CodeParrot dataset reveals that these functions regulate information flow by reshaping the eigenspectrum and inducing layerwise distributional shifts, as measured by eigen-metrics including spectral entropy (SE), participation ratio (PR), eigenvalue early enrichment (EEE), and Jensen-Shannon divergence (JS).

The exploration within this paper, detailing NerVE and its analysis of feed-forward networks, echoes a sentiment akin to Marvin Minsky’s assertion: “The more we learn about intelligence, the more we realize how much of it is simply good bookkeeping.” NerVE, in essence, performs precisely this bookkeeping – meticulously tracing the transformations of variance through the eigenspectrum. By dissecting how these networks reshape information, the framework reveals the underlying ‘accounting’ of representational capacity. It’s not merely about what the network learns, but how it manages and manipulates the data’s inherent structure – a process of elegantly tracking and transforming information, much like a sophisticated system of ledgers.

What Breaks Next?

The NerVE framework, by dissecting the eigenspectrum dynamics within LLM feed-forward networks, doesn’t so much solve the mystery of representational capacity as it meticulously maps the fault lines. One wonders, naturally, what happens when those lines give way. Current work assumes a certain stability in these eigenspectra during optimization – but what if intentionally destabilizing them yields more robust, less brittle models? The framework allows for a systematic exploration of variance reshaping, yet largely treats LayerNorm as a fixed constraint. A true test would involve dynamically altering normalization parameters during training and observing the resulting cascade of spectral shifts.

Furthermore, NerVE rightly highlights the importance of spectral properties, but remains largely confined to the feed-forward block. The assumption that recurrent or attention mechanisms operate on similarly well-behaved eigenspectra deserves rigorous challenge. It’s a comfortable simplification, but potentially a limiting one. The real leverage likely lies in understanding how these different network components interact at the spectral level – where information is amplified, suppressed, or fundamentally altered.

Ultimately, this work isn’t about finding the ‘correct’ eigenspectrum, but acknowledging that the current ones are simply a local minimum in a vastly more complex landscape. The next step isn’t refinement, it’s controlled demolition – systematically perturbing these networks to expose their hidden vulnerabilities and, perhaps, stumble upon configurations that truly defy expectation.


Original article: https://arxiv.org/pdf/2603.06922.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
