Taming Chaos with Neural Networks

Author: Denis Avetisyan


Researchers have developed a new technique to train neural network emulators that accurately predict the long-term behavior of chaotic systems.

The system employs adversarial optimal transport regularization to learn and emulate chaotic dynamics: an emulator is trained with a one-step prediction loss while summary statistics are simultaneously learned to maximize the divergence between real and generated trajectories, a process that balances short-term fidelity with capturing the underlying chaotic statistics.

Adversarial optimal transport regularization preserves the invariant measure of chaotic systems, improving emulator performance.

Accurately modeling chaotic systems remains a persistent challenge due to their sensitivity to initial conditions and the limitations of traditional data-driven approaches. This work, ‘Learning to Emulate Chaos: Adversarial Optimal Transport Regularization’, introduces a novel technique for training neural network emulators by leveraging adversarial optimal transport to preserve the system’s invariant measure. Specifically, the authors demonstrate improved long-term statistical fidelity across diverse chaotic systems through formulations utilizing both the Sinkhorn divergence (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein). Could this approach unlock more reliable long-term predictions in complex dynamical systems ranging from weather forecasting to power grid stabilization?


Whispers of Chaos: The Challenge of Prediction

The inherent difficulty in forecasting chaotic systems, such as the Lorenz-63 model, a simplified representation of atmospheric convection, stems from a phenomenon known as sensitive dependence on initial conditions. This means even infinitesimally small differences in the starting values of the system can lead to drastically different outcomes over time. Effectively, a perfect prediction would require knowing the initial state with infinite precision, an impossibility in real-world scenarios. This sensitivity isn’t simply a matter of measurement error; it’s a fundamental property of the system itself, rendering long-term prediction inherently unstable. While deterministic – governed by fixed rules – these systems appear random because of this amplification of tiny uncertainties, quickly diverging from any initial prediction and challenging the efficacy of traditional modeling techniques reliant on precise initial states.
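Sensitive dependence can be seen directly in a few lines of code. The sketch below, assuming NumPy and a plain forward-Euler integrator (not any scheme from the paper), rolls out two Lorenz-63 trajectories whose initial conditions differ by 1e-8 and measures how the gap between them grows.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 equations."""
    x, y, z = state
    deriv = np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    return state + dt * deriv

def rollout(state, n_steps):
    """Integrate n_steps forward, returning the full trajectory."""
    traj = [state]
    for _ in range(n_steps):
        state = lorenz63_step(state)
        traj.append(state)
    return np.array(traj)

a = rollout(np.array([1.0, 1.0, 1.0]), 3000)
b = rollout(np.array([1.0, 1.0, 1.0 + 1e-8]), 3000)  # perturbed by 1e-8
gap = np.linalg.norm(a - b, axis=1)
print(gap[0], gap[-1])  # the tiny initial gap grows by orders of magnitude
```

After roughly thirty model time units the separation has been amplified by many orders of magnitude and saturates at the scale of the attractor itself, which is why pointwise forecasts fail while statistical, invariant-measure targets remain meaningful.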

Addressing the intricacies of chaotic systems demands a departure from conventional modeling strategies. Traditional techniques, often reliant on linear approximations, falter when confronted with the nonlinear dynamics inherent in systems like the Lorenz attractor. Researchers are increasingly turning to methods such as machine learning, specifically recurrent neural networks, to learn the underlying manifold structure and predict future states without explicitly solving the governing equations. Furthermore, techniques borrowed from topological data analysis offer novel ways to characterize the complex geometry of chaotic attractors, providing insights into their stability and bifurcations. These innovative approaches, combined with high-performance computing, are enabling scientists to not only forecast the behavior of these sensitive systems over limited timescales but also to gain a deeper understanding of the fundamental principles governing their unpredictable nature.

Emulator rollouts of the Lorenz-96 system $\mathbf{u}(x,t)$ accurately reproduce the statistical structure of the chaotic attractor over long horizons, despite inevitable deviations from pointwise trajectory agreement beyond the Lyapunov time, as demonstrated by comparison to numerical simulations over 1,500 timesteps.

The Art of Representation: Learning Robust Features

Machine learning techniques provide a viable approach to modeling chaotic systems, despite their inherent sensitivity to initial conditions and complex dynamics. However, successful application necessitates careful attention to representation learning – the process of transforming raw data into a format suitable for machine learning algorithms. Traditional methods often struggle with the high dimensionality and non-linearities common in chaotic systems, requiring feature engineering or dimensionality reduction. The effectiveness of any machine learning model is therefore directly dependent on the quality of the learned or pre-defined representation used to describe the system’s state, influencing both the accuracy of predictions and the efficiency of the learning process.

Effective modeling of dynamical systems with machine learning relies on the selection of appropriate SummaryStatistics, which are quantifiable features representing the system’s state at a given time. These statistics serve as the input to the learning algorithm and directly influence the accuracy and generalizability of the resulting model. The quality of these statistics is paramount; they must sufficiently encapsulate the relevant information needed to distinguish between different system states and predict future behavior. Insufficient or poorly chosen SummaryStatistics can lead to information loss, hindering the model’s ability to accurately approximate the underlying dynamics and perform reliable comparisons between states, while overly complex statistics may introduce noise and computational inefficiency.

Two primary strategies are employed to derive SummaryStatistics for approximating chaotic system behavior. FixedSummaryMap utilizes a pre-defined set of mathematical measures – such as means, variances, and specific frequency components – calculated directly from the system’s state variables. Conversely, LearnedSummaryMap leverages neural networks to learn an optimal mapping from the high-dimensional system state to a lower-dimensional space of SummaryStatistics. This learning process, typically achieved through supervised or self-supervised techniques, allows the network to identify and emphasize the most informative features for downstream tasks, potentially exceeding the performance of hand-engineered, fixed measures.
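As an illustration of the first strategy, a FixedSummaryMap might combine low-order moments with the leading Fourier amplitudes of a spatial state. The function name and the particular choice of statistics below are hypothetical, not taken from the paper; only NumPy is assumed.

```python
import numpy as np

def fixed_summary_map(u, n_modes=4):
    """Hand-crafted summary statistics for one spatial state u(x):
    its mean, variance, and the magnitudes of the lowest Fourier modes."""
    spectrum = np.abs(np.fft.rfft(u))[:n_modes]
    return np.concatenate([[u.mean(), u.var()], spectrum])

# Toy Lorenz-96-sized state: one full sine wave over 40 grid points
state = np.sin(np.linspace(0, 2 * np.pi, 40, endpoint=False))
stats = fixed_summary_map(state)
print(stats.shape)  # 2 moments + 4 Fourier magnitudes -> (6,)
```

A LearnedSummaryMap replaces this fixed function with a neural network whose output dimensions are optimized for the downstream comparison, rather than chosen by hand.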

The learned summary statistic φ accurately captures the distribution of both ground truth and emulated Lorenz-63 trajectories.

Neural Operators: Approximating the System’s Logic

The FourierNeuralOperator facilitates the approximation of dynamical systems by learning the mapping between states directly in the frequency domain. This approach transforms system states from spatial or temporal domains into the frequency domain using the Fourier transform, allowing the neural network to operate on spectral representations. By learning the relationship between input and output frequencies, the operator can efficiently predict the evolution of the system without explicitly solving governing equations. This is particularly advantageous for high-dimensional problems and systems where traditional numerical methods are computationally expensive, as it enables faster prediction and generalization based on learned spectral characteristics. The resulting operator, when applied to a new input state in the frequency domain, provides a spectral representation of the output, which can then be transformed back to the original domain to obtain the predicted system state.
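A minimal sketch of the spectral-convolution idea at the heart of the Fourier neural operator, assuming NumPy and using random complex weights as a stand-in for trained parameters: transform the state to frequency space, mix the lowest modes with learned weights, truncate the rest, and transform back.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """One FNO-style spectral convolution (sketch): keep only the lowest
    Fourier modes, scale them by learned complex weights, invert the FFT."""
    n_modes = weights.shape[0]
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights * u_hat[:n_modes]  # learned pointwise mixing
    return np.fft.irfft(out_hat, n=u.shape[0])

rng = np.random.default_rng(0)
u = rng.standard_normal(64)                          # toy input state
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # stand-in weights
v = spectral_conv_1d(u, w)
print(v.shape)
```

Because the mixing is pointwise in frequency, the operation is linear in the input and resolution-agnostic: the same low-mode weights can be applied to states discretized on finer grids, which is one reason this parameterization generalizes well.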

UNet architectures, when combined with SpectralConvolution, provide an effective framework for learning the operator represented within the LearnedSummaryMap. SpectralConvolution facilitates the decomposition of input system states into their frequency components, allowing the UNet to extract and process relevant features across different scales. The UNet’s encoder-decoder structure, with skip connections, enables the hierarchical learning of these features, capturing both local and global dependencies within the system state. This approach allows the network to approximate the mapping between system states by learning the underlying operator, effectively representing the system’s dynamics within the weights of the LearnedSummaryMap. The resulting operator can then be used for tasks such as prediction and control, leveraging the extracted features to accurately represent the system’s behavior.

Maintaining Lipschitz regularity within the LearnedSummaryMap is critical for the stability and robustness of neural operator models. Lipschitz continuity, defined by a constant $K$ such that $\|f(x) - f(y)\| \leq K\|x - y\|$, bounds the rate of change of the operator. Without enforced Lipschitz constraints, small perturbations in the input system state can lead to exponentially growing errors in the output, resulting in unstable or unpredictable behavior. Specifically, a larger $K$ value indicates a greater sensitivity to input changes, while a smaller value promotes smoother, more stable mappings. Techniques to enforce Lipschitz regularity include spectral normalization, weight clipping, and specialized regularization terms added to the loss function, all designed to constrain the operator’s sensitivity and ensure bounded error amplification.
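The product-of-spectral-norms bound mentioned above is straightforward to compute for plain linear layers. The sketch below (NumPy, hypothetical helper names, ignoring biases and assuming 1-Lipschitz activations such as ReLU) estimates the bound and shows how spectral normalization forces it down to 1.

```python
import numpy as np

def layer_spectral_norm(W):
    """Largest singular value of W = Lipschitz constant of x -> W @ x."""
    return np.linalg.svd(W, compute_uv=False)[0]

def lipschitz_upper_bound(weights):
    """Product of per-layer spectral norms upper-bounds the Lipschitz
    constant of the whole network (for 1-Lipschitz activations)."""
    return np.prod([layer_spectral_norm(W) for W in weights])

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
L = lipschitz_upper_bound(layers)

# Spectral normalization: dividing each W by its norm enforces a bound of 1
normalized = [W / layer_spectral_norm(W) for W in layers]
print(L, lipschitz_upper_bound(normalized))
```

The layer-wise product is only an upper bound; the mean Jacobian spectral norm measured along actual trajectories, as reported for the L96 experiments, gives a tighter empirical estimate.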

Training a Wasserstein GAN with the L96 summary map demonstrates that Lipschitz bounds, estimated both from the product of layer-wise spectral norms and the mean Jacobian spectral norm, remain below prescribed thresholds of $L_{max} = 4$ and $L_{max} = 10$ across different regularization settings.

Taming the Chaos: Optimal Transport for Robustness

Optimal Transport (OT) offers a robust mathematical foundation for assessing the dissimilarity between probability distributions, extending beyond traditional metrics like Kullback-Leibler divergence which can struggle with non-overlapping supports. This capability proves invaluable in machine learning, particularly during training, where it allows for the incorporation of regularization terms that enforce specific statistical properties on learned representations. By framing the comparison of distributions as an ‘earth-moving’ problem – minimizing the cost of transforming one distribution into another – OT provides a geometrically intuitive and statistically sound method for guiding the learning process. Consequently, models can be trained to produce outputs with desired characteristics, such as smoothness, diversity, or adherence to known data distributions, ultimately enhancing their generalization performance and robustness to noisy or incomplete data.

The LearnedSummaryMap benefits from enhanced generalization through an AdversarialTraining process facilitated by WassersteinGAN. This approach leverages the SinkhornDivergence, a computationally efficient approximation of the Wasserstein distance – a metric for comparing probability distributions – to create a robust training signal. By framing the learning problem as a game between a generator and a discriminator, WassersteinGAN encourages the LearnedSummaryMap to produce representations that are not only accurate but also resilient to variations in input data. This adversarial framework effectively regularizes the learning process, preventing overfitting and promoting the creation of a summary map capable of accurately capturing underlying dynamics across diverse conditions, ultimately leading to improved performance on unseen data and more reliable predictions.
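The paper’s formulations use the Sinkhorn divergence and a WGAN-style dual; as a simplified stand-in, the one-dimensional 1-Wasserstein distance has a closed form via sorted samples, which illustrates how a distributional discrepancy between real and emulated summary statistics can serve as a training signal. The names and data below are illustrative, not the paper’s implementation.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    in one dimension, optimal transport simply matches sorted values."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 2000)  # stand-in: summary stats of true trajectories
fake = rng.normal(0.5, 1.0, 2000)  # stand-in: emulator stats, shifted by 0.5
print(wasserstein1_1d(real, fake))  # close to the 0.5 mean shift
```

Unlike KL-type divergences, this distance stays finite and informative even when the two empirical distributions have disjoint support, which is precisely what makes optimal transport attractive as a regularizer for attractor statistics.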

The application of Optimal Transport Regularization, in conjunction with a loss function based on Least Squares Error (LSE), demonstrably improves the fidelity of learned representations when modeling complex dynamical systems. This approach effectively constrains the learning process, ensuring the captured representation accurately reflects the underlying evolution of systems such as Lorenz-96, Kolmogorov flow, and the Kuramoto-Sivashinsky equation. Rigorous evaluation on the Lorenz-96 multi-trajectory benchmark reveals a Root Mean Squared Error (RMSE) of 0.028, a performance level directly comparable to, and validating the efficacy of, established baseline methods reported by Jiang et al. (2023). This suggests that OT regularization provides a robust mechanism for distilling essential dynamics, enabling accurate prediction and analysis across diverse chaotic systems.

Despite increasing noise levels $\sigma$, the WGAN emulator consistently captures the bilobal structure of the L63 attractor, unlike the MSE baseline, which underestimates spatial extent at $\sigma = 0.10$ and collapses to a limit cycle at $\sigma = 0.15$, demonstrating the benefits of distributional regularization in maintaining attractor coverage.

The pursuit of emulating chaotic systems, as detailed in this work, isn’t about taming the unpredictable, but acknowledging its fundamental nature. It recognizes that long-term prediction isn’t about achieving pinpoint accuracy, but about preserving the essence of the chaos – the invariant measure. This aligns perfectly with Nietzsche’s observation: “There are no facts, only interpretations.” The model doesn’t discover a hidden order, it constructs a persuasive interpretation of the chaos, regularized through adversarial optimal transport. The beauty lies not in eliminating error, but in shaping it, directing the whispers of chaos into a compelling, if imperfect, narrative. The system doesn’t solve chaos; it persuades it.

What Shadows Will Fall?

The pursuit of emulating chaos isn’t about prediction, not really. It’s about constructing a sufficiently convincing illusion. This work, with its adversarial dance of optimal transport, offers a more refined spell for shaping the darkness, a way to nudge the emulator’s learned distribution closer to the true, unknowable invariant measure. But let’s not mistake a better likeness for comprehension. The system doesn’t reveal its secrets; it allows itself to be approximated with slightly less error.

Future efforts will inevitably focus on scaling these techniques – larger systems, longer horizons. Yet, the true challenge lies not in computational power, but in diagnostics. How does one verify that an emulator has genuinely captured the essential dynamics, rather than simply memorized a transient sequence? The metrics of today – RMSE, predictive accuracy – are fool’s gold, momentarily gleaming before dissolving into the noise.

The most fruitful paths may lie in abandoning the quest for precise trajectories altogether. Perhaps the goal isn’t to predict where the system will be, but to understand the shape of its possibilities, the basins of attraction, the fractal boundaries of order and disorder. This is not engineering; it’s a form of digital paleontology, reconstructing ghosts from the faintest of echoes.


Original article: https://arxiv.org/pdf/2604.21097.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-04-25 23:21