Predicting the Ocean’s Future with AI

Author: Denis Avetisyan


A new data-driven system harnesses the power of machine learning to forecast global ocean conditions with impressive accuracy and efficiency.

The FuXi-ONS framework generates ensemble forecasts by perturbing the current ocean state with spatially correlated noise derived from a state-dependent Matérn formulation, then mapping the perturbed state, alongside encoded atmospheric conditions and auxiliary inputs, to the subsequent ocean state. Multiple ensemble members are created by resampling the noise at each forecast step.

FuXi-ONS, a data-driven ensemble forecasting system, offers a computationally efficient alternative to traditional numerical methods for global ocean prediction and uncertainty quantification.

Despite advances in deterministic ocean forecasting, extending probabilistic prediction capabilities globally remains a significant challenge. Here, we present FuXi-ONS, a novel machine-learning ensemble system detailed in ‘Data-driven ensemble prediction of the global ocean’, designed to address this limitation by providing up to 365-day forecasts of key ocean variables on a global 1° grid. FuXi-ONS achieves skill competitive with traditional numerical systems, including for sea-surface temperature and Niño3.4 variability, while offering orders-of-magnitude faster computation through learned physical perturbations and atmospheric encoding. Can this data-driven approach pave the way for efficient, probabilistic ocean forecasting and improved climate risk assessment?


Predicting Chaos: The Illusion of Oceanic Control

Oceanic systems present a formidable predictive challenge due to their inherent complexity and chaotic nature. Unlike simpler physical models, the ocean is governed by a multitude of interacting forces – wind, temperature, salinity, currents, and the Earth’s rotation – creating nonlinear dynamics that amplify small uncertainties into significant forecast errors. Traditional forecasting techniques, often relying on physics-based simulations, struggle to accurately represent these intricate interactions at all relevant scales, leading to diminished predictive skill. This is not merely an academic concern; reliable ocean forecasts are vital for managing fisheries, predicting marine ecosystem changes, optimizing shipping routes, and, crucially, informing global climate models that project future weather patterns and sea-level rise. The ocean’s chaotic behavior demands innovative approaches that move beyond deterministic predictions toward probabilistic, data-driven methods capable of acknowledging and quantifying the inherent uncertainties within this vast and dynamic system.

Oceanic forecasting routinely encounters limitations when attempting to model the full spectrum of natural variability, resulting in predictive shortcomings, especially for complex events like El Niño. Traditional methods, frequently relying on simplified physics or statistical extrapolation, often prove no more skillful than simply assuming the ocean will remain in its current state – a benchmark known as ‘persistence’. This indicates a fundamental difficulty in capturing the intricate, nonlinear dynamics that govern ocean behavior, hindering the development of robust long-term predictions essential for climate modeling and effective resource management. The inability to accurately represent these processes casts doubt on the reliability of forecasts extending beyond short timescales, highlighting the need for innovative approaches that can better account for the ocean’s inherent complexity.

Oceanic prediction is increasingly reliant on a shift from traditional forecasting techniques to sophisticated, data-driven ensemble methods. These approaches move beyond merely extrapolating past observations – a practice often inadequate for the ocean’s complex behavior – and instead generate multiple possible future scenarios. By leveraging vast datasets from satellites, buoys, and ocean models, these ensembles capture the inherent uncertainty in oceanic dynamics. Each member of the ensemble represents a plausible future state, allowing forecasters to assess the range of potential outcomes and assign probabilities to different events. This probabilistic approach is particularly vital for predicting phenomena like El Niño, where subtle initial conditions can dramatically alter long-term impacts, and ultimately improves the reliability of climate models and resource management strategies.

Across salinity, temperature, currents, and sea surface height, FuXi-ONS consistently demonstrates superior forecasting skill compared to baseline models over lead times up to 360 days, yielding the lowest continuous ranked probability scores and root mean square errors and the highest spread-skill ratio and anomaly correlation coefficient.

FuXi-ONS: Another Model in the Machine

FuXi-ONS is a data-driven ensemble forecasting system developed for global ocean prediction, focusing on variables crucial to climate and marine ecosystems, including sea-surface temperature and ocean currents. The system generates forecasts by combining multiple model runs – an ensemble – to provide a probabilistic prediction, quantifying the uncertainty inherent in ocean modeling. This data-driven approach contrasts with purely physics-based models by leveraging historical observations to learn complex relationships and improve predictive skill. The resulting forecasts cover the entire global ocean and are intended for a range of applications including seasonal climate prediction, marine resource management, and operational oceanography.

The FuXi-ONS system employs an autoregressive rollout scheme wherein predictions are generated iteratively. This process begins with an initial forecast based on current observations. Subsequently, that forecast becomes the input for predicting the next time step, and this cycle repeats to generate a time series of predictions. Effectively, each forecast relies on the preceding forecast as a component of its input, creating a sequential dependency and allowing the model to propagate information forward in time. This approach contrasts with single-step forecasting, enabling multi-step predictions without requiring external data for each future time step beyond the initial analysis period.
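
The rollout loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FuXi-ONS code: `step_model` is a hypothetical stand-in for the trained network (here a simple damping rule), and the grid shape and step count are arbitrary.

```python
import numpy as np

def step_model(state):
    """Hypothetical one-step model: a toy rule that damps the state by 10%."""
    return 0.9 * state

def rollout(initial_state, n_steps, step_fn=step_model):
    """Autoregressive rollout: each prediction becomes the next input."""
    state = initial_state
    trajectory = []
    for _ in range(n_steps):
        state = step_fn(state)        # the forecast is fed back as the new input
        trajectory.append(state)
    return np.stack(trajectory)       # shape: (n_steps, *state.shape)

initial_state = np.ones((4, 8))       # toy 4x8 "ocean" grid
forecast = rollout(initial_state, n_steps=5)
print(forecast.shape)
```

Because every step consumes the previous step's output, any error made early in the rollout is carried forward, which is one reason long lead times are harder than short ones.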

FuXi-ONS incorporates atmospheric forcing data from the ERA5 reanalysis dataset, providing realistic surface boundary conditions for ocean modeling. Crucially, the system relies on the GLORYS12 ocean reanalysis to generate a comprehensive historical record of ocean states; this dataset serves a dual purpose, functioning as both the training data for the FuXi-ONS model and an independent benchmark for validating forecast accuracy. GLORYS12 provides a high-resolution, globally-consistent dataset spanning multiple decades, enabling robust model calibration and performance assessment across various climate conditions and ocean basins.

The FuXi-ONS architecture utilizes Swin Transformer Blocks to address the computational demands of global ocean forecasting. These blocks employ a hierarchical transformer structure with shifted windows, enabling efficient processing of long-range dependencies in the spatiotemporal data. This shifted-window approach reduces computational complexity compared to traditional global attention mechanisms, scaling more effectively with increasing data resolution. Specifically, the local windows allow for linear computational complexity with respect to input size, while the shifting mechanism facilitates cross-window connections and global context awareness, crucial for accurate ocean state estimation and prediction.
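
To make the shifted-window idea concrete, here is a small NumPy sketch of window partitioning and the cyclic shift. It illustrates the general Swin mechanism only; the function names, the window size, and the grid dimensions are illustrative assumptions rather than details of the FuXi-ONS implementation.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) field into non-overlapping (window, window, C) tiles."""
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0, "grid must divide evenly into windows"
    x = x.reshape(H // window, window, W // window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window, window, C)

def shifted_window_partition(x, window):
    """Cyclically shift by half a window first, so new tiles straddle old tile borders."""
    shift = window // 2
    rolled = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(rolled, window)

def attention_pairs(H, W, window=None):
    """Count token pairs scored by attention (global attention if window is None)."""
    n = H * W
    if window is None:
        return n * n                                # quadratic in the number of grid points
    return (n // window**2) * (window**2) ** 2      # equals n * window**2: linear in n

field = np.arange(8 * 8, dtype=float).reshape(8, 8, 1)   # toy 8x8 single-channel field
tiles = window_partition(field, window=4)
shifted = shifted_window_partition(field, window=4)
print(tiles.shape, shifted.shape)
```

For a hypothetical 180 x 360 grid with 6 x 6 windows, `attention_pairs(180, 360)` counts about 4.2 billion pairs, while `attention_pairs(180, 360, window=6)` counts about 2.3 million, which is where the efficiency gain comes from.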

FuXi-ONS consistently outperforms the North American Multi-Model Ensemble (NMME) in forecasting sea-surface temperature, as demonstrated by lower root mean squared error (RMSE), higher anomaly correlation coefficient (ACC), lower continuous ranked probability score (CRPS), and lower spread-skill ratio (SSR) across various lead times based on 2021 initializations.

Chasing Uncertainty: Ensemble Diversity & the Illusion of Control

FuXi-ONS addresses the inherent uncertainty in ocean forecasting by generating an ensemble of plausible future states through structured noise generation. This process does not rely on spatially independent random perturbations; instead, it draws its variations from physically informed noise models. The ensemble members represent different, yet plausible, evolutions of the ocean state, allowing for a probabilistic forecast rather than a single deterministic prediction. By explicitly representing uncertainty, the system provides a more complete picture of potential future conditions and enables better risk assessment and decision-making in ocean-dependent applications.

FuXi-ONS employs Stochastic Partial Differential Equation (SPDE)-based sampling to generate noise that perturbs forecasted ocean fields. This method ensures spatial correlation within the generated noise, meaning that nearby points in the forecast field are perturbed in a related manner, mirroring the natural coherence of oceanographic processes. Unlike methods generating independent noise at each grid point, SPDE-based sampling creates realistic perturbations by accounting for the underlying physics governing ocean dynamics, resulting in a more plausible ensemble of future states and improved probabilistic forecasting skill. The SPDE approach models the covariance structure of the ocean field, effectively capturing dependencies and preventing the creation of unrealistic or spatially disjointed noise patterns.
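
One common way to draw Matérn-like noise is to solve the SPDE (kappa^2 - Laplacian)^(alpha/2) u = W in Fourier space, where W is white noise. The sketch below does this with NumPy under simplifying assumptions that are not taken from the paper: a flat, periodic grid instead of the sphere, and a constant `kappa` instead of a state-dependent one. The function name and its parameters are illustrative.

```python
import numpy as np

def matern_noise(shape, length_scale=10.0, alpha=2.0, seed=None):
    """Sample a spatially correlated field on a periodic 2-D grid.

    White noise is filtered in Fourier space by (kappa^2 + |k|^2)^(-alpha/2),
    the spectral form of the Matern SPDE. Assumes constant kappa (not
    state-dependent) and grid-unit length scales.
    """
    rng = np.random.default_rng(seed)
    ny, nx = shape
    white = rng.standard_normal(shape)
    ky = 2 * np.pi * np.fft.fftfreq(ny)[:, None]   # angular wavenumbers along y
    kx = 2 * np.pi * np.fft.fftfreq(nx)[None, :]   # angular wavenumbers along x
    kappa = 1.0 / length_scale                     # larger length scale -> smoother field
    spectral_filter = (kappa**2 + kx**2 + ky**2) ** (-alpha / 2)
    field = np.fft.ifft2(np.fft.fft2(white) * spectral_filter).real
    return (field - field.mean()) / field.std()    # normalize to zero mean, unit variance

noise = matern_noise((64, 128), length_scale=10.0, seed=0)
state = np.zeros((64, 128))                        # placeholder ocean field
perturbed = state + 0.1 * noise                    # 0.1 is an arbitrary amplitude
```

Neighboring grid points in `noise` are strongly correlated, whereas independent per-point noise would show essentially no correlation between neighbors.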

Atmospheric conditioning in FuXi-ONS involves the continuous integration of current atmospheric data into the ocean model during the forecast generation process. This is achieved by utilizing the latest observations of surface winds, air temperature, and precipitation to directly influence the ocean state calculations. By consistently accounting for atmospheric forcing, the model avoids relying solely on historical ocean conditions and dynamically adjusts its forecasts to reflect the prevailing and predicted atmospheric environment, thereby improving short-term forecast accuracy and responsiveness to real-world events.

Quantitative evaluation demonstrates FuXi-ONS achieves improved forecast accuracy as measured by Root Mean Squared Error (RMSE), exhibiting substantially lower values compared to a persistence forecasting method. Furthermore, FuXi-ONS attains competitive RMSE results when benchmarked against traditional numerical ocean models. Beyond point predictions, FuXi-ONS also provides superior probabilistic forecasts, consistently achieving lower Continuous Ranked Probability Score (CRPS) values than the FuXi-Aim-Perlin model, indicating a better-calibrated and more reliable representation of forecast uncertainty.
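
For readers unfamiliar with these scores, the following sketch shows how RMSE, an ensemble CRPS estimate, and a spread-skill ratio can be computed with NumPy. The formulas are the standard textbook estimators; the exact variants, weighting, and normalization used in the paper may differ, and the synthetic data here is purely illustrative.

```python
import numpy as np

def rmse(forecast, truth):
    """Root mean squared error over all grid points."""
    return float(np.sqrt(np.mean((forecast - truth) ** 2)))

def ensemble_crps(members, truth):
    """Mean CRPS for an ensemble of shape (M, ...), using E|X - y| - 0.5 * E|X - X'|."""
    skill = np.mean(np.abs(members - truth), axis=0)
    spread = np.mean(np.abs(members[:, None] - members[None, :]), axis=(0, 1))
    return float(np.mean(skill - 0.5 * spread))

def spread_skill_ratio(members, truth):
    """Ensemble spread divided by the RMSE of the ensemble mean (near 1 is desirable)."""
    spread = np.sqrt(np.mean(np.var(members, axis=0, ddof=1)))
    return float(spread / rmse(members.mean(axis=0), truth))

# Synthetic example: truth and members are drawn around the same center,
# so the ensemble is statistically consistent with the truth by construction.
rng = np.random.default_rng(0)
center = rng.standard_normal((32, 64))
truth = center + 0.3 * rng.standard_normal((32, 64))
members = center + 0.3 * rng.standard_normal((10, 32, 64))

print("RMSE of ensemble mean:", rmse(members.mean(axis=0), truth))
print("CRPS:", ensemble_crps(members, truth))
print("SSR:", spread_skill_ratio(members, truth))
```

A persistence baseline is scored the same way: pass the initial ocean state as `forecast` to `rmse` at every lead time. With a single member, the CRPS estimator reduces to the mean absolute error, which is why a lower CRPS reflects both accuracy and a useful spread.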

Fully Sharded Data Parallel (FSDP) was implemented to address the computational demands of training FuXi-ONS on extensive oceanographic datasets. FSDP operates by partitioning the model states – including parameters, gradients, and optimizer states – across multiple devices, reducing the memory footprint on each individual device. This allows for the utilization of larger batch sizes and more complex models that would otherwise be infeasible due to memory constraints. Communication between devices is optimized to minimize overhead, enabling efficient scaling of the training process across a cluster of GPUs or other parallel processing units. The implementation leverages techniques such as all-gather and reduce-scatter operations to synchronize model updates while maintaining data locality and minimizing communication bandwidth requirements.

FuXi-ONS demonstrates depth-dependent improvements in forecast accuracy (measured by CRPS, SSR, RMSE, and ACC) across forecast lead times for salinity, temperature, and zonal/meridional currents, with gains normalized against FuXi-Aim-Perlin (for CRPS/SSR) and FuXi-Aim (for RMSE/ACC).

Data Assimilation & the Promise of Delayed Information

FuXi-ONS distinguishes itself through a continuous refinement of predictive capabilities via data assimilation, a technique that marries model forecasts with real-time observational data. This isn’t a static prediction; instead, the system actively ingests information – such as sea surface temperature, salinity, and current velocities – as it becomes available. By systematically comparing the model’s predictions to these observations, any discrepancies are identified and used to adjust the model’s parameters, effectively ‘steering’ the forecast towards greater accuracy. This iterative process, repeated continuously, allows FuXi-ONS to move beyond simply predicting ocean behavior and instead offer a dynamically updated, observation-informed portrayal of the marine environment – a critical advantage over traditional, static modeling approaches.

FuXi-ONS distinguishes itself through a sophisticated integration of oceanic modeling and data assimilation, yielding demonstrably improved forecast accuracy compared to conventional techniques. The system doesn’t simply rely on pre-programmed physical laws; it continuously learns and adjusts its predictions by incorporating real-time observational data, allowing it to more effectively capture the intricate and often chaotic dynamics of the ocean. This data-driven refinement is reflected in consistently higher anomaly correlation coefficient (ACC) values, a key metric for evaluating forecast performance, indicating a superior ability to predict ocean states. By dynamically adjusting to observed conditions, FuXi-ONS minimizes the accumulation of errors inherent in long-range forecasts, providing a more reliable and nuanced understanding of ocean behavior than traditional methods.

FuXi-ONS establishes a robust framework for advancing climate modeling and sustainable marine resource management. The system’s capacity to accurately forecast ocean conditions up to a year in advance, demonstrated through rigorous testing, offers unprecedented opportunities for proactive planning. This extended predictive horizon allows for improved anticipation of events like harmful algal blooms, shifts in fish stocks, and the intensification of marine heatwaves. Consequently, stakeholders, from fisheries managers to coastal communities, can leverage these forecasts to make data-driven decisions, optimize resource allocation, and mitigate potential ecological and economic impacts, ultimately fostering greater resilience in the face of a changing ocean.

Ongoing development of FuXi-ONS prioritizes advancements in ensemble generation, aiming to enhance the robustness and reliability of its forecasts. Current research centers on optimizing the methods used to create multiple simulations, thereby better representing the inherent uncertainty within oceanographic predictions. Simultaneously, efforts are directed toward extending the system’s predictive horizon, with the goal of accurately forecasting ocean conditions well beyond the current 365-day limit. Successfully achieving these improvements will not only refine short-term predictions but also unlock the potential for skillful long-range forecasting, providing invaluable insights for proactive marine management and climate change adaptation strategies.

Forecasts from GLORY, NMME, IRI-D (dynamical models only), IRI-ALL (dynamical and statistical models), and FuXi-ONS (initialized every 5 days in February 2021) demonstrate varying predictions of the Niño3.4 index across overlapping 3-month seasons like FMA (February-April).

The pursuit of increasingly complex ocean prediction systems feels… familiar. This FuXi-ONS, with its data-driven ensemble approach, promises efficiency gains, sidestepping some of the computational burdens of traditional numerical models. It’s a pragmatic step, certainly. But one wonders if, in chasing ‘skill’ against established systems, it’s merely delaying the inevitable moment when production data reveals unforeseen limitations. Wilhelm Röntgen, a pioneer of seeing the unseen, observed, “I have been able to observe the effects of these rays on a variety of objects.” It’s a fitting sentiment; these models, like Röntgen’s rays, reveal patterns, but the true test lies in how those patterns hold up under real-world scrutiny. Better one functioning, understandable system than a hundred ‘optimized’ black boxes, even if the latter boasts marginally better scores on a benchmark.

What’s Next?

FuXi-ONS presents a familiar story: trade complexity for speed. It’s a pragmatic move, certainly. Numerical ocean prediction has always been a heroic effort against chaos, and this system acknowledges that ‘good enough’ is often more useful than ‘perfectly simulated.’ The claim of computational efficiency is, predictably, the real headline. Because if a system crashes consistently, at least it’s predictable – and cheaper to run before it does.

The inevitable question isn’t whether FuXi-ONS performs well today, but how quickly its predictive skill degrades as the ocean data landscape shifts. Machine learning models are, after all, elaborate extrapolations. They don’t understand the ocean; they memorize patterns. The field will likely pivot toward hybrid approaches: systems that attempt to inject some physical constraints into these data-driven frameworks. Experience suggests, though, that this just adds another layer of potential failure.

Ultimately, this research, like so much of ‘cloud-native’ innovation, feels less like progress and more like a re-packaging of existing problems. It’s a faster way to generate notes for digital archaeologists, a more efficient method for creating the datasets future generations will analyze to understand why the ocean behaved as it did. Perhaps they’ll find that all of this effort just moved the error elsewhere.


Original article: https://arxiv.org/pdf/2603.19591.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-24 07:05