Forging Financial Futures: A New Approach to Synthetic Time Series

Author: Denis Avetisyan

Researchers have developed a novel generative model that combines graph neural networks, recurrent networks, and path signatures to create more realistic synthetic financial data.

A generative adversarial network architecture is proposed, establishing a framework in which two neural networks, a generator $G$ and a discriminator $D$, compete to refine the generation of synthetic data; training seeks a Nash equilibrium of the minimax game $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$.
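To make the objective concrete, here is a minimal sketch that estimates $V(D, G)$ on a minibatch from discriminator probabilities. It illustrates the generic GAN value function only, not Sig-Graph GAN's training loop; the function name `gan_value` and its clipping constant are hypothetical choices.

```python
import numpy as np


def gan_value(d_real, d_fake, eps=1e-12):
    """Minibatch (Monte Carlo) estimate of the GAN value function V(D, G).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    # Clip away from 0 and 1 so the logarithms stay finite.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))


# When p_g = p_data, the optimal discriminator outputs 1/2 everywhere,
# so the value is log(1/2) + log(1/2) = -log 4.
print(gan_value(np.full(8, 0.5), np.full(8, 0.5)), -np.log(4.0))

# A discriminator that separates real from fake well pushes V above -log 4;
# the generator's job is to drive it back down.
print(gan_value([0.9, 0.8], [0.1, 0.2]))
```

The discriminator ascends this quantity and the generator descends it. In practice generators are often trained with the non-saturating loss $-\log D(G(z))$ instead, because $\log(1 - D(G(z)))$ gives weak gradients early in training; which variant Sig-Graph GAN uses is not stated here.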

The Sig-Graph GAN combines Graph Neural Networks, LSTM networks, and signature calculus to capture complex geometric patterns and temporal dependencies in time series data.

Generating realistic synthetic financial time series remains challenging due to the non-stationary and complex dependencies inherent in market data. The paper ‘A Generative Adversarial Graph Neural Network for Synthetic Time Series Data’ introduces Sig-Graph GAN, a Generative Adversarial Network (GAN) that integrates Long Short-Term Memory (LSTM) networks, time-series signatures, and Graph Neural Networks to capture both temporal dynamics and geometric patterns. By representing time-series data as visibility graphs, Sig-Graph GAN outperforms baseline methods in replicating the statistical properties of financial returns across multiple exchanges. Could this approach unlock new avenues for stress-testing, algorithmic trading strategy development, and data augmentation in financial modeling?

The Inherent Structure of Financial Time Series

Financial time series form the bedrock of modern financial analysis, meticulously documenting the progression of asset values – from stock prices and bond yields to currency exchange rates and commodity prices – over specific intervals. These series aren’t merely historical records; they are dynamic representations of collective investor sentiment, economic forces, and unforeseen events, offering a quantifiable lens through which to examine market behavior. The granular data captured within these time series enables professionals to build predictive models, assess risk, and formulate investment strategies. Beyond pricing data, financial time series also encompass trading volumes, interest rates, and macroeconomic indicators, creating a complex and interconnected web of information crucial for understanding the functioning of global financial markets.

Volatility clustering is a pervasive characteristic of financial time series, describing the tendency of large price changes to cluster together in time: large changes tend to be followed by large changes of either sign, and small changes by small changes. Calm periods therefore tend to persist, and so do turbulent ones. This isn’t random noise; statistical analysis shows that while raw returns are nearly uncorrelated from one period to the next, their magnitudes (absolute or squared returns) are positively autocorrelated, so current volatility is correlated with past volatility. For example, observing a significant price swing today suggests a higher probability of another substantial move tomorrow, even if the direction is unpredictable. This phenomenon challenges the assumption of constant variance inherent in many traditional financial models, and understanding it is critical for accurate risk assessment and option pricing. Its implications are substantial, influencing trading strategies and the development of more sophisticated forecasting techniques.
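The statistical fingerprint described above is easy to reproduce numerically. The sketch below simulates returns from a GARCH(1,1) process, a standard conditional-variance model chosen here purely as an assumption (it is not the paper's data or model), and compares the autocorrelation of raw returns with that of squared returns. The parameter values are illustrative.

```python
import numpy as np


def simulate_garch11(n, omega=0.05, alpha=0.08, beta=0.85, seed=0):
    """Simulate returns r_t = sigma_t * z_t with GARCH(1,1) conditional variance
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2,  z_t ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * z[t]
        var = omega + alpha * r[t] ** 2 + beta * var  # variance for step t + 1
    return r


def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


returns = simulate_garch11(50_000)
acf_raw = acf(returns, 5)      # near zero: the direction of moves is unpredictable
acf_sq = acf(returns ** 2, 5)  # clearly positive: the size of moves persists
print("ACF of returns:        ", np.round(acf_raw, 3))
print("ACF of squared returns:", np.round(acf_sq, 3))
```

Squared returns are used because volatility is not observed directly; absolute returns work equally well for this diagnostic.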

Many conventional financial models are built upon the principle of stationarity – the idea that the statistical properties of a time series, such as its mean and variance, remain constant over time. However, real-world financial data demonstrably violates this assumption; asset prices exhibit trends, seasonality, and, crucially, changing levels of volatility. This non-stationarity introduces significant inaccuracies when applying these models, as forecasts and risk assessments are predicated on a stable statistical landscape that simply doesn’t exist in practice. Consequently, model outputs may underestimate or overestimate potential price swings, leading to flawed investment strategies and inadequate risk management. Researchers are increasingly focused on developing techniques – like differencing, transformations, and the use of models specifically designed for non-stationary data – to address this fundamental limitation and improve the reliability of financial predictions.

The Limitations of Traditional Financial Modeling

The Autoregressive Integrated Moving Average (ARIMA) model, a common time series forecasting method, relies on stationarity: its autoregressive and moving-average components assume that statistical properties such as the mean and variance remain constant over time. Because financial time series often exhibit trends or seasonality that violate this assumption, the series is first differenced, which is the ‘integrated’ part of the model. First-order differencing replaces each observation with its change from the previous one, $\nabla y_t = y_t - y_{t-1}$, removing a linear trend or a random-walk (unit-root) component; seasonal patterns call for seasonal differencing at the seasonal lag. The order of differencing $d$, the number of times the operation must be applied, is typically chosen by analyzing the autocorrelation and partial autocorrelation functions, often supplemented by unit-root tests such as the augmented Dickey-Fuller test. Failure to properly address non-stationarity before model fitting can lead to spurious regressions and unreliable forecasts.
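A minimal sketch of differencing with NumPy's `np.diff`, whose `n` argument applies the operation repeatedly and so plays the role of ARIMA's order $d$. The two synthetic series are illustrative assumptions: a random walk with drift, which one difference makes stationary, and a deterministic quadratic trend, which needs two.

```python
import numpy as np


def difference(series, d=1):
    """Apply d-th order differencing, y_t - y_{t-1}, repeated d times (ARIMA's 'I' step)."""
    return np.diff(np.asarray(series, dtype=float), n=d)


def half_means(x):
    """Means of the first and second halves: roughly equal for a constant-mean series."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()


rng = np.random.default_rng(1)
n = 5_000

# Random walk with drift 0.02: its level wanders and its variance grows with t.
steps = 0.02 + rng.standard_normal(n)
walk = np.cumsum(steps)

# Deterministic quadratic trend: one difference leaves a linear trend, two remove it.
quad = 0.001 * np.arange(n) ** 2

diff_walk = difference(walk, d=1)  # recovers the i.i.d. steps
diff_quad = difference(quad, d=2)  # constant 0.002

print("walk half-means:      ", half_means(walk))
print("diff(walk) half-means:", half_means(diff_walk))
print("second difference of quadratic:", diff_quad[:3])
```

Each round of differencing shortens the series by one observation, so $d$ is kept as small as the diagnostics allow; over-differencing introduces artificial negative autocorrelation.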

The Black-Scholes model, a cornerstone of modern financial theory, posits that asset prices follow a Geometric Brownian Motion (GBM), $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$. This implies that prices evolve continuously, that log-returns over any interval are normally distributed with a constant volatility $\sigma$, and that the size of price changes scales with the current price level. However, real-world asset price dynamics often deviate from these assumptions. Empirical evidence demonstrates that financial time series frequently exhibit characteristics not captured by GBM, including volatility clustering, skewness, excess kurtosis (fat tails), and jumps. These deviations can lead to mispricing of options and inaccuracies in risk management calculations, necessitating more sophisticated models that account for these complexities.
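The sketch below simulates GBM with the exact log-space update $S_{t+\Delta t} = S_t \exp\big((\mu - \tfrac{1}{2}\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z\big)$ and checks that its daily log-returns have essentially zero excess kurtosis, i.e. no fat tails. The drift, volatility, and 252-trading-day step are illustrative assumptions.

```python
import numpy as np


def simulate_gbm(s0, mu, sigma, dt, n_steps, seed=0):
    """Simulate GBM exactly on a grid:
    S_{t+dt} = S_t * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z),  Z ~ N(0, 1).
    Returns n_steps + 1 prices starting at s0.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_steps)
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))


def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a normal distribution, > 0 for fat tails."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0


prices = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, n_steps=100_000)
log_returns = np.diff(np.log(prices))

print("std of daily log-returns:", log_returns.std())    # close to 0.2 / sqrt(252), about 0.0126
print("excess kurtosis:", excess_kurtosis(log_returns))  # close to 0: no fat tails
```

Applying `excess_kurtosis` to real daily index returns typically gives a clearly positive value, which is exactly the gap between GBM and observed markets that the paragraph describes.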

Classical approaches such as ARIMA and the GBM dynamics underlying the Black-Scholes model share a further limitation: they assume linear dynamics and short-range dependence. Financial time series, by contrast, often demonstrate non-linear relationships, where the effect of a change in one variable is not proportional to the change itself, and long-range correlations, where events distant in time remain statistically related. Because these characteristics violate the models' core assumptions, they introduce systematic errors that linear models cannot capture, diminishing predictive power and reducing the reliability of forecasts and risk estimates.

Visibility graph constructed from Standard and Poor’s 500 closing prices from December 12, 2017, to May 7, 2018.

A Graph-Theoretic Representation of Financial Time Series

Visibility Graphs (VGs) offer a method for converting a one-dimensional time series into a network-based geometric representation. Each observation $(t_i, y_i)$ becomes a node, pictured as a vertical bar of height $y_i$ at time $t_i$, and two nodes $a$ and $b$ are connected if the straight line between the tops of their bars passes above every intermediate bar, that is, if $y_c < y_b + (y_a - y_b)\frac{t_b - t_c}{t_b - t_a}$ for all $t_a < t_c < t_b$. The resulting graph inherently encodes the temporal structure of the original series: consecutive observations are always connected, while long-range links appear only where no intermediate value blocks the line of sight, so pronounced peaks become highly connected hubs. This allows graph-based analytical techniques to be applied to time series data, providing a different perspective from traditional statistical methods. Geometric properties of the Visibility Graph, such as node degree and path lengths, can then be used as features for further analysis or modeling.
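A minimal sketch of the natural visibility criterion of Lacasa et al. (2008), assuming equally spaced observations unless times are supplied. Whether Sig-Graph GAN uses exactly this variant or another (for example the horizontal visibility graph) is not specified here, and the helper names are hypothetical. The scan relies on an equivalent form of the criterion: $b$ is visible from $a$ exactly when the slope from $a$ to $b$ exceeds the slope from $a$ to every intermediate point.

```python
import math


def natural_visibility_edges(y, t=None):
    """Edges (a, b), a < b, of the natural visibility graph of the series y.

    b is visible from a iff the segment between (t_a, y_a) and (t_b, y_b) passes
    strictly above every intermediate point, i.e. iff slope(a, b) exceeds
    slope(a, c) for every c between them. Tracking the steepest slope seen so far
    makes each row an O(n) scan, O(n^2) overall.
    """
    if t is None:
        t = list(range(len(y)))
    edges = []
    for a in range(len(y) - 1):
        max_slope = -math.inf  # steepest slope from a to any point strictly between a and b
        for b in range(a + 1, len(y)):
            slope = (y[b] - y[a]) / (t[b] - t[a])
            if slope > max_slope:
                edges.append((a, b))
            max_slope = max(max_slope, slope)
    return edges


def degrees(n, edges):
    """Node degrees of an undirected graph given as an edge list."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


prices = [1.0, 3.0, 2.0, 4.0]
edges = natural_visibility_edges(prices)
print(edges)                        # [(0, 1), (1, 2), (1, 3), (2, 3)]
print(degrees(len(prices), edges))  # [1, 3, 2, 2]
```

In the example, node 1 (the local peak at 3.0) blocks the views from node 0 to nodes 2 and 3, which is why it ends up as the best-connected node.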

Signature Transformation is a mathematical technique that encodes time series data, viewed as a continuous path $X$, into a hierarchical representation capturing its geometric properties and temporal evolution. The level-$k$ terms of the signature are the $k$-fold iterated integrals of the path, $S(X)^{(i_1, \dots, i_k)}_{t_0, t} = \int_{t_0 < s_1 < \cdots < s_k < t} dX^{i_1}_{s_1} \cdots dX^{i_k}_{s_k}$, so level 1 records the total increment in each channel, level 2 records pairwise interactions such as the signed (Lévy) area, and higher levels capture progressively finer detail of the path's shape. These levels, truncated at a chosen depth, are concatenated to form a feature vector. The signature is invariant to reparametrizations of time, so it captures the order of events rather than their speed; augmenting the path with time as an extra channel restores speed information, and for such time-augmented paths the full signature determines the path up to its starting point. The resulting signature vector captures both local and global characteristics of the time series, enabling robust comparison of complex temporal patterns.
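For intuition, the sketch below computes the signature of a piecewise-linear path up to level 2, using Chen's identity to glue together the signatures of its straight segments. It is a hand-rolled illustration under stated assumptions: the paper's signature depth, path augmentation, and software are not specified here, and dedicated packages such as `iisignature` compute higher levels efficiently. Because a one-dimensional path only has the trivial signature $S^{(k)} = \Delta^k / k!$ of its total increment $\Delta$, scalar series are usually augmented with time first; `time_augment` is a hypothetical helper doing that.

```python
import numpy as np


def signature_depth2(path):
    """Signature of a piecewise-linear path truncated at depth 2.

    path: array of shape (n_points, d). Returns (level1, level2) with shapes
    (d,) and (d, d), where level2[i, j] is the iterated integral of dX^i dX^j.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # One linear segment has signature (delta, outer(delta, delta) / 2).
        # Chen's identity concatenates it with the path so far:
        #   level2_new = level2 + outer(level1, delta) + outer(delta, delta) / 2
        level2 += np.outer(level1, delta) + np.outer(delta, delta) / 2.0
        level1 += delta
    return level1, level2


def time_augment(x):
    """Turn a scalar series x_0..x_{n-1} into the 2-D path (t_i, x_i), t_i in [0, 1]."""
    x = np.asarray(x, dtype=float)
    t = np.linspace(0.0, 1.0, len(x))
    return np.column_stack([t, x])


# An L-shaped path: right by 1, then up by 1.
l1, l2 = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(l1)                           # [1. 1.]
print(l2)                           # level 2: [[0.5, 1.0], [0.0, 0.5]]
print(0.5 * (l2[0, 1] - l2[1, 0]))  # Levy area: 0.5

# Feature vector for a time-augmented price series: concatenate the levels.
s1, s2 = signature_depth2(time_augment([100.0, 101.5, 100.8, 102.2]))
features = np.concatenate([s1, s2.ravel()])
print(features.shape)               # (6,)
```

The symmetric part of level 2 is fixed by level 1 (it equals half the outer product of the total increment), so the genuinely new information at this depth is the antisymmetric Lévy area, which distinguishes, for example, "up then right" from "right then up".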

The Sig-Graph GAN, integrating Visibility Graph transformation and Signature Transformation, generates synthetic financial time series with improved realism and diversity. Performance is evaluated with two key metrics: the Earth Mover’s Distance (EMD), which measures how far the generated distribution lies from the real one, and the leverage effect, the typically negative correlation between current returns and future volatility. Across the IXIC (NASDAQ Composite), N225 (Nikkei 225), and S&P 500 datasets, the model achieves lower EMD scores than baseline generative models and replicates the leverage effect more accurately, indicating a more faithful reproduction of the statistical properties found in actual financial time series.
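The paper's exact metric definitions are not reproduced here, so the sketch below uses common textbook forms as assumptions: EMD as the one-dimensional Wasserstein-1 distance between two equal-size samples of returns, and the leverage effect as $\mathrm{corr}(r_t, r_{t+k}^2)$. The toy asymmetric return process is likewise an assumption made only to produce a visible leverage effect. For samples of unequal size, `scipy.stats.wasserstein_distance` computes the same 1-D distance.

```python
import numpy as np


def emd_1d(x, y):
    """Earth Mover's (Wasserstein-1) distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan pairs sorted values, so the
    distance is the mean absolute gap between order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


def leverage_effect(r, max_lag):
    """corr(r_t, r_{t+k}^2) for k = 1..max_lag; negative values mean falls raise later volatility."""
    r = np.asarray(r, dtype=float)
    sq = r ** 2
    return np.array([np.corrcoef(r[:-k], sq[k:])[0, 1] for k in range(1, max_lag + 1)])


rng = np.random.default_rng(2)
n = 20_000

# Toy asymmetric process (illustrative assumption): volatility is 1.8
# after a negative return and 1.0 otherwise.
z = rng.standard_normal(n)
r = np.empty(n)
r[0] = z[0]
for t in range(1, n):
    sigma = 1.8 if r[t - 1] < 0 else 1.0
    r[t] = sigma * z[t]

print("EMD(r, shuffled r):", emd_1d(r, rng.permutation(r)))  # 0.0: same values, different order
print("EMD(r, r + 0.5):   ", emd_1d(r, r + 0.5))              # about 0.5: a pure shift
print("leverage, lags 1-3:", leverage_effect(r, 3))           # lag 1 clearly negative
```

The shuffled comparison shows a limitation worth keeping in mind: EMD on marginal return distributions ignores temporal order entirely, which is why it is paired with a dependence-sensitive statistic such as the leverage effect.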

An ablation study of the Sig-Graph GAN, trained with either a Kullback-Leibler divergence (KLD) loss on Nikkei 225 data or a mean squared error (MSE) loss on S&P 500 data, demonstrates the impact of loss function and dataset choice on performance.

The pursuit of synthetic data, as demonstrated in this research, demands a level of algorithmic rigor often overlooked. The Sig-Graph GAN model, blending Graph Neural Networks with signature calculus, exemplifies this need for mathematically grounded design. As Arthur C. Clarke famously observed, “Any sufficiently advanced technology is indistinguishable from magic,” but that ‘magic’ rests on a foundation of mathematics. This work isn’t merely about generating data that appears realistic; it aims to capture and replicate the geometric and temporal dependencies inherent in financial time series, a goal more demanding than matching a few summary statistics. The model’s architecture therefore encodes these dependencies explicitly, through visibility graphs and signatures, rather than relying on superficial imitation of market dynamics.

Future Directions

The presented Sig-Graph GAN, while demonstrating a capacity for synthetic time series generation, merely scratches the surface of a fundamentally difficult problem. The reliance on LSTM networks, despite the integration of signature calculus, introduces a degree of approximation. The true geometric invariants of financial processes, if they exist beyond mere statistical mimicry, demand a representation that is provably robust to time’s distortions, not one approximated through recurrent layers. A formal specification of desirable properties for synthetic financial data, properties beyond simple distributional similarity, remains conspicuously absent.

Future work must address the limitations of relying on data-driven approximations of underlying dynamics. Rigorous exploration of alternative network architectures, perhaps those derived from differential geometry or optimal transport theory, is warranted. The current methodology implicitly assumes the generated data will conform to observed patterns; a more compelling approach would be to explicitly model the absence of certain patterns: to define mathematically what constitutes ‘unrealistic’ behavior and to exclude it through the generative process.

Ultimately, the field requires a shift in perspective. The goal should not be to generate data that ‘looks’ realistic, but data that satisfies demonstrably correct mathematical principles, even if those principles diverge from observed market behavior. Only then can synthetic data truly serve as a reliable tool for stress-testing models and exploring the limits of financial theory.

Original article: https://arxiv.org/pdf/2605.22215.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
