Author: Denis Avetisyan
A new framework empowers researchers to generate realistic time series data, including interventional data, for training more robust causal models.

This work introduces CausalTimePrior, a method for generating synthetic time series data using temporal structural causal models to improve causal inference.
Despite recent advances in causal inference, extending foundation models to time series data requires overcoming the lack of synthetic datasets with interventional targets. This paper, ‘Interventional Time Series Priors for Causal Foundation Models’, addresses this limitation by introducing CausalTimePrior, a framework for generating realistic temporal structural causal models with paired observational and interventional time series data. Our approach supports complex dynamics-including nonlinearities, regime switching, and diverse intervention types-enabling the training of robust causal foundation models. Will this principled generation of interventional time series data unlock a new era of reliable causal discovery and forecasting?
The Elusive Nature of Causal Stability
Many established techniques for determining cause and effect presume a level of stability in relationships that rarely exists in practice. Traditional methods, designed for relatively static systems, frequently falter when confronted with the fluctuating dynamics of real-world phenomena – where the influence of one variable on another isn’t constant, but shifts over time. This poses a significant challenge across diverse fields, from predicting economic trends and understanding disease progression to optimizing marketing strategies and modeling climate change. Because these approaches often fail to account for temporal causality – how effects can change as conditions evolve – their predictive power diminishes, and interventions based on their conclusions may prove ineffective or even counterproductive. Consequently, a need exists for more sophisticated tools capable of discerning and quantifying how causal links themselves evolve, rather than assuming they remain fixed.
The ability to discern how causal relationships evolve is paramount when dealing with complex systems, as static analyses often fail to capture the full scope of influence. Reliable prediction hinges on understanding not just that a cause affects an outcome, but how that effect changes over time – a subtly different question with profound implications. Interventions, similarly, require dynamic causal models to anticipate unintended consequences; a strategy effective at one point may prove ineffective, or even detrimental, as the system adapts. Therefore, accurately mapping these temporal shifts in causal influence is not merely an academic exercise, but a practical necessity for informed decision-making in fields ranging from epidemiology and climate science to economics and engineering, ensuring interventions are targeted and predictions hold true amidst constant change.
Many current causal inference techniques operate under constraints that frequently misrepresent the intricacies of real-world systems. These methods often presume stable relationships between variables, or assume effects are uniformly distributed across populations and time – simplifications that introduce bias when dealing with dynamic processes. For instance, a model assessing the impact of an educational intervention might assume a consistent effect on all students, failing to account for individual learning rates or changes in the educational environment. This reliance on static assumptions can lead to inaccurate estimations of causal effects, potentially guiding flawed policy decisions or ineffective interventions. Consequently, researchers are increasingly recognizing the necessity for more nuanced approaches that can accommodate the inherent complexity and temporal variability present in most causal systems, moving beyond methods that prioritize mathematical convenience at the expense of ecological validity.
The advancement of causal reasoning hinges on developing frameworks that move beyond static relationships and embrace the inherent temporal dependencies within complex systems. Current methodologies frequently falter when applied to scenarios where cause and effect are not fixed, but evolve over time – a limitation that severely restricts their utility in fields like economics, epidemiology, and climate science. A robust framework would not simply identify whether a causal link exists, but also how that link changes – its strength, direction, and even its very presence – across different points in time. This necessitates novel approaches to data analysis and model building, potentially leveraging techniques from time series analysis, recurrent neural networks, and dynamic Bayesian networks to capture the nuances of causal effects that unfold dynamically. Successfully modeling these temporal dependencies promises to unlock more accurate predictions, more effective interventions, and a deeper understanding of the intricate mechanisms governing real-world phenomena.

Constructing Temporal Causal Maps
CausalTimePrior is a framework for generating Temporal Structural Causal Models (TSCMs) using both observational and interventional time series data. The process involves sampling potential TSCMs, effectively creating probabilistic models of dynamic causal relationships. By combining passively observed data with data generated through controlled experiments – interventions – the framework aims to more accurately identify causal links that may change over time. This pairing of data types allows CausalTimePrior to move beyond static causal models and explore the evolution of causal structures within a time-dependent system, providing a more comprehensive understanding of the underlying generative process.
CausalTimePrior employs a prior distribution over discrete-time dynamic Structural Causal Models (SCMs) to probabilistically generate a range of possible TSCM structures and parameter values. This prior is crucial for addressing the underdetermined nature of causal discovery from observational data; by defining a probability distribution over potential models, the framework facilitates sampling diverse TSCMs consistent with the observed time series. The prior encompasses both the graph structure – defining the causal relationships between variables – and the parameters governing the functional relationships. This allows CausalTimePrior to explore a space of plausible causal models, enabling more robust causal inference, particularly when combined with interventional data. The probabilistic formulation also allows for the quantification of uncertainty in the learned causal structure and parameters.
CausalTimePrior addresses the limitations of static causal discovery by incorporating temporal dependencies into the model. Traditional Structural Causal Models (SCMs) assume constant relationships, whereas CausalTimePrior explicitly models how causal effects can change over discrete time steps. This is achieved through the construction of Temporal Structural Causal Models (TSCMs) which represent causal relationships as functions of variables at prior time steps t-1, enabling the identification of time-varying causal effects. The framework allows for the discovery of causal links where the direction or strength of the relationship is not fixed, but rather evolves as a function of time, providing a more nuanced understanding of dynamic systems. This capability is crucial for applications involving non-stationary processes where relationships are susceptible to change.
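The idea of sampling a temporal SCM and rolling it forward can be illustrated with a minimal sketch. The function names and the restriction to a linear, lag-1 model are assumptions for illustration; the paper's actual prior covers nonlinearities, multiple lags, and richer parameterizations.

```python
import numpy as np

def sample_tscm(n_vars, edge_prob=0.4, rng=None):
    """Sample a toy lag-1 temporal SCM: a random set of x_{t-1} -> x_t
    edges plus linear weights. Hypothetical helper, illustrative only."""
    rng = rng or np.random.default_rng(0)
    adj = rng.random((n_vars, n_vars)) < edge_prob       # adj[i, j]: x_j at t-1 -> x_i at t
    weights = rng.normal(0.0, 0.5, (n_vars, n_vars)) * adj
    return adj, weights

def simulate(weights, T, noise_scale=0.1, rng=None):
    """Roll the sampled TSCM forward: x_t = W x_{t-1} + noise."""
    rng = rng or np.random.default_rng(1)
    n = weights.shape[0]
    x = np.zeros((T, n))
    x[0] = rng.normal(size=n)
    for t in range(1, T):
        x[t] = weights @ x[t - 1] + rng.normal(0.0, noise_scale, n)
    return x

adj, W = sample_tscm(4)
series = simulate(W, T=100)
print(series.shape)  # (100, 4)
```

Sampling many such (structure, parameters) pairs from the prior yields a diverse collection of synthetic time series with known ground-truth causal graphs.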
CausalTimePrior accommodates multiple intervention types to facilitate nuanced causal analysis of time series data. Hard interventions represent complete manipulation of a variable’s value, effectively severing its natural causal influences. Soft interventions, conversely, model probabilistic alterations to a variable, maintaining a degree of its original causal structure while introducing a targeted influence. Furthermore, the framework supports time-varying interventions, allowing the intervention strategy itself to change over time, which is critical for modeling real-world scenarios where interventions are not static. This multi-faceted approach to intervention modeling enhances the framework’s ability to identify causal effects under diverse conditions and provides greater flexibility in simulating and analyzing complex systems.
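The distinction between hard and soft interventions can be sketched on the same linear lag-1 model. The helper below is hypothetical (the framework's real intervention API may differ): a hard intervention clamps the target after its natural update, severing incoming influences, while a soft intervention blends the natural value with the intervention target. A time-varying intervention would simply make `mode`, `value`, or `strength` functions of `t`.

```python
import numpy as np

def simulate_with_intervention(weights, T, target, mode="hard",
                               value=2.0, strength=0.5, rng=None):
    """Simulate x_t = W x_{t-1} + noise while intervening on one variable.
    'hard' clamps the target to `value` (severing incoming edges);
    'soft' nudges it toward `value`, keeping its parents' influence."""
    rng = rng or np.random.default_rng(2)
    n = weights.shape[0]
    x = np.zeros((T, n))
    x[0] = rng.normal(size=n)
    for t in range(1, T):
        x[t] = weights @ x[t - 1] + rng.normal(0.0, 0.1, n)
        if mode == "hard":
            x[t, target] = value                   # do(x_target = value)
        elif mode == "soft":
            x[t, target] = (1 - strength) * x[t, target] + strength * value
    return x

W = np.array([[0.5, 0.2], [0.0, 0.7]])
obs = simulate_with_intervention(W, 50, target=0, mode="hard", value=2.0)
print(obs[1:, 0])  # the intervened channel is pinned at 2.0
```

Pairing such interventional runs with purely observational runs of the same model produces the (observational, interventional) training pairs the framework relies on.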

Synthesizing Data for Causal Foundation Models
CausalTimePrior generates synthetic time-series data specifically designed to train and evaluate `Foundation Model`s focused on causal reasoning. This data is not simply random; it’s constructed with known causal relationships embedded within it, allowing models to learn the underlying mechanisms governing the data generation process. By training on this synthetically generated data, the models develop the ability to infer causal effects and generalize to unseen scenarios, a capability crucial for reliable predictions and interventions in complex systems. The process allows for controlled experimentation and benchmarking of causal reasoning abilities, independent of the limitations of observational data which often suffers from confounding variables and limited coverage of possible interventions.
The model architecture utilized in our initial experiments is a Gated Recurrent Unit (GRU) Encoder. GRUs are a type of recurrent neural network specifically designed to process sequential data by maintaining a hidden state that captures information about past inputs. This architecture is particularly well-suited for capturing temporal dependencies present in the synthetically generated time series data produced by CausalTimePrior. The GRU’s gating mechanism allows it to selectively update and forget information in the hidden state, effectively learning long-range dependencies without the vanishing gradient problem often encountered in traditional recurrent neural networks. The encoder transforms the time series input into a fixed-length vector representation which is then used for downstream causal effect estimation tasks.
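The gating mechanism described above can be made concrete with a minimal NumPy GRU encoder. This is a didactic sketch rather than the authors' implementation (which would typically use a deep learning framework): an update gate `z` and reset gate `r` control how the hidden state mixes new input with remembered context, and the final hidden state serves as the fixed-length representation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUEncoder:
    """Minimal GRU encoder mapping a (T, d_in) series to a d_hid vector."""
    def __init__(self, d_in, d_hid, rng=None):
        rng = rng or np.random.default_rng(0)
        s = 1.0 / np.sqrt(d_hid)
        # Stacked weights for the update (z), reset (r), and candidate gates.
        self.Wx = rng.uniform(-s, s, (3, d_hid, d_in))
        self.Wh = rng.uniform(-s, s, (3, d_hid, d_hid))
        self.b = np.zeros((3, d_hid))

    def encode(self, x):
        h = np.zeros(self.Wh.shape[1])
        for x_t in x:
            z = sigmoid(self.Wx[0] @ x_t + self.Wh[0] @ h + self.b[0])
            r = sigmoid(self.Wx[1] @ x_t + self.Wh[1] @ h + self.b[1])
            cand = np.tanh(self.Wx[2] @ x_t + self.Wh[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * cand   # gated state update
        return h

enc = GRUEncoder(d_in=4, d_hid=8)
rep = enc.encode(np.random.default_rng(1).normal(size=(100, 4)))
print(rep.shape)  # (8,)
```

Because the state update is a convex combination of the old state and a bounded candidate, gradients (and activations) stay well-behaved over long sequences, which is the property that motivates the GRU choice here.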
Evaluations of models trained using CausalTimePrior demonstrate improved performance in causal discovery tasks when contrasted with traditional methodologies. Specifically, a proof-of-concept model utilizing a Prior-data Fitted Network (PFN) achieved a Root Mean Squared Error (RMSE) of 176.4. This result is comparable to the RMSE of 176.5 obtained using a Vector Autoregression – Ordinary Least Squares (VAR-OLS) baseline, indicating successful in-context causal effect estimation. The comparability of RMSE values suggests that the CausalTimePrior-trained model can estimate causal effects with a similar degree of accuracy to established statistical methods, while benefiting from the synthetically generated data’s temporal richness.
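The VAR-OLS baseline referenced above is straightforward to reproduce: a lag-1 vector autoregression fit by ordinary least squares on lagged regressors. The sketch below (a stand-in, not the paper's exact baseline) recovers a known transition matrix from simulated data.

```python
import numpy as np

def fit_var_ols(x):
    """Fit a lag-1 VAR by OLS: regress x_t on x_{t-1} plus an intercept.
    Returns (A, c) such that x_t ~ A x_{t-1} + c."""
    past = np.hstack([x[:-1], np.ones((len(x) - 1, 1))])  # design matrix
    coef, *_ = np.linalg.lstsq(past, x[1:], rcond=None)
    return coef[:-1].T, coef[-1]

# Recover a known transition matrix from simulated data.
rng = np.random.default_rng(0)
A_true = np.array([[0.6, 0.2], [0.0, 0.5]])
x = np.zeros((500, 2))
for t in range(1, 500):
    x[t] = A_true @ x[t - 1] + rng.normal(0.0, 0.1, 2)
A_hat, c_hat = fit_var_ols(x)
print(np.round(A_hat, 2))
```

Such a baseline reads off causal effects directly from the fitted coefficients, which is exactly where it becomes unreliable under confounding; the 45% error reduction reported below concerns that regime.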
Evaluation of the model demonstrates a prediction accuracy of 0.95, expressed as the ratio of predicted to ground truth values, for queries involving interventions. Furthermore, in scenarios characterized by high spurious correlation – specifically, when the absolute value of the correlation coefficient ρ exceeds 0.3 – the model achieves a 45% reduction in prediction error compared to a Vector Autoregression – Ordinary Least Squares (VAR-OLS) baseline. These results suggest that training on synthetically generated, temporally-rich data effectively mitigates the impact of confounding variables and improves the accuracy of causal effect estimation under complex conditions.
Adapting to Non-Stationary Systems
Many systems encountered in the real world aren’t static; their causal relationships aren’t fixed but rather shift and evolve over time. This non-stationarity arises from a multitude of sources, including external influences – such as changes in economic policy or seasonal variations – and internal dynamics, like evolving user behavior or technological advancements. Consider, for instance, a marketing campaign’s effectiveness; its impact isn’t consistent indefinitely, as consumer preferences and market conditions change. Similarly, in climate modeling, the relationships between greenhouse gas emissions and temperature aren’t constant, influenced by feedback loops and complex atmospheric processes. Understanding and accurately modeling these time-varying causalities is crucial for reliable prediction and effective intervention in diverse fields, requiring methodologies that move beyond the assumption of a stable underlying structure.
The challenge of modeling systems that change over time is addressed through an extension of the CausalTimePrior framework. This advancement incorporates Regime-Switching Temporal Causal Models (TSCMs), which utilize the principles of Markov Switching Models to represent shifts in underlying causal relationships. Rather than assuming a static causal structure, this approach allows the model to identify and adapt to distinct ‘regimes’ – periods where different causal mechanisms dominate. By probabilistically switching between these regimes, the model captures the non-stationary behavior inherent in many real-world systems, offering a more nuanced and accurate representation of temporal dependencies and improving predictive capabilities where causal relationships are not constant.
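A regime-switching TSCM can be sketched by letting a hidden Markov chain select which transition matrix generates each step. The helper below is illustrative (the paper's regime-switching models are richer); note how the causal edge between the two variables flips direction between regimes.

```python
import numpy as np

def simulate_regime_switching(weights_by_regime, trans, T, rng=None):
    """Simulate a lag-1 TSCM whose transition matrix is chosen by a
    hidden Markov chain over regimes (a Markov switching model)."""
    rng = rng or np.random.default_rng(3)
    n_regimes = len(weights_by_regime)
    n = weights_by_regime[0].shape[0]
    x = np.zeros((T, n))
    regimes = np.zeros(T, dtype=int)
    for t in range(1, T):
        # Markov step: draw the next regime from the current regime's row.
        regimes[t] = rng.choice(n_regimes, p=trans[regimes[t - 1]])
        W = weights_by_regime[regimes[t]]
        x[t] = W @ x[t - 1] + rng.normal(0.0, 0.1, n)
    return x, regimes

W_a = np.array([[0.8, 0.0], [0.3, 0.5]])        # regime A: x1 -> x2
W_b = np.array([[0.8, 0.3], [0.0, 0.5]])        # regime B: x2 -> x1
trans = np.array([[0.95, 0.05], [0.05, 0.95]])  # sticky regime switching
x, regimes = simulate_regime_switching([W_a, W_b], trans, T=300)
print(x.shape, np.unique(regimes))
```

The high self-transition probabilities make regimes persistent, so each causal structure dominates for extended stretches, mimicking the non-stationary systems described above.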
Beyond identifying discrete regimes, this research enables the modeling of systems where the very rules governing cause and effect aren’t constant. Instead of assuming a fixed causal structure, the approach incorporates a probabilistic framework that allows the underlying mechanisms to evolve. This means the system’s behavior isn’t dictated by a single, unchanging model, but by a distribution of possible models, each with an associated probability. Consequently, the model can adapt to changing circumstances, capturing the nuances of non-stationary environments and offering a more realistic representation of real-world phenomena where causality itself is fluid and dynamic.
Evaluations demonstrate a significant performance increase when incorporating regime-switching capabilities into causal modeling. A mixed intervention model, designed to account for shifts in underlying causal mechanisms, achieved an Effect Size Correlation of 0.821. This result represents a substantial improvement over the 0.691 correlation observed with a traditional, “hard-only” intervention model – one that assumes static causal relationships. The enhanced correlation underscores the model’s ability to more accurately capture and predict outcomes in non-stationary systems, where causal effects are not fixed but rather evolve probabilistically over time, providing a more nuanced and reliable analysis.
The pursuit of robust causal foundation models, as detailed in the article, demands a relentless simplification of complex temporal dependencies. The framework, CausalTimePrior, actively seeks to distill essential relationships within time series data, focusing on interventional mechanisms to achieve clarity. This resonates with David Hilbert’s assertion: “One must be able to say at any moment what one knows and what one does not know.” The paper embodies this principle by explicitly modeling causal structures, thereby delineating what is known about the data-generating process and, crucially, acknowledging the gaps addressed through synthetic data generation. The focus isn’t merely on capturing patterns but on understanding why those patterns exist, a move towards a more transparent and interpretable foundation.
Where Do We Go From Here?
The current work addresses a practical need – synthetic time series data imbued with causal structure – but does not, thankfully, present it as a final solution. The elegance of CausalTimePrior lies in its explicit acknowledgement of intervention, yet the true complexity of real-world temporal systems resists such neat encapsulation. Future effort must confront the inherent limitations of structural causal model assumptions when applied to high-dimensional, non-stationary time series. Simply generating data consistent with a specified SCM is not the same as capturing the underlying generative process, a distinction often blurred by enthusiasm for synthetic data.
A critical path forward involves relaxing the assumptions of stationarity and linearity. Time, after all, rarely behaves so predictably. Exploring techniques that allow for dynamically evolving causal relationships – perhaps borrowing from the literature on regime switching or Kalman filtering – would be a natural progression. Furthermore, the evaluation metrics used to assess the fidelity of synthetic data remain surprisingly rudimentary. Metrics beyond simple statistical similarity are needed – metrics that directly probe the ability of downstream causal inference algorithms to recover the true causal graph.
Ultimately, the pursuit of causal foundation models for time series analysis should not be driven by the desire for ever-larger datasets, but by a commitment to parsimony. The goal is not to simulate reality, but to understand it. Code, as it should be, must be as self-evident as gravity, and intuition remains the best compiler.
Original article: https://arxiv.org/pdf/2603.11090.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/