Decoding Time: A New Approach to Forecasting

Author: Denis Avetisyan


Researchers have developed a hierarchical forecasting model that captures complex patterns in time series data with state-of-the-art accuracy while remaining interpretable.

PRISM leverages wavelet decomposition and a multiscale tree structure for robust time series forecasting across diverse applications.

Despite advances in time series forecasting, accurately capturing the complex interplay of global trends, local dynamics, and multiscale features remains a significant challenge. This paper introduces ‘PRISM: A hierarchical multiscale approach for time series forecasting’, a novel method employing a learnable tree-based partitioning of the signal coupled with time-frequency decomposition. By recursively revealing localized views and aggregating scale-specific features, PRISM effectively captures both broad patterns and fine-grained details, achieving state-of-the-art forecasting performance. Could this hierarchical framework unlock new possibilities for interpretable and robust time series analysis across diverse domains?


The Limits of Traditional Forecasting Approaches

Traditional time series analysis, heavily reliant on statistical models like Autoregressive Integrated Moving Average (ARIMA), frequently encounters difficulties when applied to genuine, uncurated datasets. These methods are built upon assumptions of stationarity – that the statistical properties of the series, such as mean and variance, remain constant over time – an expectation rarely met in dynamic real-world phenomena. Economic indicators, climate patterns, and even social media trends exhibit inherent non-stationarity, manifesting as trends, seasonality, or unpredictable shifts. Consequently, applying ARIMA directly often necessitates complex data transformations – differencing, detrending, and seasonal adjustment – to artificially induce stationarity. This pre-processing, while sometimes effective, can introduce distortions or discard valuable information, limiting the model’s capacity to accurately capture the underlying complexities and ultimately hindering reliable forecasting performance.
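To make the pre-processing burden concrete, the minimal sketch below checks a synthetic trending series for stationarity with the augmented Dickey-Fuller test from statsmodels and then applies first-order differencing; the series, significance threshold, and helper function are illustrative choices, not anything taken from the paper.

```python
# Minimal sketch: testing for stationarity and differencing to induce it.
# Illustrative only; the series and thresholds are made up, not from the paper.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
# A synthetic non-stationary series: linear trend plus noise.
y = pd.Series(0.05 * np.arange(500) + rng.normal(scale=1.0, size=500))

def is_stationary(series: pd.Series, alpha: float = 0.05) -> bool:
    """Augmented Dickey-Fuller test: a small p-value suggests stationarity."""
    p_value = adfuller(series.dropna())[1]
    return p_value < alpha

print("raw series stationary?", is_stationary(y))               # typically False
y_diff = y.diff().dropna()                                       # first-order differencing
print("differenced series stationary?", is_stationary(y_diff))  # typically True
```

The differencing step is exactly the kind of transformation the text describes: it can restore stationarity, but it also discards the level information that the original series carried.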

Traditional time series models frequently demand substantial data manipulation before analysis can even begin. This pre-processing often involves smoothing, detrending, and the careful selection of relevant features, a process that is both time-consuming and heavily reliant on domain expertise. The need for such meticulous preparation significantly hinders the adaptability of these models to new datasets or changing conditions, as each scenario may necessitate a unique pre-processing pipeline. Furthermore, the manual feature engineering component limits scalability; as the volume and complexity of data increase, the effort required to identify and create informative features grows exponentially, making it difficult to apply these models to large-scale forecasting problems efficiently. Consequently, the dependence on extensive pre-processing and feature engineering represents a critical bottleneck in the practical application of traditional time series analysis.
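The short pandas sketch below gives a flavor of such a manual pipeline, building a handful of hand-picked lag and rolling-window features; the lag counts and window sizes are arbitrary stand-ins for the domain-driven choices the text describes.

```python
# Hand-rolled feature engineering of the kind traditional pipelines depend on.
# Lag counts and window sizes are arbitrary, domain-driven choices.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({"y": np.sin(np.arange(720) * 2 * np.pi / 24) + rng.normal(0, 0.3, 720)})

for lag in (1, 24, 168):                          # e.g. previous hour, day, week
    df[f"lag_{lag}"] = df["y"].shift(lag)
df["roll_mean_24"] = df["y"].rolling(24).mean()   # smoothed daily level
df["roll_std_24"] = df["y"].rolling(24).std()     # local volatility
df["detrended"] = df["y"] - df["y"].rolling(168, min_periods=1).mean()

features = df.dropna()                            # rows lost to lags and windows
print(features.shape)                             # every new dataset needs its own version of this
```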

Initial enthusiasm surrounding deep learning for time series forecasting encountered obstacles with recurrent neural networks (RNNs). While theoretically capable of modeling sequential data, early RNN architectures suffered from the “vanishing gradient” problem, where gradients diminish exponentially over long sequences, hindering the learning of long-term dependencies. This meant the network struggled to retain information from earlier time steps when predicting future values. Furthermore, the inherent sequential nature of RNN processing limited their ability to parallelize computations, making them slow and computationally expensive for handling extensive datasets or real-time applications. These limitations spurred research into more sophisticated architectures, such as LSTMs and GRUs, designed to mitigate these challenges and unlock the full potential of deep learning for time series analysis.
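A back-of-the-envelope illustration of the vanishing-gradient effect: backpropagating through a recurrence multiplies the gradient by a Jacobian-like factor at every step, and when that factor sits below one in magnitude the influence of early inputs decays geometrically. The recurrent weight and sequence lengths below are arbitrary values chosen only to show the decay.

```python
# Toy illustration of vanishing gradients in a 1-D linear recurrence h_t = w * h_{t-1}.
# The gradient of h_T with respect to h_0 is w**T, which collapses for |w| < 1.
import numpy as np

w = 0.9                      # hypothetical recurrent weight, |w| < 1
for T in (10, 50, 100, 200): # sequence lengths
    grad = w ** T            # d h_T / d h_0 for the linear recurrence
    print(f"T={T:3d}  gradient magnitude = {grad:.2e}")
# The magnitude shrinks from ~3.5e-01 at T=10 to ~7.1e-10 at T=200,
# which is why early inputs barely influence learning in plain RNNs.
```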

Despite decades of refinement in forecasting techniques, reliably predicting outcomes far into the future remains elusive across numerous disciplines. From accurately anticipating fluctuations in financial markets and optimizing energy grid management, to modeling climate change and forecasting disease outbreaks, the inherent complexity of real-world systems poses substantial hurdles. Traditional statistical methods, while useful for short-term predictions, often fail to capture the nuanced, non-linear dynamics at play over extended periods. Even advanced machine learning models, despite their initial promise, frequently struggle with the long-range dependencies and evolving patterns that characterize complex time series, limiting their practical utility and necessitating ongoing research into more robust and adaptable forecasting strategies. This persistent challenge underscores the need for innovative approaches capable of navigating uncertainty and delivering actionable insights for long-term planning.

Unveiling Deep Learning and Multiscale Representations

Autoformer and ETSformer represent recent advancements in deep learning for time series forecasting, both utilizing attention mechanisms to weigh the importance of different time steps and improve prediction accuracy. These models specifically incorporate trend-seasonal decomposition, a statistical method that separates observed data into its constituent trend, seasonal, and residual components. By explicitly modeling these components, the models can more effectively capture underlying patterns and reduce forecast error. The attention mechanisms within these architectures allow the models to focus on the most relevant historical data when making predictions, especially crucial for long-range forecasting where distant time steps can still significantly influence future values. This combination of decomposition and attention enables these models to outperform traditional methods and earlier deep learning approaches on various time series datasets.
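At its core, Autoformer-style series decomposition is a moving average that strips out the slow trend and leaves a seasonal-plus-residual remainder. The numpy sketch below is a simplified stand-in for that idea; the window length and edge handling are illustrative choices rather than any model's actual settings.

```python
# Simplified trend-seasonal split in the spirit of Autoformer's decomposition block:
# trend = moving average, remainder = what is left over.
# Window length and edge handling here are illustrative choices.
import numpy as np

def decompose(x: np.ndarray, window: int = 25):
    """Return (trend, remainder) using a centered moving average."""
    pad = window // 2
    # Replicate the edges so the moving average is defined everywhere.
    x_padded = np.concatenate([np.full(pad, x[0]), x, np.full(pad, x[-1])])
    kernel = np.ones(window) / window
    trend = np.convolve(x_padded, kernel, mode="valid")
    return trend, x - trend

t = np.arange(1000)
series = 0.01 * t + np.sin(2 * np.pi * t / 50) + np.random.normal(0, 0.2, t.size)
trend, remainder = decompose(series)
print(trend.shape, remainder.shape)   # both (1000,)
```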

While recent deep learning models demonstrate proficiency in capturing long-range dependencies within time series data, their computational demands grow quickly with sequence length – quadratically for standard self-attention – which limits their applicability to very long time series or large datasets. Furthermore, these models can exhibit reduced performance when confronted with highly complex, non-linear patterns or data containing substantial noise, requiring extensive training data and careful hyperparameter tuning to achieve optimal results. The ability to generalize to unseen, intricate patterns remains a challenge, particularly in scenarios where the underlying data-generating process is poorly understood or constantly evolving.

Multiscale representations in time series analysis involve decomposing the data into components analyzed at varying levels of granularity. This is typically achieved through techniques like wavelet transforms or similar signal processing methods, allowing the model to simultaneously examine both fine-grained, high-frequency fluctuations and coarse-grained, low-frequency trends. By representing the time series at multiple scales, the model gains the ability to isolate and interpret patterns that might be obscured when analyzing the data at a single resolution. This approach improves the model’s capacity to capture complex temporal dependencies and enhances its robustness to noise and variations present in the data.

Multiscale representation in deep learning time series models involves decomposing the input data into components analyzed at varying temporal resolutions. This allows the model to simultaneously process high-frequency fluctuations, representing short-term variability, and low-frequency components that define long-term trends. By extracting features at multiple scales, the model avoids being overwhelmed by either noise or broad patterns, improving its ability to discern underlying signals. Techniques employed include wavelet transforms, Fourier analysis, and learned decomposition layers within the neural network architecture, enabling the model to capture dependencies across different time horizons and ultimately enhance forecasting performance for complex time series data.
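One common way to obtain such a multiscale view – and the one closest to the wavelet decomposition PRISM builds on – is the discrete wavelet transform. The sketch below uses the PyWavelets library to split a signal into a coarse approximation and several detail bands; the wavelet, level, and test signal are illustrative choices, not the paper's configuration.

```python
# Multiscale view of a signal via the discrete wavelet transform (PyWavelets).
# Wavelet choice, level, and test signal are illustrative, not PRISM's settings.
import numpy as np
import pywt

t = np.linspace(0, 1, 1024)
# Slow oscillation + fast oscillation + noise.
signal = (np.sin(2 * np.pi * 3 * t)
          + 0.3 * np.sin(2 * np.pi * 60 * t)
          + 0.1 * np.random.randn(t.size))

# Three-level Haar decomposition: [approximation, detail_3, detail_2, detail_1]
coeffs = pywt.wavedec(signal, "haar", level=3)
for name, c in zip(["approx (coarse trend)", "detail level 3",
                    "detail level 2", "detail level 1 (finest)"], coeffs):
    print(f"{name:26s} length={len(c):4d}  energy={np.sum(c**2):8.2f}")
```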

PRISM: A Hierarchical and Frequency-Based Approach to Forecasting

PRISM utilizes a hierarchical forecasting approach by simultaneously analyzing time and frequency domains to improve prediction accuracy. This is achieved through a decomposition of the time series into multiple levels, representing different temporal resolutions and frequency components. Rather than treating these domains separately, PRISM integrates information from both, allowing the model to capture complex dependencies and patterns that might be missed by traditional time series methods. This joint organization allows for a more comprehensive understanding of the underlying data generating process, enabling improved short- and long-term forecasting performance across various time series datasets.

PRISM constructs a multiscale representation of the time series through recursive partitioning and frequency decomposition. Recursive partitioning divides the time series into progressively smaller segments, enabling the capture of localized patterns at varying granularities. Simultaneously, frequency decomposition – performed with a wavelet transform, the Haar wavelet in PRISM’s case – separates each segment into slowly varying components and finer, high-frequency detail. These two processes are applied iteratively, creating a hierarchical structure where each level represents the time series at a different scale of both time and frequency. This multiscale representation allows PRISM to model complex temporal dependencies and capture information across a broad spectrum of frequencies, improving forecasting accuracy for time series exhibiting non-stationary behavior.

PRISM’s combined time and frequency analysis enables the simultaneous capture of time series characteristics at varying scales. Traditional time series forecasting often focuses on sequential dependencies, potentially missing long-range patterns or being sensitive to noise. Conversely, frequency-domain analysis identifies dominant cyclical behaviors but may lose temporal resolution. PRISM addresses these limitations by decomposing the time series into multiple frequency bands and then recursively partitioning these bands based on temporal characteristics. This process allows the model to identify and leverage both short-term, localized fluctuations – reflecting immediate changes – and long-term, global patterns – representing sustained trends or seasonality – within the data, leading to improved forecasting accuracy and robustness.

PRISM utilizes a binary tree structure to enable computationally efficient and scalable time series forecasting. This structure recursively partitions the input time series, allowing subseries at each node to be processed in parallel. The depth of the tree controls the granularity of the decomposition, with deeper trees providing finer resolution at increased computational cost. Because the depth of a balanced binary partition grows only as O(log n) with the series length n, lengthening the input adds decomposition levels logarithmically rather than linearly, while the work within each level remains parallelizable. This scalability is crucial for handling large datasets and real-time forecasting applications.
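To make the hierarchy concrete, here is a toy sketch of a recursive binary partition in which each node stores a one-level Haar split (coarse pairwise averages and fine pairwise differences) of its segment. It illustrates the kind of tree described above and is not the authors' learnable partitioning scheme; the depth, segment-length cutoff, and test signal are arbitrary.

```python
# Toy binary decomposition tree: each node halves its segment in time and keeps a
# one-level Haar split (pairwise averages and differences) as its "frequency view".
# This mimics the structure described above; it is NOT PRISM's learnable partitioning.
import numpy as np

def haar_step(x):
    """One Haar level: coarse averages and fine differences of adjacent pairs."""
    x = x[: len(x) - len(x) % 2]              # drop a trailing sample if length is odd
    pairs = x.reshape(-1, 2)
    return ((pairs[:, 0] + pairs[:, 1]) / np.sqrt(2),
            (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))

def build_tree(segment, depth):
    """Recursively partition the segment in time, storing a Haar split at each node."""
    coarse, detail = haar_step(segment)
    node = {"length": len(segment), "coarse": coarse, "detail": detail}
    if depth > 0 and len(segment) >= 4:
        mid = len(segment) // 2
        node["left"] = build_tree(segment[:mid], depth - 1)
        node["right"] = build_tree(segment[mid:], depth - 1)
    return node

series = np.sin(np.linspace(0, 20 * np.pi, 256)) + 0.1 * np.random.randn(256)
tree = build_tree(series, depth=3)   # a few levels; in general depth scales like log2(len(series))
print(tree["length"], tree["left"]["length"], tree["left"]["left"]["length"])  # 256 128 64
```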

Optimizing Frequency Decomposition for Robust Forecasting

PRISM employs frequency decomposition as a core component of its time series analysis. This process involves breaking down the observed data into its constituent frequencies, effectively separating the signal into different cyclical patterns. By analyzing these frequencies, the model can identify dominant trends, seasonal variations, and recurring anomalies present within the data. The technique allows PRISM to move beyond simply observing raw values and instead understand the underlying dynamics driving the time series, leading to improved forecasting accuracy and robustness. The decomposition facilitates the isolation of key patterns which are then used as inputs for the predictive components of the model.

Alternatives to the Haar wavelet transform for frequency decomposition include the Fast Fourier Transform (FFT) and the Exponential Moving Average (EMA); however, the Haar wavelet generally demonstrates superior performance in time series forecasting applications. The FFT, while efficient for stationary signals, represents the entire series with global sinusoids and therefore loses time localization when applied to the non-stationary data common in forecasting. The EMA, while simple to implement, depends on a smoothing factor that trades responsiveness for stability and can lag behind rapid changes. The Haar wavelet, the simplest of the discrete wavelet transforms, provides both time and frequency localization, allowing for efficient signal representation and reconstruction at low computational cost, and it is particularly effective at capturing abrupt changes and transient patterns within the time series data.
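The practical difference is easiest to see on a signal with an abrupt jump: Haar detail coefficients localize the jump in time, whereas a Fourier representation spreads its energy across many frequency bins. The snippet below is an illustrative comparison built with PyWavelets and numpy, not an experiment from the paper.

```python
# Why time-frequency localization matters: locating an abrupt level shift.
# Illustrative comparison only; not an experiment from the paper.
import numpy as np
import pywt

signal = np.zeros(256)
signal[129:] = 1.0                               # step change at sample 129

# Haar detail coefficients: essentially zero everywhere except near the jump.
_, detail = pywt.dwt(signal, "haar")
print("Haar detail peaks near coefficient index:",
      int(np.argmax(np.abs(detail))), "of", len(detail))

# FFT magnitude: the step leaks energy across many frequency bins, and the
# location of the jump cannot be read off the spectrum directly.
spectrum = np.abs(np.fft.rfft(signal))
print("fraction of FFT bins above 1% of the peak magnitude:",
      round(float(np.mean(spectrum > 0.01 * spectrum.max())), 2))
```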

The effectiveness of frequency decomposition in time series forecasting is heavily dependent on parameter selection, specifically the decomposition level and the wavelet choice. A decomposition level that is too shallow may fail to separate slowly varying trends from finer fluctuations, while an excessively deep level can over-decompose the signal, introducing boundary artifacts and increasing computational cost without improving forecast accuracy. Similarly, the wavelet selected – Haar, Daubechies, Symlets, or Coiflets, for example – shapes how frequency components are represented; the optimal choice is data-dependent and should reflect the signal’s characteristics, such as stationarity and the presence of abrupt changes. Incorrect parameter settings can lead to the inclusion of noise or the exclusion of relevant signal, ultimately degrading the model’s predictive performance.
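As a practical guard-rail when tuning the decomposition level, PyWavelets reports the deepest level that remains meaningful for a given signal length and wavelet; the lengths and wavelet names below are arbitrary examples.

```python
# The deepest level at which a wavelet decomposition is still meaningful depends on
# signal length and on the wavelet's filter length; beyond it, coefficients degenerate.
# Signal lengths and wavelet names below are arbitrary examples.
import pywt

for n in (96, 512, 4096):
    for name in ("haar", "db4", "coif2"):
        w = pywt.Wavelet(name)
        print(f"n={n:5d}  wavelet={name:6s}  max useful level={pywt.dwt_max_level(n, w)}")
```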

Frequency decomposition, when applied effectively, enhances forecasting accuracy by separating time series data into constituent frequency components. This process enables the identification and isolation of signal – the predictable patterns driving the series – from noise, which represents random fluctuations or irrelevant variations. By focusing on the dominant, lower-frequency components that represent long-term trends and seasonality, the model minimizes the impact of high-frequency noise. This filtering effect improves the model’s ability to generalize from historical data and make more reliable predictions, particularly in scenarios with substantial data irregularities or measurement errors. The resulting model is less susceptible to overfitting to spurious data points and more responsive to the true underlying dynamics of the time series.
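A standard way to exploit this separation is wavelet shrinkage: decompose the series, damp the high-frequency detail coefficients, and reconstruct. The sketch below uses a conventional soft-thresholding rule as a generic illustration of the filtering described above; the wavelet, level, and threshold rule are textbook choices, not the paper's.

```python
# Generic wavelet shrinkage: keep the coarse components, soft-threshold the details.
# The universal threshold and db4 wavelet are conventional illustrative choices.
import numpy as np
import pywt

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 4 * t) + 0.5 * np.sin(2 * np.pi * 9 * t)
noisy = clean + rng.normal(scale=0.4, size=t.size)

coeffs = pywt.wavedec(noisy, "db4", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate from finest details
thresh = sigma * np.sqrt(2 * np.log(noisy.size))        # universal threshold
denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(denoised_coeffs, "db4")[: noisy.size]

print("RMSE noisy   :", float(np.sqrt(np.mean((noisy - clean) ** 2))))
print("RMSE denoised:", float(np.sqrt(np.mean((denoised - clean) ** 2))))
```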

PRISM’s Broad Applicability and Future Directions

Evaluations reveal PRISM’s substantial advancement in time series forecasting, consistently surpassing the accuracy of well-established methods. Across a comprehensive suite of benchmark datasets, PRISM demonstrably outperforms traditional statistical models such as ARIMA, alongside more recent data-driven approaches like DLinear and N-HiTS. This superior performance isn’t limited to simple forecasting scenarios; PRISM maintains its edge even when compared against sophisticated deep learning architectures, indicating a robust and adaptable methodology. The consistent outperformance across diverse datasets highlights PRISM’s potential to address a broad spectrum of forecasting challenges and sets a new standard for accuracy in the field.

PRISM’s versatility extends beyond simple statistical models, successfully integrating with sophisticated deep learning architectures such as Autoformer, N-BEATS, and ETSformer. This adaptability highlights the method’s capacity to enhance the performance of existing, complex forecasting systems rather than requiring a complete overhaul of established workflows. By functioning as a complementary component, PRISM allows researchers and practitioners to leverage the strengths of deep learning while simultaneously benefiting from its unique frequency-domain insights. This synergistic potential suggests that PRISM isn’t merely a standalone forecasting technique, but a broadly applicable enhancement tool capable of elevating the predictive power of a diverse range of time series models.

Evaluations across a comprehensive suite of time series datasets reveal PRISM’s consistent and reliable predictive power. In a significant majority – 17 out of 32 distinct evaluation settings – the method demonstrably outperformed all benchmark models, establishing it as a leading approach in the field. This success wasn’t limited to specific data characteristics or forecast lengths; PRISM exhibited robust performance across diverse datasets and varying forecast horizons, indicating its adaptability and generalizability. The consistent achievement of top results underscores PRISM’s potential to deliver accurate predictions in a wide range of real-world applications, from economic forecasting to resource management.

Evaluations utilizing the GIFT (General Industry Forecasting Timeseries) dataset reveal PRISM’s substantial predictive power, consistently achieving the lowest Mean Squared Error (MSE) across 61 distinct datasets and the lowest Mean Absolute Error (MAE) on 52 datasets. This performance indicates PRISM’s ability to accurately model a wide range of temporal patterns present in real-world industrial data, surpassing the accuracy of competing forecasting methods within this benchmark. The breadth of datasets where PRISM attains the best results underscores its robustness and generalizability, positioning it as a leading technique for diverse time series forecasting applications.

Rigorous ablation studies demonstrate the efficacy of PRISM by systematically evaluating the contribution of its core components relative to the D-PAD model. These analyses reveal a consistent performance lift, with PRISM achieving improvements ranging from 3 to 10% across various evaluation metrics and datasets. This substantial gain highlights the benefits of PRISM’s hierarchical and frequency-based approach to time series decomposition, indicating that the method effectively captures and leverages underlying patterns in the data that D-PAD may overlook. The observed performance difference underscores PRISM’s potential as a more robust and accurate forecasting technique, capable of delivering meaningful improvements in real-world applications.

Ongoing development of PRISM prioritizes both computational efficiency and expanded forecasting capabilities. Current research aims to streamline the model’s architecture, reducing processing time and resource demands without sacrificing accuracy – a crucial step for real-time applications and deployment on resource-constrained devices. Simultaneously, investigations are underway to extend PRISM’s applicability beyond univariate time series to the more complex realm of multivariate forecasting, where predicting a variable requires considering the influence of multiple interconnected time-dependent factors. This expansion involves adapting the hierarchical frequency-based approach to effectively capture inter-variable dependencies and improve predictive power in scenarios involving numerous interacting time series, potentially unlocking applications in areas like financial modeling, climate prediction, and complex systems analysis.

The success of PRISM highlights the potential of combining hierarchical decomposition with frequency-domain analysis for time series forecasting. Traditional methods often struggle to capture the complex, multi-scale patterns inherent in real-world data; PRISM addresses this by dissecting the series into hierarchical components and then analyzing these at different frequencies. This allows the model to isolate and model both long-term trends and short-term fluctuations more effectively than approaches that treat the time series as a monolithic entity. The demonstrated improvements over established benchmarks suggest this combined strategy is not merely incremental, but represents a significant step towards more accurate and robust time series analysis, opening avenues for future research into adaptable, frequency-aware forecasting models across diverse application domains.

The presented work on PRISM underscores a fundamental tenet of systems design: understanding the whole is paramount. The model’s hierarchical decomposition, combined with wavelet transforms, mirrors an organism’s structure – analyzing time series at multiple scales to discern underlying patterns. This approach echoes Bertrand Russell’s observation: “To be happy, one must be able to give up the idea of permanence.” PRISM doesn’t seek a singular, fixed prediction, but rather adapts to the temporal and spectral variations within the data. It acknowledges the inherent impermanence of time series and constructs a flexible, interpretable framework. Good architecture is invisible until it breaks, and only then is the true cost of decisions visible.

Beyond the Horizon

The introduction of PRISM represents not so much a destination, but a clearing within a complex forest. The model skillfully decomposes time series, revealing patterns at multiple scales, yet this very success highlights the enduring challenge: truly understanding the generative processes behind those patterns. One can meticulously map the tributaries of a river, but that tells little of the mountain from which it springs. Future work must address the limitations inherent in any purely data-driven approach; models built on elegant structure alone will inevitably encounter phenomena born of chaotic or genuinely novel systems.

A natural extension lies in integrating PRISM’s hierarchical framework with causal discovery methods. Currently, the model excels at describing temporal dependencies, but struggles to articulate why those dependencies exist. Just as one cannot repair a failing engine by simply observing its vibrations, forecasting demands an understanding of the underlying mechanisms. Furthermore, the computational cost of wavelet transforms, while manageable, scales with data volume; a truly universal model will require algorithmic efficiencies that do not compromise accuracy.

Ultimately, the field should move beyond benchmarking on curated datasets. The true test of any forecasting system is its robustness in the face of unforeseen events – the black swans that routinely invalidate even the most sophisticated predictions. PRISM offers a promising architectural foundation; the next step is to subject it to the relentless pressure of real-world complexity, and to refine its structure not merely for performance, but for resilience.


Original article: https://arxiv.org/pdf/2512.24898.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-01-02 10:01