Mapping the Waves: Forecasting Trends in Epidemics and Human Behavior

Author: Denis Avetisyan

A new graph-based forecasting framework leverages network structures and trend similarities to predict the spread of diseases, beliefs, and behaviors with improved accuracy and insight.

The TrendGNN pipeline constructs a block-diagonal matrix, comprising $N_{state}$ similarity matrices computed via Dynamic Time Warping and smoothing, to represent temporal relationships within multivariate time series data. Using a four-week window for analysis, it transforms an input of size $(N_{state} \times N_{signal}) \times \text{window}$ into a condensed square matrix of size $(N_{signal} \times N_{state})^2$.

This review details TrendGNN, a system employing GraphSAGE and trend similarity graphs for interpretable time series forecasting in complex networked systems.

Predicting the complex dynamics of epidemics, beliefs, and behaviors requires moving beyond simplistic models that lack mechanistic insight. The paper "TrendGNN: Towards Understanding of Epidemics, Beliefs, and Behaviors" introduces a graph-based forecasting framework that uses trend similarity to construct interpretable relationships between interdependent signals. By applying graph neural networks, the authors report improved predictive accuracy alongside the ability to identify key drivers of change. Could this approach pave the way for more robust and interpretable simulation models capable of anticipating the impact of interventions on complex socio-epidemiological systems?

Beyond Prediction: Understanding the Signals of Epidemic Change

Current epidemic forecasting approaches, as demonstrated by initiatives like Forecast Hubs, frequently concentrate on predicting aggregate outcomes – such as hospitalization rates or case numbers – while often overlooking the underlying behavioral factors driving transmission. This emphasis on statistical prediction, though valuable, can limit understanding of why an epidemic is unfolding as it is. The models frequently treat populations as homogeneous entities, failing to account for how changes in public behavior – such as mask-wearing, social distancing, or vaccine hesitancy – directly impact disease spread. Consequently, these forecasts may struggle to anticipate shifts caused by evolving public response, leading to inaccurate predictions when behavioral patterns change unexpectedly. A deeper integration of behavioral science into forecasting models is therefore crucial for generating more robust and actionable insights.

The precision of epidemic forecasting hinges on a holistic understanding of the factors influencing disease spread, extending beyond simple case counts. Accurate predictions demand the integration of diverse COVID-19 signals – data encompassing public health metrics like hospitalizations and mortality, but crucially also incorporating behavioral indicators such as mobility patterns and mask-wearing rates. Demographic information, including age distribution and socioeconomic status, further refines the picture, as does comprehensive testing data that reveals the true scope of infection. By synthesizing these varied signals, models can begin to decipher the evolving public response to the pandemic – how interventions are adopted, how behavior changes with perceived risk, and ultimately, how the virus spreads through the population. This nuanced approach moves beyond predicting what will happen, to understanding why, and allows for more adaptable and reliable forecasts.

Conventional time series models, while useful for predicting trends based on past data, encountered significant limitations when applied to the COVID-19 pandemic due to its intricate web of interacting factors. These models typically assume data points are independent or related in a simple, linear fashion, failing to account for the dynamic interplay between public health interventions, behavioral shifts, demographic vulnerabilities, and testing rates. The pandemic demonstrated that increases in testing, for instance, didn’t just reflect more cases, but also altered public perception and potentially prompted behavioral changes, a feedback loop largely ignored by traditional approaches. Similarly, the impact of lockdowns wasn’t solely determined by the duration, but by adherence, which itself was influenced by factors like economic hardship and trust in authorities. Consequently, these models often struggled to accurately forecast infection rates, hospitalizations, and deaths, highlighting the need for methods capable of representing and leveraging these complex interdependencies.

Predictive accuracy during a pandemic hinges not simply on tracking individual data streams, but on understanding the intricate web of connections between them. Recent research demonstrates that a static analysis of health metrics, mobility data, or testing rates provides an incomplete picture; instead, a dynamic modeling approach is crucial. This involves techniques capable of identifying and quantifying how shifts in one signal – for example, increased mask usage reflected in behavioral data – influence others, like the rate of transmission observed in health statistics. These interconnected relationships aren’t fixed; they evolve over time, necessitating a forecasting method that continuously learns and adapts its representation of these dependencies. Consequently, a system that can dynamically represent and leverage these inter-signal relationships offers a significantly improved capacity to anticipate future trends compared to traditional, siloed forecasting techniques.

Mean absolute error distributions reveal performance differences between models in forecasting up to four weeks ahead.

Modeling Interdependence: A Graph-Based Approach to Forecasting

GraphSAGE, a graph neural network architecture, was implemented to model interdependencies observed between various COVID-19 Signals. Unlike traditional neural networks designed for grid-like data, GraphSAGE operates directly on graph structures, enabling it to learn representations of nodes – in this case, individual signals – based on the features of their connected neighbors. This approach allows for inductive learning, meaning the model can generalize to unseen signals and their relationships without retraining. Specifically, GraphSAGE aggregates feature information from a node’s local neighborhood using learnable functions, effectively capturing complex dependencies that extend beyond simple pairwise correlations. The model’s architecture facilitates the learning of signal embeddings that encode both individual signal characteristics and their contextual relationships within the broader network of COVID-19 indicators.
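To make the neighborhood aggregation concrete, the following is a minimal sketch of a single GraphSAGE-style layer with a mean aggregator, written in plain Python. The signal names, feature values, and weight matrices are illustrative assumptions rather than values from TrendGNN; a real model learns its weights during training and would normally be built with a graph neural network library.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a small dense matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sage_mean_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """One layer: h_v = ReLU(W_self * x_v + W_neigh * mean of neighbor features)."""
    out: Dict[str, Vector] = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(len(x))]
        else:
            mean = [0.0] * len(x)  # isolated node: no neighbor information
        combined = [a + b for a, b in zip(matvec(w_self, x), matvec(w_neigh, mean))]
        out[node] = [max(0.0, c) for c in combined]  # ReLU non-linearity
    return out


# Hypothetical toy graph: three signals with two features each.
features = {"cases": [1.0, 2.0], "mobility": [3.0, 4.0], "masks": [5.0, 0.0]}
neighbors = {"cases": ["mobility", "masks"], "mobility": ["cases"], "masks": []}
w_self = [[1.0, 0.0], [0.0, 1.0]]   # assumed weights, not learned
w_neigh = [[0.5, 0.0], [0.0, 0.5]]  # assumed weights, not learned

embeddings = sage_mean_layer(features, neighbors, w_self, w_neigh)
print(embeddings["cases"])  # [3.0, 3.0]
```

Because the layer only needs a node's features and its neighbor list, the same weights can be applied to a signal that was not present during training, which is the inductive property described above.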

Trend Similarity Graphs are constructed to represent the relationships between COVID-19 Signals by analyzing their temporal evolution. These graphs utilize signals as nodes, with edge weights determined by the similarity of their respective trends over time. The resulting graph structure enables the identification of dependencies where signals do not necessarily need to be directly correlated at a given time point, but rather exhibit similar patterns of change. This approach moves beyond traditional correlation-based methods by focusing on the shape of the time series, allowing for the detection of leading or lagging relationships between signals and capturing more complex interdependencies within the dataset.

Graph construction utilized two primary methodologies to establish relationships between COVID-19 Signals. Dynamic Time Warping with Smoothing (DTW+S) was implemented to assess signal similarity based on overall shape, accommodating time series with varying speeds or temporal distortions. This involved calculating the DTW distance between signal pairs and subsequently applying spectral clustering to group signals with low distances. Complementarily, Lagged Correlation analysis identified dependencies where one signal predictably follows another with a defined time delay; correlation was calculated across a range of time lags to determine the maximum correlation value and associated delay, indicating potential predictive power of one signal for another. Both methods generated edge weights representing the strength of the relationship, which were then used to construct the Trend Similarity Graph.
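The two edge-weighting ideas can be sketched in a few lines of plain Python. This is an illustrative sketch only: it implements textbook Dynamic Time Warping and a maximum lagged Pearson correlation, and it omits the smoothing and clustering steps of the DTW+S procedure. The function names and the toy series are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic-programming DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(leader: Sequence[float], follower: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Find the lag in 0..max_lag that maximizes corr(leader[t], follower[t + lag])."""
    if len(leader) != len(follower):
        raise ValueError("series must have equal length")
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]
        y = follower[lag:]
        if len(x) < 3:  # too few points for a meaningful correlation
            break
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


# Toy data: the follower repeats the leader two steps later.
leader: List[float] = [1, 3, 2, 5, 4, 6, 2, 1, 3, 0]
follower: List[float] = [0, 0] + leader[:-2]

warped = dtw_distance([0, 1, 2], [0, 0, 1, 2])  # same shape, different speed
lag, corr = best_lag(leader, follower, max_lag=4)
print(warped, lag, round(corr, 3))
```

In a graph-construction setting, a low DTW distance or a high maximum lagged correlation between two signals would translate into a strong edge between their nodes; how distances are converted to weights and thresholded is a design choice this sketch leaves open.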

The construction of Trend Similarity Graphs enables GraphSAGE to perform message passing between interconnected COVID-19 Signals, effectively aggregating feature information from neighboring nodes within the graph. This aggregation process extends beyond simple Pearson correlations by considering the structural relationships defined in the graph, capturing dependencies based on time series shape similarity or lagged correlations. By analyzing these local neighborhoods, GraphSAGE identifies complex, non-linear relationships that are not readily apparent through traditional statistical methods, allowing for a more nuanced understanding of inter-signal dependencies and improved predictive performance. The aggregated information represents a contextualized feature embedding for each signal, reflecting its relationship to other signals in the network.

A 3D visualization of signal graphs, categorized by indicator type, reveals key connections and highlights central signals to demonstrate relationships within each category.

Enhancing Forecast Accuracy: Robustness and Scalability

The Rolling Window Strategy enhances model robustness by simulating real-world forecasting conditions through continuous data updates. Instead of training on a static dataset, the model is retrained iteratively as new data becomes available, using a fixed-size window of historical data for both training and testing. This approach allows the model to adapt to evolving patterns and trends, mitigating the impact of non-stationarity and improving its ability to generalize to future observations. Specifically, the window slides forward in time, discarding the oldest data point and incorporating the newest, thereby maintaining a consistent training and testing framework across time steps and ensuring the model remains current with the latest epidemiological information.
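The sliding mechanism can be illustrated with a short Python generator. This is a minimal sketch, assuming a four-week history window (matching the window mentioned for the TrendGNN pipeline) and a one-week target; the function name and the weekly values are made up for the example.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_windows(
    series: Sequence[float], window: int, horizon: int
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (history, target) pairs: `window` past points, then the next `horizon` points."""
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        history = list(series[start : start + window])
        target = list(series[start + window : start + window + horizon])
        yield history, target


# Hypothetical weekly values for one signal.
weekly = [10, 12, 15, 14, 18, 21, 19]
splits = list(rolling_windows(weekly, window=4, horizon=1))
for history, target in splits:
    print(history, "->", target)
```

Each step drops the oldest week and adds the newest, so a model retrained on every `history` never sees data from its own `target` period.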

The utilization of a block-diagonal matrix structure provides a computationally efficient method for representing aggregated similarity matrices in multi-state epidemic forecasting. Traditional similarity matrices, which grow quadratically with the number of states, become intractable for large-scale systems. By partitioning states into geographically or epidemiologically relevant blocks and computing similarities only within these blocks, the overall matrix takes a block-diagonal form, with all cross-block entries equal to zero. This structure reduces computational complexity from $O(n^2)$ to $O(k^2 + n - k)$, where $n$ is the total number of states and $k$ is the average block size, significantly accelerating computations such as graph construction and spectral analysis required for forecasting across numerous geographic locations or demographic groups. This enables scalable application of graph-based forecasting methods to large-scale epidemiological systems.
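As an illustration of the block-diagonal layout, the sketch below assembles per-block similarity matrices into one larger matrix in plain Python. It assumes one block per state, each holding signal-to-signal similarities, as in the pipeline description; the helper name and the similarity values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def block_diag(blocks: List[Matrix]) -> Matrix:
    """Place square blocks along the diagonal; every cross-block entry stays 0.0."""
    size = sum(len(block) for block in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


# Hypothetical similarity blocks for two states with two signals each.
state_a = [[1.0, 0.8], [0.8, 1.0]]
state_b = [[1.0, 0.3], [0.3, 1.0]]

combined = block_diag([state_a, state_b])
for row in combined:
    print(row)
```

Only the entries inside each block need to be computed and stored; in practice a sparse matrix format would avoid materializing the zeros at all.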

Evaluation of the GraphSAGE model, trained on dynamically constructed graphs, indicates superior performance compared to traditional time series forecasting methods. Specifically, GraphSAGE demonstrably reduces the Mean Absolute Error (MAE) across forecast horizons of 2 to 4 weeks. Comparative analysis reveals that GraphSAGE consistently outperforms both Autoregressive Integrated Moving Average (ARIMA) models and Transformer architectures when assessed using the MAE metric. This improvement signifies the efficacy of the graph-based approach in capturing temporal dependencies for enhanced epidemic forecasting accuracy.
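For reference, MAE is the average of the absolute differences between observed and forecast values: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. A minimal Python sketch follows; the numbers are invented purely to show the arithmetic and are not results from the paper.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up observations and forecasts for a single horizon.
observed = [100, 110, 120, 130]
forecast = [98, 113, 120, 126]

mae = mean_absolute_error(observed, forecast)
print(mae)  # 2.25
```

Computing this separately for each forecast horizon and each model yields error distributions that can be compared side by side.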

Evaluations demonstrate that graph-based forecasting models utilizing Dynamic Time Warping with Smoothing (DTW+S) and lagged correlation for graph construction consistently achieve the highest and second-highest forecasting accuracy, surpassing both randomly generated graphs and fully connected graphs. This performance indicates the efficacy of these methods in capturing complex, time-varying dependencies present in epidemic data. Specifically, these graph structures effectively model relationships between states that account for temporal shifts and similarities in disease progression, leading to improved forecasts compared to methods that do not explicitly incorporate these dynamics.

Across one- to four-week-ahead forecasting, model performance varied by signal category, with positive relative improvements indicating gains and negative values representing performance degradation compared to the baseline.

Illuminating the Drivers: Interpretable Forecasting and its Impact

To discern the factors driving forecast accuracy, researchers employed CF-GNNExplainer, a technique designed to illuminate the reasoning behind complex machine learning models. This method meticulously analyzes GraphSAGE – a graph neural network used for forecasting – to pinpoint the specific COVID-19 signals that exert the most substantial influence on its predictions. Rather than treating the model as a ‘black box’, CF-GNNExplainer deconstructs its decision-making process, revealing which combinations of health indicators, behavioral data, and demographic factors are most strongly correlated with forecast outcomes. The resulting insights offer a detailed understanding of the model’s internal logic, paving the way for more informed and targeted public health interventions.
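The counterfactual idea behind this kind of explanation can be illustrated with a much simpler procedure: remove one edge at a time and measure how far the prediction moves. The sketch below is not the CF-GNNExplainer algorithm, which searches for a minimal set of edge deletions that changes the model's output; it is a leave-one-edge-out perturbation on a toy stand-in model. The `predict` function, the signal names, and the values are all assumptions made for this example.

```python
from typing import Dict, List


def predict(node: str, values: Dict[str, float], neighbors: Dict[str, List[str]]) -> float:
    """Toy stand-in for a trained GNN: blend a node's value with its neighbors' mean."""
    nbrs = neighbors[node]
    if not nbrs:
        return values[node]
    return 0.5 * values[node] + 0.5 * sum(values[u] for u in nbrs) / len(nbrs)


def edge_influence(
    node: str, values: Dict[str, float], neighbors: Dict[str, List[str]]
) -> Dict[str, float]:
    """Score each neighbor by the prediction change caused by deleting its edge."""
    base = predict(node, values, neighbors)
    scores: Dict[str, float] = {}
    for u in neighbors[node]:
        pruned = dict(neighbors)
        pruned[node] = [v for v in neighbors[node] if v != u]
        scores[u] = abs(predict(node, values, pruned) - base)
    # Most influential edges first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical signal values and graph around one target signal.
values = {"hospitalizations": 10.0, "mobility": 30.0, "mask_wearing": 12.0, "testing": 8.0}
neighbors = {
    "hospitalizations": ["mobility", "mask_wearing", "testing"],
    "mobility": [],
    "mask_wearing": [],
    "testing": [],
}

ranking = edge_influence("hospitalizations", values, neighbors)
print(list(ranking))  # ['mobility', 'testing', 'mask_wearing']
```

Edges whose removal shifts the prediction the most are the ones the toy model relies on most heavily, which is the intuition behind ranking signals by counterfactual importance.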

The analytical process pinpointed specific COVID-19 signals as having disproportionate influence on forecasting accuracy. Health indicators, such as hospitalization rates and ICU occupancy, consistently ranked among the most impactful factors, demonstrating a clear link between disease severity and predictive outcomes. Equally important were behavioral response variables – mobility data reflecting adherence to social distancing measures and mask-wearing compliance – which revealed how public actions directly shaped the trajectory of the pandemic. Demographic factors, including age distribution and population density, also emerged as key drivers, highlighting the varying levels of vulnerability and transmission risk across different communities. This detailed understanding of signal importance allows for more nuanced and effective public health strategies, shifting focus towards the most influential levers for pandemic control.

The identification of crucial COVID-19 signals driving forecast outcomes enables public health officials to move beyond reactive measures and implement precisely targeted interventions. Recognizing which health indicators, behavioral patterns, and demographic factors most strongly influence predictions allows for the efficient allocation of resources, focusing efforts where they will yield the greatest impact on disease control and mitigation. This nuanced understanding facilitates the development of tailored strategies – from localized vaccination campaigns addressing specific demographic vulnerabilities to targeted public health messaging promoting behaviors that demonstrably reduce transmission – ultimately enhancing the effectiveness of public health responses and improving overall population health outcomes. Prioritization based on these key drivers represents a shift towards proactive, data-informed decision-making, maximizing the return on investment for public health initiatives.

The capacity to discern why a forecasting model arrives at a particular prediction represents a significant leap forward in public health preparedness. Unlike ‘black box’ models – those offering outputs without revealing the underlying rationale – interpretable forecasting empowers officials to move beyond simply reacting to trends. This transparency illuminates the specific health indicators, behavioral patterns, and demographic factors driving forecasts, allowing for proactive interventions precisely targeted at the most influential variables. Consequently, resources can be allocated with greater efficiency, and strategies refined to maximize impact, ultimately fostering a more resilient and responsive public health system capable of anticipating and mitigating future outbreaks with greater precision.

The pursuit of predictive modeling, as demonstrated by TrendGNN, echoes a fundamental principle of systemic understanding. The framework’s emphasis on trend similarity within a graph-based structure isn’t merely about improving forecasting accuracy; it’s about recognizing that elements within a complex system are intrinsically linked. Ada Lovelace observed, “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.” This resonates with TrendGNN’s approach – the system leverages existing data and relationships to extrapolate future behaviors. The architecture allows for discerning patterns and understanding how interconnected elements influence outcomes. Good architecture is invisible until it breaks, and only then is the true cost of decisions visible.

Beyond the Horizon

The pursuit of predictive accuracy, as demonstrated by this work, often obscures a more fundamental question: what are systems actually optimized for? TrendGNN offers a compelling method for forecasting complex phenomena, but the true test lies not in minimizing error, but in understanding the emergent properties revealed by the model’s structure. The emphasis on trend similarity, while intuitively appealing, invites further investigation into the nature of ‘trends’ themselves: are these merely statistical artifacts, or do they reflect underlying causal mechanisms? Future work should prioritize methods for distinguishing correlation from causation within these graph-based frameworks.

Interpretability remains a persistent challenge. While visualizing trends offers some insight, it does not necessarily illuminate the why behind observed behaviors. The field must move beyond simply identifying predictive features and toward constructing models that offer genuine explanatory power. This necessitates a shift in focus, away from purely data-driven approaches and toward the integration of domain knowledge and theoretical constraints. Simplicity is not minimalism; it is the discipline of distinguishing the essential from the accidental.

Ultimately, the success of this line of inquiry will depend on recognizing that epidemics, beliefs, and behaviors are not isolated phenomena, but interconnected elements of a larger, self-organizing system. A truly comprehensive model will require a holistic perspective, one that acknowledges the complex interplay between individual actions, social networks, and environmental factors. The challenge, as always, is to build models that are both accurate and meaningful, reflecting not just what happens, but how and why.

Original article: https://arxiv.org/pdf/2512.00421.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
