Can AI Read the Room? Predicting CFO Sentiment with Language Models

Author: Denis Avetisyan

New research shows large language models can accurately forecast financial executive optimism by leveraging firm-specific data and past survey responses.

This study demonstrates that LLMs, conditioned on individual CFO history and corporate information, can generate synthetic data that offers a scalable alternative to traditional sentiment surveys.

Measuring business sentiment remains a costly and infrequent exercise, relying on surveys that capture only a limited snapshot of executive opinion. The paper ‘CFOs Meet LLMs’ investigates the potential of large language models to address this challenge by simulating the responses of chief financial officers. It finds that, when prompted with firm-specific information and historical data, these models can accurately predict individual CFO optimism scores, a finding validated by robust fixed-effects regressions. Could this approach unlock a scalable, high-frequency source of expectations data, effectively creating digital twins of key decision-makers for financial research and policy?

Decoding CFO Sentiment: Beyond the Limitations of Traditional Forecasts

The ability to forecast economic shifts hinges significantly on understanding the expectations of Chief Financial Officers, as their insights reflect anticipated investment, hiring, and overall financial strategies within businesses. However, conventional methods for assessing these expectations – typically involving periodic surveys – present substantial limitations. These surveys are often hampered by lengthy data collection periods and considerable expense, delaying the availability of crucial information when timely responses are paramount. This sluggishness poses a challenge to policymakers and analysts seeking to implement proactive measures or accurately interpret current economic conditions; the time required for traditional data gathering can render insights outdated before they’re even analyzed, necessitating exploration into more efficient predictive methodologies.

While established economic indicators such as the Michigan Consumer Sentiment Index provide valuable snapshots of overall economic feeling, these metrics often fall short when it comes to discerning the nuanced concerns of the corporate sector. These broad-based surveys aggregate responses across diverse demographics, obscuring critical distinctions in expectations among different industries, company sizes, and financial positions. Consequently, policymakers and investors frequently lack the detailed insights needed to anticipate sector-specific downturns or identify emerging opportunities; a generalized sense of consumer optimism, for instance, doesn’t necessarily translate to increased capital expenditure plans within the manufacturing sector, or reveal anxieties about supply chain disruptions affecting retail businesses. This lack of granularity underscores the need for more targeted data collection and analytical techniques capable of capturing the specific anxieties and forecasts held by Chief Financial Officers – key decision-makers with a direct line of sight into corporate health.

Traditional methods of economic forecasting frequently rely on surveys, but these approaches suffer from a critical drawback: inherent delays between data collection and publication. This lag compromises the ability to accurately assess the current economic state, as conditions can shift significantly before survey results are available for analysis. Consequently, a growing need exists for predictive tools that can offer more immediate insights, leveraging alternative data sources and advanced analytical techniques to anticipate economic fluctuations in near real-time. Such tools promise to move beyond retrospective analysis, providing decision-makers with a more proactive understanding of the evolving business landscape and enabling timely interventions to mitigate risks or capitalize on emerging opportunities.

LLMs as Synthetic Economists: Replicating Financial Decision-Making

Large Language Models (LLMs), including GPT-5.4, present a viable alternative to traditional economic forecasting methods by replicating the cognitive processes of financial decision-makers. These models are trained on extensive datasets of financial reports, economic indicators, and textual data, enabling them to analyze complex information and generate predictions based on learned patterns of executive reasoning. Specifically, LLMs can process qualitative data – such as earnings call transcripts and analyst reports – and integrate it with quantitative financial data to simulate how a CFO might assess market conditions and make strategic projections. This capability allows for the generation of nuanced forecasts that go beyond purely statistical analysis, offering a more holistic and potentially accurate view of economic expectations.

Digital Twin CFOs are constructed by leveraging Large Language Models (LLMs) to replicate the anticipated responses of financial executives to survey questions. These models are not simply replicating existing data; they are designed to generate new responses based on their training and prompting. The process involves training the LLM on a corpus of financial reports, executive statements, and historical survey data to establish a baseline understanding of CFO reasoning and reporting tendencies. Through carefully crafted prompts, these Digital Twins can then be tasked with completing surveys, producing synthetic data that statistically mirrors the characteristics of actual CFO responses regarding topics such as capital expenditure plans, earnings expectations, and risk assessments. This allows for the creation of large-scale datasets for economic analysis without the time and expense of traditional survey methods.

LLM-based simulations of Chief Financial Officers (CFOs) are calibrated using specific firm characteristics – including industry, size, profitability, and leverage – to ensure the generated outputs reflect realistic financial perspectives. This grounding in quantifiable firm data allows these models to be prompted with hypothetical scenarios or economic indicators to predict CFO expectations regarding capital expenditures, hiring plans, and revenue forecasts. The resulting synthetic data offers a substantially faster and more scalable alternative to traditional survey-based methods for gauging business sentiment and economic outlook, circumventing the delays and costs associated with direct data collection from human respondents.
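The conditioning described above can be pictured as a prompt template filled in from firm data. The following is a minimal sketch, not the paper's actual prompt: the `FirmProfile` fields, the `build_cfo_prompt` helper, the wording of the question, and the 0 to 100 scale are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FirmProfile:
    """Hypothetical container for the firm characteristics named in the text."""
    industry: str
    revenue_musd: float   # size proxy, in millions of USD (assumed unit)
    profit_margin: float  # profitability as a fraction, e.g. 0.083
    leverage: float       # debt-to-assets ratio


def build_cfo_prompt(firm: FirmProfile, past_scores: Sequence[int] = ()) -> str:
    """Assemble a survey-style prompt; the wording is illustrative only."""
    lines = [
        "You are the CFO of a firm with the following profile:",
        f"- Industry: {firm.industry}",
        f"- Annual revenue: ${firm.revenue_musd:,.0f} million",
        f"- Profit margin: {firm.profit_margin:.1%}",
        f"- Leverage (debt/assets): {firm.leverage:.2f}",
    ]
    # Respondent history is optional, so the same template covers both the
    # history-conditioned and the history-free setting.
    if past_scores:
        history = ", ".join(str(score) for score in past_scores)
        lines.append(
            f"Your optimism ratings in previous quarters, oldest first: {history}"
        )
    lines.append(
        "On a scale from 0 to 100, how optimistic are you about your firm's "
        "financial prospects? Reply with a single integer."
    )
    return "\n".join(lines)


# Made-up example firm, not drawn from any real dataset.
example_firm = FirmProfile("Manufacturing", 1250.0, 0.083, 0.35)
print(build_cfo_prompt(example_firm, past_scores=[60, 65]))
```

The resulting string would then be sent to whichever model is being evaluated; the model call is left out because it depends on the provider.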

Synthetic surveys, produced via Large Language Models (LLMs), enable a more detailed evaluation of business sentiment than conventional survey methods. Traditional surveys are constrained by sample size, response rate, and the limitations of pre-defined question sets, often resulting in aggregated, high-level data. LLM-generated synthetic surveys overcome these limitations by producing responses at scale, tailored to specific firm characteristics, and capable of addressing nuanced or previously unconsidered scenarios. This allows for the dissection of sentiment across a wider range of variables and a more granular understanding of expectations, revealing subtleties in business outlook that would be difficult or impossible to capture with conventional approaches. The resulting data provides a higher resolution view of market sentiment, potentially identifying leading indicators and emerging trends with greater precision.

Validating LLM Forecasts: A Rigorous Assessment of Predictive Accuracy

The predictive accuracy of our Large Language Model (LLM)-generated expectations was evaluated through a comparison with quantitative data sourced from the Duke-Federal Reserve CFO Survey. This survey provides quarterly measurements of financial executives’ optimism, which served as the ground truth for assessing the LLM’s forecasting capabilities. By directly comparing the LLM’s predicted sentiment scores with the actual reported optimism levels from the CFO survey respondents, we quantified the degree to which the model’s expectations align with observed financial sentiment. This methodology enabled a rigorous, data-driven assessment of the LLM’s ability to forecast changes in CFO sentiment over time.

Respondent History Conditioning demonstrated a substantial impact on Large Language Model (LLM) performance when predicting individual responses within the Duke-Federal Reserve CFO Survey. Statistical analysis revealed that incorporating an individual respondent’s prior survey answers accounted for up to 49.4% of the variance observed in their current response. This indicates that patterns in past behavior are a strong predictor of future sentiment, and leveraging this historical data significantly enhances the LLM’s predictive capability compared to models which do not utilize such information.

Analysis reveals that the Large Language Model (LLM) exhibits limited predictive capability when operating without access to respondent history. Specifically, the model explains only 10.0% of the variance in individual responses when conditioned solely on current-period inputs. This indicates a substantial dependence on historical data to accurately forecast expectations; the inclusion of prior responses dramatically improves performance, highlighting the crucial role of temporal context in modeling individual sentiment and expectations within the Duke-Federal Reserve CFO Survey dataset.
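To make "variance explained" concrete, the sketch below computes the R-squared from regressing actual responses on model predictions, which for a single predictor with an intercept equals the squared Pearson correlation. The `variance_explained` helper and all numbers are hypothetical toy values chosen for illustration; they are not survey data, and the paper's own specification (for example, with fixed effects) may differ.

```python
def variance_explained(actual, predicted):
    """R-squared of a one-predictor regression of `actual` on `predicted`,
    computed as the squared Pearson correlation between the two series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)


# Hypothetical toy numbers, NOT data from the survey or the paper.
actual_optimism = [50, 60, 70, 80, 65]
pred_with_history = [52, 58, 71, 78, 66]     # model sees prior answers
pred_without_history = [65, 60, 66, 64, 70]  # model sees current inputs only

r2_with = variance_explained(actual_optimism, pred_with_history)
r2_without = variance_explained(actual_optimism, pred_without_history)
print(f"with history:    {r2_with:.3f}")
print(f"without history: {r2_without:.3f}")
```

The gap between the two printed values plays the same role as the gap between 49.4% and 10.0% reported above, although the toy figures are far more extreme than the real ones.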

To assess the predictive validity of the Large Language Model (LLM) generated sentiment scores, a time series analysis was conducted. This involved regressing quarterly data from the Duke-Federal Reserve CFO Survey on the LLM output, enabling evaluation of the relationship between predicted and actual CFO optimism over multiple time periods. The analysis specifically quantified the degree to which changes in the LLM score corresponded to changes in reported CFO sentiment, providing a statistically rigorous measure of predictive accuracy beyond simple point estimates. The resulting regression coefficient of 0.579 (t = 7.11, p < 0.01) indicates a statistically significant positive relationship between the LLM score and quarterly CFO optimism, demonstrating the model’s ability to track sentiment trends over time.

Statistical analysis reveals a clear positive relationship between the LLM-derived sentiment score and quarterly CFO optimism levels. When the LLM score is used as a predictor in a regression model, it carries a coefficient of 0.579: each one-unit increase in the LLM score is associated with a rise of approximately 0.579 units in CFO optimism. The relationship is statistically significant, with a t-statistic of 7.11 and a p-value below 0.01, meaning an association this strong would be unlikely to appear if the two series were unrelated.
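For readers who want to see where a coefficient and t-statistic of this kind come from, here is a minimal sketch of a one-predictor ordinary least squares fit with classical standard errors. The `ols_slope_tstat` helper and the toy quarterly series are hypothetical; the paper may well use a different specification, additional controls, or robust standard errors, so the numbers below are not meant to reproduce 0.579 or 7.11.

```python
import math


def ols_slope_tstat(x, y):
    """Fit y = alpha + beta * x by OLS and return (beta, std_error, t_stat),
    using the classical (homoskedastic) standard error of the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    std_error = math.sqrt(sigma2 / sxx)
    return beta, std_error, beta / std_error


# Hypothetical quarterly averages, NOT data from the survey or the paper.
llm_score = [50.0, 55.0, 60.0, 65.0, 70.0]
survey_optimism = [58.0, 61.5, 63.0, 67.5, 69.0]

beta, std_error, t_stat = ols_slope_tstat(llm_score, survey_optimism)
print(f"slope = {beta:.3f}, se = {std_error:.4f}, t = {t_stat:.2f}")
```

The slope is read exactly as in the text: a one-unit rise in the LLM score goes with a `beta`-unit rise in survey optimism, and the t-statistic is that slope divided by its standard error.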

A New Era of Economic Intelligence: Beyond Retrospection and Towards Proactive Foresight

Traditionally, economic analysis has largely been a retrospective exercise – interpreting past data to understand present conditions and cautiously forecast immediate futures. However, recent advancements utilizing Large Language Models (LLMs) are shifting this paradigm towards proactive intelligence. These models don’t simply react to reported figures; instead, they synthesize vast streams of unstructured data (news articles, social media, corporate reports) to anticipate shifts in economic sentiment before they manifest as concrete indicators. By discerning subtle patterns and correlations often missed by conventional methods, LLMs construct predictive models capable of identifying emerging trends and potential disruptions with greater speed and accuracy. This transition from reactive observation to proactive prediction represents a fundamental change, offering the potential to not just understand the economy, but to anticipate its movements and inform more resilient strategies.

Traditional methods of gauging business sentiment – often reliant on expensive surveys and limited data sets – are giving way to a significantly more efficient approach. This new methodology harnesses the power of large language models to analyze vast quantities of publicly available text, including news articles, social media posts, and earnings calls, to rapidly assess the collective mood of businesses. The resulting system isn’t merely faster, but demonstrably scalable; it can monitor sentiment across entire industries and geographies with minimal marginal cost, offering a dynamic and near real-time view of emerging trends. This cost-effectiveness unlocks opportunities for smaller organizations and policymakers to access sophisticated economic intelligence previously reserved for large institutions, fostering a more informed and responsive economic landscape.

Traditional economic forecasting often relies on painstakingly compiled surveys, a process both time-consuming and expensive. However, recent advancements demonstrate the power of rapidly generating synthetic survey data using large language models. This innovative methodology allows economists to simulate responses from vast populations, enabling robust scenario planning and stress-testing of economic forecasts with unprecedented speed and at significantly reduced cost. By adjusting parameters within these synthetic datasets, researchers can explore ‘what-if’ scenarios – assessing the potential impact of unforeseen events, policy changes, or shifts in consumer behavior – without the delays inherent in collecting real-world data. The resulting insights offer a proactive approach to economic analysis, bolstering resilience and informing more effective decision-making in the face of uncertainty.

The convergence of artificial intelligence and economic analysis is poised to redefine how decisions are made across vital sectors. This emerging field promises to move beyond traditional, often lagging, economic indicators, offering a continuously updated and nuanced understanding of market dynamics. Policymakers stand to benefit from more accurate forecasts and the ability to model the potential impacts of various interventions, while investors can leverage AI-driven insights to identify opportunities and mitigate risks with greater precision. For business leaders, this translates to improved strategic planning, enhanced resource allocation, and a more agile response to shifting consumer behaviors and competitive pressures. Ultimately, AI-powered economic intelligence isn’t simply about automating existing processes; it’s about unlocking a new level of foresight and creating a more resilient and adaptable economic landscape.

The pursuit of scalable sentiment analysis, as demonstrated by this work with Large Language Models, isn’t about finding the right model, but rigorously testing its predictive power against individual expectations data. The study’s approach – conditioning LLMs with firm-specific information and historical CFO responses – acknowledges the inherent subjectivity in forecasting. As Thomas Kuhn observed, “The more revolutionary the theory, the more numerous will be those who refuse to accept it.” This resistance isn’t necessarily irrational; it’s a demand for evidence. This research provides a method for generating that evidence, moving beyond broad surveys toward individualized, data-driven predictions of CFO optimism – and, crucially, offering a means to disprove those predictions through ongoing comparison with actual outcomes.

What’s Next?

The demonstration that a large language model can, with sufficient conditioning, mimic the optimism of a Chief Financial Officer is, predictably, not a revelation about CFOs. It is a revelation about the nature of prediction itself. A model, after all, is merely a compromise between knowledge and convenience – a distillation of past responses masquerading as foresight. The immediate implication is not a replacement for existing sentiment surveys, but a shifting of the cost structure. Scalability is attractive, but ‘optimal’ for whom remains a pointed question. The expense of gathering primary data is reduced, certainly, but at the price of introducing a new opacity: the internal workings of the model itself.

Future work will undoubtedly focus on refining the ‘firm-specific information’ used as prompts. But the more interesting challenge lies in addressing the inherent tautology. The model excels at predicting past optimism, given sufficient historical data. True predictive power, however, demands an ability to anticipate shifts in sentiment, not simply echo them. This necessitates exploring methods for incorporating external, and potentially disruptive, signals – economic indicators, geopolitical events, even the subtle shifts in language used by competitors.

Ultimately, the true test will not be whether the model can accurately forecast optimism, but whether it can expose the systematic biases embedded within that optimism. A perfect predictor is, paradoxically, a useless one. It merely confirms existing expectations. The value lies in identifying where the model – and, by extension, the CFO – is consistently, and demonstrably, wrong.

Original article: https://arxiv.org/pdf/2606.13812.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
