Author: Denis Avetisyan
New research shows that analyzing historical text with frozen language models can reveal economically relevant information missed by current market valuations.

Frozen large language models demonstrate the capacity to extract predictive signals from historical text, offering potential for improved portfolio construction based on previously uncaptured qualitative data.
Despite decades of research into market efficiency, extracting predictive signals from the vast and rapidly evolving landscape of public text remains a challenge. This paper, ‘ChatGPT as a Time Capsule: The Limits of Price Discovery’, investigates whether frozen large language models (LLMs) can capture economically relevant qualitative information, beyond contemporaneous valuations, embedded within historical textual data. We find that an LLM-derived outlook score, constructed from snapshots of public text, is positively associated with future equity returns, suggesting LLMs act as a ‘time capsule’ of predictive information. Could this approach unlock new avenues for alpha generation by efficiently processing dispersed qualitative data currently overlooked by traditional valuation methods?
Decoding the Noise: Uncovering Signals in a Congested Information Landscape
The contemporary financial landscape is increasingly defined by a deluge of qualitative data – news articles, social media posts, earnings call transcripts, and regulatory filings – creating a phenomenon termed ‘Narrative Congestion’. This isn’t simply about information overload; it actively impedes accurate price discovery as the sheer volume of textual data obscures genuine signals amidst the noise. Traditional analytical methods, largely reliant on quantitative metrics, struggle to efficiently process and synthesize these narratives, leading to market inefficiencies and potentially mispriced assets. Consequently, discerning investors find it increasingly difficult to differentiate between substantive insights and transient sentiment, hindering their ability to identify truly undervalued or overvalued opportunities and exacerbating the potential for market bubbles or corrections.
Conventional financial analysis often relies on structured data – numbers from balance sheets and income statements – but increasingly, crucial insights are buried within unstructured text like news articles, analyst reports, and social media feeds. Existing methods for processing this ‘Public Information Set’ frequently fall short because they struggle to effectively synthesize the nuances of language, leading to incomplete or inaccurate assessments of asset value. For instance, sentiment analysis alone might identify a generally positive tone surrounding a company, but fail to detect subtle warnings embedded within the text regarding supply chain vulnerabilities or regulatory risks. This inability to fully integrate qualitative data limits the capacity to identify genuinely undervalued or overvalued assets, creating opportunities for those who can more effectively decode the signals hidden within the sea of text.
The proliferation of digital information has created a vast ‘Public Information Set’ – encompassing news articles, social media posts, regulatory filings, and countless other textual sources – that overwhelms conventional financial analysis. Effectively navigating this data requires more than simply collecting it; a scalable solution is needed to process and interpret the subtle signals hidden within this noise. Researchers are now focused on applying techniques from Natural Language Processing and Machine Learning to automatically extract sentiment, identify key themes, and quantify the impact of textual information on asset prices. This automated approach aims to move beyond manual review, allowing for the continuous monitoring of a significantly larger data universe and ultimately, a more informed and efficient marketplace.
The LLM Outlook Score: A Rational Assessment of Firm Fundamentals
The LLM Outlook Score is a standardized metric, expressed as a z-score, designed to evaluate company prospects across all sectors. This score is generated through the analysis of the ‘Public Information Set’, encompassing data from sources like SEC filings, news articles, and press releases. The z-score normalizes the LLM’s assessment, allowing for comparison of outlooks between companies of differing sizes and across various industries. A higher score indicates a more positive projected outlook, while a negative score suggests potential challenges, with the magnitude of the score representing the degree of confidence in that assessment. The score’s sector-neutral construction is designed to reduce the bias inherent in cross-industry comparisons.
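As a concrete illustration, sector-neutral z-scoring of the kind described above can be sketched as follows. The paper does not publish its exact procedure; the raw outlook values, sector labels, and function name here are hypothetical stand-ins.

```python
import numpy as np

def sector_neutral_zscores(raw_scores, sectors):
    """Standardize raw outlook assessments within each sector.

    raw_scores: one raw outlook value per firm (hypothetical scale).
    sectors:    parallel array of sector labels.
    Returns z-scores with mean 0 and unit variance inside every sector,
    making firms comparable across industries.
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    sectors = np.asarray(sectors)
    z = np.empty_like(raw_scores)
    for s in np.unique(sectors):
        mask = sectors == s
        mu, sigma = raw_scores[mask].mean(), raw_scores[mask].std()
        # Guard against degenerate sectors with no dispersion.
        z[mask] = (raw_scores[mask] - mu) / sigma if sigma > 0 else 0.0
    return z
```

Within each sector the transformed scores have zero mean and unit standard deviation, so a score of +1 means "one standard deviation more positive than sector peers" regardless of industry.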
The LLM Outlook Score utilizes periodically saved Large Language Model states, termed ‘LLM Checkpoints’, as quantifiable representations of information available at specific points in time. These checkpoints function as proxies for assessing ‘Firm Fundamentals’ by providing a consistent basis for comparison across different reporting periods. By analyzing the LLM’s understanding of public information as captured in these checkpoints, the score enables a standardized and reproducible evaluation of a company’s core financial and operational strengths, independent of fluctuating market conditions or subjective interpretations. This approach ensures that changes in the score reflect genuine shifts in the underlying fundamentals, rather than variations in analytical methodology.
Traditional firm analysis relies on manually curated datasets and often focuses on structured financial statements, regulatory filings, and limited news sources. The LLM Outlook Score, however, utilizes large language models to automatically synthesize information from a significantly broader ‘Public Information Set’, including earnings calls, press releases, social media, and unstructured text data. This automated synthesis enables the consideration of a more comprehensive range of factors impacting firm prospects, moving beyond purely quantitative metrics to incorporate qualitative insights and sentiment analysis, thereby providing a holistic assessment unavailable through conventional methods.

Empirical Validation: The Predictive Power of the LLM Signal
Empirical analysis demonstrates a statistically significant relationship between the LLM Outlook Score and subsequent asset returns: the estimated coefficient of 0.0122 (t-statistic 4.25) implies that a one-unit increase in the score is associated with a 0.0122 increase in returns. Crucially, this predictive power persists after controlling for standard valuation metrics, suggesting the LLM Outlook Score captures information not already reflected in traditional financial analysis.
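The reported association (coefficient 0.0122, t-statistic 4.25) is the output of this style of regression with controls. A minimal sketch of OLS with coefficient t-statistics, run here on synthetic data rather than the paper's dataset (the variable names and effect sizes below are illustrative assumptions):

```python
import numpy as np

def ols_with_tstats(y, X):
    """OLS of y on X (X should include a constant column).
    Returns coefficient estimates and their t-statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # coefficient covariance matrix
    tstats = beta / np.sqrt(np.diag(cov))
    return beta, tstats

# Synthetic example: returns driven by an outlook score plus one control.
rng = np.random.default_rng(0)
n = 1000
outlook = rng.standard_normal(n)      # stand-in for the z-scored LLM signal
valuation = rng.standard_normal(n)    # stand-in for a valuation control
returns = 0.0122 * outlook + 0.05 * valuation + 0.1 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), outlook, valuation])
beta, t = ols_with_tstats(returns, X)
```

The estimated `beta[1]` recovers the outlook coefficient while `beta[2]` absorbs the valuation control, mirroring the paper's claim that the signal survives conditioning on standard metrics.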
Analysis reveals a strong positive correlation between prediction horizon and the strength of the LLM-derived predictive signal. Specifically, a Spearman correlation of 0.91 (p=0.03) demonstrates that as the prediction horizon extends, the predictive power of the LLM Outlook Score increases. This statistically significant relationship indicates that longer-term forecasts generated using the LLM exhibit greater accuracy and reliability compared to shorter-term predictions, suggesting the model effectively captures and extrapolates long-term trends.
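The horizon effect above is summarized by a Spearman rank correlation, i.e. the Pearson correlation of the two rank vectors. A minimal tie-free implementation on hypothetical horizon/strength pairs (the numbers below are illustrative, not the paper's data):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (assumes no tied values):
    the Pearson correlation of the two rank vectors."""
    rank = lambda v: np.argsort(np.argsort(v))
    return np.corrcoef(rank(x), rank(y))[0, 1]

# Hypothetical prediction horizons (months) and signal strengths.
horizons = np.array([1, 3, 6, 12, 24])
strength = np.array([1.9, 2.4, 2.2, 3.1, 4.0])
rho = spearman(horizons, strength)
```

Because it operates on ranks, the statistic captures the monotone "longer horizon, stronger signal" pattern without assuming the relationship is linear.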
Analysis indicates a consistent positive correlation between the level of ‘Model Sophistication’ employed and the strength of generated predictive signals. Increased model complexity, encompassing factors such as parameter count and architectural advancements, demonstrably improves the accuracy of forecasts. This finding supports the hypothesis that the LLM possesses and effectively utilizes analytical capabilities, as more advanced models consistently outperform their less sophisticated counterparts in predicting future outcomes. Quantitative evaluation confirms that gains in predictive power are directly attributable to enhancements in the LLM’s internal analytical processes.
Portfolio Implications: Translating Insight into Superior Risk-Adjusted Returns
Analysis reveals that investment portfolios built utilizing the LLM Outlook Score demonstrate notably strong performance characteristics. Specifically, these portfolios achieved a Sharpe Ratio of 2.31, a key metric evaluating risk-adjusted return, significantly outpacing the Sharpe Ratio of 1.31 recorded by the S&P 500 Price Index over the same period. This substantial difference indicates the LLM-driven strategy delivers a more favorable return for each unit of risk assumed, suggesting its potential to generate superior investment outcomes compared to traditional market-tracking approaches. The higher Sharpe Ratio effectively quantifies the LLM’s ability to identify and capitalize on opportunities for enhanced portfolio efficiency.
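The Sharpe ratio comparison above is a standard computation: mean excess return divided by its volatility, annualized. A minimal sketch for daily returns (the 252-trading-day annualization factor and zero risk-free rate are common conventions assumed here, not details from the paper):

```python
import numpy as np

def sharpe_ratio(returns, rf_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its volatility,
    scaled by the square root of the number of periods per year."""
    excess = np.asarray(returns, dtype=float) - rf_per_period
    return excess.mean() / excess.std() * np.sqrt(periods_per_year)
```

Applied to the daily return series of the LLM-driven portfolio and of the S&P 500, this is the metric behind the 2.31 vs. 1.31 comparison.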
Traditional factor exposure analysis, while valuable, often provides an incomplete picture of asset risk and return potential. This research demonstrates that incorporating an LLM-derived outlook score significantly expands this understanding. The LLM signal captures nuanced, forward-looking insights from textual data – news, reports, and social media – that conventional factors, such as value or momentum, frequently miss. By analyzing the sentiment and predictive content within these sources, the LLM identifies emerging trends and subtle shifts in market perception. This allows for a more holistic assessment of risk drivers, moving beyond historical correlations to incorporate expectations about future performance, ultimately leading to portfolios that are not only more efficient but also potentially more resilient to unforeseen market events.
Portfolio resilience benefits significantly from the incorporation of the LLM Outlook Score into established risk management frameworks. Analysis reveals a maximum drawdown of just 3.7% for portfolios utilizing this signal, a substantial improvement over the 9.7% experienced by the S&P 500 Price Index under similar conditions. This enhanced downside protection suggests the LLM provides valuable insight into potential market vulnerabilities, enabling proactive adjustments to mitigate losses and optimize overall portfolio performance. The data indicates that leveraging this predictive capability not only bolsters defenses during periods of market stress, but also contributes to a more consistent and favorable risk-adjusted return profile.
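Maximum drawdown, the metric behind the 3.7% vs. 9.7% comparison, is the largest peak-to-trough decline of the cumulative wealth path. A minimal sketch (the sample returns in the test are illustrative, not the paper's data):

```python
import numpy as np

def max_drawdown(returns):
    """Largest fractional decline from a running peak of cumulative wealth."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    running_peak = np.maximum.accumulate(wealth)
    return ((running_peak - wealth) / running_peak).max()
```

A strictly rising return series gives a drawdown of zero; a 5% loss immediately after a peak gives exactly 0.05.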
The study reveals a persistent informational inefficiency within market valuations, suggesting that even readily available textual data isn’t fully incorporated into pricing models. This echoes a foundational principle of rational inquiry: truth isn’t proclaimed, but rigorously tested against failure. As Mary Wollstonecraft observed, “It is time to try the method of reason.” The analysis of LLM-derived outlook scores demonstrates that qualitative data, captured in a ‘frozen’ time capsule of language, can offer predictive power beyond what contemporaneous metrics suggest. The persistence of these factor exposures highlights the difficulty of achieving true market efficiency, reinforcing the necessity of continual reassessment and the acceptance of inherent uncertainty in predictive modeling. Correlation, in this context, is indeed suspicion, not proof; the LLM’s signals require diligent examination, but they offer a valuable challenge to established assumptions.
What’s Next?
The observation that a static record of past language – a frozen LLM, effectively – can reveal valuation discrepancies isn’t particularly surprising. Markets are rarely, if ever, perfectly efficient. The more interesting question isn’t whether information is missed, but why it was missed, and what that implies about the nature of information processing itself. This work doesn’t solve the problem of alpha generation; it merely shifts the burden of explanation. Identifying statistically significant exposures doesn’t equate to understanding the cognitive biases, institutional constraints, or simple inattention that allowed these discrepancies to persist.
Future iterations will inevitably explore more sophisticated LLMs, expanded datasets, and increasingly complex factor models. But a relentless pursuit of predictive accuracy risks mistaking correlation for causality. A more fruitful path might involve focusing on the errors – systematically analyzing instances where the LLM-derived signals fail. It is in these failures that the true limitations of this approach – and, perhaps, the underlying inefficiencies of the market – will be revealed. Wisdom, after all, isn’t knowing what will happen, but understanding the boundaries of what can be known.
Ultimately, this research serves as a reminder that ‘alpha’ isn’t a property of the data, but a consequence of collective misjudgment. A truly robust methodology wouldn’t seek to exploit these errors, but to diagnose the systematic flaws in human reasoning that create them. That, however, is a problem for behavioral science, not quantitative finance.
Original article: https://arxiv.org/pdf/2604.21433.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
See also:
- All Itzaland Animal Locations in Infinity Nikki
- Persona PSP soundtrack will be available on streaming services from April 18
- Cthulhu: The Cosmic Abyss Chapter 3 Ritual Puzzle Guide
- Raptors vs. Cavaliers Game 2 Results According to NBA 2K26
- Paramount CinemaCon 2026 Live Blog – Movie Announcements Panel for Sonic 4, Street Fighter & More (In Progress)
- Gold Rate Forecast
- Dungeons & Dragons Gets First Official Actual Play Series
- DC Studios Is Still Wasting the Bride of Frankenstein (And Clayface Can Change That)
- 100 un-octogentillion blocks deep. A crazy Minecraft experiment that reveals the scale of the Void
- When Logic Breaks Down: Understanding AI Reasoning Errors
2026-04-24 22:07