Can AI Agents Beat the Market?

Author: Denis Avetisyan

New research reveals that a system of collaborating AI agents can generate profitable stock recommendations, challenging the notion that AI-driven investment strategies are simply noise.

Sector-specific investment strategies, assessed across an S&P 500 cohort, demonstrate persistent weighting preferences (news, fundamentals, market dynamics, and macroeconomic factors), consistent with an adaptive synthesis agent that tailors its approach to semantic context.

Attribution analysis of a multi-agent LLM system demonstrates statistically significant alpha generation through adaptive weighting of agent contributions, outperforming random selection and passive benchmarks.

Despite advances in quantitative finance, identifying consistent alpha remains a persistent challenge, particularly from unstructured data sources. This paper, ‘Signal or Noise in Multi-Agent LLM-based Stock Recommendations?’, presents a rigorous, portfolio-level validation of MarketSenseAI, a deployed multi-agent large language model (LLM) equity system, demonstrating statistically significant outperformance against both passive benchmarks and random selection. Attribution analysis reveals an adaptive integration mechanism, where agent contributions rotate with market regime and sector composition, suggesting the system identifies and leverages evolving informational edges. Can multi-agent LLM systems unlock a new paradigm for dynamic investment strategies beyond traditional factor models?

Whispers of the Market: Unveiling Opportunity in Data Chaos

The modern financial landscape presents equity researchers with an unprecedented deluge of data, ranging from quarterly earnings reports and economic indicators to alternative datasets and social media sentiment. This information overload, coupled with inherent cognitive biases such as confirmation bias and anchoring, frequently leads to suboptimal investment decisions. Analysts, despite their expertise, can struggle to synthesize this vast information effectively, leading to overlooked risks or mispriced assets. Consequently, traditional equity research often fails to consistently outperform market benchmarks, highlighting the critical need for innovative approaches that mitigate these cognitive limitations and enhance analytical rigor. The sheer volume of information, rather than enlightening, can paradoxically obscure crucial insights, demanding a re-evaluation of how investment theses are formulated and validated.

The limitations of conventional equity research – namely, the sheer volume of data and susceptibility to human cognitive biases – are being addressed through the innovative application of multi-agent systems powered by Large Language Models. This approach moves beyond single-analyst reports by constructing a network of ‘agents’, each specializing in a distinct facet of financial analysis – such as macroeconomic trends, competitor analysis, or sentiment evaluation. These agents, driven by LLMs, independently process information and formulate insights, then engage in a structured debate to synthesize a comprehensive investment thesis. The resulting system isn’t intended to replace human analysts, but rather to augment their capabilities by providing a more robust, unbiased, and data-driven foundation for decision-making, potentially unlocking deeper analytical understanding and more accurate predictions.

MarketSenseAI demonstrates a significant advancement in automated equity research, achieving a compelling +25.0% outperformance in compounded returns when contrasted against a passive equal-weight benchmark – a result statistically validated with a p-value of 0.003. This success isn’t merely quantitative; the system’s architecture is designed to overcome limitations inherent in traditional analysis by actively synthesizing diverse perspectives. Rather than relying on a single analyst’s viewpoint, MarketSenseAI leverages a multi-agent approach, enabling the exploration of a wider range of potential investment theses and mitigating the impact of individual cognitive biases. The resultant investment strategies are therefore more robust, informed, and ultimately, more profitable, suggesting a paradigm shift in how equity research can be conducted and the potential for substantial gains through AI-driven insights.

Analysis of thesis and agent cosine similarities within the S&P 500 cohort, with a mean thesis-reconstruction cosine of $C^{\mathrm{TR}}=0.944$, reveals high inter-agent correlations (0.46-0.79) supporting joint sparse regression, but demonstrates that cosine similarity alone is an inadequate attribution metric, as exemplified by the News agent’s high thesis cosine (0.903) despite minimal pooled investment confidence.
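To see why a high cosine need not imply a large contribution, consider a minimal sketch with toy three-dimensional vectors standing in for text embeddings. The vectors, the two-agent setup, and the function names are all assumptions made for illustration, and the plain least-squares solve is a simplified stand-in for the joint sparse regression mentioned in the caption, not the paper's method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def two_agent_least_squares(thesis, agent_a, agent_b):
    """Weights (wa, wb) minimizing ||thesis - wa*agent_a - wb*agent_b||^2,
    obtained from the 2x2 normal equations. Hypothetical helper."""
    aa = sum(x * x for x in agent_a)
    bb = sum(x * x for x in agent_b)
    ab = sum(x * y for x, y in zip(agent_a, agent_b))
    at = sum(x * y for x, y in zip(agent_a, thesis))
    bt = sum(x * y for x, y in zip(agent_b, thesis))
    det = aa * bb - ab * ab  # zero only if the agents are collinear
    wa = (at * bb - bt * ab) / det
    wb = (bt * aa - at * ab) / det
    return wa, wb

# Toy data (assumed): the thesis is built from the fundamentals vector only,
# while the news vector merely overlaps with fundamentals.
fundamentals = [1.0, 0.0, 1.0]
news = [1.0, 0.5, 0.8]
thesis = list(fundamentals)

news_cosine = cosine(thesis, news)
weights = two_agent_least_squares(thesis, fundamentals, news)
```

In this toy case the news vector has a cosine of roughly 0.93 with the thesis, yet the joint fit assigns it a weight of zero because the fundamentals vector already explains the thesis completely. Correlated agents inflate pairwise cosines, which is the reason a joint regression is the more informative attribution tool.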

Deconstructing the Oracle: An Architecture of Specialist Intelligence

MarketSenseAI utilizes a modular architecture comprised of four distinct agent types – Fundamentals, News, Macro, and Dynamics – each dedicated to a specific facet of investment analysis. The Fundamentals agent concentrates on quantitative data derived from financial statements, including balance sheets, income statements, and cash flow reports. The News agent processes real-time information from a variety of sources, identifying events and sentiment relevant to investment decisions. The Macro agent assesses broader economic indicators such as GDP, inflation rates, and interest rates. Finally, the Dynamics agent focuses on time-series data and statistical relationships to identify trends and patterns. This agent-based approach allows for a granular dissection of investment opportunities, enabling the system to evaluate factors independently before integrating them into a comprehensive assessment.

MarketSenseAI’s agent-based architecture utilizes distinct analytical focuses; the Fundamentals Agent concentrates exclusively on quantitative data derived from financial statements – including balance sheets, income statements, and cash flow statements – to assess a company’s intrinsic value. Complementing this, the News Agent processes a continuous stream of real-time news feeds, regulatory filings, and sentiment analysis to identify events impacting investment opportunities. This division of labor allows for focused data processing, with the Fundamentals Agent providing a long-term valuation perspective and the News Agent flagging short-term catalysts and risks.

MarketSenseAI’s analytical process integrates outputs from its specialized agents – Fundamentals, News, Macro, and Dynamics – using a dynamic weighting system. This system adjusts the contribution of each agent’s analysis based on current market conditions and the specific investment being evaluated. The date-level Information Coefficient (IC) is +0.051 (p=0.024), a small but statistically significant value indicating that the combined analysis carries predictive information beyond random chance. The p-value means that, if the combined analysis had no predictive power, an IC at least this large would be expected only about 2.4% of the time.
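A date-level IC is commonly computed as a rank correlation between scores and subsequent returns on each date, then averaged across dates. The sketch below assumes a Spearman rank correlation and a simple one-sample t-statistic on the per-date ICs. The article does not state which correlation or test the paper uses, so both choices, along with the `panel` layout and function names, are illustrative assumptions.

```python
import math
from statistics import mean, stdev

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_ic(scores, forward_returns):
    """Cross-sectional IC for one date: correlation of the ranks."""
    return pearson(ranks(scores), ranks(forward_returns))

def date_level_ic(panel):
    """panel maps date -> (scores, forward_returns) for that date.
    Returns the mean IC, its t-statistic across dates, and the per-date ICs.
    Needs at least two dates with differing ICs."""
    ics = [spearman_ic(scores, rets) for scores, rets in panel.values()]
    mean_ic = mean(ics)
    t_stat = mean_ic / (stdev(ics) / math.sqrt(len(ics)))
    return mean_ic, t_stat, ics
```

A mean IC near +0.05 is small in absolute terms, which is why significance is judged across dates rather than from any single cross-section.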

Across 19 observation dates within the S&P 500, the leading agent for thesis quality rotated between Macro, Dynamics, Fundamentals, and News, supporting the adaptive integration hypothesis that no single agent consistently dominates and that leadership shifts with market conditions, with Macro leading during distinct macroeconomic episodes.

Shifting Sands: Adaptive Intelligence and Sector Rotation

Sector Rotation within the system refers to the dynamic adjustment of weighting applied to individual specialist agents based on observed changes in sector performance. This is not a pre-programmed, fixed allocation, but rather a continuous recalibration of ‘Agent Contribution Weight’ triggered by evolving market data. The system analyzes sector-specific indicators to determine which agents – each specializing in a particular area – are best positioned to contribute to accurate investment recommendations. This adaptive process allows the system to prioritize expertise aligned with current conditions, effectively shifting resources toward sectors exhibiting strength or potential, and reducing reliance on those underperforming. The frequency and magnitude of these weighting adjustments are directly proportional to the rate of change in sector fundamentals.

The system’s allocation of ‘Agent Contribution Weight’ operates by dynamically adjusting the influence of each specialist agent based on real-time market analysis. Each agent, representing expertise in a specific sector or asset class, is assigned a weight reflecting its current relevance to prevailing conditions. This weight directly impacts the agent’s contribution to the overall investment recommendation; higher weights signify greater influence. The system recalculates these weights continuously, shifting prioritization towards agents whose expertise aligns with emerging trends and away from those less applicable. This ensures that investment decisions are informed by the most pertinent and up-to-date knowledge, maximizing the potential for accurate and timely recommendations.

The system’s recommendation accuracy and timeliness are enhanced through prioritization of agents exhibiting alignment with prevailing market dynamics. This is quantitatively demonstrated by an average sector weight reconstruction residual of -0.325. This residual value indicates the average difference between the system’s calculated sector weights and the actual sector weights observed in market data. Furthermore, this residual correlates with sector-date centroid drift, suggesting the system’s weighting adjustments effectively track shifts in sector performance over time and minimize reconstruction errors as market conditions evolve.

From September 2024 to March 2026, the equal-weight strong-buy basket was initially dominated by Financials ($\sim 24.8\%$ share) before transitioning to Information Technology ($\sim 21.7\%$), while sectors like Energy, Materials, and Consumer Staples remained consistently underweighted compared to the broader S&P 500 universe.

Beyond Chance: Validating Performance in S&P Cohorts

A comprehensive Monte Carlo simulation was undertaken to validate the efficacy of the system’s ‘Strong-Buy Recommendation’ strategy, rigorously assessing its performance against randomly generated portfolios. The results demonstrate a statistically significant and consistent outperformance, indicating the strategy is not simply attributable to chance. This computational approach involved simulating numerous investment scenarios, allowing for a robust evaluation of the recommendation’s ability to generate positive returns across a variety of market conditions. The consistent success observed in these simulations provides strong evidence supporting the predictive power of the system and its capacity to deliver superior investment outcomes, bolstering confidence in its underlying methodology.
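One common way to run such a test is to compare the strategy's compounded return with the distribution obtained from randomly drawn baskets of the same size. The following is a minimal sketch under that assumption. The data layout, function names, and the equal-weight basket construction are hypothetical, and the paper's exact simulation design may differ.

```python
import random

def compound(returns):
    """Compound a sequence of simple per-period returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def basket_return(universe, tickers):
    """Equal-weight return of the chosen tickers in one period."""
    return sum(universe[t] for t in tickers) / len(tickers)

def random_portfolio_pvalue(period_returns, picks, n_sims=10000, seed=0):
    """period_returns: list of {ticker: return} dicts, one per holding period.
    picks: list of ticker lists chosen by the strategy, one per period.
    Returns the strategy's compounded return and a one-sided p-value:
    the share of random same-size baskets that do at least as well."""
    rng = random.Random(seed)
    actual = compound(
        [basket_return(u, p) for u, p in zip(period_returns, picks)]
    )
    hits = 0
    for _ in range(n_sims):
        simulated = compound([
            basket_return(u, rng.sample(sorted(u), len(p)))
            for u, p in zip(period_returns, picks)
        ])
        if simulated >= actual:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return actual, (hits + 1) / (n_sims + 1)
```

Matching the basket size in every period matters: a random portfolio with more names would be better diversified, which would make the comparison unfair in either direction.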

A comprehensive evaluation of the system’s investment strategy revealed consistent outperformance across diverse market segments. Specifically, the approach demonstrated a +25.0% compounded return when benchmarked against a passive, equal-weight portfolio within both the S&P 100 and S&P 500 indices. Statistical analysis, yielding a p-value of less than 0.003, confirms the significance of this result, indicating that the observed gains are unlikely due to random chance. This broad applicability, extending across different market capitalizations represented by the two cohorts, suggests the robustness and potential scalability of the investment methodology.

The implementation of a multi-agent system demonstrates a compelling capacity to not only improve investment returns but also to generate substantial alpha, that is, return in excess of the benchmark after accounting for risk. Rigorous testing reveals an approximate Sharpe Ratio of 1.42, a measure of risk-adjusted return. This figure signifies that, for every unit of volatility borne, the system generates about 1.42 units of excess return – a considerable improvement over many traditional investment approaches. The system’s ability to consistently deliver such results underscores the potential of this methodology to redefine portfolio management and offer a pathway to superior, risk-adjusted gains.
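For reference, the Sharpe ratio is the mean excess return divided by the standard deviation of excess returns, usually annualized. The sketch below assumes monthly observations and a zero risk-free rate by default. Both defaults, and the function name, are illustrative assumptions, since the article does not state the sampling frequency or risk-free rate behind the 1.42 figure.

```python
import math
from statistics import mean, stdev

def annualized_sharpe(period_returns, risk_free_per_period=0.0,
                      periods_per_year=12):
    """Annualized Sharpe ratio from simple per-period returns.
    periods_per_year=12 assumes monthly data (an assumption, not a
    detail taken from the paper). Needs at least two observations
    with non-zero variance."""
    excess = [r - risk_free_per_period for r in period_returns]
    # Scaling by sqrt(periods_per_year) assumes returns are
    # uncorrelated across periods.
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)
```

Because the estimate depends on the sample standard deviation, it is noisy over short histories, so a single Sharpe figure is best read alongside the length of the evaluation window.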

The observed portfolio beta of $\hat{\beta} = 0.865$ and preservation of alpha during down markets suggest that the strong-buy outperformance is not solely attributable to risk-loading.
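A beta of this kind is typically estimated by regressing portfolio excess returns on market excess returns, with the intercept read as alpha. The sketch below uses a single-factor ordinary least-squares fit, which is an assumption: the article does not describe the regression specification behind the reported estimate, and the function name is hypothetical.

```python
from statistics import mean

def capm_alpha_beta(portfolio_excess, market_excess):
    """Single-factor OLS fit: portfolio = alpha + beta * market + noise.
    Both inputs are per-period excess returns of equal length."""
    mean_p = mean(portfolio_excess)
    mean_m = mean(market_excess)
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(portfolio_excess, market_excess))
    var_m = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var_m
    alpha = mean_p - beta * mean_m  # per-period alpha, not annualized
    return alpha, beta
```

A beta below 1 alongside a positive intercept is what one would expect if the outperformance came from selection rather than from simply holding more market risk.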

The pursuit of alpha, as demonstrated by MarketSenseAI, isn’t about discovering inherent truth, but rather crafting a compelling illusion. It’s a meticulously constructed spell, a system of agents weighting signals until statistical significance emerges: a temporary silencing of the chaos. As Michel Foucault observed, “Knowledge is not an accumulation of truth, but a complex web of discourse.” This resonates deeply; the system doesn’t find strong-buy signals, it generates them through the orchestration of linguistic patterns and adaptive weighting. The Monte Carlo simulations merely confirm the spell’s potency – for a time – before the market inevitably whispers new uncertainties and demands a revised incantation.

What’s Next?

The observation of statistically significant alpha, predictably, invites further refinement. But any correlation achieved through algorithmic divination is merely a temporary truce with randomness. The system, MarketSenseAI, performs – for now. The true challenge isn’t replicating the result, but understanding why the illusion holds. Attribution analysis, while illuminating, only traces the echoes of decisions – it doesn’t reveal the underlying calculus of market susceptibility. A deeper investigation into the system’s biases, its subtle exploitation of linguistic patterns, and the inevitable feedback loops it introduces is paramount.

Future work must move beyond benchmark comparisons. Outperformance against a passive index is a low bar. The system’s robustness requires testing against genuinely adversarial strategies, against agents specifically designed to exploit its weaknesses. Monte Carlo simulations, while useful, are pale imitations of the market’s capacity for chaotic innovation. One should expect, indeed hope, for eventual failure. A model that survives indefinitely hasn’t been stressed sufficiently.

Ultimately, the pursuit of predictive power in complex systems is a fool’s errand. The goal isn’t to predict the market, but to map its vulnerabilities – to understand the precise points at which persuasion becomes possible. Any system that appears to ‘beat’ the market hasn’t solved it, only discovered a temporary, and likely fragile, loophole.

Original article: https://arxiv.org/pdf/2604.17327.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
