News Sentiment Predicts Aluminum Price Swings

Author: Denis Avetisyan


New research shows that analyzing the emotional tone of news articles, using advanced AI, can significantly improve the accuracy of aluminum price forecasting.

Predictive modeling of aluminum prices demonstrates that focusing on topics like price movement, company news, and supply disruptions, and particularly combining these insights with sentiment analysis of Reuters headlines, yields consistently higher Sharpe ratios than broad market aggregation, with performance further refined by prioritizing forward-looking event news and acknowledging the inherent standard error in these forecasts.

Finetuned large language models provide topic- and event-conditional sentiment analysis for enhanced time series modeling of aluminum price fluctuations.

While commodity price prediction increasingly relies on textual data, the nuanced effectiveness of lightweight, finetuned large language models in capturing predictive signals remains largely unexplored. This study, ‘Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting’, addresses this gap by demonstrating that incorporating topic- and event-conditional sentiment from a finetuned Qwen3 model significantly improves aluminum price forecasting, particularly during periods of high market volatility. Specifically, Long Short-Term Memory (LSTM) models integrating sentiment data achieved a Sharpe ratio of 1.04, substantially outperforming tabular-data-only baselines with a Sharpe ratio of 0.23. How can a deeper understanding of news source, topic, and event-specific sentiment further refine financial modeling and risk management strategies?


The Inevitable Echo: Why Time Series Alone Fail

Conventional aluminum price forecasting methods frequently prioritize historical time series analysis, a practice increasingly challenged by the metal’s susceptibility to external shocks. While past price trends offer a baseline understanding, they often prove inadequate when confronted with unforeseen events – geopolitical instability, sudden shifts in demand, or disruptions to critical supply chains. This reliance on lagging indicators creates a significant vulnerability, as rapid price fluctuations stemming from real-world occurrences are not immediately reflected in historical data. Consequently, forecasts based solely on time series analysis can be slow to adapt, leading to inaccurate predictions and potentially substantial financial losses in a market where responsiveness to current events is paramount.

The reliance on purely historical data in aluminum price forecasting creates considerable vulnerability, as overlooking contemporary news and market sentiment introduces critical forecasting blind spots. Particularly in volatile markets, where sudden geopolitical events, supply chain disruptions, or shifts in economic policy can drastically alter prices, a failure to incorporate real-time information proves detrimental. Traditional models, absent this crucial context, struggle to anticipate and react to emergent trends, leading to inaccurate predictions and potentially significant financial consequences. The speed at which news cycles operate necessitates a dynamic approach, one that acknowledges the immediate impact of information on market behavior and allows for rapid adjustments to forecasting models, effectively mitigating the risks associated with unforeseen events.

Effective aluminum price forecasting increasingly demands a shift beyond solely analyzing historical data; incorporating real-time textual information proves crucial for navigating market volatility. Supply chain disruptions – from port congestion to raw material scarcity – are often signaled in news reports before impacting price charts, as are shifts in governmental policies related to tariffs or environmental regulations. Sophisticated analytical techniques now process these textual streams, extracting sentiment and identifying key events that influence market expectations. By integrating this information with traditional time series analysis, predictive models gain the capacity to anticipate price fluctuations driven by external factors, leading to more accurate and responsive forecasts in a dynamic global economy.

From November 2007 to April 2024, the predicted aluminum price closely tracked the actual aluminum price.

The Language of Markets: LLMs as Sentient Forecasters

Large Language Models (LLMs) provide a robust method for processing and interpreting the substantial volume of news data relevant to financial markets. These models utilize natural language processing (NLP) techniques to move beyond simple keyword detection and instead understand the contextual meaning and emotional tone – or sentiment – expressed within news articles. This capability allows for the automated identification of positive, negative, or neutral perspectives regarding specific assets, companies, or economic indicators. The extracted sentiment data can then be quantified and incorporated into predictive models, enabling data-driven investment strategies and risk assessment. LLMs surpass traditional methods by handling nuanced language, sarcasm, and complex sentence structures, leading to more accurate sentiment classification and ultimately, more informed decision-making.

Sentiment analysis performed with Large Language Models (LLMs) on news data makes it possible to quantify the relationship between news events and subsequent aluminum price fluctuations. In backtests on historical data, a strategy built on this approach generated a 292% total return. The methodology involves classifying the sentiment of each news item as positive, negative, or neutral, and then relating the resulting sentiment scores to subsequent aluminum price movements. Statistical analysis identifies which sentiment shifts matter most, allowing for a predictive model that exploits these relationships. The return figure represents total profit relative to an initial investment, net of losses from incorrect predictions over the evaluation period.
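As a rough illustration of relating sentiment scores to price movements, the sketch below averages headline labels per month and computes a Pearson correlation against a return series. The label-to-score mapping, the monthly bucketing, and the function names are assumptions made for this example; the study's exact encoding and statistics are not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical numeric encoding of the three sentiment classes (an assumption).
LABEL_SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def monthly_sentiment(headlines):
    """Average the sentiment score of (month, label) pairs within each month."""
    buckets = defaultdict(list)
    for month, label in headlines:
        buckets[month].append(LABEL_SCORE[label])
    return {month: mean(scores) for month, scores in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

To test whether sentiment leads prices, one would pair each month's sentiment with the following month's return before calling `pearson`.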

Language models adapted to financial text, such as FinBERT or the finetuned Qwen3 used in this study, tend to classify sentiment more reliably than general-purpose models used off the shelf. Domain adaptation, whether through pre-training on financial news, filings, and reports (as with FinBERT) or through task-specific finetuning (as with Qwen3), helps a model handle the nuances of financial language. Specifically, the Qwen3 model, when incorporated into a forecasting model designed to predict aluminum price movements, achieved an R-squared (R²) score of 0.89. This metric indicates that approximately 89% of the variance in aluminum price movements is explained by the model, underscoring the contribution of sentiment features derived from a specialized language model.
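For readers who want the R-squared figure made concrete, here is a minimal sketch of the standard coefficient-of-determination formula, one minus the ratio of residual to total sum of squares. The function name is illustrative, and nothing here reproduces the study's data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A score of 0.89 therefore means the model's squared error is about 11% of what a constant mean forecast would incur on the same data.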

Topic Analysis is a crucial preprocessing step for sentiment-based aluminum price prediction. This process involves identifying the dominant themes present within news articles and filtering for those directly relevant to aluminum market dynamics. By focusing analysis on topics such as production levels, supply chain disruptions, geopolitical factors impacting raw material sourcing, and demand from key industries like automotive and aerospace, the model minimizes noise from extraneous information. This targeted approach ensures that sentiment scores accurately reflect market-moving events, improving the overall predictive power of the forecasting model and increasing the reliability of derived trading signals.
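The filtering step described above might look something like the following sketch, which keeps only headlines tagged with aluminum-relevant topics and averages sentiment per topic. The topic labels, the record layout (`topic` and `score` keys), and the function name are hypothetical; in practice the tags would come from the finetuned classifier.

```python
from statistics import mean

# Illustrative topic labels (an assumption, not the study's taxonomy).
RELEVANT_TOPICS = {"price movement", "company news", "supply disruption"}


def topic_conditional_sentiment(records, topics=RELEVANT_TOPICS):
    """Return the mean sentiment score per retained topic.

    Each record is a dict with a 'topic' label and a numeric 'score';
    records whose topic is not in `topics` are dropped as noise.
    """
    by_topic = {}
    for rec in records:
        if rec["topic"] in topics:
            by_topic.setdefault(rec["topic"], []).append(rec["score"])
    return {topic: mean(scores) for topic, scores in by_topic.items()}
```

Keeping one aggregate per topic, rather than a single pooled score, lets a downstream model weight each topic separately.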

A workflow combines financial data (4,152 rows from March 2007 to April 2024) with news headlines (Reuters: 4,963; Dow Jones: 11,581; China News Service: 8,970), processed via sentiment analysis (positive, negative, neutral) and combined with numerical data to predict monthly aluminum prices using time series models.

The Test of Time: Validating Performance Through Rigorous Backtesting

Walk-forward validation is a performance evaluation technique designed to mimic real-world use of a forecasting model by sequentially training and testing on expanding datasets. The process begins by training the model on an initial period of historical data and then testing its predictive accuracy on a subsequent, held-out period. This test period is then added to the training data, and the model is retrained. The cycle repeats across multiple time periods, simulating how the model would perform as new data becomes available. Because each test period reflects different market conditions, walk-forward validation gives a more realistic and robust assessment of generalizability than a single train/test split, and it helps reveal performance degradation over time.

Walk-forward validation assesses model performance by iteratively training on a historical data window and testing on subsequent, unseen data. This process simulates real-world trading by repeatedly re-training the model as new data becomes available, effectively mimicking how the model would adapt to changing market conditions. By evaluating performance across multiple out-of-sample periods, walk-forward validation provides a more reliable estimate of future performance than traditional backtesting, which can be susceptible to overfitting – where the model learns patterns specific to the historical training data and fails to generalize to new data. The technique reduces optimistic bias by preventing the model from benefiting from knowledge of future events during the training phase, providing a more robust assessment of its predictive capabilities.
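The expanding-window procedure can be sketched in a few lines. The split sizes, the function names, and the naive last-value forecaster below are placeholders chosen for illustration; the study's actual models (such as the LSTM) would take the place of `fit_predict`.

```python
def walk_forward_splits(n_obs, initial_train, test_size):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train
    while start + test_size <= n_obs:
        # Train on everything before `start`, test on the next block only.
        yield range(0, start), range(start, start + test_size)
        start += test_size


def walk_forward_errors(series, initial_train, test_size, fit_predict):
    """Collect out-of-sample absolute errors across all folds."""
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(series), initial_train, test_size):
        train = [series[i] for i in train_idx]
        preds = fit_predict(train, len(test_idx))
        errors.extend(abs(series[i] - p) for i, p in zip(test_idx, preds))
    return errors


def last_value_forecast(train, horizon):
    """Naive placeholder model: repeat the last observed value."""
    return [train[-1]] * horizon
```

Since every test block lies strictly after its training data, no fold can see future observations, which is the property that guards against look-ahead bias.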

The forecasting model integrates data feeds from Reuters News, Dow Jones Newswires, and China News Service to achieve comprehensive news coverage. These sources provide a diverse range of financial and economic reporting, encompassing global markets, company-specific news, and macroeconomic indicators. The inclusion of China News Service is particularly important for capturing events and sentiment impacting Asian markets, which often have limited representation in Western-centric news feeds. Data ingestion from these multiple sources allows the model to account for a broader spectrum of potentially market-moving events than would be possible with a single news provider.

Event Type Classification systematically categorizes news events based on their nature – for example, earnings announcements, macroeconomic data releases, or geopolitical incidents – to quantify the correlation between specific event types and subsequent price movements. This granular categorization extends beyond simple sentiment analysis; it identifies which types of news have the most statistically significant impact on asset prices, allowing for a more nuanced understanding of market reactions. The resulting data informs model parameters and weighting schemes, improving the accuracy of price predictions by prioritizing event types with demonstrably higher predictive power and reducing the influence of less relevant information. This refined analysis facilitates the development of trading strategies tailored to specific event-driven market dynamics.

The analysis integrates multiple models, sentiment data sources, and time windows to provide a comprehensive assessment.

From Prediction to Profit: The Inevitable Outcome of Informed Strategy

The core of this research lies in a trading strategy directly informed by the predictive capabilities of the developed model. This strategy isn’t simply about anticipating price changes; it’s a systematic approach to capitalizing on those predictions through simulated trades. Predicted price movements trigger buy or sell signals, allowing the model to virtually execute trades based on its analysis of market sentiment and historical data. The strategy’s design prioritizes not only accurate predictions but also risk management, incorporating mechanisms to limit potential losses while maximizing returns from correctly anticipated movements. This creates a closed-loop system where the model’s predictive power translates directly into actionable trading decisions, forming the basis for evaluating its practical financial viability.
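One simple way to turn forecasts into simulated trades is a threshold rule on the predicted return, sketched below. The long/short/flat convention, the `threshold` parameter, and the function names are assumptions for illustration; the article does not spell out the study's exact trading rule or its risk controls.

```python
def signals_from_forecasts(current_prices, predicted_prices, threshold=0.0):
    """Map each forecast to +1 (buy), -1 (sell), or 0 (stay flat).

    A position is taken only when the predicted return exceeds `threshold`
    in absolute value, a crude stand-in for a risk filter.
    """
    signals = []
    for now, pred in zip(current_prices, predicted_prices):
        expected_return = pred / now - 1.0
        if expected_return > threshold:
            signals.append(1)
        elif expected_return < -threshold:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


def strategy_returns(signals, realized_returns):
    """Per-period strategy return: position times the realized market return."""
    return [s * r for s, r in zip(signals, realized_returns)]
```

Raising `threshold` trades less often, giving up some opportunities in exchange for acting only on stronger predicted moves.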

Rigorous backtesting revealed that the trading strategy, driven by the sentiment-integrated model, delivered a substantial 292% cumulative return over the evaluation period. This performance wasn’t merely positive; it significantly exceeded the returns of a baseline strategy that did not incorporate sentiment analysis, achieving a 161% outperformance. Key to this assessment was the Sharpe ratio, a measure of risk-adjusted return: LSTM models integrating sentiment data reached 1.04, against 0.23 for tabular-data-only baselines. The results suggest that incorporating linguistic insights derived from large language models can materially improve financial forecasting and, consequently, trading outcomes, offering a compelling case for the practical application of these technologies in investment strategies.
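The two headline metrics follow standard definitions, shown in the minimal sketch below. The monthly annualization factor of 12 and the zero risk-free rate are assumptions chosen to match a monthly forecasting setup; the study's own conventions may differ.

```python
from statistics import mean, stdev


def cumulative_return(returns):
    """Compound per-period returns into a total return (2.92 means 292%)."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * periods_per_year ** 0.5
```

Under this definition, a 292% cumulative return corresponds to the initial investment growing by a factor of 3.92.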

Financial markets don’t exist in a state of constant stability; instead, they cycle through periods of high and low volatility, known as volatility regimes. Recognizing these shifts is paramount for any successful trading strategy. A model optimized for calm markets may falter dramatically when volatility spikes, and vice versa. Adaptive strategies therefore adjust position sizes and risk exposure dynamically based on the prevailing regime, often using techniques such as GARCH modeling or simpler volatility indicators to gauge current conditions and anticipate future fluctuations. By acknowledging and responding to these regimes, a trading strategy aims to preserve capital during turbulent times and capitalize on opportunities when markets are stable, leading to more consistent returns.
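As a simple stand-in for regime detection, the sketch below labels each period by its trailing realized volatility. The rolling window, the cut-off values, and the function names are all hypothetical; the article does not say how the study drew its high, medium, and low boundaries, and a GARCH model would be a more sophisticated alternative.

```python
from statistics import stdev


def rolling_volatility(returns, window):
    """Sample std dev of each trailing window; None until the window is full."""
    vols = []
    for i in range(len(returns)):
        if i + 1 < window:
            vols.append(None)
        else:
            vols.append(stdev(returns[i + 1 - window : i + 1]))
    return vols


def label_regime(vol, low_cut, high_cut):
    """Classify a volatility reading using two hypothetical cut-offs."""
    if vol is None:
        return None
    if vol < low_cut:
        return "low"
    if vol > high_cut:
        return "high"
    return "medium"
```

Performance metrics can then be computed separately for the periods in each regime, which is the kind of breakdown the Sharpe ratio comparison by volatility scenario relies on.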

The synergy between large language models and established financial validation offers a compelling pathway for improved forecasting. This research demonstrates that incorporating sentiment extracted from textual data – in this case, financial news – can significantly enhance predictive accuracy when paired with rigorous backtesting and performance metrics. Rather than relying solely on historical price data, the integrated approach leverages the nuanced insights captured within natural language, potentially identifying market shifts before they are fully reflected in trading prices. The resulting strategy’s substantial outperformance, a 161% return advantage, validates the power of this combined methodology, suggesting a future where LLM-driven sentiment analysis becomes a standard component of practical, data-driven financial modeling and investment strategies.

Portfolio performance, measured by Sharpe ratio, varied by strategy (tabular, tabular+sentiment, or sentiment-only, using Qwen or Reuters) and volatility scenario (high: 28 months, medium: 106 months, low: 66 months), with error bars indicating ±1 standard error and orange bars highlighting the best-performing strategy in each case.

The pursuit of predictive accuracy, as demonstrated by this work on aluminum price forecasting, often feels less like construction and more like tending a garden. The models bloom with initial promise, yet inevitably succumb to the shifting conditions of the market – a testament to the inherent limitations of any system attempting to capture complex realities. As Claude Shannon observed, “The most important thing is to get the message across, not to be clever.” This paper cleverly employs sentiment analysis, but the true value lies in acknowledging that even the most sophisticated tools are merely approximations, and that adaptability, not rigid optimization, is the key to navigating the unpredictable currents of financial markets. Scalability, in this context, isn’t about handling more data, but about preserving flexibility in the face of constant change.

What Lies Ahead?

This exploration into sentiment’s influence on aluminum pricing reveals, predictably, that markets are not governed by rational actors, but by cascades of patterned irrationality. The refinement of large language models to detect these patterns is not a solution, but a more precise mapping of the problem. The improved forecasts are not triumphs of prediction, but temporary reprieves from chaos, a postponement achieved through increasingly elaborate architecture. One anticipates the inevitable arrival of novel noise – a black swan event uncaptured by the training data, a shift in the very language of market psychology.

The true challenge lies not in maximizing forecast accuracy, but in designing systems resilient to inherent unpredictability. To treat sentiment as a feature is to misunderstand its nature. It is not a signal to be decoded, but an emergent property of a complex system. Further work will undoubtedly focus on incorporating additional data streams, refining model architectures, and chasing ever-smaller gains in predictive power. But these are merely local optimizations within a fundamentally unstable system.

There are no best practices – only survivors. The architecture of financial modeling is how one postpones chaos, not defeats it. The next generation of models will not be defined by their ability to predict, but by their capacity to adapt, to learn from failure, and to gracefully degrade when confronted with the inevitable storm. Order is, after all, just cache between two outages.


Original article: https://arxiv.org/pdf/2603.09085.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-11 16:14