Author: Denis Avetisyan
New research shows how advanced language models can accurately gauge policy uncertainty from textual sources, offering a powerful upgrade to traditional methods.

Large language models significantly improve the measurement of economic policy uncertainty from text, enabling more robust and multilingual analysis.
Measuring economic phenomena from textual data often relies on imperfect proxies, limiting the scope and reliability of empirical analysis. This challenge is addressed in ‘Narratives to Numbers: Large Language Models and Economic Policy Uncertainty’, which demonstrates that large language models (LLMs) substantially improve the accuracy and expand the reach of quantifying policy uncertainty. By moving beyond traditional keyword-based approaches, the study constructs novel indices, including a new nineteenth-century U.S. measure, and reveals the potential of LLMs as explicit measurement tools. Could this paradigm shift unlock a more nuanced understanding of economic forces hidden within vast archives of textual data?
Navigating the Murky Waters: The Challenge of Measuring Economic Policy Uncertainty
Effective economic forecasting and strategic business planning depend heavily on a clear understanding of prevailing economic conditions, but anticipating the impact of future policy shifts requires gauging Economic Policy Uncertainty (EPU). While seemingly straightforward, accurately measuring EPU presents a significant challenge. Traditional methods, often relying on broad economic indicators or surveys, frequently fail to capture the subtle, anticipatory anxieties that drive investment decisions and market volatility. These conventional approaches struggle to differentiate between general economic concern and uncertainty specifically tied to potential governmental actions, leading to imprecise assessments. Consequently, policymakers and investors alike may operate with incomplete information, potentially leading to suboptimal outcomes and increased economic risk. A more nuanced approach is therefore needed to effectively quantify this crucial, yet elusive, component of the economic landscape.
Current methods for quantifying Economic Policy Uncertainty (EPU), such as the widely used Baker, Bloom, and Davis Index, often function by tracking the frequency of specific keywords appearing in news articles. While providing a valuable initial assessment, this approach inherently struggles with the subtleties of language. The simple presence of terms like “uncertainty” or “regulation” doesn’t fully capture the degree of uncertainty or the specific nature of policy concerns. Nuance is lost; for instance, a statement acknowledging potential regulatory changes might be flagged as uncertain, even if the changes are clearly defined and anticipated. This reliance on keyword counts risks misinterpreting rhetorical devices, sarcasm, or indirect expressions of concern, ultimately leading to an incomplete and potentially misleading picture of actual economic anxieties as reflected in textual data.
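To make the mechanics of this baseline concrete, the sketch below computes a keyword-frequency flag in the spirit of the Baker, Bloom, and Davis approach; the word lists and sample articles are illustrative placeholders, not the official index specification.

```python
# Illustrative sketch of a keyword-frequency EPU proxy: an article is flagged
# if it contains at least one term from each of the "economy", "policy", and
# "uncertainty" word lists. The lists and sample articles are hypothetical,
# not the official Baker-Bloom-Davis specification.
import re

ECONOMY = {"economy", "economic"}
POLICY = {"regulation", "deficit", "congress", "legislation", "federal reserve", "white house"}
UNCERTAINTY = {"uncertain", "uncertainty"}

def contains_any(text: str, terms: set[str]) -> bool:
    text = text.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)

def keyword_epu_flag(article: str) -> bool:
    """True if the article mentions economy, policy, and uncertainty terms."""
    return all(contains_any(article, terms) for terms in (ECONOMY, POLICY, UNCERTAINTY))

articles = [
    "Uncertainty over new federal reserve regulation weighs on the economy.",
    "The economy grew last quarter as exports rebounded.",
]
share_flagged = sum(keyword_epu_flag(a) for a in articles) / len(articles)
print(f"Share of articles flagged as EPU-related: {share_flagged:.2f}")
```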
To move beyond simplistic keyword counts, researchers are increasingly turning to sophisticated text classification techniques for a more nuanced understanding of economic policy uncertainty. These methods, leveraging advancements in natural language processing and machine learning, analyze not just the presence of uncertainty-related terms, but also the semantic context and sentiment surrounding them. Algorithms can now discern subtle expressions of uncertainty – for instance, distinguishing between a statement expressing genuine concern about policy shifts and one merely mentioning the possibility of change. By training models on large datasets of economic news, policy statements, and financial reports, it becomes possible to categorize textual data based on the degree and nature of expressed uncertainty, providing a more granular and reliable measure for economists and policymakers. This shift towards advanced classification promises a more accurate reflection of market anxieties and a stronger foundation for informed economic decision-making.
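As a concrete illustration of classification that reads context rather than counting words, the following sketch uses an off-the-shelf zero-shot classifier from the Hugging Face transformers library; the model choice and candidate labels are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch of context-aware classification with an off-the-shelf zero-shot model
# (Hugging Face transformers). The model and labels are illustrative choices.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = ("The ministry said the proposed tax reform may be delayed "
        "until the fiscal outlook becomes clearer.")
labels = ["expresses economic policy uncertainty",
          "does not express economic policy uncertainty"]

result = classifier(text, candidate_labels=labels)
# result["labels"] is sorted by descending score; result["scores"] aligns with it.
print(dict(zip(result["labels"], result["scores"])))
```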

From Surface Signals to Deep Understanding: Leveraging Large Language Models
Traditional text classification methods relied heavily on keyword matching, treating text as a bag of words and failing to account for word order or semantic meaning. Large Language Models (LLMs) represent a substantial advancement by incorporating contextual understanding and semantic relationships into the classification process. Instead of simply identifying the presence of specific terms, LLMs analyze the surrounding text to determine the meaning and intent, enabling them to differentiate between nuanced language and identify the true subject matter with greater accuracy. This is achieved through complex neural network architectures trained on massive datasets, allowing the model to learn the relationships between words and concepts and, consequently, classify text based on its overall meaning rather than isolated keywords.
Models such as BERT, Longformer, and Llama 3.1 demonstrate proficiency in analyzing textual data to detect nuanced indicators of economic policy uncertainty (EPU). These models utilize transformer architectures and extensive pre-training on large corpora, enabling them to move beyond simple keyword identification to assess the semantic context of statements related to fiscal, monetary, and regulatory policies. Specifically, they can identify uncertainty expressed through conditional language, subjective phrasing, and references to potential future events. Performance is measured through correlation with established EPU indices and the ability to predict economic indicators, consistently outperforming traditional methods reliant on hand-crafted dictionaries or bag-of-words approaches. The models’ ability to handle long-range dependencies, particularly in the case of Longformer, is crucial for interpreting complex policy documents and news articles.
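A minimal inference sketch with a BERT-style sequence classifier is shown below; the checkpoint name `my-epu-bert` is a hypothetical placeholder for weights fine-tuned on labeled EPU data, and the label convention (label 1 = EPU) is an assumption.

```python
# Sketch of inference with a BERT-style classifier fine-tuned for EPU detection.
# "my-epu-bert" is a placeholder checkpoint, not a published model; substitute
# your own fine-tuned weights.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "my-epu-bert"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)  # 2 labels: EPU / not EPU
model.eval()

text = "Lawmakers remain divided, leaving the timing of the tariff changes unclear."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits
prob_epu = torch.softmax(logits, dim=-1)[0, 1].item()  # assumes label 1 = "EPU"
print(f"P(article discusses economic policy uncertainty) = {prob_epu:.3f}")
```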
The Attention Mechanism is a core component enabling the performance of Large Language Models (LLMs) in tasks like text classification. Unlike sequential processing methods, Attention allows the model to weigh the importance of different words in the input sequence when processing each word. This is achieved through the calculation of attention weights, which determine the degree to which each input word contributes to the representation of other words. Specifically, the model computes a weighted sum of the input embeddings, where the weights are derived from the relationships between the words – effectively allowing the model to ‘focus’ on the most relevant parts of the text when making predictions. The mechanism typically involves three learned weight matrices – Query, Key, and Value – used to compute attention scores based on dot products or other similarity functions, and is crucial for handling long-range dependencies within text.
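The computation can be written compactly as $\text{Attention}(Q, K, V) = \text{softmax}(QK^\top / \sqrt{d_k})V$; the NumPy sketch below implements exactly this, with random stand-ins for the projected token embeddings.

```python
# Minimal NumPy sketch of scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Q, K, V are normally produced from the same token embeddings via learned
# projections; here they are random stand-ins for illustration.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key dimension
    return weights @ V, weights                      # weighted sum of values + attention map

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8                                  # 5 tokens, 8-dimensional head
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
output, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))                                 # each row sums to 1
```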

Rigorous Assessment: Optimizing Performance and Ensuring Reliability
Accurate evaluation of binary text classification models necessitates the selection of appropriate performance metrics beyond overall accuracy, particularly when class imbalance exists. We utilize Youden’s Index, calculated as $J = \text{Sensitivity} + \text{Specificity} - 1$, to determine the optimal classification threshold. This index maximizes the model’s ability to correctly identify both positive and negative cases by balancing sensitivity (true positive rate) and specificity (true negative rate). By optimizing for Youden’s Index, we achieve a threshold that best differentiates between classes, leading to more reliable and robust classification results compared to relying solely on default threshold values.
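The sketch below illustrates threshold selection by maximizing Youden’s $J$ (equivalently, $TPR - FPR$) along an ROC curve; the labels and scores are synthetic, and scikit-learn’s `roc_curve` is assumed as the tooling rather than the paper’s exact pipeline.

```python
# Sketch: pick the classification threshold that maximizes Youden's J,
#   J = sensitivity + specificity - 1 = TPR - FPR,
# using scikit-learn's ROC utilities on synthetic labels and scores.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=500)                                 # synthetic labels
scores = np.clip(y_true * 0.3 + rng.normal(0.35, 0.25, 500), 0, 1)    # synthetic model scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
j = tpr - fpr                                  # Youden's J at each candidate threshold
best = np.argmax(j)
print(f"Optimal threshold = {thresholds[best]:.3f}, J = {j[best]:.3f}")
```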
Prior to utilizing Large Language Models (LLMs) for analysis of historical texts, Optical Character Recognition (OCR) is a critical preprocessing step. Historical documents are frequently encountered as scanned images or non-searchable PDF files; OCR technology converts these visual representations into machine-readable text formats. This conversion is essential because LLMs require text input to perform tasks such as named entity recognition, sentiment analysis, or topic modeling. Without OCR, the documents remain inaccessible to these analytical tools, preventing effective data extraction and subsequent analysis. The accuracy of OCR significantly impacts the quality of downstream LLM processing; therefore, employing robust OCR engines and post-processing error correction techniques is vital for reliable results.
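A minimal preprocessing sketch with the pytesseract wrapper around Tesseract is given below; the image path is a placeholder and a local Tesseract installation is assumed.

```python
# Sketch: convert a scanned page to machine-readable text with Tesseract OCR
# before any LLM-based classification. Requires the Tesseract binary to be
# installed locally; "scan_1885_page_03.png" is a placeholder path.
from PIL import Image
import pytesseract

image = Image.open("scan_1885_page_03.png")          # scanned newspaper page (placeholder)
raw_text = pytesseract.image_to_string(image, lang="eng")

# Light post-processing: rejoin words hyphenated across line breaks and
# collapse whitespace, two common clean-up steps for historical print.
clean_text = " ".join(raw_text.replace("-\n", "").split())
print(clean_text[:200])
```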
The integration of historical text analysis with robust text classification techniques enables the longitudinal tracking of Economic Policy Uncertainty (EPU). Recent evaluations demonstrate that fine-tuned Large Language Models (LLMs) significantly outperform traditional keyword-based models in this application, achieving a 46% relative improvement in F1 score. This performance gain indicates that LLMs are more effective at accurately identifying and classifying text indicative of EPU, offering more reliable data for policymakers and researchers studying economic trends and their influencing factors. The increased accuracy facilitates a more nuanced understanding of EPU fluctuations over time, potentially informing more effective policy decisions and economic forecasting.
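For illustration only, the sketch below shows how such an F1 comparison between a keyword baseline and a fine-tuned LLM might be computed on a labeled validation set; the labels and predictions are invented placeholders, so the numbers will not reproduce the paper’s reported gain.

```python
# Sketch: compare a keyword baseline and an LLM classifier on a labeled
# validation set via F1, and report the relative improvement. The arrays
# below are placeholders for real annotations and model outputs.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]        # human-annotated EPU labels
keyword_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # keyword-rule predictions
llm_pred = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1]      # fine-tuned LLM predictions

f1_keyword = f1_score(y_true, keyword_pred)
f1_llm = f1_score(y_true, llm_pred)
rel_gain = (f1_llm - f1_keyword) / f1_keyword
print(f"Keyword F1 = {f1_keyword:.2f}, LLM F1 = {f1_llm:.2f}, relative gain = {rel_gain:.0%}")
```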

Beyond the Numbers: Acknowledging Uncertainty and Charting Future Directions
Quantifying economic policy uncertainty (EPU) inherently involves measurement error, a reality stemming from the subjective nature of news and the complexities of natural language processing. This isn’t a flaw to be eliminated, but rather a constant to be acknowledged and addressed. Researchers are increasingly focused on understanding the sources of these errors, including ambiguity in media reporting and the limitations of automated text analysis, and developing methods to mitigate their influence. Sophisticated statistical techniques, such as Bayesian modeling and error-robust estimation, are employed to account for uncertainty in EPU indices. Ignoring measurement error can lead to overstated conclusions about the impact of policy uncertainty on economic outcomes; therefore, transparent reporting of error bounds and sensitivity analyses are crucial for ensuring the reliability and validity of EPU research and its implications for policymakers.
The analytical power of Economic Policy Uncertainty (EPU) measurement isn’t limited by language; recent advancements in multilingual modeling significantly broaden its scope. These techniques allow researchers to analyze news articles and policy statements – previously inaccessible due to linguistic barriers – from a vastly wider range of countries and regions. This expansion is critical because EPU isn’t solely a phenomenon of major economic powers; political and regulatory shifts in emerging markets and developing nations can have substantial global repercussions. By processing text in multiple languages, these models capture a more comprehensive picture of worldwide policy uncertainty, revealing localized risks and opportunities that might otherwise be overlooked. Consequently, a more nuanced understanding of global economic interconnectedness emerges, facilitating more informed international policy decisions and risk assessments.
A more sophisticated understanding of Economic Policy Uncertainty (EPU) emerges from the integration of advanced text classification methods with a rigorous acknowledgement of data limitations. This approach moves beyond simple keyword counts, instead leveraging machine learning to discern subtle shifts in policy-related news and reporting. Crucially, the methodology doesn’t simply accept data at face value; it actively accounts for potential biases, reporting inconsistencies, and the inherent challenges of quantifying subjective concepts like ‘uncertainty’. By carefully calibrating models to reflect these constraints, researchers can generate EPU assessments that are not only more precise, but also more robust and reliable. This, in turn, provides policymakers with improved tools for forecasting economic trends, mitigating risks, and ultimately fostering greater macroeconomic stability through informed decision-making.
The pursuit of quantifying abstract concepts like policy uncertainty demands a rigorous approach to measurement. This paper demonstrates a shift from relying on simple keyword counts to leveraging the nuanced understanding of large language models. This echoes Galileo Galilei’s assertion: “You cannot teach a man anything; you can only help him discover it himself.” The models don’t define uncertainty, but rather facilitate the discovery of its presence and intensity within textual data. By moving beyond surface-level indicators, the research offers a more accurate and comprehensive understanding – a discovery made possible through computational tools that reveal hidden patterns, aligning with a philosophy that structure dictates behavior and a good system understands the whole.
What Lies Ahead?
The demonstrated utility of large language models in quantifying policy uncertainty, while promising, merely shifts the locus of the problem, rather than resolving it. The construction of any index, regardless of its sophistication, relies on an implicit theory of how language reflects underlying economic realities. The current paradigm focuses on detecting uncertainty, but neglects the crucial question of why uncertainty manifests as it does. A truly robust approach requires a move beyond purely statistical correlations, toward models that incorporate behavioral economics and institutional knowledge – understanding not just that policy creates anxiety, but how specific policies shape expectations and investment decisions.
Furthermore, the multilingual capabilities, while valuable, reveal a deeper challenge: the inherent cultural specificity of language. Translating nuance across linguistic boundaries is not simply a matter of finding equivalent words; it demands an understanding of differing legal frameworks, political histories, and social norms. The temptation to treat language as a universal code must be resisted, replaced by a commitment to localized models calibrated to specific contexts. This necessitates a shift from centralized, “one-size-fits-all” indices to a distributed network of localized measurements.
Ultimately, the pursuit of objective economic measurement through textual analysis is a paradoxical endeavor. The very act of observation alters the observed. Good architecture is invisible until it breaks, and only then is the true cost of decisions visible.
Original article: https://arxiv.org/pdf/2511.17866.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/