Decoding ESG: A New Framework for Report Analysis

Author: Denis Avetisyan


Researchers have developed an innovative system to automatically parse and understand complex Environmental, Social, and Governance reports, unlocking valuable insights for financial analysis.

The Pharos-ESG system establishes a framework for evaluating the environmental, social, and governance (ESG) performance of companies, leveraging a multi-faceted approach to assess sustainability beyond traditional financial metrics and incorporating data points such as $CO_2$ emissions, labor practices, and board diversity to quantify a holistic responsibility profile.

Pharos-ESG introduces a multimodal parsing framework and releases Aurora-ESG, a large-scale dataset for improved ESG report understanding and hierarchical labeling.

Despite the growing importance of Environmental, Social, and Governance (ESG) principles in financial governance, extracting meaningful insights from the often-chaotic layouts and weakly structured content of ESG reports remains a significant challenge. This paper introduces Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report, a unified system that transforms these complex documents into structured, analytically-ready representations through multimodal parsing and contextual narration. We demonstrate consistent performance gains over existing document parsing and multimodal models, and further contribute Aurora-ESG, the first large-scale public dataset of annotated ESG reports. Will this framework unlock a new era of data-driven ESG integration and more informed financial decision-making?


Decoding the Noise: Why ESG Reports Resist Automation

The growing importance of sustainable investing relies heavily on Environmental, Social, and Governance (ESG) reports, yet these documents present a significant analytical challenge. Unlike standardized financial filings, ESG reports frequently exhibit inconsistent visual layouts, varying section orders, and a reliance on infographics and tables embedded within narrative text. This lack of uniformity directly impedes automated data extraction and analysis, as algorithms designed for structured data struggle to parse the complex interplay of text and visuals. Consequently, investors and analysts face difficulties in efficiently comparing ESG performance across companies, hindering the effective allocation of capital towards genuinely sustainable initiatives. The inherent difficulty in converting these visually-rich reports into machine-readable data limits the scalability of ESG integration into mainstream investment strategies.

Current Natural Language Processing techniques, while powerful in many applications, often falter when applied to Environmental, Social, and Governance reports due to their unique structural challenges. These documents don’t present information linearly; instead, they embed crucial data within a complex, often unstated, hierarchical framework of sections, sub-sections, and visually demarcated tables. Traditional NLP algorithms, designed to process text sequentially, struggle to decipher these implicit relationships, leading to misinterpretations and inaccurate data extraction. For instance, a key performance indicator might be presented within a specific context – a particular geographic region or business unit – information that a standard NLP model could easily overlook, misattributing the data or failing to recognize its limitations. This inability to accurately map the document’s internal organization significantly hinders the reliable automation of ESG data analysis and undermines the potential for effective sustainable investment strategies.

Taken together, these obstacles show why text-only extraction falls short. Key figures in ESG reports often live in tables, charts, and infographics rather than in structured fields, so traditional Natural Language Processing (NLP) techniques, which expect consistent, machine-readable input, miss or misread them. Decoding these documents requires approaches that integrate visual and textual analysis, ‘reading’ the report as a whole and accurately identifying both key performance indicators and qualitative disclosures – a capability crucial for investors seeking reliable sustainability data.

Pharos-ESG: Mapping Structure from Chaos

Pharos-ESG tackles the difficulties inherent in extracting data from Environmental, Social, and Governance (ESG) reports by integrating multimodal parsing, contextual grounding, and hierarchical labeling into a single framework. Traditional ESG data extraction struggles with the diverse formats – text, tables, charts – and implicit structural relationships within these reports. Pharos-ESG addresses this by parsing information from multiple modalities simultaneously, grounding the extracted data within the specific context of the report, and then organizing it using a hierarchical labeling system that reflects the report’s inherent structure. This unified approach aims to improve both the accuracy and efficiency of ESG data extraction, enabling more reliable analysis and reporting.

Pharos-ESG employs Contextual Transformation and Reading Order Modeling to enhance the accurate interpretation of visual elements within ESG reports and their associated relationships. Contextual Transformation refines element understanding by considering surrounding text and visual cues, while Reading Order Modeling establishes the logical sequence in which elements should be processed. This combined approach yields a Reading Order Kendall’s Tau (ROKT) of 0.92, demonstrating a high degree of correlation between the predicted and actual reading order of elements within the documents, and validating the framework’s capacity for precise structural analysis.
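To make the reading-order metric concrete, here is a minimal sketch of Kendall's tau computed between a predicted and a reference ordering of page elements. It assumes the plain tau-a definition with no ties; the exact formulation behind the reported ROKT score may differ, and the element names are invented for illustration.

```python
from itertools import combinations


def kendall_tau(predicted, reference):
    """Kendall's tau-a between two orderings of the same distinct elements.

    Illustrative only: the precise ROKT definition is not given in the
    article, so this is the textbook tau-a without tie handling.
    """
    if len(set(reference)) != len(reference):
        raise ValueError("reference must not contain duplicates")
    if len(predicted) != len(reference) or set(predicted) != set(reference):
        raise ValueError("both orderings must contain the same elements")
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two elements")

    # Position of each element in the reference (ground-truth) order.
    rank = {elem: i for i, elem in enumerate(reference)}
    ranks = [rank[elem] for elem in predicted]

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        if ranks[i] < ranks[j]:
            concordant += 1  # pair appears in the same relative order
        else:
            discordant += 1  # pair is swapped relative to the reference

    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical page elements; one adjacent pair is swapped in the prediction.
reference = ["title", "intro", "table", "chart", "footnote"]
predicted = ["title", "intro", "chart", "table", "footnote"]
tau = kendall_tau(predicted, reference)
```

A value of 1.0 means the predicted order matches the reference exactly, and -1.0 means it is fully reversed. In this toy case one of the ten element pairs is swapped, which gives 0.8.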

ToC-Guided Hierarchical Structure Reconstruction is a core component of the Pharos-ESG framework, addressing the challenge of extracting structured data from semi-structured ESG reports. This process utilizes the Table of Contents as a guiding mechanism to infer and rebuild the underlying hierarchical organization of the document. By analyzing the ToC, the framework identifies key sections, subsections, and their relationships, effectively mapping the implicit structural information present in the report. This reconstructed hierarchy then serves as a foundation for accurately parsing and extracting specific data points, improving the precision and recall of information retrieval from complex ESG documentation.
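The idea of rebuilding a hierarchy from a Table of Contents can be sketched with a simple stack-based tree builder. This is an assumption-laden illustration, not the framework's actual reconstruction method: it presumes the ToC has already been extracted as (level, title) pairs, and the `Section` class and `build_hierarchy` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Section:
    """Hypothetical node type for one ToC entry."""
    title: str
    level: int
    children: List["Section"] = field(default_factory=list)


def build_hierarchy(toc_entries: List[Tuple[int, str]]) -> Section:
    """Turn a flat list of (level, title) pairs into a nested tree.

    Level 1 is a top-level section, level 2 a subsection, and so on.
    """
    root = Section(title="ROOT", level=0)
    stack = [root]  # path from the root to the most recently added node
    for level, title in toc_entries:
        if level < 1:
            raise ValueError("levels must start at 1")
        # Climb back up until the top of the stack is a valid parent.
        while stack[-1].level >= level:
            stack.pop()
        node = Section(title=title, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


# Invented ToC entries for illustration.
toc = [
    (1, "Environment"),
    (2, "Emissions"),
    (2, "Water"),
    (1, "Social"),
    (2, "Labor practices"),
]
tree = build_hierarchy(toc)
```

Each content block could then be attached to the node whose heading it falls under, which is the role the reconstructed hierarchy plays in the description above.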

Precision Through Alignment: Bridging Content and Structure

Pharos-ESG utilizes Region-Aware Prompting (RAP) and the ALIGN model to establish a precise correspondence between the Table of Contents (ToC) structure and the relevant content sections within Environmental, Social, and Governance (ESG) reports. RAP strategically segments the report, enabling targeted prompting for content identification. ALIGN, a contrastive learning framework, then trains the model to accurately match ToC entries with their corresponding content blocks based on semantic similarity. This process ensures that each section of the ESG report is correctly associated with its designated ToC heading, facilitating downstream parsing and analysis tasks by providing a structurally sound foundation for information extraction.
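The matching step can be pictured as a nearest-neighbor assignment in an embedding space. The sketch below is only a stand-in: it swaps the learned, contrastively trained encoder for a bag-of-words vector and plain cosine similarity, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_blocks_to_toc(toc_entries, blocks):
    """Assign each content block to its most similar ToC entry."""
    toc_vectors = [embed(entry) for entry in toc_entries]
    matches = []
    for block in blocks:
        block_vector = embed(block)
        scores = [cosine(block_vector, tv) for tv in toc_vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((block, toc_entries[best]))
    return matches


# Invented headings and blocks for illustration.
toc_entries = ["Greenhouse gas emissions", "Employee health and safety"]
blocks = [
    "Total greenhouse gas emissions fell this year",
    "We report on employee safety training hours",
]
matches = match_blocks_to_toc(toc_entries, blocks)
```

A trained model would replace `embed` with vectors in which matching heading and content pairs sit close together; the assignment logic stays the same.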

The Pharos-ESG system achieves high parsing accuracy by integrating Region-Aware Prompting (RAP) and ALIGN with the LayoutLMv3 content encoding model. This combination facilitates precise identification and extraction of data from ESG reports, resulting in a Parsing F1 score of 93.59%. Evaluations demonstrate that this performance surpasses all baseline models used for comparison, indicating a significant improvement in the system’s ability to correctly parse and categorize information within the complex layouts typical of ESG documentation. The F1 score represents a harmonic mean of precision and recall, providing a balanced measure of the parsing system’s effectiveness.
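For reference, the harmonic mean mentioned above is the standard F1 definition. With $P$ for precision, $R$ for recall, and $TP$, $FP$, $FN$ for true positives, false positives, and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

Because it is a harmonic mean, $F_1$ is pulled toward the lower of the two values, so a parser cannot score well by maximizing precision at the expense of recall or the reverse.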

Following content block parsing, Pharos-ESG utilizes a Multi-Level Labeling (MLL) process driven by the MLPDH model to categorize and annotate textual data. MLPDH assigns three key labels to each content block: ESG category, relevant GRI Indicators, and sentiment. This annotation process transforms unstructured text into structured data suitable for quantitative analysis, yielding a macro-F1 score of 86.32% as a measure of labeling accuracy. The resulting structured dataset facilitates detailed reporting and evaluation of ESG performance based on established GRI standards and sentiment trends.
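Macro-F1 averages the per-class F1 scores with equal weight, so rare labels count as much as common ones. The sketch below computes it for a single label type (ESG category) on invented data; how the reported score aggregates across the three label types is not stated here, so treat this as an illustration of the metric only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one labeled example")

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Per-class F1 written in terms of counts: 2TP / (2TP + FP + FN).
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Invented gold and predicted ESG categories for four content blocks.
gold = ["E", "E", "S", "G"]
pred = ["E", "S", "S", "G"]
score = macro_f1(gold, pred)
```

In this example the E and S classes each score 2/3 and G scores 1.0, so the macro average is 7/9, roughly 0.78.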

Beyond the Horizon: Scaling ESG Intelligence Globally

A newly compiled resource, Aurora-ESG, represents a significant leap forward in environmental, social, and governance (ESG) data accessibility. Built upon the outputs of the Pharos-ESG model, this expansive public dataset consolidates ESG reports from Mainland China, Hong Kong, and the United States into a single, structured format. Comprising 8 million distinct content blocks, Aurora-ESG currently stands as the largest structured ESG dataset available, offering researchers and analysts an unprecedented volume of information for comparative studies and in-depth investigations into corporate sustainability practices across diverse regulatory landscapes. The scale of this dataset promises to unlock new insights into global ESG trends and facilitate more robust assessments of corporate responsibility.

The Pharos-ESG model exhibits a notable capacity for cross-market generalization, consistently delivering reliable performance despite the diverse landscape of Environmental, Social, and Governance (ESG) reporting standards. This adaptability stems from its design, allowing it to effectively process and interpret data from Mainland China, Hong Kong, and the U.S., regions with significantly different regulatory frameworks and disclosure practices. Unlike models often trained on a single standardized dataset, Pharos-ESG demonstrates robustness in handling variations in reporting metrics, terminology, and levels of detail, ensuring consistent and accurate analysis across these disparate markets. This capability is crucial for investors and organizations seeking a comprehensive and globally applicable understanding of ESG performance, as it mitigates the risks associated with region-specific biases and inconsistencies in data.

Rigorous comparative analysis demonstrates that Pharos-ESG significantly outperforms leading large language models, including Gemini 2.5 Pro and GPT-4o, in the nuanced task of extracting and interpreting Environmental, Social, and Governance data. Its Hierarchical Logic Accuracy (HLA) of 94.78% is the highest among the systems compared. This performance is further underscored by a macro-F1 score that exceeds the strongest baseline model by 6.78%, highlighting its ability to discern complex relationships within ESG reports and deliver more reliable insights for investors and stakeholders. The results suggest that Pharos-ESG’s specialized architecture and training data are particularly well-suited for navigating the intricacies of ESG reporting, offering a considerable advancement in the field of sustainable investing.

Pharos-ESG, as detailed in the research, doesn’t simply accept the presented structure of ESG reports; it actively deconstructs them to understand the underlying relationships between data points. This echoes Edsger W. Dijkstra’s sentiment: “It’s not enough to just do something; you must understand why it works.” The framework’s hierarchical labeling and parsing of complex documents, particularly its attempt to infer reading order, isn’t about passively receiving information, but about actively reverse-engineering the report’s logic. The creation of Aurora-ESG, a large-scale dataset, further embodies this principle – providing the raw material for others to dissect, challenge, and ultimately, comprehend the intricacies of ESG reporting. It’s a purposeful dismantling to facilitate deeper insight.

What Lies Ahead?

Pharos-ESG, and datasets like Aurora-ESG, represent a step – a useful one, certainly – towards treating financial reporting as a decipherable system. But the illusion of ‘understanding’ emerges only when a system is sufficiently broken down, and the inherent messiness of ESG data continues to present challenges. The current framework excels at parsing what is reported, but says little about why certain information is prioritized, or what remains deliberately obscured. Consider this not as a solved problem, but as a particularly complex compiler – it translates, but doesn’t necessarily verify the underlying logic.

Future iterations will inevitably grapple with the problem of grounding. Currently, the system identifies hierarchical relationships within reports, but lacks a robust method for connecting those structures to real-world impacts. The next frontier isn’t simply about reading the report, but about cross-referencing its claims with independently verifiable data – satellite imagery, supply chain audits, on-the-ground reporting. It’s about moving beyond syntax to semantics, and ultimately, to verifiable truth.

The work suggests that reality is open source – humanity just hasn’t read the code yet. And even when the code is read, interpreting intent remains the ultimate challenge. The true measure of success won’t be in automating the extraction of data, but in building systems that can intelligently question the data itself, and expose the assumptions embedded within these reports.


Original article: https://arxiv.org/pdf/2511.16417.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2025-11-24 03:27