Author: Denis Avetisyan
A new framework leverages knowledge graphs and large language models to critically assess sustainability claims and expose misleading environmental reporting.

EmeraldMind is a retrieval-augmented generation system designed for improved greenwashing detection, fact verification, and responsible ESG reporting.
Despite growing reliance on AI for decision-making, verifying the sustainability claims of corporations remains a critical challenge, particularly given the prevalence of greenwashing. To address this, we introduce EmeraldMind: A Knowledge Graph-Augmented Framework for Greenwashing Detection, a fact-centric system integrating a domain-specific knowledge graph with retrieval-augmented generation. The framework demonstrably improves both the accuracy and transparency of greenwashing detection by surfacing verifiable evidence and providing justification-centric classifications, outperforming generic large language models without requiring fine-tuning. Could this approach pave the way for more reliable and accountable ESG assessments across industries?
The Erosion of Trust: Greenwashing and the Imperative for Rigorous Verification
The proliferation of Environmental, Social, and Governance (ESG) reports has coincided with growing concern over greenwashing – the practice of misleading stakeholders about a company’s sustainability efforts. Increased public and investor awareness, coupled with heightened media scrutiny, now subjects corporate disclosures to rigorous examination. Companies are facing mounting pressure to substantiate claims made within these reports, as unsubstantiated assertions can damage reputation and erode trust. This skepticism extends beyond simple marketing claims; detailed analysis is revealing inconsistencies between stated values and actual practices, prompting calls for standardized reporting metrics and independent verification to ensure the integrity of ESG data and prevent the misallocation of capital towards falsely sustainable ventures.
The sheer scale of Environmental, Social, and Governance (ESG) reporting now presents a significant verification challenge. Companies increasingly publish detailed ESG reports, often exceeding hundreds of pages, filled with nuanced qualitative data and quantitative metrics. Traditional verification methods, reliant on manual review by analysts and auditors, are simply overwhelmed by this volume. The complexity isn’t merely about quantity; ESG disclosures frequently lack standardized definitions and rely on self-reported data, making consistent assessment difficult. Furthermore, claims are often embedded within lengthy narratives, requiring substantial time and expertise to extract and validate. This creates a bottleneck, hindering effective oversight and opening the door to unsubstantiated claims, ultimately undermining the credibility of ESG investing and the pursuit of genuine sustainability.
The evolving landscape of sustainable finance is being reshaped by stringent new regulatory frameworks, notably the Sustainable Finance Disclosure Regulation (SFDR) and the Corporate Sustainability Reporting Directive (CSRD). These directives compel organizations to provide detailed, standardized reporting on their environmental and social impacts, moving beyond voluntary disclosures. This increased demand for granular data, coupled with the sheer volume of ESG reports now being published, presents a significant verification challenge. Manual review is no longer scalable or efficient, creating a pressing need for automated verification tools capable of sifting through complex data, identifying inconsistencies, and ensuring the accuracy and reliability of sustainability claims. The implementation of such technologies isn’t merely about compliance; it’s becoming crucial for maintaining investor trust and fostering genuine progress toward sustainable investment goals.
EmeraldGraph: A Knowledge Foundation for ESG Analysis
EmeraldGraph serves as the core data structure for our ESG solution, implemented as a Knowledge Graph to model entities relevant to Environmental, Social, and Governance factors and the complex relationships between them. This graph-based approach allows for efficient storage and retrieval of interconnected data points, facilitating analysis beyond simple tabular datasets. Entities within EmeraldGraph represent organizations, policies, events, and concepts related to ESG, while relationships define how these entities interact – for example, a company investing in a specific renewable energy project, or a policy regulating carbon emissions. The Knowledge Graph architecture enables reasoning and inference, allowing the system to derive new insights from existing data and identify hidden connections between ESG factors.
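To make the structure concrete, here is a minimal sketch of such a graph in Python. The entity and relation names are illustrative placeholders, not EmeraldGraph's actual schema, and the indexing is deliberately simple:

```python
# A minimal sketch of a knowledge graph of ESG entities and typed
# relationships. Entity and relation names are illustrative only.
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Triple:
    subject: str     # e.g., an organization
    relation: str    # e.g., "invests_in", "regulates"
    obj: str         # e.g., a project or policy

class KnowledgeGraph:
    def __init__(self):
        self.triples: set[Triple] = set()
        self._by_subject = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        t = Triple(subject, relation, obj)
        self.triples.add(t)
        self._by_subject[subject].add(t)

    def neighbors(self, entity: str) -> set[Triple]:
        """All outgoing edges for an entity -- the basis for traversal
        and inference beyond what a flat table supports."""
        return self._by_subject[entity]

kg = KnowledgeGraph()
kg.add("AcmeCorp", "invests_in", "SolarFarm-12")   # hypothetical entities
kg.add("EU-ETS", "regulates", "AcmeCorp")
print(kg.neighbors("AcmeCorp"))
```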
Schema extraction is the process of identifying and classifying key information within unstructured ESG data – such as reports, news articles, and filings – to convert it into a structured, machine-readable format. This involves utilizing Natural Language Processing (NLP) techniques to pinpoint entities, relationships, and attributes relevant to ESG factors. The extracted data is then mapped to a predefined schema, defining the types of entities (e.g., companies, projects, policies) and the relationships between them (e.g., “company invests in project”, “policy regulates emission”). Successful schema extraction is essential for populating the EmeraldGraph Knowledge Graph, enabling efficient querying, analysis, and reasoning over ESG data, and ultimately facilitating the identification of patterns and insights that would be difficult or impossible to uncover from raw, unstructured text.
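A sketch of what this step could look like in practice follows. The schema, prompt, and stubbed model call are assumptions for illustration rather than the paper's actual pipeline:

```python
# A sketch of schema extraction: prompt an LLM to emit typed triples
# from raw ESG text, then validate them against a predefined schema.
# `call_llm` is a placeholder stub, not a real API; the schema below
# is illustrative, not the paper's.
import json

SCHEMA = {
    "entity_types": ["Company", "Project", "Policy"],
    "relations": ["invests_in", "regulates", "emits"],
}

PROMPT = """Extract (subject, subject_type, relation, object, object_type)
tuples from the text. Use only these entity types: {types}
and relations: {relations}. Return a JSON list.

Text: {text}"""

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return ('[{"subject": "AcmeCorp", "subject_type": "Company", '
            '"relation": "invests_in", "object": "SolarFarm-12", '
            '"object_type": "Project"}]')

def extract_triples(text: str) -> list[dict]:
    raw = call_llm(PROMPT.format(
        types=SCHEMA["entity_types"], relations=SCHEMA["relations"], text=text))
    tuples = json.loads(raw)
    # Keep only tuples that conform to the schema.
    return [t for t in tuples
            if t["relation"] in SCHEMA["relations"]
            and t["subject_type"] in SCHEMA["entity_types"]
            and t["object_type"] in SCHEMA["entity_types"]]

print(extract_triples("AcmeCorp invested $40M in the SolarFarm-12 project."))
```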
The EmeraldGraph knowledge base is populated from multiple data sources, notably unstructured ESG text residing in EmeraldDB and the structured benchmark dataset, EmeraldData. As of the current data snapshot, EmeraldData comprises a knowledge graph containing 53,748 unique entities and 59,344 defined relationships between those entities. This dataset serves as a foundational component, providing pre-populated data and a reference structure for incorporating information extracted from the larger volume of raw text within EmeraldDB. The combination of these sources facilitates both broad coverage and structured data integrity within the overall knowledge graph.
OntoSustain serves as a foundational, standardized vocabulary for representing Environmental, Social, and Governance (ESG) data, facilitating interoperability and consistent data interpretation. This ontology, alongside related knowledge graphs such as SustainGraph and KnowUREnvironment, defines classes and relationships relevant to sustainability reporting, enabling the uniform representation of ESG factors across diverse datasets. Utilizing these standardized vocabularies ensures that entities and their connections – for example, a company’s carbon emissions or supply chain practices – are consistently defined and categorized, which is crucial for data aggregation, analysis, and benchmarking. This approach minimizes ambiguity and supports automated reasoning and knowledge discovery within the ESG domain.
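The snippet below sketches how facts can be asserted against such a shared vocabulary using the rdflib library. The namespace IRI, class names, and property names are placeholders, not OntoSustain's actual identifiers:

```python
# A sketch of representing ESG facts against a shared vocabulary with
# rdflib. The namespace IRI and term names are placeholders; the real
# OntoSustain terms will differ.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

ONTO = Namespace("http://example.org/ontosustain#")  # placeholder IRI

g = Graph()
g.bind("onto", ONTO)

# Typed entities and a typed property, so "scope-1 emissions" means
# the same thing no matter which report the figure came from.
g.add((ONTO.AcmeCorp, RDF.type, ONTO.Company))
g.add((ONTO.AcmeCorp, ONTO.hasScope1Emissions,
       Literal(12500, datatype=XSD.integer)))  # tonnes CO2e, illustrative

print(g.serialize(format="turtle"))
```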

Automated Verification: RAG and Fact-Checking with EmeraldMind
EmeraldMind is a Retrieval-Augmented Generation (RAG) framework specifically designed for the analysis of Environmental, Social, and Governance (ESG) reports. The system utilizes Large Language Models (LLMs) to process textual data within these reports and identify statements that require verification. Unlike general-purpose LLMs, EmeraldMind’s domain specificity allows for targeted claim detection relevant to ESG disclosures. The framework’s architecture is optimized for identifying potentially misleading information, forming the basis for subsequent automated fact-checking procedures. This focus on ESG reporting differentiates EmeraldMind from broader RAG implementations and enables a more precise and effective evaluation of sustainability-related claims.
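A toy skeleton of this three-stage flow (detect claims, retrieve evidence, verify) is sketched below. Every stage is a simple stand-in for the LLM- and graph-backed components the framework actually uses, and all names are hypothetical:

```python
# A high-level skeleton of the pipeline described above. Each stage is
# a toy stand-in: claim detection would be LLM-driven, retrieval
# graph-backed, and verification an LLM comparison step.
def detect_claims(report_text: str) -> list[str]:
    # Toy keyword heuristic in place of LLM claim detection.
    keywords = ("carbon", "renewable", "net zero", "emissions")
    return [s.strip() for s in report_text.split(".")
            if any(k in s.lower() for k in keywords)]

def retrieve_evidence(claim: str, kg: dict[str, list[str]]) -> list[str]:
    # Toy lookup in place of structured graph retrieval.
    return [fact for entity, facts in kg.items()
            if entity.lower() in claim.lower() for fact in facts]

def verify(claim: str, evidence: list[str]) -> str:
    # Toy verdict in place of LLM comparison of claim vs. evidence.
    return "needs review" if evidence else "unverifiable"

kg = {"AcmeCorp": ["AcmeCorp scope-1 emissions rose 8% in 2023"]}
report = "AcmeCorp cut carbon emissions by half. We value our employees."
print({c: verify(c, retrieve_evidence(c, kg)) for c in detect_claims(report)})
```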
EmeraldMind employs EmeraldGraph, a structured knowledge base, to locate evidence pertinent to claims made within ESG reports. This retrieval process isn’t simply keyword-based; EmeraldGraph’s organization allows for the identification of relationships between concepts and data points. The framework queries EmeraldGraph using the claim as context, extracting relevant statements, metrics, and supporting documentation. Retrieved evidence consists of factual assertions and quantitative data directly linked to the original claim, enabling a comparative analysis to determine verification status. The structured format of EmeraldGraph facilitates precise evidence retrieval, minimizing ambiguity and enhancing the reliability of the fact-checking process.
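One plausible reading of this retrieval step, as a self-contained sketch: link entities mentioned in the claim, then expand their one-hop neighborhood in the triple store. The triples and the string-matching entity linker are illustrative assumptions, not EmeraldGraph's query machinery:

```python
# A sketch of claim-conditioned retrieval over a triple store: link
# entities in the claim, then pull their one-hop neighborhood as
# candidate evidence. Triples and matching are illustrative.
TRIPLES = [
    ("AcmeCorp", "invests_in", "SolarFarm-12"),
    ("AcmeCorp", "reported_scope1_emissions_2023", "12,500 tCO2e"),
    ("SolarFarm-12", "capacity", "40 MW"),
]

def link_entities(claim: str) -> set[str]:
    # Naive substring matching stands in for a real entity linker.
    known = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
    return {e for e in known if e.lower() in claim.lower()}

def retrieve(claim: str, hops: int = 1) -> list[tuple]:
    frontier = link_entities(claim)
    evidence = []
    for _ in range(hops):
        hit = [t for t in TRIPLES if t[0] in frontier or t[2] in frontier]
        evidence.extend(hit)
        frontier |= {t[0] for t in hit} | {t[2] for t in hit}
    return list(dict.fromkeys(evidence))  # de-duplicate, keep order

print(retrieve("AcmeCorp claims a 50% emissions cut driven by SolarFarm-12."))
```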
Automated fact-checking within the EmeraldMind framework operates by directly comparing statements extracted from ESG reports against evidence retrieved from EmeraldGraph. This comparison process determines the veracity of each claim, enabling automated verification without manual intervention. Performance evaluations, utilizing the GreenClaims dataset and employing few-shot prompting techniques, have demonstrated an overall accuracy rate of 70.59% in identifying factual inconsistencies or validating claims.
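As a hedged illustration of few-shot prompting for this comparison step, the sketch below assembles a prompt from a couple of invented worked examples. The examples, label set, and wording are assumptions, not the GreenClaims evaluation protocol:

```python
# A sketch of a few-shot verification prompt: worked examples first,
# then the claim and its retrieved evidence. Examples are invented.
FEW_SHOT = """Decide whether the evidence SUPPORTS, REFUTES, or is
INSUFFICIENT for the claim.

Claim: "We run entirely on renewable energy."
Evidence: "Grid mix disclosure: 64% fossil, 36% renewable."
Verdict: REFUTES

Claim: "Our new plant is ISO 14001 certified."
Evidence: "Certificate #4471 issued 2024-03-02 for Plant B."
Verdict: SUPPORTS

Claim: "{claim}"
Evidence: "{evidence}"
Verdict:"""

def build_prompt(claim: str, evidence: str) -> str:
    return FEW_SHOT.format(claim=claim, evidence=evidence)

print(build_prompt("We halved emissions since 2020.",
                   "Scope-1 emissions: 2020: 24k tCO2e; 2023: 22k tCO2e."))
```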
The Retrieval-Augmented Generation (RAG) methodology employed by EmeraldMind utilizes EmeraldGraph as its knowledge source, resulting in improved accuracy and reliability of generated explanations. Evaluations on the EmeraldData dataset demonstrate coverage ranging from 62% to 77%, indicating the system’s ability to retrieve relevant information to support its explanations. This performance is directly attributable to the structured and validated data within EmeraldGraph, which minimizes reliance on potentially inaccurate or irrelevant information often encountered when using broader, less curated knowledge bases. The coverage metric specifically quantifies the proportion of claims for which supporting evidence was successfully retrieved from EmeraldGraph, establishing a measurable baseline for explanation fidelity.
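Computed naively, the coverage metric is just the share of claims with at least one retrieved evidence item, as in this sketch (the claims and evidence here are made up):

```python
# Coverage as described above: the fraction of claims for which at
# least one evidence item was retrieved from the graph.
def coverage(retrieved: dict[str, list]) -> float:
    covered = sum(1 for evidence in retrieved.values() if evidence)
    return covered / len(retrieved)

retrieved = {
    "claim-1": [("AcmeCorp", "invests_in", "SolarFarm-12")],
    "claim-2": [],   # nothing found in the graph
    "claim-3": [("EU-ETS", "regulates", "AcmeCorp")],
}
print(f"coverage = {coverage(retrieved):.0%}")   # 67%
```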

Towards Responsible AI: Explainability and Trustworthy Insights
The foundation of reliable automated verification systems rests upon the delivery of evidence-based explanations. Without clear justification for automated conclusions, stakeholders are less likely to accept and act upon the information provided, hindering the effective implementation of these technologies. These explanations aren’t simply about stating what a system decided, but detailing why, referencing the specific data and reasoning processes that led to the outcome. Robust explanations facilitate scrutiny, allowing users to assess the validity of the system’s logic and identify potential biases or errors. Consequently, a system’s trustworthiness is directly correlated with the quality and comprehensiveness of its supporting evidence, making explainability a critical component for successful adoption and impactful results in fields like corporate sustainability reporting and beyond.
The quality of explanations generated by automated verification systems is rigorously evaluated using ILORA, a framework designed to assess their comprehensiveness and logical soundness. Under this evaluation, EmeraldMind's explanations consistently score higher than those of baseline methods across all five key dimensions of explanation quality: fidelity, coherence, comprehensibility, sufficiency, and conciseness. This consistent advantage indicates that ILORA not only exposes shortcomings in explanations but also effectively guides the development of more transparent and reliable AI systems, fostering increased confidence in their outputs and justifications.
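For illustration only, the sketch below aggregates per-dimension ratings of the kind such an evaluation produces. The 1-to-5 scale and every score are invented, not ILORA's published results:

```python
# Aggregating per-dimension explanation ratings across the five
# dimensions named above. All numbers are made up for illustration.
DIMENSIONS = ("fidelity", "coherence", "comprehensibility",
              "sufficiency", "conciseness")

ratings = {  # system -> per-dimension scores from one hypothetical run
    "EmeraldMind": {"fidelity": 4.4, "coherence": 4.2, "comprehensibility": 4.1,
                    "sufficiency": 4.0, "conciseness": 3.9},
    "baseline":    {"fidelity": 3.1, "coherence": 3.4, "comprehensibility": 3.6,
                    "sufficiency": 2.9, "conciseness": 3.5},
}

for system, scores in ratings.items():
    mean = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    print(f"{system}: mean={mean:.2f}  " +
          " ".join(f"{d[:4]}={scores[d]}" for d in DIMENSIONS))
```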
The advancement of artificial intelligence demands a shift from simply identifying issues to delivering solutions stakeholders can utilize. Responsible AI principles facilitate this transition by embedding interpretability and accountability directly into automated systems. This proactive approach moves beyond flagging inconsistencies in data – such as within corporate sustainability reports – and instead provides clear, actionable insights that allow for informed decision-making and targeted improvements. By focusing on why a system reached a particular conclusion, rather than merely that it did, stakeholders gain the necessary context to address underlying problems and foster greater trust in the AI’s assessments. This ultimately moves the technology beyond a monitoring tool to a collaborative partner in achieving responsible and sustainable practices.
The pursuit of responsible AI is demonstrably enhancing accountability within corporate sustainability reporting. Recent evaluations, leveraging the Borda Score methodology, reveal a strong preference for justifications generated by the EM-RAG model. This preference isn’t merely subjective; EM-RAG consistently outperformed both EM-KGRAG and a baseline model across key metrics, indicating a superior ability to provide clear, logical reasoning for sustainability claims. This shift moves beyond simple detection of issues to offering stakeholders readily understandable insights, thereby increasing transparency and fostering greater confidence in reported environmental, social, and governance (ESG) data. The results suggest that AI-driven explanations are not only feasible but are actively improving the reliability and trustworthiness of corporate sustainability disclosures.
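The Borda count itself is simple to reproduce. The sketch below scores invented judge rankings over the three models named above; with n models in a comparison, the best rank earns n-1 points and the worst earns 0:

```python
# A minimal Borda count over per-judge rankings. The rankings here are
# invented for illustration, not the paper's evaluation data.
from collections import Counter

rankings = [  # each judge's ordering, best first
    ["EM-RAG", "EM-KGRAG", "baseline"],
    ["EM-RAG", "baseline", "EM-KGRAG"],
    ["EM-KGRAG", "EM-RAG", "baseline"],
]

scores = Counter()
for ranking in rankings:
    n = len(ranking)
    for place, model in enumerate(ranking):
        scores[model] += n - 1 - place

print(scores.most_common())  # [('EM-RAG', 5), ('EM-KGRAG', 3), ('baseline', 1)]
```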

The pursuit of verifiable truth, as exemplified by EmeraldMind, resonates with a core tenet of mathematical rigor. Paul Erdős famously stated, “A mathematician knows a lot of things, but knows nothing deeply.” This sentiment underscores the necessity of robust frameworks like the one presented, which moves beyond superficial assessments of sustainability claims. EmeraldMind’s knowledge graph-augmented RAG approach mirrors the mathematician’s drive for foundational understanding, ensuring claims aren’t simply accepted as ‘working on tests’ but are grounded in verifiable evidence. The framework’s emphasis on fact verification, a key component of its design, embodies a commitment to mathematical purity – a solution must be demonstrably correct, not merely appear so.
What Lies Ahead?
The presented work, while a demonstrable improvement in identifying deceptive sustainability claims, merely scratches the surface of a profoundly difficult problem. The elegance of a knowledge graph-augmented retrieval system rests not in its immediate accuracy, but in its capacity for provable fact verification. Current evaluation metrics, focused on superficial agreement with human labels, fail to capture the crucial distinction between correlation and causation. A truly robust system demands a formalization of ‘greenwashing’ itself – a mathematical definition of deceptive intent, rather than reliance on subjective interpretations.
Future research must address the inherent scalability limitations of hand-curated knowledge graphs. The temptation to rely on Large Language Models for automated knowledge acquisition is strong, yet fraught with peril. LLMs, demonstrably prone to confabulation, introduce a new layer of uncertainty. The challenge lies in developing algorithms capable of discerning genuine expertise from fluent falsehood – a distinction that requires more than statistical pattern matching.
Ultimately, the pursuit of responsible AI in ESG reporting necessitates a shift in focus. The objective should not be to detect greenwashing, but to prevent it. This demands a proactive approach, integrating formal verification techniques into the very process of sustainability claim generation. Until then, the identification of deception remains a computationally expensive approximation of a fundamentally logical problem.
Original article: https://arxiv.org/pdf/2512.11506.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/