Author: Denis Avetisyan
New research reveals that large language models systematically misrepresent the diversity of public opinion on climate change, potentially skewing engagement and policy discussions.

The study demonstrates that these models exhibit biases, particularly through racialized gender stereotypes and the compression of diverse viewpoints, raising concerns about equitable climate communication and policy design.
Despite the increasing reliance on large language models to analyze public opinion and inform policy, a critical gap remains in understanding how accurately these AI systems represent diverse viewpoints. Our research, ‘How Large Language Models Systematically Misrepresent American Climate Opinions’, investigates this issue by comparing AI-generated responses to a nationally representative survey on U.S. climate opinions, revealing a tendency for LLMs to compress the diversity of viewpoints and exhibit intersectional biases. Specifically, we find evidence of racialized gender stereotypes influencing AI predictions, misrepresenting the climate opinions of Black Americans. These findings raise concerns about the potential for LLMs to undermine equitable climate governance: can we ensure AI-driven climate communication accurately reflects, rather than distorts, the public’s voice?
Predicting Beliefs: A Cautionary Tale of Echo Chambers
The increasing sophistication of Large Language Models (LLMs) has extended their application beyond simple text generation to the modeling of nuanced human characteristics, notably beliefs surrounding complex issues like climate change. Researchers are leveraging these models to understand the distribution of climate opinions, predict individual viewpoints, and even simulate public discourse. This approach offers a novel pathway to analyze the factors influencing belief formation and potentially identify strategies for more effective communication. However, this burgeoning field necessitates careful consideration of the models’ limitations and potential biases, ensuring that LLMs accurately reflect the full spectrum of human thought rather than reinforcing existing societal patterns.
The increasing application of Large Language Models to understand human beliefs, such as those surrounding climate change, necessitates rigorous evaluation beyond simple predictive accuracy. While these models demonstrate impressive capabilities in processing and generating text, their training data often reflects existing societal biases and may not adequately represent the full spectrum of viewpoints. Consequently, LLMs risk perpetuating skewed understandings or overlooking marginalized perspectives, leading to inaccurate or unfair representations of public opinion. Careful scrutiny of model outputs, coupled with diverse and representative training datasets, is therefore crucial to ensure these powerful tools contribute to a more nuanced and equitable understanding of complex social issues.
Predicting public opinion on climate change demands more than simply increasing the size of large language models. Recent research indicates that while LLMs can identify broad trends, they demonstrably compress the true diversity of human viewpoints; specifically, these models exhibit 28% less variance in responses compared to actual human opinions across a range of questions. This suggests that scaling model parameters alone does not capture the nuanced interplay of demographic factors (age, location, political affiliation, and more) that genuinely shape individual beliefs. Consequently, LLMs risk presenting a homogenized and potentially inaccurate picture of climate opinion, highlighting the need for methodologies that explicitly incorporate and represent the full spectrum of demographic influences on belief formation.

Ground Truth and the Illusion of Objectivity
The ‘Climate Change in the American Mind’ (CCAM) Survey served as the foundational dataset for establishing ground truth regarding public climate change opinions. This survey, conducted by Yale and George Mason Universities, employs a nationally representative sample obtained through stratified random sampling techniques. The methodology ensures proportional representation across key demographic variables including age, gender, race, education level, and geographic region, resulting in a dataset capable of generalizing findings to the broader U.S. population. Data collection utilizes a combination of telephone and online surveys to maximize response rates and minimize potential biases associated with single-mode administration. The survey instrument focuses on assessing public beliefs, attitudes, and behaviors related to climate change, providing a quantifiable measure of climate opinion for comparison with LLM predictions.
To assess the capacity of large language models (LLMs) to infer climate change opinions, we prompted GPT-5, Llama-3, and Gemma-3 with demographic profiles (age, gender, income, education level, and political affiliation) and requested predictions of each individual’s stated beliefs on climate change. The models were used in a zero-shot paradigm: they received no task-specific examples or fine-tuning on climate opinions before making predictions. The LLM-generated predictions were then compared against the actual responses recorded in the ‘Climate Change in the American Mind’ (CCAM) survey to quantify performance and identify systematic biases.
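As a concrete illustration, a minimal sketch of what such a zero-shot prediction loop could look like is given below. The prompt wording, the profile fields, and the `query_model` helper are illustrative assumptions, not the authors’ actual pipeline.

```python
# Sketch of a zero-shot persona-prompting loop. Prompt wording, profile fields,
# and the query_model() wrapper are illustrative assumptions, not the paper's code.

LIKERT = ["not at all worried", "not very worried",
          "somewhat worried", "very worried"]

def build_prompt(profile: dict) -> str:
    """Turn one CCAM-style demographic profile into a persona prompt."""
    persona = (
        f"You are a {profile['age']}-year-old {profile['race']} {profile['gender']} "
        f"in the United States with {profile['education']} education, an annual "
        f"income of {profile['income']}, who identifies politically as {profile['party']}."
    )
    question = (
        "How worried are you about global warming? "
        f"Answer with exactly one of: {', '.join(LIKERT)}."
    )
    return persona + "\n" + question

def predict_opinion(profile: dict, query_model) -> str:
    """query_model is any callable wrapping an LLM API (GPT-5, Llama-3, Gemma-3, ...)."""
    answer = query_model(build_prompt(profile)).strip().lower()
    # Map free-text output back onto the survey's response scale.
    return next((option for option in LIKERT if option in answer), "unparseable")

# predictions = [predict_opinion(p, query_model) for p in ccam_profiles]
# Each prediction is then compared against the respondent's actual CCAM answer.
```

Constraining the answer to the survey’s own response scale keeps the model output directly comparable with the CCAM data.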
Fixed effects regression was employed to account for unobserved heterogeneity and potential confounding variables within the dataset. The specification includes fixed intercepts that absorb unobserved, group-level characteristics that might influence both stated climate opinions and the LLMs’ predictive accuracy. By regressing the residual errors of the LLM predictions on demographic variables such as age, gender, and education while holding these intercepts constant, we minimized the risk of attributing prediction inaccuracies to spurious correlations or omitted variable bias. The resulting coefficients represent the estimated impact of each demographic factor on prediction error, all else held constant, providing a more robust and reliable assessment of how well LLMs can predict climate opinions from observable demographic profiles alone.
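For readers who want to see the shape of such an analysis, here is a minimal sketch using statsmodels. The column names, the error definition, and the choice of question-level intercepts as the fixed effects are assumptions made for illustration, not the paper’s exact specification.

```python
# Sketch of a prediction-error regression with fixed effects (statsmodels).
# Column names, the error definition, and question-level intercepts are
# assumptions for illustration, not the paper's exact specification.
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per (respondent, survey question) pair.
df = pd.read_csv("llm_vs_ccam.csv")                        # hypothetical file
df["error"] = df["llm_predicted_score"] - df["actual_score"]

model = smf.ols(
    "error ~ C(gender) + C(race) + C(age_group) + C(education) + C(party)"
    " + C(question)",    # fixed intercepts absorbing item-level heterogeneity
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["respondent_id"]})

# Demographic coefficients now estimate systematic over- or under-prediction.
print(model.summary())
```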

Unmasking the Biases: Beyond Simple Demographics
Statistical analysis demonstrated a consistent gender bias in Large Language Model (LLM) outputs regarding climate change concern. Specifically, LLMs underestimated the reported climate concern of female respondents, with a regression coefficient of -0.021. This finding is statistically significant (p < 0.05), meaning the observed underestimation is unlikely to be due to random chance. The coefficient represents the average gap between predicted and reported climate concern for female respondents relative to male respondents, adjusted for the other variables in the model. In other words, the LLMs systematically associate lower levels of climate concern with female respondents, even when controlling for other factors.
Analysis also indicates a racial bias in LLM predictions of climate concern, specifically an overestimation of the concern expressed by Black respondents. This overestimation corresponds to a coefficient of 0.030, meaning the models predict a higher level of climate concern for Black respondents than those respondents actually report in the survey. The associated p-value of less than 0.05 indicates that the overestimation is statistically significant rather than a product of random chance, and that it reflects a systematic bias in the models’ predictions.
Analysis indicates the presence of an ‘Intersectional Bias’ in LLM responses, manifesting as increased misrepresentation of climate concern among individuals holding multiple marginalized identities. This bias is not simply an aggregate of gender and racial biases acting independently; the combined effect significantly deviates from the sum of its parts. Specifically, the degree of misprediction is greatest for groups experiencing the convergence of multiple forms of marginalization, suggesting that LLMs struggle to accurately assess the climate opinions of individuals with complex social identities. This indicates a systemic failure in the model’s ability to account for the nuanced interplay of demographic factors when predicting individual beliefs.
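One standard way to probe whether the combined effect deviates from the sum of its parts is to add an interaction term to the error regression. The sketch below reuses the assumed data frame and column names from the regression example above; it is illustrative, not the paper’s implementation.

```python
# Sketch: testing for intersectional (non-additive) bias with an interaction
# term, reusing the assumed data frame `df` from the regression sketch above.
import statsmodels.formula.api as smf

interaction_model = smf.ols(
    "error ~ C(gender) * C(race) + C(age_group) + C(education) + C(party)"
    " + C(question)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["respondent_id"]})

# Significant gender-by-race interaction terms would mean the error for, say,
# Black women differs from what the separate gender and race effects predict
# on their own -- the statistical signature of intersectional bias.
print(interaction_model.summary())
```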
Analysis indicates that Large Language Models (LLMs) do not merely mirror pre-existing societal biases in climate concern prediction; they actively exacerbate these biases. Specifically, the models generated a 24-point differential in predicted climate concern scores between Black conservative males and females. This amplification stems from the intersection of the gender and racial biases observed in the data (the -0.021 gender coefficient and the 0.030 race coefficient), which, when combined, produce a disproportionately large divergence in predicted responses for this demographic intersection. This suggests LLMs can introduce and heighten disparities beyond those present in the initial data, potentially leading to inaccurate or inequitable outcomes in applications relying on these predictive models.

The Illusion of Consensus: Homogenization and the Loss of Nuance
Analysis reveals a phenomenon termed ‘Variance Compression’ within large language models when predicting climate opinions. The study demonstrates a consistent tendency for these models to cluster responses tightly around the average opinion, effectively diminishing the represented diversity. Quantified results indicate a reduction in response variance of 28%, suggesting a significant loss of nuance in the simulated public discourse. This compression isn’t simply about predicting the most likely opinion; rather, the LLMs consistently underestimate the full spectrum of viewpoints, potentially creating a skewed representation of actual public sentiment on climate change.
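A simple way to quantify this kind of compression is to compare the spread of synthetic and human responses item by item. The sketch below assumes aligned numeric response arrays and is not the paper’s exact aggregation.

```python
# Sketch: quantifying variance compression for one survey item.
# human_scores and llm_scores are assumed to be aligned numeric response arrays.
import numpy as np

def variance_compression(human_scores, llm_scores):
    """Fraction by which LLM response variance falls short of human variance."""
    human_var = np.var(np.asarray(human_scores, dtype=float), ddof=1)
    llm_var = np.var(np.asarray(llm_scores, dtype=float), ddof=1)
    return 1.0 - llm_var / human_var

# Averaging this quantity over questions is one way to arrive at a figure like
# the ~28% reduction reported in the study (the exact aggregation is assumed):
# mean_compression = np.mean([variance_compression(h, m) for h, m in item_pairs])
```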
Despite efforts to introduce greater variability in responses, research indicates that adjusting the ‘Model Temperature’ parameter – a common method for controlling the randomness of large language model outputs – proved surprisingly ineffective in mitigating the observed homogenization of climate opinions. This parameter, typically used to encourage more diverse or creative text generation, failed to significantly alter the tendency of these models to cluster predicted viewpoints around the average. The findings suggest that the compression of variance isn’t simply a consequence of overly predictable outputs, but rather an intrinsic characteristic of how these models internally represent and process the complex spectrum of public sentiment regarding climate change. This inherent bias poses challenges for accurately gauging public opinion and could ultimately limit the effectiveness of climate communication strategies reliant on these technologies.
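To check whether sampling temperature alone can restore the missing variance, one could sweep the parameter and measure the spread of repeated answers. The `query_model` wrapper and the temperature grid below are assumptions for illustration.

```python
# Sketch: sweeping the sampling temperature to see whether it restores variance.
# query_model(prompt, temperature=...) is an assumed wrapper around an LLM API.
import numpy as np

def response_variance_at(temperature, prompts, query_model, scale, n_samples=20):
    """Sample each prompt repeatedly at one temperature; return variance of scores."""
    scores = []
    for prompt in prompts:
        for _ in range(n_samples):
            answer = query_model(prompt, temperature=temperature).strip().lower()
            if answer in scale:                # map Likert label to numeric score
                scores.append(scale.index(answer))
    return np.var(scores, ddof=1)

# for t in (0.2, 0.7, 1.0, 1.5):
#     print(t, response_variance_at(t, prompts, query_model, LIKERT))
# If the variance stays far below the human benchmark at every temperature,
# the compression is not just a decoding artifact.
```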
Beyond the temperature result, the study suggests that the bias is embedded in how the models internally represent public sentiment. Rather than simply lacking nuanced understanding, the LLMs actively construct a homogenized version of climate opinion, potentially reflecting and reinforcing dominant narratives present in their training data. Consequently, these models offer a skewed portrayal of public belief, presenting a narrower range of perspectives than actually exists and raising concerns about their reliability for accurately gauging public opinion on critical issues.
The compression of diverse climate opinions by large language models carries substantial risks for accurately gauging public sentiment and, consequently, for effective communication strategies. This homogenization isn’t merely a statistical quirk; it actively constructs a narrower representation of what people believe, potentially obscuring legitimate concerns or minority viewpoints. Policymakers and communicators relying on these models for insights could therefore be operating with a fundamentally distorted understanding of the public landscape, leading to ill-targeted initiatives or messaging that fails to resonate with key demographics. The result is a feedback loop where the perceived consensus, shaped by the model’s limitations, further reinforces a constrained narrative, hindering genuine dialogue and potentially exacerbating societal polarization around climate issues.

The study meticulously details how these large language models, despite their sophistication, flatten nuanced opinions into predictable demographics. It’s a predictable outcome, really. As Donald Knuth observed, “Premature optimization is the root of all evil.” These models, optimized for seeming coherence, sacrifice accuracy and intersectionality in representing genuine climate concerns. The research highlights a compression of viewpoints, essentially prioritizing a smooth narrative over representing the messy reality of differing opinions – a clear case of optimizing for presentation before ensuring the data is sound. One anticipates, inevitably, that this ‘scalable’ solution will require significant refactoring when confronted with the inconvenient truths of actual public sentiment.
What’s Next?
The predictable march of algorithmic simplification continues, apparently. This work demonstrates that even with terabytes of ingested data, large language models still reliably produce caricatures of public opinion – specifically, when it comes to something as fraught as climate change. The surprise isn’t that bias exists – it always does – but the consistency with which these systems flatten nuance into readily digestible, racially and gendered tropes. Tests, of course, are merely a form of faith, not certainty.
Future research will undoubtedly focus on ‘debiasing’ techniques, a Sisyphean task if history is any guide. The real problem isn’t a lack of clever algorithms, but the fundamental mismatch between the complexity of human belief and the reductive logic of these models. Expect a proliferation of metrics claiming to measure ‘fairness’ while actual disparities remain stubbornly entrenched. One imagines production systems will find ingenious ways to amplify these biases, regardless of stated intentions.
Perhaps the more interesting question isn’t how to fix the models, but how to accept their inherent limitations. A system that can generate plausible text is not a system that understands the motivations or concerns of diverse communities. To treat it as such is a category error. The focus should shift toward transparency – acknowledging how these models distort reality, not pretending they can accurately reflect it. It’s a long shot, but one can hope for a pragmatic acceptance of imperfection, instead of chasing the phantom of algorithmic objectivity.
Original article: https://arxiv.org/pdf/2512.23889.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/