Author: Denis Avetisyan
Researchers are combining the strengths of logic programming and deep learning to build machine learning models that are both accurate and demonstrably fair.

This paper introduces ProbLog4Fairness, a neurosymbolic framework for modeling and mitigating bias using probabilistic logic programming and causal reasoning.
Defining and addressing algorithmic bias remains challenging, as rigid fairness criteria often conflict or fail to capture nuanced, context-specific harms. This paper introduces ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias, a framework that formalizes bias assumptions as logical programs and integrates them flexibly with neural network training. By representing bias through probabilistic logic, the approach demonstrably mitigates distortions in both synthetic and real-world datasets, outperforming baseline methods constrained by fixed fairness notions. Could this neurosymbolic approach unlock a more adaptable and interpretable path towards truly equitable machine learning systems?
The Inevitable Echo: Bias in Machine Learning
Despite remarkable advancements, machine learning systems are increasingly recognized for their potential to reflect and even exacerbate existing societal inequalities. These models, trained on vast datasets mirroring real-world patterns, can inadvertently learn and perpetuate biases present within that data. This isn’t a flaw in the algorithms themselves, but rather a consequence of their ability to identify and amplify correlations, even those that are unjust or discriminatory. Consequently, applications ranging from facial recognition and loan approvals to criminal risk assessment and hiring processes can produce outcomes that disproportionately disadvantage certain demographic groups, raising serious ethical and legal concerns about fairness and equity in an increasingly automated world. The success of a model, measured simply by its accuracy, provides no guarantee against such biased outcomes, necessitating a more nuanced evaluation of these systems’ societal impact.
The emergence of bias in machine learning models is rarely a simple flaw, but rather a consequence of systematic issues embedded within the data itself. The process of data generation often reflects pre-existing societal inequalities; if historical data used to train a model contains prejudiced outcomes—such as biased loan approvals or disproportionate policing—the model will likely learn and perpetuate those same patterns. Furthermore, the very features chosen to represent data, and the way those features are measured and labeled, can introduce bias. For instance, relying on zip codes as a proxy for socioeconomic status might inadvertently discriminate against individuals living in historically marginalized communities. Even seemingly objective data points can be subtly influenced by subjective human judgments during the labeling process, leading to skewed representations and unfair algorithmic outcomes. Therefore, a thorough examination of the entire data lifecycle—from collection to labeling—is crucial for identifying and mitigating these insidious sources of bias.
The pursuit of high accuracy in machine learning, while a traditional benchmark of success, is increasingly recognized as an incomplete metric. Models capable of precise predictions can still yield profoundly unjust outcomes if those predictions disproportionately harm or disadvantage specific demographic groups. This disparity arises because algorithms learn patterns present in the training data; if that data reflects existing societal biases regarding race, gender, socioeconomic status, or other protected characteristics, the model will inevitably perpetuate and even amplify them. A demonstrably fair and equitable model is therefore not simply one that predicts correctly most of the time, but one that performs consistently across all relevant groups. This demands a rigorous evaluation of its impact beyond overall accuracy and a commitment to mitigating disparate outcomes, even at the cost of a slight reduction in predictive power.
The pursuit of increasingly accurate machine learning models must be accompanied by a fundamental re-evaluation of evaluation criteria. Historically, emphasis has been placed almost exclusively on maximizing predictive power, often disregarding the potential for models to encode and amplify societal biases. A necessary progression involves actively identifying the origins of these biases – whether in skewed datasets, flawed feature engineering, or algorithmic design – and incorporating methods to mitigate them. This shift demands the development of novel techniques that explicitly model fairness, incorporating metrics beyond simple accuracy and prioritizing equitable outcomes across all demographic groups. Consequently, future research must prioritize not only what a model predicts, but how it arrives at those predictions, ensuring that algorithmic systems contribute to a more just and equitable world.
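To make this concrete, here is a minimal sketch (with hypothetical labels, predictions, and group membership, not data from the paper) of how per-group accuracy and positive prediction rates can expose a disparity that overall accuracy alone would hide:

```python
# Minimal sketch: overall accuracy vs. per-group metrics for a binary classifier.
# All arrays are hypothetical, purely for illustration.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])   # model predictions
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # sensitive attribute (two demographic groups)

overall_acc = (y_true == y_pred).mean()

for g in (0, 1):
    mask = group == g
    acc = (y_true[mask] == y_pred[mask]).mean()   # per-group accuracy
    positive_rate = y_pred[mask].mean()           # P(Y_hat = 1 | group = g)
    print(f"group {g}: accuracy={acc:.2f}, positive rate={positive_rate:.2f}")

# Statistical parity difference: gap in positive prediction rates between groups.
spd = y_pred[group == 0].mean() - y_pred[group == 1].mean()
print(f"overall accuracy={overall_acc:.2f}, statistical parity difference={spd:.2f}")
```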

Logic as a Lever: Introducing ProbLog4Fairness
ProbLog4Fairness introduces a classification methodology that incorporates bias directly into the model’s probabilistic reasoning process. Utilizing ProbLog and DeepProbLog, the system moves beyond treating bias as an external factor to be corrected after prediction; instead, biasing mechanisms are integrated into the core classification logic. This is achieved by representing relationships and dependencies using Bayesian Networks, which are then implemented via DeepProbLog and Neural Networks. The integration allows for explicit modeling of bias sources – such as biased labels or measurement errors – and their influence on predictive outcomes, enabling a more nuanced and controlled approach to classification.
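As a flavor of what such an explicit bias model can look like, the hypothetical ProbLog program below (a sketch with assumed probabilities and predicate names, not the authors' actual ProbLog4Fairness program) encodes label bias as a probabilistic flip of the latent true label for a protected individual, and evaluates it with the problog package's Python interface:

```python
# Hypothetical sketch of a label-bias mechanism written as ProbLog clauses.
# Assumes the `problog` package is installed; probabilities are illustrative.
from problog.program import PrologString
from problog import get_evaluatable

model = r"""
0.7::true_label(X) :- applicant(X).      % latent, unbiased label
0.2::flip(X) :- protected(X).            % label noise only for the protected group
observed_label(X) :- true_label(X), \+ flip(X).
observed_label(X) :- \+ true_label(X), flip(X).

applicant(ann).
protected(ann).

query(true_label(ann)).
query(observed_label(ann)).
"""

# Expected: P(true_label(ann)) = 0.7, P(observed_label(ann)) = 0.7*0.8 + 0.3*0.2 = 0.62
result = get_evaluatable().create_from(PrologString(model)).evaluate()
for query, prob in result.items():
    print(query, prob)
```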
The ProbLog4Fairness framework addresses bias by representing its origins – whether stemming from inaccuracies in labeling processes, systematic errors in measurement techniques, or patterns present within historical datasets – as explicit components within a probabilistic logic model. This allows for the quantification of bias propagation through the prediction pipeline. By formally defining the sources of bias, the system can then calculate how these biases affect the probabilities assigned to different outcomes. This facilitates a detailed analysis of prediction sensitivity to biased inputs and enables the evaluation of mitigation strategies by observing changes in probabilistic outcomes when bias sources are modified or removed. The resulting model provides a transparent and auditable account of how bias influences predictions, moving beyond black-box fairness assessments.
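Continuing the same hypothetical model, one way to inspect how a bias source propagates to predictions is to sweep its parameter and re-evaluate the query; removing the bias source corresponds to setting the flip probability to zero. This is a sketch of that sensitivity check, not the paper's exact analysis:

```python
# Sketch: quantify how a label-flip bias rate propagates to the observed label.
# Assumes the `problog` package is installed; the model is hypothetical.
from problog.program import PrologString
from problog import get_evaluatable

TEMPLATE = r"""
0.7::true_label(ann).
{p}::flip(ann).
observed_label(ann) :- true_label(ann), \+ flip(ann).
observed_label(ann) :- \+ true_label(ann), flip(ann).
query(observed_label(ann)).
"""

# Expected: 0.70, 0.66, 0.62, 0.58 as the flip probability grows.
for p in (0.0, 0.1, 0.2, 0.3):
    result = get_evaluatable().create_from(PrologString(TEMPLATE.format(p=p))).evaluate()
    for query, prob in result.items():
        print(f"flip probability {p}: P({query}) = {prob:.2f}")
```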
Traditional fairness interventions often operate as post-processing steps applied to model outputs, potentially limiting their effectiveness and introducing unintended consequences. ProbLog4Fairness differs by integrating fairness considerations directly into the model learning phase. This is achieved by encoding fairness constraints and objectives – such as demographic parity or equalized odds – as logical rules within the ProbLog framework. Consequently, the model learns to satisfy these constraints during training, rather than attempting to correct for bias after predictions are made. This proactive approach allows for a more holistic optimization process, simultaneously maximizing predictive accuracy and adhering to predefined fairness criteria, leading to demonstrably fairer and more reliable outcomes.
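The paper encodes such constraints as logical rules inside the probabilistic program; as a rough, generic analogue of optimizing accuracy and fairness jointly during training (not the ProbLog4Fairness mechanism itself), one can fold a demographic-parity penalty into a standard classification loss. A PyTorch sketch under assumed tensor shapes:

```python
# Generic analogue, not ProbLog4Fairness itself: a training loss that combines
# cross-entropy with a demographic-parity penalty, so fairness is optimized
# jointly with accuracy rather than patched in after prediction.
import torch
import torch.nn.functional as F

def fair_loss(logits, labels, group, lam=1.0):
    """logits: (N, 2) class scores; labels: (N,) in {0,1}; group: (N,) sensitive attribute in {0,1}."""
    ce = F.cross_entropy(logits, labels)
    p_pos = torch.softmax(logits, dim=1)[:, 1]                  # P(Y_hat = 1 | x)
    gap = p_pos[group == 0].mean() - p_pos[group == 1].mean()   # demographic-parity gap
    return ce + lam * gap.abs()
```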
ProbLog4Fairness leverages Bayesian Networks to model probabilistic dependencies between variables relevant to bias, including features, labels, and predicted outcomes. These networks, implemented using DeepProbLog and integrated with Neural Networks, facilitate the representation of complex relationships and allow for probabilistic inference regarding the impact of bias. Specifically, the system represents conditional probabilities $P(Y|X,B)$, where $Y$ is the predicted label, $X$ represents input features, and $B$ encapsulates bias factors. This allows for reasoning about how bias influences predictions given specific feature values, enabling the quantification and mitigation of unfair outcomes through manipulation of network parameters and incorporation of fairness constraints during the learning process.
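Concretely, the observed label distribution follows by marginalizing over the bias factor, $P(Y \mid X) = \sum_{b} P(Y \mid X, B=b)\,P(B=b \mid X)$. A minimal plain-Python sketch of that calculation for a binary label and a label-flipping bias (the probabilities are assumptions; in the framework the first would come from a neural classifier and the second from the bias model):

```python
# Minimal sketch: marginalize a label-flipping bias factor B out of P(Y | X, B).
def observed_label_prob(p_y_given_x, p_bias):
    """P(observed Y = 1 | X) when an active bias B flips the label."""
    return p_y_given_x * (1 - p_bias) + (1 - p_y_given_x) * p_bias

print(observed_label_prob(p_y_given_x=0.7, p_bias=0.3))   # 0.7*0.7 + 0.3*0.3 = 0.58
```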

Beyond Measurement: Quantifying Fairness with Probabilistic Constraints
ProbLog4Fairness differentiates itself from traditional fairness assessments by moving beyond simply measuring disparities in outcomes to actively modeling the origins of those disparities. Established metrics such as Statistical Parity and Equalized Odds evaluate fairness post-hoc, but do not account for the factors driving unfairness. ProbLog4Fairness integrates probabilistic logic programming to represent potential sources of bias – such as biased features or discriminatory rules – as explicit components within the model. This allows for a more granular understanding of how bias manifests and propagates, enabling targeted interventions and a more accurate quantification of fairness beyond simple demographic parity or equalized error rates. By modeling these sources, the system can then assess fairness as a constraint satisfaction problem, offering a nuanced evaluation of disparity attributable to modeled bias.
Representing fairness as a probabilistic constraint allows for a more granular evaluation of disparity beyond simple threshold-based metrics. This is achieved by modeling fairness not as a binary classification of “fair” or “unfair”, but as a probability distribution reflecting the degree to which a model satisfies fairness criteria. This probabilistic approach accounts for uncertainty inherent in both the data and the model’s predictions, providing a more accurate and nuanced assessment of fairness violations. The constraint is formally integrated into the model’s objective function, enabling optimization that directly balances predictive accuracy with fairness considerations, and allows for quantifying the trade-offs between the two. This methodology moves beyond simply detecting disparate impact and enables a quantifiable measure of the degree to which fairness is achieved, expressed as a probability value.
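One way to read "fairness as a probability" is as the probability that a fairness criterion holds under the model's own predictive uncertainty. The Monte Carlo sketch below (an illustration with hypothetical predictions and a hypothetical tolerance, not the paper's estimator) makes that concrete for statistical parity:

```python
# Monte Carlo sketch: estimate the probability that statistical parity holds,
# given the model's predictive probabilities, instead of a hard pass/fail.
import numpy as np

rng = np.random.default_rng(0)
p_pos = np.array([0.9, 0.6, 0.4, 0.8, 0.3, 0.2, 0.7, 0.5])  # predicted P(Y_hat=1|x), hypothetical
group = np.array([0,   0,   0,   0,   1,   1,   1,   1])    # sensitive attribute

def parity_holds(sample, tol=0.2):
    gap = sample[group == 0].mean() - sample[group == 1].mean()
    return abs(gap) <= tol

samples = rng.random((10_000, p_pos.size)) < p_pos           # draw label samples from the model
prob_fair = np.mean([parity_holds(s) for s in samples])
print(f"P(statistical parity within tolerance) ≈ {prob_fair:.2f}")
```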
Evaluations on both synthetically generated datasets and real-world datasets – specifically the Student and CELEB-A datasets – demonstrate that the ProbLog4Fairness approach achieves accuracy levels comparable to those of established upper baselines when subjected to various bias conditions. This performance consistency was observed across different types of bias present in the datasets, indicating the robustness of the methodology. Quantitative analysis confirms that the accuracy of ProbLog4Fairness does not significantly deviate from the performance of the upper baselines, validating its effectiveness as a fairness-aware modeling technique without compromising predictive power.
Experimental results on both the synthetic and real-world Student and CELEB-A datasets demonstrate that the ProbLog4Fairness method effectively reduces statistical disparity when bias mitigation techniques are applied. Specifically, the observed levels of disparity following mitigation closely approach the expected disparity levels calculated based on the inherent characteristics of the datasets. This indicates that the model not only reduces unfairness but does so in a predictable and controlled manner, aligning with established expectations for fairness in these contexts. Quantitative analysis confirms that the achieved disparity levels are consistently near the predetermined thresholds, validating the efficacy of the approach in minimizing unfair outcomes.
Evaluation on the Student dataset indicates an improvement in F1 Score when utilizing ProbLog4Fairness for bias mitigation. Specifically, the model achieves a higher harmonic mean of precision and recall compared to the baseline without fairness constraints. On the CELEB-A dataset, performance with bias mitigation is comparable to the upper baseline; while no statistically significant improvement was observed, the model maintains similar accuracy levels despite addressing potential biases. These results demonstrate that the approach effectively mitigates bias without substantial performance degradation on both datasets.

Toward Robust and Equitable Systems: A Shift in Perspective
ProbLog4Fairness presents a novel approach to artificial intelligence development by directly incorporating bias modeling into the system’s logic. Rather than treating fairness as a post-hoc correction, the framework allows developers to explicitly define and account for potential biases during the model-building process, leading to systems demonstrably less prone to unfair or discriminatory outcomes. This proactive methodology moves beyond simply identifying bias in existing models and instead aims to build resilience against it from the outset. By representing biases as probabilistic constraints within the logic programming framework, the system can assess the likelihood of unfair decisions and mitigate them, offering a pathway towards more robust and equitable AI, particularly critical in sectors where algorithmic fairness is paramount, such as loan applications or medical diagnoses.
The potential for biased outcomes in artificial intelligence carries particularly weighty consequences within critical sectors like healthcare, finance, and criminal justice. In healthcare, algorithmic bias could lead to misdiagnoses or unequal access to treatment, exacerbating existing health disparities. Within financial systems, biased models may unfairly deny loans or insurance, perpetuating economic inequality. Perhaps most critically, the application of biased AI in criminal justice—such as risk assessment tools used in sentencing or parole decisions—can have profound and lasting impacts on individuals and communities, potentially reinforcing systemic biases and leading to unjust outcomes. Therefore, a rigorous approach to identifying and mitigating bias, such as that offered by ProbLog4Fairness, is not merely a technical refinement, but a crucial step towards ensuring fairness, accountability, and equitable access to opportunities within these high-stakes domains.
The efficacy of ProbLog4Fairness, and indeed many machine learning frameworks, is fundamentally linked to the quality of its Feature Vector; however, this vector is often susceptible to Measurement Bias, introducing systematic errors during data acquisition. This bias can stem from flawed instruments, inconsistent application of measurement protocols, or inherent limitations in capturing complex phenomena with simplified features. Consequently, meticulous feature engineering and rigorous data collection practices are paramount. Researchers must not only select relevant attributes but also actively mitigate potential sources of measurement error through calibration, standardization, and careful consideration of data provenance. Addressing this issue proactively ensures the model learns from accurate representations of reality, fostering more reliable and equitable outcomes, particularly in sensitive applications where biased features could perpetuate or amplify existing societal inequalities.
Achieving genuinely fair and unbiased artificial intelligence necessitates a comprehensive examination of how sensitive variables – characteristics like race, gender, or socioeconomic status – influence the very creation of datasets. It’s not simply about removing these variables post-collection, but understanding how they interact with the data generation process itself, potentially introducing systemic biases at the foundational level. These interactions can manifest subtly, affecting feature distributions, labeling practices, and even the availability of data for certain groups. A robust approach demands tracing the pathway from real-world phenomena to data representation, identifying points where sensitive variables could inadvertently skew the learning process. Consequently, developers must move beyond treating sensitive attributes as mere inputs and instead consider them as integral components shaping the data landscape, demanding careful scrutiny and mitigation strategies throughout the entire model lifecycle.
A core challenge in developing fair AI lies in understanding the relationship between data quantity and the reliability of bias estimation. Researchers have formalized this trade-off with a derived bound, $n \geq \frac{1}{2\varepsilon^{2}} \ln\left(\frac{2}{1-\gamma}\right)$, on the number of samples required. The bound shows that demanding a tighter error tolerance $\varepsilon$ on the estimated bias parameters, together with a higher confidence level $\gamma$, necessitates a larger dataset size $n$; because $n$ grows with $1/\varepsilon^{2}$, each further halving of the tolerance quadruples the required data. The formula thus reveals diminishing returns inherent in data collection: substantial increases in data may yield only marginal improvements in bias estimation, particularly at high accuracy and confidence levels. This underscores the practical limitations of relying on ‘big data’ alone to address algorithmic bias and highlights the importance of efficient bias modeling techniques alongside robust data collection strategies.
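For intuition, plugging illustrative values into the bound gives concrete sample sizes (a worked example, not figures from the paper):

```python
# Worked example of the bound n >= (1 / (2*eps^2)) * ln(2 / (1 - gamma)),
# with illustrative choices of error tolerance eps and confidence level gamma.
import math

def min_samples(eps, gamma):
    return math.ceil((1.0 / (2.0 * eps**2)) * math.log(2.0 / (1.0 - gamma)))

print(min_samples(eps=0.05, gamma=0.95))   # 738 samples
print(min_samples(eps=0.01, gamma=0.99))   # 26492 samples
```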

The pursuit of fairness in machine learning, as detailed in this work, echoes a fundamental truth about complex systems. ProbLog4Fairness, by explicitly modeling bias through probabilistic logic, doesn’t build a solution so much as cultivate an environment where fairer outcomes are more likely to emerge. It acknowledges that bias isn’t a bug to be eradicated, but a persistent force to be understood and managed within the system’s inherent probabilities. Grace Hopper famously stated, “It’s easier to ask forgiveness than it is to get permission.” This resonates with the approach taken here; rather than striving for a perfect, bias-free model upfront—a potentially paralyzing quest—ProbLog4Fairness allows for a more adaptive, iterative process, acknowledging that some degree of ‘forgiveness’ – careful monitoring and adjustment – will always be necessary to navigate the complexities of real-world data and ensure equitable outcomes. The system anticipates failure, and builds in mechanisms for graceful recovery, much like postponing inevitable chaos with careful architecture.
The Currents Shift
ProbLog4Fairness, as a means of formalizing and addressing bias, offers a temporary respite, a localized dam against the inevitable flood. The elegance of logic, however, cannot fundamentally alter the nature of data itself. Datasets are not mirrors reflecting reality, but distortions shaped by histories of collection, annotation, and inherent systemic pressures. To believe a model, even one rigorously assessed for fairness, is ‘unbiased’ is to mistake a carefully constructed illusion for truth.
The focus will, predictably, turn towards ever more granular definitions of fairness – a proliferation of metrics chasing a phantom ideal. Yet, the true challenge lies not in quantifying bias, but in acknowledging its pervasiveness. Technologies change, dependencies remain. The formalisms presented here are not solutions, but rather increasingly sophisticated tools for describing the problems—a shifting of the burden, not its lifting.
Future work will likely explore the integration of these neurosymbolic approaches with causal inference—a necessary, if belated, attempt to move beyond correlation and towards understanding the generative processes that create biased data. But one suspects that each refinement, each new layer of complexity, will only reveal deeper, more intractable layers of inequity. The architecture isn’t structure—it’s a compromise frozen in time.
Original article: https://arxiv.org/pdf/2511.09768.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/