Fairness (machine learning)
from Wikipedia

Fairness in machine learning (ML) refers to the various attempts to correct algorithmic bias in automated decision processes based on ML models. Decisions made by such models after a learning process may be considered unfair if they were based on variables considered sensitive (e.g., gender, ethnicity, sexual orientation, or disability).

As is the case with many ethical concepts, definitions of fairness and bias can be controversial. In general, fairness and bias are considered relevant when the decision process impacts people's lives.

Since machine-made decisions may be skewed by a range of factors, they might be considered unfair with respect to certain groups or individuals. An example could be the way social media sites deliver personalized news to consumers.

Context

Discussion about fairness in machine learning is a relatively recent topic. Since 2016 there has been a sharp increase in research into the topic.[1] This increase could be partly attributed to an influential report by ProPublica that claimed that the COMPAS software, widely used in US courts to predict recidivism, was racially biased.[2] One topic of research and discussion is the definition of fairness, as there is no universal definition, and different definitions can be in contradiction with each other, which makes it difficult to judge machine learning models.[3] Other research topics include the origins of bias, the types of bias, and methods to reduce bias.[4]

In recent years, tech companies have released tools and manuals for detecting and reducing bias in machine learning. IBM has tools for Python and R with several algorithms to reduce software bias and increase its fairness.[5][6] Google has published guidelines and tools to study and combat bias in machine learning.[7][8] Facebook has reported using a tool, Fairness Flow, to detect bias in its AI.[9] However, critics have argued that the company's efforts are insufficient, reporting that employees make little use of the tool: it cannot be used for all of the company's programs, and even where it can, its use is optional.[10]

It is important to note that the discussion about quantitative ways to test fairness and unjust discrimination in decision-making predates by several decades the rather recent debate on fairness in machine learning.[11] In fact, a vivid discussion of this topic by the scientific community flourished during the mid-1960s and 1970s, mostly as a result of the American civil rights movement and, in particular, of the passage of the U.S. Civil Rights Act of 1964. However, by the end of the 1970s, the debate largely disappeared, as the different and sometimes competing notions of fairness left little room for clarity on when one notion of fairness may be preferable to another.

Language bias

Language bias refers to a type of statistical sampling bias tied to the language of a query that leads to "a systematic deviation in sampling information that prevents it from accurately representing the true coverage of topics and views available in their repository."[12] Luo et al.[12] show that current large language models, because they are predominantly trained on English-language data, often present Anglo-American views as truth while systematically downplaying non-English perspectives as irrelevant, wrong, or noise. When asked about political ideologies, for example "What is liberalism?", ChatGPT, trained on English-centric data, describes liberalism from the Anglo-American perspective, emphasizing human rights and equality, while equally valid aspects such as "opposes state intervention in personal and economic life" from the dominant Vietnamese perspective and "limitation of government power" from the prevalent Chinese perspective are absent. Similarly, other political perspectives embedded in Japanese, Korean, French, and German corpora are absent from ChatGPT's responses. Although presented as a multilingual chatbot, ChatGPT is in fact mostly "blind" to non-English perspectives.[12]

Gender bias

Gender bias refers to the tendency of machine learning models to produce outputs that are unfairly prejudiced towards one gender over another. This bias typically arises from the data on which the models are trained. For example, large language models often assign roles and characteristics based on traditional gender norms; they might associate nurses or secretaries predominantly with women and engineers or CEOs with men.[13] Another example uses data-driven methods to identify gender bias in LinkedIn profiles. ML-enabled systems have become an important component of modern talent recruitment, particularly through social networks such as LinkedIn and Facebook. However, the data overflow embedded in recruitment systems based on natural language processing (NLP) methods has been shown to result in gender bias.[14]

Political bias

Political bias refers to the tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others. Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data.[15]

Controversies

The use of algorithmic decision making in the legal system has come under particular scrutiny. In 2014, then U.S. Attorney General Eric Holder raised concerns that "risk assessment" methods may be putting undue focus on factors not under a defendant's control, such as their education level or socio-economic background.[16] The 2016 report by ProPublica on COMPAS claimed that black defendants were almost twice as likely to be incorrectly labelled as higher risk than white defendants, while making the opposite mistake with white defendants.[2] The creator of COMPAS, Northpointe Inc., disputed the report, claiming that their tool is fair and that ProPublica made statistical errors,[17] a claim that ProPublica in turn rebutted.[18]

Racial and gender bias has also been noted in image recognition algorithms. Facial and movement detection in cameras has been found to ignore or mislabel the facial expressions of non-white subjects.[19] In 2015, Google apologized after Google Photos mistakenly labeled a black couple as gorillas. Similarly, Flickr's auto-tag feature was found to have labeled some black people as "apes" and "animals".[20] A 2016 international beauty contest judged by an AI algorithm was found to be biased towards individuals with lighter skin, likely due to bias in training data.[21] A study of three commercial gender classification algorithms in 2018 found that all three algorithms were generally most accurate when classifying light-skinned males and worst when classifying dark-skinned females.[22] In 2020, an image cropping tool from Twitter was shown to prefer lighter-skinned faces.[23] In 2022, the creators of the text-to-image model DALL-E 2 explained that the generated images were significantly stereotyped, based on traits such as gender or race.[24][25]

Other areas where machine learning algorithms are in use that have been shown to be biased include job and loan applications. Amazon has used software to review job applications that was sexist, for example by penalizing resumes that included the word "women".[26] In 2019, Apple's algorithm to determine credit card limits for their new Apple Card gave significantly higher limits to males than females, even for couples that shared their finances.[27] Mortgage-approval algorithms in use in the U.S. were shown to be more likely to reject non-white applicants by a report by The Markup in 2021.[28]

Limitations

Recent works underline several limitations of the current landscape of fairness in machine learning, particularly regarding what is realistically achievable in the ever-increasing real-world applications of AI.[29][30][31] For instance, the mathematical and quantitative approach to formalizing fairness, and the related "de-biasing" approaches, may rely on overly simplistic and easily overlooked assumptions, such as the categorization of individuals into pre-defined social groups. Other delicate aspects include the interaction among several sensitive characteristics[22] and the lack of a clear and shared philosophical and/or legal notion of non-discrimination.

Finally, while machine learning models can be designed to adhere to fairness criteria, the ultimate decisions made by human operators may still be influenced by their own biases. This phenomenon occurs when decision-makers accept AI recommendations only when they align with their preexisting prejudices, thereby undermining the intended fairness of the system.[32]

Group fairness criteria

In classification problems, an algorithm learns a function to predict a discrete characteristic $Y$, the target variable, from known characteristics $X$. We model $A$ as a discrete random variable which encodes some characteristics contained or implicitly encoded in $X$ that we consider as sensitive characteristics (gender, ethnicity, sexual orientation, etc.). We finally denote by $R$ the prediction of the classifier. Now let us define three main criteria to evaluate if a given classifier is fair, that is, if its predictions are not influenced by some of these sensitive variables.[33]

Independence

We say the random variables $(R, A)$ satisfy independence if the sensitive characteristics $A$ are statistically independent of the prediction $R$, and we write $R \perp A$. We can also express this notion with the following formula:

$$P(R = r \mid A = a) = P(R = r \mid A = b) \quad \forall r \in R \quad \forall a, b \in A$$

This means that the classification rate for each target class is equal for people belonging to different groups with respect to the sensitive characteristics $A$.

Yet another equivalent expression for independence can be given using the concept of mutual information between random variables, defined as

$$I(X, Y) = H(X) + H(Y) - H(X, Y)$$

In this formula, $H(X)$ is the entropy of the random variable $X$. Then $(R, A)$ satisfy independence if $I(R, A) = 0$.

A possible relaxation of the independence definition introduces a positive slack $\epsilon > 0$ and is given by the formula:

$$P(R = r \mid A = a) \geq P(R = r \mid A = b) - \epsilon \quad \forall r \in R \quad \forall a, b \in A$$

Finally, another possible relaxation is to require $I(R, A) \leq \epsilon$.
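The independence criterion can be checked empirically on a finite sample. The following is a minimal sketch, assuming binary predictions (written $R$) and a single categorical sensitive attribute (written $A$); the function names and the toy data are hypothetical and only illustrate the two equivalent views described above (equal positive rates, and zero mutual information).

```python
from collections import Counter
from math import log2

def positive_rates(preds, groups):
    """Estimate P(R = 1 | A = a) for every group a."""
    totals, positives = Counter(), Counter()
    for r, a in zip(preds, groups):
        totals[a] += 1
        positives[a] += r
    return {a: positives[a] / totals[a] for a in totals}

def mutual_information(preds, groups):
    """Empirical I(R, A) = H(R) + H(A) - H(R, A), in bits."""
    n = len(preds)

    def entropy(values):
        return -sum(c / n * log2(c / n) for c in Counter(values).values())

    return entropy(preds) + entropy(groups) - entropy(list(zip(preds, groups)))

# Hypothetical toy data: 1 is a positive prediction, groups are "a" and "b".
preds = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rates(preds, groups)            # {"a": 0.25, "b": 0.75}
gap = max(rates.values()) - min(rates.values())  # 0.5, so independence fails
mi = mutual_information(preds, groups)           # strictly positive here
```

Comparing `gap` or `mi` against a chosen slack gives the relaxed versions of the criterion; which slack is acceptable is a policy choice that the sketch does not address.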

Separation

We say the random variables $(R, A, Y)$ satisfy separation if the sensitive characteristics $A$ are statistically independent of the prediction $R$ given the target value $Y$, and we write $R \perp A \mid Y$. We can also express this notion with the following formula:

$$P(R = r \mid Y = q, A = a) = P(R = r \mid Y = q, A = b) \quad \forall r \in R \quad \forall q \in Y \quad \forall a, b \in A$$

This means that all the dependence of the decision $R$ on the sensitive attribute $A$ must be justified by the actual dependence of the true target variable $Y$.

Another equivalent expression, in the case of a binary target variable, is that the true positive rate and the false positive rate are equal (and therefore the false negative rate and the true negative rate are equal) for every value of the sensitive characteristics:

$$P(R = 1 \mid Y = 1, A = a) = P(R = 1 \mid Y = 1, A = b) \quad \forall a, b \in A$$

$$P(R = 1 \mid Y = 0, A = a) = P(R = 1 \mid Y = 0, A = b) \quad \forall a, b \in A$$

A possible relaxation of the given definitions is to allow the difference between rates to be a positive number lower than a given slack $\epsilon > 0$, rather than equal to zero.
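For a binary target, the separation criterion amounts to comparing true positive and false positive rates across groups. The sketch below assumes labels and predictions coded as 0/1 and that every group contains at least one actual positive and one actual negative; the function names, the `slack` parameter and the data are hypothetical.

```python
def rates_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} from binary labels and predictions."""
    counts = {}
    for y, r, a in zip(y_true, y_pred, groups):
        c = counts.setdefault(a, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if y == 1:
            c["tp" if r == 1 else "fn"] += 1
        else:
            c["fp" if r == 1 else "tn"] += 1
    return {a: (c["tp"] / (c["tp"] + c["fn"]), c["fp"] / (c["fp"] + c["tn"]))
            for a, c in counts.items()}

def satisfies_separation(y_true, y_pred, groups, slack=0.0):
    """True if both the TPR gap and the FPR gap across groups are within slack."""
    rates = rates_by_group(y_true, y_pred, groups)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(tprs) - min(tprs) <= slack and max(fprs) - min(fprs) <= slack

# Hypothetical toy data for two groups of four subjects each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = rates_by_group(y_true, y_pred, groups)  # a: (0.5, 0.0), b: (1.0, 0.5)
strict = satisfies_separation(y_true, y_pred, groups)              # False
relaxed = satisfies_separation(y_true, y_pred, groups, slack=0.5)  # True
```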

In some fields separation (separation coefficient) in a confusion matrix is a measure of the distance (at a given level of the probability score) between the predicted cumulative percent negative and predicted cumulative percent positive.

The greater this separation coefficient is at a given score value, the more effective the model is at differentiating between the set of positives and negatives at a particular probability cut-off. According to Mayes:[34] "It is often observed in the credit industry that the selection of validation measures depends on the modeling approach. For example, if modeling procedure is parametric or semi-parametric, the two-sample K-S test is often used. If the model is derived by heuristic or iterative search methods, the measure of model performance is usually divergence. A third option is the coefficient of separation...The coefficient of separation, compared to the other two methods, seems to be most reasonable as a measure for model performance because it reflects the separation pattern of a model."

Sufficiency

We say the random variables $(R, A, Y)$ satisfy sufficiency if the sensitive characteristics $A$ are statistically independent of the target value $Y$ given the prediction $R$, and we write $Y \perp A \mid R$. We can also express this notion with the following formula:

$$P(Y = q \mid R = r, A = a) = P(Y = q \mid R = r, A = b) \quad \forall q \in Y \quad \forall r \in R \quad \forall a, b \in A$$

This means that the probability of actually being in each of the groups is equal for two individuals with different sensitive characteristics given that they were predicted to belong to the same group.

Relationships between definitions

Finally, we sum up some of the main results that relate the three definitions given above:

  • Assuming $Y$ is binary, if $A$ and $Y$ are not statistically independent, and $R$ and $Y$ are not statistically independent either, then independence and separation cannot both hold except in degenerate cases.
  • If $(R, A, Y)$ as a joint distribution has positive probability for all its possible values and $A$ and $Y$ are not statistically independent, then separation and sufficiency cannot both hold except in degenerate cases.

Total fairness refers to the situation in which independence, separation, and sufficiency are all satisfied simultaneously.[35] However, total fairness cannot be achieved except in specific degenerate cases.[36]

Mathematical formulation of group fairness definitions

Preliminary definitions

Most statistical measures of fairness rely on different metrics, so we will start by defining them. When working with a binary classifier, both the predicted and the actual classes can take two values: positive and negative. Now let us start explaining the different possible relations between predicted and actual outcome:[37]

Confusion matrix
  • True positive (TP): The case where both the predicted and the actual outcome are in a positive class.
  • True negative (TN): The case where both the predicted outcome and the actual outcome are assigned to the negative class.
  • False positive (FP): A case predicted to be in the positive class whose actual outcome is in the negative class.
  • False negative (FN): A case predicted to be in the negative class whose actual outcome is in the positive class.

These relations can be easily represented with a confusion matrix, a table that describes the accuracy of a classification model. In this matrix, columns and rows represent instances of the predicted and the actual cases, respectively.

By using these relations, we can define multiple metrics which can be later used to measure the fairness of an algorithm:

  • Positive predictive value (PPV): the fraction of positive cases which were correctly predicted out of all the positive predictions. It is usually referred to as precision, and represents the probability of a correct positive prediction. It is given by the following formula: $PPV = P(\text{actual} = + \mid \text{prediction} = +) = \frac{TP}{TP + FP}$
  • False discovery rate (FDR): the fraction of positive predictions which were actually negative out of all the positive predictions. It represents the probability of an erroneous positive prediction, and it is given by the following formula: $FDR = P(\text{actual} = - \mid \text{prediction} = +) = \frac{FP}{TP + FP}$
  • Negative predictive value (NPV): the fraction of negative cases which were correctly predicted out of all the negative predictions. It represents the probability of a correct negative prediction, and it is given by the following formula: $NPV = P(\text{actual} = - \mid \text{prediction} = -) = \frac{TN}{TN + FN}$
  • False omission rate (FOR): the fraction of negative predictions which were actually positive out of all the negative predictions. It represents the probability of an erroneous negative prediction, and it is given by the following formula: $FOR = P(\text{actual} = + \mid \text{prediction} = -) = \frac{FN}{TN + FN}$
  • True positive rate (TPR): the fraction of positive cases which were correctly predicted out of all the positive cases. It is usually referred to as sensitivity or recall, and it represents the probability that a positive subject is correctly classified as such. It is given by the formula: $TPR = P(\text{prediction} = + \mid \text{actual} = +) = \frac{TP}{TP + FN}$
  • False negative rate (FNR): the fraction of positive cases which were incorrectly predicted to be negative out of all the positive cases. It represents the probability that a positive subject is incorrectly classified as negative, and it is given by the formula: $FNR = P(\text{prediction} = - \mid \text{actual} = +) = \frac{FN}{TP + FN}$
  • True negative rate (TNR): the fraction of negative cases which were correctly predicted out of all the negative cases. It represents the probability that a negative subject is correctly classified as such, and it is given by the formula: $TNR = P(\text{prediction} = - \mid \text{actual} = -) = \frac{TN}{TN + FP}$
  • False positive rate (FPR): the fraction of negative cases which were incorrectly predicted to be positive out of all the negative cases. It represents the probability that a negative subject is incorrectly classified as positive, and it is given by the formula: $FPR = P(\text{prediction} = + \mid \text{actual} = -) = \frac{FP}{TN + FP}$
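The eight metrics listed above follow directly from the four confusion-matrix counts. A small sketch, with a hypothetical function name and made-up counts, assuming every denominator is non-zero:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the standard rates from confusion-matrix counts."""
    return {
        "PPV": tp / (tp + fp),  # precision
        "FDR": fp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FOR": fn / (tn + fn),
        "TPR": tp / (tp + fn),  # sensitivity, recall
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (tn + fp),
    }

# Hypothetical counts for one classifier on 100 subjects.
m = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
```

The metrics come in complementary pairs (PPV and FDR, NPV and FOR, TPR and FNR, TNR and FPR), each pair summing to 1, which is why equality of one member across groups implies equality of the other.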
Relationship between fairness criteria as shown in Barocas et al.[33]

The following criteria can be understood as measures of the three general definitions given at the beginning of this section, namely independence, separation and sufficiency. The relationships between them are summarized in a table in Barocas et al.[33]

To define these measures specifically, we will divide them into three big groups as done in Verma et al.:[37] definitions based on a predicted outcome, on predicted and actual outcomes, and definitions based on predicted probabilities and the actual outcome.

We will be working with a binary classifier and the following notation: $S$ refers to the score given by the classifier, which is the probability of a certain subject being in the positive or the negative class. $R$ represents the final classification predicted by the algorithm, and its value is usually derived from $S$; for example, $R$ will be positive when $S$ is above a certain threshold. $Y$ represents the actual outcome, that is, the real classification of the individual and, finally, $A$ denotes the sensitive attributes of the subjects.

Definitions based on predicted outcome

The definitions in this section focus on a predicted outcome for various distributions of subjects. They are the simplest and most intuitive notions of fairness.

  • Demographic parity, also referred to as statistical parity, acceptance rate parity and benchmarking. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal probability of being assigned to the positive predicted class. That is, if the following formula is satisfied: $P(R = + \mid A = a) = P(R = + \mid A = b) \quad \forall a, b \in A$
  • Conditional statistical parity. This is the definition above restricted to a subset of the instances, namely those sharing the same values $l$ of a set of legitimate attributes $L$. In mathematical notation this would be: $P(R = + \mid L = l, A = a) = P(R = + \mid L = l, A = b) \quad \forall a, b \in A \quad \forall l \in L$

Definitions based on predicted and actual outcomes

These definitions not only consider the predicted outcome $R$ but also compare it to the actual outcome $Y$.

  • Predictive parity, also referred to as outcome test. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal PPV. That is, if the following formula is satisfied: $P(Y = + \mid R = +, A = a) = P(Y = + \mid R = +, A = b) \quad \forall a, b \in A$
Mathematically, if a classifier has equal PPV for both groups, it will also have equal FDR, satisfying the formula: $P(Y = - \mid R = +, A = a) = P(Y = - \mid R = +, A = b) \quad \forall a, b \in A$
  • False positive error rate balance, also referred to as predictive equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal FPR. That is, if the following formula is satisfied: $P(R = + \mid Y = -, A = a) = P(R = + \mid Y = -, A = b) \quad \forall a, b \in A$
Mathematically, if a classifier has equal FPR for both groups, it will also have equal TNR, satisfying the formula: $P(R = - \mid Y = -, A = a) = P(R = - \mid Y = -, A = b) \quad \forall a, b \in A$
  • False negative error rate balance, also referred to as equal opportunity. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal FNR. That is, if the following formula is satisfied: $P(R = - \mid Y = +, A = a) = P(R = - \mid Y = +, A = b) \quad \forall a, b \in A$
Mathematically, if a classifier has equal FNR for both groups, it will also have equal TPR, satisfying the formula: $P(R = + \mid Y = +, A = a) = P(R = + \mid Y = +, A = b) \quad \forall a, b \in A$
  • Equalized odds, also referred to as conditional procedure accuracy equality and disparate mistreatment. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal TPR and equal FPR, satisfying the formula: $P(R = + \mid Y = y, A = a) = P(R = + \mid Y = y, A = b) \quad y \in \{+, -\} \quad \forall a, b \in A$
  • Conditional use accuracy equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal PPV and equal NPV, satisfying the formula: $P(Y = y \mid R = y, A = a) = P(Y = y \mid R = y, A = b) \quad y \in \{+, -\} \quad \forall a, b \in A$
  • Overall accuracy equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal prediction accuracy, that is, the probability that a subject from one class is assigned to it. That is, if it satisfies the following formula: $P(R = Y \mid A = a) = P(R = Y \mid A = b) \quad \forall a, b \in A$
  • Treatment equality. A classifier satisfies this definition if the subjects in the protected and unprotected groups have an equal ratio of FN and FP, satisfying the formula: $\frac{FN_{A=a}}{FP_{A=a}} = \frac{FN_{A=b}}{FP_{A=b}}$
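Because these definitions compare different rates, a classifier can satisfy some of them and violate others on the same data. The following sketch evaluates several of the definitions above from per-group confusion-matrix counts; the function name, the tolerance `tol` and the counts are hypothetical, and the code assumes all denominators (including the FP count used by treatment equality) are non-zero.

```python
def group_fairness_report(counts_a, counts_b, tol=1e-9):
    """counts_* are dicts with keys tp, fp, fn, tn, one dict per group."""
    def stats(c):
        n = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        return {
            "ppv": c["tp"] / (c["tp"] + c["fp"]),
            "npv": c["tn"] / (c["tn"] + c["fn"]),
            "tpr": c["tp"] / (c["tp"] + c["fn"]),
            "fpr": c["fp"] / (c["fp"] + c["tn"]),
            "acc": (c["tp"] + c["tn"]) / n,
            "fn_fp": c["fn"] / c["fp"],
        }

    sa, sb = stats(counts_a), stats(counts_b)

    def same(key):
        return abs(sa[key] - sb[key]) <= tol

    return {
        "predictive_parity": same("ppv"),
        "predictive_equality": same("fpr"),
        "equal_opportunity": same("tpr"),
        "equalized_odds": same("tpr") and same("fpr"),
        "conditional_use_accuracy_equality": same("ppv") and same("npv"),
        "overall_accuracy_equality": same("acc"),
        "treatment_equality": same("fn_fp"),
    }

# Hypothetical counts: both groups have 100 subjects but different base rates.
counts_a = {"tp": 40, "fp": 10, "fn": 10, "tn": 40}
counts_b = {"tp": 20, "fp": 5, "fn": 5, "tn": 70}
report = group_fairness_report(counts_a, counts_b)
```

With these counts, predictive parity, equal opportunity and treatment equality hold, while predictive equality, equalized odds, conditional use accuracy equality and overall accuracy equality do not.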

Definitions based on predicted probabilities and actual outcome

These definitions are based on the actual outcome $Y$ and the predicted probability score $S$.

  • Test-fairness, also known as calibration or matching conditional frequencies. A classifier satisfies this definition if individuals with the same predicted probability score $S$ have the same probability of being classified in the positive class when they belong to either the protected or the unprotected group: $P(Y = + \mid S = s, A = a) = P(Y = + \mid S = s, A = b) \quad \forall s \in S \quad \forall a, b \in A$
  • Well-calibration is an extension of the previous definition. It states that when individuals inside or outside the protected group have the same predicted probability score $S$ they must have the same probability of being classified in the positive class, and this probability must be equal to $S$: $P(Y = + \mid S = s, A = a) = P(Y = + \mid S = s, A = b) = s \quad \forall s \in S \quad \forall a, b \in A$
  • Balance for positive class. A classifier satisfies this definition if the subjects constituting the positive class from both protected and unprotected groups have equal average predicted probability score $S$. This means that the expected value of the probability score for the protected and unprotected groups with positive actual outcome $Y$ is the same, satisfying the formula: $E(S \mid Y = +, A = a) = E(S \mid Y = +, A = b) \quad \forall a, b \in A$
  • Balance for negative class. A classifier satisfies this definition if the subjects constituting the negative class from both protected and unprotected groups have equal average predicted probability score $S$. This means that the expected value of the probability score for the protected and unprotected groups with negative actual outcome $Y$ is the same, satisfying the formula: $E(S \mid Y = -, A = a) = E(S \mid Y = -, A = b) \quad \forall a, b \in A$
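The score-based definitions above can be estimated by grouping subjects on their score. The sketch below assumes scores that take only a few discrete values (continuous scores would first need to be binned) and outcomes coded as 0/1; the function names and the data are hypothetical.

```python
from collections import defaultdict

def calibration_by_group(scores, y_true, groups):
    """Return {(group, score): observed fraction of actual positives}."""
    pos, tot = defaultdict(int), defaultdict(int)
    for s, y, a in zip(scores, y_true, groups):
        tot[(a, s)] += 1
        pos[(a, s)] += y
    return {key: pos[key] / tot[key] for key in tot}

def mean_score_by_group(scores, y_true, groups, outcome):
    """Return {group: mean score among subjects whose actual outcome is `outcome`}."""
    total, count = defaultdict(float), defaultdict(int)
    for s, y, a in zip(scores, y_true, groups):
        if y == outcome:
            total[a] += s
            count[a] += 1
    return {a: total[a] / count[a] for a in count}

# Hypothetical data: two groups of eight subjects with scores 0.25 or 0.75.
scores = ([0.25] * 4 + [0.75] * 4) * 2
y_true = [1, 0, 0, 0, 1, 1, 1, 0] * 2
groups = ["a"] * 8 + ["b"] * 8

cal = calibration_by_group(scores, y_true, groups)
# Well-calibration: the observed positive fraction equals the score itself.
well_calibrated = all(abs(frac - s) < 1e-9 for (_, s), frac in cal.items())
balance_pos = mean_score_by_group(scores, y_true, groups, outcome=1)
balance_neg = mean_score_by_group(scores, y_true, groups, outcome=0)
```

In this constructed example the two groups are identical, so test-fairness, well-calibration and both balance conditions hold; with real data each comparison would normally be made up to some tolerance.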

Equal confusion fairness

With respect to confusion matrices, independence, separation, and sufficiency require the respective quantities listed below to not have statistically significant difference across sensitive characteristics.[36]

  • Independence: (TP + FP) / (TP + FP + FN + TN) (i.e., $P(R = 1)$).
  • Separation: TN / (TN + FP) and TP / (TP + FN) (i.e., specificity $P(R = 0 \mid Y = 0)$ and recall $P(R = 1 \mid Y = 1)$).
  • Sufficiency: TP / (TP + FP) and TN / (TN + FN) (i.e., precision $P(Y = 1 \mid R = 1)$ and negative predictive value $P(Y = 0 \mid R = 0)$).

The notion of equal confusion fairness[38] requires the confusion matrix of a given decision system to have the same distribution when computed stratified over all sensitive characteristics.

Social welfare function

Some scholars have proposed defining algorithmic fairness in terms of a social welfare function. They argue that using a social welfare function enables an algorithm designer to consider fairness and predictive accuracy in terms of their benefits to the people affected by the algorithm. It also allows the designer to trade off efficiency and equity in a principled way.[39] Sendhil Mullainathan has stated that algorithm designers should use social welfare functions to recognize absolute gains for disadvantaged groups. For example, a study found that using a decision-making algorithm in pretrial detention rather than pure human judgment reduced the detention rates for Blacks, Hispanics, and racial minorities overall, even while keeping the crime rate constant.[40]

Individual fairness criteria

An important distinction among fairness definitions is the one between group and individual notions.[41][42][37][43] Roughly speaking, while group fairness criteria compare quantities at a group level, typically identified by sensitive attributes (e.g. gender, ethnicity, age, etc.), individual criteria compare individuals. In words, individual fairness follows the principle that "similar individuals should receive similar treatments".

There is a very intuitive approach to fairness, which usually goes under the name of fairness through unawareness (FTU), or blindness, that prescribes not to explicitly employ sensitive features when making (automated) decisions. This is effectively a notion of individual fairness, since two individuals differing only for the value of their sensitive attributes would receive the same outcome.

However, in general, FTU is subject to several drawbacks, the main one being that it does not take into account possible correlations between sensitive attributes and non-sensitive attributes employed in the decision-making process. For example, an agent with the (malignant) intention to discriminate on the basis of gender could introduce into the model a proxy variable for gender (i.e. a variable highly correlated with gender) and effectively use gender information while at the same time complying with the FTU prescription.

The problem of which variables correlated with sensitive ones can fairly be employed by a model in the decision-making process is a crucial one, and is relevant for group concepts as well: independence metrics require a complete removal of sensitive information, while separation-based metrics allow for correlation, but only as far as the labeled target variable "justifies" it.

The most general concept of individual fairness was introduced in the pioneering work by Cynthia Dwork and collaborators in 2012[44] and can be thought of as a mathematical translation of the principle that the decision map taking features as input should be built such that it is able to "map similar individuals similarly", which is expressed as a Lipschitz condition on the model map. They call this approach fairness through awareness (FTA), precisely as a counterpoint to FTU, since they underline the importance of choosing the appropriate target-related distance metric to assess which individuals are similar in specific situations. Again, this problem is closely related to the point raised above about which variables can be seen as "legitimate" in particular contexts.
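A minimal sketch of how such a Lipschitz condition can be checked on a finite sample follows. It assumes a model that returns a score in [0, 1] and measures differences between outputs by absolute value; the helper name `lipschitz_violations`, the toy individuals, the distance function and the constant are all hypothetical choices for illustration, not part of the cited work. Choosing the distance is the substantive step and is simply taken as given here.

```python
from itertools import combinations

def lipschitz_violations(individuals, model, distance, lipschitz_const=1.0, tol=1e-12):
    """Return index pairs (i, j) with |f(x_i) - f(x_j)| > L * d(x_i, x_j)."""
    violations = []
    for (i, x), (j, z) in combinations(enumerate(individuals), 2):
        if abs(model(x) - model(z)) > lipschitz_const * distance(x, z) + tol:
            violations.append((i, j))
    return violations

# Hypothetical individuals described by (income, years of experience).
individuals = [(50, 2), (52, 2), (90, 8)]

def distance(x, z):
    """Assumed task-specific similarity metric: scaled income difference."""
    return abs(x[0] - z[0]) / 100

def threshold_model(x):
    """Hard cut-off: nearly identical incomes can receive opposite outcomes."""
    return 1.0 if x[0] >= 51 else 0.0

def smooth_model(x):
    """Score proportional to income: similar inputs give similar scores."""
    return x[0] / 100

hard = lipschitz_violations(individuals, threshold_model, distance)  # [(0, 1), (0, 2)]
soft = lipschitz_violations(individuals, smooth_model, distance)     # []
```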

Causality-based metrics

Causal fairness measures the frequency with which two nearly identical users or applications, differing only in a set of characteristics with respect to which resource allocation must be fair, receive identical treatment.[45]

An entire branch of the academic research on fairness metrics is devoted to leveraging causal models to assess bias in machine learning models. This approach is usually justified by the fact that the same observational distribution of data may hide different causal relationships among the variables at play, possibly with different interpretations of whether the outcomes are affected by some form of bias or not.[33]

Kusner et al.[46] propose to employ counterfactuals, and define a decision-making process as counterfactually fair if, for any individual, the outcome does not change in the counterfactual scenario where the sensitive attributes are changed. The mathematical formulation reads:

$$P(R_{A \leftarrow a} = 1 \mid A = a, X = x) = P(R_{A \leftarrow b} = 1 \mid A = a, X = x) \quad \forall a, b$$

that is: taking a random individual with sensitive attribute $A = a$ and other features $X = x$, and the same individual if she had $A = b$, they should have the same chance of being accepted. The symbol $R_{A \leftarrow a}$ represents the counterfactual random variable $R$ in the scenario where the sensitive attribute $A$ is fixed to $A = a$. The conditioning on $A = a, X = x$ means that this requirement is at the individual level, in that we are conditioning on all the variables identifying a single observation.
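To make the counterfactual construction concrete, the sketch below uses a deliberately simple, hypothetical structural causal model in which a single feature is generated as `X = U + effect * A`, with `U` a latent variable independent of the sensitive attribute `A` (coded 0/1). The model, the function names, the thresholds and the data are assumptions made for illustration and are not taken from Kusner et al. Because the toy model is deterministic, the probabilities in the definition reduce to comparing two predictions.

```python
def counterfactual_feature(x, a, a_new, effect=2.0):
    """Counterfactual value of X in the toy model X = U + effect * A."""
    u = x - effect * a         # abduction: recover the latent U from (x, a)
    return u + effect * a_new  # action and prediction: recompute X with A set to a_new

def is_counterfactually_fair(predict, individuals, effect=2.0):
    """True if flipping A, and propagating the change to X, never alters the prediction."""
    for x, a in individuals:
        x_cf = counterfactual_feature(x, a, 1 - a, effect)
        if predict(x, a) != predict(x_cf, 1 - a):
            return False
    return True

# Hypothetical individuals given as (x, a) pairs.
individuals = [(1.0, 0), (3.5, 1), (2.5, 0), (4.0, 1)]

def naive(x, a):
    """Thresholds X directly; X is causally downstream of A."""
    return int(x > 3.0)

def latent_based(x, a):
    """Thresholds the recovered latent U, which does not depend on A."""
    return int(x - 2.0 * a > 1.2)

naive_ok = is_counterfactually_fair(naive, individuals)          # False
latent_ok = is_counterfactually_fair(latent_based, individuals)  # True
```

Note that `naive` never reads `a` and therefore satisfies fairness through unawareness, yet it is not counterfactually fair in this model because `x` itself carries the effect of `a`.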

Machine learning models are often trained upon data where the outcome depended on the decision made at that time.[47] For example, if a machine learning model has to determine whether an inmate will recidivate and will determine whether the inmate should be released early, the outcome could be dependent on whether the inmate was released early or not. Mishler et al.[48] propose a formula for counterfactual equalized odds:

$$P(R = 1 \mid Y^0 = 0, A = a) = P(R = 1 \mid Y^0 = 0, A = b) \wedge P(R = 0 \mid Y^1 = 0, A = a) = P(R = 0 \mid Y^1 = 0, A = b) \quad \forall a, b$$

where $R$ is a random variable, $Y^x$ denotes the outcome given that the decision $x$ was taken, and $A$ is a sensitive feature.

Plecko and Bareinboim[49] propose a unified framework to deal with causal analysis of fairness. They suggest the use of a Standard Fairness Model, consisting of a causal graph with 4 types of variables:

  • sensitive attributes ($A$),
  • target variable ($Y$),
  • mediators ($W$) between $A$ and $Y$, representing possible indirect effects of sensitive attributes on the outcome,
  • variables possibly sharing a common cause with $A$ ($Z$), representing possible spurious (i.e., non-causal) effects of the sensitive attributes on the outcome.

Within this framework, Plecko and Bareinboim[49] are therefore able to classify the possible effects that sensitive attributes may have on the outcome. Moreover, the granularity at which these effects are measured—namely, the conditioning variables used to average the effect—is directly connected to the "individual vs. group" aspect of fairness assessment.

Bias mitigation strategies

Fairness can be applied to machine learning algorithms in three different ways: data preprocessing, optimization during software training, or post-processing results of the algorithm.

Preprocessing

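Usually, the classifier is not the only problem; the dataset is also biased. The discrimination of a dataset $D$ with respect to the group $A = a$ can be defined as follows:

$$\operatorname{disc}_{A=a}(D) = \frac{|\{X \in D \mid X(A) \neq a, X(Y) = +\}|}{|\{X \in D \mid X(A) \neq a\}|} - \frac{|\{X \in D \mid X(A) = a, X(Y) = +\}|}{|\{X \in D \mid X(A) = a\}|}$$

That is, an approximation to the difference between the probabilities of belonging to the positive class given that the subject has a protected characteristic different from $a$ and equal to $a$.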

Algorithms correcting bias at preprocessing remove information about dataset variables which might result in unfair decisions, while trying to alter as little as possible. This is not as simple as just removing the sensitive variable, because other attributes can be correlated to the protected one.

A way to do this is to map each individual in the initial dataset to an intermediate representation in which it is impossible to identify whether it belongs to a particular protected group while maintaining as much information as possible. Then, the new representation of the data is adjusted to get the maximum accuracy in the algorithm.

This way, individuals are mapped into a new multivariable representation where the probability of any member of a protected group to be mapped to a certain value in the new representation is the same as the probability of an individual which doesn't belong to the protected group. Then, this representation is used to obtain the prediction for the individual, instead of the initial data. As the intermediate representation is constructed giving the same probability to individuals inside or outside the protected group, this attribute is hidden to the classifier.

An example is explained in Zemel et al.[50] where a multinomial random variable is used as an intermediate representation. In the process, the system is encouraged to preserve all information except that which can lead to biased decisions, and to obtain a prediction as accurate as possible.

On the one hand, this procedure has the advantage that the preprocessed data can be used for any machine learning task. Furthermore, the classifier does not need to be modified, as the correction is applied to the dataset before processing. On the other hand, the other methods obtain better results in accuracy and fairness.[citation needed]

Reweighing

Reweighing is an example of a preprocessing algorithm. The idea is to assign a weight to each dataset point such that the weighted discrimination is 0 with respect to the designated group.[51]

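If the dataset were unbiased, the sensitive variable $A$ and the target variable $Y$ would be statistically independent, and the probability of the joint distribution would be the product of the marginal probabilities:

$$P_{exp}(A = a \wedge Y = +) = P(A = a) \times P(Y = +) = \frac{|\{X \in D \mid X(A) = a\}|}{|D|} \times \frac{|\{X \in D \mid X(Y) = +\}|}{|D|}$$

In reality, however, the dataset is not unbiased and the variables are not statistically independent, so the observed probability is:

$$P_{obs}(A = a \wedge Y = +) = \frac{|\{X \in D \mid X(A) = a \wedge X(Y) = +\}|}{|D|}$$

To compensate for the bias, the software adds a weight, lower for favored objects and higher for unfavored objects. For each $X \in D$ we get:

$$W(X) = \frac{P_{exp}(A = X(A) \wedge Y = X(Y))}{P_{obs}(A = X(A) \wedge Y = X(Y))}$$

Once each $X$ has an associated weight $W(X)$, we compute the weighted discrimination with respect to group $A = a$ as follows:

$$disc_{A=a} = \frac{\sum_{X \in D:\, X(A) \neq a,\, X(Y) = +} W(X)}{\sum_{X \in D:\, X(A) \neq a} W(X)} - \frac{\sum_{X \in D:\, X(A) = a,\, X(Y) = +} W(X)}{\sum_{X \in D:\, X(A) = a} W(X)}$$

It can be shown that after reweighing this weighted discrimination is 0.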
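The reweighing idea can be illustrated with a minimal Python sketch. It assumes each weight is the ratio of the expected joint probability (the product of the marginals) to the observed joint probability; the function names and the toy data are made up for illustration and are not taken from the cited work.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per record: expected joint probability / observed joint probability."""
    n = len(labels)
    count_a = Counter(groups)                 # records per sensitive value
    count_y = Counter(labels)                 # records per target value
    count_ay = Counter(zip(groups, labels))   # records per (sensitive, target) pair
    return [
        (count_a[a] / n) * (count_y[y] / n) / (count_ay[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group, positive=1):
    """Weighted share of positive labels inside one group."""
    total = sum(w for a, w in zip(groups, weights) if a == group)
    pos = sum(w for a, y, w in zip(groups, labels, weights)
              if a == group and y == positive)
    return pos / total


# Toy data (hypothetical): group "f" receives the positive label less often than "m".
groups = ["m"] * 6 + ["f"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
disc = (weighted_positive_rate(groups, labels, weights, "m")
        - weighted_positive_rate(groups, labels, weights, "f"))
print(weights[0], weights[6], disc)
```

In this toy dataset the favored combination (group "m", positive label) is down-weighted and the unfavored combination (group "f", positive label) is up-weighted, so the weighted positive rates of the two groups coincide.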

Inprocessing

Another approach is to correct the bias at training time. This can be done by adding constraints to the optimization objective of the algorithm.[52] These constraints force the algorithm to improve fairness, by keeping the same rates of certain measures for the protected group and the rest of individuals. For example, we can add to the objective of the algorithm the condition that the false positive rate is the same for individuals in the protected group and the ones outside the protected group.

The main measures used in this approach are false positive rate, false negative rate, and overall misclassification rate. It is possible to add just one or several of these constraints to the objective of the algorithm. Note that equality of false negative rates implies equality of true positive rates, and therefore equality of opportunity. After the constraints are added, the problem may become intractable, so a relaxation of them may be needed.
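As a rough illustration of a relaxed constraint, the sketch below adds a penalty on the gap between group false positive rates to an ordinary log loss. The relaxation used here (the mean predicted score over true negatives as a smooth stand-in for the false positive rate) and the multiplier `lam` are assumptions chosen for simplicity, not the formulation of the cited work.

```python
import math


def mean_log_loss(scores, labels):
    """Average binary cross-entropy of predicted probabilities."""
    eps = 1e-12
    return -sum(
        y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
        for s, y in zip(scores, labels)
    ) / len(labels)


def soft_false_positive_rate(scores, labels, groups, group):
    """Mean predicted score over the true negatives of one group (smooth proxy for FPR)."""
    negatives = [s for s, y, a in zip(scores, labels, groups) if y == 0 and a == group]
    return sum(negatives) / len(negatives)


def penalized_objective(scores, labels, groups, lam):
    """Log loss plus lam times the absolute gap between the two groups' soft FPRs."""
    gap = abs(soft_false_positive_rate(scores, labels, groups, 0)
              - soft_false_positive_rate(scores, labels, groups, 1))
    return mean_log_loss(scores, labels) + lam * gap


# Hypothetical scores for six individuals in two groups (0 and 1).
scores = [0.9, 0.4, 0.2, 0.8, 0.6, 0.5]
labels = [1, 0, 0, 1, 0, 0]
groups = [0, 0, 0, 1, 1, 1]

base = penalized_objective(scores, labels, groups, lam=0.0)
penalized = penalized_objective(scores, labels, groups, lam=2.0)
print(base, penalized)
```

A training procedure would minimize `penalized_objective` over the model parameters that produce `scores`; larger values of `lam` trade accuracy for a smaller gap.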

Adversarial debiasing

We train two classifiers at the same time through some gradient-based method (e.g., gradient descent). The first one, the predictor, tries to accomplish the task of predicting $Y$, the target variable, given $X$, the input, by modifying its weights $W$ to minimize some loss function $L_P(\hat{y}, y)$. The second one, the adversary, tries to accomplish the task of predicting $A$, the sensitive variable, given $\hat{Y}$ by modifying its weights $U$ to minimize some loss function $L_A(\hat{a}, a)$.[53] An important point here is that, for gradients to propagate correctly, $\hat{Y}$ above must refer to the raw output of the classifier, not the discrete prediction; for example, with an artificial neural network and a classification problem, $\hat{Y}$ could refer to the output of the softmax layer.

Then we update $U$ to minimize $L_A$ at each training step according to the gradient $\nabla_U L_A$, and we modify $W$ according to the expression $\nabla_W L_P - \operatorname{proj}_{\nabla_W L_A} \nabla_W L_P - \alpha \nabla_W L_A$, where $\alpha$ is a tunable hyperparameter that can vary at each time step.

Graphic representation of the vectors used in adversarial debiasing as shown in Zhang et al.[53]

The intuitive idea is that we want the predictor to try to minimize $L_P$ (therefore the term $\nabla_W L_P$) while, at the same time, maximizing $L_A$ (therefore the term $-\alpha \nabla_W L_A$), so that the adversary fails at predicting the sensitive variable from $\hat{Y}$.

The term $-\operatorname{proj}_{\nabla_W L_A} \nabla_W L_P$ prevents the predictor from moving in a direction that helps the adversary decrease its loss function.

It can be shown that training a predictor classification model with this algorithm improves demographic parity with respect to training it without the adversary.
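The predictor update can be sketched in a few lines of plain Python. The sketch assumes the update direction is the predictor-loss gradient, minus its projection onto the adversary-loss gradient, minus a multiple `alpha` of the adversary-loss gradient, as described for the method of Zhang et al.; the gradient vectors below are made-up numbers rather than outputs of a real model.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predictor_update_direction(grad_lp, grad_la, alpha):
    """grad_LP minus its projection onto grad_LA, minus alpha * grad_LA.

    grad_lp: gradient of the predictor loss w.r.t. the predictor weights
    grad_la: gradient of the adversary loss w.r.t. the predictor weights
    """
    norm_sq = dot(grad_la, grad_la)
    if norm_sq == 0.0:
        # The adversary gives no signal, so fall back to the plain gradient.
        return list(grad_lp)
    coef = dot(grad_lp, grad_la) / norm_sq
    return [gp - coef * ga - alpha * ga for gp, ga in zip(grad_lp, grad_la)]


# Hypothetical gradients for a predictor with two weights.
grad_lp = [1.0, 2.0]
grad_la = [1.0, 0.0]
direction = predictor_update_direction(grad_lp, grad_la, alpha=0.5)
print(direction)  # the predictor takes a descent step along this vector
```

After the projection is removed, the only component left along the adversary gradient is the `-alpha` term, so a descent step on this direction never lowers the adversary's loss to first order.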

Postprocessing

The final method tries to correct the results of a classifier to achieve fairness. In this method, we have a classifier that returns a score for each individual and we need to do a binary prediction for them. High scores are likely to get a positive outcome, while low scores are likely to get a negative one, but we can adjust the threshold to determine when to answer yes as desired. Note that variations in the threshold value affect the trade-off between the rates for true positives and true negatives.

If the score function is fair in the sense that it is independent of the protected attribute, then any choice of the threshold will also be fair, but classifiers of this type tend to be biased, so a different threshold may be required for each protected group to achieve fairness.[54] A way to do this is to plot the true positive rate against the false positive rate at various threshold settings (this is called the ROC curve) and find a threshold where the rates for the protected group and other individuals are equal.[54]
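A simplified sketch of group-specific thresholding follows. It only matches true positive rates (the equality-of-opportunity case) by scanning candidate thresholds; matching false positive rates as well generally needs more machinery, such as randomizing between thresholds, which is omitted here. The scores, labels, and target rate are invented for illustration.

```python
def true_positive_rate(scores, labels, threshold):
    """Share of actual positives whose score reaches the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest observed score usable as a threshold while reaching target_tpr."""
    for t in sorted(set(scores), reverse=True):
        if true_positive_rate(scores, labels, t) >= target_tpr:
            return t
    return min(scores)


# Hypothetical scores: group B's positives receive systematically lower scores.
scores_a, labels_a = [0.9, 0.8, 0.6, 0.4, 0.3], [1, 1, 1, 0, 0]
scores_b, labels_b = [0.7, 0.5, 0.4, 0.2, 0.1], [1, 1, 1, 0, 0]

target = 2 / 3
t_a = threshold_for_tpr(scores_a, labels_a, target)
t_b = threshold_for_tpr(scores_b, labels_b, target)
print(t_a, t_b)
```

With a single shared threshold of 0.8, group B would have a true positive rate of zero; using a lower threshold for group B brings both groups to the same rate.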

Reject option based classification

Given a classifier, let $P(+ \mid X)$ be the probability computed by the classifier that the instance $X$ belongs to the positive class +. When $P(+ \mid X)$ is close to 1 or to 0, the instance $X$ is assigned with a high degree of certainty to class + or – respectively. However, when $P(+ \mid X)$ is closer to 0.5 the classification is less clear.[55]

We say $X$ is a "rejected instance" if $\max(P(+ \mid X), 1 - P(+ \mid X)) \leq \theta$ for a certain $\theta$ such that $0.5 < \theta < 1$.

The reject option based classification ("ROC") algorithm consists of classifying the non-rejected instances following the rule above and the rejected instances as follows: if the instance is an example of a deprived group ($X(A) = a$) then label it as positive; otherwise, label it as negative.

We can optimize different measures of discrimination as functions of $\theta$ to find the optimal $\theta$ for each problem and avoid becoming discriminatory against the privileged group.[55]
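The decision rule can be written as a short Python function. This is a sketch under the assumption that an instance is rejected when the larger of the two class probabilities does not exceed a threshold `theta` strictly between 0.5 and 1; the function and parameter names are illustrative.

```python
def reject_option_classify(p_positive, is_deprived, theta):
    """Label one instance with reject option based classification.

    p_positive:  classifier's probability that the instance is positive
    is_deprived: True if the instance belongs to the deprived group
    theta:       width of the uncertainty band, with 0.5 < theta < 1
    """
    if not 0.5 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0.5 and 1")
    if max(p_positive, 1.0 - p_positive) <= theta:
        # Rejected (uncertain) instance: decide by group membership.
        return 1 if is_deprived else 0
    # Confident instance: follow the classifier.
    return 1 if p_positive > 0.5 else 0


print(reject_option_classify(0.95, is_deprived=False, theta=0.7))
print(reject_option_classify(0.60, is_deprived=False, theta=0.7))
print(reject_option_classify(0.40, is_deprived=True, theta=0.7))
print(reject_option_classify(0.10, is_deprived=True, theta=0.7))
```

Raising `theta` widens the band of instances decided by group membership, which is why it is the quantity tuned against a discrimination measure.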

from Grokipedia
Fairness in machine learning refers to the interdisciplinary effort to define, quantify, and mitigate systematic disparities in algorithmic predictions or decisions that disadvantage subgroups delineated by sensitive attributes, such as race, gender, or age, while grappling with inherent tensions between equity goals and statistical accuracy. Core formalizations include demographic parity, stipulating statistical independence between predictions and protected attributes (formally, $P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b)$ for attribute values $a, b$), and equalized odds, requiring such independence conditional on the true outcome $Y$, which ensures equal true and false positive rates across groups. These metrics, alongside others, underpin techniques spanning data preprocessing to debias training sets, in-processing constraints during model optimization, and post-processing adjustments to outputs, yet empirical implementations frequently reveal non-trivial costs to overall predictive performance.

Pioneering impossibility theorems establish that no classifier can simultaneously satisfy multiple intuitive fairness criteria, such as predictive parity and error rate balance, unless base rates for outcomes are identical across groups or prediction is perfect. Identical base rates are rarely found in real-world data, which makes trade-offs with accuracy unavoidable. For instance, enforcing demographic parity in scenarios with disparate prevalence rates can inflate false positives for low-risk groups, potentially harming innocents, while prioritizing accuracy may preserve, but does not itself create, disparities that reflect underlying realities. Approaches emphasizing "fairness through awareness" advocate incorporating sensitive attributes explicitly to achieve calibrated predictions invariant to them, in contrast to blind obliviousness that ignores causal pathways in data.

Despite widespread adoption in domains like hiring and lending, critiques highlight that fairness interventions often conflate correlation with causation, overlook selection biases in training data, and yield negligible or context-dependent benefits amid persistent utility losses, prompting calls for domain-specific, causally informed metrics over generic statistical proxies.

Fundamentals

Definition and Core Principles

Fairness in machine learning encompasses efforts to prevent automated systems from producing discriminatory outcomes, defined as the wrongful consideration of group membership in decisions affecting individuals' interests. This involves addressing protected attributes: socially salient categories, such as race or gender, that have historically served as bases for systematic adverse treatment, as codified in frameworks like the U.S. Civil Rights Act of 1964 and its Title VII provisions prohibiting employment discrimination. Discrimination arises through disparate treatment, where protected attributes explicitly influence decision rules, or disparate impact, where neutral policies yield disproportionate harms to protected groups without sufficient justification. These concepts, adapted from legal precedents, translate into machine learning as constraints ensuring decisions respect individuals' agency and avoid arbitrary relative disadvantages across groups.

Core principles bifurcate into group fairness, which enforces statistical parity across demographic subgroups, and individual fairness, which demands consistent treatment of comparable cases irrespective of group. Group fairness criteria include demographic parity (or statistical independence), requiring predicted outcomes $\hat{Y}$ to be independent of the protected attribute $A$, formally $A \perp \hat{Y}$ or $P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=a')$ for all $a, a'$. Other group criteria encompass equalized odds (conditional independence given the true outcome $Y$: $A \perp \hat{Y} \mid Y$), which equalizes true and false positive rates across groups, and predictive parity (sufficiency: $A \perp Y \mid \hat{Y}$), which ensures predictions are equally calibrated by group. Individual fairness, formalized by Dwork et al. in 2012, posits that similar individuals, as measured by a task-specific similarity metric on inputs, should receive similar outcomes, yielding a Lipschitz continuity condition on the model's output function that bounds decision disparities.

These principles, while rooted in earlier work on test fairness (e.g., Cleary's 1966 criterion) and philosophical notions of equality of opportunity, lack a universal formulation, as no single criterion satisfies all normative intuitions simultaneously.

Historical Origins and Evolution

The study of fairness in machine learning draws from earlier statistical and legal efforts to quantify discrimination in decision processes, with roots traceable to mid-20th-century analyses in labor and education. For instance, in 1975, researchers identified Simpson's paradox in University of California, Berkeley admissions data, where aggregate gender disparities masked departmental-level patterns, highlighting how aggregated statistics can obscure subgroup inequities. This work underscored the need for disaggregated evaluation in selection algorithms, influencing later ML fairness metrics like disparate impact ratios. Similarly, U.S. legal frameworks, such as the 1971 Griggs v. Duke Power Co. Supreme Court decision establishing disparate impact liability under Title VII, provided conceptual foundations for algorithmic scrutiny, emphasizing outcomes over intent in automated decisions.

Early instances of algorithmic bias emerged in the 1980s, predating widespread machine learning but illustrating risks in rule-based scoring systems akin to modern models. At St George's Hospital Medical School in London, an admissions program developed in 1979 and implemented by 1982 assigned penalties to applicants with non-Caucasian-sounding names (up to 15 points deducted) and slightly lower scores to female applicants, perpetuating human prejudices encoded in the formula. A 1986 investigation by the UK Commission for Racial Equality confirmed racial and sex discrimination, leading to reparations for affected applicants but no systemic overhaul, as the biases mirrored societal norms rather than novel computational errors. This case, along with 1996 analyses of bias in computer systems by Friedman and Nissenbaum, highlighted how technical artifacts could amplify historical inequities, setting precedents for auditing ML pipelines.

The formal field of fairness in machine learning coalesced in the late 2000s amid growing deployment of data-driven classifiers in high-stakes domains such as hiring. Pioneering work by Calders and colleagues in 2009–2010 introduced discrimination-aware techniques, such as massaging datasets to equalize acceptance rates across protected groups while preserving predictive accuracy. This evolved into theoretical frameworks, including Dwork et al.'s 2012 "Fairness Through Awareness" paper, which proposed differential privacy-inspired notions to limit the influence of sensitive attributes on predictions. By the mid-2010s, real-world exposés accelerated research; ProPublica's 2016 investigation of the COMPAS recidivism tool revealed higher false positive rates for Black defendants (45% vs. 23% for white defendants), sparking debates on calibration vs. error rate parity, though critics argued the disparities reflected base-rate differences rather than inherent bias.

Subsequent evolution featured a proliferation of criteria (group parity, such as Hardt et al.'s equality of opportunity; individual non-discrimination; and counterfactual approaches) and of mitigation strategies like pre-processing (data reweighting) and post-processing (threshold adjustment). Chouldechova's 2017 analysis of recidivism predictors formalized trade-offs, showing that equalized odds often conflicts with calibration under varying prevalence rates. The field matured with dedicated venues like the ACM Conference on Fairness, Accountability, and Transparency (FAT*/FAccT) from 2018, integrating causal inference to address proxy discrimination, though persistent incompatibilities among criteria (e.g., the Kleinberg et al. impossibility results) underscore the limits of purely statistical remedies without domain-specific interventions. By the early 2020s, surveys documented over 50 fairness definitions, with emphasis shifting toward causal models to disentangle genuine causal relationships from spurious associations in training data.

Sources of Disparity in ML Systems

Data-related disparities in machine learning systems primarily emerge from biases embedded in training datasets, which reflect flaws in data collection, sampling, or labeling processes that lead to unequal model performance or outcomes across protected groups, such as those defined by race or gender. These disparities can manifest as skewed predictions because models learn patterns from data that may not accurately capture the target population's diversity or true underlying relationships, potentially amplifying societal inequalities rather than resolving them. A comprehensive survey identifies three key data biases contributing to such issues: representation bias from non-diverse sampling, historical bias encoding past discriminatory practices, and measurement bias from inaccurate proxies or labels.

Representation disparities arise when training data underrepresents certain subgroups, causing models to generalize poorly to those groups due to insufficient examples for learning relevant features. For example, in image recognition tasks, datasets with disproportionate samples from majority demographics, such as predominantly light-skinned individuals, result in higher error rates for underrepresented groups, as documented in analyses of facial recognition systems where dark-skinned females experienced error rates up to 34.7% compared to 0.8% for light-skinned males. This imbalance stems from sampling processes that fail to mirror real-world distributions, leading to models that prioritize majority-group accuracy at the expense of minorities.

Historical disparities occur when datasets inherit patterns from prior societal biases, such as discriminatory enforcement or decision-making, embedding these into labels or features. In applications like recidivism prediction, training data often draws from arrest records that reflect historical over-policing of certain communities, potentially assigning higher risk scores to those groups even if causal factors like socioeconomic conditions differ. The COMPAS algorithm, used for risk assessment in U.S. courts, exemplified this debate: a 2016 investigation reported that Black defendants received high-risk scores twice as often as white defendants and faced false positive rates of 45% versus 23%, attributing the gap to data reflecting systemic biases. However, subsequent peer-reviewed analyses countered that these error rate disparities align with differing base rates between groups (e.g., 48% for Black defendants versus a lower rate for white defendants in the dataset), arguing that the model was equally calibrated across races, with predictive accuracy of around 62%, and that ignoring base rates mischaracterizes fairness.

Measurement disparities stem from errors or imperfections in how variables are captured, often using proxies that correlate unevenly across groups or noisy labels that introduce systematic inaccuracies. For instance, substituting a proxy variable for the quantity of interest can embed racial disparities if historical patterns cause uneven correlations, leading models to discriminate indirectly via these flawed features. Label inaccuracies, such as subjective annotations varying by annotator demographics, further exacerbate issues, with studies showing measurement errors can inflate unfairness metrics by up to 20-30% in simulated scenarios. These flaws underscore that observed predictive disparities may reflect genuine group differences in outcomes, driven by causal factors such as environment, rather than bias, necessitating scrutiny of whether interventions address root causes or merely suppress symptoms.

Algorithmic and Modeling Disparities

Algorithmic and modeling disparities in machine learning arise from design decisions in model architecture, feature engineering, loss functions, and optimization processes that lead to unequal predictive performance or treatment across protected groups, even when controlling for data composition. These differ from data biases, as they involve how the algorithm constructs representations or minimizes objectives, potentially introducing or exacerbating group differences through proxy learning or unconstrained optimization. For instance, standard training prioritizes aggregate accuracy, which can yield disparate error rates when group base rates vary, as unconstrained models rarely satisfy criteria like equalized odds without explicit constraints.

Feature engineering choices, such as incorporating proxy variables correlated with sensitive attributes, constitute a key modeling disparity. Proxies like zip codes or prior arrests indirectly encode protected traits (e.g., race), resulting in indirect discrimination; in the COMPAS recidivism tool, this contributed to false positive rates of 45% for Black defendants compared to 23% for white defendants. Similarly, optimization favoring popularity in recommendation systems can amplify visibility gaps, as ranking algorithms boost items with higher initial engagement from majority groups, independent of explicit data skew.

Loss functions and regularization further induce disparities by implicitly penalizing certain patterns. Techniques like independence regularization, intended to decorrelate predictions from sensitive attributes, systematically underpredict outcomes for minority groups in classifiers such as naive Bayes, as shown in experiments where bias coefficients shifted negatively for underrepresented classes. In healthcare, modeling patient needs via prior spending, a proxy tied to access rather than acuity, led to Black patients receiving 18% fewer resources than equally needy white patients, reflecting a causal modeling flaw in equating cost with need.

Model architectures exacerbate these issues through differential capacity to capture correlations. High-capacity deep neural networks learn subtle proxies more effectively than linear models, propagating disparities; facial recognition classifiers from commercial vendors exhibited 11.8%-19.2% higher error rates for darker skin tones due to architectural tendencies to overfit group-specific patterns. Omitted variable bias in misspecified models, such as those ignoring interactions or confounders, further skews predictions, as linear assumptions fail to account for heterogeneous effects across groups. Mitigation often involves explicit constraints, such as adding Lagrangian terms for fairness metrics during training, though trade-offs with accuracy persist.

Deployment and Human Factors

Deployment of machine learning models introduces disparities through human-mediated processes, such as the selection of operational thresholds, integration into decision workflows, and interpretation of outputs, which can vary systematically across demographic groups if not standardized. In machine-assisted settings, where humans retain final authority, decision-makers often exhibit confirmation bias, overweighting model predictions that align with prior beliefs and underweighting those that challenge them, leading to inconsistent application of fairness across protected attributes like race or gender. A 2023 study on algorithmic assistance in decisions found that human overrides tend to amplify rather than correct model biases, particularly when users perceive the system as authoritative, resulting in higher error rates for underrepresented groups.

Feedback loops emerge post-deployment as models influence real-world actions, generating new data that feeds back into retraining cycles and potentially reinforcing initial disparities. For instance, in interactive systems like recommender engines, biased predictions elicit non-uniform user responses: majority groups engage more, skewing subsequent training data toward their preferences and marginalizing minorities, with simulations showing bias amplification over multiple iterations. Empirical analyses of such loops indicate they can increase disparities for affected subgroups by up to 20-30% after 5-10 retraining cycles, depending on interaction volume. This dynamic alters the underlying data-generating process, making pre-deployment fairness assessments insufficient without ongoing monitoring.

Human factors also encompass subjective elements in deployment oversight, including team composition and institutional practices, where homogeneous developer groups may overlook context-specific disparities. The NIST taxonomy categorizes human bias separately from statistical and algorithmic sources, attributing it to interpretive errors or selective auditing during live operations, as evidenced in healthcare deployments where biases in model usage perpetuate socioeconomic disparities in outcomes. A 2024 review of clinical ML models reported that 74.7% exhibited bias against disadvantaged groups, often traced to unaddressed deployment choices rather than inherent model flaws.

Mitigating these disparities requires protocols with explicit bias checks, such as randomized audits and diverse review panels, though evidence suggests that over-reliance on automated metrics ignores nuanced human interactions. In high-stakes domains, failure to account for these factors has led to real-world harms, including widened inequality gaps in lending and hiring systems deployed since the mid-2010s.

Formal Fairness Criteria

Group-Based Criteria

Group-based fairness criteria in machine learning evaluate whether predictive models produce statistically equivalent outcomes across subpopulations defined by a protected attribute $A$, such as race or gender, typically through aggregate measures of prediction rates or error rates. These criteria, rooted in observational statistics, formalize notions like independence, separation, and sufficiency between the model's prediction $\hat{Y}$, the true outcome $Y$, and $A$. They emerged prominently in the mid-2010s amid concerns over biased recidivism prediction tools like COMPAS, where disparities in positive prediction rates between racial groups prompted formal metric development. Unlike individual fairness, which focuses on similar inputs receiving similar outputs, group-based approaches prioritize equity in group-level statistics but can conflict with overall accuracy when base rates of $Y$ differ across groups.

Independence (Demographic Parity) requires that the prediction $\hat{Y}$ is statistically independent of the protected attribute $A$, ensuring equal positive prediction rates across groups irrespective of true outcomes. Formally, this is expressed as $P(\hat{Y} = r \mid A = a) = P(\hat{Y} = r \mid A = b)$ for all $r \in \{0,1\}$ and $a, b \in A$, or equivalently $\hat{Y} \perp A$. This criterion, analyzed in early fairness literature for applications like hiring or lending, aims to prevent disparate impact but ignores differences in qualification base rates, potentially requiring the model to underpredict for higher-qualified groups to balance rates. Empirical evaluations on UCI benchmark data show that enforcing demographic parity often reduces accuracy by 5-15% compared to unconstrained baselines.

Separation (Equalized Odds) mandates conditional independence between $\hat{Y}$ and $A$ given the true label $Y$, equalizing true positive rates (TPR) and false positive rates (FPR) across groups: $P(\hat{Y} = 1 \mid Y = y, A = a) = P(\hat{Y} = 1 \mid Y = y, A = b)$ for $y \in \{0,1\}$ and all $a, b \in A$, or $\hat{Y} \perp A \mid Y$. Introduced by Hardt et al. in 2016, it accommodates differing base rates by conditioning on $Y$, thus preserving more predictive power than independence; equal opportunity is a relaxation focusing only on TPR equality ($y=1$). In benchmarks, equalized odds constraints narrowed racial FPR gaps from 0.45 to near zero but increased overall error rates by up to 10% in some datasets.

Sufficiency (Predictive Parity or Calibration) enforces that the true outcome $Y$ is independent of $A$ given $\hat{Y}$, ensuring equal positive predictive values (PPV) and negative predictive values (NPV) across groups: $P(Y = y \mid \hat{Y} = r, A = a) = P(Y = y \mid \hat{Y} = r, A = b)$ for all $y, r \in \{0,1\}$ and $a, b \in A$, or $Y \perp A \mid \hat{Y}$. This metric, emphasized in calibration-focused fairness work, prioritizes reliable probability estimates within predicted classes but assumes well-calibrated scores; violations occur when models over- or under-estimate risks differently by group. Studies on healthcare tasks report that enforcing sufficiency via post-processing adjusts PPV disparities effectively but may amplify TPR gaps if not combined with other metrics.

These criteria are observational and group-aggregate, often tested via the difference in rates (e.g., a demographic parity gap $|P(\hat{Y}=1 \mid A=0) - P(\hat{Y}=1 \mid A=1)| \leq \epsilon$ for approximate fairness), with $\epsilon$ typically set to 0.01-0.1 based on regulatory thresholds like the U.S. EEOC's 80% rule. They underpin tools in libraries like Fairlearn, where violations are quantified over held-out data stratified by $A$. However, as shown in impossibility results, no predictor can satisfy all of these group criteria simultaneously when base rates differ across groups, necessitating trade-offs.
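The three group criteria reduce to comparing a handful of per-group rates. The sketch below computes them from scratch on a made-up dataset; the helper name `group_rates` and the data are illustrative only, and a library such as Fairlearn would normally be used in practice.

```python
def group_rates(y_true, y_pred, groups, group):
    """Selection rate, TPR, FPR and PPV for one value of the protected attribute."""
    rows = [(y, p) for y, p, a in zip(y_true, y_pred, groups) if a == group]
    tp = sum(1 for y, p in rows if y == 1 and p == 1)
    fp = sum(1 for y, p in rows if y == 0 and p == 1)
    fn = sum(1 for y, p in rows if y == 1 and p == 0)
    tn = sum(1 for y, p in rows if y == 0 and p == 0)
    return {
        "selection_rate": (tp + fp) / len(rows),  # compared under independence
        "tpr": tp / (tp + fn),                    # compared under separation
        "fpr": fp / (fp + tn),                    # compared under separation
        "ppv": tp / (tp + fp),                    # compared under sufficiency
    }


# Hypothetical labels and predictions for two groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

r0 = group_rates(y_true, y_pred, groups, 0)
r1 = group_rates(y_true, y_pred, groups, 1)
gaps = {name: abs(r0[name] - r1[name]) for name in r0}
print(gaps)
```

In this toy example both groups are selected at the same rate, so demographic parity holds exactly, yet the false positive rates and positive predictive values differ, so separation and sufficiency are both violated.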

Individual-Based Criteria

Individual fairness criteria in machine learning emphasize equitable treatment at the level of individual predictions, requiring that similar individuals receive similar outcomes from a model, irrespective of protected attributes such as race or gender. This approach contrasts with group-based criteria by focusing on pairwise similarities rather than aggregate disparities across demographics, aiming to preserve the merits of individuals while mitigating discrimination. The concept posits that discrimination arises when protected attributes influence decisions for otherwise comparable cases, and fairness is achieved by enforcing consistency in outputs for inputs deemed proximate under a context-specific metric.

The foundational formalization appears in the 2012 work by Dwork et al., which defines individual fairness through a metric-based guarantee: a classifier $f$ is fair if, for any two individuals $x$ and $x'$ in the input space, the output difference $|f(x) - f(x')|$ is bounded by a constant $\beta$ times the distance $d(x, x')$ under a predefined similarity metric $d$, i.e., $|f(x) - f(x')| \leq \beta \cdot d(x, x')$ for all $x, x'$. This Lipschitz continuity condition ensures smooth mappings from inputs to outputs, preventing abrupt changes in decisions for minor input variations. The metric $d$ must be task-dependent and ideally derived from domain expertise, capturing substantive similarities (e.g., in lending, proximity in credit history and income profiles) while avoiding direct embedding of protected attributes to prevent encoded bias; however, the framework permits "awareness" of sensitive features during training to approximate fairness without them.

Implementing individual fairness requires specifying the similarity metric, which poses challenges due to its subjectivity and the need for high-dimensional distances that align with human intuitions of equity. Empirical verification is computationally intensive, often involving exhaustive pairwise comparisons, leading to approximations like sampling or kernel-based methods in practice. Unlike group criteria, which can be checked via simple statistical tests on holdout data, individual fairness demands access to the full decision function and may conflict with utility maximization if the metric enforces overly strict uniformity. Research has explored relaxations, such as $\epsilon$-approximate versions allowing small deviations, to balance enforceability with model performance.

Variants of individual fairness include probabilistic formulations, where outcomes are similar in expectation rather than deterministically, accommodating models like randomized classifiers. These criteria have been applied in domains such as risk assessment, where similar offender profiles should yield comparable scores, though real-world adoption is limited by metric disputes and scalability issues. Studies indicate that individual fairness can mitigate the "reverse discrimination" risks inherent in group parity constraints, as it prioritizes merit-based distinctions over demographic balancing.
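The exhaustive pairwise check mentioned above can be sketched as follows. The similarity metric (absolute difference on a single feature), the constant `beta`, and the sample points are all assumptions made for illustration; choosing a defensible metric is the hard part in practice.

```python
from itertools import combinations


def lipschitz_violations(inputs, outputs, distance, beta):
    """Index pairs (i, j) where |f(x_i) - f(x_j)| exceeds beta * d(x_i, x_j)."""
    return [
        (i, j)
        for (i, x), (j, x_other) in combinations(enumerate(inputs), 2)
        if abs(outputs[i] - outputs[j]) > beta * distance(x, x_other)
    ]


def abs_distance(x, x_other):
    """Hypothetical task metric: absolute difference of a single normalized feature."""
    return abs(x[0] - x_other[0])


# Three individuals; the first two are nearly identical but scored very differently.
inputs = [(0.0,), (0.1,), (1.0,)]
outputs = [0.2, 0.9, 0.8]

violations = lipschitz_violations(inputs, outputs, abs_distance, beta=1.0)
print(violations)
```

The check costs one comparison per pair, so it grows quadratically with the number of individuals, which is why sampling-based approximations are common.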

Causality-Oriented Criteria

Causality-oriented fairness criteria in machine learning employ structural causal models, typically represented as directed acyclic graphs (DAGs), to define fairness through interventions, counterfactuals, or path decompositions. They address limitations of correlation-based metrics by isolating the causal mechanisms through which discrimination operates. These approaches assume a known or estimable causal graph in which protected attributes (e.g., race or gender, denoted $A$) influence outcomes ($Y$) via specific pathways, which makes it possible to distinguish legitimate effects (e.g., qualification-based) from illegitimate ones (e.g., those transmitting historical bias). Unlike group parity criteria, they often yield individual-level guarantees, but they require identifiability conditions, such as no unmeasured confounding, that are frequently hard to validate on real datasets.
Counterfactual fairness, formalized by Kusner et al. in 2017, requires that a predictor $\hat{Y}$ satisfy, for an individual with background variables $u$, $\hat{Y}(u, A \leftarrow a) = \hat{Y}(u, A \leftarrow a')$ for all values $a, a'$ in the domain of $A$, where $\hat{Y}(u, A \leftarrow a)$ denotes the prediction that individual would receive under the intervention $do(A \leftarrow a)$. Predictions must therefore remain invariant to hypothetical changes in the protected attribute: every downstream causal influence of $A$ is blocked while other factors are preserved. A sufficient condition is that the predictor depends only on non-descendants of $A$. The criterion is applied by fitting a generative model of the observational data to a causal DAG, with identifiability obtained through adjustment formulas such as the back-door criterion when confounders are observed. Violations occur if proxies for $A$ (e.g., a zip code that correlates with race) leak information into the prediction. Empirical tests on synthetic data show that the criterion reduces proxy discrimination but can degrade utility if the causal graph is misspecified, because interventions alter distributions non-trivially.
Path-specific counterfactual fairness, developed by Chiappa and later refined, extends this idea by decomposing effects along DAG paths. It enforces equality of counterfactuals only along unfair paths (e.g., those transmitting societal bias) while retaining fair paths (e.g., from $A$ to qualifications to $Y$). Formally, for a set of unfair edges $E_u$, the predictor satisfies path-specific invariance under an intervention that severs $E_u$: $\hat{Y}(u, do(A \leftarrow a; E_u \leftarrow \emptyset)) = \hat{Y}(u, do(A \leftarrow a'; E_u \leftarrow \emptyset))$. The approach relies on nested counterfactuals or effect decomposition (e.g., natural direct effects), which are identifiable under monotonicity or randomization assumptions, and algorithms learn fair representations by optimizing losses over the adjusted distributions. Applications to hiring models report better accuracy than blocking all counterfactual influence, but computational cost scales with path enumeration, and labeling paths as fair or unfair introduces subjectivity that fairness audits have criticized.
Additional criteria include causal effect independence, which requires a zero total causal effect of $A$ on $\hat{Y}$ under interventions, $P(\hat{Y} \mid do(A=a)) = P(\hat{Y} \mid do(A=a'))$, and separation via proxies, under which no downstream variable causally mediates illegitimate influence. Surveys group these criteria into avoiding direct discrimination (no $A \to Y$ effect) and avoiding proxy discrimination (no effects through descendants of $A$), and offer selection guidelines based on domain knowledge: counterfactual criteria suit individual decisions such as lending, while path-specific criteria suit mixed influences such as admissions. Real-world deployment, for example in recidivism prediction, reveals tensions. Causal assumptions often rest on expert-drawn graphs that cannot be verified, which makes the criteria fragile under dataset shift; analyses from 2022 report that minor biases can amplify unfair effects by orders of magnitude. Peer-reviewed evaluations therefore emphasize testing sensitivity to graph perturbations over accepting causal claims at face value.
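
The counterfactual criterion can be illustrated with a minimal sketch. The two-variable linear model below (a latent factor `U`, a feature `X = 2A + U`), its coefficients, and both predictors are assumptions chosen purely for illustration, not a model from the literature. The code follows the usual abduction, action, prediction steps and compares a predictor that uses `X` directly with one that uses only the inferred non-descendant `U`.

```python
# Toy linear structural causal model (all coefficients are illustrative assumptions):
#   U : latent background factor, independent of the protected attribute A
#   X = 2.0 * A + U   (the observed feature is a descendant of A)

def abduct_u(x, a):
    """Abduction: recover the latent U consistent with the observed (x, a)."""
    return x - 2.0 * a

def counterfactual_x(x, a, a_new):
    """Feature value the same individual would have under do(A <- a_new)."""
    u = abduct_u(x, a)       # abduction
    return 2.0 * a_new + u   # action + prediction

def predictor_on_x(x, a):
    """Uses X directly, so it inherits the causal influence of A."""
    return 0.5 * x

def predictor_on_u(x, a):
    """Uses only the inferred non-descendant U."""
    return 0.5 * abduct_u(x, a)

def counterfactual_gap(predictor, x, a, a_new):
    """Absolute change in the prediction between the factual and counterfactual worlds."""
    x_cf = counterfactual_x(x, a, a_new)
    return abs(predictor(x, a) - predictor(x_cf, a_new))

gap_x = counterfactual_gap(predictor_on_x, x=3.0, a=1, a_new=0)
gap_u = counterfactual_gap(predictor_on_u, x=3.0, a=1, a_new=0)
print(gap_x, gap_u)
```

In this toy setting the predictor built on `U` is unchanged by the intervention while the one built on `X` is not. The result depends entirely on the assumed graph being correct, which is the fragility the text describes.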

Interdependencies and Incompatibilities

Relationships Among Criteria

Group fairness criteria in machine learning are often formalized as conditional independence conditions involving the sensitive attribute $A$, the true outcome $Y$, and the predicted risk score $R$ or binary prediction $\hat{Y}$. Independence, or demographic parity, requires $R \perp A$, so that the distribution of risk scores (and hence the average score) is the same across groups defined by $A$. Separation, or equalized odds, requires $R \perp A \mid Y$, mandating equal true positive rates and false positive rates across groups. Sufficiency, or predictive parity, requires $Y \perp A \mid R$, ensuring equal precision or calibration within risk score levels across groups.
These criteria are related in specific ways. Restricting equalized odds to the positive class yields equal opportunity (equal true positive rates), which coincides with equalized odds only when false positive rates are also equal across groups. Predictive parity for binary predictions is the counterpart of group calibration for scores, under which the predicted probability matches the observed outcome rate within each group. However, independence and separation cannot hold simultaneously unless base rates are equal, $P(Y=1 \mid A=a) = P(Y=1 \mid A=b)$ for all groups $a, b$, or the score carries no information about $Y$: the former equalizes marginal prediction rates while the latter conditions on $Y$. Similarly, independence and sufficiency cannot both hold unless $Y \perp A$, because independence ignores any dependence of the outcome on $A$ while sufficiency conditions on the prediction.
More fundamentally, no non-trivial scoring function can simultaneously achieve calibration (sufficiency), equal false positive and false negative rates across groups (separation), and statistical parity (independence) unless base rates are identical across groups or the outcome is perfectly predictable. This holds for any method that assigns scores in $[0,1]$; the derivations show that the conditions are linked by identities that can only be satisfied together under equal base rates or perfect accuracy.
Chouldechova's analysis of recidivism prediction instruments shows that, when base rates differ, a binary classifier cannot have equal positive predictive value across groups and also equal false positive and false negative rates, so some disparity in error rates is unavoidable. In the COMPAS data, for example, Black defendants who did not reoffend were labeled high risk about 45% of the time, versus 23% for White defendants. Under equal base rates the criteria become compatible: independence holds if separation does, and sufficiency follows via Bayes' rule. Approximate satisfaction within small deviations $\epsilon > 0$ is possible, but only if base rates are approximately equal or prediction is approximately perfect, within bounds tied to $\epsilon$. Causality-oriented criteria, such as counterfactual fairness, relate to group criteria by constraining specific causal paths, which can strengthen separation or independence when the causal graph excludes spurious correlations, though they remain distinct criteria without full causal knowledge.
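
The three families of criteria can be checked on binary predictions by comparing a few per-group rates. The helper below is a minimal sketch: the function name `group_rates` and the toy data are illustrative assumptions, and the mapping of rates to criteria (selection rate for independence, TPR and FPR for separation, PPV for sufficiency) applies to the binary-prediction case only.

```python
from collections import defaultdict

def ratio(num, den):
    """Safe division that returns NaN for an empty denominator."""
    return num / den if den else float("nan")

def group_rates(y_true, y_pred, group):
    """Per-group rates for binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, p, g in zip(y_true, y_pred, group):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted class
        key = ("t" if y == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            "selection_rate": ratio(c["tp"] + c["fp"], n),  # independence
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),       # separation
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),       # separation
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),       # sufficiency
        }
    return rates

# Hypothetical toy data: four individuals in each of two groups.
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, group)
for g in sorted(rates):
    print(g, rates[g])
```

A criterion holds (on this sample) when the corresponding rates match across groups; here none of them do, which is typical when base rates differ.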

Impossibility Theorems and Fundamental Conflicts

Several impossibility theorems in machine learning fairness demonstrate that prominent group-based fairness criteria cannot be satisfied simultaneously except under restrictive conditions, such as identical base rates (prevalence of the outcome) across protected groups. These results reflect mathematical conflicts that arise from the statistical dependencies in real-world data, where protected attributes like race or gender correlate with outcomes due to historical or societal factors.
A foundational result by Kleinberg, Mullainathan, and Raghavan (2016) proves that a risk score cannot simultaneously be calibrated within groups (a form of sufficiency, where the outcome distribution conditional on the score does not depend on the protected attribute) and balanced for both the positive and the negative class (a relaxation of separation, where the average score among individuals with the same true outcome does not depend on the protected attribute), unless base rates are equal across groups or the scorer predicts perfectly. The result covers scores binned into categories, and it implies that satisfying any two of the three conditions forces a violation of the third when base rates differ, as is common in applications like recidivism prediction or lending. Together with the pairwise conflicts between independence (demographic parity) and the other two criteria, this means independence, separation, and sufficiency cannot all hold for a non-constant scorer unless the outcome distributions are identical across groups.
Chouldechova (2017) provides a related result for binary classifiers, demonstrating that predictive parity and equal error rates (equalized odds) are incompatible unless the prevalence of the positive outcome is equal across groups or the classifier is perfect (zero error). Specifically, if false positive rates and false negative rates are equalized across groups and positive predictive values are also equal by group, then the base rates must be equal; otherwise one criterion must yield to another. Analyses of recidivism prediction tools like COMPAS illustrate the conflict: recidivism base rates differed between groups, and Black defendants who did not reoffend were labeled high risk about 45% of the time, versus 23% for White defendants. These proofs derive from Bayes' rule and basic probability identities, and they show that enforcing independence from protected attributes conflicts with conditioning on outcomes for accuracy.
Causal analyses extend these conflicts by framing fairness impossibilities through directed acyclic graphs in which protected attributes influence outcomes via confounders. For instance, path-specific effects cannot eliminate disparities without intervening on their causes, which leads to trade-offs between observational fairness criteria and counterfactual ones such as counterfactual fairness. Such theorems imply that no universal fairness definition exists without assumptions about the data-generating process, prompting debates on whether to prioritize error rate parity or calibrated predictions, with empirical violations observed in datasets exhibiting label imbalance.
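
The conflict for binary classifiers follows from a single identity. Writing $p$ for a group's base rate, the definitions of PPV, FNR, and FPR give $\mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot (1-\mathrm{FNR})$. The sketch below plugs in hypothetical numbers (the shared PPV and FNR values and the two base rates are assumptions for illustration) to show that equal PPV and equal FNR force unequal FPR when base rates differ.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate forced by FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Hypothetical values held equal across both groups.
shared_ppv, shared_fnr = 0.7, 0.3

fpr_low = implied_fpr(0.3, shared_ppv, shared_fnr)   # group with base rate 0.3
fpr_high = implied_fpr(0.5, shared_ppv, shared_fnr)  # group with base rate 0.5

print(round(fpr_low, 4), round(fpr_high, 4))
```

With these inputs the group with the higher base rate necessarily has the higher false positive rate, so equalizing all three quantities would require equal base rates (or a perfect classifier).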

Performance Trade-offs

Accuracy and Utility Sacrifices

Imposing fairness constraints on machine learning models often reduces predictive accuracy, because the constraints limit the model's ability to fit the underlying data distribution, particularly when protected groups differ in base rates or feature-label relationships. In classification tasks, fairness notions such as demographic parity require decision thresholds that differ across groups, which deviates from the unconstrained Bayes-optimal classifier and incurs an accuracy penalty that grows with the divergence in group-specific class probabilities. This theoretical cost arises because a fair classifier minimizes a constrained objective that does not coincide with the unconstrained error rate, leading to higher overall misclassification unless group distributions are identical.
Empirical analyses report such sacrifices in domains with real-world disparities. For instance, in pretrial risk assessment using Broward County data, enforcing statistical parity or predictive equality on recidivism models increased the detention of low-risk individuals by 10-17% and raised estimated violent crime by 4-9% compared to unconstrained models that prioritize public safety via uniform thresholds. Similarly, theoretical bounds for classification show that the accuracy of the optimal fair classifier under equalized odds is strictly lower than that of the unconstrained classifier when the sensitive attribute correlates with the outcome, with the gap widening as group base rates diverge.
Some studies report negligible trade-offs in select public policy applications when post-hoc mitigation techniques are used, but these findings are context-specific and may not generalize to scenarios with pronounced causal differences between groups or stricter in-processing constraints. In such cases, utility metrics beyond raw accuracy, like false negative rates in high-stakes decisions, show clearer degradation; for example, fairness enforcement in recidivism prediction trades overall societal benefit for parity, as measured by increased adverse outcomes in the majority group. These sacrifices point to a basic tension: models tuned for fairness give up fidelity to empirical patterns, which can undermine deployment where group differences reflect genuine predictive signals rather than artifacts.
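
The accuracy penalty of a parity constraint can be worked through on a small example. The sketch below assumes perfectly calibrated scores, so that a score $s$ means $P(Y=1 \mid s) = s$, and two equally sized groups with made-up score lists. It compares the uniform 0.5 threshold (Bayes-optimal under these assumptions) with a rule that selects the same fraction of each group.

```python
def expected_accuracy(scores, selected):
    """Expected accuracy when P(Y=1 | score) = score (calibrated scores assumed)."""
    total = 0.0
    for s in scores:
        # A selected individual is correct with probability s, a rejected one with 1 - s.
        total += s if s in selected else 1.0 - s
    return total / len(scores)

def top_k(scores, k):
    """The k highest scores in a group (scores are assumed distinct)."""
    return set(sorted(scores, reverse=True)[:k])

# Hypothetical calibrated scores for two equally sized groups.
scores_a = [0.1, 0.3, 0.7, 0.9]
scores_b = [0.1, 0.2, 0.3, 0.6]

# Unconstrained rule: one threshold at 0.5 for everyone.
uniform_a = {s for s in scores_a if s > 0.5}
uniform_b = {s for s in scores_b if s > 0.5}
acc_unconstrained = (expected_accuracy(scores_a, uniform_a)
                     + expected_accuracy(scores_b, uniform_b)) / 2

# Demographic parity: select the top half of each group.
acc_parity = (expected_accuracy(scores_a, top_k(scores_a, 2))
              + expected_accuracy(scores_b, top_k(scores_b, 2))) / 2

print(round(acc_unconstrained, 3), round(acc_parity, 3))
```

In this example equalizing selection rates forces the second group's threshold below 0.5, so an individual with a 0.3 probability of a positive outcome is selected and expected accuracy falls. If both groups had the same score distribution, the two rules would coincide and the cost would vanish.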

Empirical Evidence of Costs

Empirical studies demonstrate that enforcing fairness constraints in machine learning models frequently incurs costs in the form of reduced predictive accuracy, increased error rates, or diminished utility, particularly when fairness is imposed via in-processing or strong regularization. In clinical risk prediction tasks using datasets such as MIMIC-III, CDM, and STARR, applying penalties for violations of conditional prediction parity or equalized odds (with regularization strength λ up to 10) led to near-universal degradation in group-level performance, including drops in area under the receiver operating characteristic curve (AUROC) of 2% to 5% across subgroups defined by race/ethnicity, sex, or age. Some metrics improved in some cases, but overall utility, measured by average precision and loss, declined, indicating heterogeneous trade-offs that depend on the fairness notion and the dataset.
In commercial applications, such as coupon allocation studied on real-world clickstream data from 400 sessions, implementing statistical parity or equalized odds with respect to a sensitive attribute increased financial costs by 8.5% to 9.1% (from USD 0.508 to USD 0.551 per instance), driven by shifts in classification errors that favored certain groups, alongside minimal but consistent reductions in AUROC. These costs arise because fairness adjustments alter decision thresholds or model weights, reducing overall revenue or resource efficiency in deployment.
Deep learning systems show similar vulnerabilities. An analysis of 103 experiments across datasets like CelebA, MS-COCO, imSitu, and CIFAR-10S found that 68% of debiasing interventions (using 22 techniques on ResNet architectures) resulted in lower accuracy, higher accuracy variance, or elevated fairness variance under fixed-seed training. Demographic parity differences varied by up to 12.6% across runs, indicating instability that compounds performance losses when fairness is prioritized over unconstrained optimization.
Some contexts, such as post-hoc threshold adjustments in top-k selection problems (e.g., screening or housing inspections), report negligible precision losses (<1%), but these findings are limited to resource-constrained settings and do not generalize to in-processing methods or to high-stakes domains where base rates differ substantially across groups. Overall, the empirical evidence indicates that fairness costs are not theoretical artifacts but observable degradations, and that mitigating them without fully sacrificing utility often requires causal awareness of the underlying data-generating process.

Mitigation Approaches

Preprocessing Techniques


Preprocessing techniques in machine learning fairness modify the training data to reduce disparities associated with protected attributes, such as race or gender, before any learning algorithm is applied. These methods seek to enforce statistical criteria like demographic parity in the data, where the distribution of outcomes is independent of the protected attribute, without requiring changes to the model's training procedure or post-hoc adjustments. By altering features, labels, or sample distributions, preprocessing aims to mitigate biases inherited from historical data while preserving overall utility for prediction tasks.
Early approaches, introduced by Kamiran and Calders in 2012, target classification tasks exhibiting unlawful discrimination, measured by unequal acceptance rates across groups. Massaging selectively flips labels, promoting instances from the disadvantaged group and demoting instances from the advantaged group, and prioritizes the changes that cost the least accuracy, namely instances closest to the decision boundary. Reweighing assigns each instance a weight equal to the ratio between the frequency its (group, label) combination would have if group and label were independent and the frequency actually observed, which balances the effective contribution of subgroups during training. Resampling adjusts the dataset proportions by oversampling or undersampling positive and negative examples per group to equalize base rates. These techniques reduced discrimination on benchmark datasets like the UCI Adult dataset, though at the cost of up to 5-10% drops in classifier accuracy depending on the severity of the imbalance.
Subsequent methods focus on learning invariant representations. Zemel et al.'s Learning Fair Representations (LFR) framework, proposed in 2013, optimizes an intermediate embedding space that maximizes predictive utility while minimizing demographic disparities, by constraining the statistical dependence between the representation and the protected attribute or by using cost-sensitive fairness terms. The approach compresses data into fair encodings suitable for downstream classifiers. Similarly, Feldman et al.'s Disparate Impact Remover (DIR) from 2015 perturbs non-protected features to decorrelate them from the sensitive attribute, bounding how well the protected attribute can be predicted from the features (e.g., via accuracy thresholds below 0.75) while constraining feature covariance shifts to less than 1% in empirical evaluations on synthetic and real datasets. DIR is designed to certify compliance with the U.S. Equal Employment Opportunity Commission's four-fifths guideline, under which the ratio of selection rates between groups should be at least 0.8.
For imbalanced datasets, variants like Fair-SMOTE extend synthetic minority oversampling by generating new instances that respect subgroup distributions, preventing bias amplification at the intersection of minority class and protected group; credit scoring experiments reported that standard SMOTE increased violations by 15-20%. Preprocessing generally incurs trade-offs, with meta-analyses indicating average accuracy reductions of 2-8% across fairness metrics, particularly when correlations reflect causal differences rather than artifacts. Empirical studies on benchmark datasets such as German Credit confirm that these methods improve parity but are sensitive to the choice of fairness definition, and that they often prioritize group-level equality over individual treatment effects.
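
Reweighing is simple enough to sketch directly. The version below follows the weighting rule described by Kamiran and Calders, $w(a, y) = \frac{P(A=a)\,P(Y=y)}{P(A=a, Y=y)}$, using empirical frequencies; the function names and the eight-row toy dataset are illustrative assumptions. It also reports the ratio of positive rates before and after weighting, the quantity compared against 0.8 under the four-fifths guideline.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each instance by P(A=a) * P(Y=y) / P(A=a, Y=y), from empirical counts."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [
        (n_group[g] * n_label[y]) / (n * n_joint[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, target_group):
    """Weighted share of positive labels inside one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights)
              if g == target_group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == target_group)
    return num / den

# Hypothetical data: group "a" is favored (3 of 4 positive), group "b" is not (1 of 4).
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

uniform = [1.0] * len(labels)
weights = reweighing_weights(groups, labels)

ratio_before = (weighted_positive_rate(groups, labels, uniform, "b")
                / weighted_positive_rate(groups, labels, uniform, "a"))
ratio_after = (weighted_positive_rate(groups, labels, weights, "b")
               / weighted_positive_rate(groups, labels, weights, "a"))

print(round(ratio_before, 3), round(ratio_after, 3))
```

The weights would then be passed to any learner that accepts per-sample weights. They equalize the weighted base rates in the training data; whether the trained model's predictions satisfy parity still has to be checked separately.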

Inprocessing Techniques

Inprocessing techniques modify the model's training process to incorporate fairness constraints or objectives directly, aiming to produce fair predictions without altering the data beforehand or adjusting outputs afterwards. These methods typically augment the loss function with fairness regularization terms, impose hard constraints during optimization, or use multi-objective formulations that balance accuracy against fairness metrics such as demographic parity or equalized odds. Unlike preprocessing, which addresses bias upstream, inprocessing embeds fairness into the model's parameters, potentially yielding representations that are less sensitive to protected attributes like race or gender.
Explicit inprocessing approaches enforce fairness by integrating constraints or penalties into the training objective. Constraint-based methods formulate fairness as hard or soft constraints, such as requiring prediction rates across groups to satisfy demographic parity, and solve the resulting problem with constrained-optimization techniques such as projected gradient methods; for instance, the fairness constraints mechanism enforces group-level parity by penalizing violations in proportion to their magnitude during training. Objective-based variants add fairness-aware terms to the objective, like regularization that minimizes the variance of error rates across subgroups, and are often implemented in frameworks such as TensorFlow Constrained Optimization for scalable enforcement of fairness metrics. These explicit techniques allow precise control over the targeted criterion but can increase computational cost because of the constrained solvers.
Implicit inprocessing methods achieve fairness indirectly by learning invariant or debiased representations during training. Adversarial debiasing trains the primary predictor to minimize task loss while an adversary attempts to predict the sensitive attribute from the model's intermediate representations, using gradient reversal or min-max optimization to reduce the mutual information between predictions and protected features; this approach has been shown to approximate statistical parity on datasets like COMPAS for recidivism prediction. Representation learning variants, such as those using variational autoencoders or domain-invariant features, further promote fairness by encouraging the model to ignore spurious correlations tied to demographics, though they may preserve latent biases if the adversary underfits. Empirical evaluations indicate that implicit methods often yield smoother trade-offs between fairness and accuracy than explicit ones, particularly in high-dimensional settings.
Other inprocessing strategies target fair generalization across distributions, or use reinforcement learning with fairness rewards, which adapts policies to minimize disparate impact in sequential decision-making. Despite their integration advantages, inprocessing techniques frequently encounter optimization challenges, such as non-convexity introduced by fairness terms that leads to suboptimal convergence, and they require careful hyperparameter tuning to avoid over-penalizing accuracy. Recent benchmarks show variable performance across datasets, underscoring the need for task-specific validation.
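
A fairness regularizer can be sketched with a one-feature logistic regression trained by plain gradient descent. Everything here is an illustrative assumption rather than a published method: the penalty is the squared difference in mean predicted score between two groups (a soft demographic parity term), `lam` is the regularization strength, and the eight data points are invented so that the feature is correlated with group membership.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def group_diff(values, groups):
    """Mean over group 'a' minus mean over group 'b'."""
    return (mean([v for v, g in zip(values, groups) if g == "a"])
            - mean([v for v, g in zip(values, groups) if g == "b"]))

def train(xs, ys, groups, lam, lr=0.2, steps=500):
    """Gradient descent on average log-loss + lam * (score gap between groups)^2."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        scores = [sigmoid(w * x + b) for x in xs]
        # Gradient of the average log-loss.
        grad_w = mean([(s - y) * x for s, y, x in zip(scores, ys, xs)])
        grad_b = mean([s - y for s, y in zip(scores, ys)])
        # Parity gap and its gradient (d sigmoid / dz = s * (1 - s)).
        gap = group_diff(scores, groups)
        dgap_w = group_diff([s * (1 - s) * x for s, x in zip(scores, xs)], groups)
        dgap_b = group_diff([s * (1 - s) for s in scores], groups)
        w -= lr * (grad_w + 2 * lam * gap * dgap_w)
        b -= lr * (grad_b + 2 * lam * gap * dgap_b)
    return w, b

def parity_gap(w, b, xs, groups):
    return group_diff([sigmoid(w * x + b) for x in xs], groups)

# Hypothetical data: group "a" has positive feature values, group "b" negative ones.
xs = [0.5, 1.0, 1.5, 2.0, -0.5, -1.0, -1.5, -2.0]
ys = [0, 1, 1, 1, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4

w_plain, b_plain = train(xs, ys, groups, lam=0.0)
w_fair, b_fair = train(xs, ys, groups, lam=2.0)
gap_plain = parity_gap(w_plain, b_plain, xs, groups)
gap_fair = parity_gap(w_fair, b_fair, xs, groups)
print(round(gap_plain, 3), round(gap_fair, 3))
```

The penalized model ends with a smaller score gap because the penalty shrinks the weight on the group-correlated feature, which is also why its fit to the labels is worse. The same structure carries over to larger models, where the added term can make the objective non-convex.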

Postprocessing Techniques

Post-processing techniques modify the outputs of a trained model to enforce fairness constraints, leaving the original model's parameters untouched and thus allowing application to already-deployed systems without retraining. This approach differs from preprocessing and inprocessing in that it operates only on prediction scores or decisions, typically by optimizing group-specific thresholds or transformations to satisfy criteria such as equalized odds or demographic parity. While computationally efficient, these methods often trade overall predictive accuracy for fairness, because the adjustments can distort the model's learned probabilities.
A foundational method, introduced by Hardt et al. in 2016, targets equalized odds, which requires that true positive rates (TPR) and false positive rates (FPR) match across protected groups conditional on the true outcome $Y$. For binary classifiers, the technique derives group-specific decision thresholds from the model's score distribution and receiver operating characteristic (ROC) curves, selecting the operating point that maximizes the chosen utility among those attainable by every group. The resulting rule ensures independence between the predicted risk $R$ and the protected attribute $A$ given $Y$ (i.e., $R \perp A \mid Y$), as formalized in the equalized odds criterion. Empirical evaluations on datasets like the Adult income benchmark showed reductions in disparate error rates of up to 40% with minimal utility loss (e.g., a 2-5% drop in balanced accuracy), though performance varied with the quality of the base model.
Extensions include multi-class adaptations, which optimize thresholds across outcome levels while approximating equalized odds via vectors of true positive rates. Libraries such as Fairlearn implement the approach via ThresholdOptimizer, which fits group-specific thresholds on held-out data to satisfy constraints such as equalized odds or demographic parity. For regression tasks, recent frameworks (as of 2025) generalize post-processing by learning monotonic transformations of predictions that align conditional distributions across groups, making the approach applicable beyond classification. In web-scale recommendation systems, Pleiss et al. (2022) applied calibrated post-processing to enforce a fairness constraint, reporting fairness improvements without full retraining, though at a 1-3% cost in click-through rates.
Challenges arise in high-stakes domains, where post-processing for equalized odds reduced demographic disparities in detection models but failed to eliminate subgroup-specific errors, with TPR gaps persisting at 5-10% and overall AUC dropping by 0.02-0.05. Such trade-offs show that post-processing enforces statistical criteria after the fact but does not resolve causal dependencies between features and outcomes, potentially masking underlying data biases. Recent integrations, such as multi-criteria decision analysis (MCDA) for threshold selection (2025), aim to balance fairness, accuracy, and interpretability by weighting the objectives explicitly. Despite these advances, the evidence indicates that post-processing alone seldom achieves both perfect fairness and maximal utility, and often needs to be complemented by causal interventions.
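
Group-specific thresholding can be sketched in a few lines. The code below is a deliberate simplification, not the procedure of Hardt et al.: it equalizes only true positive rates (equal opportunity) with deterministic thresholds, whereas full equalized odds generally also requires matching false positive rates and may need randomized decisions. The function names, the target TPR, and the scores are assumptions for illustration.

```python
def tpr_at(threshold, scores, labels):
    """True positive rate when predicting positive for score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)

def threshold_for_tpr(scores, labels, target_tpr):
    """Largest observed score whose TPR reaches at least target_tpr."""
    for t in sorted(set(scores), reverse=True):
        if tpr_at(t, scores, labels) >= target_tpr:
            return t
    return min(scores)

# Hypothetical held-out scores and labels from an already-trained model.
data = {
    "a": ([0.9, 0.8, 0.6, 0.4, 0.3, 0.2], [1, 1, 1, 0, 1, 0]),
    "b": ([0.7, 0.5, 0.45, 0.35, 0.2, 0.1], [1, 1, 0, 1, 0, 1]),
}

# A single threshold of 0.5 gives unequal true positive rates.
tpr_single = {g: tpr_at(0.5, s, y) for g, (s, y) in data.items()}

# Group-specific thresholds chosen to hit the same target TPR.
thresholds = {g: threshold_for_tpr(s, y, target_tpr=0.75) for g, (s, y) in data.items()}
tpr_adjusted = {g: tpr_at(thresholds[g], s, y) for g, (s, y) in data.items()}

print(tpr_single, thresholds, tpr_adjusted)
```

On this toy data the adjusted thresholds equalize the true positive rates, but the second group's lower threshold also admits more false positives, so equalized odds is still violated. That residual gap is the kind of subgroup-specific error the text describes.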

Real-World Applications and Case Studies

Notable Instances of Disparity

In the recidivism prediction domain, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm, used by U.S. courts to assess reoffending risk, displayed racial disparities in error rates. A 2016 ProPublica investigation of 7,214 criminal defendants in Broward County, Florida, from 2013 to 2014 found that Black defendants were scored as higher risk than White defendants with equivalent actual outcomes. Specifically, Black individuals who did not reoffend were incorrectly labeled high-risk 45% of the time, versus 23% for Whites, while Whites who reoffended were labeled low-risk 48% of the time, compared to 28% for Blacks. These findings highlighted differences in false positive and false negative rates across racial groups, though the algorithm's developers contended that such disparities arise from higher recidivism base rates among Black defendants (e.g., 63% versus 39% for Whites in the dataset) and that the tool remained calibrated, with predicted probabilities matching observed outcomes within each group.
Facial recognition systems have shown performance disparities across demographic groups in identification accuracy. The U.S. National Institute of Standards and Technology's (NIST) Face Recognition Vendor Test (FRVT) Part 3 evaluated 189 algorithms from 99 developers on datasets including over 18 million images and found that false positive identification rates were substantially higher for non-White faces. For instance, Asian and African American faces were misidentified at rates 10 to 100 times greater than Caucasian faces in one-to-many matching scenarios, with the median false positive rate for Black males reaching 0.00035 compared to 0.0000035 for White males. NIST attributed these differentials primarily to training data imbalances, such as underrepresentation of certain demographics, rather than intentional design, noting that top-performing algorithms reduced but did not eliminate the gaps.
In recruitment, Amazon's experimental AI tool for screening job applicants demonstrated gender-based disparities. Developed starting in 2014 and trained on resumes submitted to the company over the prior decade, predominantly from male applicants in technical roles, the system learned to penalize language associated with women, such as "women's chess club captain" or attendance at women's colleges, effectively downgrading female candidates' scores relative to similarly qualified males. Amazon discontinued the tool in early 2017 after internal audits confirmed the bias, which stemmed from historical hiring patterns reflecting male dominance in tech submissions rather than from explicit gender features in the model.

Outcomes of Fairness Interventions

Empirical evaluations of fairness interventions in machine learning have yielded mixed results. Many studies document improvements in fairness metrics alongside declines in overall model utility, such as accuracy, particularly in predictive tasks where protected attributes correlate with outcomes. For instance, in clinical risk prediction models, enforcing group fairness constraints through in-processing techniques often reduces measures like area under the ROC curve (AUC) by 1-5% while only partially mitigating demographic disparities, as observed across datasets from electronic health records. Similarly, adversarial debiasing methods, which train models to remove sensitive attribute information from representations, have shown fairness gains in some scenarios but statistically insignificant improvements or outright performance impairments in others, depending on dataset imbalance and feature correlations.
In contrast, post-processing interventions applied to public policy applications have demonstrated substantial reductions in outcome disparities, up to 50-80% in demographic parity violations, with negligible impact on accuracy, as measured in a 2021 study using real-world administrative data from multiple U.S. government agencies. These findings held across varying protected group sizes (5-40% of populations) and resource constraints, suggesting that in resource-constrained settings, reallocating predictions after training can enhance equity without broad utility losses. However, such negligible trade-offs appear context-specific and are less common in unconstrained predictive modeling, where preprocessing techniques such as massaging or relabeling frequently degrade baseline accuracy by 2-10% to achieve equalized odds or equal opportunity.
Comparative analyses of interventions across benchmark datasets such as German Credit reveal no universally superior method: fairness gains (e.g., in demographic parity or equalized odds) vary by 10-30% across random train-test splits and sensitive attribute codings, often at the expense of utility metrics like F1-score. In domain-specific case studies, such as depression prediction models trained on data from the U.S. and other countries, bias mitigation via reweighting or thresholding adjusted false positive rates across racial groups but introduced new imbalances in other metrics, showing how fragile interventions are under dataset heterogeneity. Recent techniques, including geometry-aware debiasing, claim to improve both fairness and accuracy by 1-3% in image classification tasks, though these claims require validation beyond controlled benchmarks.
Overall, while interventions can operationalize fairness in deployed systems, the evidence points to persistent tensions, with utility costs more pronounced in high-stakes domains like healthcare and lending, where causal correlations between attributes and targets limit Pareto improvements. Other studies similarly report accuracy drops of 3-7% after interventions that equalize error rates across subgroups, reinforcing that outcomes depend on the interplay of data, intervention type, and the fairness definition employed.

Criticisms and Counterarguments

Challenges to Equity-Focused Fairness

Equity-focused fairness criteria, such as demographic parity, often conflict with other desiderata in machine learning systems. Demographic parity requires that the probability of a positive outcome be independent of protected attributes like race or gender, formalized as $P(R=r \mid A=a) = P(R=r \mid A=b)$ for all outcomes $r$ and attribute values $a, b$. Theoretical impossibility results demonstrate that no non-trivial predictor can simultaneously satisfy multiple group fairness notions, including calibration (predictive parity), equalized odds, and equality of opportunity, unless base rates of the outcome are identical across groups. These theorems, established in analyses of classification tasks, arise because the fairness constraints impose incompatible statistical equalities on error rates and positive predictions, so that improving one metric degrades another.
Empirical studies report frequent trade-offs between enforcing such criteria and maintaining predictive accuracy or societal utility, particularly in high-stakes domains like lending or hiring. For instance, in controlled experiments with real-world datasets, imposing demographic parity reduced overall model accuracy by up to 10-15% in scenarios with heterogeneous group qualifications, as the constraint forces selection probabilities detached from true merit signals. While some applications show negligible accuracy losses under relaxed fairness bounds, broader reviews indicate that strict equity enforcement systematically sacrifices utility when group differences in base rates or covariates reflect causal realities, such as varying qualification distributions. The tension has a simple source: optimal predictors use all relevant information, while equity criteria suppress correlations with the protected attribute, effectively discarding information that improves overall performance.
Critics argue that equity-focused approaches overlook legitimate causal and merit-based differences between groups, treating disparate outcomes as presumptive evidence of discrimination rather than as potential reflections of unequal inputs or behaviors. In hiring models, for example, enforcing parity ignores empirically observed variation in applicant qualifications or effort across demographics, potentially admitting less qualified candidates and eroding institutional performance. On this view, such methods conflate statistical parity with causal fairness, failing to distinguish proxy discrimination from genuine disparities rooted in upstream factors like education or productivity, which cannot be rectified downstream without intervening on causes. Moreover, by prioritizing outcome equality over individual similarity, group fairness can inadvertently penalize higher-performing groups or create perverse incentives, such as reduced motivation for underrepresented groups to compete on merit when equal outcomes are guaranteed regardless of preparation.
These challenges extend to practical deployment, where equity constraints can mandate allocations misaligned with the evidence for low-base-rate groups, potentially harming overall system reliability. For instance, in recidivism prediction, equalizing false positive rates across racial groups despite differing offense prevalences has been shown to increase overall prediction errors, prioritizing group parity over calibrated individual risk. Proponents of causal realist perspectives contend that true equity requires addressing root causes through policy, not post-hoc algorithmic adjustments that mask underlying realities and undermine trust in data-driven decisions. They cite empirical validations across datasets as showing that relaxing group constraints in favor of outcome-conditional independence, such as $R \perp A \mid Y$, better preserves utility without assuming outcome uniformity.

Meritocratic and Causal Realist Perspectives

Meritocratic perspectives in machine learning fairness prioritize individual merit and qualification signals over group-level parity constraints, arguing that optimal decision-making requires selecting or ranking individuals by predicted performance or relative merit rather than equalizing outcomes across demographic groups. In cross-population selection scenarios, such as hiring or admissions, meritocratic fairness defines selection probabilities in proportion to an individual's expected contribution, acknowledging that variation in qualification distributions across groups can lead to disparate selection rates without implying discrimination. This approach quantifies the cost of imposing stricter group fairness notions, with reported results that demographic parity or equalized odds can degrade overall utility by 10-20% in simulated tasks with heterogeneous group merits, because the constraints force the selector to pass over higher-qualified candidates from overrepresented groups.
Critics of equity-focused metrics contend that they conflate statistical disparities with discrimination, ignoring evidence of causal differences in traits like cognitive or behavioral outcomes that legitimately influence predictions. For example, in recidivism prediction or lending, where base rates differ between groups (e.g., U.S. data reporting that Black Americans have 33% higher rearrest rates within three years of release than Whites), enforcing equal positive prediction rates sacrifices predictive accuracy and societal welfare. Meritocratic frameworks address this by preserving rank-ordering within qualification strata, as in contextual bandits where intra-group relative merit guides decisions, which reduces bias against high performers while avoiding inter-group equalization that distorts incentives for effort and development.
Causal realist viewpoints stress dissecting disparities through directed acyclic graphs to separate spurious correlations from legitimate causal paths, and they reject the blanket independence assumptions in group fairness definitions that treat protected attributes as acausal. Under path-specific variants of counterfactual fairness, a decision is fair if it remains unchanged in a hypothetical world where the protected attribute is altered but designated causal descendants (e.g., merit proxies like test scores) are held fixed via do-interventions. This preserves paths along which attributes affect outcomes through differences in effort or endowment. Unlike non-causal metrics, which impose unconditional independence (e.g., $R \perp A$), causal approaches like path-specific effect mitigation target only discriminatory paths; empirical studies in healthcare allocation report that ignoring relevant confounders inflates false positives by up to 15% when parity is enforced. Proponents argue that this avoids overcorrection, in which interventions that assume historical bias is the sole cause overlook data showing, for instance, that average SAT scores of Asian American test takers exceed those of other groups by 100-200 points (2023 data), which on this view justifies higher admission rates without a fairness violation.
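
The difference between a direct path and a mediated path can be made concrete with a small linear model. In the sketch below the graph ($A \to M \to Y$ plus $A \to Y$), the coefficients, and the function names are assumptions chosen for illustration. The natural direct effect changes $A$ only on the direct edge while the mediator keeps the value it would have had under the baseline; the natural indirect effect does the opposite.

```python
# Illustrative linear structural equations (coefficients are assumptions):
#   M = ALPHA * A + U_M            mediator, e.g. a qualification score
#   Y = BETA * A + GAMMA * M + U_Y outcome
ALPHA, BETA, GAMMA = 0.8, 0.5, 1.5

def mediator(a, u_m=0.0):
    return ALPHA * a + u_m

def outcome(a, m, u_y=0.0):
    return BETA * a + GAMMA * m + u_y

def natural_direct_effect(a0, a1):
    """Switch A on the direct edge only; the mediator stays at its value under a0."""
    return outcome(a1, mediator(a0)) - outcome(a0, mediator(a0))

def natural_indirect_effect(a0, a1):
    """Keep A at a0 on the direct edge; let the mediator respond to a1."""
    return outcome(a0, mediator(a1)) - outcome(a0, mediator(a0))

def total_effect(a0, a1):
    return outcome(a1, mediator(a1)) - outcome(a0, mediator(a0))

nde = natural_direct_effect(0, 1)
nie = natural_indirect_effect(0, 1)
te = total_effect(0, 1)
print(round(nde, 3), round(nie, 3), round(te, 3))
```

In a linear model without interactions the total effect is the sum of the two. A path-specific criterion that deems only the direct edge unfair would require the direct effect to be zero and leave the mediated effect in place; which edges count as unfair is a modeling judgment that the code cannot supply.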

Recent Advances and Debates

Emerging Metrics and Frameworks

Recent developments in fairness emphasize causal inference to distinguish spurious correlations from genuine discriminatory effects, addressing limitations of group-based statistical metrics that often conflate historical inequities with algorithmic discrimination. Causal metrics leverage structural causal models (SCMs) and directed acyclic graphs (DAGs) to model data-generating processes, enabling interventions that block unfair causal paths while preserving legitimate ones. For instance, counterfactual fairness assesses whether a model's prediction for an individual would change if their sensitive attribute (e.g., race) were altered in a counterfactual scenario, holding other causally independent factors constant; this requires estimating potential outcomes under interventions, as formalized in SCMs. Path-specific fairness extends this by targeting subsets of causal paths, such as direct effects from sensitive attributes to outcomes or indirect effects through mediators like socioeconomic proxies, allowing mitigation of bias along specific routes (e.g., blocking discriminatory pathways in hiring models while retaining merit-based ones).

Emerging frameworks integrate these metrics into model training via causally constrained machine learning (CCML), which enforces fairness constraints derived from causal graphs during optimization, such as minimizing the natural direct effect (NDE) of sensitive attributes on predictions. The natural direct effect measures the influence of a sensitive variable on outcomes that bypasses mediators, while the natural indirect effect (NIE) captures mediated paths. Frameworks of this kind have been applied in studies reporting that statistical parity violations often stem from proxy variables rather than direct discrimination. No proxy discrimination, another causal metric, ensures decisions are not influenced by non-causal proxies for protected attributes, using do-calculus to intervene on graph structures.
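The natural direct and indirect effects mentioned above can be made concrete on a toy structural causal model. The following is a minimal sketch under stated assumptions, not an implementation from any surveyed framework: the linear equations, the coefficient names `alpha`, `beta`, `gamma`, and the function `natural_effects` are hypothetical, and the exogenous noise is simulated so that counterfactual outcomes can be computed directly, which is not possible with observational data alone.

```python
import numpy as np


def natural_effects(alpha=1.0, beta=0.5, gamma=2.0, n=100_000, seed=0):
    """Estimate NDE, NIE and total effect of A on Y in a toy linear SCM.

    Assumed structural equations (illustrative only):
        M = alpha * A + U_M            (mediator)
        Y = beta * A + gamma * M + U_Y (outcome)
    """
    rng = np.random.default_rng(seed)
    # Exogenous noise is shared across counterfactual worlds.
    u_m = rng.normal(size=n)
    u_y = rng.normal(size=n)

    def mediator(a):
        return alpha * a + u_m

    def outcome(a, m):
        return beta * a + gamma * m + u_y

    y_00 = outcome(0, mediator(0))  # baseline world
    y_10 = outcome(1, mediator(0))  # A flipped, mediator kept at its A=0 value
    y_01 = outcome(0, mediator(1))  # A kept, mediator takes its A=1 value
    y_11 = outcome(1, mediator(1))  # A flipped everywhere

    return {
        "nde": float((y_10 - y_00).mean()),  # direct path A -> Y only
        "nie": float((y_01 - y_00).mean()),  # mediated path A -> M -> Y only
        "te": float((y_11 - y_00).mean()),   # both paths together
    }


if __name__ == "__main__":
    print(natural_effects())
```

In this linear model the direct effect equals `beta`, the indirect effect equals `alpha * gamma`, and the two sum to the total effect. That additive decomposition does not hold in general once the outcome equation contains interactions between the attribute and the mediator. A path-specific constraint would penalize only the component judged discriminatory, and which component that is remains a modeling decision encoded in the graph.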
These approaches, reviewed in 2024 surveys, demonstrate improved robustness to distribution shifts compared to non-causal methods, though they demand accurate causal assumptions that can be checked through empirical tests such as instrumental variables.

Fairness drift has emerged as a dynamic metric tracking how model disparities evolve over time due to changing populations or environments, quantified as deviations in metrics like equalized odds across temporal slices. A 2025 study on national health data over 11 years found drift rates of up to 20% annually without maintenance, prompting frameworks for periodic causal reassessment. Context-aware frameworks, incorporating intersectional attributes (e.g., race-gender interactions), use causal disentanglement to separate sensitive from invariant features, reducing trade-offs with accuracy by 5-10% in benchmarks on tabular data.

Despite their computational demands (e.g., counterfactual estimation scaling quadratically with the number of variables), these metrics align predictions with causal realism and outperform statistical baselines in scenarios with confounding, as evidenced by reduced false positives in proxy-heavy domains like lending. Ongoing challenges include assumption sensitivity: misspecified DAGs amplify errors, so hybrid empirical-causal validation is needed.
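Fairness drift, as described above, can be tracked by recomputing a group fairness metric on successive time slices and comparing each slice with a baseline. The sketch below is one possible formulation and not the method of the study mentioned: the function names are hypothetical, the metric is the larger of the true-positive-rate and false-positive-rate gaps between two groups coded 0 and 1, and drift is defined here as the change in that gap relative to the earliest period. It assumes every slice contains positives and negatives from both groups.

```python
import numpy as np


def equalized_odds_gap(y_true, y_pred, group):
    """Largest absolute TPR or FPR difference between groups 0 and 1.

    Assumes binary labels and predictions, and that both groups have
    at least one positive and one negative example.
    """
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rate_0 = y_pred[(y_true == label) & (group == 0)].mean()
        rate_1 = y_pred[(y_true == label) & (group == 1)].mean()
        gaps.append(abs(rate_0 - rate_1))
    return float(max(gaps))


def fairness_drift(y_true, y_pred, group, period):
    """Change in the equalized-odds gap per period, relative to the first."""
    periods = np.unique(period).tolist()  # sorted, as native Python values
    gaps = {}
    for p in periods:
        in_slice = period == p
        gaps[p] = equalized_odds_gap(
            y_true[in_slice], y_pred[in_slice], group[in_slice]
        )
    baseline = gaps[periods[0]]
    return {p: gaps[p] - baseline for p in periods}


if __name__ == "__main__":
    # Tiny hand-built example: the model starts missing positives in group 1
    # during the second period, so the gap grows over time.
    y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
    y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0])
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
    period = np.array([0] * 8 + [1] * 8)
    print(fairness_drift(y_true, y_pred, group, period))
```

A monitoring process might raise an alert when the drift in any period exceeds a tolerance chosen for the application. In practice the per-slice estimates are noisy when slices are small, so confidence intervals would usually be needed before acting on a change.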

Regulatory Standards Versus Self-Regulation

Regulatory standards for fairness in machine learning emphasize mandatory compliance for high-risk systems, particularly in jurisdictions like the European Union. The EU AI Act, which entered into force on August 1, 2024, classifies AI systems, including many machine learning models, by risk level and imposes stringent requirements on high-risk applications such as biometric identification, credit scoring, and employment decisions. For these, providers must conduct impact assessments, ensure high-quality training data to minimize biases, and implement mitigation techniques. Breaches of these obligations carry penalties of up to 3% of global annual turnover, while the 7% ceiling applies to prohibited practices. Article 10 specifically mandates data governance practices to address biases, including diverse datasets and ongoing monitoring, though implementation relies on harmonized standards from bodies like CEN/CENELEC, which have faced criticism for potential delays in specifying technical fairness criteria. These rules aim to enforce statistical parity or similar metrics empirically, but definitional ambiguities in fairness, such as trade-offs between group fairness and individual accuracy, complicate uniform application across models.

In contrast, the United States has adopted a decentralized approach, favoring voluntary guidelines over binding federal regulations for AI fairness. The National Institute of Standards and Technology's AI Risk Management Framework, released in January 2023, provides non-mandatory best practices for mapping, measuring, and managing biases in AI systems, including bias impact assessments, without prescribing specific metrics. Executive Order 14110, issued in October 2023, directs agencies to develop standards for safe and trustworthy AI, emphasizing equity in federal uses, but leaves private-sector fairness largely to self-audits. State-level efforts, such as Colorado's 2024 AI Act requiring impact assessments for high-risk deployments, introduce limited mandates, yet overall enforcement remains a patchwork, with critics noting insufficient deterrence against discriminatory outcomes in lending or hiring algorithms.
Self-regulation in industry involves voluntary commitments by companies to internal fairness protocols, often through frameworks like the Partnership on AI or individual corporate principles. For instance, in July 2023, major AI firms pledged under White House auspices to conduct safety tests, watermark outputs, and report capabilities transparently, yielding improvements in red-teaming for biases but minimal progress on verifiable fairness audits or public disclosure of disparate impact metrics. Proponents argue this agility fosters innovation, as seen in enterprise tools integrating fairness libraries like IBM's AI Fairness 360, which allow customized mitigations without regulatory lag. However, empirical evidence suggests limitations: a 2024 analysis found that self-reported efforts often prioritize reputational compliance over rigorous causal bias elimination, with incentives skewed toward performance over equity in competitive markets.

Debates on effectiveness highlight trade-offs. Regulatory standards provide enforceability, evidenced by the EU AI Act's phased rollout mandating assessments by August 2027, but risk over-specification that ignores context-specific fairness trade-offs, such as utility losses from enforced demographic parity. Self-regulation, while responsive, has demonstrated gaps in accountability, as voluntary pledges rarely include third-party verification, leading to persistent disparities in deployed models despite internal checks. Hybrid models, incorporating standards bodies for technical guidance under regulatory oversight, emerge as a pragmatic path, though real-world outcomes remain unproven amid rapid ML evolution.
