Fairness-enforcing methods minimize inter-attribute differences in model predictions, where each attribute value corresponds to a relevant subgroup within the input distribution. For example, in speech recognition, two possible subgroups are speakers of African American Vernacular English (AAVE) and non-AAVE speakers. To improve social equity, we seek to minimize any difference in model quality between the two groups.
Concretely, given a model \(f(\cdot; \theta)\) and training data \(\{(X_i, A_i, Y_i)\}\) indexed by \(i \in \mathcal{I}_\text{training}\), we want to learn a model \(f(\cdot; \theta^*)\) such that the prediction \(\hat{Y} = f(X; \theta^*)\) is conditionally independent of the attribute \(A\) given the true label \(Y\).
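This conditional-independence condition can be checked empirically by comparing, within each true label, the rate of positive predictions across groups; a minimal sketch for binary labels and a binary attribute (the function name and toy data here are illustrative, not taken from the source):

```python
import numpy as np

def equalized_odds_gap(y_hat, y, a):
    """Largest difference, over true labels, in P(y_hat = 1 | a, y)
    between the two groups encoded in the binary attribute `a`."""
    gaps = []
    for label in np.unique(y):
        rates = []
        for group in np.unique(a):
            mask = (y == label) & (a == group)
            rates.append(y_hat[mask].mean())
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

# Toy data: binary labels Y and a binary group attribute A.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
a = rng.integers(0, 2, size=1000)
y_hat = y.copy()  # a perfectly accurate predictor is trivially fair here
print(equalized_odds_gap(y_hat, y, a))  # -> 0.0
```

A gap of zero means the prediction carries no information about \(A\) beyond what \(Y\) already provides; fairness-enforcing training methods drive this gap toward zero during optimization rather than merely measuring it afterward.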
Romano et al. [1] describe an adversarial approach