Once you have determined that Counterfactual Logit Pairing (CLP) is the appropriate technique for your use case, you can apply it by taking the following steps:
- Create an instance of CounterfactualPackedInputs with the original and counterfactual data.
- Measure the flip rate and flip count to determine if intervention is needed.
- If intervention is needed, pass the original input data, counterfactual data, original model, and counterfactual loss to the counterfactual model.
- Assess the impact of CLP by measuring the flip rate and flip count.
For an example of applying CLP to a Keras model, see the Use Counterfactual Logit Pairing with Keras tutorial.
Create an instance of CounterfactualPackedInputs
To create the counterfactual dataset, start by identifying the terms and features whose removal or replacement may alter the prediction of your model.
Once you understand the terms and features to assess, you will need to create an instance of CounterfactualPackedInputs, which includes the original input and counterfactual data. The original input should be the dataset you used to train your Keras model. Counterfactual data has an original_x value, a counterfactual_x value, and a counterfactual_sample_weight. The counterfactual value should be nearly identical to the original value, differing only in that one or more of the sensitive attributes are removed or replaced. The quality of the counterfactual dataset is important because it is used to pair the loss function between the original value and the counterfactual value, with the goal of ensuring that the model’s prediction doesn’t change when the sensitive attribute is different.
For details on how to develop this counterfactual dataset, see the notebook on creating a custom counterfactual dataset.
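As a rough illustration of the idea, the sketch below builds counterfactual text examples by removing sensitive terms from the original text. It uses only the Python standard library. The names `CounterfactualExample` and `make_counterfactual` are hypothetical helpers written for this example and are not part of the library's API; only the three field names are taken from the description above. The example terms and the default weight of 1.0 are also assumptions.

```python
import re
from typing import List, NamedTuple, Optional


class CounterfactualExample(NamedTuple):
    """Hypothetical container mirroring the three fields described above."""
    original_x: str
    counterfactual_x: str
    counterfactual_sample_weight: float


def make_counterfactual(
    text: str, sensitive_terms: List[str], weight: float = 1.0
) -> Optional[CounterfactualExample]:
    """Remove sensitive terms from `text`.

    Returns None when no sensitive term occurs, since such an example
    has no meaningful counterfactual.
    """
    # Match whole words only, so that "male" does not match inside "female".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in sensitive_terms) + r")\b",
        flags=re.IGNORECASE,
    )
    if not pattern.search(text):
        return None
    # Drop the terms, then collapse the leftover whitespace.
    counterfactual = " ".join(pattern.sub("", text).split())
    return CounterfactualExample(text, counterfactual, weight)


# Assumed example data and terms, for illustration only.
texts = ["The female doctor arrived", "The weather is nice"]
terms = ["female", "male"]
pairs = [p for p in (make_counterfactual(t, terms) for t in texts) if p is not None]
```

Only examples that actually contain a sensitive term produce a pair here, so the counterfactual set is typically smaller than the original dataset. Replacing a term instead of removing it would follow the same pattern, with a replacement string passed to `pattern.sub`.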
Measure flip count and flip rate
A flip is defined as a classifier giving a different decision when the sensitive attribute referenced in the example changes. It captures the situation where a classifier changes its prediction in the presence, absence, or change of an identity attribute. A more continuous metric should be used when assessing the real value (score) of a classifier.
Flip Count
Flip count measures the number of times the classifier gives a different decision if the identity term in a given example were changed.
- Overall Flip Count: Total flips of a prediction from positive to negative and vice versa.
- Positive to Negative Prediction Flip Count: Number of flips where the prediction label changed from positive to negative.
- Negative to Positive Prediction Flip Count: Number of flips where the prediction label changed from negative to positive.
Flip Rate
Flip rate measures the probability that the classifier gives a different decision if the identity term in a given example were changed.
- Overall Flip Rate: Total flip count over the total number of examples.
- Positive to Negative Prediction Flip Rate: Positive to negative flip count over positive examples in the counterfactual dataset.
- Negative to Positive Prediction Flip Rate: Negative to positive flip count over negative examples in the counterfactual dataset.
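The definitions above can be sketched in a few lines of plain Python. This is not how Fairness Indicators computes the metrics internally. The function name `flip_metrics`, the 0.5 decision threshold, and the interpretation of "positive examples" as examples whose original prediction is positive are all assumptions made for this illustration.

```python
from typing import Dict, Sequence


def flip_metrics(
    original_scores: Sequence[float],
    counterfactual_scores: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> Dict[str, float]:
    """Compute flip counts and flip rates for paired predictions."""
    if len(original_scores) != len(counterfactual_scores):
        raise ValueError("Score sequences must have the same length.")

    # Convert scores to binary decisions.
    orig = [s >= threshold for s in original_scores]
    cf = [s >= threshold for s in counterfactual_scores]

    pos_to_neg = sum(1 for o, c in zip(orig, cf) if o and not c)
    neg_to_pos = sum(1 for o, c in zip(orig, cf) if c and not o)
    overall = pos_to_neg + neg_to_pos

    # Assumption: positives and negatives are counted on the original predictions.
    n_pos = sum(orig)
    n_neg = len(orig) - n_pos

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "overall_flip_count": overall,
        "positive_to_negative_flip_count": pos_to_neg,
        "negative_to_positive_flip_count": neg_to_pos,
        "overall_flip_rate": ratio(overall, len(orig)),
        "positive_to_negative_flip_rate": ratio(pos_to_neg, n_pos),
        "negative_to_positive_flip_rate": ratio(neg_to_pos, n_neg),
    }


# Made-up scores for five original/counterfactual pairs.
metrics = flip_metrics(
    [0.9, 0.8, 0.2, 0.4, 0.7],
    [0.3, 0.85, 0.6, 0.1, 0.75],
)
```

In this made-up data, the first pair flips from positive to negative and the third from negative to positive, so two of the five examples flip overall.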
After calculating the flip rate and flip count with Fairness Indicators, you can determine if the classifier is making a different prediction based on a sensitive attribute within the data. You can use the example count and confidence intervals to determine if you have sufficient data to apply CLP and draw conclusions from the flip rate. A high flip rate and flip count are indicative of this behavior occurring and can be used to decide whether CLP is appropriate for your use case. This decision is specific to your model and depends on factors such as the harm that may be caused to end users and the product that the model is used in.
Apply Counterfactual Logit Pairing to your Keras Model
To use CLP, you need the original Keras model you're looking to remediate, the original training dataset, and the counterfactual dataset. Determine what counterfactual loss should be applied for the logit pairing. With this, you can build the Counterfactual model with the desired counterfactual loss function and loss function from your original model.
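To make the combination of the two losses concrete, the sketch below adds a pairing penalty to the original loss. It assumes a pairwise mean squared error between original and counterfactual logits as the counterfactual loss, which is only one possible choice. The function names, the `loss_weight` parameter, and the weighted-average reduction are assumptions for illustration and do not reproduce the library's implementation.

```python
from typing import Optional, Sequence


def pairwise_mse(
    original_logits: Sequence[float],
    counterfactual_logits: Sequence[float],
    sample_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of squared differences between paired logits."""
    if len(original_logits) != len(counterfactual_logits):
        raise ValueError("Logit sequences must have the same length.")
    if sample_weights is None:
        sample_weights = [1.0] * len(original_logits)
    total_weight = sum(sample_weights)
    if total_weight == 0:
        return 0.0
    weighted_sq = sum(
        w * (o - c) ** 2
        for o, c, w in zip(original_logits, counterfactual_logits, sample_weights)
    )
    return weighted_sq / total_weight


def total_loss(
    original_loss: float,
    original_logits: Sequence[float],
    counterfactual_logits: Sequence[float],
    loss_weight: float = 1.0,  # assumed knob: strength of the pairing penalty
    sample_weights: Optional[Sequence[float]] = None,
) -> float:
    """Original task loss plus the weighted counterfactual pairing penalty."""
    penalty = pairwise_mse(original_logits, counterfactual_logits, sample_weights)
    return original_loss + loss_weight * penalty


# Made-up values: the first pair's logits differ by 1.0, the second pair agrees.
loss = total_loss(0.3, [2.0, -1.0], [1.0, -1.0], loss_weight=2.0)
```

The penalty is zero when the model produces identical logits for every original and counterfactual pair, so in this sketch a larger `loss_weight` pushes training toward predictions that do not change with the sensitive attribute, at some possible cost to the original objective.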
After applying CLP, recalculate the flip rate and flip count, along with any changes in other metrics such as overall accuracy, to measure the improvement that resulted from applying this technique.