This repository contains the code and data for the experiments included in the paper:
Saillenfest, A. & Lemberger, P. (2025). Nonlinear Concept Erasure: A Density Matching Approach. (To appear in Proceedings of ECAI 2025, the 28th European Conference on Artificial Intelligence.)
The full version of the paper (article + supplementary material) can be found here: https://arxiv.org/abs/2507.12341
To run the experiments:
- Download Bias in Bios
- Download the top 150k GloVe embeddings (solely used to calculate WS-353)
- Download DIAL
- Run the experiments
To download Bias in Bios and compute the embeddings of the representations, follow the instructions in "datasets/biasbios/readme.md"
Download the top 150k GloVe embeddings from the RLACE (Ravfogel et al., 2022) repository: https://nlp.biu.ac.il/~ravfogs/rlace/glove/glove-top-50k.pickle
Put the file in "datasets/GloVe" under the name "glove-top-150k.pickle".
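The download step above can be scripted. The following is a minimal sketch, not part of the repository: the URL and the target file name are copied from the instructions above, while the helper names `glove_destination` and `download_glove` are hypothetical and the repository root is assumed to be the current directory.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL and target name as given in the instructions above.
GLOVE_URL = "https://nlp.biu.ac.il/~ravfogs/rlace/glove/glove-top-50k.pickle"
GLOVE_DIR = Path("datasets") / "GloVe"
GLOVE_NAME = "glove-top-150k.pickle"


def glove_destination(root: Path = Path(".")) -> Path:
    """Return the path where the experiments expect the GloVe file (hypothetical helper)."""
    return Path(root) / GLOVE_DIR / GLOVE_NAME


def download_glove(root: Path = Path("."), url: str = GLOVE_URL) -> Path:
    """Download the embeddings unless the file is already present (hypothetical helper)."""
    dest = glove_destination(root)
    if dest.exists():
        # Skip the network call when the file is already in place.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))
    return dest
```

Calling `download_glove()` from the repository root should leave the file at "datasets/GloVe/glove-top-150k.pickle"; a second call returns immediately because the file already exists.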
To download DIAL (Deepmoji), follow the instructions from the KRaM repository: https://github.com/brcsomnath/KRaM/blob/master/data/README.md
To run the experiments:
python main.py
Output projections are stored in "./outputs". The program runs the evaluation after each training run. Some configurations are already defined in main.py; default parameters come from the base config in configs/base_config.yml.
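To illustrate how a base config with per-experiment overrides typically works, here is a minimal sketch. It is an assumption about the general pattern, not a description of what main.py does: the function names and the example keys (`seed`, `training`, `lr`, `epochs`) are hypothetical, and only the path configs/base_config.yml comes from this README.

```python
from copy import deepcopy


def merge_config(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` applied, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged


def load_base_config(path: str = "configs/base_config.yml") -> dict:
    """Read the default parameters from the YAML file (requires PyYAML)."""
    import yaml  # imported lazily so merge_config works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}


# Hypothetical keys, for illustration only.
defaults = {"seed": 0, "training": {"epochs": 10, "lr": 0.001}}
experiment = merge_config(defaults, {"training": {"lr": 0.01}})
```

In this sketch `experiment` keeps `seed` and `epochs` from the defaults and only replaces `lr`, while `defaults` itself is left unchanged.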
To generate the t-SNE visualization of the embeddings, run "xxx_display.ipynb" after selecting the appropriate projection to evaluate (xxx is GloVe, biasbios, or dial).
Python version: 3.12.9
Set up an environment (we used uv):
# Create the environment
uv venv leopard
# Install the libraries
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
uv pip install scikit-learn concept_erasure pyyaml transformers tqdm ipykernel matplotlib
# Activate the environment
source leopard/bin/activate
See also the requirement.txt file.
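Once the environment is active, a quick check can confirm that the interpreter and the libraries listed above are visible. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the module list uses the import names of the installed packages (`sklearn` for scikit-learn, `yaml` for pyyaml), and treating 3.12 as a minimum version is an assumption, since this README only states which version was used.

```python
import sys
from importlib.util import find_spec

# Import names of the packages installed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "sklearn", "concept_erasure",
    "yaml", "transformers", "tqdm", "ipykernel", "matplotlib",
]


def missing_modules(names):
    """Return the top-level modules in `names` that cannot be found, without importing them."""
    return [name for name in names if find_spec(name) is None]


def python_at_least(minimum=(3, 12)):
    """Check the running interpreter against an assumed minimum (major, minor) version."""
    return sys.version_info[:2] >= tuple(minimum)


def report():
    """Print a short summary of the environment (hypothetical helper)."""
    print("Python", sys.version.split()[0], "ok" if python_at_least() else "older than 3.12")
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running `report()` inside the activated environment should list no missing modules if both `uv pip install` commands succeeded.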
