Official implementation for the paper: Improving LLM Unlearning Robustness via Random Perturbations
Create the environment:
conda create -n llmu-robustness python=3.10  # Python version is an assumption; match the repo's requirements
conda activate llmu-robustness
pip install -r requirements.txt
We use EleutherAI's lm-evaluation-harness for evaluation; install it from source:
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
Download the required datasets from the WMDP repository and place them in the data/ directory.
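If the corpora are hosted on the Hugging Face Hub, they can also be fetched programmatically. A minimal sketch, assuming the cais/wmdp-corpora dataset id used by the WMDP project (verify the exact id; note that some corpora are gated and require accepting the terms on the Hub first):

# Sketch: download the WMDP unlearning corpora into data/.
# The dataset id "cais/wmdp-corpora" is an assumption taken from the WMDP project.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="cais/wmdp-corpora",
    repo_type="dataset",
    local_dir="data/",
)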
Run the unlearning process using one of the supported methods, for example RMU or SimNPO:
python -m baselines.rm.rmu.unlearn \
--model_name_or_path "HuggingFaceH4/zephyr-7b-beta" \
--max_num_batches 500 \
--alpha "1200,1200" \
--steering_coeffs "6.5,6.5" \
--seed 42 \
--batch_size 4 \
--nu 0.0 \
--layer_id 7 \
--layer_ids "5,6,7";
python -m baselines.po.simnpo.unlearn \
--model_name_or_path "HuggingFaceH4/zephyr-7b-beta" \
--max_num_batches 500 \
--alpha "20,50" \
--retain_loss_fn "kl" \
--seed 42 \
--beta 0.1 \
--nu 0.0 \
--batch_size 4 \
--target_layers "7" \
--layer_ids "5,6,7";
To perform a grid search over the unlearning methods, run the provided scripts:
for RM in rmu adaptive_rmu rsv; do bash scripts/rm/$RM.sh; done
for PO in dpo npo simnpo; do bash scripts/po/$PO.sh; done
Trained models are saved under checkpoints/.
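Assuming the training scripts save in the standard Hugging Face format, a checkpoint can be loaded with transformers for inspection or further use. A minimal sketch (the path below is hypothetical; substitute the directory written under checkpoints/):

# Sketch: load an unlearned checkpoint.
# "checkpoints/rmu/zephyr-7b-beta" is a hypothetical path.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "checkpoints/rmu/zephyr-7b-beta"
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)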
Evaluate a trained checkpoint on MMLU and WMDP with lm-eval (set $CHECK_POINT to the saved model directory):
lm-eval --model hf \
--model_args pretrained=$CHECK_POINT \
--tasks mmlu,wmdp \
--batch_size 16
Successful unlearning should lower WMDP accuracy while leaving MMLU largely intact.
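The harness can also be driven from Python. A minimal sketch using lm_eval.simple_evaluate that mirrors the CLI call above (the checkpoint path is a hypothetical placeholder):

# Sketch: run the same evaluation through the lm-evaluation-harness Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=checkpoints/rmu/zephyr-7b-beta",  # hypothetical path
    tasks=["mmlu", "wmdp"],
    batch_size=16,
)
print(results["results"])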