
# llmu-robustness

Official implementation for the paper "Improving LLM Unlearning Robustness via Random Perturbations".


## Installation

Create the environment:

```shell
conda create -n llmu-robustness
conda activate llmu-robustness
pip install -r requirements.txt
```

## Evaluation Framework

We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation:

```shell
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```

## Dataset

Download the required datasets from the [WMDP repository](https://github.com/centerforaisafety/wmdp) and place them in the `data/` directory.
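The exact filenames depend on the WMDP release you download; a typical layout (the filenames below are illustrative, not prescriptive) would be:

```
data/
├── bio_forget_corpus.jsonl
├── bio_retain_corpus.jsonl
├── cyber_forget_corpus.jsonl
└── cyber_retain_corpus.jsonl
```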

## Unlearning

Run the unlearning process using one of the supported methods, for example RMU:

```shell
python -m baselines.rm.rmu.unlearn \
    --model_name_or_path "HuggingFaceH4/zephyr-7b-beta" \
    --max_num_batches 500 \
    --alpha "1200,1200" \
    --steering_coeffs "6.5,6.5" \
    --seed 42 \
    --batch_size 4 \
    --nu 0.0 \
    --layer_id 7 \
    --layer_ids "5,6,7"
```

or SimNPO:

```shell
python -m baselines.po.simnpo.unlearn \
    --model_name_or_path "HuggingFaceH4/zephyr-7b-beta" \
    --max_num_batches $BATCH \
    --alpha "20,50" \
    --retain_loss_fn "kl" \
    --seed 42 \
    --beta 0.1 \
    --nu 0.0 \
    --batch_size 4 \
    --target_layers "7" \
    --layer_ids "5,6,7"
```

To perform a grid search over the unlearning methods, run the corresponding script. For `RM` in `[rmu, adaptive_rmu, rsv]`:

```shell
bash scripts/rm/$RM.sh
```

For `PO` in `[dpo, npo, simnpo]`:

```shell
bash scripts/po/$PO.sh
```
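The grid search above can be driven by a single loop; a minimal sketch, assuming only the `scripts/rm/` and `scripts/po/` layout shown above:

```shell
# Run every method script that is present. The method names mirror the
# lists above; scripts that are missing are simply skipped.
for RM in rmu adaptive_rmu rsv; do
    if [ -f "scripts/rm/${RM}.sh" ]; then
        bash "scripts/rm/${RM}.sh"
    fi
done

for PO in dpo npo simnpo; do
    if [ -f "scripts/po/${PO}.sh" ]; then
        bash "scripts/po/${PO}.sh"
    fi
done
```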

Trained models will be saved under `checkpoints/`.

## Evaluation

```shell
lm-eval --model hf \
    --model_args pretrained=$CHECK_POINT \
    --tasks mmlu,wmdp \
    --batch_size 16
```
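lm-eval can also write its results to disk with `--output_path`; the helper below pulls per-task accuracy back out of such a results file. The `"acc,none"` metric key is an assumption based on recent lm-evaluation-harness output and may differ across versions:

```python
import json

def summarize_results(path):
    """Return {task: accuracy} from an lm-eval results JSON file.

    Assumes the harness's usual layout: a top-level "results" dict
    keyed by task name, with an "acc,none" metric per task.
    """
    with open(path) as f:
        results = json.load(f)["results"]
    return {task: metrics.get("acc,none") for task, metrics in results.items()}

# Tiny self-contained demo with a fabricated results file.
demo = {"results": {"mmlu": {"acc,none": 0.58}, "wmdp": {"acc,none": 0.31}}}
with open("results_demo.json", "w") as f:
    json.dump(demo, f)
print(summarize_results("results_demo.json"))
```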
