This directory conducts federated instruction tuning with a pretrained microsoft/phi-4 model on a General NLP dataset. We use Flower Datasets to download, partition, and preprocess the dataset, and Flower's Simulation Engine to simulate the LLM fine-tuning process in a federated way, which allows users to run the training on a single GPU.
The fine-tuning results have been submitted as a PEFT adapter, available as mrs83/FlowerTune-phi-4-NLP-PEFT.
This experiment performs federated LLM fine-tuning with DoRA using the 🤗PEFT library. The clients' models are aggregated with the FedAvg strategy. This provides a baseline performance for the General NLP challenge leaderboard.
For the microsoft/phi-4 model, I adopted the following fine-tuning methodology:
- Precision: bf16 for model weights.
- Quantization: 4-bit quantization for reduced memory usage.
- DoRA Configuration (see the sketch after this list):
  - Rank (r): 8
  - Alpha: 16
  - Target Modules:
    - qkv_proj
    - o_proj
    - gate_up_proj
    - down_proj
- Training Configuration:
  - Batch size: 8
  - Maximum number of steps: 10
  - Total number of rounds: 100
  - Fraction fit per round: 0.1
- Learning Rate Scheduler:
  - Cosine annealing over rounds, where:
    - Maximum LR: 5e-5
    - Minimum LR: 5e-6
  - Constant learning rate scheduler over steps
- Strategy: FedAvg
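For reference, a minimal sketch of this adapter configuration and the round-wise cosine schedule is shown below, using the 🤗PEFT LoraConfig API with DoRA enabled via use_dora=True. The scheduler helper and its argument names are illustrative assumptions, not the exact code used in this project.

```python
import math
from peft import LoraConfig

# DoRA adapter configuration matching the settings listed above.
peft_config = LoraConfig(
    r=8,                      # rank
    lora_alpha=16,            # alpha
    use_dora=True,            # enable DoRA (weight-decomposed LoRA)
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

def cosine_lr_for_round(server_round: int, total_rounds: int = 100,
                        lr_max: float = 5e-5, lr_min: float = 5e-6) -> float:
    """Cosine annealing of the learning rate across rounds; within a round,
    the learning rate stays constant over the local training steps."""
    cos = math.cos(math.pi * server_round / total_rounds)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cos)
```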
Below is the training loss plot from the experiment:
This methodology enabled efficient fine-tuning within constrained resources while ensuring competitive performance.
Evaluation results (accuracy):
- STEM: 40.66 %
- Social Sciences: 74.52 %
- Humanities: 51.75 %
- Average: 55.64 %
Communication budget: 45804.69 Megabytes
Project dependencies are defined in pyproject.toml. Install them in an activated Python environment with:

```shell
pip install -e .
```

The dataset is divided into 20 partitions in an IID fashion, and each partition is assigned to one ClientApp.
We randomly sample a fraction (0.1) of the total nodes to participate in each round, for a total of 100 rounds.
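For illustration, IID partitioning with Flower Datasets looks roughly like the sketch below; the dataset identifier and partition id are placeholders rather than the exact values used in this project.

```python
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

# Split the training data into 20 IID partitions, one per ClientApp.
partitioner = IidPartitioner(num_partitions=20)
fds = FederatedDataset(
    dataset="vicgalle/alpaca-gpt4",  # placeholder dataset id
    partitioners={"train": partitioner},
)

# Each ClientApp loads only its own partition.
client_partition = fds.load_partition(partition_id=0)
```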
All settings are defined in pyproject.toml.
Important
Please note that, for a fair competition, [tool.flwr.app.config.static] and options.num-supernodes under [tool.flwr.federations.local-simulation] must not be modified if you plan to participate in the LLM leaderboard.
Run the challenge with the default config values. The configs are defined in the [tool.flwr.app.config] entry of pyproject.toml and are loaded automatically.

```shell
flwr run
```

The global PEFT model checkpoints are saved every 5 rounds after aggregation on the server side by default; this interval can be changed with train.save-every-round under the [tool.flwr.app.config] entry in pyproject.toml.
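A minimal sketch of that round-based checkpointing logic is shown below; the function and argument names are illustrative assumptions, the key point being that the aggregated adapter is written out with PEFT's save_pretrained at the configured interval and at the final round.

```python
def maybe_save_adapter(peft_model, server_round: int,
                       save_every_round: int = 5, total_rounds: int = 100) -> None:
    """Persist the aggregated PEFT adapter every `save_every_round` rounds."""
    if server_round % save_every_round == 0 or server_round == total_rounds:
        peft_model.save_pretrained(f"peft_checkpoint_round_{server_round}")
```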
Note
Please provide the last PEFT checkpoint if you plan to participate in the LLM leaderboard.
Use this model as:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the fine-tuned PEFT (DoRA) adapter.
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-4")
model = PeftModel.from_pretrained(base_model, "mrs83/FlowerTune-phi-4-NLP-PEFT")
```
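For a quick sanity check after loading the adapter, one can run a short generation; the tokenizer choice and prompt below are illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
inputs = tokenizer("Explain federated learning in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```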
