This directory conducts federated instruction tuning with a pretrained Qwen/Qwen2.5-Coder-0.5B-Instruct model on a code dataset. We use Flower Datasets to download, partition, and preprocess the dataset, and Flower's Simulation Engine to simulate the LLM fine-tuning process in a federated way, which allows users to perform the training on a single GPU.
This baseline performs federated LLM fine-tuning with DoRA using the 🤗PEFT library.
The clients' models are aggregated with the FedAvg strategy.
This provides a baseline performance for the Code challenge leaderboard.
For the Qwen/Qwen2.5-Coder-0.5B-Instruct model, I adopted the following fine-tuning methodology:
- Precision: bf16 for model weights.
- Quantization: 4-bit quantization for reduced memory usage.
- Optimizer: paged_adamw_8bit
- DoRA Configuration:
  - Rank (r): 32
  - Alpha: 64
  - Target Modules: down_proj, gate_up_proj, o_proj, qkv_proj
- Training Configuration:
  - Batch size: 8
  - Maximum number of steps: 10
  - Total number of rounds: 100
  - Fraction fit per round: 0.2
- Learning Rate Scheduler:
  - Cosine annealing over rounds, where:
    - Maximum LR: 5e-5
    - Minimum LR: 5e-6
  - Constant learning rate over the local steps within each round (see the sketch below)
- Strategy: FedAvg
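
Taken together, a minimal sketch of this setup with the 🤗Transformers and 🤗PEFT libraries might look as follows; the hyperparameters are the ones listed above, while the helper names and the exact 4-bit quantization settings are assumptions:

```python
import math

import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit quantization with bf16 compute (assumed bitsandbytes settings).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# DoRA adapter: a LoRA config with weight decomposition enabled.
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    use_dora=True,
    target_modules=["down_proj", "gate_up_proj", "o_proj", "qkv_proj"],
    task_type="CAUSAL_LM",
)

def cosine_annealing_lr(server_round: int, total_rounds: int = 100,
                        lr_max: float = 5e-5, lr_min: float = 5e-6) -> float:
    """Cosine-annealed learning rate across rounds (held constant within a round)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * server_round / total_rounds)
    )

def training_args(server_round: int) -> TrainingArguments:
    """Local training arguments for one federated round (hypothetical helper)."""
    return TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=8,
        max_steps=10,                  # local steps per round
        learning_rate=cosine_annealing_lr(server_round),
        lr_scheduler_type="constant",  # keep the LR flat over local steps
        optim="paged_adamw_8bit",
        bf16=True,
    )
```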
[Training loss plot from the experiment omitted.]

The evaluation results on the code benchmarks are:
- MBPP: 25.60 %
- HumanEval: 37.81 %
- MultiPL-E (JS): 41.00 %
- MultiPL-E (C++): 32.92 %
- Average: 34.34 %
The evaluation was conducted on an RTX A4000 16GB.
Communication budget: 8922.66 MB
For this experiment, I utilized CUDO Compute as the GPU compute provider.
| Component | Specification | 
|---|---|
| GPU | 1 × RTX A4000 16 GB | 
| vCPUs | 4 | 
| CPU | AMD EPYC (Milan) | 
| Memory | 16 GB | 
For an example on how to set up a GPU computing resource on CUDO Compute by using Terraform, please check ./terraform/.
| Component | Details | Cost/hr |
|---|---|---|
| vCPUs | 4 cores | $0.0088/hr |
| Memory | 16 GB | $0.056/hr |
| GPU | 1 × RTX A4000 | $0.25/hr |
| Boot Disk | 70 GB | $0.0077/hr |
| Public IPv4 Address | N/A | $0.005/hr |
| Total | | $0.3275/hr |
| Parameter | Value | 
|---|---|
| Runtime | 1924.52 seconds (00:32:04) | 
| Simulation Cost | $0.18 | 
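
For reference, the simulation cost is the hourly total multiplied by the runtime: $0.3275/hr × (1924.52 s ÷ 3600 s/hr) ≈ $0.175, which rounds to the reported $0.18.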
Project dependencies are defined in pyproject.toml. Install them in an activated Python environment with:

```shell
python -m pip install --upgrade pip wheel setuptools packaging
pip install -e .
pip install flash-attn --no-build-isolation   # Install FlashAttention-2
```

The dataset is divided into 10 partitions in an IID fashion, and one partition is assigned to each ClientApp.
We randomly sample a fraction (0.2) of the total nodes to participate in each round, for a total of 100 rounds.
All settings are defined in pyproject.toml.
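
As a rough sketch of how this maps to Flower (assuming Flower Datasets' IidPartitioner and the built-in FedAvg strategy; the dataset name below is a placeholder, since the actual one is configured in pyproject.toml):

```python
from flwr.server.strategy import FedAvg
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

# Split the training set into 10 IID partitions; each ClientApp loads one.
partitioner = IidPartitioner(num_partitions=10)
fds = FederatedDataset(
    dataset="<code-dataset>",  # placeholder for the dataset set in pyproject.toml
    partitioners={"train": partitioner},
)

# Sample 20% of the nodes (2 of 10) for local training in each of the 100 rounds.
strategy = FedAvg(fraction_fit=0.2, fraction_evaluate=0.0)
```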
Important
Please note that [tool.flwr.app.config.static] and options.num-supernodes under [tool.flwr.federations.local-simulation] must not be modified, to ensure fair competition, if you plan to participate in the LLM leaderboard.
Run the challenge with default config values.
The configs are defined in the [tool.flwr.app.config] entry of pyproject.toml and are loaded automatically.
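
Inside the app code, these values are surfaced through the run context; a minimal sketch of a ServerApp reading one of them (the key name num-server-rounds is an assumed placeholder):

```python
from flwr.common import Context
from flwr.server import ServerApp, ServerAppComponents, ServerConfig

def server_fn(context: Context) -> ServerAppComponents:
    # Read a value defined under [tool.flwr.app.config] in pyproject.toml.
    # "num-server-rounds" is used here as an assumed placeholder key.
    num_rounds = int(context.run_config["num-server-rounds"])
    return ServerAppComponents(config=ServerConfig(num_rounds=num_rounds))

app = ServerApp(server_fn=server_fn)
```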
```shell
flwr run
```

Please check flowertune-eval-code for the evaluation instructions.
By default, the global PEFT model checkpoints are saved every 5 rounds after aggregation on the server side; this interval can be changed via train.save-every-round under the [tool.flwr.app.config] entry in pyproject.toml.
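
A minimal sketch of such a server-side hook, assuming the aggregated parameters arrive as a Flower Parameters object whose arrays follow the PEFT adapter's key order (the function name is hypothetical):

```python
import torch
from flwr.common import parameters_to_ndarrays
from peft import get_peft_model_state_dict, set_peft_model_state_dict

def maybe_save_checkpoint(server_round, parameters, model, save_every_round=5):
    """Persist the aggregated global PEFT adapter every `save_every_round` rounds."""
    if server_round % save_every_round != 0:
        return
    # Rebuild the adapter state dict from the aggregated arrays, keeping key order.
    ndarrays = parameters_to_ndarrays(parameters)
    keys = get_peft_model_state_dict(model).keys()
    state_dict = {k: torch.tensor(v) for k, v in zip(keys, ndarrays)}
    set_peft_model_state_dict(model, state_dict)
    model.save_pretrained(f"peft_checkpoint_round_{server_round}")
```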
