New Model: VaultGemma
VaultGemma is a 1-billion-parameter, 26-layer, text-only decoder model trained with sequence-level differential privacy (DP).
Its architecture is derived from Gemma 2, but notably drops the norms after the attention and MLP blocks and uses full attention in every layer rather than alternating with local sliding-window attention.
The pretrained model is available with a 1024-token sequence length.
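To try the model, here is a minimal sketch using the generic `keras_hub.models.CausalLM.from_preset` entry point; the preset name `vault_gemma_1b_en` is a hypothetical placeholder, so check the KerasHub model listing for the exact identifier shipped with this release.

```python
import keras_hub

# Hypothetical preset name; substitute the identifier from the
# KerasHub model listing.
vault_gemma = keras_hub.models.CausalLM.from_preset("vault_gemma_1b_en")

# The pretrained model was trained with a 1024-token sequence length,
# so keep prompt plus generation well under that limit.
output = vault_gemma.generate("Differential privacy is", max_length=64)
print(output)
```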
What's Changed
- Add DP research model by @sachinprasadhs in #2396
Full Changelog: v0.22.1...v0.22.2