Releases · kozistr/pytorch_optimizer
pytorch-optimizer v3.8.2
Change Log
Feature
- Speed up `zeropower_via_newtonschulz` by up to 20% by utilizing the `torch.baddbmm` and `torch.addmm` ops (see the illustration below). (#448)
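As a rough illustration of the kind of fusion involved (not the library's actual implementation), `torch.addmm` folds a scaled add and a matrix multiply into a single call, and `torch.baddbmm` does the same for batched matrices:

```python
import torch

a, b = torch.randn(64, 64), torch.randn(64, 64)
bias = torch.randn(64, 64)

# Unfused: separate matmul and add materialize an intermediate tensor.
out_unfused = bias + 2.0 * (a @ b)

# Fused: torch.addmm computes beta * input + alpha * (mat1 @ mat2) in one op.
out_fused = torch.addmm(bias, a, b, beta=1.0, alpha=2.0)

assert torch.allclose(out_unfused, out_fused, atol=1e-5)

# For batched matrices of shape (B, n, m) x (B, m, p), torch.baddbmm is the analogue.
batch_a, batch_b = torch.randn(4, 8, 8), torch.randn(4, 8, 8)
batch_bias = torch.randn(4, 8, 8)
out_batched = torch.baddbmm(batch_bias, batch_a, batch_b)
```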
Update
- Refactor the type hints. (#448)
Fix
- Resolve a compatibility issue with lower PyTorch versions where `torch.optim.optimizer.ParamsT` could not be imported. (#448)
Docs
- Convert the docstring style from reST to google-style docstring. (#449)
pytorch-optimizer v3.8.1
Change Log
Feature
- Implement `FriendlySAM` optimizer. (#424, #434)
- Implement `AdaGO` optimizer. (#436, #437)
- Update `EXAdam` optimizer to the latest version. (#438)
- Update `EmoNavi` optimizer to the latest version. (#433, #439)
- Implement `Conda` optimizer. (#440, #441)
Update
Bug
- Fix a NaN problem when the gradient norm is zero in the `StableSPAM` optimizer. (#431)
Docs
- Update the documentation page. (#428)
Contribution
thanks to @liveck, @AhmedMostafa16
pytorch-optimizer v3.8.0
Change Log
Feature
- Implement `EmoNeco` and `EmoZeal` optimizers. (#407)
- Implement `Refined Schedule-Free AdamW` optimizer. (#409, #414)
    - Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
    - You can use this variant by setting the `decoupling_c` parameter in the `ScheduleFreeAdamW` optimizer (see the sketch after this list).
- Add more built-in optimizers: `NAdam`, `RMSProp`, and `LBFGS`. (#415)
- Support the `cautious` variant for the `Muon` optimizer. (#417)
- Separate the distributed functionality from `Muon` into the `DistributedMuon` optimizer. (#418)
- Implement `StochasticAccumulator`, which is a gradient hook. (#418)
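A minimal usage sketch for the refined variant. The learning rate, the `decoupling_c` value, and the `train()`/`eval()` calls follow the usual schedule-free pattern and are illustrative assumptions, not recommended settings:

```python
import torch
from pytorch_optimizer import ScheduleFreeAdamW

model = torch.nn.Linear(16, 1)

# Setting `decoupling_c` switches on the refined (decoupled) schedule-free behavior;
# 1.0 is only an example value.
optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-3, decoupling_c=1.0)

optimizer.train()  # schedule-free optimizers keep separate train/eval parameter views
for _ in range(10):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 16)).pow(2).mean()
    loss.backward()
    optimizer.step()
optimizer.eval()  # switch to the evaluation parameters before validation/inference
```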
Update
- Re-implement the `Muon` and `AdaMuon` optimizers based on the recent official implementation. (#408, #410)
    - Their definitions have changed from the previous version, so please check out the documentation!
- Update `__init__.py` with the missing optimizers. (#415)
- Add a HuggingFace Trainer example. (#415)
- Optimize the visualization outputs and change the visualization document to a table layout. (#416)
Dependency
- Update `mkdocs` dependencies. (#417)
CI
Contributions
thanks to @AidinHamedi
pytorch-optimizer v3.7.0
Change Log
Feature
- Implement `AdaMuon` optimizer. (#394, #395)
- Implement `SPlus` optimizer. (#396, #399)
- Implement `EmoNavi`, `EmoFact`, and `EmoLynx` optimizers. (#393, #400)
CI
Fix
- Adjust the value of `eps` to the fixed value `1e-15` when adding to `exp_avg_sq`. (#397, #398)
- Fix the built-in type hints in the `Kron` optimizer. (#404)
Contributions
Thanks to @sobolevn
pytorch-optimizer v3.6.1
Change Log
Feature
- Implement more cooldown types for the WSD learning rate scheduler. (#382, #386)
- Implement `AdamWSN` optimizer. (#387, #389)
- Implement `AdamC` optimizer. (#388, #390)
Update
- Change the default range of the `beta` parameter from `[0, 1]` to `[0, 1)`. (#392)
Fix
- Fix to use the momentum buffer instead of the gradient to calculate the LMO. (#385)
pytorch-optimizer v3.6.0
Change Log
Feature
- Implement `Fira` optimizer. (#376)
- Implement `RACS` and `Alice` optimizers. (#376)
- Implement `VSGD` optimizer. (#377, #378)
- Enable training with complex parameters. (#370, #380)
    - Raises `NoComplexParameterError` for unsupported optimizers, either due to their design or because support is not yet implemented.
- Support the `maximize` parameter (see the sketch after this list). (#370, #380)
    - `maximize`: maximize the objective with respect to the params, instead of minimizing.
- Implement the `copy_stochastic()` method. (#381)
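A minimal sketch of the `maximize` flag; `AdamP` and the learning rate are illustrative choices, and which optimizers accept the parameter depends on the installed version:

```python
import torch
from pytorch_optimizer import AdamP

param = torch.nn.Parameter(torch.randn(4))

# With maximize=True the optimizer performs gradient ascent on the objective.
optimizer = AdamP([param], lr=1e-1, maximize=True)

for _ in range(200):
    optimizer.zero_grad()
    reward = -(param ** 2).sum()  # objective to maximize, peaks at param == 0
    reward.backward()
    optimizer.step()

print(param.detach().norm())  # should approach 0 as the reward is maximized
```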
Update
- Support tensors with more than two dimensions for the `RACS` and `Alice` optimizers. (#380)
- Remove the auxiliary variants from the default parameters of the optimizers and change the names of the state and parameter. (#380)
    - `use_gc`, `adanorm`, `cautious`, `stable_adamw`, and `adam_debias` are affected.
    - You can still use these variants by passing the parameters through `**kwargs` (see the sketch after this list).
    - Notably, for the `adanorm` variant, you need to pass the `adanorm` parameter (and `adanorm_r` for the `r` option) to use this variant, and the name of the state changes from `exp_avg_norm` to `exp_avg_adanorm`.
- Refactor the `reset()` method into `init_group()` in the `BaseOptimizer` class. (#380)
- Refactor the `SAM` optimizer family. (#380)
- Gather the `AdamP` and `SGDP` utilities into `pytorch_optimizer.optimizer.adamp.*`. (#381)
    - `pytorch_optimizer.optimizer.sgdp.SGDP` to `pytorch_optimizer.optimizer.adamp.SGDP`
    - `pytorch_optimizer.optimizer.util.projection` to `pytorch_optimizer.optimizer.adamp.projection`
    - `pytorch_optimizer.optimizer.util.cosine_similarity_by_view` to `pytorch_optimizer.optimizer.adamp.cosine_similarity_by_view`
- Remove `channel_view()` and `layer_view()` from `pytorch_optimizer.optimizer.util`. (#381)
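A minimal sketch of re-enabling one of these variants after the change; `AdamP` and the `adanorm_r` value are illustrative assumptions:

```python
import torch
from pytorch_optimizer import AdamP

model = torch.nn.Linear(16, 1)

# `adanorm` is no longer a default parameter; pass it explicitly via kwargs,
# together with `adanorm_r` if you want to control the `r` option.
# The corresponding state entry is now named `exp_avg_adanorm` (was `exp_avg_norm`).
optimizer = AdamP(model.parameters(), lr=1e-3, adanorm=True, adanorm_r=0.95)
```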
Fix
- Fix shape mismatch issues in the GaLore projection for the `reverse_std`, `right`, and `full` projection types. (#376)
pytorch-optimizer v3.5.1
pytorch-optimizer v3.5.0
Change Log
Feature
- Support `StableSPAM` optimizer. (#358, #359)
- Support `ScheduleFreeWrapper`. (#334, #360)
- Implement `AdaGC` optimizer. (#364, #366)
- Implement `Simplified-AdEMAMix` optimizer. (#364, #366)
- Support the `Ackley` function for testing optimization algorithms.
Update
- Update the `Muon` optimizer. (#355, #356)
    - Support decoupled weight decay.
    - Adjust the default hyperparameters to match the original implementation.
    - Support the adjusted learning rate from Moonlight; you can use it by setting `use_adjusted_lr=True` (see the sketch after this list).
- Tune the coupled Newton iteration method for a 5% performance improvement. (#360)
- Update the `SCION` optimizer. (#361)
    - Add the `scale` parameter.
    - Update `get_lmo_direction`.
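A minimal sketch of the adjusted learning rate option; passing a plain parameter list and the `lr` value shown here are assumptions, so check the `Muon` documentation for the exact constructor arguments:

```python
import torch
from pytorch_optimizer import Muon

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Linear(32, 1))

# `use_adjusted_lr=True` enables the Moonlight-style learning-rate adjustment;
# the learning rate itself is only an example value.
optimizer = Muon(model.parameters(), lr=1e-2, use_adjusted_lr=True)
```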
Fix
pytorch-optimizer v3.4.2
Change Log
Feature
Update
- Update the ScheduleFree SGD, AdamW, and RAdam optimizers to the latest versions. (#351, #353)
- Remove the `use_palm` variant in the ScheduleFree optimizers due to instability. (#353)
- Update the `Ranger25` optimizer. (#353)
Fix
Docs
- Fix the `AliG` optimizer visualization. (#350)
Contributions
thanks to @AidinHamedi, @hatonosuke
pytorch-optimizer v3.4.1
Change Log
Feature
- Support `GCSAM` optimizer. (#343, #344)
    - Gradient Centralized Sharpness Aware Minimization
    - You can use it from the `SAM` optimizer by setting `use_gc=True` (see the sketch after this list).
- Support `LookSAM` optimizer. (#343, #344)
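A minimal sketch of enabling gradient centralization through `SAM`; the base optimizer, `rho`, and the `first_step`/`second_step` pattern follow the common SAM interface and are assumptions to verify against the library's SAM docs:

```python
import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(16, 1)
criterion = torch.nn.MSELoss()

# `use_gc=True` applies gradient centralization on top of SAM (i.e. GCSAM);
# the base optimizer class and rho are illustrative choices.
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=1e-2, rho=0.05, use_gc=True)

x, y = torch.randn(8, 16), torch.randn(8, 1)

# Assumed two-step SAM update: perturb the weights, then update at the perturbed point.
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)

criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)
```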
Update
- Support alternative-precision training for the `Shampoo` optimizer. (#339)
- Add more features to and tune the `Ranger25` optimizer. (#340)
    - `AGC` + `Lookahead` variants.
    - Change the default `beta1` and `beta2` to 0.95 and 0.98, respectively.
- Skip adding the `Lookahead` wrapper in `create_optimizer()` for `Ranger*` optimizers, which already include it. (#340)
- Improve the optimizer visualizations. (#345)
- Rename `pytorch_optimizer.optimizer.gc` to `pytorch_optimizer.optimizer.gradient_centralization` to avoid a possible conflict with the Python built-in module `gc`. (#349)
Bug
Docs
- Update the visualizations. (#340)
Contributions
thanks to @AidinHamedi