Releases · kozistr/pytorch_optimizer
pytorch-optimizer v3.8.2
Change Log
Feature
- Speed up `zeropower_via_newtonschulz` by up to 20% by utilizing the `torch.baddbmm` and `torch.addmm` ops (see the illustration below). (#448)
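As a rough illustration of the kind of fusion involved (not the library's actual implementation), `torch.addmm` folds a scaled add and a matrix multiply into a single call, and `torch.baddbmm` does the same for batched matrices:

```python
import torch

a, b = torch.randn(64, 64), torch.randn(64, 64)
bias = torch.randn(64, 64)

# Unfused: separate matmul and add materialize an intermediate tensor.
out_unfused = bias + 2.0 * (a @ b)

# Fused: torch.addmm computes beta * input + alpha * (mat1 @ mat2) in one op.
out_fused = torch.addmm(bias, a, b, beta=1.0, alpha=2.0)

assert torch.allclose(out_unfused, out_fused, atol=1e-5)

# For batched matrices of shape (B, n, m) x (B, m, p), torch.baddbmm is the analogue.
batch_a, batch_b = torch.randn(4, 8, 8), torch.randn(4, 8, 8)
batch_bias = torch.randn(4, 8, 8)
out_batched = torch.baddbmm(batch_bias, batch_a, batch_b)
```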
Update
- Refactor the type hints. (#448)
Fix
- Resolve a compatibility issue with lower PyTorch versions where `torch.optim.optimizer.ParamsT` could not be imported. (#448)
Docs
- Convert the docstring style from reST to google-style docstring. (#449)
pytorch-optimizer v3.8.1
Change Log
Feature
- Implement `FriendlySAM` optimizer. (#424, #434)
- Implement `AdaGO` optimizer. (#436, #437)
- Update `EXAdam` optimizer to the latest version. (#438)
- Update `EmoNavi` optimizer to the latest version. (#433, #439)
- Implement `Conda` optimizer. (#440, #441)
Update
Bug
- Fix a NaN problem when the gradient norm is zero in the `StableSPAM` optimizer. (#431)
Docs
- Update the documentation page. (#428)
Contribution
thanks to @liveck, @AhmedMostafa16
pytorch-optimizer v3.8.0
Change Log
Feature
- Implement `EmoNeco` and `EmoZeal` optimizers. (#407)
- Implement `Refined Schedule-Free AdamW` optimizer. (#409, #414)
    - Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
    - You can use this variant by setting the `decoupling_c` parameter in the `ScheduleFreeAdamW` optimizer (see the sketch after this list).
- Add more built-in optimizers: `NAdam`, `RMSProp`, and `LBFGS`. (#415)
- Support the `cautious` variant for the `Muon` optimizer. (#417)
- Separate the distributed functionality from `Muon` into the `DistributedMuon` optimizer. (#418)
- Implement `StochasticAccumulator`, which is a gradient hook. (#418)
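A minimal usage sketch for the refined variant. The learning rate, the `decoupling_c` value, and the `train()`/`eval()` calls follow the usual schedule-free pattern and are illustrative assumptions, not recommended settings:

```python
import torch
from pytorch_optimizer import ScheduleFreeAdamW

model = torch.nn.Linear(16, 1)

# Setting `decoupling_c` switches on the refined (decoupled) schedule-free behavior;
# 1.0 is only an example value.
optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-3, decoupling_c=1.0)

optimizer.train()  # schedule-free optimizers keep separate train/eval parameter views
for _ in range(10):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 16)).pow(2).mean()
    loss.backward()
    optimizer.step()
optimizer.eval()  # switch to the evaluation parameters before validation/inference
```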
Update
- Re-implement the `Muon` and `AdaMuon` optimizers based on the recent official implementation. (#408, #410)
    - Their definitions have changed from the previous version, so please check out the documentation!
- Update `__init__.py` with the missing optimizers. (#415)
- Add a HuggingFace Trainer example. (#415)
- Optimize the visualization outputs and change the visualization document to a table layout. (#416)
Dependency
- Update `mkdocs` dependencies. (#417)
CI
Contributions
thanks to @AidinHamedi
pytorch-optimizer v3.7.0
Change Log
Feature
- Implement `AdaMuon` optimizer. (#394, #395)
- Implement `SPlus` optimizer. (#396, #399)
- Implement `EmoNavi`, `EmoFact`, and `EmoLynx` optimizers. (#393, #400)
CI
Fix
- Adjust the value of `eps` to the fixed value `1e-15` when adding to `exp_avg_sq`. (#397, #398)
- Fix the built-in type hints in the `Kron` optimizer. (#404)
Contributions
Thanks to @sobolevn
pytorch-optimizer v3.6.1
Change Log
Feature
- Implement more cooldown types for the WSD learning rate scheduler. (#382, #386)
- Implement `AdamWSN` optimizer. (#387, #389)
- Implement `AdamC` optimizer. (#388, #390)
Update
- Change the default range of the `beta` parameter from `[0, 1]` to `[0, 1)`. (#392)
Fix
- Fix to use the momentum buffer instead of the gradient to calculate the LMO. (#385)
pytorch-optimizer v3.6.0
Change Log
Feature
- Implement `Fira` optimizer. (#376)
- Implement `RACS` and `Alice` optimizers. (#376)
- Implement `VSGD` optimizer. (#377, #378)
- Enable training with complex parameters. (#370, #380)
    - Raises `NoComplexParameterError` for unsupported optimizers, either due to their design or because support is not yet implemented.
- Support the `maximize` parameter (see the sketch after this list). (#370, #380)
    - `maximize`: maximize the objective with respect to the params, instead of minimizing.
- Implement the `copy_stochastic()` method. (#381)
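A minimal sketch of the `maximize` flag; `AdamP` and the learning rate are illustrative choices, and which optimizers accept the parameter depends on the installed version:

```python
import torch
from pytorch_optimizer import AdamP

param = torch.nn.Parameter(torch.randn(4))

# With maximize=True the optimizer performs gradient ascent on the objective.
optimizer = AdamP([param], lr=1e-1, maximize=True)

for _ in range(200):
    optimizer.zero_grad()
    reward = -(param ** 2).sum()  # objective to maximize, peaks at param == 0
    reward.backward()
    optimizer.step()

print(param.detach().norm())  # should approach 0 as the reward is maximized
```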
Update
- Support tensors with more than two dimensions for the `RACS` and `Alice` optimizers. (#380)
- Remove the auxiliary variants from the default parameters of the optimizers and change the names of the state and parameter. (#380)
    - `use_gc`, `adanorm`, `cautious`, `stable_adamw`, and `adam_debias` are affected.
    - You can still use these variants by passing the parameters through `**kwargs` (see the sketch after this list).
    - Notably, for the `adanorm` variant, you need to pass the `adanorm` parameter (and `adanorm_r` for the `r` option) to use this variant, and the name of the state changes from `exp_avg_norm` to `exp_avg_adanorm`.
- Refactor the `reset()` method into `init_group()` in the `BaseOptimizer` class. (#380)
- Refactor the `SAM` optimizer family. (#380)
- Gather the `AdamP` and `SGDP` utilities into `pytorch_optimizer.optimizer.adamp.*`. (#381)
    - `pytorch_optimizer.optimizer.sgdp.SGDP` to `pytorch_optimizer.optimizer.adamp.SGDP`
    - `pytorch_optimizer.optimizer.util.projection` to `pytorch_optimizer.optimizer.adamp.projection`
    - `pytorch_optimizer.optimizer.util.cosine_similarity_by_view` to `pytorch_optimizer.optimizer.adamp.cosine_similarity_by_view`
- Remove `channel_view()` and `layer_view()` from `pytorch_optimizer.optimizer.util`. (#381)
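A minimal sketch of re-enabling one of these variants after the change; `AdamP` and the `adanorm_r` value are illustrative assumptions:

```python
import torch
from pytorch_optimizer import AdamP

model = torch.nn.Linear(16, 1)

# `adanorm` is no longer a default parameter; pass it explicitly via kwargs,
# together with `adanorm_r` if you want to control the `r` option.
# The corresponding state entry is now named `exp_avg_adanorm` (was `exp_avg_norm`).
optimizer = AdamP(model.parameters(), lr=1e-3, adanorm=True, adanorm_r=0.95)
```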
Fix
- Fix shape mismatch issues in the GaLore projection for the `reverse_std`, `right`, and `full` projection types. (#376)
pytorch-optimizer v3.5.1
pytorch-optimizer v3.5.0
Change Log
Feature
- Support `StableSPAM` optimizer. (#358, #359)
- Support `ScheduleFreeWrapper`. (#334, #360)
- Implement `AdaGC` optimizer. (#364, #366)
- Implement `Simplified-AdEMAMix` optimizer. (#364, #366)
- Support the `Ackley` function for testing optimization algorithms.
Update
- Update the `Muon` optimizer. (#355, #356)
    - Support decoupled weight decay.
    - Adjust the default hyperparameters to match the original implementation.
    - Support the adjusted learning rate from Moonlight; you can use it by setting `use_adjusted_lr=True` (see the sketch after this list).
- Tune the coupled Newton iteration method for a 5% performance improvement. (#360)
- Update the `SCION` optimizer. (#361)
    - Add the `scale` parameter.
    - Update `get_lmo_direction`.
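A minimal sketch of the adjusted learning rate option; passing a plain parameter list and the `lr` value shown here are assumptions, so check the `Muon` documentation for the exact constructor arguments:

```python
import torch
from pytorch_optimizer import Muon

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Linear(32, 1))

# `use_adjusted_lr=True` enables the Moonlight-style learning-rate adjustment;
# the learning rate itself is only an example value.
optimizer = Muon(model.parameters(), lr=1e-2, use_adjusted_lr=True)
```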
Fix
pytorch-optimizer v3.4.2
Change Log
Feature
Update
- Update the ScheduleFree SGD, AdamW, and RAdam optimizers to the latest versions. (#351, #353)
- Remove the `use_palm` variant in the ScheduleFree optimizers due to instability. (#353)
- Update the `Ranger25` optimizer. (#353)
Fix
Docs
- Fix the `AliG` optimizer visualization. (#350)
Contributions
thanks to @AidinHamedi, @hatonosuke
pytorch-optimizer v3.4.1
Change Log
Feature
- Support `GCSAM` optimizer. (#343, #344)
    - Gradient Centralized Sharpness Aware Minimization
    - You can use it from the `SAM` optimizer by setting `use_gc=True` (see the sketch after this list).
- Support `LookSAM` optimizer. (#343, #344)
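A minimal sketch of enabling gradient centralization through `SAM`; the base optimizer, `rho`, and the `first_step`/`second_step` pattern follow the common SAM interface and are assumptions to verify against the library's SAM docs:

```python
import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(16, 1)
criterion = torch.nn.MSELoss()

# `use_gc=True` applies gradient centralization on top of SAM (i.e. GCSAM);
# the base optimizer class and rho are illustrative choices.
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=1e-2, rho=0.05, use_gc=True)

x, y = torch.randn(8, 16), torch.randn(8, 1)

# Assumed two-step SAM update: perturb the weights, then update at the perturbed point.
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)

criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)
```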
Update
- Support alternative-precision training for the `Shampoo` optimizer. (#339)
- Add more features to and tune the `Ranger25` optimizer. (#340)
    - `AGC` + `Lookahead` variants.
    - Change the default `beta1` and `beta2` to 0.95 and 0.98, respectively.
- Skip adding the `Lookahead` wrapper in `create_optimizer()` for `Ranger*` optimizers, which already include it. (#340)
- Improve the optimizer visualizations. (#345)
- Rename `pytorch_optimizer.optimizer.gc` to `pytorch_optimizer.optimizer.gradient_centralization` to avoid a possible conflict with the Python built-in module `gc`. (#349)
Bug
Docs
- Update the visualizations. (#340)
Contributions
thanks to @AidinHamedi