dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Sep 8, 2025

Bumps nncf from 2.6.0 to 2.18.0.

Release notes

Sourced from nncf's releases.

v2.18.0

Post-training Quantization:

  • Features:
    • (OpenVINO) Introduced new compression data types CB4_F8E4M3 and CODEBOOK. CB4_F8E4M3 is a fixed codebook with 16 fp8 values based on NF4 data type values. CODEBOOK is an arbitrary user-selectable codebook that can be used to experiment with different data types. Both data types are used for weight compression. The AWQ and scale estimation algorithms are supported for these data types.
    • (OpenVINO) Added support for compressing FP8 (f8e4m3 and f8e5m2) weights to 4-bit data types, which is particularly beneficial for models like DeepSeek-R1.
    • Added group_size_fallback_mode parameter for advanced weight compression. It controls how nodes that do not support the default group size are handled. By default (IGNORE), such nodes are skipped. With ERROR, an exception is raised if the channel size is not divisible by the group size, while ADJUST attempts to modify the group size so it becomes valid.
    • (TorchFX) Added support for external quantizers in the quantize_pt2e API, including XNNPACKQuantizer and CoreMLQuantizer. Users can now quantize their models in ExecuTorch for the XNNPACK and CoreML backends via the NNCF quantize_pt2e API, employing smooth quant, the bias correction algorithm, and a wide range of statistic collectors.
    • (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. Provided an example demonstrating the data-aware weight compression pipeline using the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model in ONNX format.
  • Improvements:
    • Added support for weight compression for models with the Rotary Positional Embedding block.
    • Added support for weight compression for models with stateful self-attention blocks.
  • Tutorials:
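
Conceptually, the codebook compression described above stores, for each weight, only the index of the nearest entry in a small table of representative values (16 entries for a 4-bit codebook such as CB4_F8E4M3). A minimal standalone sketch of that lookup in plain Python, not NNCF's actual implementation (the helper names and the 5-entry codebook are illustrative only):

```python
def quantize_to_codebook(values, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: abs(v - codebook[i]))
            for v in values]

def dequantize(indices, codebook):
    """Recover the representative values from the stored indices."""
    return [codebook[i] for i in indices]

# Illustrative 5-entry codebook; a real CB4_F8E4M3 codebook holds 16 fp8 values.
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
indices = quantize_to_codebook([0.9, -0.6, 0.1], codebook)   # [4, 1, 2]
restored = dequantize(indices, codebook)                     # [1.0, -0.5, 0.0]
```

An arbitrary user-selectable CODEBOOK corresponds to swapping in a different `codebook` table; the stored indices stay the same width as long as the table size does.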

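The group_size_fallback_mode semantics described above can be sketched in a few lines of plain Python. This mirrors only the documented behavior; resolve_group_size and GroupSizeFallbackError are hypothetical names, not NNCF API:

```python
class GroupSizeFallbackError(ValueError):
    """Raised in ERROR mode when the channel size is not divisible by the group size."""

def resolve_group_size(channel_size: int, group_size: int, mode: str = "IGNORE"):
    """Illustrative semantics of group_size_fallback_mode.

    Returns a usable group size, or None when the node should be skipped.
    """
    if channel_size % group_size == 0:
        return group_size  # the default group size already fits
    if mode == "IGNORE":
        return None  # skip the node, leaving its weights uncompressed
    if mode == "ERROR":
        raise GroupSizeFallbackError(
            f"channel size {channel_size} is not divisible by group size {group_size}"
        )
    if mode == "ADJUST":
        # Pick the largest divisor of channel_size not exceeding the requested size.
        for candidate in range(min(group_size, channel_size), 0, -1):
            if channel_size % candidate == 0:
                return candidate
    raise ValueError(f"unknown mode: {mode}")
```

For example, a channel size of 96 with a requested group size of 64 is skipped under IGNORE, raises under ERROR, and is adjusted down to 48 under ADJUST.
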
Compression-aware training:

  • Features:
    • (PyTorch) Enhanced initialization for "QAT with absorbable LoRA" using advanced compression methods (AWQ + Scale Estimation). This improvement replaces the previous basic data-free compression approach, enabling QAT to start with a more accurate model baseline and achieve superior final accuracy.
  • Improvements:
    • (PyTorch) Streamlined "QAT with absorbable LoRA" by removing checkpoint selection based on the validation set. This change significantly reduces overall tuning time and maximum allocated memory. While results on Wikitext are slightly worse, it provides a faster, more efficient tuning pipeline (e.g. tuning time reduced from 32 to 25 minutes for SmolLM-1.7B).
  • Tutorials:
    • (TorchFX) Added an example for compression of TinyLlama-1.1B.
    • Updated the example to match the NPU implementation.
    • Implemented fast evaluation and improved output in the example.

Deprecations/Removals:

  • Removed examples that used the create_compressed_model API.

Requirements:

  • Updated PyTorch (2.8.0) and Torchvision (0.23.0) versions.
  • Require setuptools>=77 to build the package.

Acknowledgements

Thanks to contributors from the OpenVINO developer community: @bopeng1234, @jpablomch

v2.17.0

Post-training Quantization:

  • General:
    • (PyTorch) The function_hook module is now the default mechanism for model tracing. It has graduated from experimental status and moved into the core nncf.torch namespace.
  • Features:
    • (OpenVINO, PyTorch, TorchFX) Added 4-bit data-free AWQ (Activation-aware Weight Quantization) based on the per-column magnitudes of the weights, making it possible to apply AWQ without a dataset for more accurate compression.
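
The data-free variant replaces activation statistics with per-column weight magnitudes. A rough, self-contained illustration of that idea in plain Python (the helper names, the alpha parameter, and the geometric-mean normalization are assumptions for illustration, not NNCF's actual algorithm):

```python
import math

def column_magnitudes(weight):
    """Mean absolute magnitude of each column of a [rows][cols] weight matrix.

    In data-free AWQ, these stand in for the activation statistics that
    would otherwise require a calibration dataset.
    """
    rows = len(weight)
    return [sum(abs(row[c]) for row in weight) / rows
            for c in range(len(weight[0]))]

def awq_like_scales(weight, alpha=0.5):
    """Hypothetical per-column equalization scales: magnitude ** alpha,
    normalized to geometric mean 1 so the overall weight scale is preserved."""
    raw = [max(m, 1e-8) ** alpha for m in column_magnitudes(weight)]
    gmean = math.exp(sum(math.log(s) for s in raw) / len(raw))
    return [s / gmean for s in raw]

# Columns with larger magnitudes get scales above 1, smaller ones below 1.
scales = awq_like_scales([[1.0, 2.0], [3.0, 6.0]])
```

In a real AWQ pipeline the weights would be divided by these scales before 4-bit quantization and the inverse folded into the preceding layer; the sketch only shows where the scales come from when no dataset is available.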

... (truncated)

Changelog

Sourced from nncf's changelog.


New in Release 2.17.0

Post-training Quantization:

  • General:
    • (PyTorch) The function_hook module is now the default mechanism for model tracing. It has graduated from experimental status and moved into the core nncf.torch namespace.
  • Features:
    • (OpenVINO, PyTorch, TorchFX) Added 4-bit data-free AWQ (Activation-aware Weight Quantization) based on the per-column magnitudes of the weights, making it possible to apply AWQ without a dataset for more accurate compression.
    • (OpenVINO) Added support for quantizing the value input of ScaledDotProductAttention to FP8.
    • (ONNX) Added support for data-free weight compression using INT4 (INT8) in the ONNX backend. Added an example for LLM weight compression in the ONNX backend. This example showcases the optimization of the TinyLlama-1.1B-Chat-v0.3 model in ONNX format using the NNCF weight compression API.
    • (ONNX) Added the BackendParameters.EXTERNAL_DATA_DIR parameter for the ONNX backend. It specifies the absolute path to the directory where the model's external data files are stored; all external data files must be located in that one directory. Use it when the model is loaded without external data via onnx.load("model.onnx", load_external_data=False) and the external data files are not in the process's current working directory; it can be omitted when they are.

... (truncated)

Commits
  • fec6246 Release v2.18.0 of NNCF to master
  • b34d24e Release v2.17.0 of NNCF to master
  • 7b04b6e Release v2.16.0 of NNCF to master
  • 43c3c6e Release v2.15.0 of NNCF to master
  • 5bfbed5 Release v2.14.1 of NNCF to master
  • 66f63a4 Release v2.14.0 of NNCF to master
  • e0939f6 Release v2.13.0 of NNCF to master
  • af5e8ff Release v2.12.0 of NNCF to master
  • 0336e85 Release v2.11.0 of NNCF to master
  • 526f5ff Release v2.10.0 of NNCF to master
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [nncf](https://github.com/openvinotoolkit/nncf) from 2.6.0 to 2.18.0.
- [Release notes](https://github.com/openvinotoolkit/nncf/releases)
- [Changelog](https://github.com/openvinotoolkit/nncf/blob/v2.18.0/ReleaseNotes.md)
- [Commits](openvinotoolkit/nncf@v2.6.0...v2.18.0)

---
updated-dependencies:
- dependency-name: nncf
  dependency-version: 2.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the dependencies and python labels Sep 8, 2025