
Releases: KernelTuner/kernel_tuner

Version 1.0.0b3

12 Oct 13:02


Pre-release

This is a beta release for early access to the new features. Not intended for production use.

This version contains several bugfixes:

  • Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
  • Fixed an issue where some restrictions would not be recognized by the old check_restrictions function.
  • Fixed an issue where bayes_opt would not handle pruned parameters correctly.

Full Changelog: 1.0.0b2...1.0.0b3

Version 1.0.0b2

11 Oct 16:37


Pre-release

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: 1.0.0b1...1.0.0b2

Version 1.0.0 beta 1

11 Oct 07:03


Pre-release

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: 0.4.5...1.0.0b1

Version 0.4.5

01 Jun 20:11


Version 0.4.5 adds support for using PMT in combination with Kernel Tuner, enabling power and energy measurements on a wide range of devices. In addition, we have worked extensively on the internals of Kernel Tuner and on the interfaces of the separate components that together make up Kernel Tuner. The release also includes a few bugfixes and corrections of small errors in the examples and documentation. A usage sketch of the new observer follows the changelog below.

[0.4.5] - 2023-06-01

Added

  • PMTObserver to measure power and energy on various platforms

Changed

  • Improved functionality for storing output and metadata files
  • Updated PowerSensorObserver to support PowerSensor3
  • Refactored internal interfaces of runners and backends
  • Bugfix in interface to set objective and optimization direction
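
Below is a minimal sketch of how the new observer could be attached to a tuning run. The import path and the PMTObserver constructor argument (which PMT sensor to read) are assumptions and may differ between Kernel Tuner and PMT versions; the rest follows the usual vector_add example.

```python
import numpy as np
import kernel_tuner
# assumed import path; check where PMTObserver lives in your installed version
from kernel_tuner.observers.pmt import PMTObserver

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
args = [c, a, b, np.int32(size)]
tune_params = {"block_size_x": [64, 128, 256, 512, 1024]}

# hypothetical sensor selection: which PMT platform to read is an assumption
pmt_observer = PMTObserver("nvidia")

# power/energy readings from the observer are added to each configuration's results
results, env = kernel_tuner.tune_kernel("vector_add", kernel_string, size, args,
                                        tune_params, observers=[pmt_observer])
```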

Version 0.4.4

09 Mar 11:21


Version 0.4.4 adds extended support for energy efficiency tuning, in particular the new capability to fit a performance model to the target GPU's power-frequency curve. How to use these features is demonstrated in:
https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/going_green_performance_model.py

And described in the paper:

Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022
https://arxiv.org/abs/2211.07260

Other than that, we've implemented a new output and metadata JSON format that adheres to the 'T4' auto-tuning schema created by the auto-tuning community at the Lorentz Center workshop in March 2022. A sketch of writing these files follows the changelog below.

From the changelog:

[0.4.4] - 2023-03-09

Added

  • Support for using time_limit in simulation mode
  • Helper functions for energy tuning
  • Example to show ridge frequency and power-frequency model
  • Functions to store tuning output and metadata

Changed

  • Changed what timings are stored in cache files
  • No longer inserting partial loop unrolling factor of 0 in CUDA
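
As a sketch, the new output and metadata files could be written after a tuning run as shown below; the module and function names (store_output_file and store_metadata_file in kernel_tuner.file_utils) and their signatures are assumptions that may differ in your installed version.

```python
import numpy as np
import kernel_tuner
# assumed module and function names for the T4 output helpers
from kernel_tuner.file_utils import store_output_file, store_metadata_file

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
args = [c, a, b, np.int32(size)]
tune_params = {"block_size_x": [64, 128, 256, 512, 1024]}

results, env = kernel_tuner.tune_kernel("vector_add", kernel_string, size, args, tune_params)

# write the benchmark results and run metadata as JSON files following the T4 schema
store_output_file("vector_add_output.json", results, tune_params)
store_metadata_file("vector_add_metadata.json")
```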

Version 0.4.3

19 Oct 15:45


The version 0.4.3 release consists of a large number of changes to the internals of Kernel Tuner, including a new backend based on Nvidia's official Python bindings for CUDA, as well as improved functionality for tuning energy efficiency: core voltages can now be measured, and both the power measurements and the interface with NVML have been improved considerably.

Some of the changes are also to the "externals" of Kernel Tuner, in the sense that we have migrated from https://github.com/benvanwerkhoven/ to https://github.com/KernelTuner. The goal of this move is to bring the collection of repositories belonging to the larger Kernel Tuner project under one organization. An illustration of the NVML-based measurements follows the changelog below.

From the Changelog:

[0.4.3] - 2022-10-19

Added

  • A new backend that uses Nvidia cuda-python
  • Support for locked clocks in NVMLObserver
  • Support for measuring core voltages using NVML
  • Support for custom preprocessor definitions
  • Support for boolean scalar arguments in PyCUDA backend

Changed

  • Migrated from github.com/benvanwerkhoven to github.com/KernelTuner
  • Significant update to the documentation pages
  • Unified benchmarking loops across backends
  • Backends are no longer context managers
  • Replaced the method for measuring power consumption using NVML
  • Improved NVML measurements of temperature and clock frequencies
  • Bugfix in parse_restrictions when using and/or in expressions
  • Bugfix in GreedyILS when using neighbor method "adjacent"
  • Bugfix in Bayesian Optimization for small problems
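
As an illustration of the improved NVML support, the sketch below attaches an NVMLObserver to a tuning run. The import path and the exact observable names (for energy, clock frequency, temperature, and core voltage) are assumptions; consult the NVMLObserver documentation of your installed version.

```python
import numpy as np
import kernel_tuner
# assumed import path for this release; later versions may move the observer
from kernel_tuner.nvml import NVMLObserver

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
args = [c, a, b, np.int32(size)]
tune_params = {"block_size_x": [64, 128, 256, 512, 1024]}

# observable names are assumptions; the observer adds the requested quantities
# (energy, clock frequencies, temperature, ...) to every benchmarked configuration
nvml_observer = NVMLObserver(observables=["nvml_energy", "core_freq", "temperature"])

results, env = kernel_tuner.tune_kernel("vector_add", kernel_string, size, args,
                                        tune_params, observers=[nvml_observer])
```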

Version 0.4.2

23 May 14:59


Version 0.4.2 includes a lot of work on the search space representation, the application of restrictions, and the optimization strategies. Besides several new optimization strategies, most existing strategies should see improved performance, both in the number of evaluated kernel configurations and in execution time. A sketch of the new strategy and objective options follows the changelog below.

Added

  • new optimization strategies: dual annealing, greedy ILS, ordered greedy MLS, greedy MLS
  • support for constant memory in cupy backend
  • constraint solver to cut down time spent in creating search spaces
  • support for custom tuning objectives
  • support for max_fevals and time_limit in strategy_options of all strategies

Removed

  • alternative Bayesian Optimization strategies that could not be used directly
  • C++ wrapper module that was too specific and hardly used

Changed

  • string-based restrictions are compiled into functions for improved performance
  • genetic algorithm, MLS, ILS, random, and simulated annealing use new search space object
  • diff evo, firefly, and PSO are initialized using a population of all valid configurations
  • all strategies except brute_force strictly adhere to max_fevals and time_limit
  • simulated annealing adapts annealing schedule to max_fevals if supplied
  • minimize, basinhopping, and dual annealing start from a random valid config
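
The sketch below combines several of these features: a string-based restriction, a custom tuning objective derived from a user-defined metric, and a strategy with max_fevals and time_limit budgets. The strategy name string "greedy_ils" is an assumption, and the objective and objective_higher_is_better parameter names follow recent versions of the API.

```python
from collections import OrderedDict
import numpy as np
import kernel_tuner

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
args = [c, a, b, np.int32(size)]
tune_params = {"block_size_x": [32, 64, 128, 256, 512, 1024]}

# string-based restrictions are compiled into functions internally in this release
restrictions = ["block_size_x >= 64"]

# derived throughput metric used here as a custom tuning objective
metrics = OrderedDict()
metrics["GB/s"] = lambda p: (3 * size * 4 / 1e9) / (p["time"] / 1000.0)

results, env = kernel_tuner.tune_kernel(
    "vector_add", kernel_string, size, args, tune_params,
    restrictions=restrictions, metrics=metrics,
    objective="GB/s", objective_higher_is_better=True,
    strategy="greedy_ils",                                  # strategy name string is an assumption
    strategy_options={"max_fevals": 50, "time_limit": 60},  # budgets: evaluations and seconds
)
```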

Version 0.4.1

10 Sep 12:50


This version adds a brand-new Bayesian Optimization strategy, as well as some smaller features and fixes. A usage sketch follows the changelog below.

[0.4.1] - 2021-09-10

Added

  • support for PyTorch Tensors as input data type for kernels
  • support for smem_args in run_kernel
  • support for (lambda) function and string for dynamic shared memory size
  • a new Bayesian Optimization strategy

Changed

  • optionally store the kernel_string with store_results
  • improved reporting of skipped configurations
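
A sketch of the dynamic shared memory support together with the new Bayesian Optimization strategy, using a small block-sum kernel. The layout of smem_args (a dict with a "size" entry that may be a lambda over the tunable parameters) is an assumption.

```python
import numpy as np
import kernel_tuner

kernel_string = """
__global__ void block_sum(float *out, float *in, int n) {
    extern __shared__ float sh[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    sh[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) sh[threadIdx.x] += sh[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = sh[0];
}
"""

size = 1_000_000
inp = np.random.randn(size).astype(np.float32)
out = np.zeros(size, dtype=np.float32)   # large enough for any number of blocks
args = [out, inp, np.int32(size)]
tune_params = {"block_size_x": [64, 128, 256, 512, 1024]}

# dynamic shared memory sized from the tunable parameters; exact dict layout is an assumption
smem_args = {"size": lambda p: p["block_size_x"] * np.dtype(np.float32).itemsize}

results, env = kernel_tuner.tune_kernel(
    "block_sum", kernel_string, size, args, tune_params,
    smem_args=smem_args,
    strategy="bayes_opt",    # the new Bayesian Optimization strategy
)
```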

Version 0.4.0

09 Apr 11:50


This version adds a great deal of new functionality, giving the user extra flexibility and control over what is being benchmarked and when. A usage sketch follows the changelog below. From the CHANGELOG:

Added

  • support for (lambda) function instead of list of strings for restrictions
  • support for (lambda) function instead of list for specifying grid divisors
  • support for (lambda) function instead of tuple for specifying problem_size
  • function to store the top tuning results
  • function to create header file with device targets from stored results
  • support for using tuning results in PythonKernel
  • option to control measurements using observers
  • support for NVML tunable parameters
  • option to simulate auto-tuning searches from existing cache files
  • Cupy backend to support C++ templated CUDA kernels
  • support for templated CUDA kernels using PyCUDA backend
  • documentation on tunable parameter vocabulary
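
Two of these additions combined in one sketch: a restriction given as a (lambda) function instead of a list of strings, and a search simulated from an existing cache file. The cache file name is hypothetical, and whether restrictions accepts a single callable or a list of callables may depend on the version.

```python
import numpy as np
import kernel_tuner

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
args = [c, a, b, np.int32(size)]
tune_params = {"block_size_x": [32, 64, 128, 256, 512, 1024]}

# restriction as a callable over the parameter dictionary instead of a string
restrictions = lambda p: p["block_size_x"] % 64 == 0

results, env = kernel_tuner.tune_kernel(
    "vector_add", kernel_string, size, args, tune_params,
    restrictions=restrictions,
    cache="vector_add_cache.json",  # hypothetical cache file from an earlier run
    simulation_mode=True,           # replay the search from the cache without touching the GPU
)
```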

Version 0.3.2

04 Nov 19:56


This version adds several new and recent features. Most important is the new ability to specify user-defined metrics for Kernel Tuner to compute along with the benchmarking results. User-defined metrics are composable, so you can define metrics that build upon other metrics; a usage sketch follows the changelog below. The documentation pages have also been updated to include this new feature and other recent changes.

An important change that might influence benchmark results reported by Kernel Tuner is that the runner now warms up the device using the first kernel in the parameter space. This removes any startup or cold-start delays that were significantly slowing down the first benchmarked kernel on many devices.

From the changelog:

[0.3.2] - 2020-11-04

Added

  • support loop unrolling using params that start with loop_unroll_factor
  • always insert "#define kernel_tuner 1" to allow preprocessor #ifdef kernel_tuner
  • support for user-defined metrics
  • support for choosing the optimization starting point x0 for most strategies

Changed

  • more compact output is printed to the terminal
  • sequential runner runs first kernel in the parameter space to warm up device
  • updated tutorials to demonstrate use of user-defined metrics
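
A sketch of composable user-defined metrics: each metric is a lambda over the results record of one configuration (with "time" in milliseconds), and later metrics can refer to earlier ones; an OrderedDict is used so the evaluation order is well defined.

```python
from collections import OrderedDict
import numpy as np
import kernel_tuner

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
args = [c, a, b, np.int32(size)]
tune_params = {"block_size_x": [64, 128, 256, 512, 1024]}

metrics = OrderedDict()
# bytes moved per kernel launch divided by the measured time (ms converted to s)
metrics["GB/s"] = lambda p: (3 * size * 4 / 1e9) / (p["time"] / 1000.0)
# composable: this metric builds on the one defined above
metrics["GiB/s"] = lambda p: p["GB/s"] / 1.073741824

results, env = kernel_tuner.tune_kernel("vector_add", kernel_string, size, args,
                                        tune_params, metrics=metrics)
```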