1.1.0: Standardization, reporting mode, extension of visualizations, and many enhancements
This major release introduces substantial updates to further standardize and extend this software for the evaluation of auto-tuning algorithms, in particular through the following:
- Visualization and reporting of results are modular; scores can be obtained programmatically using `report_experiments` (see the first sketch after this list).
- New `benchmark_hub` repository and submodule hosting brute-forced benchmarking resources.
- Extended and improved visualizations:
- Heatmaps for comparison per search space and/or over time
- Head-to-head plots for direct comparisons of practical impact (beta)
- New color handling based on existing color maps
- Improved Experiments Schema:
- Strategy grouping and coloring
- Extended visualization settings such as visual minima and maxima
- Improvements to consistency:
- Improved and expanded tests
- File handling modes explicitly define encodings for consistent behavior
- Ensuring correct aggregation data is used
- Further improvements and changes:
- Switched to NumPy 2.0
- Added support for Python 3.12 and 3.13; dropped support for Python 3.9
- Auto-retry on missing data, smart cutoff handling, better error messages
- Execution Engine Enhancements:
- Time-based cutoffs and runtime conversion (see the runtime-conversion sketch after this list)
- Flexible support for optimization direction, cutoffs, objectives, and valid result thresholds
- Reformatted code with Ruff, improved test coverage
- Implemented a new baseline strategy execution framework (`ExecutedStrategyBaseline`)
- Implemented automatic hiding of strategies used as an executed baseline
- Updated other plot types to respect the `hide` key for strategies
- Introduced a `get_colors` function for strategy grouping and improved color assignment (see the color-assignment sketch after this list)
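As an illustration of the programmatic reporting mode, the minimal sketch below assumes that `report_experiments` takes the path to an experiments file and returns the computed scores per strategy; the import path, argument, and return type are assumptions rather than the confirmed API.

```python
# Minimal sketch of programmatic score retrieval; the module path, argument,
# and return type of `report_experiments` are assumptions.
from autotuning_methodology.report_experiments import report_experiments  # assumed import path

# assumed to take an experiments file and return a mapping of strategy name -> score
scores = report_experiments("experiments/example_experiment.json")
for strategy, score in scores.items():
    print(f"{strategy}: {score:.3f}")
```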
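The new color handling and strategy grouping can be pictured with a short, generic sketch: strategies in the same group receive shades of one existing Matplotlib color map. This only illustrates the idea and is not the package's actual `get_colors` implementation.

```python
# Generic sketch of grouping-based color assignment using existing Matplotlib
# color maps; illustrative only, not the package's actual `get_colors` code.
from matplotlib import colormaps

def get_colors_sketch(strategy_groups: list[list[str]]) -> dict[str, tuple]:
    """Give each strategy a shade of its group's color map."""
    cmap_names = ["Blues", "Greens", "Oranges", "Purples", "Reds"]
    colors: dict[str, tuple] = {}
    for group_index, group in enumerate(strategy_groups):
        cmap = colormaps[cmap_names[group_index % len(cmap_names)]]
        for strategy_index, strategy in enumerate(group):
            # sample away from the near-white end of the map so lines stay visible
            fraction = 0.4 + 0.6 * strategy_index / max(len(group) - 1, 1)
            colors[strategy] = cmap(fraction)
    return colors

# example: two groups of related strategies (hypothetical names)
print(get_colors_sketch([["GA small", "GA large"], ["SA fast", "SA slow"]]))
```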
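Similarly, the time-based cutoffs and runtime conversion in the execution engine amount to translating a wall-clock budget into a number of function evaluations (and back). The helper below is a generic sketch of that conversion, assuming the median observed runtime per evaluation is representative; it is not the engine's actual implementation.

```python
# Generic sketch of runtime conversion for time-based cutoffs; not the
# execution engine's actual implementation.
import numpy as np

def evaluations_for_time_budget(observed_runtimes_s: list[float], time_budget_s: float) -> int:
    """Convert a wall-clock budget into an approximate evaluation budget,
    assuming the median runtime per evaluation is representative."""
    median_runtime = float(np.median(observed_runtimes_s))
    return max(1, round(time_budget_s / median_runtime))

# example: at ~0.2 s per evaluation, a 60 s cutoff corresponds to roughly 300 evaluations
print(evaluations_for_time_budget([0.18, 0.21, 0.20, 0.19, 0.22], 60.0))
```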
In addition, there are various bug fixes and minor improvements.
For more details, refer to the full changelog: 1.0.0...1.1.0