Conv:TF32: add more instances - 2 #2879
base: develop
Conversation
I will take a look at this today or on Monday.
Pull Request Overview
This PR adds new instances with TF32 compute type for various grouped convolution operations to support TensorFloat-32 precision on hardware that supports it. TF32 provides better performance than F32 while maintaining similar numerical accuracy.
- Adds TF32 compute type variants for grouped convolution forward operations across multiple dimensions (1D, 2D, 3D)
- Extends existing F32 instances with TF32 compute type for various specializations (scale, clamp, bias operations)
- Updates CMakeLists and header files to include the new TF32 instance files
Reviewed Changes
Copilot reviewed 143 out of 143 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| CMakeLists.txt files | Add new TF32 instance files to build configuration |
| Header files (*.inc, *.hpp) | Add function declarations for TF32 instances |
| Instance files (*.cpp) | Implement TF32 variants of grouped convolution operations |
library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scale.hpp
Outdated
This PR is based on PR-2867; it will be converted to a normal PR after 2867 is merged.
Pull Request Overview
Copilot reviewed 61 out of 62 changed files in this pull request and generated 8 comments.
```cpp
TF32>>>& instances);

void add_device_grouped_conv3d_bwd_weight_xdl_ndhwgc_gkzyxc_ndhwgk_f32_pad0_pipev2_instances(
void add_device_grouped_conv3d_bwd_weight_xdl_ndhwgc_gkzyxc_ndhwgk_f32_pad0_pipev5_instances(
```
Duplicate function declaration. This function is already declared at lines 651-653; the duplicate declaration should be removed.
This check appears to be wrong, because:
- This is not new code introduced in this PR.
- There is no duplicated function in this file.
Hi, I stopped reviewing at this point because there are a lot of changes that are not needed, so please:
- Remove non-grouped conv code
- Don't add other layouts than NHWGC
```diff
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
```
Suggested change:
```diff
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.
```
This file will be removed.
```cpp
ck::tensor_operation::device::ConvolutionBackwardDataSpecialization::Default;

template <ck::index_t NDimSpatial>
using DeviceConvNdBwdDataInstance = ck::tensor_operation::device::DeviceConvNdBwdDataNwcKxcNwk_Xdl<
```
Please don't add non-grouped conv examples.
OK, will remove those tests.
```cpp
ck::tensor_operation::device::ConvolutionBackwardDataSpecialization::Default;

template <ck::index_t NDimSpatial>
using DeviceConvNdBwdDataInstance = ck::tensor_operation::device::DeviceConvNdBwdDataNwcKxcNwk_Xdl<
```
Please don't add non-grouped conv examples.
Same as above.
```cpp
double max_err = std::numeric_limits<double>::min();
for(std::size_t i = 0; i < ref.size(); ++i)
{
    const double o = *std::next(std::begin(out), i);
```
Why do you need the compute data type here?
Because TF32 compute needs a different threshold. Additional information is needed to distinguish between different compute data types that have the same in/out data types.
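A hypothetical sketch of what such compute-type-dependent threshold selection could look like; the tag type and tolerance values below are illustrative, not CK's actual ones.

```cpp
// Hypothetical sketch: pick a verification tolerance from the compute type.
// TF32's 10-bit mantissa warrants a looser error bound than full F32
// compute. Tag type and values are illustrative, not CK's actual ones.
#include <type_traits>

struct TF32 {}; // stand-in tag for the TF32 compute data type

template <typename ComputeType>
constexpr double error_threshold()
{
    if constexpr(std::is_same_v<ComputeType, TF32>)
        return 1e-3; // looser bound for reduced-precision compute
    else
        return 1e-5; // tighter bound for full F32 compute
}
```

With identical in/out data types (float in, float out), only the compute type distinguishes the two cases, which is why it has to be threaded through to the verification routine.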
```diff
     typename InElementwiseOperation,
     typename WeiElementwiseOperation,
-    typename OutElementwiseOperation>
+    typename OutElementwiseOperation,
```
Please don't add non-grouped conv code.
ok.
```cpp
typename BLayout,
typename DsLayout,
typename ELayout,
ConvolutionBackwardDataSpecialization ConvSpec>
```
Please don't extend NGCHW instances for now.
ok
```cpp
    "Error: this operator requires the same compute type");
if constexpr(is_same_v<ComputeTypeA, TF32>)
{
    add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
```
Please don't add other instances than NHWGC for now.
ok
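The `if constexpr` dispatch pattern discussed in this thread can be sketched in isolation as follows; the function and type names here are stand-ins, not CK's actual identifiers.

```cpp
// Hypothetical, self-contained sketch of the compile-time dispatch pattern:
// which instance-adding functions run is decided by the compute type
// template parameters. All names below are stand-ins, not CK's actual ones.
#include <string>
#include <type_traits>
#include <vector>

struct TF32 {}; // stand-in tag for the TF32 compute type

// Stand-ins for the real per-precision instance-adding functions.
void add_f32_instances(std::vector<std::string>& out)  { out.push_back("f32"); }
void add_tf32_instances(std::vector<std::string>& out) { out.push_back("f32_tf32"); }

template <typename ComputeTypeA, typename ComputeTypeB>
void add_instances(std::vector<std::string>& out)
{
    static_assert(std::is_same_v<ComputeTypeA, ComputeTypeB>,
                  "Error: this operator requires the same compute type");
    if constexpr(std::is_same_v<ComputeTypeA, TF32>)
        add_tf32_instances(out); // TF32 variants selected at compile time
    else if constexpr(std::is_same_v<ComputeTypeA, float>)
        add_f32_instances(out);
}
```

Because the branch is `if constexpr`, only the selected call is instantiated, so adding TF32 variants does not affect code generated for plain F32 users.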
```cpp
void add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
    std::vector<std::unique_ptr<DeviceGroupedConvBwdDataMultipleD<2,
                                                                  GNHWK,
                                                                  GKYXC,
```
Please don't add other instances than NHWGC for now.
ok
```cpp
static_assert(is_same_v<ComputeTypeA, ComputeTypeB>,
              "Error: ComputeTypeA and ComputeTypeB should be the same");
if constexpr(is_same_v<ComputeTypeA, float>)
{
```
Please don't add other instances than NHWGC for now.
ok.
```cmake
# ONLY XDL_KERNELS
set(DEVICE_CONV2D_FWD_INSTANCES)
list(APPEND DEVICE_CONV2D_FWD_INSTANCES device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
                                        device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_tf32_instance.cpp
```
Please don't add non-grouped conv code.
ok
Hi @bartekxk, so we only need the nhwc/nhwgc/ndhwgc layouts now?
Proposed changes
Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.
Add instances of the base classes below:
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- [ ] Ran `clang-format` on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.