
Conversation

yingluAMD
Contributor

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Add instances of the base classes below (see the sketch after this list):

  • DeviceConvFwd
  • DeviceGroupedConvBwdDataMultipleD
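
For orientation, a hedged sketch of the kind of "populate" declaration these instance files add, reusing the naming that appears later in this PR; the elided template parameters are placeholders, not the exact signature:

  void add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
      std::vector<std::unique_ptr<DeviceGroupedConvBwdDataMultipleD<
          2,     // NDimSpatial
          GNHWK, // output layout
          GKYXC, // weight layout
          /* remaining layout, data-type and elementwise-op parameters elided */
          TF32>>>& // TF32 compute type
          instances);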

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, if the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers to understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@bartekxk
Contributor

I will take a look at this today or on Monday.

@bartekxk bartekxk requested a review from Copilot September 19, 2025 14:33
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds new instances with TF32 compute type for various grouped convolution operations to support TensorFloat-32 precision on hardware that supports it. TF32 provides better performance than F32 while maintaining similar numerical accuracy.

  • Adds TF32 compute type variants for grouped convolution forward operations across multiple dimensions (1D, 2D, 3D)
  • Extends existing F32 instances with TF32 compute type for various specializations (scale, clamp, bias operations); see the sketch after this list
  • Updates CMakeLists and header files to include the new TF32 instance files
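
(Illustrative only: a hedged sketch of how the existing F32 dispatch is extended with a TF32 branch, assembled from the fragments quoted in the review threads below; the non-TF32 function name is assumed by analogy.)

  static_assert(is_same_v<ComputeTypeA, ComputeTypeB>,
                "Error: ComputeTypeA and ComputeTypeB should be the same");
  if constexpr(is_same_v<ComputeTypeA, float>)
  {
      // existing F32 instances (assumed name)
      add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_instances(instances);
  }
  else if constexpr(is_same_v<ComputeTypeA, TF32>)
  {
      // new TF32 instances added by this PR
      add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(instances);
  }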

Reviewed Changes

Copilot reviewed 143 out of 143 changed files in this pull request and generated 1 comment.

File                           Description
CMakeLists.txt files           Add new TF32 instance files to the build configuration
Header files (*.inc, *.hpp)    Add function declarations for the TF32 instances
Instance files (*.cpp)         Implement TF32 variants of the grouped convolution operations


@yingluAMD
Contributor Author

This PR is based on PR-2867. It will be converted to a normal PR after 2867 is merged.

@yingluAMD yingluAMD self-assigned this Sep 25, 2025
@bartekxk bartekxk requested a review from Copilot September 25, 2025 22:12
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 61 out of 62 changed files in this pull request and generated 8 comments.



TF32>>>& instances);

void add_device_grouped_conv3d_bwd_weight_xdl_ndhwgc_gkzyxc_ndhwgk_f32_pad0_pipev2_instances(
void add_device_grouped_conv3d_bwd_weight_xdl_ndhwgc_gkzyxc_ndhwgk_f32_pad0_pipev5_instances(
Copilot AI Sep 25, 2025

Duplicate function declaration. This function is already declared at line 651-653. The duplicate declaration should be removed.


Contributor Author

This looks like a false positive, because:

  1. This is not new code introduced by this PR.
  2. There is no duplicate function declaration in this file.

Contributor

@bartekxk bartekxk left a comment

Hi, I stopped the review at this point because there are a lot of changes which are not needed, so please:

  • Remove the non-grouped conv code
  • Don't add other layouts than NHWGC

@@ -0,0 +1,207 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
Contributor

Suggested change
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.

Contributor Author

This file will be removed.

ck::tensor_operation::device::ConvolutionBackwardDataSpecialization::Default;

template <ck::index_t NDimSpatial>
using DeviceConvNdBwdDataInstance = ck::tensor_operation::device::DeviceConvNdBwdDataNwcKxcNwk_Xdl<
Contributor

Please don't add non-grouped conv examples

Contributor Author

OK, will remove those tests.

ck::tensor_operation::device::ConvolutionBackwardDataSpecialization::Default;

template <ck::index_t NDimSpatial>
using DeviceConvNdBwdDataInstance = ck::tensor_operation::device::DeviceConvNdBwdDataNwcKxcNwk_Xdl<
Contributor

Please don't add non-grouped conv examples

Contributor Author

same as above.

double max_err = std::numeric_limits<double>::min();
for(std::size_t i = 0; i < ref.size(); ++i)
{
const double o = *std::next(std::begin(out), i);
Contributor

Why do you need the compute data type here?

Contributor Author

Because TF32 compute needs a different threshold. Additional information is needed to distinguish different compute data types when the in/out data types are the same.
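
(A hypothetical illustration of the point above, not code from this PR; the helper name and threshold values are assumptions.)

  #include <type_traits>

  struct TF32; // tag type standing in for the library's TF32 compute type

  // Pick the verification tolerance from the compute data type rather than from
  // the in/out data types, since both F32 and TF32 paths use float tensors.
  template <typename ComputeDataType>
  constexpr double relative_threshold()
  {
      if constexpr(std::is_same_v<ComputeDataType, TF32>)
          return 1e-3; // TF32 keeps ~10 mantissa bits, so a looser bound is needed
      else
          return 1e-6; // full F32 compute
  }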

typename InElementwiseOperation,
typename WeiElementwiseOperation,
typename OutElementwiseOperation>
typename OutElementwiseOperation,
Contributor

Please don't add non-grouped conv code

Contributor Author

ok.

typename BLayout,
typename DsLayout,
typename ELayout,
ConvolutionBackwardDataSpecialization ConvSpec>
Contributor

Please don't extend NGCHW instances for now

Contributor Author

ok

"Error: this operator requires the same compute type");
if constexpr(is_same_v<ComputeTypeA, TF32>)
{
add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
Contributor

Please don't add other instances than NHWGC for now

Contributor Author

ok

void add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
std::vector<std::unique_ptr<DeviceGroupedConvBwdDataMultipleD<2,
GNHWK,
GKYXC,
Contributor

Please don't add other instances than NHWGC for now

Contributor Author

ok

static_assert(is_same_v<ComputeTypeA, ComputeTypeB>,
"Error: ComputeTypeA and ComputeTypeB should be the same");
if constexpr(is_same_v<ComputeTypeA, float>)
{
Contributor

Please don't add other instances than NHWGC for now

Contributor Author

ok.

# ONLY XDL_KERNELS
set(DEVICE_CONV2D_FWD_INSTANCES)
list(APPEND DEVICE_CONV2D_FWD_INSTANCES device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_tf32_instance.cpp
Contributor

Please don't add non-grouped conv code

Contributor Author

ok

@yingluAMD
Contributor Author

yingluAMD commented Sep 26, 2025

Hi, I stopped the review at this point because there are a lot of changes which are not needed, so please:

  • Remove the non-grouped conv code
  • Don't add other layouts than NHWGC

Hi @bartekxk, so we only need the nhwc/nhwgc/ndhwgc layouts now?
