
Conversation

yingluAMD
Contributor

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Add instances of the base classes below (see the sketch after this list):

  • DeviceConvFwd
  • DeviceGroupedConvBwdDataMultipleD
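
For orientation, a hedged sketch of the kind of "populate" declaration these instance files add, reusing the naming that appears later in this PR; the elided template parameters are placeholders, not the exact signature:

  void add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
      std::vector<std::unique_ptr<DeviceGroupedConvBwdDataMultipleD<
          2,     // NDimSpatial
          GNHWK, // output layout
          GKYXC, // weight layout
          /* remaining layout, data-type and elementwise-op parameters elided */
          TF32>>>& // TF32 compute type
          instances);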

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, if the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers to understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@bartekxk
Contributor

I will take a look at this today or on Monday.

@bartekxk bartekxk requested a review from Copilot September 19, 2025 14:33
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds new instances with TF32 compute type for various grouped convolution operations to support TensorFloat-32 precision on hardware that supports it. TF32 provides better performance than F32 while maintaining similar numerical accuracy.

  • Adds TF32 compute type variants for grouped convolution forward operations across multiple dimensions (1D, 2D, 3D)
  • Extends existing F32 instances with TF32 compute type for various specializations (scale, clamp, bias operations); see the sketch after this list
  • Updates CMakeLists and header files to include the new TF32 instance files
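
(Illustrative only: a hedged sketch of how the existing F32 dispatch is extended with a TF32 branch, assembled from the fragments quoted in the review threads below; the non-TF32 function name is assumed by analogy.)

  static_assert(is_same_v<ComputeTypeA, ComputeTypeB>,
                "Error: ComputeTypeA and ComputeTypeB should be the same");
  if constexpr(is_same_v<ComputeTypeA, float>)
  {
      // existing F32 instances (assumed name)
      add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_instances(instances);
  }
  else if constexpr(is_same_v<ComputeTypeA, TF32>)
  {
      // new TF32 instances added by this PR
      add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(instances);
  }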

Reviewed Changes

Copilot reviewed 143 out of 143 changed files in this pull request and generated 1 comment.

File                           Description
CMakeLists.txt files           Add new TF32 instance files to the build configuration
Header files (*.inc, *.hpp)    Add function declarations for the TF32 instances
Instance files (*.cpp)         Implement TF32 variants of the grouped convolution operations


@yingluAMD
Contributor Author

This PR is based on PR-2867. It will be converted to a normal PR after 2867 is merged.

@yingluAMD yingluAMD self-assigned this Sep 25, 2025
@bartekxk bartekxk requested a review from Copilot September 25, 2025 22:12
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 61 out of 62 changed files in this pull request and generated 8 comments.



TF32>>>& instances);

void add_device_grouped_conv3d_bwd_weight_xdl_ndhwgc_gkzyxc_ndhwgk_f32_pad0_pipev2_instances(
void add_device_grouped_conv3d_bwd_weight_xdl_ndhwgc_gkzyxc_ndhwgk_f32_pad0_pipev5_instances(
Copilot AI Sep 25, 2025

Duplicate function declaration. This function is already declared at line 651-653. The duplicate declaration should be removed.


Contributor Author

This looks like a false positive, because:

  1. This is not new code introduced by this PR.
  2. There is no duplicate function declaration in this file.

Contributor

@bartekxk bartekxk left a comment

Hi, I stopped the review at this point because there are a lot of changes which are not needed, so please:

  • Remove the non-grouped conv code
  • Don't add other layouts than NHWGC

@@ -0,0 +1,207 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
Contributor

Suggested change
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.

Contributor Author

This file will be removed.

ck::tensor_operation::device::ConvolutionBackwardDataSpecialization::Default;

template <ck::index_t NDimSpatial>
using DeviceConvNdBwdDataInstance = ck::tensor_operation::device::DeviceConvNdBwdDataNwcKxcNwk_Xdl<
Contributor

Please don't add non-grouped conv examples

Contributor Author

OK, will remove those tests.

ck::tensor_operation::device::ConvolutionBackwardDataSpecialization::Default;

template <ck::index_t NDimSpatial>
using DeviceConvNdBwdDataInstance = ck::tensor_operation::device::DeviceConvNdBwdDataNwcKxcNwk_Xdl<
Contributor

Please don't add non-grouped conv examples

Contributor Author

same as above.

double max_err = std::numeric_limits<double>::min();
for(std::size_t i = 0; i < ref.size(); ++i)
{
const double o = *std::next(std::begin(out), i);
Contributor

Why do you need the compute data type here?

Contributor Author

Because TF32 compute needs a different threshold. Additional information is needed to distinguish different compute data types when the in/out data types are the same.
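
(A hypothetical illustration of the point above, not code from this PR; the helper name and threshold values are assumptions.)

  #include <type_traits>

  struct TF32; // tag type standing in for the library's TF32 compute type

  // Pick the verification tolerance from the compute data type rather than from
  // the in/out data types, since both F32 and TF32 paths use float tensors.
  template <typename ComputeDataType>
  constexpr double relative_threshold()
  {
      if constexpr(std::is_same_v<ComputeDataType, TF32>)
          return 1e-3; // TF32 keeps ~10 mantissa bits, so a looser bound is needed
      else
          return 1e-6; // full F32 compute
  }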

typename InElementwiseOperation,
typename WeiElementwiseOperation,
typename OutElementwiseOperation>
typename OutElementwiseOperation,
Contributor

Please don't add non-grouped conv code

Contributor Author

ok.

typename BLayout,
typename DsLayout,
typename ELayout,
ConvolutionBackwardDataSpecialization ConvSpec>
Contributor

Please don't extend NGCHW instances for now

Contributor Author

ok

"Error: this operator requires the same compute type");
if constexpr(is_same_v<ComputeTypeA, TF32>)
{
add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
Contributor

Please don't add other instances than NHWGC for now

Contributor Author

ok

void add_device_grouped_conv2d_bwd_data_xdl_gnhwk_gkyxc_gnhwc_f32_tf32_instances(
std::vector<std::unique_ptr<DeviceGroupedConvBwdDataMultipleD<2,
GNHWK,
GKYXC,
Contributor

Please don't add other instances than NHWGC for now

Contributor Author

ok

static_assert(is_same_v<ComputeTypeA, ComputeTypeB>,
"Error: ComputeTypeA and ComputeTypeB should be the same");
if constexpr(is_same_v<ComputeTypeA, float>)
{
Contributor

Please don't add other instances than NHWGC for now

Contributor Author

ok.

# ONLY XDL_KERNELS
set(DEVICE_CONV2D_FWD_INSTANCES)
list(APPEND DEVICE_CONV2D_FWD_INSTANCES device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_tf32_instance.cpp
Contributor

Please don't add non-grouped conv code

Contributor Author

ok

@yingluAMD
Contributor Author

yingluAMD commented Sep 26, 2025

Hi, I stopped the review at this point because there are a lot of changes which are not needed, so please:

  • Remove the non-grouped conv code
  • Don't add other layouts than NHWGC

Hi @bartekxk, so we only need the nhwc/nhwgc/ndhwgc layouts now?
