Conversation

msaffari-amd
Contributor

Proposed changes

Contraction + Multi-D kernel in ck_tile using universal GEMM. The kernel supports multiple G, M, N, K dimensions, performing the contraction on the inputs and applying an element-wise operation over any number of D tensors. An example has been implemented as well.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@msaffari-amd msaffari-amd force-pushed the LWPCK-3688-cktile-contraction-multi-D branch from 44f78c9 to 5d83464 Compare September 23, 2025 08:25
@bartekxk bartekxk changed the title Lwpck 3688 cktile contraction multi d [CK Tile] contraction multi d Sep 23, 2025
@bartekxk bartekxk requested a review from Copilot September 23, 2025 22:00

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a batched contraction implementation for CK Tile that supports multi-dimensional tensor contractions using the universal GEMM kernel. The implementation supports tensors with multiple G, M, N, K dimensions and applies element-wise operations on multiple D tensors.

  • Multi-dimensional batched contraction kernel with configurable tensor dimensions
  • Universal GEMM-based implementation with support for split-K batching
  • Complete example implementation with CPU reference validation

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
include/ck_tile/ops/batched_contraction/pipeline/batched_contraction_problem.hpp Defines problem template for batched contraction operations
include/ck_tile/ops/batched_contraction/kernel/batched_contraction_kernel.hpp Main kernel implementation with host/device argument structures
include/ck_tile/ops/batched_contraction/kernel/batched_conratction_utils.hpp Utility header (currently empty)
include/ck_tile/ops/batched_contraction.hpp Main include header aggregating all contraction components
example/ck_tile/CMakeLists.txt Adds batched contraction example to build
example/ck_tile/40_batched_contraction/* Complete example implementation with utilities and test cases


bartekxk
bartekxk previously approved these changes Sep 25, 2025
ck_tile::index_t NumDimN,
ck_tile::index_t NumDimK,
ck_tile::index_t NumDTensor = 0>
struct BatchedContractionKernelArgs

When I look at those kernel args, they seem a bit unclear to me compared to the old CK definition of this op: https://github.com/ROCm/composable_kernel/blob/a0e48cb317ad8c3dfb9a188c44ba4ef8f1364cb3/include/ck/tensor_operation/gpu/device/device_batched_contraction_multiple_d.hpp. What are the ..._total values for?

Contributor Author


This implementation leverages the universal_gemm kernel.
The _total fields are used as a bridge between tensor contraction and GEMM; each one is the product of the dimensions within its group:

  • M_total = M0 × M1 × ... × M_{NumDimM-1}
  • N_total = N0 × N1 × ... × N_{NumDimN-1}
  • K_total = K0 × K1 × ... × K_{NumDimK-1}
  • G_total = G0 × G1 × ... × G_{NumDimG-1} (batch size)

typename TilePartitioner_,
typename GemmPipeline_,
typename EpiloguePipeline_>
struct BatchedContractionKernel

It would be good to have at least some short documentation of this operation, like in old CK:

// Tensor Contraction:
// input : A
// input : B
// input : D0, D1, ...
// output : E
// C = a_op(A) * b_op(B)
// E = cde_op(C, D0, D1, ...)
// Assume:
// A[G0, G1, ..., M0, M1, M2, ..., K0, K1, K2, ...]
// B[G0, G1, ..., N0, N1, N2, ..., K0, K1, K2, ...]
// D[G0, G1, ..., M0, M1, M2, ..., N0, N1, N2, ...]
// E[G0, G1, ..., M0, M1, M2, ..., N0, N1, N2, ...]

Contributor Author


Sure, I'll add them.
