Skip to content

Conversation

@YoussefEssDS
Copy link
Contributor

Description

This PR will:
1- Introduce GroupedData.with_column, allowing grouped datasets to evaluate Ray Data expressions per group while preserving existing columns.
2- Validate the supplied expression type (reject non‑Expr and DownloadExpr since the expression evaluator can’t visit downloads as far as I understand) and reuse the projection engine so grouping flows stay aligned with the dataset-level expression API.
3- Add tests for grouped expression usage through udf-based and arithmetic expressions.

Related issues

Closes #57907

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
@YoussefEssDS YoussefEssDS requested a review from a team as a code owner October 28, 2025 01:20
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces GroupedData.with_column, which allows applying expressions to grouped data. The implementation is clean, reusing existing components like map_groups and eval_projection. The added tests cover both UDF-based and arithmetic expressions, ensuring the new feature is well-tested. My feedback includes a minor suggestion to improve code clarity in the tests by removing a redundant dataset creation.

Comment on lines 152 to 159
ds = ray.data.from_items(
[
{"group": 1, "value": 1},
{"group": 1, "value": 2},
{"group": 2, "value": 3},
{"group": 2, "value": 4},
]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve code clarity and avoid redundancy, you can remove this duplicate dataset creation. The ds variable is already defined with the same data earlier in the test and can be reused for the second assertion.

@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Oct 28, 2025
Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
@YoussefEssDS
Copy link
Contributor Author

Hi @alexeykudinkin do you have any further suggestions? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community data Ray Data-related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ray Data] Support expressions on GroupedData.map_groups

1 participant