Skip to content

Conversation

hladush
Copy link
Contributor

@hladush hladush commented Oct 7, 2025

Background

There is a need to have an access to clickhouse

Implementation

Implemented clickhouse plugin

  • Supports types mapping

Test

Run on local machine with local clickhouse

@Copilot Copilot AI review requested due to automatic review settings October 7, 2025 22:24
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Initial implementation of a ClickHouse plugin for Heimdall to enable synchronous SQL query execution against ClickHouse databases with comprehensive type mapping support.

  • Adds complete ClickHouse plugin implementation with authentication and connection management
  • Implements comprehensive type mapping system for ClickHouse data types including nullable and low cardinality variants
  • Provides synchronous query execution with optional result collection

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
plugins/clickhouse/clickhouse.go Plugin entry point that creates new ClickHouse command handlers
plugins/clickhouse/README.md Comprehensive documentation with configuration examples and usage instructions
internal/pkg/object/command/clickhouse/clickhouse.go Main implementation handling connections, query execution, and result collection
internal/pkg/object/command/clickhouse/column_types.go Type mapping system for converting ClickHouse types to Go types
internal/pkg/object/command/clickhouse/dummy_rows.go Mock implementation for non-result queries
go.mod Dependency additions for ClickHouse driver and related libraries
README.md Updated plugin table to include ClickHouse

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

)

type commandContext struct {
Username string `yaml:"username,omitempty" json:"username,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Username and password is stored on the command level because different teams eventually might have different access permitions and it should be managed on command level not on the cluster level.
Please share your thoughts

Copy link
Contributor

@sanketjadhavSF sanketjadhavSF Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my thought is ideally it should be a connection string in cluster level something like this => clickhouse://username:password@host:port/database?ssl=true and then at command level we control different access to different teams using the RBAC roles.
WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts that have a second line of defense is always nice, specially when companies don't have RBAC.

@hladush hladush force-pushed the ihladush/add_clickhouse branch from ee65e7e to 868cb18 Compare October 7, 2025 22:26

func (exc *executionContext) exec(ctx context.Context) (driver.Rows, error) {
if exc.returnResult {
return exc.conn.Query(ctx, exc.query, exc.params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need any validation to allow only single query when returnResult is set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand clickhouse doesn't support multiquery execution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! in that case the Query method itself does not allow mutiquery statements.

if exc.returnResult {
return exc.conn.Query(ctx, exc.query, exc.params)
}
return dummyRowsInstance, exc.conn.Exec(ctx, exc.query, exc.params)
Copy link
Contributor

@sanketjadhavSF sanketjadhavSF Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need to support Async multi query executions through .sql files if anyone wants to run transformations inside clickhouse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate and add more context to this idea, I'm not sure that I'm following

}
}

conn, err := clickhouse.Open(&clickhouse.Options{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from the basic username and password the user should be able to override or control the connection settings from either command or cluster level as it supports only support single query execution at a time.

Reference for enabling settings in connection open() method: https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/connect_settings.go

@@ -0,0 +1,139 @@
# 🗃️ ClickHouse Plugin

The **ClickHouse Plugin** enables Heimdall to execute SQL queries on configured ClickHouse clusters. It connects directly to ClickHouse instances and executes queries synchronously, with support for parameterized queries and optional result collection.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synchronously this should depend on the command configuration. we'll configure our command as sync, but nothing prevents it to be async, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

Comment on lines +19 to +32
```yaml
- name: clickhouse-analytics
status: active
plugin: clickhouse
version: 24.8
description: Execute ClickHouse queries
context:
username: analytics_user
password: secure_password
tags:
- type:clickhouse
cluster_tags:
- type:clickhouse
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commands are async by default. If we wan it ot be sync, we'll need to set the property.

Comment on lines 128 to 138
{
"columns": [
{"name": "user_id", "type": "UInt64"},
{"name": "events", "type": "UInt64"}
],
"data": [
[12345, 156],
[67890, 203],
[11111, 89]
]
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the result we should support only these types: var ( primitiveTypes = set.New([]string{ `boolean`, `int`, `long`, `float`, `double`, `bytes`, `string`, }) )

Comment on lines 35 to 40
type executionContext struct {
query string
params map[string]any
returnResult bool
conn driver.Conn
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason conn cannot be in jobContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be in the jobContext.
What I'm trying to achieve is separation between heimdall configs(task, cluster) user's input and all data which plugin need for execution. Data which we use for execution in the separate object executionContext as a result everywhere when we need something for execution we pass only 1 object instead of 3.
In the nearest future we'll add another parameter which is needed for execution is username eventually it will be userrole etc.
My assumption that passing only 1 object instead of 3 it's simplify code maintains.

@hladush hladush force-pushed the ihladush/add_clickhouse branch from 3ffa965 to 394dc33 Compare October 21, 2025 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants