-
Notifications
You must be signed in to change notification settings - Fork 6
initial implementation for clickhouse #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Initial implementation of a ClickHouse plugin for Heimdall to enable synchronous SQL query execution against ClickHouse databases with comprehensive type mapping support.
- Adds complete ClickHouse plugin implementation with authentication and connection management
- Implements comprehensive type mapping system for ClickHouse data types including nullable and low cardinality variants
- Provides synchronous query execution with optional result collection
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
plugins/clickhouse/clickhouse.go | Plugin entry point that creates new ClickHouse command handlers |
plugins/clickhouse/README.md | Comprehensive documentation with configuration examples and usage instructions |
internal/pkg/object/command/clickhouse/clickhouse.go | Main implementation handling connections, query execution, and result collection |
internal/pkg/object/command/clickhouse/column_types.go | Type mapping system for converting ClickHouse types to Go types |
internal/pkg/object/command/clickhouse/dummy_rows.go | Mock implementation for non-result queries |
go.mod | Dependency additions for ClickHouse driver and related libraries |
README.md | Updated plugin table to include ClickHouse |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
) | ||
|
||
type commandContext struct { | ||
Username string `yaml:"username,omitempty" json:"username,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Username and password is stored on the command level because different teams eventually might have different access permitions and it should be managed on command level not on the cluster level.
Please share your thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my thought is ideally it should be a connection string in cluster level something like this => clickhouse://username:password@host:port/database?ssl=true
and then at command level we control different access to different teams using the RBAC roles.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts that have a second line of defense is always nice, specially when companies don't have RBAC.
ee65e7e
to
868cb18
Compare
|
||
func (exc *executionContext) exec(ctx context.Context) (driver.Rows, error) { | ||
if exc.returnResult { | ||
return exc.conn.Query(ctx, exc.query, exc.params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need any validation to allow only single query when returnResult is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand clickhouse doesn't support multiquery execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! in that case the Query method itself does not allow mutiquery statements.
if exc.returnResult { | ||
return exc.conn.Query(ctx, exc.query, exc.params) | ||
} | ||
return dummyRowsInstance, exc.conn.Exec(ctx, exc.query, exc.params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need to support Async multi query executions through .sql files if anyone wants to run transformations inside clickhouse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate and add more context to this idea, I'm not sure that I'm following
} | ||
} | ||
|
||
conn, err := clickhouse.Open(&clickhouse.Options{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from the basic username and password the user should be able to override or control the connection settings from either command or cluster level as it supports only support single query execution at a time.
Reference for enabling settings in connection open() method: https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/connect_settings.go
plugins/clickhouse/README.md
Outdated
@@ -0,0 +1,139 @@ | |||
# 🗃️ ClickHouse Plugin | |||
|
|||
The **ClickHouse Plugin** enables Heimdall to execute SQL queries on configured ClickHouse clusters. It connects directly to ClickHouse instances and executes queries synchronously, with support for parameterized queries and optional result collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synchronously
this should depend on the command configuration. we'll configure our command as sync, but nothing prevents it to be async, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
```yaml | ||
- name: clickhouse-analytics | ||
status: active | ||
plugin: clickhouse | ||
version: 24.8 | ||
description: Execute ClickHouse queries | ||
context: | ||
username: analytics_user | ||
password: secure_password | ||
tags: | ||
- type:clickhouse | ||
cluster_tags: | ||
- type:clickhouse | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commands are async by default. If we wan it ot be sync, we'll need to set the property.
{ | ||
"columns": [ | ||
{"name": "user_id", "type": "UInt64"}, | ||
{"name": "events", "type": "UInt64"} | ||
], | ||
"data": [ | ||
[12345, 156], | ||
[67890, 203], | ||
[11111, 89] | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the result we should support only these types: var ( primitiveTypes = set.New([]string{ `boolean`, `int`, `long`, `float`, `double`, `bytes`, `string`, }) )
type executionContext struct { | ||
query string | ||
params map[string]any | ||
returnResult bool | ||
conn driver.Conn | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason conn
cannot be in jobContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be in the jobContext.
What I'm trying to achieve is separation between heimdall configs(task, cluster) user's input and all data which plugin need for execution. Data which we use for execution in the separate object executionContext as a result everywhere when we need something for execution we pass only 1 object instead of 3.
In the nearest future we'll add another parameter which is needed for execution is username eventually it will be userrole etc.
My assumption that passing only 1 object instead of 3 it's simplify code maintains.
3ffa965
to
394dc33
Compare
Background
There is a need to have an access to clickhouse
Implementation
Implemented clickhouse plugin
Test
Run on local machine with local clickhouse