
Conversation

@poornas (Contributor) commented Aug 26, 2025

This is for the bucket-level Quality of Service (QoS) feature.

@poornas force-pushed the bqos3 branch 2 times, most recently from 5e80643 to 5b5337e on August 26, 2025 01:33
@harshavardhana (Member) commented:
PTAL @klauspost need your feedback here.

@klauspost (Contributor) commented:
@poornas Why is this in minio-go? This isn't an S3 API, correct? I assume this is part of a bigger change, since bucket-level QoS isn't that useful without node-level QoS.

@harshavardhana (Member) commented:
> @poornas Why is this in minio-go? This isn't an S3 API, correct? I assume this is part of a bigger change, since bucket-level QoS isn't that useful without node-level QoS.

These are bucket-level settings, implemented at the same level as the S3 API.

Just like the bucket inventory APIs, which are also not AWS S3 specific, we have extended the S3 API extensively.

@klauspost (Contributor) commented:
Sounds much like an admin API to me. But whatever.

Are there any docs on the YAML? This doesn't tell too much; maybe the docs should be here as well?

@harshavardhana (Member) commented:
> Sounds much like an admin API to me. But whatever.
>
> Are there any docs on the YAML? This doesn't tell too much; maybe the docs should be here as well?

@poornas ^^

@poornas (Contributor, Author) commented Sep 17, 2025

@klauspost, added the YAML docs to the aistor PR. Didn't expect Claude to generate such comprehensive docs on the feature.

@poornas (Contributor, Author) commented Sep 18, 2025

@vadmeste @klauspost , PTAL

Comment on lines +140 to +148

```go
type QOSMetric struct {
	APIName            string        `json:"apiName"`
	Rule               QOSRule       `json:"rule"`
	Totals             CounterMetric `json:"totals"`
	Throttled          CounterMetric `json:"throttleCount"`
	ExceededRateLimit  CounterMetric `json:"exceededRateLimitCount"`
	ClientDisconnCount CounterMetric `json:"clientDisconnectCount"`
	ReqTimeoutCount    CounterMetric `json:"reqTimeoutCount"`
}
```
Contributor commented:
So in terms of overall design, I am still a bit unsure how this ties into the total system.

Rates can only realistically be applied per node. So are these numbers divided by the node count? And each bucket has its own settings.

So given a server's capacity 'n', for this to be effective it will be divided by the node count and divided by the bucket count.

So each bucket would end up with a very small req/s... I feel like I'm probably missing the bigger picture. Is there a per-node QoS setting as well?

@poornas (Contributor, Author) commented:

The design was initially to rate limit at the node level, the way you are describing. However, QoS was defined more like a template, and depending on the number of buckets, the resulting rate limits would be arbitrary and not easy for users to reason about when deciding sane limits.

The scope has since been changed to make QoS bucket-centric: it is more useful to throttle specific APIs based on the workload seen in bucket-specific metrics.

The QoS config can be set on a per-bucket, as-needed basis. A rule limiting the PutObject API to x concurrent requests for a bucket implies a total limit of x * n (where n is the number of nodes). Bucket-level QoS allows taxing callers/APIs that are known to be problematic for system performance, based on metrics, rather than requiring admins to set a node-level limit that may be harder to control.

A drawback of bucket QoS is that overall limits are harder to infer; Prometheus metrics and the QoS status will likely help with this.
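The per-node cap arithmetic described above can be sketched with a simple semaphore. The `bucketLimiter` type below is a hypothetical illustration, not the actual minio implementation: each node independently admits at most x concurrent requests for a given (bucket, API) pair, so the cluster-wide effective cap is x * n.

```go
package main

import "fmt"

// bucketLimiter caps in-flight requests for one (bucket, API) pair on a
// single node. Names are illustrative; this is not the minio implementation.
type bucketLimiter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

func newBucketLimiter(x int) *bucketLimiter {
	return &bucketLimiter{slots: make(chan struct{}, x)}
}

// tryAcquire returns false when the per-node cap is already reached,
// i.e. the request would be throttled rather than queued.
func (l *bucketLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees one slot when a request completes.
func (l *bucketLimiter) release() { <-l.slots }

func main() {
	const x, nodes = 2, 4 // per-node cap and node count (illustrative)
	lim := newBucketLimiter(x)

	admitted := 0
	for i := 0; i < 5; i++ { // 5 concurrent PutObject attempts on this node
		if lim.tryAcquire() {
			admitted++
		}
	}
	fmt.Println("admitted on this node:", admitted)     // 2
	fmt.Println("cluster-wide effective cap:", x*nodes) // 8
}
```

This also makes the drawback concrete: no single node knows the cluster-wide in-flight count, which is why aggregated metrics are needed to infer the overall limit.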

@poornas requested a review from klauspost on September 22, 2025 16:29
@klauspost (Contributor) left a comment:
I am not going to leave this hanging when someone clearly thinks this is a good idea.

@poornas (Contributor, Author) commented Sep 24, 2025

Can this be merged?
