dargudear-google
Contributor

Remove the rotation controller and rely exclusively on RequiresRepublish for secret rotation.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 1, 2024
@k8s-ci-robot
Contributor

Hi @dargudear-google. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 1, 2024
@dargudear-google dargudear-google changed the title from "Use in secret rotation" to "Use RequiresRepublish in secret rotation." Sep 7, 2024
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 7, 2024
@dargudear-google dargudear-google marked this pull request as ready for review September 10, 2024 13:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 10, 2024
@k8s-ci-robot k8s-ci-robot requested a review from ritazh September 10, 2024 13:30
@amitmodak amitmodak left a comment

Approach broadly LGTM! IIUC, once this PR is submitted, auto-rotation won't work for k8s clusters < 1.21. What is the process to announce and manage this breaking change?

@dargudear-google
Contributor Author

> Approach broadly LGTM! IIUC, once this PR is submitted, auto-rotation won't work for k8s clusters < 1.21. What is the process to announce and manage this breaking change?

As discussed in the last community call, we should publish the following note: "For clusters < 1.21, please use v1.4.5 or earlier."

@codecov-commenter

codecov-commenter commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 32.55814% with 58 lines in your changes missing coverage. Please review.

Project coverage is 32.07%. Comparing base (87f51ec) to head (76e6600).
Report is 195 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rollers/secretproviderclasspodstatus_controller.go | 13.15% | 31 Missing and 2 partials ⚠️ |
| pkg/secrets-store/nodeserver.go | 58.06% | 10 Missing and 3 partials ⚠️ |
| pkg/secrets-store/secrets-store.go | 20.00% | 8 Missing ⚠️ |
| pkg/secrets-store/utils.go | 50.00% | 2 Missing and 1 partial ⚠️ |
| cmd/secrets-store-csi-driver/main.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1622      +/-   ##
==========================================
- Coverage   35.83%   32.07%   -3.77%     
==========================================
  Files          63       57       -6     
  Lines        3759     3838      +79     
==========================================
- Hits         1347     1231     -116     
- Misses       2268     2501     +233     
+ Partials      144      106      -38     

@nilekhc
Contributor

nilekhc commented Oct 3, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 3, 2024
@dargudear-google dargudear-google force-pushed the rotation branch 7 times, most recently from aa6ffcb to 8d332b7 Compare October 7, 2024 09:20
Contributor

@nilekhc nilekhc left a comment

First pass.

In the bats tests where the sleep duration or timeout is increased, could you re-evaluate whether the increase is necessary? If it is, could you add a note so we have context?

@enj enj added this to SIG Auth Apr 29, 2025
@aramase aramase moved this from Subprojects - Needs Triage to Changes Requested in SIG Auth Apr 29, 2025
@dargudear-google dargudear-google requested a review from aramase May 14, 2025 18:29
@dargudear-google
Contributor Author

/retest-required

@ThirdEyeSqueegee

Hi @aramase, are y'all still expecting to include this and #1755 in the next release? We'd love to see both these features make it into the driver to support our use case.

@aramase
Member

aramase commented Jun 11, 2025

> Hi @aramase, are y'all still expecting to include this and #1755 in the next release? We'd love to see both these features make it into the driver to support our use case.

Yes, this will need to be merged before we can merge #1755. We want to include both changes in the next minor release. I still need to review the recent updates. I'm currently in the middle of the Kubernetes v1.34 enhancements freeze but will take a look soon.

Member

@micahhausler micahhausler left a comment

I think my main suggestion would be to read the CSIDriver into a local cache and have NodePublishVolume() read the requiresRepublish value from that local cache. That way users can avoid a CSI driver restart if they change the value.

reader client.Reader
providerClients *PluginClientBuilder
tokenClient *k8s.TokenClient
rotationConfig *rotationConfig
Member

What do you think of using a (wrapped) K8s client to dynamically read this value off the CSIDriver, instead of having a rotationConfig that isn't thread-safe or configurable?

Right now, if a user changes requiresRepublish on their driver configuration, they'll have to restart every CSI driver pod for the change to take effect, because this value is set once in main.go.

In my PR I created a CSIDriver client that would just watch the one CSIDriver resource, and NodePublishVolume() would read from the local cache.
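
A minimal sketch of the local-cache idea described above, assuming a watch or informer (not shown) delivers CSIDriver updates. The type `csiDriverCache` and its methods are hypothetical names, not code from this PR or from the other PR mentioned; only the `*bool` shape of `spec.requiresRepublish` mirrors the Kubernetes API type.

```go
package main

import "sync/atomic"

// csiDriverCache is a hypothetical holder for the last observed
// spec.requiresRepublish value of a single CSIDriver object.
type csiDriverCache struct {
	requiresRepublish atomic.Bool
}

// onCSIDriverUpdate would be invoked by a watch/informer event handler
// (not shown here). A nil pointer models an unset field and is treated
// as false.
func (c *csiDriverCache) onCSIDriverUpdate(requiresRepublish *bool) {
	c.requiresRepublish.Store(requiresRepublish != nil && *requiresRepublish)
}

// RequiresRepublish is what a NodePublishVolume handler could consult
// on each request instead of a value fixed at process start.
func (c *csiDriverCache) RequiresRepublish() bool {
	return c.requiresRepublish.Load()
}
```

Because reads go through an atomic value, a handler could call `RequiresRepublish()` on every request without taking a lock, and a change to the CSIDriver object would take effect without restarting the driver pods.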

Member

As long as we're doing a filtered watch on the specific driver name, it would be fine.

However, this does mean we will now have to ask users to set and unset requiresRepublish on the CSI driver object to enable/disable rotation unlike the current approach of setting the flag on the driver. This will be a breaking change.

How often is the value going to be flipped between true/false? Would a single restart of the driver to change from no rotation to rotation enabled be cumbersome?
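
To illustrate the filtered watch: the API server supports a `metadata.name` field selector, so a watch can be narrowed to a single CSIDriver object. The sketch below only builds that selector string and adds a client-side guard. The event type and function names are hypothetical, and the driver name used in the example afterwards is an assumption about the deployment.

```go
package main

import "fmt"

// csiDriverEvent is a hypothetical, simplified stand-in for a watch event.
type csiDriverEvent struct {
	Name              string
	RequiresRepublish bool
}

// nameFieldSelector builds a field selector string that restricts a
// list/watch request to the object with the given name.
func nameFieldSelector(driverName string) string {
	return fmt.Sprintf("metadata.name=%s", driverName)
}

// filterByName is a defensive client-side check that drops events for
// any other CSIDriver object, should an unfiltered event slip through.
func filterByName(driverName string, events []csiDriverEvent) []csiDriverEvent {
	var kept []csiDriverEvent
	for _, ev := range events {
		if ev.Name == driverName {
			kept = append(kept, ev)
		}
	}
	return kept
}
```

For example, `nameFieldSelector("secrets-store.csi.k8s.io")` yields `metadata.name=secrets-store.csi.k8s.io`, which keeps the watch limited to one small object rather than every CSIDriver in the cluster.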

Member

@micahhausler micahhausler Aug 21, 2025

> As long as we're doing a filtered watch on the specific driver name, it would be fine.

👍 Yep, that's the plan.

> However, this does mean we will now have to ask users to set and unset requiresRepublish on the CSI driver object to enable/disable rotation unlike the current approach of setting the flag on the driver. This will be a breaking change.

We could decide whether rotation "must be enabled on both the CSIDriver object and the driver flag" or is "fully controlled on the CSIDriver object".

I'd advocate for the latter, as the --enable-secret-rotation flag's help text only claims alpha status. That is a breaking change of sorts, but for an alpha feature that seems within reason.

enableSecretRotation    = flag.Bool(
    "enable-secret-rotation", 
    false, 
    "Enable secret rotation feature [alpha]")

> How often is the value going to be flipped between true/false? Would a single restart of the driver to change from no rotation to rotation enabled be cumbersome?

Well, probably few if any users who have the driver installed today have requiresRepublish set, so they're all going to have to flip it. If they don't flip the value before upgrading the driver, they'll have to do a driver restart. Also, if we made it "must be enabled on driver flag and CSIDriver object", you're bound to get some users flipping only the CSIDriver object setting and expecting rotation to work.
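
The two enablement options discussed in this comment can be compared as a small truth table. This is a sketch only: `enablementPolicy`, its constants, and `rotationEnabled` are hypothetical names used to illustrate the discussion, not identifiers from the driver.

```go
package main

// enablementPolicy is a hypothetical enum for the two options discussed.
type enablementPolicy int

const (
	// flagAndCSIDriver: rotation needs both the driver flag and
	// requiresRepublish on the CSIDriver object.
	flagAndCSIDriver enablementPolicy = iota
	// csiDriverOnly: rotation is controlled solely by requiresRepublish.
	csiDriverOnly
)

// rotationEnabled reports whether rotation would be active under the
// given policy for a combination of flag and CSIDriver settings.
func rotationEnabled(policy enablementPolicy, flagSet, requiresRepublish bool) bool {
	switch policy {
	case flagAndCSIDriver:
		return flagSet && requiresRepublish
	case csiDriverOnly:
		return requiresRepublish
	default:
		return false
	}
}
```

The case raised above is `flagSet == false` with `requiresRepublish == true`: under the first policy rotation stays off even though the CSIDriver object has been changed, which is the situation likely to confuse users.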

Contributor Author

What we are doing here is setting requiresRepublish to true, but that alone doesn't start rotation unless the customer sets enable-secret-rotation. For customers, behaviour won't change; only how we implement rotation changes.
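
One way to read this: with requiresRepublish set, kubelet calls NodePublishVolume periodically, and the driver decides on each call whether to re-fetch secrets. A rough sketch of such a gate follows, assuming a poll interval is also configured. `shouldRotate` and its parameters are hypothetical and not taken from the PR.

```go
package main

import "time"

// shouldRotate decides, for one periodic republish call, whether the
// driver would re-fetch secrets from the provider.
//
// rotationFlag models the enable-secret-rotation setting; sinceLastFetch
// is the time elapsed since secrets were last fetched for this mount;
// pollInterval is an assumed minimum spacing between fetches.
func shouldRotate(rotationFlag bool, sinceLastFetch, pollInterval time.Duration) bool {
	if !rotationFlag {
		// Republish calls still arrive, but they become no-ops.
		return false
	}
	return sinceLastFetch >= pollInterval
}
```

Under this reading, leaving the flag unset keeps the existing behaviour even though kubelet keeps issuing republish calls.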

Member

@aramase aramase left a comment

Another pass.

@dargudear-google dargudear-google force-pushed the rotation branch 2 times, most recently from 6bd5eaf to 2e0ab2f Compare August 13, 2025 14:22
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 21, 2025
@micahhausler
Member

@dargudear-google is this PR still active? I see it is out of date with no changes in a month

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2025
@aramase
Member

aramase commented Sep 30, 2025

@dargudear-google could you rebase the PR when you get a chance? I'll do another round of review this week for the metrics + rotation changes and if all looks good, we'll plan this for the v1.6.0 release along with other feature PRs.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 1, 2025
@k8s-ci-robot
Contributor

@dargudear-google: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| release-secrets-store-csi-driver-e2e-azure | 50ac633 | link | true | /test release-secrets-store-csi-driver-e2e-azure |
| pull-secrets-store-csi-driver-e2e-azure | b1766e0 | link | true | /test pull-secrets-store-csi-driver-e2e-azure |
| pull-secrets-store-csi-driver-e2e-windows | b1766e0 | link | true | /test pull-secrets-store-csi-driver-e2e-windows |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dargudear-google
Contributor Author

> @dargudear-google could you rebase the PR when you get a chance? I'll do another round of review this week for the metrics + rotation changes and if all looks good, we'll plan this for the v1.6.0 release along with other feature PRs.

Done

@dargudear-google
Contributor Author

> @dargudear-google is this PR still active? I see it is out of date with no changes in a month

Yes, rebased and resolved the issues.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.