WIP: automatically enforce release-blocking job runtime policy #35336
base: master
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: BenTheElder
Hmm, I suppose we want to intentionally relax that one for the older release branches ... probably worth simply dropping that commit and focusing on the timeouts for now.
Actually, we can restrict the interval check to non-release branches pretty easily and start there.
133ae50 to 73ffdad
there are a few more but they're all in
You can see that they run faster than this here: https://prow.k8s.io/?job=ci-crio-cgroupv1-node-e2e-conformance https://prow.k8s.io/?job=ci-crio-cgroupv2-node-e2e-conformance
…-line with release-blocking policy
these are taking <1h, closer to 30m
this job is taking <1h, closer to 30m
7a9447d to 7f7998a
@BenTheElder: The following tests failed
remaining jobs:
These are all from the releng generated jobs #32340
  decorate: true
  decoration_config:
-   timeout: 240m
+   timeout: 1h
curious why this one got 1h? the one above has 2h and that's within limit
The commit messages mention this for each change: we don't want to set high timeouts when we don't need to, and these jobs are running well within 1h. If they start to take longer, that's a red flag.
I only set them to the maximum to avoid regressing them. In general these jobs tend to have excessively high timeouts, which will mask failure modes. If a job suddenly goes from <1h to 2h that's almost certainly an excessive retry/timeout that will lead to failure or a massive regression.
PR needs rebase.
Prevent release-blocking jobs from setting timeouts that obviously violate the release-blocking jobs policy, which says:
I.e., they should nominally take two hours to run, and their scheduling interval should not exceed 3h.
I'm not attempting crons yet.
For now, 2h30m seems like a reasonable middle ground max timeout (should usually take 2h, and should run every 3h).
For max interval the policy is clear: <= 3h.
We have a lot of jobs failing these two checks.
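For illustration, the two checks described above can be sketched roughly as follows. This is not the code in this PR: the `Job` fields, `parse_duration`, and `policy_violations` are hypothetical names, and the only values taken from the description are the 2h30m max timeout and the 3h max interval. Jobs without an interval (e.g. cron-scheduled ones) are skipped by the interval check, and the `release_branch` flag mirrors the idea from the thread of restricting the interval check to non-release branches.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional

# Limits from the PR description; the constant names are illustrative.
MAX_TIMEOUT = timedelta(hours=2, minutes=30)
MAX_INTERVAL = timedelta(hours=3)

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse durations such as '240m', '1h' or '2h30m' (h/m/s units only)."""
    if not re.fullmatch(r"(\d+[hms])+", text or ""):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = re.findall(r"(\d+)([hms])", text)
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


@dataclass
class Job:
    # Hypothetical, simplified view of a release-blocking job config.
    name: str
    timeout: Optional[str] = None    # e.g. decoration_config.timeout
    interval: Optional[str] = None   # None for cron-scheduled jobs
    release_branch: bool = False     # assumption: known from elsewhere


def policy_violations(job: Job) -> List[str]:
    """Return human-readable policy violations for one job."""
    problems = []
    if job.timeout is not None and parse_duration(job.timeout) > MAX_TIMEOUT:
        problems.append(f"{job.name}: timeout {job.timeout} exceeds 2h30m")
    # Interval check only applies to non-release-branch jobs with an interval.
    if (
        job.interval is not None
        and not job.release_branch
        and parse_duration(job.interval) > MAX_INTERVAL
    ):
        problems.append(f"{job.name}: interval {job.interval} exceeds 3h")
    return problems
```

Under these assumptions, a job with `timeout: 240m` would be flagged, while one with `timeout: 1h` would pass.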