[core] Clean up test_raylet_resubscribe_to_worker_death and relevant Raylet logs
#58244
Code Review
This pull request significantly improves the test_raylet_resubscribe_worker_death test by making it more robust and easier to understand. The changes, including adding explicit checks for OwnerDiedError and ensuring GCS is responsive after a restart, are excellent for test stability. Additionally, the cleanup of log messages across various Raylet components enhances clarity and moves towards more structured logging, which is beneficial for debugging. I've found one minor typo in a log message, but otherwise, the changes are solid.
: io_service_(io_service) {}

PeriodicalRunner::~PeriodicalRunner() {
  RAY_LOG(DEBUG) << "PeriodicalRunner is destructed";
unnecessarily noisy when debug logs were on
if (worker) {
  // The client is a worker.
std::shared_ptr<WorkerInterface> worker;
if ((worker = worker_pool_.GetRegisteredWorker(client))) {
@dayshah is this frowned upon by c++ enjoyers?
no, i kinda like the if value syntax, scopes it so it gets destroyed earlier + can't be accessed outside
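For readers unfamiliar with the scoping point in the reply above, here is a minimal standalone sketch. `Worker`, `WorkerPool`, and both `Describe...` functions are hypothetical stand-ins written for illustration; they are not Ray's `WorkerInterface` or worker pool, and the sketch does not claim to show which form the PR finally landed on. It only contrasts assigning inside the condition (the variable outlives the `if`) with declaring inside the condition (the variable is confined to the `if`/`else`).

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a registered worker (not Ray's WorkerInterface).
struct Worker {
  std::string id;
};

// Hypothetical stand-in for a pool lookup that returns nullptr for unknown clients.
class WorkerPool {
 public:
  void Register(int client, std::string id) {
    workers_[client] = std::make_shared<Worker>(Worker{std::move(id)});
  }

  std::shared_ptr<Worker> GetRegisteredWorker(int client) const {
    auto it = workers_.find(client);
    return it == workers_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<int, std::shared_ptr<Worker>> workers_;
};

// Assign inside the condition: `worker` is declared outside the `if`,
// so it remains visible (and keeps its reference) until the function returns.
std::string DescribeAssignInCondition(const WorkerPool &pool, int client) {
  std::shared_ptr<Worker> worker;
  if ((worker = pool.GetRegisteredWorker(client))) {
    return "worker " + worker->id;
  }
  return "not a worker";
}

// Declare inside the condition: `worker` exists only within the if/else,
// so it is destroyed at the end of the statement and cannot be used after it.
std::string DescribeDeclareInCondition(const WorkerPool &pool, int client) {
  if (auto worker = pool.GetRegisteredWorker(client)) {
    return "worker " + worker->id;
  }
  // `worker` is not in scope here; referring to it would not compile.
  return "not a worker";
}
```

Both variants return the same result for the same input; the difference is only how long `worker` stays alive and where it can be named.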
test_gcs_fault_tolerance.py::test_worker_raylet_resubscription is still flaky in CI despite bumping up the timeout. I found it hard to read the system logs from the test, so cleaned up the relevant areas a bit. Also cleaned up the test logic and increased the timeout further to 20s. If it's still flaky after this, it's likely that the underlying issue is a real bug.