Collaborator

@edoakes edoakes commented Oct 28, 2025

test_gcs_fault_tolerance.py::test_worker_raylet_resubscription is still flaky in CI despite the earlier timeout increase. I found the system logs from the test hard to read, so I cleaned up the relevant areas a bit. I also cleaned up the test logic and increased the timeout further to 20s. If the test is still flaky after this, the underlying issue is likely a real bug.

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes requested a review from a team as a code owner October 28, 2025 15:14
@edoakes edoakes added the "go" label (add ONLY when ready to merge, run all tests) Oct 28, 2025
@edoakes edoakes changed the title from "[core] Clean up test_raylet_resubscribe_worker_death and relevant Raylet logs" to "[core] Clean up test_raylet_resubscribe_to_worker_death and relevant Raylet logs" Oct 28, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the test_raylet_resubscribe_worker_death test by making it more robust and easier to understand. The changes, including adding explicit checks for OwnerDiedError and ensuring GCS is responsive after a restart, are excellent for test stability. Additionally, the cleanup of log messages across various Raylet components enhances clarity and moves towards more structured logging, which is beneficial for debugging. I've found one minor typo in a log message, but otherwise, the changes are solid.

: io_service_(io_service) {}

PeriodicalRunner::~PeriodicalRunner() {
RAY_LOG(DEBUG) << "PeriodicalRunner is destructed";
Collaborator Author

unnecessarily noisy when debug logs were on

if (worker) {
// The client is a worker.
std::shared_ptr<WorkerInterface> worker;
if ((worker = worker_pool_.GetRegisteredWorker(client))) {
Collaborator Author

@dayshah is this frowned upon by C++ enjoyers?

Contributor

no, i kinda like the if value syntax, scopes it so it gets destroyed earlier + can't be accessed outside
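
For readers less familiar with the idiom discussed here, the scoping benefit applies when the variable is declared inside the if condition itself. The following is a minimal sketch, not the Raylet code: Worker, WorkerPool, and DescribeClient are hypothetical stand-ins, and only the method name GetRegisteredWorker is borrowed from the snippet above.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not Ray's types.
struct Worker {
  std::string id;
};

class WorkerPool {
 public:
  void Register(int client, std::string id) {
    workers_[client] = std::make_shared<Worker>(Worker{std::move(id)});
  }

  // Returns nullptr when no worker is registered for the client.
  std::shared_ptr<Worker> GetRegisteredWorker(int client) const {
    auto it = workers_.find(client);
    return it == workers_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<int, std::shared_ptr<Worker>> workers_;
};

// Declaring the variable in the condition limits its scope to the if
// statement (including any else branch), so the extra shared_ptr reference
// is released as soon as the statement ends.
std::string DescribeClient(const WorkerPool &pool, int client) {
  if (auto worker = pool.GetRegisteredWorker(client)) {
    return "worker " + worker->id;
  }
  // `worker` is out of scope here; referring to it would not compile.
  return "unknown client";
}
```

Note that a variable declared before the if and only assigned in the condition stays in scope after the statement, so the early-destruction benefit applies only to the declaration form sketched here.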

@ray-gardener ray-gardener bot added the "core" label (Issues that should be addressed in Ray Core) Oct 28, 2025