Skip to content

Conversation

xuyehe
Copy link

@xuyehe xuyehe commented Aug 23, 2022

Why are these changes needed?

To integrate Redis into aggregator for saving aggregation data.

Related issue number

N/A

Checks

  • I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
  • I've made sure the following tests are passing.
  • Testing Configurations
    • Dry Run (20 training rounds & 1 evaluation round)
    • Cifar 10 (20 training rounds & 1 evaluation round)
    • Femnist (20 training rounds & 1 evaluation round)

Note:

  1. All tests are run on cpu.
  2. Met with the following KeyError bug on the same line in Femnist once with Redis, and also once without Redis (i.e. original code). Sample error output:
File "/users/xuyehe/FedScale-rd/fedscale/core/aggregation/aggregator.py", line 386, in client_completion_handler
    duration=self.virtual_client_clock[results['clientId']]['computation'] +
KeyError: 665

@fanlai0990 fanlai0990 changed the title Redis Redis Support for FedScale Aug 24, 2022
docker/driver.py Outdated
print_help: bool = False
if len(sys.argv) > 1:
if sys.argv[1] == 'submit' or sys.argv[1] == 'start':
redis_exec = '/usr/bin/redis-server'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the hardcoded path? Instead, use some like os.environ["REDIS_PATH"]

- main
dependencies:
- python=3.7
- tensorboard
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you mentioned that some tf version is not compatible. Can you please update tf version here too?


# ======== redis client ========
# <host, port, ...> should be specified in args
host = "127.0.0.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove any hard code in the future, using args.xxx instead.

@fanlai0990
Copy link
Collaborator

Thanks a lot for your contribution! Let's try to run an e2e evaluation to validate the learning curve and Redis overhead first.

@fanlai0990 fanlai0990 mentioned this pull request Aug 29, 2022
5 tasks
@fanlai0990
Copy link
Collaborator

fanlai0990 commented Oct 12, 2022

Thanks for your contribution. Can you please

  • remove all prints or replace them with logging.info
  • summarize the results of the overhead analysis

We really appreciate your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants