[RLlib] Add off-policy'ness metric to new API stack. #48227

sven1977

Add off-policy'ness metric to new API stack.

The old stack has a convenient metric measuring how many policy updates the sampling (behavior) policy is behind the policy actually being trained. For algorithms like APPO and IMPALA, this metric is crucial for understanding learning deficiencies, and its value should be as close to 1 as possible (1 update behind).

The metric can be found in the result dict under the following key (analogous to the old API stack's metric):
learners/[module_id]/diff_num_grad_updates_vs_sampler_policy
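
A minimal sketch of reading this value out of a result dict, assuming the result is a plain nested dict laid out exactly like the path above. The helper name `get_off_policyness`, the module ID `"default_policy"`, and the example dict are hypothetical and only for illustration; they are not taken from this PR's code.

```python
from typing import Any, Mapping, Optional

# Key names copied from the path quoted above. That the result is a plain
# nested dict with this layout is an assumption of this sketch.
LEARNERS_KEY = "learners"
OFF_POLICYNESS_KEY = "diff_num_grad_updates_vs_sampler_policy"


def get_off_policyness(result: Mapping[str, Any], module_id: str) -> Optional[float]:
    """Return the off-policy'ness value for `module_id`, or None if it is absent."""
    module_stats = result.get(LEARNERS_KEY, {}).get(module_id, {})
    value = module_stats.get(OFF_POLICYNESS_KEY)
    return None if value is None else float(value)


# Hypothetical result dict shaped like the path above (not real training output).
example_result = {
    "learners": {
        "default_policy": {"diff_num_grad_updates_vs_sampler_policy": 1.0},
    }
}

lag = get_off_policyness(example_result, "default_policy")
missing = get_off_policyness({}, "default_policy")
```

Returning `None` for a missing key, rather than raising, keeps the helper usable on iterations where no learner update has been reported yet.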

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

wip
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) October 24, 2024 09:42
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 24, 2024
…off_policyness_metric_to_new_api_stack
wip
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@github-actions github-actions bot disabled auto-merge October 24, 2024 12:34
@sven1977 sven1977 enabled auto-merge (squash) October 24, 2024 13:38
@sven1977 sven1977 added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. rllib RLlib related issues rllib-newstack labels Oct 24, 2024

@simonsays1980 simonsays1980 left a comment

LGTM.

@@ -98,21 +98,6 @@
LR_KEY = "learning_rate"


@dataclass
Finally this thing is gone :)

@sven1977 sven1977 merged commit 2673bf3 into ray-project:master Oct 24, 2024
6 checks passed
@sven1977 sven1977 deleted the add_off_policyness_metric_to_new_api_stack branch October 24, 2024 17:54
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
2 participants