Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/update controller is_stuck() #1891

Merged
merged 9 commits into from
May 21, 2024
Merged

Fix/update controller is_stuck() #1891

merged 9 commits into from
May 21, 2024

Conversation

enyst
Copy link
Collaborator

@enyst enyst commented May 18, 2024

Fix the stuck check in the controller:

  • increase to 4 historical tuples, but prevent 4 x (any action, any observation)
  • add detection for a pattern where every other tuple is the same (e.g. -1, -3, -5 are the same, -2, -4, -6 are the same) - source https://opendevin.slack.com/archives/C06U8UTKSAD/p1715691234611019
  • add filtering on history, to ignore MessageAction with source=USER, and still detect the rest.

Please note:

  • I think CmdOutputObservation and maybe the KillAction should ignore the command_id for the purpose of this check. I made an override of eq for the first, but I reverted from this PR to give it some thought. Any feedback is appreciated!

Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I think CmdOutputObservation and maybe the KillAction should ignore the command_id for the purpose of this check. I made an override of eq for the first, but I reverted from this PR to give it some thought. Any feedback is appreciated!

I agree on this point -- we are only interested in comparing the "content" of the observation, but not something else. Happy to hear what other people think

@li-boxuan
Copy link
Collaborator

li-boxuan commented May 21, 2024

I think it would be interesting to run this PR against ./evaluation/swe_bench/scripts/run_infer.sh [llm-config-group] CodeActAgent 1. This command will run the 1st task in swe-bench-lite. This first task exhausted all 50 turns (and cost me ~$6) but seems OpenDevin was just repeating itself and wasting money...

FYI log is here:
instance_django__django-15202.log
lol damn this log contains my API key... we should definitely fix that. Anyways, I have revoked my key so don't worry about that 🤣

@@ -0,0 +1,224 @@
from unittest.mock import Mock, patch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to say "it'd be great to get some unit tests in here" and then I scrolled down a bit 😄

Thanks for this!

@xingyaoww xingyaoww merged commit 1e51bb9 into OpenDevin:main May 21, 2024
25 checks passed
@enyst enyst deleted the is-stuck branch May 21, 2024 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants