Fix/update controller is_stuck() #1891

enyst · 2024-05-18T19:59:04Z

Fix the stuck check in the controller:

- Increase to 4 historical tuples, but prevent 4 x (any action, any observation).
- Add detection for a pattern where every other tuple is the same (e.g. -1, -3, -5 are the same, and -2, -4, -6 are the same). Source: https://opendevin.slack.com/archives/C06U8UTKSAD/p1715691234611019
- Add filtering on history, to ignore MessageAction with source=USER and still detect the rest.
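For illustration only, the three checks above could be sketched roughly as follows. This is not the code from the PR: `Event`, `Step`, and `is_stuck_sketch` are hypothetical names, the events are simplified stand-ins for the real action and observation classes, and the sketch reads the first bullet as "four identical consecutive (action, observation) tuples count as stuck", which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an action or an observation."""
    kind: str            # e.g. "CmdRunAction", "MessageAction"
    content: str
    source: str = "agent"


# One history entry: (action, observation).
Step = Tuple[Event, Event]


def is_stuck_sketch(history: List[Step]) -> bool:
    # Filter out steps whose action is a message coming from the user,
    # so user interjections do not hide a loop in the agent's own steps.
    steps = [
        s for s in history
        if not (s[0].kind == "MessageAction" and s[0].source == "user")
    ]

    # Pattern 1: the last four tuples are all identical.
    if len(steps) >= 4 and all(s == steps[-1] for s in steps[-4:]):
        return True

    # Pattern 2: every other tuple is the same over the last six,
    # i.e. -1, -3, -5 match each other and -2, -4, -6 match each other.
    if len(steps) >= 6:
        a, b, c, d, e, f = steps[-6:]
        if f == d == b and e == c == a:
            return True

    return False


ls = (Event("CmdRunAction", "ls"), Event("CmdOutputObservation", "file.txt"))
cat = (Event("CmdRunAction", "cat file.txt"), Event("CmdOutputObservation", "hello"))

print(is_stuck_sketch([ls, ls, ls, ls]))            # four identical tuples
print(is_stuck_sketch([ls, cat, ls, cat, ls, cat])) # alternating pattern
```

Both calls at the end report a stuck agent; three identical tuples alone would not.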

Please note:

I think CmdOutputObservation, and maybe KillAction, should ignore the command_id for the purpose of this check. I made an override of __eq__ for the first, but I reverted it from this PR to give it some more thought. Any feedback is appreciated!
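To make the idea concrete, here is a minimal sketch of one way to leave command_id out of equality. It is not the reverted commit: the class below is a simplified stand-in with assumed fields, and it uses the dataclass `compare=False` option instead of a hand-written `__eq__`.

```python
from dataclasses import dataclass, field


@dataclass
class CmdOutputObservation:
    """Simplified stand-in; the field set here is an assumption."""
    content: str
    command: str
    exit_code: int = 0
    # compare=False excludes this field from the generated __eq__,
    # so two observations that differ only in command_id compare equal.
    command_id: int = field(default=-1, compare=False)


first = CmdOutputObservation("file.txt", "ls", 0, command_id=1)
second = CmdOutputObservation("file.txt", "ls", 0, command_id=2)
other = CmdOutputObservation("hello", "cat file.txt", 0, command_id=3)

print(first == second)  # same content and command, different command_id
print(first == other)   # different content and command
```

One trade-off to weigh: this changes equality everywhere the class is compared, not only inside the stuck check, so a dedicated comparison helper used only by the check may be the safer option.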

xingyaoww

LGTM!

> I think CmdOutputObservation and maybe the KillAction should ignore the command_id for the purpose of this check. I made an override of __eq__ for the first, but I reverted from this PR to give it some thought. Any feedback is appreciated!

I agree on this point -- we are only interested in comparing the "content" of the observation, not anything else. Happy to hear what other people think.

li-boxuan · 2024-05-21T07:44:44Z

I think it would be interesting to run this PR against `./evaluation/swe_bench/scripts/run_infer.sh [llm-config-group] CodeActAgent 1`. This command runs the first task in swe-bench-lite. For me, that task exhausted all 50 turns (and cost ~$6), but it seems OpenDevin was just repeating itself and wasting money...

FYI log is here:
instance_django__django-15202.log
lol damn this log contains my API key... we should definitely fix that. Anyways, I have revoked my key so don't worry about that 🤣

rbren · 2024-05-21T13:46:38Z

tests/unit/test_is_stuck.py

@@ -0,0 +1,224 @@
+from unittest.mock import Mock, patch


I was about to say "it'd be great to get some unit tests in here" and then I scrolled down a bit 😄

Thanks for this!

enyst added 9 commits May 17, 2024 18:02

Refactor monologue to use the messages in state history

76b4b76

remove now unused method

is_stuck update

a8634e3

Merge branch 'main' into is-stuck

2b2e196

fix is_stuck

5291bfa

unit tests

6b2774c

fix tests

eb6f68a

Revert "Refactor monologue to use the messages in state history"

6d1963d

This reverts commit 76b4b76.

Override eq for CmdOutputObservation to ignore the pid, compare the a…

6418d85

…ctual command only

Revert "Override eq for CmdOutputObservation to ignore the pid, compa…

9f56c93

…re the actual command only" This reverts commit 6418d85.

xingyaoww approved these changes May 21, 2024

li-boxuan mentioned this pull request May 21, 2024

Conceal API key when serializing config #1936

Closed

rbren reviewed May 21, 2024

rbren approved these changes May 21, 2024

xingyaoww merged commit 1e51bb9 into OpenDevin:main May 21, 2024
25 checks passed

enyst deleted the is-stuck branch May 21, 2024 20:44
