
Cron job to close stale PRs #93

Closed
brettcannon opened this issue May 18, 2017 · 36 comments

Comments

@brettcannon
Member

I'm classifying a PR as "stale" when its submitter either has not signed the CLA or received a code review requesting changes and never followed up with more commits or review comments. These two scenarios specifically make sure that the PR submitter is the hold-up and not any core dev(s). (A PR that can't be merged probably shouldn't be classified as such, since submitters shouldn't have to perpetually maintain a PR if they don't get a review from a core dev.)

It would probably be nice for the submitter to receive a warning that a PR will be closed within a week, but I don't think re-opening a PR is hard either, so it shouldn't be a big deal to just close it if it's too much work to give the warning (it would need to be verified, though, that a PR submitter can re-open their own PR before making a decision about this).

@matrixise
Member

+1. As of today, there are 17 open PRs where the CLA is not signed.

@souravsingh

@Mariatta I am interested in working on this.

@Mariatta
Member

Mariatta commented Oct 4, 2017

@souravsingh Sure! I think this is not a straightforward task, and it has some fun challenges :)

My initial thoughts about this:

A PR is considered "stale" if:

  • the last activity on it has a timestamp older than X days (1 month? 2 months? 3 months?)

and

  • it has CLA not signed label, or
  • it has awaiting changes label

The workflow:

  1. bedevere applies the stale label to PRs with above criteria
  2. bedevere leaves a comment saying that this PR will be closed unless changes are made.
  3. if a PR has the stale label and there has been no change 7 days after the label has been applied, then bedevere will just close the PR. Not sure how to detect this. Perhaps by retrieving all the comments from the PR, and see the timestamp when the comment from step 2 was made?
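The criteria and grace period above could be sketched roughly as follows. This is only an illustration, not bedevere code: the `PullRequest` class, the function names, and the 30-day threshold are assumptions (the thread leaves X undecided), and step 3 assumes the warning comment is the last event recorded on the PR.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Label names as they appear in this thread.
BLOCKING_LABELS = {"CLA not signed", "awaiting changes"}
STALE_LABEL = "stale"


@dataclass
class PullRequest:
    """Hypothetical stand-in for the fields a bot would read from the API."""
    number: int
    updated_at: datetime
    labels: set = field(default_factory=set)


def is_stale(pr, now, max_idle=timedelta(days=30)):
    """True if the PR is idle past the threshold AND blocked on the submitter."""
    idle = now - pr.updated_at
    return idle > max_idle and bool(pr.labels & BLOCKING_LABELS)


def should_close(pr, stale_comment_at, now, grace=timedelta(days=7)):
    """True once the grace period has elapsed with no activity after the warning."""
    untouched = pr.updated_at <= stale_comment_at  # assumption: comment was the last event
    return STALE_LABEL in pr.labels and untouched and now - stale_comment_at >= grace


# Example: a PR idle for 45 days that still lacks a signed CLA.
now = datetime(2017, 10, 4, tzinfo=timezone.utc)
example = PullRequest(1, now - timedelta(days=45), {"CLA not signed"})
print(is_stale(example, now))
```

Reading the timestamp of the warning comment, as suggested in step 3, is what `stale_comment_at` stands for here.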

Side task, I think it can be a separate task. It will be a webhook instead of a cron.

  • Remove the stale label when awaiting changes or CLA not signed label has been removed.

I think the code can be added to bedevere, so we don't need to create any new bot.

@brettcannon
Member Author

My one worry with adding this to Bedevere is GitHub quota. Going through 453 issues to find out details about them is going to eat through quota rather quickly and might be too much with Bedevere's current quota usage.

@Mariatta
Member

Mariatta commented Oct 5, 2017

Hm, the rate limit is something I haven't considered.
If we were to create a new app/bot for this, will it also be at risk of hitting the rate limit?

@brettcannon
Member Author

Quite possibly. We might not know until we do the queries and see what the count comes out to. This might require smearing the work across the day, e.g. start at midnight, find out how many pages of issues there are, and then do a page per hour (where hopefully there are less than 24 pages 😉 ).
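One way to picture the "page per hour" idea is the sketch below. The function names are hypothetical, and the page size of 30 is an assumption based on GitHub's default `per_page` for list endpoints; the 453 figure is the open-issue count mentioned earlier in the thread.

```python
import math


def pages_needed(total_items, per_page=30):
    """Number of result pages required to list total_items."""
    return math.ceil(total_items / per_page)


def hourly_schedule(total_items, per_page=30):
    """Map each hour of the day (0-23) to the 1-based pages fetched in that hour."""
    schedule = {hour: [] for hour in range(24)}
    for page in range(1, pages_needed(total_items, per_page) + 1):
        # Wrap around if there are more than 24 pages.
        schedule[(page - 1) % 24].append(page)
    return schedule


schedule = hourly_schedule(453)
print(pages_needed(453), schedule[0], schedule[15])
```

With more than 24 pages, some hours simply handle two pages instead of one.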

@Mariatta
Member

Mariatta commented Oct 6, 2017

For the first task of applying the stale label, (is stale the right term?), the bot won't be going through all open PRs, only the ones with CLA not signed label (2 pages) or awaiting changes (1 page) labels.

The API calls will be:
https://api.github.com/repos/python/cpython/issues?labels=CLA%20not%20signed&sort=updated&direction=asc&state=open
and
https://api.github.com/repos/python/cpython/issues?labels=awaiting%20changes&sort=updated&direction=asc&state=open
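For illustration, the two URLs above can be built from their query parameters with the standard library. The helper name `labelled_issues_url` is made up for this sketch; it only constructs the URL and does not perform the request.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.github.com/repos/python/cpython/issues"


def labelled_issues_url(label):
    """Build the list-issues URL for open items with `label`, oldest update first."""
    params = {
        "labels": label,
        "sort": "updated",
        "direction": "asc",
        "state": "open",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{API_ROOT}?{urlencode(params, quote_via=quote)}"


for label in ("CLA not signed", "awaiting changes"):
    print(labelled_issues_url(label))
```

Sorting by `updated` ascending means the least recently touched PRs come back on the first page.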

@brettcannon
Member Author

Nice catch on the GET arguments! So it looks like 40 issues in total ATM are either labeled "CLA not signed" or "awaiting changes". But the problem is that out of 19 pages worth of issues, only the first 5 have any "awaiting" label really applied to them. So that means the 17 issues that are awaiting changes could be extrapolated to be about 68 (although possibly more since those PRs are probably/hopefully languishing for a reason), which puts it at 90 issues or more (aside: maybe we should write a script that we run once that goes through and does a back-fill of all issues without an "awaiting" label? We could have a custom message when we set "awaiting changes" saying "we're totally guessing here, so please say 'I didn't expect the Spanish Inquisition' if we're wrong" or something).

So let's worst-case it to 200 issues. If a PR is now considered stale/inactive, then there will be a comment and an added label (so 2 requests). If a PR has passed its one-week grace period, then it will be a comment and closure (2 requests). Add in pagination on the issue list and we're looking at worst-case 400+ requests out of our (I believe) 5000 request quota, so we're approaching 10%.
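The estimate above works out as follows. The page size of 30 is an assumption (GitHub's default for list endpoints), and the 5000 quota is the figure believed in the comment rather than a measured value.

```python
import math


def worst_case_requests(issues, requests_per_issue=2, per_page=30):
    """Two write requests per issue plus the requests needed to page through the list."""
    listing_requests = math.ceil(issues / per_page)
    return issues * requests_per_issue + listing_requests


QUOTA = 5000  # assumed request quota, as stated in the thread

total = worst_case_requests(200)
print(total, f"{total / QUOTA:.1%}")  # 407 requests, about 8.1% of the quota
```

So the worst case is a little over 400 requests, somewhat under the 10% mark.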

I have opened python/bedevere#64 so that we can see what kind of margin we currently have so we don't have to keep guessing.

chetankm-cs pushed a commit to chetankm-cs/bedevere that referenced this issue Oct 8, 2017
@chetankm-cs

Hi,
I am working on this; I have added the code for labelling stale PRs.
I need help setting up a cron job for this.

@brettcannon
Member Author

@chetankm-cs it will be however you set up a cron job on Heroku.

@Mariatta
Member

Mariatta commented Oct 8, 2017

One option is to use the crontab scheduler in celery.

chetankm-cs pushed a commit to chetankm-cs/bedevere that referenced this issue Oct 9, 2017
chetankm-cs pushed a commit to chetankm-cs/bedevere that referenced this issue Oct 9, 2017
@chetankm-cs

I have sent a pull request for this issue.
Any feedback will be appreciated.
Thanks

@webknjaz
Contributor

There's existing integration from GitHub for this: https://probot.github.io/apps/stale/
You might want to just plug it in (it's configurable with a YAML file in the repo).

@erlend-aasland

Can we update the stale workflow to automatically close stale PRs without a signed CLA? That should be pretty uncontroversial, as we cannot accept PRs from persons who did not sign the CLA. I'll put up a PR if this sounds good to you all.

Proposed patch

Note: this patch has not been tested; GitHub Actions is not my area of expertise.

diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml
index e3b8b9f942..85eb108348 100644
--- a/.github/workflows/stale.yml
+++ b/.github/workflows/stale.yml
@@ -16,7 +16,18 @@ jobs:
     - uses: actions/stale@v4
       with:
         repo-token: ${{ secrets.GITHUB_TOKEN }}
+        exempt-pr-labels: 'CLA not signed'
         stale-pr-message: 'This PR is stale because it has been open for 30 days with no activity.'
         stale-pr-label: 'stale'
         days-before-stale: 30
         days-before-close: -1
+
+    - uses: actions/stale@v4
+      with:
+        repo-token: ${{ secrets.GITHUB_TOKEN }}
+        only-pr-labels: 'CLA not signed'
+        stale-pr-message: 'This PR is stale because it has been open for 30 days with no activity. If the CLA is not signed within one week, it will be closed.'
+        stale-pr-label: 'stale'
+        close-pr-message: 'Closing this stale PR because the CLA is still not signed.'
+        days-before-stale: 30
+        days-before-close: 7

@webknjaz
Contributor

webknjaz commented Jan 9, 2022

@erlend-aasland the patch looks good but I'd update the message to include a link to the documentation explaining why it's important and how to sign it.

@erlend-aasland

erlend-aasland commented Jan 9, 2022

[...] update the message to include a link to the documentation explaining why it's important and how to sign it.

Good point, I'll do that on the PR. I'll wait for @brettcannon or @Mariatta to chime in before opening a PR, though. I created a PR; if the idea is rejected, I'll close it.

@ambv

ambv commented Jan 9, 2022

Idea looks good to me but I'd give the author 14 days for signing the CLA as this sometimes requires communicating with the employer, etc. 7 days is a little tight.

Question: if there is activity on the PR after the label is applied, or the label is subsequently removed, will the PR still be closed? If not, the current suggested message is misleading.

As for giving more information on how to sign the CLA I think that's redundant with the original message bedevere-bot is sending asking for the CLA. We can put a link to https://devguide.python.org/pullrequest/#licensing as a reminder.

@erlend-aasland

Question: if there is activity on the PR after the label is applied, or the label is subsequently removed, will the PR still be closed? If not, the current suggested message is misleading.

I'll try and find out. If someone with more GitHub Actions experience than me already knows, please shout out :)

@hugovk
Member

hugovk commented Jan 9, 2022

Question: if there is activity on the PR after the label is applied, or the label is subsequently removed, will the PR still be closed?

I believe not: activity will remove the stale label (#387 (comment)), and both the stale and CLA not signed labels are required for closing.
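As a rough mental model of that behaviour (not the real `actions/stale` implementation), the decision for the "CLA not signed" step could be written like this. The function name, its arguments, and the returned strings are invented for the sketch; the defaults of 30 and 7 days come from the proposed patch.

```python
def cla_step_action(labels, days_since_update, updated_since_stale,
                    days_before_stale=30, days_before_close=7):
    """Simplified model of what the 'CLA not signed' step does with one PR.

    For a PR already labelled stale, days_since_update is assumed to count
    from the moment the stale label and comment were applied.
    """
    if "CLA not signed" not in labels:
        return "skip"                    # only-pr-labels filters it out
    if "stale" in labels:
        if updated_since_stale:
            return "remove stale label"  # activity restarts the timer
        if days_since_update >= days_before_close:
            return "close"
        return "wait"
    if days_since_update >= days_before_stale:
        return "mark stale"
    return "wait"


print(cla_step_action({"CLA not signed", "stale"}, 10, updated_since_stale=True))
```

In this model a PR is only ever closed while it carries both labels and has seen no activity since the stale label was applied.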

@hugovk
Member

hugovk commented Jan 9, 2022

And to test it, I've applied PR python/cpython#30500 to my fork's main, with an extra commit to reduce the time limits to a day and to act upon a fake CLA not signed label: hugovk/cpython@615866c

@erlend-aasland Would you like to create a dummy PR (e.g. edit README) to my fork https://github.com/hugovk/cpython? I'll add the fake CLA not signed label and we'll find out in 2-3 days.

Actually, let's do two dummy PRs: one I'll add the label to, and one I won't.

@erlend-aasland

Thanks, @hugovk, will do!

@erlend-aasland

erlend-aasland commented Jan 9, 2022

Quoting the stale action README:

"If an update/comment occur on stale issues or pull requests, the stale label will be removed and the timer will restart"

@hugovk
Member

hugovk commented Jan 10, 2022

The two test PRs:

First cron run:

Both steps ran and processed the two PRs. No actions taken because it's less than 1 day since there was activity. Will check back tomorrow!

@hugovk
Member

hugovk commented Jan 10, 2022

Test restarted:

We updated the PR to avoid double negatives ("has 'CLA signed'?" instead of "does not have 'CLA not signed'?" python/cpython#30500 (comment)) and I've updated the test repo and PRs to match, resetting the inactivity counters.

@hugovk
Member

hugovk commented Jan 11, 2022

First run with new test:

Looks good so far, as last time:

Both steps ran and processed the two PRs. No actions taken because it's less than 1 day since there was activity. Will check back tomorrow!

@erlend-aasland

@ambv:

Question: if there is activity on the PR after the label is applied, or the label is subsequently removed, will the PR still be closed? If not, the current suggested message is misleading.

No, it will not be closed. See python/cpython#30500 (comment)

@hugovk
Member

hugovk commented Jan 15, 2022

The two test PRs:

First cron run:

Both steps ran and processed the two PRs. No actions taken because it's less than 1 day since there was activity. Will check back tomorrow!

Test complete, both worked as expected! 👍

@erlend-aasland

python/cpython#30500 has now been merged. Should we still keep this ticket open?

@hugovk
Member

hugovk commented Feb 17, 2022

We could keep this open a little longer to monitor its progress, especially after the next daily cron run?

Here's the current list of 22 stale + CLA not signed:

https://github.com/python/cpython/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc+label%3Astale+label%3A%22CLA+not+signed%22


I expect, if no other activity before the cron:

@hugovk
Member

hugovk commented Feb 18, 2022

The new cron ran and we now have 15 stale+CLA not signed open:

https://github.com/python/cpython/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc+label%3Astale+label%3A%22CLA+not+signed%22

I expect, if no other activity before the cron:

  • the top one to have stale removed (it's had activity today): #29918

  • at least 17 to be closed (all last updated more than days-before-stale: 30 + days-before-close: 14 days ago):

Only 4 were closed:

#29264 #29084 #28904 #28110

13 still open:

#22696 #27848 #27552 #27341 #25744 #26930 #26458 #24556 #24942 #24474 #23998 #22695 #22671

  • The other four still open or possibly closed depending on when the stale label was actually applied: #27256 #29865 #28987 #29900

1 closed, 1 merged 🎉, 2 still open.


So why didn't we close all the expected ones? Because the cron job action doesn't run on all the PRs.

The last run: https://github.com/python/cpython/actions/runs/1861772715

From "Check PRs with 'CLA not signed' label":

Warning: No more operations left! Exiting...
Warning: If you think that not enough issues were processed you could try to increase the quantity related to the operations-per-run (https://github.com/actions/stale#operations-per-run) option which is currently set to 30
Statistics:
Processed PRs: 383
New stale PRs: 1
Closed PRs: 6
Added PRs labels: 1
Added PRs comments: 7
Fetched items: 400
Fetched items events: 7
Fetched items comments: 7
Operations performed: 32

That warning:

Warning: If you think that not enough issues were processed you could try to increase the quantity related to the operations-per-run (https://github.com/actions/stale#operations-per-run) option which is currently set to 30

https://github.com/actions/stale#operations-per-run explains how the action is rate limited, and we can increase it but "if reached will block these API calls for one hour (or API calls from other actions using the same user (a.k.a.: the github-token from the repo-token option)".

The limit:

When using GITHUB_TOKEN, the rate limit is 1,000 requests per hour per repository.

https://docs.github.com/en/rest/overview/resources-in-the-rest-api#requests-from-github-actions

That sounds pretty high, I expect we can raise it. If operations-per-run: 30 dealt with 400 PRs, shall we try x4: operations-per-run: 120 to deal with the 1,592 open PRs?
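The x4 figure follows from a simple proportional estimate, sketched below. The function name is hypothetical, and linear scaling is itself an assumption: operations are spent on the PRs that need a label, comment, or closure, so real coverage depends on how many of those a run encounters.

```python
import math


def scaled_operations(current_ops, prs_covered, open_prs):
    """Scale operations-per-run by the whole-number factor needed to cover open_prs."""
    factor = math.ceil(open_prs / prs_covered)
    return factor, current_ops * factor


factor, ops = scaled_operations(current_ops=30, prs_covered=400, open_prs=1592)
print(factor, ops)  # 4 120
```

At 120 operations per step the two steps stay far below the quoted limit of 1,000 requests per hour, provided each operation costs roughly one request.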

Note we have two steps running in the action, one for adding the stale label, one for closing stale+no-CLA.


Also:

We can change the order PRs are fetched, from default ascending: false (get newest first) to ascending: true (get oldest first).

https://github.com/actions/stale#ascending

Does it make sense to deal with the older PRs first? They're more likely to be stale/abandoned. At least for the closing step.

@erlend-aasland

If operations-per-run: 30 dealt with 400 PRs, shall we try x4: operations-per-run: 120 to deal with the 1,592 open PRs?

Sounds good to me :)

Does it make sense to deal with the older PRs first?

+1

@hugovk
Member

hugovk commented Feb 18, 2022

Please see PR python/cpython#31407.

@hugovk
Member

hugovk commented Feb 19, 2022

That was merged and ran last night, and is looking good.

https://github.com/python/cpython/runs/5255121854?check_suite_focus=true


Some more detail.

From "Check PRs with 'CLA not signed' label":

...
Warning: No more operations left! Exiting...
Warning: If you think that not enough issues were processed you could try to increase the quantity related to the operations-per-run (https://github.com/actions/stale#operations-per-run) option which is currently set to 120
Statistics:
Processed PRs: 37
New stale PRs: 30
Added PRs labels: 30
Added PRs comments: 30
Fetched items: 100
Fetched items events: 31
Fetched items comments: 31
Operations performed: 123

-> Switching to processing oldest first has helped, it's labelled some old PRs from 2017.


From "Check PRs with 'CLA not signed' label":

...
Batch #16 processed.
No more issues found to process. Exiting...
Statistics:
Processed PRs: 1587
New stale PRs: 4
No longer stale PRs: 4
Closed PRs: 10
Deleted PRs labels: 4
Added PRs labels: 4
Added PRs comments: 14
Fetched items: 1587
Fetched items events: 19
Fetched items comments: 19
Operations performed: 87

->

13 still open:

These 10 are now closed:

#27848 #27552 #27341 #26930 #26458 #24942 #24474 #23998 #22695 #22671

And 3 still open:

#22696 curiously had the stale label removed:

  [#24556] Remove the stale label since the pull request has a comment and the workflow should remove the stale label when updated
  [#24556] The pull request is no longer stale. Removing the stale label...
  [#24556] Removing the label "stale" from this pull request...
  [#24556] The label "stale" was removed
  [#24556] Skipping the process since the pull request is now un-stale
  [#24556] 2 operations consumed for this pull request

So what happened? The label was applied some time ago, comments were then made, the label should have been removed but the PR dropped off the list of processed PRs until now. So that's okay.

Same story for the other two: #25744 #24556

@erlend-aasland

Looks good. Thanks for going through all the details for us, @hugovk!

@hugovk
Member

hugovk commented Feb 19, 2022

Just came to mind, the BPO -> GitHub issues migration is happening soon:
https://discuss.python.org/t/github-issues-migration-is-coming-soon/13791

We don't want this action running on issues (at least not yet, it's something to reconsider once the migration is done).

And we're using only-pr-labels, so should be good:

Override only-labels but only to process the pull requests that contain all these label(s).

https://github.com/actions/stale#only-pr-labels
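The "contain all these label(s)" rule amounts to a subset check, which can be sketched as below. The helper name is invented for this example; it assumes the option value is a comma-separated string, as the stale action's label options are documented to be.

```python
def matches_only_pr_labels(pr_labels, only_pr_labels):
    """True if the PR carries every label listed in the comma-separated option."""
    required = {name.strip() for name in only_pr_labels.split(",") if name.strip()}
    return required <= set(pr_labels)


print(matches_only_pr_labels(["CLA not signed", "stale"], "CLA not signed"))  # True
print(matches_only_pr_labels(["stale"], "CLA not signed"))                    # False
```

A PR without the "CLA not signed" label is therefore never considered by the closing step, whatever other labels it has.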

@hugovk
Member

hugovk commented Mar 10, 2022

Okay, things are looking good here, let's close this :)

Currently 1 open CLA not signed + stale, and 37 closed.


10 participants