Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

known_hosts bloats up on self-hosted runners #106

Closed
AllDmeat opened this issue Jan 17, 2022 · 8 comments · Fixed by #171
Closed

known_hosts bloats up on self-hosted runners #106

AllDmeat opened this issue Jan 17, 2022 · 8 comments · Fixed by #171

Comments

@AllDmeat
Copy link

AllDmeat commented Jan 17, 2022

In index.js you are always executing fs.appendFileSync. We are using self-hosted runners and such behaviour lead us to known_hosts reaching almost 4k of same lines and breaking some git clone-related steps. Manually removing this file helped us, but I think it should be done automatically in your post-step action.

@mpdude
Copy link
Member

mpdude commented Jan 17, 2022

Oh yes!

Removing in the post-step would be one option. But do self-hosted runners share state for multiple jobs? Might it happen that one job removes the line while another one is still running?

If so, maybe it would make more sense to add the line only if it does not already exist?

@AllDmeat
Copy link
Author

AllDmeat commented Jan 17, 2022

maybe it would make more sense to add the line only if it does not already exist?

Maybe yes

do self-hosted runners share state for multiple jobs?

AFAIK — no. But better check out documentation.

@cojocar
Copy link

cojocar commented Feb 10, 2022

If so, maybe it would make more sense to add the line only if it does not already exist?

This would make sense.

Is adding the fingerprints to known_hosts really needed? Alternatively, how about adding a switch e.g., update-known-hosts: bool that would control if the known_hosts will be altered? I can imagine that one can pass parameters to git/ssh to ignore the known_hosts altogether or, especially for self-hosting, you can save the fingerprints beforehand.

@mpdude
Copy link
Member

mpdude commented May 23, 2022

Adding the known_hosts entry was necessary a while ago before GitHub's worker images came with the keys pre-installed. Nowadays, probably we don't need it for GitHub environments at all. For self-hosted runners, one can pre-install the keys, also for other servers that might possbily be needed. I'd be glad not having to maintain and guarantee for the correct keys in this action.

So, do you see a nice way how we could deprecate and eventually remove the feature of adding the keys?

@michaelgambold
Copy link

how about adding a switch e.g., update-known-hosts: bool that would control if the known_hosts will be altered?

^ This would be my preferred action. So the default would be to add to known hosts for the current version (keep backwards compatibility) but allow an option to "skip" adding to the known hosts.

@michaelgambold
Copy link

Would something like this PR #142 work?

@mpdude
Copy link
Member

mpdude commented Oct 24, 2022

That does not really point out to users that they might need to review/change something on their side, and opens up the new question of how to get rid of that switch in the future.

We might start printing a deprecation warning (if the GHA system permits us to do so?), stating that future versions will no longer add the keys. Then, on the next minor release (we're still 0.x!) drop the keys.

WDYT?

@michaelgambold
Copy link

We might start printing a deprecation warning (if the GHA system permits us to do so?), stating that future versions will no longer add the keys. Then, on the next minor release (we're still 0.x!) drop the keys.

Sounds fine to me.

mpdude added a commit that referenced this issue Mar 24, 2023
We need to fix the SSH keys shipped with this action:
https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/

But, we have another issue
(#108) with regards to host
keys: On self-hosted runners which are not ephemeral the known_host file
fills up with repeated entries, because every action run adds a new line
with the same host keys.

Also, on those machines, the old key will still be in the `known_hosts`
file.

IMHO this action should not be repsonsible for shipping SSH host keys,
that's too much responsibility.

This section in the code is a leftover from early days when GitHub
provided runners did not include SSH keys at all. For a long time
already, GH takes care of placing their SSH keys in their runner images.

For self-hosted runners, those people setting up the runner should fetch
and verify SSH keys themselves and put it into the `known_hosts` file.

I know this is a breaking change and is going to annoy users. But on the
other hand, there is no better opportunity to drop this feature than
with an emergency-style key revocation as today.

Closes #106, closes #129, closes #169, closes #170, closes #172.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants