reverse lookup broken on Mac OS runners #8649

Open
3 of 10 tasks
oliver-sanders opened this issue Oct 24, 2023 · 10 comments
Labels: bug report, investigate, OS: macOS

@oliver-sanders

oliver-sanders commented Oct 24, 2023

Description

Reverse lookup of the host name is not working on the Mac OS runner.

ubuntu-latest:

$ nslookup fv-az955-853
...
Name:	fv-az955-853.mlkcatuscfmejm4ctfapoghrmg.cx.internal.cloudapp.net

macos-latest:

$ nslookup $(hostname -f)
...
** server can't find Mac-1698147376508.local: NXDOMAIN

For an example, see the nslookup and python.socket steps of this workflow run:

https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243

First spotted a couple of weeks ago.

For context, see these two similar instances where reverse DNS stopped working on the Linux images:

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 11
  • macOS 12
  • macOS 13
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Image: macos-12
Version: 20230921.1

Image: macos-13
Version: 20231204.4

Is it regression?

Yes, seen with runners with macos version 12.7.1 or above.

Expected behavior

Reverse lookup should return the hostname.

Actual behavior

Reverse lookup results in error.

Repro steps

To reproduce, see this workflow:

https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243

@shamil-mubarakshin shamil-mubarakshin self-assigned this Oct 24, 2023
@shamil-mubarakshin shamil-mubarakshin added OS: macOS investigate Collect additional information, like space on disk, other tool incompatibilities etc. and removed needs triage labels Oct 24, 2023
@shamil-mubarakshin
Contributor

Hi @oliver-sanders,
Thanks for reporting. We are investigating the issue.

@shamil-mubarakshin
Contributor

@oliver-sanders, after poking around, nslookup doesn't seem to be the right tool for DNS lookups on macOS, which is also mentioned on the tool's man page. It also leaves me wondering whether this behavior has always been the case.
Using dscacheutil gives more stable results, honouring local files (a similar hack was used with Ubuntu in the past, though the issue there was IP inconsistency). E.g. the commands below should return the host IPs:

echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts 
dscacheutil -q host -a name $(hostname -f)

We will continue investigating and see if something else can be done.
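
For anyone scripting the /etc/hosts step above, here is a minimal Python sketch of how the append could be made idempotent, so that repeated workflow steps do not add the same line twice. The helper names `hosts_line` and `needs_entry` are hypothetical and not part of any tool mentioned in this thread, and the sketch only builds the line; it does not write to /etc/hosts.

```python
def hosts_line(ip: str, fqdn: str, short: str) -> str:
    """Build an /etc/hosts line in the same 'IP FQDN SHORTNAME' layout as the echo above."""
    return f"{ip} {fqdn} {short}"


def needs_entry(hosts_text: str, name: str) -> bool:
    """Return True if no non-comment line of hosts_text already lists `name`."""
    wanted = name.lower()
    for raw in hosts_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names.
        if wanted in (field.lower() for field in fields[1:]):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical values; on a runner these would come from
    # `ipconfig getifaddr en0`, `hostname -f` and `hostname -s`.
    sample = "127.0.0.1 localhost\n"
    if needs_entry(sample, "Mac-123.local"):
        print(hosts_line("192.168.64.23", "Mac-123.local", "Mac-123"))
```

The comparison is case-insensitive because host names are, which also matches the mixed-case `Mac-….local` names seen on these runners.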

@oliver-sanders
Author

oliver-sanders commented Oct 25, 2023

@shamil-mubarakshin, thanks for looking in.

Didn't know there were issues with nslookup on Mac OS, interesting.

I also used Python's socket bindings in my tests, which show similar failures for reverse lookups that had worked previously:

socket.gethostname()                              : Mac-1698147376508.local
socket.getfqdn()                                  : Mac-1698147376508.local
socket.getfqdn(socket.gethostname())              : Mac-1698147376508.local
socket.getfqdn(socket.getfqdn())                  : Mac-1698147376508.local
socket.gethostbyname_ex(socket.gethostname())[0]  : [Errno 8] nodename nor servname provided, or not known
socket.gethostbyname_ex(socket.getfqdn())[0]      : [Errno 8] nodename nor servname provided, or not known
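
A table like the one above can be produced with a short script along these lines. This is a reconstruction for illustration, not the actual script from the linked workflow; `probe` and `default_calls` are hypothetical names, and only standard-library `socket` functions are used.

```python
import socket


def probe(calls):
    """Run each zero-argument callable and record its result or its error text."""
    results = {}
    for label, call in calls.items():
        try:
            results[label] = call()
        except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
            results[label] = f"ERROR: {exc}"
    return results


def default_calls():
    """The same six lookups as in the table above."""
    return {
        "socket.gethostname()": socket.gethostname,
        "socket.getfqdn()": socket.getfqdn,
        "socket.getfqdn(socket.gethostname())":
            lambda: socket.getfqdn(socket.gethostname()),
        "socket.getfqdn(socket.getfqdn())":
            lambda: socket.getfqdn(socket.getfqdn()),
        "socket.gethostbyname_ex(socket.gethostname())[0]":
            lambda: socket.gethostbyname_ex(socket.gethostname())[0],
        "socket.gethostbyname_ex(socket.getfqdn())[0]":
            lambda: socket.gethostbyname_ex(socket.getfqdn())[0],
    }


if __name__ == "__main__":
    for label, value in probe(default_calls()).items():
        print(f"{label:<50}: {value}")
```

Catching the error per call, rather than letting the first failure abort the script, makes it possible to compare which lookups work on each image.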

I managed to dig out an example of a workflow where the Mac OS job failed the first two times and passed on the third: https://github.com/cylc/cylc-flow/actions/runs/6634707075

With this message in the failed runs:

socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1698197657674.local'

# attempt 1 - fail
  Image: macos-12
  Version: 20230921.1

# attempt 2 - fail
  Image: macos-12
  Version: 20231017.6

# attempt 3 - pass
  Image: macos-12
  Version: 20230921.4

rail added a commit to rail/cockroach that referenced this issue Oct 31, 2023
Previously, bincheck started a single node database instance without
specifying the address/port it listens on. In this case the server code
tried to resolve the hostname and use it. See
https://github.com/cockroachdb/cockroach/blob/d498a59cc2afc9778af6f7e0120206ab1ee56bc2/pkg/base/addr_validation.go#L128
for the details.

At some point this method stopped working on MacOS GitHub workers. There
is an upstream issue open: actions/runner-images#8649

This PR explicitly sets the `--listen-addr` parameter to skip the
problematic code.

Epic: none
Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 31, 2023
113502: bincheck: bind 127.0.0.1 r=celiala a=rail

Previously, bincheck started a single node database instance without specifying the address/port it listens on. In this case the server code tried to resolve the hostname and use it. See
https://github.com/cockroachdb/cockroach/blob/d498a59cc2afc9778af6f7e0120206ab1ee56bc2/pkg/base/addr_validation.go#L128 for the details.

At some point this method stopped working on MacOS GitHub workers. There is an upstream issue open: actions/runner-images#8649

This PR explicitly sets the `--listen-addr` parameter to skip the problematic code.

Epic: none
Release note: None

Co-authored-by: Rail Aliiev <rail@iqchoice.com>
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Nov 15, 2023
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Nov 16, 2023
@oliver-sanders
Author

Unfortunately the workaround isn't quite enough for my use case due to other interactions which require additional workarounds. We still occasionally get test runners where reverse lookup works.

MetRonnie pushed a commit to MetRonnie/cylc-flow that referenced this issue Dec 8, 2023
@MetRonnie

MetRonnie commented Dec 14, 2023

Getting some funky behaviour with the Python 3.7 socket library (with @shamil-mubarakshin's patch above applied).

Runner: macOS 12.6.9:

>>> socket.gethostname()                           
'Mac-1702490668849.local'

>>> socket.gethostbyname_ex('Mac-1702490668849.local')                
('mac-1702490668849.local', [], ['192.168.64.23'])

>>> socket.getfqdn()                               
'Mac-1702490668849.local'

>>> socket.gethostbyname_ex('Mac-1702490668849.local')                
('Mac-1702490668849.local', ['Mac-1702490668849'], ['192.168.64.23'])

This does not happen with the macOS 12.7.1 runner (see #8642):

>>> socket.gethostname()                           
'Mac-1702490723337.local'

>>> socket.gethostbyname_ex('Mac-1702490723337.local')                
('mac-1702490723337.local', [], ['10.213.1.225'])

>>> socket.getfqdn()                               
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa

>>> socket.gethostbyname_ex('Mac-1702490723337.local')                
('mac-1702490723337.local', [], ['10.213.1.225'])
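
One possible way to guard against the `ip6.arpa` result shown above is to treat reverse-zone names as unusable and fall back to the plain host name. This is a sketch under that assumption, not something proposed in this thread; `looks_like_reverse_zone` and `usable_fqdn` are hypothetical helper names.

```python
import socket


def looks_like_reverse_zone(name: str) -> bool:
    """True for names in the reverse-lookup zones (ip6.arpa / in-addr.arpa)."""
    lowered = name.rstrip(".").lower()
    return lowered.endswith(".ip6.arpa") or lowered.endswith(".in-addr.arpa")


def usable_fqdn(fqdn: str, hostname: str) -> str:
    """Prefer fqdn, but fall back to hostname when fqdn is empty or a reverse-zone name."""
    if not fqdn or looks_like_reverse_zone(fqdn):
        return hostname
    return fqdn


if __name__ == "__main__":
    print(usable_fqdn(socket.getfqdn(), socket.gethostname()))
```

This only papers over the symptom; it does not make forward or reverse resolution of the host name work.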

MetRonnie pushed a commit to cylc/cylc-flow that referenced this issue Dec 18, 2023
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 19, 2023
@oliver-sanders
Author

oliver-sanders commented Dec 19, 2023

Tried out the macos-13 beta image and ran into the same issue (updated the OP).

Error from Python's socket interface:

socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1702983423766.local'

Runner information:

Current runner version: '2.311.0'
Operating System
  macOS
  13.6.1
  22G313
Runner Image
  Image: macos-13
  Version: 20231204.4
  Included Software: https://github.com/actions/runner-images/blob/macos-13/20231204.4/images/macos/macos-13-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/macos-13%2F20231204.4

The macos 11 image is fine.

oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 19, 2023
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 19, 2023
tbaudier added a commit to OpenGATE/opengate that referenced this issue Dec 20, 2023
According to
actions/runner-images#8642
actions/runner-images#8649

the macos runners are evolving leading to error of VM

It seems there is no problem with macos-11
copybara-service bot pushed a commit to protocolbuffers/protobuf that referenced this issue Jan 15, 2024
This is likely the culprit for the sccache issues we were seeing on macos.  See actions/runner-images#8649 for more issues on the bug affecting macos-12 and macos-13

PiperOrigin-RevId: 598476243
copybara-service bot pushed a commit to protocolbuffers/protobuf that referenced this issue Jan 16, 2024
This is likely the culprit for the sccache issues we were seeing on macos.  See actions/runner-images#8649 for more information on the bug affecting macos-12 and macos-13

PiperOrigin-RevId: 598476243
@squakez

squakez commented Jan 17, 2024

I've just tried downgrading to macos-11, but apparently we're hitting the very same issue. I forked @oliver-sanders's repo to run a simple test of whether local DNS works on any of the macOS runners, and it seems to fail on all of the available ones: https://github.com/squakez/actions-dns-test/actions/runs/7559304732/job/20582826961

@oliver-sanders
Author

oliver-sanders commented Jan 24, 2024

@shamil-mubarakshin suggested (#8649 (comment)) that nslookup might not be the right tool for the job on Mac OS although I don't know the reasons why. Maybe worth testing via another interface.

The Python interfaces I rely on for my use case do work reliably on the macos-11 image but are broken on all newer images. My project is sticking with the macos-11 runners for now, but this old runner will be withdrawn in due course, at which point we will have to drop macOS, as our usage is too complex to work around with the patch in #8649 (comment).

It might be worth following this issue #7508 to see whether the issue is inherited by the new image.

@squakez

squakez commented Jan 24, 2024

Yeah, I've seen that. However, in my case the problem is not direct usage of nslookup. It is a Docker process that uses the local DNS service to resolve a local name defined in /etc/hosts. It seems to me that the local DNS service is completely off (I've also checked that the host has nothing running on port 53), so any resolution of local names fails. I found a workaround by using the localhost IP, but this definitely deserves some attention, as we'd expect full feature parity between the different runners. Let's see how it goes with future runners.
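
For code that only needs some address to bind or connect to, the localhost workaround can be sketched in Python roughly as follows. This is an illustration under the assumption that loopback is an acceptable substitute in your setup (it is not for anything that must be reachable from another machine); `resolve_or_loopback` is a hypothetical helper name.

```python
import socket

LOOPBACK = "127.0.0.1"


def resolve_or_loopback(name, resolver=socket.gethostbyname):
    """Resolve name to an IPv4 address, falling back to loopback if resolution fails."""
    try:
        return resolver(name)
    except OSError:
        # Includes socket.gaierror, e.g.
        # "[Errno 8] nodename nor servname provided, or not known".
        return LOOPBACK


if __name__ == "__main__":
    print(resolve_or_loopback(socket.gethostname()))
```

The `resolver` parameter exists only so the fallback path can be exercised without depending on the runner's DNS state.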

@oliver-sanders
Author

it seems to me that the local DNS service is completely off

^ that!
