Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: return an InvalidParameter error when ep settings are wrong #47159

Merged
merged 1 commit into from
Jan 22, 2024

Conversation

akerouanton
Copy link
Member

@akerouanton akerouanton commented Jan 22, 2024

- What I did

Since v25.0 (commit ff50388), we validate endpoint settings when containers are created, instead of doing so when containers are started. However, a container created prior to that release would still trigger validation error at start-time. In such case, the API returns a 500 status code because the Go error isn't wrapped into an InvalidParameter error. This is now fixed.

- How to verify it

  1. Start the daemon from branch v24 ;
  2. Create a container with an invalid networking config ;
  3. Restart the daemon from branch v25 ;
  4. Make sure there's no such log:
ERRO[2024-01-22T11:36:22.370050219Z] Handler for POST /v1.43/containers/86dfef1d48b1/start returned error: invalid endpoint settings:
network-scoped alias is supported only for containers in user defined networks 

- A picture of a cute animal (not mandatory but encouraged)

Since v25.0 (commit ff50388), we validate endpoint settings when
containers are created, instead of doing so when containers are started.
However, a container created prior to that release would still trigger
validation error at start-time. In such case, the API returns a 500
status code because the Go error isn't wrapped into an InvalidParameter
error. This is now fixed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Copy link
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

  • Create a container with an invalid networking config ;
  • Restart the daemon from branch v25 ;
  • Make sure there's no such log:

Hm.. it's unexpected for an existing container to no longer start; can we add a migration step to fix the container when upgrading from a previous version? Similar to the migration step that was added in #46926 for the (now removed) logentries logging driver?

@thaJeztah thaJeztah added this to the 26.0.0 milestone Jan 22, 2024
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change LGTM, but left a comment above to look at migration of existing containers

@thaJeztah
Copy link
Member

@akerouanton or @robmry could one of you open a cherry-pick for the 25.0 branch?

@thaJeztah thaJeztah merged commit f0eef50 into moby:master Jan 22, 2024
126 checks passed
@akerouanton
Copy link
Member Author

Hm.. it's unexpected for an existing container to no longer start; can we add a migration step to fix the container when upgrading from a previous version? Similar to the migration step that was added in #46926 for the (now removed) logentries logging driver?

Discussed internally -- I don't think any existing container would no longer start. It's rather that, it's now impossible to create a container with an invalid container. When network config
validation was added to the ContainerCreate route, I took care of returning an InvalidParameter to make sure the right HTTP status code is returned. The validation made at start-time was kept as-is to make sure containers created at an older time would still trigger validation errors. However, I didn't update validateEndpointSettings callsite to wrap the returned error with an InvalidParameter error.

@akerouanton akerouanton deleted the fix-bad-http-code branch January 22, 2024 14:19
@thaJeztah
Copy link
Member

@akerouanton Did that container start in v24, or did it fail to start? (wondering if the validation rules changed between v24 and v25; perhaps the logic (validate on start) is the same, but if the validation rules changed 🤔)

@akerouanton
Copy link
Member Author

@thaJeztah There might be -- I'm waiting for more details on docker/for-linux#1481. If that's the case, I agree we need to either fix the validation or add some migration logic.

I'll open a backport.

@thaJeztah
Copy link
Member

@s4ke
Copy link
Contributor

s4ke commented Feb 5, 2024

This validation logic has another unintended side-effect for Docker Swarm users, I think. We have a Docker Swarm cluster with the wrong subnet config of 10.0.0.1/16. I think this needs to be cleaned up, but it would be nice to leave an upgrade path for everyone that has a cluster with a slightly broken (but working) config.

@akerouanton
Copy link
Member Author

@s4ke Thanks for reporting. Could you please open a ticket describing how this issue manifests, etc...?

@thaJeztah
Copy link
Member

Is it in that case that swarm has a service with an invalid config stored, which now fails (i.e. if it tries to (re)create tasks from that config, it fails? 🤔

@s4ke
Copy link
Contributor

s4ke commented Feb 5, 2024

@thaJeztah Exactly that. I am currently in the process of recreating at least part of our clusters, but I think that we are not the only people running into this. Sadly we have had this error also in our swarmsible repo (https://github.com/neuroforgede/swarmsible), so anyone that is using this will have run into the same issue as the examples there were wrong, but that is not your problem :)

@thaJeztah
Copy link
Member

Do you know what option is invalid? (curious if this is something we could adjust for if the container is created through the SwarmKit executor)

@s4ke
Copy link
Contributor

s4ke commented Feb 5, 2024

You mean the exact field in the service spec?

EDIT: just checked. The service spec nor the output of docker inspect does not mention anything regarding the subnet config. This issue stems from the ingress creation that was done wrong:

        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.1/16",
                    "Gateway": "10.0.0.1"
                }
            ]
        },

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants