Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Istio MCS Generating ServiceExports #29763

Closed
wants to merge 60 commits into from

Conversation

josephpeacock
Copy link
Contributor

@josephpeacock josephpeacock commented Dec 23, 2020

This PR accomplishes stage 2 in the Istio MCS RFC

In short, it allows Istio to be configured to create ServiceExport objects for non cluster-local services in an Istio-managed cluster.

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ X ] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Pull Request Attributes

Please check any characteristics that apply to this pull request.

[ ] Does not have any changes that may affect Istio users.

@istio-policy-bot
Copy link

😊 Welcome @josephpeacock! This is either your first contribution to the Istio istio repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Dec 23, 2020
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 23, 2020
@josephpeacock
Copy link
Contributor Author

This PR accomplishes #29385 and is dependent upon #29763

@josephpeacock josephpeacock changed the title Serviceexport Istio MCS Generating ServiceExports Dec 23, 2020
@josephpeacock josephpeacock requested review from a team as code owners December 23, 2020 22:16
@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 23, 2020
Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the implement, could not see when the serviceExport will be consumed and by who

@@ -3620,6 +3620,11 @@ rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "watch", "list"]

#Used for MCS serviceexport management
- apiGroups: [""]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in core group, i think it can not be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unfamiliar with the manifests, could you recommend a more appropriate place?

#Used for MCS serviceexport management
- apiGroups: [""]
resources: ["serviceexports"]
verbs: ["create", "delete"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no get list needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and watch, probably update and patch will be needed as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR only needs create/delete on serviceexports. The eventual PR to consume them will likely need the ability to watch, but I want to add the minimal set of permissions necessary for this functionality.

queue queue.Instance
serviceInformer cache.SharedInformer

clusterLocalHosts []string //hosts marked as cluster-local, which will not have serviceeexports generated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for, concerned about the peformance

return false
}

func (sc *ServiceExportController) createServiceExportIfNotPresent(service *v1.Service) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who will consume the ServiceExports created here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short answer is anyone will be able to. I'm working in parallel with @nmittler on the implementation of the Istio MCS RFC. I took "phase 2", and there is another issue open for "phase 1" (#29384)

The consumption of ServiceExports is out of scope for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear how this works from the RFC, should wait for the draft implement.

As i read through the mcs-api desgin, it is not clear to me how this can solve multicluster topology issue.

Can you elaborate on how the EndpointSlice create is used? https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api#importing-services

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzxuzhonghu

The basic MCS flow is:

  1. Something generates a ServiceExport for a service in a cluster
  2. The MCS controller (Istio?) sees the ServiceExport and creates ServiceImport and EndpointSlices in all of the other clusters in the clusterset (i.e. mesh).
  3. In the other clusters, the ServiceImport and EndpointSlices are used by K8s to program kube proxy to allow service calls to span clusters.

What this PR is attempting to accomplish is 1. This simply creates the ServiceExport resource ... it does nothing with EndpointSlices. This will have no real impact on the system if there is no MCS controller performing the ServiceImport.

Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why Istio is responsible for this. I presume MCSD will have multiple implementors, otherwise its not really useful. Will linkerd, consul, etc all be sending similar PRs, writing the same exact implementation of the same thing in 3 slightly different ways?

I would have expected the MCSD repo to have a controller, which is either deployed standalone, or maybe, once its past experimental, linked into Istiod as a library to reduce the deployment complexity.

I am a bit concerned that every project is going to implement the same controller slightly different and we will not actually have a consistent API like was desired.

#Used for MCS serviceexport management
- apiGroups: [""]
resources: ["serviceexports"]
verbs: ["create", "delete"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and watch, probably update and patch will be needed as well?

@@ -79,6 +79,14 @@ type Multicluster struct {
secretNamespace string
secretController *secretcontroller.Controller
syncInterval time.Duration

//MCS Settings - settings for behavior of controllers relating to MCS
mcsSettings *MCSSettings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there is no need for this to be a pointer as far as I know

queue queue.Instance
serviceInformer cache.SharedInformer

clusterLocalHosts []string //hosts marked as cluster-local, which will not have serviceeexports generated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the API here. Why are we mixing an Istio API ("cluster local") with the MCSD api? Isn't the goal of MCSD that we are just another implementor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the goal of MCSD that we are just another implementor?

Yes, but one implementation detail that we get to determine is which services are exported (the API intentionally leaves that up to whatever entity is generating serviceExports). So we're piggybacking off of this Istio API with the assumption that cluster local hosts should not be exported for MCS.


_, err := sc.client.MulticlusterV1alpha1().ServiceExports(service.Namespace).Create(context.TODO(), &serviceExport, metav1.CreateOptions{})

if err != nil && strings.Contains(err.Error(), "already exists") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use errors.IsAlreadyExists

_, err := sc.client.MulticlusterV1alpha1().ServiceExports(service.Namespace).Create(context.TODO(), &serviceExport, metav1.CreateOptions{})

if err != nil && strings.Contains(err.Error(), "already exists") {
err = nil //This is the error thrown by the client if there is already an object with the name in the namespace. If that's true, we do nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the contents are wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this implementation, we are making the intentional choice that we don't care, and don't support cases in which some other entity (either human or 3rd party MCS controller) modifies a ServiceExport. So if it exists, we stop processing (to minimize cases where the other entity overwrite may be desired behavior).

}

func (sc *ServiceExportController) doInitialFullSync() {
allServices, err := sc.serviceClient.Services("").List(context.TODO(), metav1.ListOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling .List() is not acceptable for performance, we should be using the informer cache. Direct API calls like this have bit us many times in the past

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jan 1, 2021
@josephpeacock
Copy link
Contributor Author

@howardjohn @hzxuzhonghu Maybe this is something we should discuss at the next Environments working group meeting? I was under the impression that the approved RFC represented an agreement that we were going to build the specified functionality into Istio. In meeting with @nmittler, we decided that phases one and two from that document could be done in parallel, and that I would put together a PR encompassing only stage two.

I'm happy to shelve or permanently close this PR if it's agreed that building MCS functionality into Istio is the wrong direction for the project. You both had some very good specific feedback in the code as well, and I appreciate the prompt review. But I'd like for us to agree on the big picture direction before I spend more time rebasing and/or updating this PR.

@howardjohn
Copy link
Member

howardjohn commented Jan 4, 2021 via email

@josephpeacock
Copy link
Contributor Author

Fair enough, let me answer the questions that I feel that I can speak to:

I don't really understand why Istio is responsible for this. I presume MCSD will have multiple implementors, otherwise its not really useful. Will linkerd, consul, etc all be sending similar PRs, writing the same exact implementation of the same thing in 3 slightly different ways?

I think this is the intention of the k8s spec. They explicitly say in their document that anything should be able to be an MCS controller:

mcs-controller - A controller that syncs services across clusters and makes them available for multi-cluster service discovery and connectivity. There may be multiple implementations, this doc describes expected common behavior. The controller may be a single controller, multiple decentralized controllers, or a human using kubectl to create resources. This document aims to support any implementation that fulfills the behavioral expectations of this API.

In this case, I'd argue it's a question of value-add. Do we see value in building out MCS controller functionality in Istio? (with the alternative being having users need to use another MCS controller in tandem with Istio if they want that functionality). And I do think being able to use service names to route across multiple clusters would really simplify some multicluster Istio installations.

I would have expected the MCSD repo to have a controller, which is either deployed standalone, or maybe, once its past experimental, linked into Istiod as a library to reduce the deployment complexity.

The only official k8s MCS repo that I know of at the moment is for the MCS API. I don't know if there is a plan for an official k8s MCS controller, or if they are intending to leave it entirely up to the community.

I am a bit concerned that every project is going to implement the same controller slightly different and we will not actually have a consistent API like was desired.

You're probably right that there may be opportunities for a wider conversation about how specific the API is and should be moving forward on the k8s side. But I'm not sure if that's relevant to Istio's implementation, unless we want to wait to implement until the spec is more well-defined.


MCSServiceExportEnabled bool

ClusterLocalHosts []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this duplicate the configuration in MeshConfig?

@@ -232,6 +232,11 @@ func NewServer(args *PilotArgs) (*Server, error) {
s.initMeshNetworks(args, s.fileWatcher)
s.initMeshHandlers()

e.PushContext.Mesh = e.Mesh()
args.RegistryOptions.ClusterLocalHosts = e.PushContext.GetClusterLocalHosts(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't meshconfig be updated?

return sc.deleteServiceExportIfPresent(obj)
}

func convertToService(obj interface{}) (*v1.Service, error) {
Copy link
Contributor

@nmittler nmittler Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this elsewhere ... can we use a common function?

return cm, nil
}

func (sc *ServiceExportController) isServiceClusterLocal(service *v1.Service) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this function to the context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by this, can you clarify?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context is where you're getting information regarding cluster-local services. Could it just expose this method instead of defining it here?

//cannot use the auto-generated client as it hardcodes the namespace in the client struct, and we can't have one client per watched ns
err := sc.client.MulticlusterV1alpha1().ServiceExports(service.Namespace).Delete(context.TODO(), service.Name, metav1.DeleteOptions{})

if err != nil && strings.Contains(err.Error(), "not found") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a standard error here, similar to errors.IsAlreadyExists?

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jan 7, 2021
@josephpeacock
Copy link
Contributor Author

Ready for re-review.

func (sc *ServiceExportController) createServiceExportIfNotPresent(service *v1.Service) error {
serviceExport := v1alpha1.ServiceExport{}
serviceExport.Namespace = service.Namespace
serviceExport.Name = service.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add an owner reference to this? This would also eliminate the need for handling deletions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually something that is explicitly called out by the K8s MCS Spec, and was decided against. Instead, they chose to rely entirely on the name mapping. I tried to make this implementation match the spec as closely as possible.

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 11, 2021
@istio-testing
Copy link
Collaborator

@josephpeacock: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
integ-pilot-multicluster-tests_istio e8a4426 link /test integ-pilot-multicluster-tests_istio

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@nmittler
Copy link
Contributor

@josephpeacock thanks for all the help. I'm taking this over in #31401. Shall we just close this one out?

@josephpeacock
Copy link
Contributor Author

Yup, let's do that. Thanks for taking this over!

nmittler pushed a commit to nmittler/istio that referenced this pull request Mar 18, 2021
istio-testing pushed a commit that referenced this pull request Mar 18, 2021
Taken over from #29763

Co-authored-by: Joe Peacock <joe.peacock.joe@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/networking cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants