[Kubeflow 1.9] Distributions and Kubeflow 1.9 #2611

Open
rimolive opened this issue Jan 25, 2024 · 14 comments

@rimolive
Member

rimolive commented Jan 25, 2024

This issue will be used to track the progress of and coordinate with distributions along the 1.9 release.

While we hope all distros will manage to be ready when the KF 1.9 release is out, this is sometimes difficult to achieve. In this issue, we want to both keep track of the progress of distributions towards the KF 1.9 release and also know which of the distros will be working on KF 1.9 (testing during the distribution testing cycle) even if they can't meet the KF 1.9 deadline.

Tagging distribution owners identified from previous releases (any new or missed distro owners, please comment on this issue):

Distribution | Representative(s) | State
AWS | @surajkota | not participating in 1.9
Charmed Kubeflow | @DnPlas | participating in 1.9
Google Cloud | @gkcalat, @zijianjoy, @Linchin | not participating in 1.9
IBM IKS | @Tomcli, @yhwang | participating in 1.9
Microsoft | (none listed) | not participating in 1.9
Nutanix | @johnugeorge, @nagar-ajay | participating in 1.9
Red Hat OpenShift AI | @rimolive | participating in 1.9
Oracle Cloud Infrastructure | @julioo | not participating in 1.9
DeployKF | @thesuperzapper | participating in 1.9
VMware | @liuqi, @xujinheng | participating in 1.9
QBO | @alexeadem | participating in 1.9

Please let us know if you'll be participating in the 1.9 release by answering the following questions:

  • Are you planning on having your distro ready in sync with the KF 1.9 release?
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
  • If you cannot participate, when can the community expect your distro to be ready for release 1.9?

Please note the release timelines are being discussed in #2606.

cc @kubeflow/release-team @jbottum

kubeflow-bot added this to To Do in Needs Triage Jan 25, 2024
@ca-scribner

@rimolive can you remove @DnPlas from Charmed Kubeflow and replace her with me? ty!

to your questions, for Charmed Kubeflow:

  • Are you planning on having your distro ready in sync with the KF 1.9 release?
    • yes
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
    • yes

@thesuperzapper
Member

@rimolive deployKF will participate in 1.9, but it's not 100% clear exactly what that will look like.


Separately, given "Kubeflow on AWS" did not participate in 1.8, and announced they were no longer supporting their distribution in awslabs/kubeflow-manifests#794, I think it's unlikely they will do 1.9?

Given this, I proposed moving them to "legacy" on the Kubeflow website in PR kubeflow/website#3641.

However, I also want to avoid confusion with users, because they might think that Kubeflow no longer supports AWS due to the "Kubeflow on AWS" name. So I also think we should merge kubeflow/website#3643 at the same time, which tells users that "Kubeflow on XXXX" is just a name, and NOT the ONLY way to use Kubeflow on that platform.

@yhwang
Member

yhwang commented Jan 31, 2024

For IBM IKS:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@liuqi

liuqi commented Feb 3, 2024

For VMware Distro:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@alexeadem

For QBO Distro:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@tiansiyuan

For VMware Distro:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@rimolive
Member Author

rimolive commented May 6, 2024

Calling all Distribution owners! I'm proud to announce our first Release Candidate for Kubeflow 1.9!

You can find the release details at the following URL:

https://github.com/kubeflow/manifests/releases/tag/v1.9.0-rc.0

We'll be working on another Release Candidate when we have Notebooks and KServe Models Webapp updated and ready for KF 1.9. We can use this issue to keep track of blocker issues for distributions while we work on fixing them.

cc @ca-scribner @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard

@juliusvonkohout
Member

juliusvonkohout commented May 7, 2024

We also have to update cert-manager, knative, istio, seldon, bentoml etc which will come in later RCs.

@StefanoFioravanzo
Member

@ca-scribner @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard Can you please acknowledge that you are aware of Kubeflow 1.9 RC0 and that the distribution testing phase has started? Please react with a thumbs up if everything is okay on your side and you are proceeding with testing.

@thesuperzapper
Member

thesuperzapper commented May 13, 2024

deployKF is mostly waiting on the updates from Notebooks (kubeflow/kubeflow#7453), but I am aware that a 1.9.0-RC0 was cut with other components.

@alexeadem

alexeadem commented May 13, 2024

What do we mean by '(around 1.28)' here: https://github.com/kubeflow/manifests/tree/v1.9.0-rc.0?tab=readme-ov-file#prerequisites

Is that v1.28.0 and v1.27.11?

I'm proceeding with the testing in QBO.

OK: Everything is looking good in QBO. Tested by doing a vector addition test.

Details:

git branch
* (HEAD detached at v1.9.0-rc.0)

In Kubernetes v1.28.0:

qbo get nodes kubeflow_v1_9_0_nvidia | jq .nodes[]?.image
"kindest/node:v1.28.0"
"kindest/node:v1.28.0"
"kindest/node:v1.28.0"

with NVIDIA GPU Operator

helm list -n gpu-operator
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
gpu-operator-1715634796 gpu-operator    1               2024-05-13 21:13:18.636880948 +0000 UTC deployed        gpu-operator-v24.3.0    v24.3.0 
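Helm's two warnings above mean the QBO kubeconfig is readable by its group and by everyone else. A minimal POSIX sh sketch that restricts it to the owner (mode 600), which should silence those warnings; `tighten_kubeconfig` is a hypothetical helper, not part of QBO or Helm:

```sh
# Hypothetical helper: restrict a kubeconfig to owner read/write (mode 600),
# clearing the group-readable and world-readable bits Helm warns about.
tighten_kubeconfig() {
  if [ ! -f "$1" ]; then
    echo "kubeconfig not found: $1" >&2
    return 1
  fi
  chmod 600 "$1"
}

# Example (path taken from the warning above, not executed here):
#   tighten_kubeconfig "$HOME/.qbo/kubeflow_v1_9_0_nvidia.conf"
```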

And Kustomize

./kustomize version
v5.4.1
  • There was only one small change I had to make:

It looks like platform-agnostic-multi-user-pns is no longer available:
./kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user-pns | kubectl apply -f -

as per kubeflow/pipelines#5285

So I used the following instead. I'll update the QBOT installer for this version.
./kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user | kubectl apply -f -
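When an installer targets several manifests releases, an overlay that disappears between releases (as platform-agnostic-multi-user-pns did here) can break a piped install partway through. A minimal POSIX sh sketch of a pre-flight check, assuming a local checkout of kubeflow/manifests; `resolve_pipeline_overlay` is a hypothetical helper, not part of the manifests repo or the QBO installer:

```sh
# Hypothetical helper: print the path of a KFP overlay inside a
# kubeflow/manifests checkout, or fail with a message if it does not exist.
# The ( ... ) body runs in a subshell so the variables do not leak.
resolve_pipeline_overlay() (
  manifests_root="$1"
  overlay="$2"
  path="$manifests_root/apps/pipeline/upstream/env/$overlay"
  if [ ! -d "$path" ]; then
    echo "overlay not found: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
)

# Example (not executed here), from the root of the manifests checkout:
#   overlay="$(resolve_pipeline_overlay . platform-agnostic-multi-user)" &&
#     ./kustomize build "$overlay" | kubectl apply -f -
# In bash, `set -o pipefail` also makes a failing kustomize build fail the
# whole pipeline instead of only reporting kubectl's exit status.
```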

This is what was deployed:

kubectl get pods --all-namespaces -o jsonpath="{..image}" | sed 's/ /\n/g' | sort | uniq
docker.io/istio/pilot:1.17.5
docker.io/istio/proxyv2:1.17.5
docker.io/kindest/kindnetd:v20220726-ed811e41
docker.io/kindest/local-path-provisioner:v0.0.22-kind.0
docker.io/kserve/kserve-controller:v0.12.1
docker.io/kserve/models-web-app:v0.10.0
docker.io/kubeflow/training-operator:v1-f8f7363
docker.io/kubeflowkatib/katib-controller:v0.17.0-rc.0
docker.io/kubeflowkatib/katib-db-manager:v0.17.0-rc.0
docker.io/kubeflowkatib/katib-ui:v0.17.0-rc.0
docker.io/kubeflownotebookswg/centraldashboard:v1.8.0
docker.io/kubeflownotebookswg/jupyter-scipy:v1.8.0
docker.io/kubeflownotebookswg/jupyter-web-app:v1.8.0
docker.io/kubeflownotebookswg/kfam:v1.8.0
docker.io/kubeflownotebookswg/notebook-controller:v1.8.0
docker.io/kubeflownotebookswg/poddefaults-webhook:v1.8.0
docker.io/kubeflownotebookswg/profile-controller:v1.8.0
docker.io/kubeflownotebookswg/pvcviewer-controller:v1.8.0
docker.io/kubeflownotebookswg/tensorboard-controller:v1.8.0
docker.io/kubeflownotebookswg/tensorboards-web-app:v1.8.0
docker.io/kubeflownotebookswg/volumes-web-app:v1.8.0
docker.io/library/mysql:8.0.29
docker.io/library/python:3.7
docker.io/metacontrollerio/metacontroller:v2.0.4
gcr.io/knative-releases/knative.dev/eventing/cmd/controller@sha256:92967bab4ad8f7d55ce3a77ba8868f3f2ce173c010958c28b9a690964ad6ee9b
gcr.io/knative-releases/knative.dev/eventing/cmd/webhook@sha256:ebf93652f0254ac56600bedf4a7d81611b3e1e7f6526c6998da5dd24cdc67ee1
gcr.io/knative-releases/knative.dev/net-istio/cmd/controller@sha256:421aa67057240fa0c56ebf2c6e5b482a12842005805c46e067129402d1751220
gcr.io/knative-releases/knative.dev/net-istio/cmd/webhook@sha256:bfa1dfea77aff6dfa7959f4822d8e61c4f7933053874cd3f27352323e6ecd985
gcr.io/knative-releases/knative.dev/serving/cmd/activator@sha256:c2994c2b6c2c7f38ad1b85c71789bf1753cc8979926423c83231e62258837cb9
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler@sha256:8319aa662b4912e8175018bd7cc90c63838562a27515197b803bdcd5634c7007
gcr.io/knative-releases/knative.dev/serving/cmd/controller@sha256:98a2cc7fd62ee95e137116504e7166c32c65efef42c3d1454630780410abf943
gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping-webhook@sha256:7368aaddf2be8d8784dc7195f5bc272ecfe49d429697f48de0ddc44f278167aa
gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping@sha256:f66c41ad7a73f5d4f4bdfec4294d5459c477f09f3ce52934d1a215e32316b59b
gcr.io/knative-releases/knative.dev/serving/cmd/webhook@sha256:4305209ce498caf783f39c8f3e85dfa635ece6947033bf50b0b627983fd65953
gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
gcr.io/ml-pipeline/api-server:2.2.0
gcr.io/ml-pipeline/cache-deployer:2.2.0
gcr.io/ml-pipeline/cache-server:2.2.0
gcr.io/ml-pipeline/frontend:2.2.0
gcr.io/ml-pipeline/metadata-envoy:2.2.0
gcr.io/ml-pipeline/metadata-writer:2.2.0
gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
gcr.io/ml-pipeline/mysql:8.0.26
gcr.io/ml-pipeline/persistenceagent:2.2.0
gcr.io/ml-pipeline/scheduledworkflow:2.2.0
gcr.io/ml-pipeline/viewer-crd-controller:2.2.0
gcr.io/ml-pipeline/visualization-server:2.2.0
gcr.io/ml-pipeline/workflow-controller:v3.4.16-license-compliance
gcr.io/tfx-oss-public/ml_metadata_store_server:1.14.0
ghcr.io/dexidp/dex:v2.36.0
kserve/kserve-controller:v0.12.1
kserve/models-web-app:v0.10.0
kubeflow/training-operator:v1-f8f7363
kubeflownotebookswg/jupyter-scipy:v1.8.0
mysql:8.0.29
nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0
nvcr.io/nvidia/gpu-operator:v24.3.0
nvcr.io/nvidia/k8s-device-plugin:v0.15.0-ubi8
nvcr.io/nvidia/k8s/container-toolkit:v1.15.0-ubuntu20.04
nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
python:3.7
quay.io/jetstack/cert-manager-cainjector:v1.12.2
quay.io/jetstack/cert-manager-controller:v1.12.2
quay.io/jetstack/cert-manager-webhook:v1.12.2
quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
registry.k8s.io/coredns/coredns:v1.10.1
registry.k8s.io/etcd:3.5.9-0
registry.k8s.io/kube-apiserver:v1.28.0
registry.k8s.io/kube-controller-manager:v1.28.0
registry.k8s.io/kube-proxy:v1.28.0
registry.k8s.io/kube-scheduler:v1.28.0
registry.k8s.io/nfd/node-feature-discovery:v0.15.4
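The listing above contains some images twice, once with and once without a Docker Hub prefix (for example kserve/kserve-controller:v0.12.1 and docker.io/kserve/kserve-controller:v0.12.1, or mysql:8.0.29 and docker.io/library/mysql:8.0.29). This is most likely because the recursive `{..image}` JSONPath matches both the image written in each pod spec and the fully qualified image the runtime reports in the container status. A minimal POSIX sh sketch that strips those default prefixes before deduplicating; `normalize_images` is a hypothetical helper:

```sh
# Hypothetical helper: read whitespace-separated image references on stdin,
# drop Docker Hub's default prefixes, and print each distinct image once.
normalize_images() {
  tr -s '[:space:]' '\n' \
    | sed -E \
        -e '/^$/d' \
        -e 's#^docker\.io/library/##' \
        -e 's#^docker\.io/##' \
    | sort -u
}

# Example (not executed here):
#   kubectl get pods --all-namespaces -o jsonpath='{..image}' | normalize_images
# Alternatively, read only the pod specs (init containers are not included):
#   kubectl get pods --all-namespaces \
#     -o jsonpath='{.items[*].spec.containers[*].image}' | normalize_images
```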

@juliusvonkohout
Member

juliusvonkohout commented May 14, 2024

@alexeadem please check the updated release notes
https://github.com/kubeflow/manifests/releases/tag/v1.9.0-rc.0 (Kubernetes 1.27-1.29 are officially supported).
Yes, we made emissary the default executor in 1.7 or 1.8.
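To check a cluster against the officially supported range mentioned above, a minimal POSIX sh sketch; `k8s_minor_supported` is a hypothetical helper, and the 1.27-1.29 range is hard-coded from this comment rather than read from the release notes. Both versions asked about earlier (v1.28.0 and v1.27.11) fall inside it.

```sh
# Hypothetical helper: succeed if a Kubernetes version string such as
# "v1.28.0" or "v1.27.11" falls in the 1.27-1.29 range quoted above.
# The ( ... ) body runs in a subshell so the variables do not leak.
k8s_minor_supported() (
  version="${1#v}"              # v1.28.0 -> 1.28.0
  major="${version%%.*}"        # 1.28.0  -> 1
  rest="${version#*.}"          # 1.28.0  -> 28.0
  minor="${rest%%[!0-9]*}"      # 28.0    -> 28
  [ "$major" = 1 ] && [ -n "$minor" ] &&
    [ "$minor" -ge 27 ] && [ "$minor" -le 29 ]
)

# Example (not executed here; requires kubectl and jq):
#   k8s_minor_supported "$(kubectl version -o json | jq -r .serverVersion.gitVersion)" \
#     && echo "supported" || echo "outside 1.27-1.29"
```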

@DnPlas
Contributor

DnPlas commented May 21, 2024

Hi @rimolive @StefanoFioravanzo, a couple of things:

  1. Could I please ask to replace @ca-scribner with me as the distribution owner?
  2. We are aware that the distribution testing phase has started, but we have identified that components from the kubeflow/kubeflow repository are missing. Is this something coming in another RC? Is this planned?

@rimolive
Member Author

rimolive commented May 23, 2024

> Hi @rimolive @StefanoFioravanzo, a couple of things:
>
>   1. Could I please ask to replace @ca-scribner with me as the distribution owner?

Done

>   2. We are aware that the distribution testing phase has started, but we have identified that components from the kubeflow/kubeflow repository are missing. Is this something coming in another RC? Is this planned?

We decided to move on with rc0 because many components were upgraded, but there's a plan for rc1 with the remaining components. Is there a specific one you are expecting to test?

Labels: None yet
Projects: Status: In Progress
Development: No branches or pull requests
10 participants