Deploy new epoxy-extension-server + setup_k8s.sh related changes (#236)
* Replaces old epoxy extension services with new server

The token-server and bmc-store-password ePoxy extensions are replaced
by a new ePoxy "extension server." Instead of shipping an individual
container image per extension, all extensions are now combined into a
single binary and container image that listens on a single port.
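
For illustration, a hedged sketch of how a client can request cluster
join data from the combined server; the endpoint, port, and payload
shape are taken from join-cluster.sh later in this commit, while the
project value is a placeholder:

    # Request cluster join data from the single-port extension server.
    # "mlab-sandbox" is a placeholder project.
    project="mlab-sandbox"
    extension_v1="{\"v1\":{\"hostname\":\"$(hostname)\",\"last_boot\":\"$(date --utc +%Y-%m-%dT%T.%NZ)\"}}"
    curl --data "$extension_v1" \
      "http://epoxy-extension-server.${project}.measurementlab.net:8800/v2/allocate_k8s_token"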

* Uses date string versions for images not in mlab-oti

See long comment in change set for details

* Deploys a single ePoxy extension server

Previously there were separate token-server and bmc-store-password
containers and systemd units. These have now been combined into a single
extension server that listens on a single port.

* Fixes naming violation for virtual images

* Installs apparmor package

flannel was failing to start on sandbox nodes, causing the nodes to be
in a NotReady state because networking was not ready. The pod
description had this event:

"Error: failed to create containerd container: get apparmor_parser
version: exec: "apparmor_parser": executable file not found in $PATH"

This appears to be related to some changes going on in containerd:

containerd/containerd#8087

* Fixes a syntax error in mount-data-api.sh

* Fixes ordering of control plane services

The create-control-plane.service is supposed to run _after_
mount-data-api, but that ordering was broken because the name of the
service changed and I failed to update the "After" block with the new
name.

* Don't fail the build when the cluster version is not available

If the query to the live cluster for its version fails, then don't
bother doing any version checking. The live cluster may not even exist,
and possibly needs the images from this build so that it can be created.
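
A rough sketch of the guard this describes, assuming the build queries
the live cluster with kubectl; the variable names and the comparison
step are illustrative, not the repository's actual build script:

    # Skip version checking entirely if the live cluster cannot be queried.
    live_version=$(kubectl version --output=json 2>/dev/null \
      | jq --raw-output '.serverVersion.gitVersion' || true)
    if [[ -z "${live_version}" || "${live_version}" == "null" ]]; then
      echo "Live cluster version unavailable; skipping version check."
    else
      echo "Live cluster is running ${live_version}."
      # ...compare ${live_version} against the versions built here...
    fi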

* Redundant check for working cluster before init

Adds an additional, redundant check for the existence of
/etc/kubernetes/admin.conf before initializing the cluster. A bug in our
config caused the service unit to run even though that file existed, and
kubeadm overwrote numerous things before finally erroring out. Can't
hurt to add the additional check in this file.

For nodes joining the cluster, wait for 90s (up from 60s) before trying
to join to give the primary control plane node time to finish setting
everything up. I discovered that 60s was not quite enough, and nodes
joining the control plane might get a connection refused from the
primary API endpoint.
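
As context, a hedged sketch of the kind of wait this refers to; the
endpoint is illustrative (the scripts derive it from project metadata
or from the allocate_k8s_token V2 response):

    # Illustrative load balancer endpoint, not taken from the live config.
    api_address="api-platform-cluster.mlab-sandbox.measurementlab.net:6443"
    # Wait until the API endpoint responds at all before attempting to join.
    until curl --silent --insecure --max-time 5 --output /dev/null \
        "https://${api_address}/healthz"; do
      sleep 5
    done
    # Even then, give the first control plane node time to finish housekeeping.
    sleep 90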

* Don't mkdir /etc/kubernetes/manifests on API machines

On control plane machines, /etc/kubernetes is supposed to be a symlink
to /mnt/cluster-data/kubernetes. When /etc/kubernetes already exists as
a regular directory, ln creates the symlink inside /etc/kubernetes
instead, breaking the configuration and the create-control-plane
service. In any case, on control plane nodes that directory will be
created automatically by kubeadm.
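
A hedged illustration of the failure mode and the kind of guard that
avoids it; the paths come from this commit message, but the guard
itself is a sketch rather than the repository's actual code:

    # Buggy: if /etc/kubernetes already exists as a regular directory, ln
    # creates the link *inside* it (/etc/kubernetes/kubernetes) and the
    # real path never points at the cluster data mount.
    #   ln --symbolic /mnt/cluster-data/kubernetes /etc/kubernetes

    # Safer: only create the link when nothing is in the way; on control
    # plane nodes kubeadm creates the directory itself.
    if [[ ! -e /etc/kubernetes ]]; then
      ln --symbolic /mnt/cluster-data/kubernetes /etc/kubernetes
    fi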

* Makes setup_k8s.sh parse allocate_k8s_token v2 data

ePoxy extension allocate_k8s_token V2 returns all the data needed to
join the cluster. This commit removes all templating from setup_k8s.sh
and moves it into the physical image filesystem. It is now a static
script which can fetch everything it needs from allocate_k8s_token V2.
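
For reference, a hedged example of the V2 response shape, inferred from
the jq fields the updated scripts extract; the values are placeholders:

    # Placeholder response body; the real server returns live values.
    join_data='{"api_address":"<host>:6443","ca_hash":"sha256:<hash>","token":"<bootstrap-token>"}'

    api_address=$(echo "$join_data" | jq -r '.api_address')
    ca_hash=$(echo "$join_data" | jq -r '.ca_hash')
    token=$(echo "$join_data" | jq -r '.token')

    kubeadm join "$api_address" --token "$token" \
      --discovery-token-ca-cert-hash "$ca_hash"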

* Makes setup_k8s.sh executable

* Refactors the join-cluster.sh script

Previously, the script assumed that all VMs were going to be part of a
MIG. We have decided to have a hybrid approach with both MIGs and
standard VMs, which required a few changes.

Additionally, configure the script to use the V2 allocate_k8s_token ePoxy
extension, which returns all the data needed to join the cluster, not
just the token. This also required some refactoring of the code.
nkinkade committed May 9, 2023
1 parent 8e69c8f commit ca6a0b1
Showing 14 changed files with 149 additions and 132 deletions.
9 changes: 1 addition & 8 deletions actions/stage3_ubuntu/stage3post.json
@@ -3,15 +3,8 @@
"env": {
"PATH": "/bin:/usr/bin:/usr/local/bin"
},
"files": {
"setup_k8s": {
"url": "https://storage.googleapis.com/epoxy-{{kargs `epoxy.project`}}/latest/stage3_ubuntu/setup_k8s.sh"
}
},
"commands": [
"# Make setup_k8s.sh executable and then run with the epoxy.ipv4 config",
"/usr/bin/chmod 755 {{.files.setup_k8s.name}}",
"{{.files.setup_k8s.name}} {{kargs `epoxy.project`}} {{kargs `epoxy.ipv4`}} {{kargs `epoxy.hostname`}} {{kargs `epoxy.allocate_k8s_token`}}"
"/opt/mlab/bin/setup_k8s.sh {{kargs `epoxy.project`}} {{kargs `epoxy.ipv4`}} {{kargs `epoxy.hostname`}} {{kargs `epoxy.allocate_k8s_token`}}"
]
}
}
31 changes: 15 additions & 16 deletions ...buntu/opt/mlab/conf/setup_k8s.sh.template → ...s/stage3_ubuntu/opt/mlab/bin/setup_k8s.sh
100644 → 100755
@@ -6,11 +6,10 @@ ln --force --symbolic /var/log/setup_k8s.log-$curr_date /var/log/setup_k8s.log

set -euxo pipefail

# This script is intended to be called by epoxy as the action for the last stage
# in the boot process. The actual epoxy config that calls this file can be
# found at:
# https://github.com/m-lab/epoxy-images/blob/dev/actions/stage3_coreos/stage3post.json

# This script is intended to be called by epoxy_client as the action for the
# last stage in the boot process. The actual epoxy config that calls this file
# can be found at:
# https://github.com/m-lab/epoxy-images/blob/main/actions/stage3_ubuntu/stage3post.json
# This should be the final step in the boot process. Prior to this script
# running, we should have made sure that the disk is partitioned appropriately
# and mounted in the right places (one place to serve as a cache for Docker
@@ -33,10 +32,6 @@ METRO="${SITE/[0-9]*/}"
# Also, be 100% sure /sbin and /usr/sbin are in PATH.
export PATH=$PATH:/sbin:/usr/sbin:/opt/bin:/opt/mlab/bin

# Make sure to download any and all necessary auth tokens prior to this point.
# It should be a simple wget from the control plane node to make that happen.
LOAD_BALANCER="api-platform-cluster.${GCP_PROJECT}.measurementlab.net"

# Capture K8S version for later usage.
RELEASE=$(kubelet --version | awk '{print $2}')

@@ -52,9 +47,9 @@ mkdir --parents /etc/kubernetes/manifests

systemctl daemon-reload

# Fetch k8s token via K8S_TOKEN_URL. Curl should report most errors to stderr,
# so write stderr to a file so we can read any error code.
TOKEN=$( curl --fail --silent --show-error -XPOST --data-binary "{}" \
# Fetch k8s cluster join data from K8S_TOKEN_URL. Curl should report most errors
# to stderr, so write stderr to a file so we can read any error code.
JOIN_DATA=$( curl --fail --silent --show-error -XPOST --data-binary "{}" \
${K8S_TOKEN_URL} 2> $K8S_TOKEN_ERROR_FILE )
# If there was an error and the error was 408 (Request Timeout), then reboot
# the machine to reset the token timeout.
@@ -63,10 +58,14 @@ if [[ -n $ERROR_408 ]]; then
/sbin/reboot
fi

kubeadm join "${LOAD_BALANCER}:6443" \
--v 4 \
--token "${TOKEN}" \
--discovery-token-ca-cert-hash {{CA_CERT_HASH}}
# $JOIN_DATA should contain a simple JSON block with all the information needed
# to join the cluster.
api_address=$(echo "$JOIN_DATA" | jq -r '.api_address')
ca_hash=$(echo "$JOIN_DATA" | jq -r '.ca_hash')
token=$(echo "$JOIN_DATA" | jq -r '.token')

kubeadm join "$api_address" --v 4 --token "$token" \
--discovery-token-ca-cert-hash "$ca_hash"

systemctl daemon-reload
systemctl enable kubelet

This file was deleted.

@@ -1,6 +1,6 @@
[Unit]
Description=Initialize the kubernetes control plane
After=kubelet.service mount-cluster-data.service
After=kubelet.service mount-data-api.service
Requires=mount-data-api.service
# The presence of this file indicates that the cluster has already been created.
# If it exists, do not run this unit.
@@ -0,0 +1,24 @@
[Unit]
Description=ePoxy extension server
After=docker.service mount-data-api.service
Requires=docker.service mount-data-api.service

# Run the ePoxy extension server (supporting the ePoxy Extension API).
#
# Mount /opt/bin so that the container has access to kubeadm, and
# /etc/kubernetes so that kubeadm has access to admin.conf
[Service]
TimeoutStartSec=120
Restart=always
ExecStartPre=-/usr/bin/docker stop %N
ExecStartPre=-/usr/bin/docker rm %N
ExecStart=/usr/bin/docker run --publish 8800:8800 \
--volume /etc/kubernetes:/etc/kubernetes:ro \
--volume /opt/bin:/opt/bin:ro \
--name %N -- \
measurementlab/epoxy-extensions:v0.4.0 \
-bin-dir /opt/bin
ExecStop=/usr/bin/docker stop %N

[Install]
WantedBy=multi-user.target
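
For reference, a hedged sketch of enabling and checking the unit above
on an API machine (configure_image_api.sh below enables it at image
build time); the verification commands are illustrative:

    systemctl enable --now epoxy-extension-server.service
    # Confirm the container is running and the single port is listening.
    docker ps --filter name=epoxy-extension-server
    ss -tlnp | grep 8800
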
28 changes: 0 additions & 28 deletions configs/virtual_ubuntu/etc/systemd/system/token-server.service

This file was deleted.

22 changes: 13 additions & 9 deletions configs/virtual_ubuntu/opt/mlab/bin/create-control-plane.sh
@@ -33,6 +33,18 @@ k8s_version=$(kubectl version --client=true --output=json | jq --raw-output '.cl
# The internal DNS name of this machine.
internal_dns="api-platform-cluster-${zone}.${zone}.c.${project}.internal"

# If this file exists, then the cluster must already be initialized. The
# systemd service unit file that runs this script also has a conditional check
# for this file and should not run if it exists. This is just a backup,
# redundant check, just in case for some reason the file exists but the service
# unit gets run anyway. This happened to me (kinkade), where a small bug in the
# configurations caused this service to run, even though this file existed, and
# kubeadm overwrote that file and others before finally erroring out due to a
# preflight check failure.
if [[ -f /etc/kubernetes/admin.conf ]]; then
exit 0
fi

# Evaluate the kubeadm config template
sed -e "s|{{PROJECT}}|${project}|g" \
-e "s|{{INTERNAL_IP}}|${internal_ip}|g" \
@@ -165,21 +177,13 @@ function initialize_cluster() {
gcloud compute project-info add-metadata --metadata "lb_dns=${lb_dns}" --project $project
gcloud compute project-info add-metadata --metadata "token_server_dns=${token_server_dns}" --project $project

# Add the current CA cert hash to the setup_k8s.sh script which physical
# platform nodes use to join the cluster, then push the evaluated template to
# GCS.
sed -e "s/{{CA_CERT_HASH}}/${ca_cert_hash}/" /opt/mlab/conf/setup_k8s.sh.template > setup_k8s.sh
cache_control="Cache-Control:private, max-age=0, no-transform"
gsutil -h "$cache_control" cp setup_k8s.sh "gs://epoxy-${project}/latest/stage3_ubuntu/setup_k8s.sh"

# TODO (kinkade): the only thing using these admin cluster credentials is
# Cloud Build for the k8s-support repository, which needs to apply
# workloads to the cluster. We need to find a better way for Cloud Build to
# authenticate to the cluster so that we don't have to store admin cluster
# credentials in GCS.
gsutil -h "$cache_control" cp /etc/kubernetes/admin.conf "gs://k8s-support-${project}/admin.conf"


# Apply the flannel DaemonSets and related resources to the cluster so that
# cluster networking will come up. Without it, nodes will never consider
# themselves ready.
@@ -219,7 +223,7 @@ function join_cluster() {
# Once the first API endpoint is up, it still has some housekeeping work to
# do before other control plane machines are ready to join the cluster. Give
# it a bit to finish.
sleep 60
sleep 90

token=$(get_bootstrap_token)
ca_cert_hash=$(
78 changes: 46 additions & 32 deletions configs/virtual_ubuntu/opt/mlab/bin/join-cluster.sh
@@ -12,13 +12,36 @@ METADATA_URL="http://metadata.google.internal/computeMetadata/v1"
CURL_FLAGS=(--header "Metadata-Flavor: Google" --silent)

# Collect data necessary to proceed.
epoxy_extension_server="epoxy-extension-server.${project}.measurementlab.net"
external_ip=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/instance/network-interfaces/0/access-configs/0/external-ip")
hostname=$(hostname)
k8s_labels=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/instance/attributes/k8s_labels")
k8s_node=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/instance/attributes/k8s_node")
lb_dns=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/attributes/lb_dns")
project=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/project-id")
token_server_dns=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/attributes/token_server_dns")

# MIG instances will have an "instance-template" attribute, other VMs will not.
# Record the HTTP status code of the request into a variable. 200 means
# "instance-template" exists and that this is a MIG instance. 404 means it is
# not part of a MIG. We use this below to determine whether to attempt to
# append the unique 4 char suffix of MIG instances to the k8s node name.
is_mig=$(
curl "${CURL_FLAGS[@]}" --output /dev/null --write-out "%{http_code}" \
http://metadata.google.internal/computeMetadata/v1/instance/attributes/instance-template
)

# If this is a MIG instance, determine the random 4 char suffix of the instance
# name, and then append that to the base k8s node name. The result should be a
# typical M-Lab node/DNS name with a "-<xxxx>" string on the end. With this,
# the node name is still unique, but we can easily just strip off the last 5
# characters to get the name of the load balancer. Among other things, the
# uuid-annotator can use this value as its -hostname flag so that it knows how
# to annotate the data on this MIG instance.
node_name="$k8s_node"
if [[ $is_mig == "200" ]]; then
node_suffix="${hostname##*-}"
node_name="${k8s_node}-${node_suffix}"
fi

# Don't try to join the cluster until at least one control plane node is ready.
# Keep trying this forever, until it succeeds, as there is no point in going
@@ -39,52 +62,43 @@ done
# Wait a while after the control plane is accessible on the API port, since in
# the case where the cluster is being initialized, there are a few housekeeping
# items to handle, such as uploading the latest CA cert hash to the project metadata.
sleep 60

sleep 90

# Generate a JSON snippet suitable for the token-server, and then request a
# token. https://github.com/m-lab/epoxy/blob/main/extension/request.go#L36
extension_v1="{\"v1\":{\"hostname\":\"${hostname}\",\"last_boot\":\"$(date --utc +%Y-%m-%dT%T.%NZ)\"}}"

# Fetch a token from the token-server.
# Fetch cluster bootstrap join data from the epoxy-extension-server.
#
# TODO (kinkade): this only works from within GCP, so is not a long term
# solution. It is just a stop-gap to get GCP VMs able to join the cluster until
# we have implemented a more global solution that will support any cloud
# provider. Going through ePoxy will not work for VMs in a managed instance
# group (MIG), since siteinfo, ePoxy and Locate will only know about the load
# balancer IP address, not the possibly ephemeral public IP of an auto-scaled
# instance in a MIG.
token=$(curl --data "$extension_v1" "http://${token_server_dns}:8800/v1/allocate_k8s_token" || true)
# TODO (kinkade): here we are querying the epoxy-extension-server directly
# through the GCP private network. This only works from within GCP, so is not a
# long term solution. It is just a stop-gap to get GCP VMs able to join the
# cluster until we have implemented a more global solution that will support
# any cloud provider. Additionally, going through ePoxy will not work for VMs
# in a managed instance group (MIG), since siteinfo, ePoxy and Locate will only
# know about the load balancer IP address, not the possibly ephemeral public IP
# of an auto-scaled instance in a MIG.
join_data=$(
curl --data "$extension_v1" "http://${epoxy_extension_server}:8800/v2/allocate_k8s_token" || true
)

if [[ -z $token ]]; then
echo "Failed to get a cluster bootstrap join token from the token-server"
if [[ -z $join_data ]]; then
echo "Failed to get cluster bootstrap join data from the epoxy-extension-server"
exit 1
fi

# TODO (kinkade): this is GCP specific and will not work outside of GCP. This
# will have to be made more generic before we can join VMs from other cloud
# providers. A current proposal is to have the token-server return not only a
# token, but also the CA cert hash, but this has yet to be implemented.
#
# Fetch the ca_cert_hash stored in project metadata.
ca_cert_hash=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/attributes/platform_cluster_ca_hash")
# $JOIN_DATA should contain a simple JSON block with all the information needed
# to join the cluster.
api_address=$(echo "$join_data" | jq -r '.api_address')
ca_hash=$(echo "$join_data" | jq -r '.ca_hash')
token=$(echo "$join_data" | jq -r '.token')

# Set up necessary labels for the node.
sed -ie "s|KUBELET_KUBECONFIG_ARGS=|KUBELET_KUBECONFIG_ARGS=--node-labels=$k8s_labels |g" \
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# Determine the random 4 char suffix of the instance name, and then append
# that to the base k8s node name. The result should be a typical M-Lab node/DNS
# name with a "-<xxxx>" string on the end. With this, the node name is still
# unique, but we can easily just strip off the last 5 characters to get the name
# of the load balancer. Among other things, the uuid-annotator can use this
# value as its -hostname flag so that it knows how to annotate the data on this
# MIG instance.
node_suffix="${hostname##*-}"
node_name="${k8s_node}-${node_suffix}"

kubeadm join $lb_dns:6443 --token $token --discovery-token-ca-cert-hash $ca_cert_hash --node-name $node_name
kubeadm join "$api_address" --v 4 --token "$token" \
--discovery-token-ca-cert-hash "$ca_hash" --node-name $node_name

# https://github.com/flannel-io/flannel/blob/master/Documentation/kubernetes.md#annotations
kubectl --kubeconfig /etc/kubernetes/kubelet.conf annotate node $node_name \
2 changes: 1 addition & 1 deletion configs/virtual_ubuntu/opt/mlab/bin/mount-data-api.sh
@@ -29,7 +29,7 @@ dev_name=$(
if [[ -z $dev_name ]]; then
echo "Failed to determine the persistent disk device name"
exit 1
}
fi
dev_path="/dev/disk/by-id/google-${dev_name}"

# If the disk isn't formatted, then format it.
9 changes: 9 additions & 0 deletions packer/configure_image.sh
@@ -10,6 +10,15 @@ set -euxo pipefail
# experiments.
mkdir -p /var/local/metadata

# If this directory doesn't exist, then the kubelet complains bitterly,
# polluting the logs terribly. On control plane nodes this will be created
# automatically by the kubeadm. The cluster kubelet config is generated by
# kubeadm when the cluster is initialized and stored as a configmap, which all
# other nodes will download and use. We can't remove staticPodPath from the
# kubelet config, because control plane kubelets use it, so just create the
# directory on every node to avoid log pollution.
mkdir -p /etc/kubernetes/manifests

# Enable systemd units
systemctl enable check-reboot.service
systemctl enable check-reboot.timer
3 changes: 1 addition & 2 deletions packer/configure_image_api.sh
@@ -28,7 +28,6 @@ echo -e "\nexport KUBECONFIG=/etc/kubernetes/admin.conf\n" >> /root/.bashrc
systemctl enable docker
systemctl enable reboot-api-node.service
systemctl enable reboot-api-node.timer
systemctl enable token-server.service
systemctl enable bmc-store-password.service
systemctl enable epoxy-extension-server.service
systemctl enable mount-data-api.service
systemctl enable create-control-plane.service
10 changes: 1 addition & 9 deletions packer/configure_image_common.sh
@@ -22,6 +22,7 @@ sed -i -e '/secure_path/ s|"$|:/opt/bin"|' /etc/sudoers
# Install required packages.
apt update
apt install -y \
apparmor \
busybox \
conntrack \
containerd \
@@ -65,14 +66,5 @@ curl --silent --show-error --location \
# For convenience, when an operator needs to login and inspect things with crictl.
echo -e "\nexport CONTAINER_RUNTIME_ENDPOINT=unix:///run/containerd/containerd.sock\n" >> /root/.bashrc

# If this directory doesn't exist, then the kubelet complains bitterly,
# polluting the logs terribly. On control plane nodes this will be created
# automatically by the kubeadm. The cluster kubelet config is generated by
# kubeadm when the cluster is initialized and stored as a configmap, which all
# other nodes will download and use. We can't remove staticPodPath from the
# kubelet config, because control plane kubelets use it, so just create the
# directory on every node to avoid log pollution.
mkdir -p /etc/kubernetes/manifests

# Enable systemd units
systemctl enable kubelet.service
