Deploy new epoxy-extension-server + setup_k8s.sh related changes (#236)
* Replaces old epoxy extension services with new server

The token-server and bmc-store-password ePoxy extensions are replaced
by a new ePoxy "extension server." Instead of shipping an individual
container image per extension, all extensions are now combined into a
single binary and container image that listens on a single port.
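
For illustration, a hedged sketch of how a client can request cluster
join data from the combined server; the endpoint, port, and payload
shape are taken from join-cluster.sh later in this commit, while the
project value is a placeholder:

    # Request cluster join data from the single-port extension server.
    # "mlab-sandbox" is a placeholder project.
    project="mlab-sandbox"
    extension_v1="{\"v1\":{\"hostname\":\"$(hostname)\",\"last_boot\":\"$(date --utc +%Y-%m-%dT%T.%NZ)\"}}"
    curl --data "$extension_v1" \
      "http://epoxy-extension-server.${project}.measurementlab.net:8800/v2/allocate_k8s_token"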

* Uses date string versions for images not in mlab-oti

See long comment in change set for details

* Deploys a single ePoxy extension server

Previously there were separate token-server and bmc-store-password
containers and systemd units. These have now been combined into a single
extension server that listens on a single port.

* Fixes naming violation for virtual images

* Installs apparmor package

flannel was failing to start on sandbox nodes, causing the nodes to be
in a NotReady state because networking was not ready. The pod
description had this event:

"Error: failed to create containerd container: get apparmor_parser
version: exec: "apparmor_parser": executable file not found in $PATH"

This appears to be related to some changes going on in containerd:

containerd/containerd#8087

* Fixes a syntax error in mount-data-api.sh

* Fixes ordering of control plane services

The create-control-plane.service is supposed to run _after_
mount-data-api, but that ordering was broken because the name of the
service changed and I failed to update the "After" block with the new
name.

* Don't fail the build when the cluster version is not available

If the query to the live cluster for its version fails, then don't
bother doing any version checking. The live cluster may not even exist,
and possibly needs the images from this build so that it can be created.
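
A rough sketch of the guard this describes, assuming the build queries
the live cluster with kubectl; the variable names and the comparison
step are illustrative, not the repository's actual build script:

    # Skip version checking entirely if the live cluster cannot be queried.
    live_version=$(kubectl version --output=json 2>/dev/null \
      | jq --raw-output '.serverVersion.gitVersion' || true)
    if [[ -z "${live_version}" || "${live_version}" == "null" ]]; then
      echo "Live cluster version unavailable; skipping version check."
    else
      echo "Live cluster is running ${live_version}."
      # ...compare ${live_version} against the versions built here...
    fi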

* Redundant check for working cluster before init

Adds an additional, redundant check for the existence of
/etc/kubernetes/admin.conf before initializing the cluster. A bug in our
config caused the service unit to run even though that file existed, and
kubeadm overwrote numerous things before finally erroring out. Can't
hurt to add the additional check in this file.

For nodes joining the cluster, wait for 90s (up from 60s) before trying
to join to give the primary control plane node time to finish setting
everything up. I discovered that 60s was not quite enough, and nodes
joining the control plane might get a connection refused from the
primary API endpoint.
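
As context, a hedged sketch of the kind of wait this refers to; the
endpoint is illustrative (the scripts derive it from project metadata
or from the allocate_k8s_token V2 response):

    # Illustrative load balancer endpoint, not taken from the live config.
    api_address="api-platform-cluster.mlab-sandbox.measurementlab.net:6443"
    # Wait until the API endpoint responds at all before attempting to join.
    until curl --silent --insecure --max-time 5 --output /dev/null \
        "https://${api_address}/healthz"; do
      sleep 5
    done
    # Even then, give the first control plane node time to finish housekeeping.
    sleep 90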

* Don't mkdir /etc/kubernetes/manifests on API machines

On control plane machines, /etc/kubernetes is supposed to be a symlink
to /mnt/cluster-data/kubernetes. When /etc/kubernetes already exists as
a regular directory, ln creates the symlink inside /etc/kubernetes
instead, breaking the configuration and the create-control-plane
service. In any case, on control plane nodes that directory will be
created automatically by kubeadm.
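
A hedged illustration of the failure mode and the kind of guard that
avoids it; the paths come from this commit message, but the guard
itself is a sketch rather than the repository's actual code:

    # Buggy: if /etc/kubernetes already exists as a regular directory, ln
    # creates the link *inside* it (/etc/kubernetes/kubernetes) and the
    # real path never points at the cluster data mount.
    #   ln --symbolic /mnt/cluster-data/kubernetes /etc/kubernetes

    # Safer: only create the link when nothing is in the way; on control
    # plane nodes kubeadm creates the directory itself.
    if [[ ! -e /etc/kubernetes ]]; then
      ln --symbolic /mnt/cluster-data/kubernetes /etc/kubernetes
    fi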

* Makes setup_k8s.sh parse allocate_k8s_token v2 data

ePoxy extension allocate_k8s_token V2 returns all the data needed to
join the cluster. This commit removes all templating from setup_k8s.sh
and moves it into the physical image filesystem. It is now a static
script which can fetch everything it needs from allocate_k8s_token V2.
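
For reference, a hedged example of the V2 response shape, inferred from
the jq fields the updated scripts extract; the values are placeholders:

    # Placeholder response body; the real server returns live values.
    join_data='{"api_address":"<host>:6443","ca_hash":"sha256:<hash>","token":"<bootstrap-token>"}'

    api_address=$(echo "$join_data" | jq -r '.api_address')
    ca_hash=$(echo "$join_data" | jq -r '.ca_hash')
    token=$(echo "$join_data" | jq -r '.token')

    kubeadm join "$api_address" --token "$token" \
      --discovery-token-ca-cert-hash "$ca_hash"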

* Makes setup_k8s.sh executable

* Refactors the join-cluster.sh script

Previously, the script assumed that all VMs were going to be part of a
MIG. We have decided to have a hybrid approach with both MIGs and
standard VMs, which required a few changes.

Additionally, configure the script to use the V2 allocate_k8s_token ePoxy
extension, which returns all the data needed to join the cluster, not
just the token. This also required some refactoring of the code.
nkinkade committed May 9, 2023
1 parent 8e69c8f commit ca6a0b1
Showing 14 changed files with 149 additions and 132 deletions.
9 changes: 1 addition & 8 deletions actions/stage3_ubuntu/stage3post.json
@@ -3,15 +3,8 @@
"env": {
"PATH": "/bin:/usr/bin:/usr/local/bin"
},
"files": {
"setup_k8s": {
"url": "https://storage.googleapis.com/epoxy-{{kargs `epoxy.project`}}/latest/stage3_ubuntu/setup_k8s.sh"
}
},
"commands": [
"# Make setup_k8s.sh executable and then run with the epoxy.ipv4 config",
"/usr/bin/chmod 755 {{.files.setup_k8s.name}}",
"{{.files.setup_k8s.name}} {{kargs `epoxy.project`}} {{kargs `epoxy.ipv4`}} {{kargs `epoxy.hostname`}} {{kargs `epoxy.allocate_k8s_token`}}"
"/opt/mlab/bin/setup_k8s.sh {{kargs `epoxy.project`}} {{kargs `epoxy.ipv4`}} {{kargs `epoxy.hostname`}} {{kargs `epoxy.allocate_k8s_token`}}"
]
}
}
31 changes: 15 additions & 16 deletions ...buntu/opt/mlab/conf/setup_k8s.sh.template → ...s/stage3_ubuntu/opt/mlab/bin/setup_k8s.sh
100644 → 100755
@@ -6,11 +6,10 @@ ln --force --symbolic /var/log/setup_k8s.log-$curr_date /var/log/setup_k8s.log

set -euxo pipefail

# This script is intended to be called by epoxy as the action for the last stage
# in the boot process. The actual epoxy config that calls this file can be
# found at:
# https://github.com/m-lab/epoxy-images/blob/dev/actions/stage3_coreos/stage3post.json

# This script is intended to be called by epoxy_client as the action for the
# last stage in the boot process. The actual epoxy config that calls this file
# can be found at:
# https://github.com/m-lab/epoxy-images/blob/main/actions/stage3_ubuntu/stage3post.json
# This should be the final step in the boot process. Prior to this script
# running, we should have made sure that the disk is partitioned appropriately
# and mounted in the right places (one place to serve as a cache for Docker
@@ -33,10 +32,6 @@ METRO="${SITE/[0-9]*/}"
# Also, be 100% sure /sbin and /usr/sbin are in PATH.
export PATH=$PATH:/sbin:/usr/sbin:/opt/bin:/opt/mlab/bin

# Make sure to download any and all necessary auth tokens prior to this point.
# It should be a simple wget from the control plane node to make that happen.
LOAD_BALANCER="api-platform-cluster.${GCP_PROJECT}.measurementlab.net"

# Capture K8S version for later usage.
RELEASE=$(kubelet --version | awk '{print $2}')

@@ -52,9 +47,9 @@ mkdir --parents /etc/kubernetes/manifests

systemctl daemon-reload

# Fetch k8s token via K8S_TOKEN_URL. Curl should report most errors to stderr,
# so write stderr to a file so we can read any error code.
TOKEN=$( curl --fail --silent --show-error -XPOST --data-binary "{}" \
# Fetch k8s cluster join data from K8S_TOKEN_URL. Curl should report most errors
# to stderr, so write stderr to a file so we can read any error code.
JOIN_DATA=$( curl --fail --silent --show-error -XPOST --data-binary "{}" \
${K8S_TOKEN_URL} 2> $K8S_TOKEN_ERROR_FILE )
# If there was an error and the error was 408 (Request Timeout), then reboot
# the machine to reset the token timeout.
@@ -63,10 +58,14 @@ if [[ -n $ERROR_408 ]]; then
/sbin/reboot
fi

kubeadm join "${LOAD_BALANCER}:6443" \
--v 4 \
--token "${TOKEN}" \
--discovery-token-ca-cert-hash {{CA_CERT_HASH}}
# $JOIN_DATA should contain a simple JSON block with all the information needed
# to join the cluster.
api_address=$(echo "$JOIN_DATA" | jq -r '.api_address')
ca_hash=$(echo "$JOIN_DATA" | jq -r '.ca_hash')
token=$(echo "$JOIN_DATA" | jq -r '.token')

kubeadm join "$api_address" --v 4 --token "$token" \
--discovery-token-ca-cert-hash "$ca_hash"

systemctl daemon-reload
systemctl enable kubelet

This file was deleted.

@@ -1,6 +1,6 @@
[Unit]
Description=Initialize the kubernetes control plane
After=kubelet.service mount-cluster-data.service
After=kubelet.service mount-data-api.service
Requires=mount-data-api.service
# The presence of this file indicates that the cluster has already been created.
# If it exists, do not run this unit.
@@ -0,0 +1,24 @@
[Unit]
Description=ePoxy extension server
After=docker.service mount-data-api.service
Requires=docker.service mount-data-api.service

# Run the ePoxy extension server (supporting the ePoxy Extension API).
#
# Mount /opt/bin so that the container has access to kubeadm, and
# /etc/kubernetes so that kubeadm has access to admin.conf
[Service]
TimeoutStartSec=120
Restart=always
ExecStartPre=-/usr/bin/docker stop %N
ExecStartPre=-/usr/bin/docker rm %N
ExecStart=/usr/bin/docker run --publish 8800:8800 \
--volume /etc/kubernetes:/etc/kubernetes:ro \
--volume /opt/bin:/opt/bin:ro \
--name %N -- \
measurementlab/epoxy-extensions:v0.4.0 \
-bin-dir /opt/bin
ExecStop=/usr/bin/docker stop %N

[Install]
WantedBy=multi-user.target
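
For reference, a hedged sketch of enabling and checking the unit above
on an API machine (configure_image_api.sh below enables it at image
build time); the verification commands are illustrative:

    systemctl enable --now epoxy-extension-server.service
    # Confirm the container is running and the single port is listening.
    docker ps --filter name=epoxy-extension-server
    ss -tlnp | grep 8800
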
28 changes: 0 additions & 28 deletions configs/virtual_ubuntu/etc/systemd/system/token-server.service

This file was deleted.

22 changes: 13 additions & 9 deletions configs/virtual_ubuntu/opt/mlab/bin/create-control-plane.sh
@@ -33,6 +33,18 @@ k8s_version=$(kubectl version --client=true --output=json | jq --raw-output '.cl
# The internal DNS name of this machine.
internal_dns="api-platform-cluster-${zone}.${zone}.c.${project}.internal"

# If this file exists, then the cluster must already be initialized. The
# systemd service unit file that runs this script also has a conditional check
# for this file and should not run if it exists. This is just a backup,
# redundant check, just in case for some reason the file exists but the service
# unit gets run anyway. This happened to me (kinkade), where a small bug in the
# configurations caused this service to run, even though this file existed, and
# kubeadm overwrote that file and others before finally erroring out due to a
# preflight check failure.
if [[ -f /etc/kubernetes/admin.conf ]]; then
exit 0
fi

# Evaluate the kubeadm config template
sed -e "s|{{PROJECT}}|${project}|g" \
-e "s|{{INTERNAL_IP}}|${internal_ip}|g" \
@@ -165,21 +177,13 @@ function initialize_cluster() {
gcloud compute project-info add-metadata --metadata "lb_dns=${lb_dns}" --project $project
gcloud compute project-info add-metadata --metadata "token_server_dns=${token_server_dns}" --project $project

# Add the current CA cert hash to the setup_k8s.sh script which physical
# platform nodes use to join the cluster, then push the evaluated template to
# GCS.
sed -e "s/{{CA_CERT_HASH}}/${ca_cert_hash}/" /opt/mlab/conf/setup_k8s.sh.template > setup_k8s.sh
cache_control="Cache-Control:private, max-age=0, no-transform"
gsutil -h "$cache_control" cp setup_k8s.sh "gs://epoxy-${project}/latest/stage3_ubuntu/setup_k8s.sh"

# TODO (kinkade): the only thing using these admin cluster credentials is
# Cloud Build for the k8s-support repository, which needs to apply
# workloads to the cluster. We need to find a better way for Cloud Build to
# authenticate to the cluster so that we don't have to store admin cluster
# credentials in GCS.
gsutil -h "$cache_control" cp /etc/kubernetes/admin.conf "gs://k8s-support-${project}/admin.conf"


# Apply the flannel DaemonSets and related resources to the cluster so that
# cluster networking will come up. Without it, nodes will never consider
# themselves ready.
@@ -219,7 +223,7 @@ function join_cluster() {
# Once the first API endpoint is up, it still has some housekeeping work to
# do before other control plane machines are ready to join the cluster. Give
# it a bit to finish.
sleep 60
sleep 90

token=$(get_bootstrap_token)
ca_cert_hash=$(
78 changes: 46 additions & 32 deletions configs/virtual_ubuntu/opt/mlab/bin/join-cluster.sh
@@ -12,13 +12,36 @@ METADATA_URL="http://metadata.google.internal/computeMetadata/v1"
CURL_FLAGS=(--header "Metadata-Flavor: Google" --silent)

# Collect data necessary to proceed.
epoxy_extension_server="epoxy-extension-server.${project}.measurementlab.net"
external_ip=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/instance/network-interfaces/0/access-configs/0/external-ip")
hostname=$(hostname)
k8s_labels=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/instance/attributes/k8s_labels")
k8s_node=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/instance/attributes/k8s_node")
lb_dns=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/attributes/lb_dns")
project=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/project-id")
token_server_dns=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/attributes/token_server_dns")

# MIG instances will have an "instance-template" attribute, other VMs will not.
# Record the HTTP status code of the request into a variable. 200 means
# "instance-template" exists and that this is a MIG instance. 404 means it is
# not part of a MIG. We use this below to determine whether to attempt to
# append the unique 4 char suffix of MIG instances to the k8s node name.
is_mig=$(
curl "${CURL_FLAGS[@]}" --output /dev/null --write-out "%{http_code}" \
http://metadata.google.internal/computeMetadata/v1/instance/attributes/instance-template
)

# If this is a MIG instance, determine the random 4 char suffix of the instance
# name, and then append that to the base k8s node name. The result should be a
# typical M-Lab node/DNS name with a "-<xxxx>" string on the end. With this,
# the node name is still unique, but we can easily just strip off the last 5
# characters to get the name of the load balancer. Among other things, the
# uuid-annotator can use this value as its -hostname flag so that it knows how
# to annotate the data on this MIG instance.
node_name="$k8s_node"
if [[ $is_mig == "200" ]]; then
node_suffix="${hostname##*-}"
node_name="${k8s_node}-${node_suffix}"
fi

# Don't try to join the cluster until at least one control plane node is ready.
# Keep trying this forever, until it succeeds, as there is no point in going
@@ -39,52 +62,43 @@ done
# Wait a while after the control plane is accessible on the API port, since in
# the case where the cluster is being initialized, there are a few housekeeping
# items to handle, such as uploading the latest CA cert hash to the project metadata.
sleep 60

sleep 90

# Generate a JSON snippet suitable for the token-server, and then request a
# token. https://github.com/m-lab/epoxy/blob/main/extension/request.go#L36
extension_v1="{\"v1\":{\"hostname\":\"${hostname}\",\"last_boot\":\"$(date --utc +%Y-%m-%dT%T.%NZ)\"}}"

# Fetch a token from the token-server.
# Fetch cluster bootstrap join data from the epoxy-extension-server.
#
# TODO (kinkade): this only works from within GCP, so is not a long term
# solution. It is just a stop-gap to get GCP VMs able to join the cluster until
# we have implemented a more global solution that will support any cloud
# provider. Going through ePoxy will not work for VMs in a managed instance
# group (MIG), since siteinfo, ePoxy and Locate will only know about the load
# balancer IP address, not the possibly ephemeral public IP of an auto-scaled
# instance in a MIG.
token=$(curl --data "$extension_v1" "http://${token_server_dns}:8800/v1/allocate_k8s_token" || true)
# TODO (kinkade): here we are querying the epoxy-extension-server directly
# through the GCP private network. This only works from within GCP, so is not a
# long term solution. It is just a stop-gap to get GCP VMs able to join the
# cluster until we have implemented a more global solution that will support
# any cloud provider. Additionally, going through ePoxy will not work for VMs
# in a managed instance group (MIG), since siteinfo, ePoxy and Locate will only
# know about the load balancer IP address, not the possibly ephemeral public IP
# of an auto-scaled instance in a MIG.
join_data=$(
curl --data "$extension_v1" "http://${epoxy_extension_server}:8800/v2/allocate_k8s_token" || true
)

if [[ -z $token ]]; then
echo "Failed to get a cluster bootstrap join token from the token-server"
if [[ -z $join_data ]]; then
echo "Failed to get cluster bootstrap join data from the epoxy-extension-server"
exit 1
fi

# TODO (kinkade): this is GCP specific and will not work outside of GCP. This
# will have to be made more generic before we can join VMs from other cloud
# providers. A current proposal is to have the token-server return not only a
# token, but also the CA cert hash, but this has yet to be implemented.
#
# Fetch the ca_cert_hash stored in project metadata.
ca_cert_hash=$(curl "${CURL_FLAGS[@]}" "${METADATA_URL}/project/attributes/platform_cluster_ca_hash")
# $JOIN_DATA should contain a simple JSON block with all the information needed
# to join the cluster.
api_address=$(echo "$join_data" | jq -r '.api_address')
ca_hash=$(echo "$join_data" | jq -r '.ca_hash')
token=$(echo "$join_data" | jq -r '.token')

# Set up necessary labels for the node.
sed -ie "s|KUBELET_KUBECONFIG_ARGS=|KUBELET_KUBECONFIG_ARGS=--node-labels=$k8s_labels |g" \
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# Determine the random 4 char suffix of the instance name, and then append
# that to the base k8s node name. The result should be a typical M-Lab node/DNS
# name with a "-<xxxx>" string on the end. With this, the node name is still
# unique, but we can easily just strip off the last 5 characters to get the name
# of the load balancer. Among other things, the uuid-annotator can use this
# value as its -hostname flag so that it knows how to annotate the data on this
# MIG instance.
node_suffix="${hostname##*-}"
node_name="${k8s_node}-${node_suffix}"

kubeadm join $lb_dns:6443 --token $token --discovery-token-ca-cert-hash $ca_cert_hash --node-name $node_name
kubeadm join "$api_address" --v 4 --token "$token" \
--discovery-token-ca-cert-hash "$ca_hash" --node-name $node_name

# https://github.com/flannel-io/flannel/blob/master/Documentation/kubernetes.md#annotations
kubectl --kubeconfig /etc/kubernetes/kubelet.conf annotate node $node_name \
2 changes: 1 addition & 1 deletion configs/virtual_ubuntu/opt/mlab/bin/mount-data-api.sh
@@ -29,7 +29,7 @@ dev_name=$(
if [[ -z $dev_name ]]; then
echo "Failed to determine the persistent disk device name"
exit 1
}
fi
dev_path="/dev/disk/by-id/google-${dev_name}"

# If the disk isn't formatted, then format it.
9 changes: 9 additions & 0 deletions packer/configure_image.sh
@@ -10,6 +10,15 @@ set -euxo pipefail
# experiments.
mkdir -p /var/local/metadata

# If this directory doesn't exist, then the kubelet complains bitterly,
# polluting the logs terribly. On control plane nodes this will be created
# automatically by the kubeadm. The cluster kubelet config is generated by
# kubeadm when the cluster is initialized and stored as a configmap, which all
# other nodes will download and use. We can't remove staticPodPath from the
# kubelet config, because control plane kubelets use it, so just create the
# directory on every node to avoid log pollution.
mkdir -p /etc/kubernetes/manifests

# Enable systemd units
systemctl enable check-reboot.service
systemctl enable check-reboot.timer
3 changes: 1 addition & 2 deletions packer/configure_image_api.sh
@@ -28,7 +28,6 @@ echo -e "\nexport KUBECONFIG=/etc/kubernetes/admin.conf\n" >> /root/.bashrc
systemctl enable docker
systemctl enable reboot-api-node.service
systemctl enable reboot-api-node.timer
systemctl enable token-server.service
systemctl enable bmc-store-password.service
systemctl enable epoxy-extension-server.service
systemctl enable mount-data-api.service
systemctl enable create-control-plane.service
10 changes: 1 addition & 9 deletions packer/configure_image_common.sh
@@ -22,6 +22,7 @@ sed -i -e '/secure_path/ s|"$|:/opt/bin"|' /etc/sudoers
# Install required packages.
apt update
apt install -y \
apparmor \
busybox \
conntrack \
containerd \
@@ -65,14 +66,5 @@ curl --silent --show-error --location \
# For convenience, when an operator needs to login and inspect things with crictl.
echo -e "\nexport CONTAINER_RUNTIME_ENDPOINT=unix:///run/containerd/containerd.sock\n" >> /root/.bashrc

# If this directory doesn't exist, then the kubelet complains bitterly,
# polluting the logs terribly. On control plane nodes this will be created
# automatically by the kubeadm. The cluster kubelet config is generated by
# kubeadm when the cluster is initialized and stored as a configmap, which all
# other nodes will download and use. We can't remove staticPodPath from the
# kubelet config, because control plane kubelets use it, so just create the
# directory on every node to avoid log pollution.
mkdir -p /etc/kubernetes/manifests

# Enable systemd units
systemctl enable kubelet.service
