Upgrade the node machines of your Kubernetes cluster in GKE without downtime

This post shows how to migrate workloads running on a GKE cluster to a new set of nodes within the same cluster without incurring downtime for your application. Such a migration is useful, for example, when you want to move your workloads to nodes with a different machine type.

By default, GKE creates a node pool named default-pool for every new cluster:


gcloud container node-pools list --cluster <your cluster name>

Output:


NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-pool  n1-standard-1  100           1.15.9
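If you plan to script the migration, it helps to get only the pool names. The sketch below assumes a hypothetical helper, has_pool (not part of gcloud), that checks the names printed by gcloud's --format="value(name)" output; the CLUSTER variable in the comment is also an assumption.

```shell
# has_pool POOL: hypothetical helper. Reads newline-separated node pool
# names on stdin and succeeds only if POOL matches one of them exactly.
has_pool() {
  grep -qx -- "$1"
}

# Example against a live cluster (CLUSTER is an assumed variable):
#   gcloud container node-pools list --cluster "$CLUSTER" --format="value(name)" \
#     | has_pool higher-pool && echo "higher-pool already exists"
```

Checking first lets a rerun of the script skip the create step instead of failing on an existing pool.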

To introduce instances with a different configuration, such as a different machine type or different authentication scopes, you need to create a new node pool.

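The following command creates a new node pool named higher-pool with three n1-highcpu-2 instances. Compared with the n1-standard-1 machines in default-pool (1 vCPU, 3.75 GB of memory), n1-highcpu-2 has more CPU (2 vCPUs) but less memory (1.8 GB), so check that your workloads' memory requests still fit: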


gcloud container node-pools create higher-pool \
  --cluster=<your cluster name> \
  --machine-type=n1-highcpu-2 \
  --num-nodes=3
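To try the command before touching a real cluster, you can wrap it in a dry-run helper. This is a minimal sketch: run and create_pool are hypothetical names, and DRY_RUN=1 is an assumed convention that prints the command instead of executing it. The flags are exactly the ones used above.

```shell
# run: hypothetical wrapper. Prints the command when DRY_RUN=1,
# otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# create_pool CLUSTER POOL MACHINE_TYPE NUM_NODES: hypothetical wrapper
# around the `gcloud container node-pools create` command shown above.
create_pool() {
  run gcloud container node-pools create "$2" \
    --cluster="$1" \
    --machine-type="$3" \
    --num-nodes="$4"
}
```

For example, `DRY_RUN=1 create_pool my-cluster higher-pool n1-highcpu-2 3` only prints the gcloud command (my-cluster is a placeholder). If your cluster is regional, --num-nodes applies per zone, so 3 yields three nodes in each zone.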

Your cluster should now have two node pools:


gcloud container node-pools list --cluster <your cluster name>

Output:


NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-pool  n1-standard-1  100           1.15.9
higher-pool   n1-highcpu-2   100           1.15.9

You can see the instances of the new node pool added to your GKE cluster:


kubectl get nodes
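Before cordoning default-pool, it is worth confirming that the new nodes are Ready, so the evicted Pods have somewhere to go. A minimal sketch, assuming the default `kubectl get nodes --no-headers` column layout (NAME STATUS ROLES AGE VERSION); count_ready is a hypothetical helper.

```shell
# count_ready: hypothetical helper. Reads `kubectl get nodes --no-headers`
# output on stdin and prints how many nodes have a STATUS of exactly
# "Ready". Cordoned nodes ("Ready,SchedulingDisabled") are not counted.
count_ready() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example against a live cluster, waiting for all three new nodes:
#   until [ "$(kubectl get nodes -l cloud.google.com/gke-nodepool=higher-pool \
#               --no-headers | count_ready)" -ge 3 ]; do sleep 10; done
```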

After you create a new node pool, your workloads are still running on the default-pool. Kubernetes does not reschedule Pods as long as they are running and available.

Run the following command to see which node each Pod is running on:


kubectl get pods -o=wide
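With many Pods, a per-node count is easier to read than the wide listing. This sketch uses kubectl's custom-columns output to print only each Pod's node; pods_per_node is a hypothetical helper.

```shell
# pods_per_node: hypothetical helper. Reads one node name per line on
# stdin and prints "NODE COUNT" pairs sorted by node name. Pods that are
# not scheduled yet appear under "<none>" in custom-columns output.
pods_per_node() {
  awk 'NF { c[$1]++ } END { for (n in c) print n, c[n] }' | sort
}

# Example against a live cluster:
#   kubectl get pods -o=custom-columns=NODE:.spec.nodeName --no-headers | pods_per_node
```

Running it before and after the migration gives a quick view of Pods moving from default-pool node names to higher-pool node names.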

To migrate these Pods to the new node pool, you must perform the following steps:

  • Cordon the existing node pool:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl cordon "$node"; done

  • Drain the existing node pool:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); \
 do kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node"; done
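The drain flags depend on your kubectl version: kubectl 1.20 deprecated --delete-local-data in favor of --delete-emptydir-data. The sketch below picks the flag from the client's minor version, which it assumes you pass in as a plain integer, and then cordons and drains the given nodes one at a time. emptydir_flag, drain_nodes, and the DRY_RUN=1 convention are hypothetical.

```shell
# run: prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# emptydir_flag MINOR: prints the flag that lets `kubectl drain` evict
# Pods using emptyDir volumes, for a kubectl client of minor version MINOR.
emptydir_flag() {
  if [ "$1" -ge 20 ]; then
    printf '%s\n' "--delete-emptydir-data"
  else
    printf '%s\n' "--delete-local-data"
  fi
}

# drain_nodes MINOR NODE...: cordons, then drains, each node in turn,
# using the same drain options as the loop above.
drain_nodes() {
  local minor=$1 flag node
  shift
  flag=$(emptydir_flag "$minor")
  for node in "$@"; do
    run kubectl cordon "$node"
    run kubectl drain --force --ignore-daemonsets "$flag" --grace-period=10 "$node"
  done
}

# Example against a live cluster (MINOR=21 is an assumption):
#   drain_nodes 21 $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name)
```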

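Draining evicts the Pods from each default-pool node. Because those nodes are cordoned, the Pods' controllers (for example, Deployments) recreate them on the higher-pool nodes. Note that --force also deletes Pods that no controller manages, and those are not recreated, while --delete-local-data discards any data in emptyDir volumes. To keep the application available while its Pods move, run more than one replica of each workload.

Once the drain loop completes, you should see that the Pods are now running on the higher-pool nodes: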


kubectl get pods -o=wide
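Instead of inspecting the wide listing by eye, you can list any Pods still on the old nodes. This is a minimal sketch: pods_still_on is a hypothetical helper, and like the command above it checks only the current namespace.

```shell
# pods_still_on NODE...: hypothetical helper. Reads "POD NODE" pairs on
# stdin and prints the names of Pods whose node is one of the given nodes.
pods_still_on() {
  awk -v nodes="$*" '
    BEGIN { n = split(nodes, list, " "); for (i = 1; i <= n; i++) old[list[i]] = 1 }
    ($2 in old) { print $1 }'
}

# Example against a live cluster. jsonpath prints bare node names
# (no "node/" prefix), space-separated; $old is left unquoted on purpose
# so each name becomes a separate argument.
#   old=$(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool \
#           -o=jsonpath='{.items[*].metadata.name}')
#   kubectl get pods -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     --no-headers | pods_still_on $old
```

An empty result means no Pods in the namespace remain on default-pool.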

Once Kubernetes has rescheduled all Pods onto higher-pool, you can delete default-pool:


gcloud container node-pools delete default-pool --cluster <your cluster name>
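In a script, you may want to guard the deletion so it only runs once nothing is left on the old pool. A minimal sketch: delete_pool_if_empty and DRY_RUN=1 are hypothetical, the remaining-Pod count is assumed to come from a check such as counting the Pods still listed on default-pool nodes, and --quiet is gcloud's global flag for skipping the confirmation prompt.

```shell
# run: prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# delete_pool_if_empty CLUSTER POOL REMAINING: deletes POOL only when
# REMAINING (Pods still on its nodes) is 0; otherwise prints an error
# to stderr and returns 1 without deleting anything.
delete_pool_if_empty() {
  if [ "$3" -ne 0 ]; then
    printf 'refusing to delete %s: %s Pod(s) still on its nodes\n' "$2" "$3" >&2
    return 1
  fi
  run gcloud container node-pools delete "$2" --cluster="$1" --quiet
}
```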

Your cluster should now have a single node pool, higher-pool:

gcloud container node-pools list --cluster <your cluster name>