Upgrade Azure Kubernetes Service using Terraform

Microsoft is constantly developing Azure Kubernetes Service (AKS) to follow the release cycle of Kubernetes. With an updated version coming roughly every three months, keeping your Kubernetes instances on a supported version requires frequent upgrades.

Microsoft publishes the release calendar here: Supported Kubernetes versions in Azure Kubernetes Service – Azure Kubernetes Service | Microsoft Docs

The release notes for each AKS version are available here: Releases · Azure/AKS (github.com). They describe what is new in each release. If you want to stay on a supported version, it is important that you keep upgrading. Microsoft states that:

  • Users have 30 days from version removal to upgrade to a supported minor version release to continue receiving support.

Starting with Kubernetes 1.19, the open-source community has expanded support to 1 year. AKS commits to enabling patches and support matching the upstream commitments. For AKS clusters on 1.19 and greater, you will be able to upgrade at a minimum of once a year to stay on a supported version.

Upgrades have to be done as a rolling upgrade, one minor version at a time. Jumping directly across a minor version is not supported, so you need to upgrade through each minor version in turn.

  • 1.12.x -> 1.13.x: allowed.
  • 1.13.x -> 1.14.x: allowed.
  • 1.12.x -> 1.14.x: not allowed.
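
To make the rule concrete, here is a minimal shell sketch that checks whether a planned upgrade skips a minor version. The helper names (major_of, minor_of, is_supported_upgrade) are made up for this illustration and are not part of any Azure tooling. The check only looks at the major and minor numbers, so treat it as a sanity check rather than a replacement for what AKS itself reports.

```shell
# Hypothetical helpers, not part of any Azure CLI or Terraform tooling.
major_of() { echo "${1%%.*}"; }                        # 1.13.5 -> 1
minor_of() { rest="${1#*.}"; echo "${rest%%.*}"; }     # 1.13.5 -> 13

# Succeeds when the target stays on the same minor version or moves up
# exactly one minor version within the same major version.
is_supported_upgrade() {
  [ "$(major_of "$1")" -eq "$(major_of "$2")" ] || return 1
  step=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}

# Example: check the upgrade paths from the list above.
for target in 1.13.x 1.14.x; do
  if is_supported_upgrade 1.12.x "$target"; then
    echo "1.12.x -> $target: allowed"
  else
    echo "1.12.x -> $target: not allowed"
  fi
done
```

For the list of versions a specific cluster can actually move to, az aks get-upgrades --resource-group <resource group> --name <cluster name> --output table is the place to look.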

If you are using Terraform to manage your cluster, you can automatically upgrade from one version to another if you have the following resource defined.

resource "azurerm_kubernetes_cluster" "aksresource" {
  name                = "name"
  location            = azurerm_resource_group.aksrg.location
  resource_group_name = azurerm_resource_group.aksrg.name
  dns_prefix          = "q-aks"
  node_resource_group = "nonprod-aks-np01-rg"
  sku_tier            = "Paid"
  kubernetes_version  = "1.21.2"

  # ... remaining arguments and blocks (such as default_node_pool) omitted ...
}

When you change kubernetes_version from one version to another and apply the configuration, Terraform updates the cluster in place, which triggers the upgrade on the Microsoft side.

In recent versions of the AzureRM provider for Terraform you also have the option to enable automatic Kubernetes upgrades by adding the following argument to the cluster resource

automatic_channel_upgrade = "patch"

Possible values can be patch, rapid, node-image and stable. Omitting this field sets this value to none. It should be noted that Cluster Auto-Upgrade only updates to GA versions of Kubernetes and will not update to Preview versions.

  • Patch – automatically upgrade the cluster to the latest supported patch version when it becomes available while keeping the minor version the same.
  • Rapid – automatically upgrade the cluster to the latest supported patch release on the latest supported minor version.
  • Stable – automatically upgrade the cluster to the latest supported patch release on minor version N-1, where N is the latest supported minor version.
  • Node-Image – automatically upgrade the node image to the latest version available.
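
As a rough illustration of how the channels differ, the sketch below picks the Kubernetes version each channel would aim for from a list of supported GA versions. The function name channel_target and the version list are made up for the example, and it relies on GNU sort -V. It only computes the final target. The real service decides when to upgrade and which intermediate versions to pass through.

```shell
# Hypothetical helper: print the Kubernetes version a channel would target.
# Usage: channel_target <channel> <current version> <supported versions...>
channel_target() {
  channel="$1"
  current="$2"
  shift 2
  # Version-aware sort, oldest first (requires GNU coreutils sort -V).
  sorted=$(printf '%s\n' "$@" | sort -V)
  case "$channel" in
    # Stay on the current minor version.
    patch) prefix="${current%.*}" ;;
    # Latest supported minor version (N).
    rapid) prefix=$(printf '%s\n' "$sorted" | tail -n 1); prefix="${prefix%.*}" ;;
    # Minor version N-1: the second-highest distinct major.minor in the list.
    stable) prefix=$(printf '%s\n' "$sorted" | sed 's/\.[0-9]*$//' | uniq | tail -n 2 | head -n 1) ;;
    # Only the node image changes, so the Kubernetes version stays as it is.
    node-image) echo "$current"; return 0 ;;
    *) echo "unknown channel: $channel" >&2; return 1 ;;
  esac
  # Highest patch release whose version starts with "<major>.<minor>."
  printf '%s\n' "$sorted" | awk -v p="$prefix." 'index($0, p) == 1' | tail -n 1
}

# Example with an illustrative (made-up) list of supported versions.
for ch in patch rapid stable; do
  echo "$ch: $(channel_target "$ch" 1.19.11 1.19.11 1.19.13 1.20.7 1.20.9 1.21.1 1.21.2)"
done
```

Note that if the list contains only one minor version, this sketch treats N-1 as N for the stable channel.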

Maintenance Window

Planned Maintenance can also be defined on AKS clusters using Terraform

maintenance_window {
  allowed {
    day   = "Friday"
    hours = [1, 6]
  }
}

However, it is important that you have enough node capacity while the upgrade is running, so that your applications keep running. By default, AKS configures upgrades to surge with one additional node. This lets AKS minimize workload disruption by creating an additional node before it cordons and drains an existing node to replace it with a newer version.

A max surge value of 100%, for instance, will provide the fastest upgrade process but means that all nodes will be drained simultaneously. The best approach would be to have a controlled upgrade process. Microsoft recommends having a surge value of 33%.
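
To get a feel for what a given surge value means in practice, the following shell sketch estimates how many extra nodes an upgrade would create for a node pool. The function name surge_nodes is made up for this example, and it assumes that AKS rounds fractional nodes up when the surge value is a percentage, which is how the AKS documentation describes it.

```shell
# Hypothetical helper: number of surge nodes for a pool size and max surge.
# Accepts an absolute node count ("1") or a percentage ("33%").
surge_nodes() {
  pool_size="$1"
  max_surge="$2"
  case "$max_surge" in
    *%)
      percent="${max_surge%"%"}"
      # Integer ceiling of pool_size * percent / 100 (assumes rounding up).
      echo $(( (pool_size * percent + 99) / 100 ))
      ;;
    *)
      # An absolute value is used as the node count directly.
      echo "$max_surge"
      ;;
  esac
}

# Example: compare a few surge settings for a 5-node pool.
for surge in 1 "33%" "100%"; do
  echo "max_surge=$surge -> $(surge_nodes 5 "$surge") extra node(s)"
done
```

The extra nodes also count against your subscription quota and subnet size, so it is worth checking both before raising the surge value.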

Surge value can also be configured using Terraform

  default_node_pool {
    # ... other node pool arguments omitted ...

    upgrade_settings {
      max_surge = "33%"
    }
  }

Setting up Kured for AKS

While Microsoft automatically patches Kubernetes, and the worker nodes automatically get security patches from Ubuntu, nodes that require a restart after patching are not rebooted automatically. Therefore, we also recommend setting up Kured (Kubernetes Reboot Daemon).

Kured deploys a DaemonSet that monitors for the presence of a reboot sentinel file, e.g. /var/run/reboot-required

Kured can be deployed using a Helm chart. The pods should be deployed into a separate namespace with a nodeSelector that restricts them to Linux nodes, since Kured does not work on Windows.

helm repo add kured https://weaveworks.github.io/kured
kubectl create namespace kured
helm install kured kured/kured --namespace kured --set nodeSelector."kubernetes\.io/os"=linux

You can now test this by adding a reboot-required file to one of the AKS nodes. To do this, you need to log in to one of the AKS worker nodes using SSH, following these instructions: SSH into Azure Kubernetes Service (AKS) cluster nodes – Azure Kubernetes Service | Microsoft Docs

Then create the file using

sudo touch /var/run/reboot-required

Then exit the pod and wait for the kured daemon to trigger a reboot. You can watch the node status while you wait:

kubectl get nodes -w

You can also define a custom schedule, since by default Kured can restart nodes on any given day. This can be done by adding the following arguments to the default downloaded manifest (or by using Helm args)

  --reboot-days=sat,sun
  --start-time=9am
  --end-time=5pm
  --time-zone=UTC

That way you have more control over the restart window.

 
