Customize Azure Kubernetes Service Diagnostics for Azure Log Analytics

If you are using Azure Kubernetes Service, in many cases you will also be using Container Insights in combination with Kubernetes cluster audit data, which gives you deeper insight into your Kubernetes environment and containers.

However, with the default settings, Container Insights and the Kubernetes audit logs turn out to be data-hungry demons.

If you have a Kubernetes cluster consisting of 3 nodes with 20 pods, you can expect roughly the following data volumes from Container Insights:

Table                 Size estimate (MB/hour)
Perf                  12.9
InsightsMetrics       11.3
KubePodInventory      1.5
KubeNodeInventory     0.75
KubeServices          0.13
ContainerInventory    3.6
KubeHealth            0.1
KubeMonAgentEvents    0.005
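
If you want to compare these estimates with what your own cluster actually generates, a query along these lines (just a sketch, using the standard Container insights table names from the list above) shows billable data per table for the last month:

Usage
| where TimeGenerated > ago(31d)
| where IsBillable == true
| where DataType in ("Perf", "InsightsMetrics", "KubePodInventory", "KubeNodeInventory",
    "KubeServices", "ContainerInventory", "KubeHealth", "KubeMonAgentEvents")
| summarize BillableDataGB = sum(Quantity) / 1000. by DataType
| order by BillableDataGB desc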

These estimates add up to a total of roughly 31 MB/hour, or about 23 GB/month (with one month = 31 days). If you have more pods and services running, that number will of course go through the roof. In addition, you have the AKS default diagnostics settings, which offer the following log categories:

cluster-autoscaler – Understand why the AKS cluster is scaling up or down, which may not be expected. This information is also useful to correlate time intervals where something interesting may have happened in the cluster.
guard – Managed Azure Active Directory and Azure RBAC audits. For managed Azure AD, this includes token in and user info out. For Azure RBAC, this includes access reviews in and out.
kube-apiserver – Logs from the API server.
kube-audit – Audit log data for every audit event, including get, list, create, update, delete, patch, and post.
kube-audit-admin – Subset of the kube-audit log category. Significantly reduces the number of logs by excluding the get and list audit events from the log.
kube-controller-manager – Gain deeper visibility of issues that may arise between Kubernetes and the Azure control plane. A typical example is the AKS cluster lacking permissions to interact with Azure.
kube-scheduler – Logs from the scheduler.
AllMetrics – Includes all platform metrics. Sends these values to the Log Analytics workspace, where they can be evaluated with other data using log queries.

The big bad wolf here is the kube-audit category, which generates a LOT of log data.

If you have a cluster running, you can use this query to look at the cost of the data collected into the Log Analytics workspace that your AKS cluster reports to:

Usage
| where TimeGenerated > startofday(ago(31d))
| where IsBillable == true
| summarize BillableDataGB = sum(Quantity) / 1000. by bin(TimeGenerated, 1d), DataType
| render barchart

As you can see, a lot of data is also collected into the AzureDiagnostics table (the picture below is from my lab environment).

Most of the data ends up in the AzureDiagnostics table, which is where all the Azure Kubernetes Service diagnostics data is collected. Looking into the different categories, you can also see how many events each audit category generates.
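
To get that per-category breakdown yourself, you can run something like this (a sketch; it assumes the AKS diagnostics logs are going into the AzureDiagnostics table and not into resource-specific tables):

AzureDiagnostics
| where TimeGenerated > ago(31d)
| where ResourceProvider == "MICROSOFT.CONTAINERSERVICE"
| summarize Events = count() by Category
| order by Events desc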

The general recommendation is that you do not push kube-audit data to a Log Analytics workspace; if you need it, send it to an Azure Storage Account instead, and only enable kube-audit-admin (which excludes the get and list audit events compared to kube-audit) against the workspace.
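
As a rough sketch of what that could look like with the Azure CLI (the resource IDs and setting names here are placeholders/assumptions you would adapt to your own environment), you would create one diagnostic setting that only sends kube-audit-admin to the workspace and, if you need the full audit trail, a second one that sends kube-audit to a storage account:

az monitor diagnostic-settings create \
  --name aks-to-workspace \
  --resource "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster>" \
  --workspace "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>" \
  --logs '[{"category": "kube-audit-admin", "enabled": true}]'

az monitor diagnostic-settings create \
  --name aks-audit-to-storage \
  --resource "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster>" \
  --storage-account "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>" \
  --logs '[{"category": "kube-audit", "enabled": true}]'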

Even if we disable the kube-audit log, a lot of data will still be collected through the kube-audit-admin diagnostics setting. Is there any way we can reduce the cost even more, or optimize the data collection done by Container Insights?

When you deploy Container Insights and Log Analytics, it will automatically deploy a ConfigMap that you can see using this command:

kubectl describe configmaps omsagent-rs-config --namespace=kube-system

That ConfigMap defines what logs the agent will collect. We can override the events that are collected by defining our own ConfigMap for the agent to use. This allows us, for instance, to filter stderr and stdout per namespace or across the entire cluster, as well as environment variables for any container running across all pods/nodes in the cluster.

For instance, both stdout and stderr log collection are turned off for the kube-system namespace ('*_kube-system_*.log') by default. Using this ConfigMap as an example, I can filter which logs should be collected into Log Analytics:

Docker-Provider/container-azm-ms-agentconfig.yaml at ci_dev · microsoft/Docker-Provider (github.com)
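
For reference, here is a trimmed-down sketch of the structure of that ConfigMap (the full template behind the link has more sections, for instance for Prometheus scraping and metric collection; the schema and key names are taken from the template as it looked at the time of writing):

kind: ConfigMap
apiVersion: v1
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          enabled = true
          exclude_namespaces = ["kube-system"]
       [log_collection_settings.stderr]
          enabled = true
          exclude_namespaces = ["kube-system"]
       [log_collection_settings.env_var]
          enabled = true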

Let’s use the example where I want to disable the collection of stdout events from a namespace called velero, where I have the velero pod running. Then I just need to define this in the example ConfigMap:

[log_collection_settings.stdout]
   enabled = true
   exclude_namespaces = ["kube-system", "velero"]

Then apply the ConfigMap to exclude those events. You can also tune other settings as well; the entire set of ConfigMap settings is listed here –> Configure Container insights agent data collection – Azure Monitor | Microsoft Docs
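
Once you have edited the template (saved locally as container-azm-ms-agentconfig.yaml in this example), applying it is simply:

kubectl apply -f container-azm-ms-agentconfig.yaml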

Once you have applied the ConfigMap, it will perform a rolling update of the omsagent pods, and you can verify that the ConfigMap has been successfully applied by checking the logs on one of them, for instance:

kubectl logs omsagent-8l4dh omsagent --namespace=kube-system

If the ConfigMap was picked up correctly, you will see a line like this in the log output (you can also see it within Container insights in Azure):

config::configmap container-azm-ms-agentconfig for agent settings mounted, parsing values

Once this is applied, the cluster will not generate nearly as much data, especially when combined with tweaking the audit settings so that data collection is kept at the level you actually need.
