Deep dive Azure Monitor and Log Analytics

I’ve been wanting to write this for a while. More and more organizations are now using Log Analytics in Azure, whether together with Azure Sentinel, with Desktop Analytics, or just for diagnostics from PaaS services running in Azure. I’ve seen a lot of organizations simply set up multiple Log Analytics workspaces, and then the questions come up: Should I have one or multiple Log Analytics workspaces? How does the service work? Why does it take so long before data appears? What is the difference between the different agents? Therefore I wanted to write a deep-dive walkthrough of Log Analytics and the other services connected to it, such as Sentinel and Azure Monitor.

Before we start with best practices and design, let us first get a better understanding of the common components we are dealing with.

1: Common components

The picture below gives a high-level perspective of the different components within Log Analytics and the surrounding services such as Sentinel and Azure Monitor.

Full resolution –> https://i.ibb.co/mNMdkN8/image.png

Within Log Analytics we have something called a Log Analytics workspace, which is essentially a database that contains data. A workspace is created within a specific region and has a specific retention time, which defines how long data is stored within the workspace (database). By default this is 30 days, but it can be configured to be as long as 730 days.

All the data that is stored within a workspace is read-only and cannot be modified. The only operations the service allows are collecting new data and purging existing data using the Purge API –> https://docs.microsoft.com/en-us/rest/api/loganalytics/workspacepurge/purge

This is typically for GDPR purposes where you need to remove certain parts of the data.

The workspace consists of different tables which store different types of data depending on the data source. By default all tables have the same retention as the workspace, but this is customizable so that you can have different retention for different tables –> https://msandbu.org/changing-log-retention-on-a-specific-table-in-log-analytics/

You can also use this query to look at the different Log Analytics tables that are collecting data.
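To make table-level retention a bit more concrete, here is a minimal Python sketch that only builds the URL and JSON body for an Azure Resource Manager request against the workspace Tables endpoint. It does not send anything. The api-version value, the function name and all resource names are assumptions for illustration and are not taken from the linked article, so check them against the current REST reference before use.

```python
import json

# Assumption: this api-version is only an example; verify it against the REST docs.
API_VERSION = "2021-12-01-preview"


def table_retention_request(subscription_id, resource_group, workspace, table, retention_days):
    """Return (url, body) for a PUT that sets retention on one table.

    The 730-day upper bound mirrors the workspace limit mentioned in the text.
    """
    if not 0 < retention_days <= 730:
        raise ValueError("retention_days must be between 1 and 730")
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.OperationalInsights"
        f"/workspaces/{workspace}/tables/{table}"
        f"?api-version={API_VERSION}"
    )
    body = json.dumps({"properties": {"retentionInDays": retention_days}})
    return url, body


# Hypothetical names, purely for illustration.
url, body = table_retention_request("my-subscription-id", "rg-monitoring", "central-workspace", "Perf", 30)
print(url)
print(body)
```

The request would still need a bearer token for Azure Resource Manager, which is left out here on purpose.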

search "*" | summarize count() by $table | sort by count_ desc

Log Analytics has different ways of getting data ingested into the database. Some are based on agents installed on machines that push data, others on API integrations from Azure PaaS services or on custom applications that push data through the Data Collector API. You also have the concept of Solutions, which are extensions to Log Analytics that provide extended functionality on top of the data collected in a workspace. One of these solutions can be Azure Sentinel or Network Performance Monitor, both of which use Log Analytics to store data. There are also other solutions which integrate with Log Analytics, such as

ITSM Connector
Service Map
You can see a list of all the solutions here –> https://docs.microsoft.com/en-us/azure/azure-monitor/monitor-reference

NOTE: Some solutions also have their own collection interval, such as Windows Update Analytics, which collects every 24 hours. Some solutions also require custom configuration and might require additional agents installed on the target to be able to collect the required data or perform the required actions.
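For the custom API path mentioned above (pushing your own data into the Data Collector), the part that usually trips people up is the request signature. The sketch below builds the SharedKey authorization header and the other headers for the HTTP Data Collector API without sending anything. The workspace ID, key and log type are placeholders, and the helper names are my own; treat this as a sketch of the signing scheme rather than a complete client.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone


def build_signature(workspace_id, shared_key, rfc1123_date, content_length):
    """Sign the request with HMAC-SHA256 using the base64-decoded workspace key."""
    string_to_sign = (
        f"POST\n{content_length}\napplication/json\n"
        f"x-ms-date:{rfc1123_date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(shared_key),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"


def build_request(workspace_id, shared_key, log_type, records, now=None):
    """Return (url, headers, body) for a Data Collector API POST."""
    body = json.dumps(records).encode("utf-8")
    now = now or datetime.now(timezone.utc)
    rfc1123_date = now.strftime("%a, %d %b %Y %H:%M:%S GMT")
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(workspace_id, shared_key, rfc1123_date, len(body)),
        "Log-Type": log_type,  # becomes the custom table name with a _CL suffix
        "x-ms-date": rfc1123_date,
    }
    url = f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"
    return url, headers, body


# Placeholder values; a real key comes from the workspace's agent settings.
demo_key = base64.b64encode(b"not-a-real-key").decode()
url, headers, body = build_request(
    "00000000-0000-0000-0000-000000000000", demo_key, "DemoLog", [{"Message": "hello"}]
)
print(url)
print(headers["Authorization"][:20] + "...")
```

The returned values can be handed to any HTTP client. The date in the signature and in the x-ms-date header must be identical, which is why both are derived from the same variable.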

When data is uploaded to Log Analytics there are different mechanisms in place that determine how quickly data will become available or visible.

Certain solutions also have different upload intervals. As an example, when data is collected it is placed in a queue and stored on temporary storage first, so that Microsoft can ensure that all data is written once the service has enough capacity to process it. This is because of the surge protection that is in place. If Log Analytics is processing a new data source, the service must first create a new target table within the database. Once the data is written to the database, it also takes some time before the data is indexed properly and visible in queries.

You can also use this sample query to determine how much latency is affecting the data that is coming in.

AzureDiagnostics | where TimeGenerated > ago(8h) 
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| extend AgentLatency = _TimeReceived - TimeGenerated 
| summarize percentiles(E2EIngestionLatency,50,95), percentiles(AgentLatency,50,95) 
by ResourceProvider
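To show what the query above is measuring, here is a small Python sketch that computes the same two latencies on a handful of made-up records: end-to-end latency (ingestion time minus TimeGenerated) and agent latency (_TimeReceived minus TimeGenerated). The sample timestamps are invented, and the percentile helper uses a simple nearest-rank method, so the numbers will not match the approximation Kusto uses exactly.

```python
import math
from datetime import datetime, timedelta


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(records):
    """Return the 50th and 95th percentile for both latency types."""
    e2e = [r["ingestion_time"] - r["TimeGenerated"] for r in records]
    agent = [r["_TimeReceived"] - r["TimeGenerated"] for r in records]
    return {
        "E2EIngestionLatency": (percentile(e2e, 50), percentile(e2e, 95)),
        "AgentLatency": (percentile(agent, 50), percentile(agent, 95)),
    }


# Invented sample data: four records generated at the same moment.
t0 = datetime(2021, 1, 1, 12, 0, 0)
sample = [
    {
        "TimeGenerated": t0,
        "_TimeReceived": t0 + timedelta(seconds=10 * i),
        "ingestion_time": t0 + timedelta(seconds=60 * i),
    }
    for i in range(1, 5)
]
print(latency_summary(sample))
```

A large gap between the two values points at the pipeline after the data has been received, while a high agent latency points at the source or the network.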

Some data can also be exported to another destination. For organizations that use Splunk or another SIEM tool and need to export data from Log Analytics to that third-party tool, there is a newly introduced data export feature, which you can read more about here –> https://msandbu.org/azure-log-analytics-data-export/

2: Agent and Agent Architecture

Log Analytics can also collect data from virtual or physical machines that have an agent installed. This agent is also known as the MMA agent. When installing the agent you need a workspace ID and a key, which are used to authenticate the agent to the workspace. Once connected, the agent downloads all configuration that is connected to the workspace. This works a bit differently for Azure virtual machines, since there we use the Azure Monitor agent, which behaves a bit differently.

The Log Analytics agent is based on the System Center Operations Manager architecture and downloads the centralized configuration as Management Packs (which contain the solutions, the data sources and so on). Data is uploaded compressed, based on the data sources configured in the solutions and the data sources defined within the workspace.

Agents can either communicate directly or using a proxy known as a Log Analytics Gateway which will proxy all traffic from agents internally.

For unsupported operating systems that can send syslog, you can also configure a Linux-based VM with the Log Analytics agent to forward data using rsyslog –> https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-sources-syslog

By default, when you configure a solution for a workspace, all agents which are connected to the workspace will receive the solution (for Log Analytics based agents).

This might not be something you want on, for instance, test/dev machines, even if you want to have the Log Analytics agent installed there. For this you have something called Solution Scope Configuration, which allows you to configure the scope a solution applies to. The reason behind this is that some solutions generate a lot of data, and with a scope you can define a machine group that the solution should apply to.

3: Azure Monitor vs Log Analytics agent

As Log Analytics has evolved, so have the agents. The Log Analytics agent which Microsoft currently has available (and which is based on the SCOM architecture) will be replaced with a new agent called the Azure Monitor agent, which is the default for all virtual machines in Azure that report to Log Analytics.

As of now, the Azure Monitor agent is in preview and will eventually replace the Log Analytics agent for both Windows and Linux machines. Another difference is how you manage Azure Monitor agents: you configure Data Collection Rules, where you define what kind of data each agent collects, instead of defining that at the workspace level. The Azure Monitor agent is also not supported on Windows Server 2008; the oldest supported version is Windows Server 2012.

There are, however, some differences in functionality between the agents currently, which you can read more about here –> https://docs.microsoft.com/en-us/azure/azure-monitor/platform/agents-overview#summary-of-agents

Secondly, the Azure Monitor agent uses a system-assigned managed identity instead of a workspace key to authenticate to Azure Monitor and upload data (event logs and metrics).
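To give an idea of what a Data Collection Rule looks like, here is a minimal Python sketch that builds the JSON body of a rule collecting Windows event logs and sending them to a single workspace. The structure (dataSources, destinations, dataFlows) follows the rule format as I understand it. The destination name, the data source name, the XPath filter and the resource ID are illustrative assumptions, so validate the result against the current documentation before deploying it.

```python
import json


def data_collection_rule(location, workspace_resource_id, xpath_queries):
    """Build a rule body that sends selected Windows events to one workspace."""
    destination_name = "centralWorkspace"  # hypothetical label, referenced by dataFlows
    return {
        "location": location,
        "properties": {
            "dataSources": {
                "windowsEventLogs": [
                    {
                        "name": "windowsEvents",  # hypothetical data source name
                        "streams": ["Microsoft-Event"],
                        "xPathQueries": list(xpath_queries),
                    }
                ]
            },
            "destinations": {
                "logAnalytics": [
                    {
                        "name": destination_name,
                        "workspaceResourceId": workspace_resource_id,
                    }
                ]
            },
            # A data flow ties a stream to one or more named destinations.
            "dataFlows": [
                {"streams": ["Microsoft-Event"], "destinations": [destination_name]}
            ],
        },
    }


rule = data_collection_rule(
    "westeurope",
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>",
    ["System!*[System[(Level=1 or Level=2)]]"],  # critical and error events only
)
print(json.dumps(rule, indent=2))
```

The point of the design is that the rule is its own Azure resource, which is then associated with the machines that should follow it, so two groups of servers can collect different data into the same workspace.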

The Azure Monitor agent will also be embedded with Azure Arc.

Azure Migrate also uses the agent to collect information about resources on-premises which is then uploaded to Azure Monitor.

4: Resource vs Workspace based access vs Table level based access

By default, Azure Log Analytics uses an access mode called Use resource or workspace permissions (the default for workspaces created after March 2019). This means that users who have access to a certain resource in Azure, for instance a Web App that is uploading diagnostics data into Azure Log Analytics, can view logs for that resource even if they don’t have access to the workspace object. This allows the central IT team to have full access to all logs and data, while still giving the application teams access to their own resource logs without access to other data sources.

Another option is defining table-level access. This means that you only provide access to certain tables within the workspace. This is done using custom roles where you define the tables as part of the resource type, like this:

"Actions": [ 
"Microsoft.OperationalInsights/workspaces/read",
"Microsoft.OperationalInsights/workspaces/query/read",
"Microsoft.OperationalInsights/workspaces/query/Heartbeat/read",
"Microsoft.OperationalInsights/workspaces/query/AzureActivity/read" 
],

This can be useful when some users or teams should only have access to certain data that is being collected into Log Analytics.
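If you need table-level roles for several teams, writing the Actions list by hand gets tedious. Below is a small Python sketch that generates a custom role definition from a list of table names, following the same pattern as the Actions shown above. The role name, description and scope are made-up examples, and the surrounding fields (Name, IsCustom, AssignableScopes and so on) follow the usual custom role JSON layout.

```python
import json

BASE = "Microsoft.OperationalInsights/workspaces"


def table_reader_role(role_name, tables, assignable_scope):
    """Build a custom role that can query only the listed tables."""
    actions = [f"{BASE}/read", f"{BASE}/query/read"]
    actions += [f"{BASE}/query/{table}/read" for table in tables]
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Read access to selected Log Analytics tables",
        "Actions": actions,
        "NotActions": [],
        "AssignableScopes": [assignable_scope],
    }


# Hypothetical role name and scope for illustration.
role = table_reader_role(
    "Log Analytics Heartbeat and Activity Reader",
    ["Heartbeat", "AzureActivity"],
    "/subscriptions/<subscription-id>",
)
print(json.dumps(role, indent=2))
```

The printed JSON can be saved to a file and used as input when creating the custom role with your deployment tooling of choice.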

5: Log Analytics and Solutions

As described earlier, solutions in Log Analytics build on the System Center Operations Manager architecture. When you add a new solution, you are essentially updating the centralized configuration, which is then downloaded by the agents, which in turn start uploading new data based on the new solution.

There are a bunch of solutions which can be really useful together with Log Analytics.

Azure Security Center – Threat detection; also collects Security Events from machines as part of the configuration.
ITSM Connector – Used for integrating Log Analytics with third-party ITSM tools such as System Center Service Manager or ServiceNow. Used to automatically create incidents or work items when alerts are created within Log Analytics.
Azure Sentinel – SIEM and SOAR solution which does analytics against the data collected within Log Analytics.
Microsoft Intune – Collects diagnostics data from Intune into Log Analytics. This also includes audit logs and changes of data from Intune.
Network Performance Monitor – NPM allows for monitoring of the network between two endpoints using additional agent on each side, and also allows for service monitoring such as probing external services from an agent.
Office 365 – Collects audit logs, only available as part of a connector within Azure Sentinel. This solution was removed on October 31, 2020.
Azure Automation – This feature is integrated with Log Analytics and provides both change tracking and update management capabilities. You can read more about change tracking here –> https://docs.microsoft.com/en-us/azure/automation/change-tracking/manage-change-tracking. When Azure Automation Update Management is enabled, any machine connected to your Log Analytics workspace is automatically configured as a system Hybrid Runbook Worker.

6: Azure Resources and Diagnostics

All resources have the ability to configure diagnostic settings, essentially allowing logs from each resource to be sent to a centralized destination. This can be any of

Log Analytics Workspace
Event Hub
Storage Account

The example below shows some of the different data sources the different services can send. This does not just apply to Azure PaaS services, but also to other resources such as:

Azure Resource Manager Activity Log
Azure Active Directory Audit and Sign-in logs
Azure Log Analytics Search History as shown here –> https://msandbu.org/audit-log-analytics-history/
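As a sketch of what a diagnostic setting contains, the Python function below builds the request body with one or more of the three destinations listed earlier (Log Analytics workspace, Event Hub, Storage Account). The property names follow the diagnostic settings resource format as I know it. The log category and the resource ID in the usage example are placeholders, since the available categories differ per resource type.

```python
import json


def diagnostic_setting(log_categories, workspace_id=None,
                       event_hub_rule_id=None, storage_account_id=None):
    """Build a diagnostic setting body with at least one destination."""
    destinations = {
        "workspaceId": workspace_id,
        "eventHubAuthorizationRuleId": event_hub_rule_id,
        "storageAccountId": storage_account_id,
    }
    # Keep only the destinations that were actually provided.
    properties = {key: value for key, value in destinations.items() if value}
    if not properties:
        raise ValueError("at least one destination is required")
    properties["logs"] = [
        {"category": category, "enabled": True} for category in log_categories
    ]
    return {"properties": properties}


# Placeholder category and resource ID for illustration.
setting = diagnostic_setting(
    ["AppServiceHTTPLogs"],
    workspace_id="/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>",
)
print(json.dumps(setting, indent=2))
```

A setting can point to more than one destination at the same time, for example a workspace for querying and a storage account for archiving.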

It is important that diagnostics is enabled for all resources, so that you have centralized monitoring and logging capabilities. Here is a good starting point for defining an Azure Policy which can apply diagnostics settings for all your current and future resources –>

https://github.com/JimGBritt/AzurePolicy/tree/master/AzureMonitor/Scripts

7: Retention and Azure Storage extension

The cost for Azure Log Analytics (including Sentinel) is determined by the amount of data that is ingested and by the retention time. However, Log Analytics and Sentinel have a cap on how long they can store data. With the new built-in export feature, it is now possible to export security logs and other data to a storage account.

This also means that we can keep certain event tables for longer periods of time. Some tables might only contain performance metrics or verbose logging from services, which might not be worth storing over a long period of time.

So if we, for instance, would like to store all Security Events for 24 months, this would cost us about 121$ for 60 GB per month, while exporting the same amount of data to a Storage Account (using a Data Lake Gen2 storage account) for the same period would cost us about 20$, as seen here –> https://msandbu.org/azure-log-analytics-data-export/ (Just remember that there are some limitations on which tables can be exported.)

Data stored within a Storage Account can still be accessed from a Log Analytics Kusto query.
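If you want to run this kind of comparison for your own volumes, a simplified calculation could look like the Python sketch below. It assumes a steady state where the same amount of data arrives every month and is kept for a fixed number of months, with an optional number of free months. The rates in the example are hypothetical placeholders and not actual Azure prices, so the output is not meant to reproduce the figures above. Look up the current prices for your region before drawing conclusions.

```python
def steady_state_monthly_cost(gb_per_month, months_kept, price_per_gb_month, free_months=0):
    """Monthly storage cost once the retention window is completely filled.

    Billable volume = monthly ingest * (months kept - months included for free).
    """
    billable_gb = gb_per_month * max(0, months_kept - free_months)
    return round(billable_gb * price_per_gb_month, 2)


# Hypothetical rates, NOT real prices: replace with the current price list.
workspace_rate = 0.10  # per GB per month of workspace retention (placeholder)
storage_rate = 0.02    # per GB per month in a storage account (placeholder)

in_workspace = steady_state_monthly_cost(60, 24, workspace_rate, free_months=1)
in_storage = steady_state_monthly_cost(60, 24, storage_rate)
print(in_workspace, in_storage)
```

This ignores ingestion, export and transaction costs, which also matter in practice, but it shows why long retention is usually cheaper outside the workspace.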

Using the externaldata operator in Kusto, you can query storage account data directly. This operator can look up data stored within a publicly available storage account or other available data sources. Here are some examples of how it can look up external data storage:

https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/externaldata-operator?pivots=azuredataexplorer

8: Azure Arc and Azure Monitor?

Azure Arc for servers provides capabilities for management, policy control and monitoring of virtual machines. One example is that Azure Arc provides Azure Policy functionality for virtual machines. Azure Arc 1.0 also supports the Azure Monitor agent:

https://azure.microsoft.com/en-us/updates/new-azure-monitor-agent-and-data-collection-rules-capabilities-released/

So this allows you to install the Azure Monitor agent on Azure Arc enabled servers as well:

New-AzConnectedMachineExtension -Name AMAWindows -ExtensionType AzureMonitorWindowsAgent `
 -Publisher Microsoft.Azure.Monitor -ResourceGroupName <resource-group-name> `
 -MachineName <virtual-machine-name> -Location <location>

With Azure Arc, the service also creates a managed identity for the server, which means that the server authenticates to the Log Analytics workspace with its Azure AD identity instead of a workspace ID and key.

9: Azure Log Analytics and Private Link

By default, if you have Azure Monitor or Log Analytics agents installed on on-premises machines, they will communicate with the public FQDNs of the service and not through any VPN or ExpressRoute connection your organization might have to Azure. It is, however, now possible to connect using your private connections through Private Link. There are some caveats:

You need to use the latest version of the Log Analytics agent for Windows or Linux.
Log Analytics Gateway does not support Private Link.

Private Link configuration for Log Analytics can also be used to determine where operators can actually run queries from.

If you want to use Private Link for on-premises resources, be aware that you also need to update your DNS zones to point to the internal FQDN of the service. When creating a private link, Azure will automatically create a private DNS zone which contains the A record for the private link.
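A quick way to verify the DNS part from an on-premises machine is to check what the service endpoint resolves to. The Python sketch below resolves a hostname and checks whether every returned address is in a private range. The hostname pattern in the comment is an example of a workspace ingestion endpoint and is an assumption here, as are the function names. A private address only tells you that DNS points inward, not that the whole Private Link setup is correct.

```python
import ipaddress
import socket


def resolve_addresses(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True when at least one address is given and all of them are private."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


if __name__ == "__main__":
    # Example endpoint pattern (assumption): <workspace-id>.ods.opinsights.azure.com
    print(all_private(["10.1.0.4"]))  # an address from an internal range
    print(all_private(["8.8.8.8"]))   # a public address
```

On a real machine you would call all_private(resolve_addresses(hostname)) for each endpoint the agent uses and expect True when the private DNS zone is in effect.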

10: Best practices for design and setup

The final aspect is my best practices for deploying Log Analytics workspaces, or for using Log Analytics in general in any deployment.

Use as few workspaces as possible – Previously you might have needed multiple workspaces because of retention and the cost of performance metrics, but with the new built-in functionality you should have one which contains all logs/activities.
Use one for each region – This is because of egress cost and latency; there might also be different compliance demands or governing laws per country.
Use table-level retention – This gives you more control of the cost. Do you really need performance metrics from the last 90 days?
Use Azure Policies both for installing monitoring agents and for enabling diagnostics for all Azure services that are in use – This is not just for security purposes but also for alerting on Azure-related outages.
Set up alerting on common events, both from a security and from an operations perspective – Most have Log Analytics enabled and are collecting data, but few spend time creating alerting and monitoring rules for the services so that they can apply automation or get notified if something happens. This can also be automated using Terraform or Azure Resource Manager, so that you can easily define new alerts when you want to get notified of an outage or a change.
Spend time looking at the log sources and managing cost – Many organizations, after setting up Log Analytics, are just ingesting data without using the data that is coming in or controlling how much data is being stored. Some services might collect a lot of verbose data which you might not need or want at all. Have someone who is responsible for following up on what kind of data is collected and making changes accordingly. Also, with Azure Monitor agents and data collection rules we can now define more precise rules for which data should be collected and where it should be stored.
Define proper RBAC and table-level access control – In most cases you might be collecting a lot of sensitive data or access logs which should not be accessible to most operators. With Log Analytics and/or Sentinel you can now define proper RBAC for most operators to ensure that they can only access certain data.
For long term retention move data to an Azure Storage Account
Pay attention to the updates! There are a lot of updates and changes happening to Log Analytics and Azure Monitor, which can be viewed here –> https://azure.microsoft.com/nb-no/updates/?category=management-tools

So this was the first part, giving deeper insight into the architecture and setup of Azure Monitor and Log Analytics. In the second part I will go further into automating the setup of alerting and give some recommendations for alerts.