As many have said before me, moving virtual infrastructure to the public cloud does not change your responsibility as a customer for managing those virtual machines. A couple of weeks ago I gave a presentation at the Nordic Virtual Summit about security in Microsoft Azure, so I wanted to share some tips and best practices on how to investigate Azure environments and which security mechanisms should be in place to ensure that you have visibility. Some of the content shows what kind of data can be collected from the different services.
NOTE: Link to the presentation can be found here –> nvsummit/presentations at main · msandbu/nvsummit (github.com)
So where would you start looking for clues about what has been going on in an Azure environment? From a management perspective, Azure has a lot of different services that can provide insight into what has happened.
With so many services to choose from, which one gives me the full picture? For example, if I get information about an ongoing attack against one of my resources, which service would I turn to?
Let me give you an example where we are getting malicious traffic against RDP from a certain IP address:
- Azure Defender: Malicious traffic from 18.104.22.168 (Marked as Malicious from Threat Intelligence API)
- NSG Flow Logs: Traffic from 22.214.171.124 going to IP 126.96.36.199 on Port 3389 and was allowed
- VMConnect Using VM insight: svchost.exe accepted connection on port 3389 currently established
- Security Event Logs: Successful logged on AD user with username domain\administrator from IP
So to figure out what happened I need multiple data sources to get the full picture. However, most of these services are not enabled by default, so when you log into an Azure environment where none of them are enabled, you have limited information to work from.
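To illustrate the idea of combining sources, here is a minimal Python sketch that merges records from several services into one timeline for a single IP address. The `Event` type, the `timeline_for_ip` helper and all sample records are hypothetical and made up for illustration; real records would come from Log Analytics or exported log files and would look different.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str   # which service produced the record
    ip: str       # remote IP address the record refers to
    summary: str


def timeline_for_ip(events, ip):
    """Return all events that mention `ip`, oldest first."""
    return sorted((e for e in events if e.ip == ip), key=lambda e: e.timestamp)


# Hypothetical sample records. The IPs come from documentation-only ranges.
events = [
    Event(datetime(2021, 3, 1, 10, 5), "Security Event Log", "203.0.113.10", "Successful logon"),
    Event(datetime(2021, 3, 1, 10, 0), "Azure Defender", "203.0.113.10", "IP flagged as malicious"),
    Event(datetime(2021, 3, 1, 10, 2), "NSG Flow Logs", "203.0.113.10", "Inbound 3389 allowed"),
    Event(datetime(2021, 3, 1, 10, 3), "NSG Flow Logs", "198.51.100.7", "Inbound 443 allowed"),
    Event(datetime(2021, 3, 1, 10, 4), "VM insight", "203.0.113.10", "svchost.exe accepted connection on 3389"),
]

for e in timeline_for_ip(events, "203.0.113.10"):
    print(e.timestamp.strftime("%H:%M"), e.source.ljust(20), e.summary)
```

No single record tells the story, but ordered by time the four sources together show the flagged IP, the allowed flow, the accepted connection and the logon.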
These are some of the log sources that are available in an Azure environment.
NOTE: If you are using Azure AD Free you only have audit and sign-in logs available for 7 days
| Audit Item | Category | Enabled by Default | Retention |
|---|---|---|---|
| User Activity | Microsoft 365 Security | No | 90 Days (1 year for E5) |
| Admin Activity | Microsoft 365 Security | No | 90 Days (1 year for E5) |
| Mailbox Audit | Exchange Online | Yes | 90 Days |
| Sign-In Activity | Azure AD | Yes | 30 Days (AAD P1) |
| Users at Risk | Azure AD | Yes | 7 Days (30 Days, P1/P2) |
| Risky Sign-ins | Azure AD | Yes | 7 Days (30 Days, P1/P2) |
| Azure MFA Usage | Azure AD | Yes | 30 Days |
| Directory Audit | Azure AD | Yes | 7 Days (30 Days, P1/P2) |
| Intune Activity Log | Intune | Yes | 1 Year (Graph API) |
| Azure Resource Manager | Azure | Yes | 30 Days |
| Network Security Group Flow Logs | Azure | No | Depending on Configuration |
| Azure Diagnostics Logs | Azure | No | Depending on Configuration |
| Azure Application Insight | Azure | No | Depending on Configuration |
| VM Event Logs | OS | Yes | Size defined in Group Policy |
For instance, all changes to an Azure environment are stored in the Azure Activity Log under each subscription, but only for 30 days, so if the change happened some time ago it will be difficult to find out who made it. First off, we need to understand who has been working inside the environment.
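The retention limits matter most once an incident is already some days old. As a rough sketch, the snippet below answers the question "which default-enabled sources could still hold data from N days ago?". The retention values are copied from the table above; the dictionary and the `sources_covering` helper are my own illustration, and the Azure AD entries assume the 30-day P1/P2 figures rather than the 7-day free tier.

```python
# Default retention in days, taken from the table above. Sources whose
# retention depends on configuration, licensing or Group Policy are left out.
# Azure AD entries assume the 30-day figures listed for P1/P2.
DEFAULT_RETENTION_DAYS = {
    "Mailbox Audit": 90,
    "Sign-In Activity": 30,
    "Users at Risk": 30,
    "Risky Sign-ins": 30,
    "Azure MFA Usage": 30,
    "Directory Audit": 30,
    "Intune Activity Log": 365,
    "Azure Resource Manager": 30,
}


def sources_covering(days_ago):
    """Return the sources whose default retention reaches back `days_ago` days."""
    return sorted(
        name for name, days in DEFAULT_RETENTION_DAYS.items() if days >= days_ago
    )


# An incident from 45 days ago is already outside most default windows.
print(sources_covering(45))
```

This is why exporting to a Log Analytics workspace early on makes such a difference: without it, anything older than a month is mostly gone.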
Azure AD Logs
Within Azure Active Directory there are a couple of different log sources that we can investigate to discover whether, for instance, a compromised account has been accessing the environment. The log sources are split into two categories:
- Activity – Sign-in logs, Audit Logs and Provisioning Logs
- Security – Risky sign-ins and Users flagged for risk
Both the sign-in and audit logs support integration with Log Analytics using “Export Data Settings”. However, if this is not configured in the current tenant, you need to download the log files from each of the sources.
For instance, the Azure AD sign-in logs now have different filtering views that separate interactive user sign-ins, non-interactive user sign-ins and service principal sign-ins. The sources that each view reflects are listed here –> Azure Active Directory sign-in activity reports – preview | Microsoft Docs
(NOTE: There is a 2 to 5 minute delay before log data shows up in the Azure Portal)
You also have the audit log, which shows changes to Azure Active Directory. These can be changes to Conditional Access or even synchronization changes from Azure AD Connect.
Within the Usage and Insight pane you can also see the usage of different service principals.
You can also access these logs programmatically using PowerShell. For this you need the latest AzureADPreview module.
Install-Module AzureADPreview
Get-AzureADAuditDirectoryLogs -All $true
Get-AzureADAuditSignInLogs -All $true
Get-AzureADUser -All $true   # useful to get a list of all Azure AD users
Get-AzureADServicePrincipal -All $true
You can also export this data to a centralized Log Analytics workspace, which allows for centralized log management.
Here is an example cost table showing the cost of storing data in Log Analytics depending on the number of users.
| Log category | Number of users | Events per day | Volume of data per month (est.) | Cost per month (est.) | Cost per year (est.) |
|---|---|---|---|---|---|
| Audit | 100,000 | 1.5 million | 90 GB | $1.93 | $23.12 |
| Sign-ins | 100,000 | 15 million | 1.7 TB | $35.41 | $424.92 |
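If your tenant is smaller or larger than 100,000 users, a rough way to read the table is to scale it. The sketch below does that in Python under one big assumption of mine: that volume and cost grow linearly with the number of users. The `REFERENCE` values are copied from the table, 1.7 TB is taken as 1700 GB, and the `estimate` helper is purely illustrative, not a pricing calculator.

```python
# Reference rows copied from the table above (both for 100,000 users).
REFERENCE = {
    "Audit": {"users": 100_000, "gb_per_month": 90.0, "cost_per_month": 1.93},
    # 1.7 TB is taken as 1700 GB here (an assumption about the rounding).
    "Sign-ins": {"users": 100_000, "gb_per_month": 1700.0, "cost_per_month": 35.41},
}


def estimate(category, users):
    """Scale the reference row linearly to a different number of users."""
    ref = REFERENCE[category]
    factor = users / ref["users"]
    return {
        "gb_per_month": ref["gb_per_month"] * factor,
        "cost_per_month": ref["cost_per_month"] * factor,
    }


for category in REFERENCE:
    est = estimate(category, 10_000)
    print(category, round(est["gb_per_month"], 1), "GB,", round(est["cost_per_month"], 2), "USD per month")
```

Actual usage depends on how active the users and applications are, so treat the result as an order of magnitude only.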
If a sign-in log entry contains a specific error code, you can look it up here to get more information –> https://login.microsoftonline.com/error
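Once the sign-in logs have been downloaded, a quick first pass is to count failed sign-ins per source IP. The Python sketch below assumes a JSON export where each record has `ipAddress`, `userPrincipalName` and a nested `status.errorCode` field, with `0` meaning success. Those field names are an assumption on my part, so check them against your own export. The sample records are made up.

```python
import json
from collections import Counter

# Hypothetical sample export. IPs are from documentation-only ranges and the
# field names (ipAddress, userPrincipalName, status.errorCode) are assumed.
SAMPLE_EXPORT = """
[
  {"userPrincipalName": "alice@contoso.com", "ipAddress": "203.0.113.10", "status": {"errorCode": 50126}},
  {"userPrincipalName": "alice@contoso.com", "ipAddress": "203.0.113.10", "status": {"errorCode": 50126}},
  {"userPrincipalName": "bob@contoso.com", "ipAddress": "198.51.100.7", "status": {"errorCode": 50053}},
  {"userPrincipalName": "carol@contoso.com", "ipAddress": "192.0.2.55", "status": {"errorCode": 0}},
  {"userPrincipalName": "alice@contoso.com", "ipAddress": "203.0.113.10", "status": {"errorCode": 0}}
]
"""


def failed_signins_by_ip(records):
    """Count sign-ins with a non-zero error code per source IP."""
    counts = Counter()
    for record in records:
        error_code = record.get("status", {}).get("errorCode", 0)
        if error_code != 0:
            counts[record.get("ipAddress", "unknown")] += 1
    return counts


records = json.loads(SAMPLE_EXPORT)
for ip, failures in failed_signins_by_ip(records).most_common():
    print(ip, failures)
```

An IP with several failures followed by a success, like the first one in the sample, is the kind of pattern worth looking at more closely.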
Azure Resource Manager Logs (and deployments)
The Azure Activity Log shows all activities that have been performed against the environment, regardless of whether it is a delete, create or update operation. It too stores data about activities for 30 days, so changes made more than 30 days ago will not appear in the portal. Here you also have the option to collect the data using diagnostic settings. If that has not been enabled, you have one other option.
If there is a particular resource whose changes you want to trace, you can check the resource group and investigate the deployment logs. Each resource group deployment stores information about the resources that were changed.
So if I click on Deployments I can see the deployment names. This does not tell me who made the deployment, but I can see changes to the environment. One thing to note is that this only reflects changes that have been made using ARM template deployments. Deployments made using Terraform or Pulumi will not show up here, nor does it reflect individual changes made outside of the resource group.
Digging through the Azure Activity Log to understand which specific changes have been made is a bit time consuming. You also have a new option called Application Change Analysis, which shows Azure Monitor activities in a graphical view. NOTE: This does not support all resources, but while in preview it can also show changes that have been made within a web application.
Note that since this is powered by the Activity Log it also can only show the last 30 days of events.
Virtual Machine Monitoring
When it comes to monitoring a virtual machine in Azure, there are many options. Many of these add-ons are solutions that you install on Log Analytics, which can then provide more insight into the virtual machine.
Some of the solutions that you can install on Log Analytics to provide more log collection/monitoring:
- Sentinel (Security Event Collection and SOAR)
- Update Management (Patch Management of Virtual Machines)
- Anti Malware (Monitoring endpoint protection for virtual machines)
- Change & File Tracking (Monitoring file/registry/software changes on virtual machines)
- DNS Analytics (Monitoring DNS server for queries to malicious domains)
- VM Insight (uses ServiceMap to monitor processes and network traffic on VMs), shown a bit further down
Change & File Tracking overlaps a bit with Azure Defender for Endpoints and Servers, but it can collect information about files, software, registry and even Windows services.
In addition to this you also have Azure Defender, which provides a lot of monitoring capabilities:
- EDR for Servers
- IPFIX metadata monitoring for malicious traffic
- Qualys vulnerability scanning of third-party applications on servers
This is an example of the IPFIX metadata that Azure Defender is collecting to show suspicious traffic outbound from the VM.
If you have Azure Defender enabled, you can also see some security alerts related to resources in the Azure Portal.
Windows Servers that have Azure Defender with EDR enabled will also let you collect some more data:
- DeviceInfo – Machine information (including OS information)
- DeviceNetworkInfo – Network properties of machines
- DeviceProcessEvents – Process creation and related events
- DeviceNetworkEvents – Network connection and related events
- DeviceFileEvents – File creation, modification, and other file system events
- DeviceRegistryEvents – Creation and modification of registry entries
- DeviceLogonEvents – Sign-ins and other authentication events
- DeviceImageLoadEvents – DLL loading events
This can also showcase some of the predefined alerts/detections that Azure Defender provides –> Reference table for all security alerts in Azure Security Center | Microsoft Docs
Network Security Group Flow Logs
NSG Flow Logs are similar to NetFlow logs and collect information about all traffic going through an NSG (regardless of whether it is allowed or denied).
NSG Flow Logs attributes:
- Resourceid (NSG ID)
- Rule (NSG rule)
- TrafficFlow (Inbound or Outbound)
- Traffic Decision (Allowed or Denied)
- Flow State (B, C or E, for begin, continuing or end)
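Inside each flow log record, the individual flows are stored as comma-separated tuples. Here is a small Python sketch that parses one such tuple, assuming the version 2 layout (timestamp, source IP, destination IP, source port, destination port, protocol, traffic flow, traffic decision, flow state, followed by packet and byte counters). The `FlowTuple` type and `parse_flow_tuple` helper are hypothetical names of mine, and the sample tuple is made up with documentation-range IPs.

```python
from dataclasses import dataclass

PROTOCOLS = {"T": "TCP", "U": "UDP"}
DIRECTIONS = {"I": "Inbound", "O": "Outbound"}
DECISIONS = {"A": "Allow", "D": "Deny"}
STATES = {"B": "Begin", "C": "Continuing", "E": "End"}


@dataclass
class FlowTuple:
    timestamp: int   # Unix epoch seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    direction: str
    decision: str
    state: str       # empty for version 1 tuples, which have no flow state


def parse_flow_tuple(raw):
    """Parse one comma-separated flow tuple (layout assumed, see above)."""
    f = raw.split(",")
    return FlowTuple(
        timestamp=int(f[0]),
        src_ip=f[1],
        dst_ip=f[2],
        src_port=int(f[3]),
        dst_port=int(f[4]),
        protocol=PROTOCOLS.get(f[5], f[5]),
        direction=DIRECTIONS.get(f[6], f[6]),
        decision=DECISIONS.get(f[7], f[7]),
        state=STATES.get(f[8], f[8]) if len(f) > 8 else "",
    )


# Hypothetical tuple: inbound RDP from a documentation-range IP, allowed.
flow = parse_flow_tuple("1614592800,203.0.113.10,10.0.0.4,50123,3389,T,I,A,B,,,,")
if flow.dst_port == 3389 and flow.decision == "Allow":
    print("Allowed RDP flow from", flow.src_ip, "to", flow.dst_ip)
```

Filtering the raw tuples like this is handy when the logs only exist in a Storage Account and have not been ingested into Log Analytics.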
NSG Flow Logs can then be enhanced with Traffic Analytics, which ingests the NSG Flow Logs stored in Storage Accounts into Log Analytics and enriches them with more data points.
Some of the metadata that Traffic Analytics adds to the data it collects from the NSG Flow Logs:
1: Flow Type
3: Use of Azure PaaS services
4: Azure Region
5: Malicious Flow
NOTE: Malicious Flow can be seen in Log Analytics using this query.
AzureNetworkAnalytics_CL
| where SubType_s == 'FlowLog' and FlowType_s == 'MaliciousFlow' and (FASchemaVersion_s == '1' or FASchemaVersion_s == '2')
| where AllowedInFlows_d > 0
| project FlowDirection_s, FlowType_s,
    FlowCount = max_of(AllowedInFlows_d + DeniedInFlows_d, AllowedOutFlows_d + DeniedOutFlows_d),
    AllowedInFlows_d, DeniedInFlows_d, AllowedOutFlows_d, DeniedOutFlows_d,
    InboundBytes_d, OutboundBytes_d, InboundPackets_d, OutboundPackets_d,
    SrcIP_s, DestIP_s, L4Protocol_s, L7Protocol_s, DestPort_d,
    NIC1_s, NIC2_s, VM1_s, VM2_s, Subnet1_s, Subnet2_s,
    Country_s, Region1_s, Region2_s, Subscription1_g, Subscription2_g,
    NSGRule_s, NSGList_s, TimeGenerated, TimeProcessed_t
Virtual Machine Insight
The last part is VM Insight, which is a new monitoring option in Microsoft Azure. VM Insight is also an extension to Log Analytics and uses another VM extension called Service Map to collect information about the processes and network connections of a VM in Azure.
When using VM Insight, more information is collected and stored in Log Analytics, such as:
- VMBoundPort – Shows process name and Port with IP
- VMComputer – Information about virtual machine
- VMConnection – Show Process, Source and Destination IP and Port with protocol
- VMProcess – Shows processes and command-line arguments, such as PowerShell and scripts
An example of VM Process data collection
Security Events for Virtual Infrastructure
Within Microsoft Azure there are two ways to collect Security Events from Windows virtual machines: by having Azure Sentinel enabled or by having Azure Defender enabled.
When collecting security events, you have three distinct levels to choose from: Minimal, Common or All Security Events –> Connect Windows security event data to Azure Sentinel | Microsoft Docs
If this is not enabled, security events are not collected.
So, what kind of data do you need to get visibility into your environment in case something happens? It is important to understand that the different services each provide certain sets of data, and each adds some context to what is going on. Here are some best practices from me from a monitoring/security/analytics perspective.
- Have Diagnostics enabled for all Azure related services in production
- Have Azure AD and Azure Activity Log Collected into a Centralized Log Analytics Workspace
- Have Log data collected into a centralized Log Analytics Service for other production workloads
- Have NSG Flow Logs and Traffic Analysis for Public Facing Services
- Have at least the Common level set when it comes to integrating Security Event Logs –> Connect Windows security event data to Azure Sentinel | Microsoft Docs
- Have all VM-based resources protected with Azure Defender
- Have Qualys Vulnerability Management in place using Azure Defender integration
- Have Update Management or Automanage in place for virtual infrastructure, which also enables change tracking
- Have table-based access to Log Analytics / Sentinel
- Have Diagnostics enabled on Log Analytics to have an audit trail of who has queried for data
- Have the Security Graph API integrated to your existing monitoring/ITSM system