Azure Resource Manager rate limiting and a hint of DDoS?

Ever gotten an HTTP 429 Too Many Requests error when trying to update a resource in Azure?

We recently encountered this in one environment where we had multiple virtual machines within a single subscription, and we couldn’t perform any operations on the VMs.

Azure limits how many API calls you can make against a subscription (and a tenant) within a given time window.

According to the documentation at the time of writing, these are the hourly limits for read, write and delete API calls:

Scope	Operations	Limit (per hour)
Subscription	reads	12000
Subscription	deletes	15000
Subscription	writes	1200
Tenant	reads	12000
Tenant	writes	1200

For instance, a service principal used by a CI/CD pipeline can perform 1,200 writes against a subscription within one hour. That might sound like plenty, right?
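If you control the pipeline, one option is to pace writes on the client side so the principal stays under that ceiling. The Python sketch below is a simple sliding-window counter; the class name `HourlyWriteBudget` and its defaults (1,200 writes per 3,600 seconds, taken from the table above) are illustrative assumptions, not part of any Azure SDK.

```python
import time
from collections import deque


class HourlyWriteBudget:
    """Hypothetical client-side sliding-window counter (not an Azure SDK class)."""

    def __init__(self, limit=1200, window_seconds=3600, clock=time.monotonic):
        self.limit = limit            # e.g. 1,200 subscription writes ...
        self.window = window_seconds  # ... per hour, as in the table above
        self.clock = clock            # injectable clock, handy for testing
        self._sent = deque()          # timestamps of writes inside the window

    def _evict(self, now):
        # Forget writes that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self):
        """Record one write and return True, or return False if over budget."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True

    def remaining(self):
        """Writes still available in the current window."""
        self._evict(self.clock())
        return self.limit - len(self._sent)
```

A pipeline step would call `try_acquire()` before each write and postpone the request when it returns `False`. Keep in mind that this only tracks the writes made by this one client; other tools using the same principal still draw from the same server-side quota.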

In that environment, we noticed that Azure was returning HTTP 429 Too Many Requests. That pointed us towards rate limiting, but we wanted to verify that this was actually causing the issue.
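Whatever the cause, a client that receives a 429 should back off instead of retrying immediately; ARM includes a `Retry-After` header (a number of seconds) on throttled responses. A minimal Python sketch of that pattern follows. `send` is a hypothetical callable returning `(status, headers)`, used so the logic can be shown without a real HTTP client, and the exponential fallback delays are an assumption.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns something other than HTTP 429.

    send is a hypothetical callable returning (status_code, headers_dict).
    Retry-After is assumed to be a number of seconds, which is how ARM sends it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the 429 back to the caller
        # HTTP header names are case-insensitive.
        lowered = {k.lower(): v for k, v in headers.items()}
        retry_after = lowered.get("retry-after")
        # Prefer the server's hint; otherwise fall back to exponential backoff.
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    return status, headers
```

Backing off only helps a well-behaved client. It does nothing against another script that keeps hammering the same subscription.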

A simple way to check how many reads and writes remain in your hourly quota is to run an Azure CLI command with debug output enabled:

az group list --verbose --debug

The debug output then shows the remaining quota, which ARM returns as a response header:

DEBUG: cli.azure.cli.core.sdk.policies: 'x-ms-ratelimit-remaining-subscription-reads': '11999'

Running a write operation with the same flags, such as creating a resource group, shows the remaining writes instead:

Command: az group create -n myresourcegroup --location westus --verbose --debug

Reply: DEBUG: cli.azure.cli.core.sdk.policies: 'x-ms-ratelimit-remaining-subscription-writes': '1199'
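To check the remaining quota from a script rather than by eye, you can pull the `x-ms-ratelimit-remaining-*` values out of the debug text. The Python sketch below only parses text; it assumes the header lines look like the ones shown above (name and value in single quotes), which may differ between Azure CLI versions, and the function name `remaining_quota` is hypothetical. The CLI's debug logging should go to stderr, so something like `az group list --debug 2> debug.log` would capture it.

```python
import re

# Matches e.g. 'x-ms-ratelimit-remaining-subscription-reads': '11999'
_HEADER = re.compile(r"'(x-ms-ratelimit-remaining-[\w-]+)':\s*'(\d+)'")


def remaining_quota(debug_output: str) -> dict:
    """Map each x-ms-ratelimit-remaining-* header to its last reported value."""
    quota = {}
    for name, value in _HEADER.findall(debug_output):
        quota[name] = int(value)  # later responses overwrite earlier ones
    return quota
```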

Azure Resource Manager is the front door to all other resource providers, and it applies its own rate limits to native ARM calls. On top of that, individual resource providers enforce their own limits, which are not always directly documented.

In the case I was involved in, the ARM API limit had not been reached, yet we were still getting HTTP 429 errors.

We then noticed in the subscription's Activity Log that a large number of VM Run Commands were being attempted. However, the machine they targeted was offline, so they could never succeed.

That made us suspect a separate rate limit in the Compute resource provider. To find it, I recreated the scenario using a virtual machine with its own managed identity (MSI) and access to the subscription:

# Get an access token for Azure Resource Manager from the VM's managed identity (IMDS endpoint)

$token = Invoke-RestMethod -Headers @{"Metadata"="true"} -Method GET 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/'

# Use the access token to build the HTTP authorization header

$authHeader = @{
    'Content-Type'  = 'application/json'
    'Authorization' = 'Bearer ' + $token.access_token
}

# Run Command endpoint for the test VM (replace subid, dresourcegroup and machinename with your own values)

$urlrun = "https://management.azure.com/subscriptions/subid/resourceGroups/dresourcegroup/providers/Microsoft.Compute/virtualMachines/machinename/runCommand?api-version=2021-11-01"

# Request body: run a PowerShell script on the VM
$body = @{
    commandId = 'RunPowerShellScript'
    script    = @('Restart-Computer -Force')
} | ConvertTo-Json

# Send the same Run Command request 1000 times
for ($n = 0; $n -lt 1000; $n++) {
    Invoke-RestMethod -Method Post -Uri $urlrun -Body $body -ContentType 'application/json' -Headers $authHeader
}

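After a while, the responses in PowerShell changed to the error below. It shows that the Compute resource provider allows 240 requests for the UpdateVM3Min operation group (judging by the name, per three-minute window), and that my script had already sent 565: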

{
  "details": [
    {
      "code": "TooManyRequests",
      "message": "{\"operationGroup\":\"UpdateVM3Min\",\"allowedRequestCount\":240,\"measuredRequestCount\":565}",
      "target": "UpdateVM3Min"
    }
  ],
  "innererror": {
    "internalErrorCode": "TooManyRequestsReceived"
  },
  "code": "OperationNotAllowed",
  "message": "The server rejected the request because too many requests have been received for this subscription."
}
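The useful numbers sit inside the `message` of the `TooManyRequests` detail, which is itself a JSON-encoded string. Here is a small Python sketch that extracts them, assuming a response body shaped like the error above; the function name `throttle_details` is illustrative, and the optional outer `"error"` wrapper is handled defensively in case your client returns the full body.

```python
import json


def throttle_details(body: str) -> list:
    """Return (operationGroup, allowed, measured) for each TooManyRequests detail."""
    error = json.loads(body)
    error = error.get("error", error)  # tolerate an outer {"error": {...}} wrapper
    results = []
    for detail in error.get("details", []):
        if detail.get("code") != "TooManyRequests":
            continue
        inner = json.loads(detail["message"])  # message holds escaped JSON
        results.append((
            inner["operationGroup"],
            inner["allowedRequestCount"],
            inner["measuredRequestCount"],
        ))
    return results
```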

This also has some nasty side effects. While the script was running under the MSI, my other account was unable to make changes to any other virtual machine in the same subscription.

In other words, this limit applies to the entire subscription, regardless of which service principal or user account sends the requests. If someone compromised your environment and ran a simple script like the one above, you would be unable to shut down your VMs or make any changes to resources under the Compute resource provider.

So how can you detect that you are running into rate limiting? Check the Azure Activity Log (which, of course, should be collected into a central Log Analytics workspace) for failed operations with status code 429:

AzureActivity
| where parse_json(Properties).activityStatusValue == "Failure"
| where parse_json(Properties).statusCode == 429
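To find out which identity is causing the throttling, it helps to group those failures by caller. The Python sketch below does this for records exported from the `AzureActivity` table; it assumes each record is a dict with a `Caller` column and a `Properties` column holding a JSON string, mirroring the fields used in the query above. The function name is hypothetical, and `statusCode` is compared as text because it is not certain whether an export stores it as a number or a string.

```python
import json
from collections import Counter


def throttled_by_caller(records):
    """Count failed HTTP 429 activity log records per caller, most frequent first."""
    counts = Counter()
    for record in records:
        props = json.loads(record.get("Properties") or "{}")
        if props.get("activityStatusValue") != "Failure":
            continue
        # statusCode may be exported as a number or a string; compare as text.
        if str(props.get("statusCode")) != "429":
            continue
        counts[record.get("Caller", "<unknown>")] += 1
    return counts.most_common()
```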

The same technique could, of course, be used against other resource providers, such as networking, to lock legitimate users out of the same subscription.

This is another good reason to separate workloads into multiple subscriptions, so that a faulty script or piece of logic cannot affect other services in the same environment.
