A new dilemma has come up over the last couple of weeks: with COVID pushing more organizations to public clouds to give employees new ways to work from home, how well does Disaster Recovery in the public cloud actually hold up?
Now the title is a bit misleading, but the news over the last couple of weeks has highlighted that Microsoft has seen a big surge in the use of Azure and Office 365 https://www.forbes.com/sites/martingiles/2020/03/30/microsoft-cloud-service-775-percent-rise-covid-19/ and because of this Microsoft has placed restrictions on Azure. This was to ensure that they could prioritize certain customers and workloads, such as:
- First responders
- Emergency routing and reporting applications
- Medical supply management and delivery systems
- Emergency alert applications
- Health-bots, health screening applications, and websites
- Health management applications and record systems
This makes sense, since it ensures that these critical systems and applications keep working and have enough resources. Non-prioritized customers, however, have now seen issues with not having enough capacity, or with not being able to start up cold applications or systems.
Now this got me thinking: do Microsoft or other cloud platforms reserve capacity for DR purposes? Many businesses are using the built-in geo-redundancy in Azure at the storage level, or even using Azure Site Recovery (ASR) to fail over their virtual infrastructure to another region in case of disaster. What if, in the midst of this, one of the other Azure regions went down because of an outage?
As an example, take North and West Europe, which are geo-paired regions and also a common combination for ASR. Suppose both were at 75% capacity and North Europe suddenly went down. Companies relying on ASR to provide DR would then not be able to power on their services in West Europe because of the restrictions. The same would apply if West Europe simply did not have available resources.
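To put rough numbers on that scenario, here is a minimal back-of-the-envelope sketch in Python. It is not based on any Azure API or published capacity figures: the `Region` class, the `failover_shortfall` helper, and the figure of 100 capacity units per region are all hypothetical, and it assumes both regions are the same size and that every workload in the failed region tries to fail over at once.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    total_units: float  # hypothetical total compute capacity of the region
    used_units: float   # capacity already consumed by running workloads

    @property
    def headroom(self) -> float:
        """Capacity still free for new workloads."""
        return max(self.total_units - self.used_units, 0.0)


def failover_shortfall(failed: Region, target: Region) -> float:
    """Units of the failed region's workload that cannot start in the target."""
    return max(failed.used_units - target.headroom, 0.0)


# Both regions at 75% utilization, as in the example above (assumed sizes).
north = Region("North Europe", total_units=100, used_units=75)
west = Region("West Europe", total_units=100, used_units=75)

shortfall = failover_shortfall(failed=north, target=west)
recoverable = north.used_units - shortfall

print(f"{west.name} headroom: {west.headroom:.0f} units")
print(f"Recoverable: {recoverable:.0f} of {north.used_units:.0f} units")
print(f"Shortfall: {shortfall:.0f} units")
```

Under these assumptions West Europe has only 25 units free against 75 units of demand, so roughly two thirds of the failed workloads would have nowhere to start, and that is before any prioritization restrictions are applied.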
Now, with ASR you are not guaranteed the capacity to run your workload in another region. What you are guaranteed is that replication is running as part of the defined RTO, and that the service itself is running. This is unlike most traditional Disaster Recovery solutions, where you often run a cold site with enough capacity for your critical solutions, and perhaps keep a few small services such as Active Directory running there to make DR failover simpler.
So this might not be an issue for small companies, but if one of the regions were to suffer an outage under the current restrictions, non-prioritized customers would be stuck, unable to power on resources in another region. This is something you need to plan for and consider from a Business Continuity perspective.