What does your hybrid cloud look like? And how do you build a cloud exit plan?

With more organizations moving to public cloud, many also need to understand how to build disaster recovery for their services outside of the public cloud vendor. And with more of the hyperscalers now providing hybrid cloud services, what is the difference between them, and what kind of functionality does each of them provide?

Lastly, regulators within different sectors also recommend that organizations create a cloud exit plan. Exit plans are an effective risk mitigation mechanism for extreme Cloud Service Provider (CSP) failures (not only technical failures but also strategic or commercial ones) and for other situations where the organization is unable or unwilling to continue using its existing CSP. The plan helps prepare for business continuity by identifying, and in some cases also testing, exit scenarios. This need for exit planning is now present in several regulations. Regardless of the choice of deployment model, exit plans that provide a 100 percent complete and fully tested failsafe are extremely hard and expensive to accomplish, which makes a pragmatic, risk-based approach that focuses on the most important business functions more appropriate. Fortunately, regulations allow this.

This blog post aims to provide insight into the following:

1: What options do you have for moving services across providers?

2: What kind of hybrid services do the cloud providers have, and what functionality do they provide?

3: What do you need to think about in terms of disaster recovery and backup for the different providers, and how could you automate it?

4: How could you create an exit plan from a public cloud to a private cloud infrastructure (if possible)?

Before we go into building disaster recovery, building a hybrid cloud, or defining an exit plan, we should first understand the main workload types within public cloud.

Delivery models and mobility within Public Cloud

When moving to public cloud you have different delivery models to choose from, depending on service requirements. Within the different public cloud vendors, you can choose from a range of services such as virtual machines, where the vendor takes responsibility up to and including the virtualization layer, or Container-as-a-Service, where the vendor manages the container environment and we are responsible for managing the containers and the applications.

Another example is PaaS services where we get access to a certain service or runtime, such as hosting a web application, database, or storage service. This means that the vendor will take more control and responsibility and we can use the services within the vendor framework.

Lastly there is SaaS, where we just get access to the application. A good example here is Office 365. With a SaaS delivery model, the vendor is responsible for the entire stack, and we are only responsible for the access (meaning identity access) to the application.

When we are building new services in public cloud, the delivery models all have pros and cons in terms of functionality, but they also have pros and cons in terms of mobility between services and platforms. To highlight a few examples:

We can move a virtual machine from on-premises to Microsoft Azure.
We cannot easily move a SQL database from Azure SQL Database to Amazon RDS.
We cannot easily move from Office 365 to Google Workspace.
We cannot easily move from Azure Service Bus to a similar messaging service in Google Cloud, or vice versa.

NOTE: Some services can be migrated between different cloud providers, but it depends on the runtime/framework that is supported. VM migration can be a bit of a hassle, but it works in most cases. Many of the vendors provide built-in migration tools to migrate from various sources to public cloud.

Also note that workloads can be moved and abstracted away from the left-side delivery models (IaaS) to the right-side ones (PaaS and SaaS), for example:

Migrating from SQL Server to PaaS based SQL Service
Migrating from IIS Web Server to Web Application or App Service
Migrating from Exchange/SharePoint to Office 365

However, moving from public cloud PaaS & SaaS services to any service running on-premises is not an easy task, since that also means that you need to find a similar type of runtime to host the same services on-premises instead.

In addition, more and more services are built using new PaaS/FaaS (Function-as-a-Service) delivery models, which are not easily replicated on traditional virtual infrastructure in the same fashion or migrated from one cloud provider to another, unless we use a framework like OpenFaaS, which is not linked to a specific cloud platform.

Migrating services between vendors and delivery models

Migrating services between different vendors can be part of our exit plan, where we need to exit a cloud provider to either move to another vendor or move back to an on-premises environment. So how easy is it to move services and applications from one placement to another? The easiest way to showcase this is to walk through the most common platforms and delivery models.

NOTE: It is impossible to cover all scenarios, but I want to highlight the main ones. Most migration tools cover the move from on-premises services to public cloud, but not the other way around.

VMware IaaS – Most public clouds provide the functionality to migrate a virtual machine from a VMware-based infrastructure to public cloud. This means either copying the VMDK files (from a hypervisor perspective) or doing an agent-based replication (with all the data along with it). The target could be a cloud-based VM in Azure, Google, or AWS, or you could use the VMware on Public Cloud offering from the same vendors. Moving from one VMware-based platform to another requires the least effort in terms of planning and migration. Most of the migration tools are built for migrating to public cloud but not the other way around. If you need to restore virtual machines from public cloud back to a VMware-based IaaS, this is not going to work directly, since the platforms use different virtual disk formats and some conversion is required. However, if you are using VMware on Public Cloud, it is much easier to move virtual machine resources back and forth.
Hyper-V IaaS – The same applies to Hyper-V: most cloud providers have some migration tool for migrating from on-premises Hyper-V to public cloud (but not to the same extent as with VMware). Hyper-V also has its own virtual hard drive format, which means that to move to a public cloud you need to either convert the disks or use a migration tool. If you want to migrate back to Hyper-V, it again becomes a cumbersome task. Secondly, none of the cloud providers have a native Hyper-V service. The closest thing is that Microsoft runs Hyper-V in Azure, but the supported disk formats still differ. However, if you are running virtual machines in Azure, you can download the VHD file that is being used and spin it up on a Hyper-V platform easily, which is not the case for the other cloud providers. You can, however, export the virtual hard drives from the other cloud providers using different disk formats (see "Export your Google Cloud instances in one command" by Fabio Ferrari on Medium).
Public Cloud IaaS – If you have a set of virtual machines within Public Cloud you can export all virtual machines using several types of disk formats which can then be used to rehost virtual machines on another platform. Still, you would need to make sure that you do proper retooling of any virtual machine since all cloud providers and virtualization platforms have in-guest tooling to allow certain service communication and addons to run. This also means that you would need to redesign the network and other shared services that you might be using from one cloud provider.
Google Workspace (SaaS) – When it comes to Google Workspace, there are some free migration tools that can migrate data from existing on-premises collaboration tools such as Exchange (see "Migrate your organization’s data to Google Workspace" in Google Workspace Admin Help). There are also third-party tools that can assist with moving data to Google Workspace. However, moving all the data out of Workspace is not a simple task and would require a lot of time and effort, since the data has to be exported from the different services through user-initiated moves. You would also need to retrain end users and adjust, for instance, the identity service and end-user machines to work with another service, such as Office 365.
Office 365 (SaaS) – When it comes to Office 365, there are free migration tools from Microsoft that can migrate data from existing on-premises collaboration tools such as Exchange. There are also many third-party tools that can assist with moving data to Office 365. However, moving all the data out of Office 365 is not an easy task.
Public Cloud PaaS – Regarding PaaS services, the mobility between different vendors depends on the service. For instance, if you are using Azure Functions with PowerShell, which is not offered in the same form on AWS or GCP, you would need to rewrite or rehost those services in another manner. The same applies if you are using Oracle on Amazon RDS, which may not be available as an equivalent managed service on other platforms, so you would need to rehost that service. Kubernetes and containers provide a bit more mobility, given that it is containers running underneath and that all providers have a similar service. There are of course differences underneath in how you control network flow, storage provisioning, and such, but the ability to just run the services is a bit simpler when you have containers. There are also several types of storage options available. Many of them have an export option (not directly to another cloud provider but to a local drive); for others you would need to build custom scripts to handle the export of data from one service to another.
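
As a rough way to turn the walkthrough above into something you can work with, here is a minimal Python sketch that tags each workload with its delivery model and ranks the inventory by how hard it is likely to be to move. The effort scores, the Workload class and the same_platform_target flag are my own illustrative assumptions, not measured values or part of any vendor tool, so adjust them to your own environment.

```python
from dataclasses import dataclass
from enum import Enum


class DeliveryModel(Enum):
    IAAS = "IaaS"
    CONTAINERS = "Containers"
    PAAS = "PaaS"
    SAAS = "SaaS"


# Hypothetical effort scores (1 = easiest to move, 5 = hardest).
# These are illustrative assumptions, not measured values.
EXIT_EFFORT = {
    DeliveryModel.CONTAINERS: 2,
    DeliveryModel.IAAS: 3,
    DeliveryModel.PAAS: 4,
    DeliveryModel.SAAS: 5,
}


@dataclass(frozen=True)
class Workload:
    name: str
    model: DeliveryModel
    # True when source and target run the same platform,
    # e.g. VMware on-premises and VMware on public cloud.
    same_platform_target: bool = False


def exit_effort(workload: Workload) -> int:
    """Return the assumed effort score for moving one workload."""
    score = EXIT_EFFORT[workload.model]
    if workload.same_platform_target:
        # Same platform on both sides means no disk or runtime conversion.
        score = max(1, score - 2)
    return score


def rank_by_exit_effort(workloads: list[Workload]) -> list[Workload]:
    """Sort workloads so the hardest ones to move come first."""
    return sorted(workloads, key=exit_effort, reverse=True)


if __name__ == "__main__":
    inventory = [
        Workload("erp-vm", DeliveryModel.IAAS, same_platform_target=True),
        Workload("mail", DeliveryModel.SAAS),
        Workload("orders-api", DeliveryModel.CONTAINERS),
    ]
    for w in rank_by_exit_effort(inventory):
        print(f"{w.name}: {w.model.value}, effort {exit_effort(w)}")
```

Starting from the top of such a list tells you where an exit plan needs the most attention first.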

Another aspect is that when you are building a platform in one of the cloud providers, you are likely to be using many of the centralized services for identity, networking, storage, security, and management, which you would need to replace or rebuild if you needed to move to another cloud provider or move back to an on-premises private cloud environment.

Hybrid Services from Cloud Vendors

Many companies cannot take full benefit of the public cloud, whether because of compliance or for other reasons (such as disaster recovery, latency, or governance). The approach that the cloud providers are now taking is to provide their services within customers' own datacenters, or even at the edge, so that the cloud services run on your own infrastructure. This can also be an option for disaster recovery purposes: you can reuse existing capabilities and, in case of a failure, run the services within your own datacenter. Or you may simply need the service close to where it is used.

The different cloud providers have several types of hybrid services. Some are aimed at providing local processing capabilities, such as doing ML or running local data services, while still using the same management plane within the public cloud. We also have some hardware-based approaches, where the cloud providers deliver hardware with a predefined set of capabilities, such as running virtual machines or data services locally.

Some of the big trends we are seeing more of are:

1: Kubernetes anywhere – Where the cloud providers are building a Kubernetes Management plane that sits within the cloud provider, but the workload and compute can be run “anywhere” which provides consistent management and flexible placement of the workloads.

2: Cloud-based PaaS services delivered as container-based services on-premises – For example, both Google's Cloud Run for Anthos and Azure Arc-enabled data services provide PaaS services running on top of a Kubernetes environment. This makes it easier to scale workloads and to apply rolling updates to the environment.

3: Cloud-managed hardware – We are also seeing that the different cloud providers are now also building customized hardware such as with AWS Outposts, Azure Stack Edge, where the management plane is linked to Public Cloud to make it easier for them to provide updates automatically.

4: Local processing – Latency is a killer for many workloads, so many of them need edge-based data processing. Many of the cloud providers therefore offer edge-based computing for data processing, IoT, analytics, or machine learning, powered by cloud services.

So, what kind of hybrid services do we have from the different cloud providers? The hybrid services from the different providers can be anything from API management and ETL services to container-based workloads, which means that many of these services can be run within your own datacenter.

Building DR and Backup for Public Cloud

Building backup and DR for cloud-based services requires that you understand how each cloud service is built and what its limitations are. Let me illustrate with an example where we might have a mix of virtual infrastructure and some PaaS services like storage, SQL, and a data lake service for analytics purposes. In Azure, as with the other cloud providers, many of the services have their own backup and/or disaster recovery features.

Coming from an on-premises environment where you might have a virtualized backup/disaster recovery tool you would need to mix different built-in tools from the PaaS service to provide the same level of backup and disaster recovery.

Now, could we use the same tools to restore our services on-premises? After all, this is just data. The problem is that the built-in tools only support the cloud platform they run on, so if you are using Azure Backup, you can only restore to Azure. There are some third-party backup vendors that can back up virtual machines in the public cloud and restore them on another platform, but that is limited to virtual machines. This means that you need some logic of your own to back up the other data components. The second issue that then arises is the data egress cost.
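
To get a feel for the egress side of this, here is a small back-of-the-envelope sketch in Python. The price per GB and the link speed in the example are made-up placeholders, not actual list prices from any provider, and the function name is my own. It also ignores tiered pricing and protocol overhead, so treat the result as an order of magnitude only.

```python
def estimate_egress(data_tb: float, price_per_gb: float, bandwidth_mbps: float):
    """Estimate the cost and duration of pulling data out of a cloud provider.

    data_tb        -- amount of data to move, in terabytes (1 TB = 1000 GB)
    price_per_gb   -- assumed flat egress price per GB (placeholder value)
    bandwidth_mbps -- sustained throughput of the link, in megabits per second
    Returns a (cost, hours) tuple.
    """
    data_gb = data_tb * 1000
    cost = data_gb * price_per_gb
    # 1 GB = 8000 megabits, so seconds = megabits / (megabits per second).
    seconds = (data_gb * 8000) / bandwidth_mbps
    return cost, seconds / 3600


if __name__ == "__main__":
    # Hypothetical scenario: 50 TB of backup data, 0.08 per GB, 1 Gbit/s link.
    cost, hours = estimate_egress(data_tb=50, price_per_gb=0.08, bandwidth_mbps=1000)
    print(f"Estimated egress cost: {cost:.0f}")
    print(f"Estimated transfer time: {hours:.1f} hours")
```

Even with placeholder numbers, running this for your own data volumes quickly shows whether a full restore to another platform is realistic within your recovery time objective.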

The same logic arises when working with multi-cloud. What if you wanted to provide cross-cloud disaster recovery between Azure and AWS, because those might be the closest datacenters available?

Using native features, there is no effective way to provide DR across providers. However, if you were using VMware on Azure and VMware on AWS, you would have the same compatibility level and could use traditional backup/DR tools to replicate between them.

How to bring “our stuff” back?

In case of a cloud exit, how can we bring our stuff back and restore it either within our own data center or within another cloud provider?

IaaS either needs to be downloaded in an open format such as VHD that can be reused by the new virtualization layer, or we need a backup solution that can back up and restore to another virtualization layer.
PaaS services really depend on what kind of service you are using. For data services such as S3, Blob Storage, or Google Cloud Storage, which are just storage containers, you can download the raw data and put it elsewhere.
If you are using other PaaS services where you are leveraging the SDK and logic from the cloud provider, such as Functions, Lambda, or even Logic Apps, you would need to convert the logic to use another SDK on another platform. Depending on the complexity and the integration that you have with other PaaS services on the same platform, this might not be an easy task. In most cases you will be able to export the actual data that is stored within the cloud PaaS data services.
However, things look different if you have built your services using cloud-native technologies. According to the definition from CNCF:

Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

Building on cloud-native technologies like these will provide a higher level of mobility across different platforms. However, there are certain elements that you might still need in order to provide availability and that are cloud provider-specific, namely network elements such as virtual networks, firewalls, and load-balancing mechanisms.
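
One way to limit how much logic has to be converted to another SDK is to keep the provider-specific calls behind a small interface of your own. Below is a minimal Python sketch of that idea using only the standard library. The BlobStore interface, the two example stores, and the migrate function are hypothetical names I made up for illustration. A real implementation would wrap the provider SDK in a third class with the same three methods, and would need to handle nested keys, large objects, and retries.

```python
from pathlib import Path
from typing import Iterable, Protocol


class BlobStore(Protocol):
    """The only storage operations the application is allowed to rely on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def keys(self) -> Iterable[str]: ...


class InMemoryStore:
    """Stand-in for a cloud storage container (for demonstration only)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def keys(self) -> Iterable[str]:
        return list(self._blobs)


class LocalDirStore:
    """Stores each blob as a file in one local directory (flat keys only)."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()

    def keys(self) -> Iterable[str]:
        return sorted(p.name for p in self._root.iterdir() if p.is_file())


def migrate(source: BlobStore, target: BlobStore) -> int:
    """Copy every blob from source to target and return the number copied."""
    copied = 0
    for key in source.keys():
        target.put(key, source.get(key))
        copied += 1
    return copied
```

Because the application and the migrate function only know about BlobStore, swapping the cloud-backed store for an on-premises one becomes a matter of writing one new class instead of touching every caller.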

No silver bullet?

So how do we handle a cloud exit? Is it even possible to do it in a feasible way, without it costing a large amount? Not really, but here are some pointers that you should take into consideration, both in terms of a cloud exit and in terms of building disaster recovery for your cloud-based services.

Disaster Recovery / Building for resiliency

Understand the SLA and availability of your cloud components. All the different cloud services (such as VMs, database services, virtual networks, and so on) have their own defined SLA. It is also important to understand what requirements must be met for the cloud provider to fulfill that SLA.
Look at what kind of availability options the different cloud services have. It does not help if some services are geo-redundant but one core component is only available within a single region. So, map the availability of each component to understand what would happen if a region went down.
So, what about services that cannot be easily configured as cross region redundant? Define your cloud infrastructure as code to allow easy rebuild of services in another region in case of failure. Also, look at setting up failover tests to build everything from scratch to determine how long it takes.
Data also needs to be taken into consideration: how do we handle data availability across regions? Can we use built-in replication options, or do we need to configure some data sync service? Most of the cloud providers have built-in data replication services or CLI tools that can be used to synchronize data from one service to another or from one region to another. Some tools even support synchronizing from one cloud provider to another.
Understand whether your services have hard dependencies on services that you cannot control, like Azure Active Directory, and what will happen if one of those services suffers an outage.
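
The availability mapping described in the list above can be sketched as a small Python model. Each component lists the regions it runs in and the components it depends on, and the function works out what would be unavailable if one region went down, including anything that depends on a lost component. The Component class, the component names, and the region names are hypothetical examples, and the model deliberately ignores partial degradation and failover time.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    regions: set[str]
    depends_on: list[str] = field(default_factory=list)


def unavailable_after_outage(components: list[Component], failed_region: str) -> set[str]:
    """Return the names of components that stop working if a region fails."""
    # Directly affected: components with no region left after the outage.
    down = {c.name for c in components if not (c.regions - {failed_region})}

    # Indirectly affected: keep propagating until nothing new is added.
    changed = True
    while changed:
        changed = False
        for c in components:
            if c.name not in down and any(dep in down for dep in c.depends_on):
                down.add(c.name)
                changed = True
    return down


if __name__ == "__main__":
    estate = [
        Component("web", {"westeurope", "northeurope"}, ["db", "identity"]),
        Component("db", {"westeurope", "northeurope"}),
        Component("queue", {"westeurope"}),  # single-region core component
        Component("worker", {"westeurope", "northeurope"}, ["queue"]),
        Component("identity", {"global"}),  # external dependency we do not control
    ]
    print(sorted(unavailable_after_outage(estate, "westeurope")))
```

In this example the worker is deployed in two regions but still goes down with the single-region queue, which is exactly the kind of hidden dependency the mapping is meant to surface.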

Designing for Cloud Mobility / Cloud Exit

Look at using cloud-native services instead of cloud PaaS services. This requires you to manage additional workloads yourself instead of using cloud PaaS services like databases, but it provides more flexibility and mobility.
Define your cloud as Infrastructure as Code. This doesn’t mean that you can reuse the code for another platform, but it gives you a documented setup of how the environment is built and allows you to reuse that logic elsewhere and rebuild it in a different context. For example, you could have a similar setup in another datacenter for DR purposes, where you have customized blueprints or templates.
Look at the services that are used, how you can export their data, and how you could reuse it. If you are using S3, for instance, there are numerous partners that provide the same S3 API on on-premises storage solutions; for other cloud-based storage services, however, you might not have the same options. Likewise, if you are using a cloud-native analytics platform, can you reuse the same setup on-premises or elsewhere?
One final aspect is competency. If you decide that you need to move off one of the platforms, you are also changing the toolbox and the core competency required, so ensure that you have that competency before switching to an entirely new platform.