Microsoft has always been heavily committed to the datacenter and has a large footprint with Hyper-V & System Center. If we dial the clock back five years, Microsoft released Azure Pack, which was their first big attempt at a completely integrated private cloud offering and was meant as a private version of Microsoft Azure. As time progressed, Microsoft also introduced the Cloud Platform System (CPS) together with Dell → http://www.dell.com/en-us/work/learn/microsoft-cloud-platform-system-powered-by-dell
The downside of Azure Pack and CPS was that they were too tightly integrated with other products such as System Center, and still restricted to the traditional three-tier architecture with compute, storage, and networking as separate parts of the infrastructure. This coincided with the emergence of many of the other hyper-converged infrastructure platforms as well.
Microsoft is now aiming for Azure Stack to be the true next-generation enterprise private cloud platform. Development has been under way since the announcement at Microsoft Ignite in 2015, and the platform will hopefully reach GA mid-CY17.
So what features are included in Azure Stack so far, and what do we know is coming to the platform during the course of the year?
Features:
Virtual Machines
Storage Accounts
Virtual Networks
Load Balancer
Network Security Groups
DNS Zones
Azure Functions
Web Applications
SQL
Marketplace with syndication option
KeyVault
VPN
ARM functionality
Azure Pack Connector
Blockchain templates*
CloudFoundry templates*
Mesos templates*
Service Fabric*
Azure Container Service*
* Post GA
One of the core concepts behind Azure Stack is to have a consistent experience between public Azure and Azure Stack, so all features and services will be identical to their counterparts in public Azure. If a feature is added to Azure Stack, it will have the same “look and feel” as the feature has in public Azure. From a developer standpoint this translates into only small changes if you want to reuse applications or ARM templates that target public Azure today on Azure Stack. The only thing limiting this for now is the support for newer ARM API versions on Azure Stack.
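To make that consistency a bit more concrete, here is a minimal sketch of deploying the same ARM template against both clouds. The resource group names, the template file and the Azure Stack environment name are placeholders, and it assumes the Azure Stack environment has already been registered with Add-AzureRmEnvironment and that the template pins API versions Azure Stack's ARM supports.

# Deploy to public Azure
Login-AzureRmAccount
New-AzureRmResourceGroup -Name "demo-rg" -Location "westeurope"
New-AzureRmResourceGroupDeployment -ResourceGroupName "demo-rg" -TemplateFile .\azuredeploy.json

# Deploy the exact same template to Azure Stack (environment name is an assumption)
Login-AzureRmAccount -EnvironmentName "AzureStack"
New-AzureRmResourceGroup -Name "demo-rg" -Location "local"
New-AzureRmResourceGroupDeployment -ResourceGroupName "demo-rg" -TemplateFile .\azuredeploy.json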
Moving on, the purpose of this post is to explore the different building blocks that make up Azure Stack, from the virtualization layer up to the different platform services themselves, to try to explain how it all fits together.
Lifecycle management
Azure Stack will come as a bundled platform, a set of certified servers from an OEM vendor. Azure Stack cannot be installed on any type of infrastructure. The reasoning behind this is that Microsoft wants to take total responsibility for lifecycle management of the platform as well as ensuring optimal performance. So if one of the OEM vendors releases a firmware update, a BIOS update or any other update to the hardware, Microsoft wants to ensure that the upgrade process goes as smoothly as possible and that the patch/firmware has been prevalidated in testing. In order to do this, Microsoft needs to set certain limitations on the hardware vendors so that it can maintain control of the hardware.
Azure Stack is split into different infrastructure roles, each of which has its own dedicated area of responsibility, such as networking, storage and compute.
Source: https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-architecture
From a tenant perspective, you interact with the platform using the different APIs that are available from Azure Resource Manager (ARM). ARM is exposed as REST APIs and can be reached either from the web portal, from CLI tools such as Azure CLI and PowerShell, or from the SDKs. Depending on what the end user does, the request is forwarded to the broker and then on to the responsible resource provider. That might be the networking provider if the tenant wants to create a virtual network, or the compute provider if the tenant wants to provision a virtual machine.
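As a small illustration of the tenant-side view (not the internals), every call below goes to the ARM endpoint, which then routes it onwards; the environment and resource group names are placeholders:

# Log in against the Azure Stack ARM endpoint (assumes the environment has been registered)
Login-AzureRmAccount -EnvironmentName "AzureStack"

# List the resource providers ARM can route requests to (Microsoft.Compute, Microsoft.Network, ...)
Get-AzureRmResourceProvider | Select-Object ProviderNamespace, RegistrationState

# Creating a resource group is handled by ARM itself; creating a VM inside it
# would be routed on to the compute resource provider
New-AzureRmResourceGroup -Name "tenant-rg" -Location "local"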
The overall architecture of Azure Stack is split into scale units (a set of nodes that makes up a Windows failover cluster and is also a fault domain). An Azure Stack stamp then consists of one or more scale units, and one or more Azure Stack stamps can be added to a region. By default at GA we are limited to 12 nodes, consisting of 3 scale units with 4 nodes in each scale unit.
Core philosophy – Software Defined Everything
At its core, Azure Stack consists of a hyper-converged platform running Windows Server 2016 from one of the four OEMs (Dell, Cisco, HPE or Lenovo). The idea of a hyper-converged setup is that you have servers with local disks attached, which are then connected together to form a distributed file system. This is not unique to Microsoft; there are many vendors in this market space already, such as Nutanix, VMware and SimpliVity, but all have different approaches to how they store and access their data. This hyper-converged setup also comes with other features such as auto-tiering and deduplication, and having these features purely in software is what makes this a software-defined architecture.
It should be noted that since this is a hyper-converged setup, compute will always scale together with the storage attached to it, since that is how Storage Spaces Direct works as of now. Another thing to be clear about is that Azure Stack at GA is limited to 12 nodes in a single region, as mentioned above; there is more content on that here –> https://azure.microsoft.com/mediahandler/files/resourcefiles/ebb2fd25-06ec-476b-a29a-bca40f448cf6/Hybrid_application_innovation_with_Azure_and_Azure_Stack.pdf
Storage Spaces Direct
The bare-metal servers run Windows Server 2016 with Hyper-V as the underlying virtualization platform. The same servers also run a feature called Storage Spaces Direct (S2D), which is Microsoft’s software-defined storage feature. S2D allows the servers to share internal storage between themselves to provide a highly available virtual storage solution as the base storage for the virtualization layer.
S2D is then used to create virtual volumes with a defined resiliency type (parity, mirror, two-way mirror) which host the CSV shares, and a Windows cluster role is used to maintain quorum among the nodes.
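Roughly, this follows the standard Storage Spaces Direct workflow sketched below; the pool and volume names and the size are placeholders, not Azure Stack's actual configuration, which the integrated system handles for you.

# Claim the local disks on the cluster nodes and build the storage pool
Enable-ClusterStorageSpacesDirect

# Create a mirrored, ReFS-formatted CSV volume on top of the pool
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "CSV01" `
           -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2TB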
S2D can use a combination of regular HDDs and SSDs (it can also be all-flash) to provide capacity and caching tiers, which are automatically balanced so that hot data is placed on the fast tier and cold data on the capacity tier. When a virtual machine is created and its storage is placed on the CSV share, the virtual hard drive of the VM is chopped into interleaves of blocks, 256 KB by default, which are then scattered across the different disks on the different servers depending on the resiliency level.
Overview of Storage Spaces Direct and Hyper-V in a Hyper Converged setup
S2D uses many of the features in the SMB 3 file protocol to provide highly redundant paths to the storage among the nodes. It also utilizes another part of the SMB 3 protocol, SMB Direct.
SMB Direct is Microsoft’s implementation of remote direct memory access (RDMA), which gives direct memory access from the memory of one computer into that of another without involving either one’s operating system. This provides low-latency, high-throughput connections between the servers in the platform without putting a lot of strain on their CPUs, since it essentially bypasses the operating system when it moves data. It is important to note, however, that S2D does not provide data locality; a virtual machine running on any node can request data blocks from all disks across the entire cluster. This puts a lot of strain on the network, which is why RDMA is such an important aspect, but I’ll get back to that.
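If you want to verify on a node that SMB Direct is actually in play, the standard Windows Server cmdlets below give a quick view; this is just a sanity-check sketch, not part of the Azure Stack tooling.

# Which NICs are RDMA-capable and have RDMA enabled
Get-NetAdapterRdma

# RDMA/RSS capability as the SMB client sees it
Get-SmbClientNetworkInterface

# Active SMB 3 connections between the nodes and whether they run over RDMA
Get-SmbMultichannelConnection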
Storage Spaces Direct uses the ReFS file system by default, which has some enhancements compared to NTFS. It works proactively to do error correction: in addition to validating data before reads and writes, ReFS introduces a data integrity scanner, known as a scrubber, which periodically scans the volume, identifying latent corruption and proactively triggering a repair of the corrupt data.
ReFS also introduces a new block cloning API which accelerates copy operations, and sparse VDL allows ReFS to zero out files rapidly. You can read more about the mechanisms behind it here –> https://technet.microsoft.com/en-us/windows-server-docs/storage/refs/block-cloning
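As a small aside, ReFS integrity streams can be inspected and toggled per file with the regular storage cmdlets; the VHDX path below is purely hypothetical.

# Check whether integrity streams are enabled for a given file
Get-FileIntegrity -FileName 'C:\ClusterStorage\CSV01\vm01.vhdx'

# Enable integrity streams (and with that, scrubber-driven repair) for that file
Set-FileIntegrity -FileName 'C:\ClusterStorage\CSV01\vm01.vhdx' -Enable $true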
SMB Direct with RDMA
To be able to use RDMA-based connections between the hosts, you need specific network equipment, both on the adapter side and in the leaf/spine switch configuration. There are different implementations of RDMA, but the most common ones are RoCE, iWARP and InfiniBand. As mentioned, RDMA effectively bypasses much of the operating system layer to transfer data between nodes, which has a negative effect on any QoS policies, since it is hard to enforce OS-based QoS when the OS is being bypassed. This is where Data Center Bridging (DCB) comes in: it provides hardware-based bandwidth allocation to a specific type of traffic and is there to ensure that the SMB traffic does not hog all the available bandwidth on the converged NIC team.
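A typical DCB configuration for RoCE-based SMB Direct looks something like the sketch below; the priority value, bandwidth percentage and NIC names are illustrative, not Azure Stack's actual settings.

# Tag SMB Direct traffic (port 445) with 802.1p priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable priority flow control for that priority and give it a guaranteed share
Enable-NetQosFlowControl -Priority 3
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Apply DCB/QoS on the physical adapters
Enable-NetAdapterQos -Name "NIC1","NIC2"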
Azure Stack also uses a new virtual switch capability called SET (Switch Embedded Teaming), which combines NIC teaming and the Hyper-V virtual switch into the same logical entity in the operating system. This also provides high availability for the physical network layer: if a NIC on a particular node stops working, traffic will still flow over the other NICs on the node.
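Creating a SET-enabled switch is essentially a one-liner; the switch and adapter names below are placeholders.

# Create a Hyper-V virtual switch with embedded teaming across two physical NICs
New-VMSwitch -Name "SETswitch" -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true

# Host vNICs (for example for SMB/storage traffic) can then be added on top of it
Add-VMNetworkAdapter -ManagementOS -Name "SMB1" -SwitchName "SETswitch"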
So far we have covered the hyper-converged setup and the physical networking aspect, but what about the virtual networking layer?
Network virtualization
In order to have a full cloud platform, you need to abstract away the physical network as well and move toward network virtualization, to be able to fully automate the tenant configuration. In the early days of Azure Pack, Microsoft used a tunneling protocol called NVGRE. This protocol encapsulates IP packets within GRE, which made it possible to move away from the restrictions of traditional layer-two networking, such as the limited VLAN space and tenants with overlapping IP ranges.
The issue with NVGRE is that traffic is encapsulated using GRE, which in essence is a tunneling protocol developed by Cisco. The downside of GRE is that it makes it difficult for firewalls to inspect the traffic inside the GRE packets. Microsoft therefore decided with Windows Server 2016 to focus on supporting VXLAN instead, which is now the default network virtualization protocol in Azure Stack. The upside of VXLAN is that it is more widely used by other vendors, such as Cisco, Arista, VMware NSX and OpenStack. It also encapsulates in UDP instead of GRE, which allows for lower overhead and makes packet inspection much easier.
VXLAN allows each tenant to have the same, overlapping IP segment, for instance 192.168.0.0/16, and it associates all tenant traffic with a VNI, which is a unique identifier for that specific tenant, in much the same manner as a VLAN, except that this is completely virtualized and does not involve the switches. The switches in a VXLAN setup only see the server IP addresses, not the tenant-specific IP addresses inside the VXLAN packets. Using the VNI we know which traffic belongs to a specific tenant, and that VNI is also used when the tenant's resources communicate across other nodes.
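From the tenant's point of view all of this is invisible; two tenants can simply each create a virtual network with the exact same address space. The names, resource groups and location below are placeholders.

# Tenant A and tenant B both use 192.168.0.0/16; the VNIs keep their traffic apart
New-AzureRmVirtualNetwork -Name "vnet-a" -ResourceGroupName "tenantA-rg" `
    -Location "local" -AddressPrefix "192.168.0.0/16"
New-AzureRmVirtualNetwork -Name "vnet-b" -ResourceGroupName "tenantB-rg" `
    -Location "local" -AddressPrefix "192.168.0.0/16"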
Now, the tunneling protocol is one part of the puzzle. The second part is adding NFV (network functions virtualization), which adds functionality to the virtualized network, and this is where the distributed firewall and the software load balancer come in.
Distributed Firewall
Showing how the distributed firewall is being controlled by the Network Controller
The distributed firewall is a virtualized network feature which runs on each of the Hyper-V vSwitches in an Azure Stack environment. The feature works regardless of the operating system inside the guest virtual machine and can be attached to a vNIC directly or to a virtual subnet. Unlike traditional firewalls, which act as a security layer between subnets, the distributed firewall acts as a security layer directly on a VM or on a subnet. In Azure Stack the distributed firewall is presented as network security groups. It allows for basic access-list configuration on IP, port and protocol (source and destination) and does not replace a full stateful or packet-inspection firewall.
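A network security group as you would define it through ARM maps onto these distributed firewall rules; a minimal sketch, where the names, priority and ports are placeholders:

# Allow inbound HTTP from anywhere
$rule = New-AzureRmNetworkSecurityRuleConfig -Name "allow-http" -Protocol Tcp `
    -Direction Inbound -Priority 100 -Access Allow `
    -SourceAddressPrefix "*" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange 80

# The NSG can then be associated with a subnet or directly with a NIC
New-AzureRmNetworkSecurityGroup -Name "web-nsg" -ResourceGroupName "tenantA-rg" `
    -Location "local" -SecurityRules $rule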
Software load balancer
Software load balancer combined with Azure Stack
The software load balancer is also a feature running on the Hyper-V switch as a host agent service, and it is managed centrally by the network controller, which acts as the central management plane for the network. The load balancer works at layer 4 and is used to map a public IP and port to a backend pool on a specific port. It load-balances using DSR (direct server return), which means that it only load-balances incoming traffic; the return traffic from the backend servers goes directly from the server back to the requesting IP address via the Hyper-V switch. This feature is presented in Azure Stack as the regular load balancer.
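Tenants consume it just like the Azure load balancer; a rough sketch of a public HTTP rule follows, where all names are placeholders and the backend pool would still need to be wired up to the VM NICs.

# Public IP plus frontend, backend pool and a load-balancing rule for port 80
$pip  = New-AzureRmPublicIpAddress -Name "web-pip" -ResourceGroupName "tenantA-rg" `
            -Location "local" -AllocationMethod Dynamic
$fe   = New-AzureRmLoadBalancerFrontendIpConfig -Name "fe" -PublicIpAddress $pip
$pool = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name "webpool"
$rule = New-AzureRmLoadBalancerRuleConfig -Name "http" -FrontendIpConfiguration $fe `
            -BackendAddressPool $pool -Protocol Tcp -FrontendPort 80 -BackendPort 80

New-AzureRmLoadBalancer -Name "web-lb" -ResourceGroupName "tenantA-rg" -Location "local" `
    -FrontendIpConfiguration $fe -BackendAddressPool $pool -LoadBalancingRule $rule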
The Brains of the network – Network Controller
The software load balancing rules need to be in place, the distributed firewall policies need to be synced and maintained, and with VXLAN in place every host needs an IP table so each node knows how to reach the different virtual machines on the other hosts. This requires a centralized component that takes care of all of it, and that is the network controller.
On Azure Stack the network controller runs as a highly available set of three virtual machines which operate as a single cluster across different nodes. The network controller has two API interfaces. The first is the northbound API, which accepts requests using REST; so if we, for instance, change a firewall rule or create a software load balancer in the Azure Stack UI, the northbound API receives that request. The network controller can also be integrated with System Center, but that is not part of Azure Stack.
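The northbound interface is also what the Windows Server network controller cmdlets wrap; purely as an illustration of that API, the connection URI below is a made-up example, not an actual Azure Stack endpoint.

# Query the network controller's northbound REST API for the virtual networks
# and access control lists (distributed firewall policies) it knows about
$uri = "https://nc.contoso.local"
Get-NetworkControllerVirtualNetwork -ConnectionUri $uri
Get-NetworkControllerAccessControlList -ConnectionUri $uri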
Network Controller architecture – With Azure Stack
The southbound API then propagates the changes to the different virtual switches on the different hosts. The network controller is intended to be a centralized management component for both the physical and the virtual network, since it uses the Open vSwitch (OVSDB) standard, but the schema it uses is still lacking some key features needed to manage the physical network.
The network controller is also responsible for managing the VPN connections, advertising the BGP routes and maintaining session state across the hosts.
So this summarizes some of the capabilities in Azure Stack, how the different components interact and what the underlying platform contains. There are of course more features which I have not elaborated on here, such as the storage options it presents, but you can read more about that here –> https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-architecture
There will be more information available on this subject as we get closer to GA, when Microsoft can go more in-depth on the technical architecture of a full platform from top to bottom.