Within Microsoft Azure there are numerous storage options which can be used for different workloads, such as container workloads that need stateful storage, big data solutions which require high IOPS for computational workloads, or traditional blob storage which is useful for backup or archive data. Sometimes you might even need to build your own storage solution using a third-party storage product running on IaaS. To be honest I have missed having a good overview of the different storage options, so I decided to write this post as a starting point to give an overview. When considering storage options in Azure there are also other aspects one needs to take into consideration:
- Limitations (IOPS / Throughput / Size)
- SLA Level for Service
- Integration with other Azure Services (Data Factory, Azure Data Share, Kubernetes, Virtual Networks)
- DR and Backup options
- Protocol support and workload use-case
- Logging and Security Mechanisms
These are some of the points that I wish to highlight when I go through each of the different storage options in this blog post, and at the end I will include a table which summarizes the different storage options. The following storage services will be highlighted as part of this post. Some are standalone services and some are part of another service, but they have dedicated use-cases that I wish to highlight.
- Azure Blob Storage
- Azure Files
- Azure File Sync
- Azure NetApp Files
- Azure Managed Disk
- Azure Shared Disk
- Azure HPC Cache
- Azure Data Lake
- Azure Data Box
Azure Blob Storage
Azure Blob Storage is the native cloud storage offering within Azure, which provides API based access to the storage solution and is also the same solution that powers Azure Data Lake Storage. Blob Storage is organized into a storage account, which is configured with one or more containers that hold the content.
When a Blob Storage account is created it has by default a public FQDN in the form https://storageaccount.blob.core.windows.net. Blob Storage comes in different flavours and tiers. (NOTE: I’m only going to cover General Purpose v2, since v1 is the legacy version) As part of GPv2 it supports three access tiers, Hot, Cool and Archive, and data can be automatically moved across tiers. Both the Hot and Cool tiers support a 99.9% SLA and also support GRS (geo-redundant storage) in addition to LRS (locally redundant storage).
Blob Storage also has another offering called Premium Blob Storage, which only supports LRS or ZRS (zone-redundant) redundancy. Here data is stored on SSD backend storage compared to standard Blob Storage, but the premium offering does not support any of the other access tiers.
By default Blob Storage is only available using the API, but it can now also support NFS v3 in preview. You can sign up for the preview here –> https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR2Hac0C7FxRCrNVIXjVHNppUQkNVMElIRloyWVlSOUQ5RVMwOFlMNEJUQyQlQCN0PWcu
Blob Storage will also come with a new feature called Object Replication, where you can define what kind of data should be replicated across to other regions. With GRS based storage accounts, by contrast, all data is replicated across to another region. (You can read more about Object Replication in the video below)
Blob Storage can also be configured to sit behind a Private Endpoint, which allows the storage account to only be available for private connections through the virtual network.
In terms of DR and redundancy, the options depend on which redundancy tier is chosen for the storage account.
Blob Storage supports integration with Data Factory, and is also one of the supported options that can be used together with Azure Data Share. Azure Data Share allows you to share data externally with other organizations.
Since Blob Storage is the cloud-native storage solution in Azure, it is also what third parties tend to integrate with to do data offloading or to store backup data (such as Veeam, Cohesity, Rubrik). Support for Blob Storage is also integrated as part of Azure Data Box Edge to support hybrid cloud storage. Using this in combination with Data Box Edge allows you to publish an SMB or NFS based share within your own datacenter, while data is automatically tiered between local data stored on the edge and the blob storage account. We can also integrate Logic Apps to do automated transfers to an Azure Storage blob as well.
Interacting with Blob Storage can also be done using tools such as AzCopy and Azure Storage Explorer, or programmatically through the SDKs.
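As a rough illustration of the API/SDK based access model, here is a minimal sketch using the azure-storage-blob Python package to create a container, upload a blob and move it to the Cool tier. The account name, container name and file name are placeholders.

```python
# pip install azure-storage-blob azure-identity
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder account URL -- replace with your own storage account endpoint
account_url = "https://mystorageaccount.blob.core.windows.net"

# Authenticate with Azure AD (a connection string or account key also works)
service = BlobServiceClient(account_url, credential=DefaultAzureCredential())

# Create a container and upload a local file as a block blob
container = service.get_container_client("backups")
container.create_container()
with open("archive.zip", "rb") as data:
    container.upload_blob(name="archive.zip", data=data)

# Move the blob to the Cool access tier to reduce storage cost
blob = container.get_blob_client("archive.zip")
blob.set_standard_blob_tier("Cool")
```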
By default there is no built-in option to back up data within a Blob Storage account, but there are different options that can be used to set up a backup solution, described here –> https://azure.microsoft.com/en-in/blog/microsoft-azure-block-blob-storage-backup/
Azure Files
Azure Files is an alternative to Azure Blob Storage, but can reside within the same storage account. All data within Azure Files is separate and only available to the Azure Files service.
Azure Files is a storage solution which presents data using the SMB protocol, either through SMB 2.1, SMB 3 or the FileREST API. Unlike a regular SMB based file share, Azure Files does not support NTFS ACLs, and therefore you cannot directly define ACLs on the file shares. An Azure file share can be deployed into two different tiers, either standard or premium. The total capacity for both premium and standard file shares is 100 TiB.
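For programmatic access over the FileREST API, here is a minimal sketch using the azure-storage-file-share Python package. The connection string, share, directory and file names are placeholders.

```python
# pip install azure-storage-file-share
from azure.storage.fileshare import ShareClient

# Placeholder connection string for the storage account hosting the share
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

# Create (or connect to) a file share and upload a file into a directory
share = ShareClient.from_connection_string(conn_str, share_name="teamshare")
share.create_share()

directory = share.get_directory_client("reports")
directory.create_directory()

file_client = directory.get_file_client("q1.xlsx")
with open("q1.xlsx", "rb") as data:
    file_client.upload_file(data)
```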
Azure Files can also be configured to use Azure Active Directory for authentication, but it requires Azure Active Directory Domain Services. You can read more about how to set it up here –> https://msandbu.org/getting-started-with-azure-ad-auth-with-azure-files/
Even if Azure Files does not support traditional NTFS based ACLs, you can still use it for certain certified solutions, such as providing a storage backend for failover clusters for your SQL Servers –> https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-portal-sql-create-failover-cluster-premium-file-share?toc=%2fazure%2fstorage%2ffiles%2ftoc.json
Azure Files also supports Private Endpoint, which allows us to lock down access so the file share is only reachable from a virtual network. Azure Files is also supported in Data Factory as a source and sink, allowing data to be copied between different sources and Files.
File shares can also be backed up using Azure Backup –> https://docs.microsoft.com/en-us/azure/backup/backup-afs
Even though Azure Files only supports SMB (2.1 and 3) as of now, Microsoft is working on support for NFS 4.1, and support for Active Directory integration with on-premises domain controllers through VPN is coming as well.
Azure File Sync
Azure File Sync is an extension of Azure Files which provides storage tiering and replication of data between different branches. Essentially you set up Azure File Sync between two servers, and all data that is defined will be replicated between those servers as a sync group.
The advantage of Azure File Sync is that it supports NTFS ACLs, can replace DFS-R, and can work in conjunction with DFS-N.
Azure NetApp Files
Azure NetApp Files? It’s basically a managed file service in Azure, which can provide either an NFS v3 or SMB v3 based file volume using NetApp’s own file system and hardware from within the Azure datacenters, which can then be accessed from within virtual networks in Azure. In many cases you might have applications or services that are dependent on NFS based storage, such as HPC based workloads, SAP or container based applications.
Unlike Azure Files, this also supports NFS, as well as native AD based integration for SMB 3 based authentication (so no Azure AD DS required, and more native integration). Since this is a new service it is not yet supported by, for instance, Azure Backup. If you want to have backup of a volume that is running on NetApp Files you would need to create a snapshot of the volume, either using REST or the UI in the portal –> https://docs.microsoft.com/en-us/rest/api/netapp/snapshots/create
The performance tier is split into three different options, and performance is calculated based upon the service level:
| Service level | Throughput |
| --- | --- |
| Standard | 16 MB/s per provisioned TB |
| Premium | 64 MB/s per provisioned TB |
| Ultra | 128 MB/s per provisioned TB |
So essentially, for a 100 GB volume this translates to:
- Standard tier = 1.6 MB/s
- Premium tier = 6.4 MB/s
- Ultra tier = 12.8 MB/s
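Expressed as a small Python helper, assuming the per-TB throughput figures from the table above:

```python
# Throughput per provisioned TB for each Azure NetApp Files service level (MB/s)
THROUGHPUT_PER_TB = {"standard": 16, "premium": 64, "ultra": 128}

def anf_throughput_mbps(provisioned_gb: float, service_level: str) -> float:
    """Approximate volume throughput in MB/s for a given size and service level."""
    return (provisioned_gb / 1000) * THROUGHPUT_PER_TB[service_level.lower()]

# 100 GB volumes, matching the numbers above (1.6 / 6.4 / 12.8 MB/s)
for level in ("standard", "premium", "ultra"):
    print(f"100 GB @ {level}: {anf_throughput_mbps(100, level):.1f} MB/s")
```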
Another cool thing with Azure NetApp Files is the support for NetApp Trident –> https://netapp-trident.readthedocs.io/en/stable-v18.07/kubernetes/index.html (https://netapp-trident.readthedocs.io/en/stable-v19.07/kubernetes/operations/tasks/backends/anf.html). Trident integrates natively with Kubernetes and its Persistent Volume framework to allow provisioning and management of volumes from ANF directly from Kubernetes, as sketched below.
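To show what that looks like from the Kubernetes side, here is a small sketch using the official kubernetes Python client to request a persistent volume backed by an ANF storage class. The storage class name "anf-standard" is a placeholder for whatever class your Trident backend exposes.

```python
# pip install kubernetes
from kubernetes import client, config

# Use the local kubeconfig (e.g. obtained with `az aks get-credentials`)
config.load_kube_config()

# PVC requesting a 100 Gi volume from a Trident/ANF-backed storage class.
# "anf-standard" is a placeholder storage class name defined by your Trident backend.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="anf-volume"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        storage_class_name="anf-standard",
        resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```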
Azure Managed Disk
Azure Managed Disks are the underlying virtual disks that are attached to virtual machines. Managed disks come in different options depending on whether you require performance or cheap storage for your virtual machines.
The biggest differences between the options here are 1: speed, 2: integration, 3: SLA. The table below summarizes them.
| | Standard HDD | Standard SSD | Premium SSD | Ultra Disk | Ephemeral OS Disk |
| --- | --- | --- | --- | --- | --- |
| Max size | 32 TB | 32 TB | 32 TB | 64 TB | 4 TB |
| Max performance | 500 MiB/s | 750 MiB/s | 900 MiB/s | 2,000 MiB/s | Really high |
| SLA for VM | N/A | N/A | Yes | N/A | No |
| Replication | LRS / ZRS | LRS / ZRS | LRS / ZRS | LRS | No |
| Backup | Yes | Yes | Yes | No | No |
| Snapshot | Yes | Yes | Yes | No | No |
| Bursting | No | No | Yes | No | No |
| Resizeable | Yes | Yes | Yes | Yes | No |
It should also be noted that, in addition to this, managed disks also have something called BlobCache, which uses a combination of the underlying physical machine’s RAM and SSD to create a cache for each virtual machine. This cache is available for Premium Storage persistent disks and the VM’s local disks. Premium Storage does not count reads served from the cache towards the disk IOPS and throughput, so your application is able to achieve higher total IOPS and throughput.
Secondly, disk bursting, which is currently in preview, is a feature for premium SSDs. Bursting is supported on any premium SSD disk size <= 512 GiB (P20 or below).
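To give a feel for how a managed disk is provisioned programmatically, here is a minimal sketch using the azure-mgmt-compute Python SDK to create an empty Premium SSD data disk. The subscription ID, resource group, disk name and region are placeholders.

```python
# pip install azure-identity azure-mgmt-compute
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Placeholder subscription ID
subscription_id = "00000000-0000-0000-0000-000000000000"
compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Create an empty 512 GiB Premium SSD (LRS) data disk
poller = compute.disks.begin_create_or_update(
    resource_group_name="storage-rg",      # placeholder resource group
    disk_name="data-disk-01",
    disk={
        "location": "westeurope",
        "sku": {"name": "Premium_LRS"},
        "disk_size_gb": 512,
        "creation_data": {"create_option": "Empty"},
    },
)
disk = poller.result()
print(disk.provisioning_state)
```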
Azure Shared Disk
Azure Shared Disk is a new preview feature which allows you to share a disk between multiple virtual machines, enabling multiple readers and writers against the same disk. This is required for many cluster based workloads, especially where you have a clustered application using SCSI Persistent Reservations. The current limitations are listed below, followed by a short provisioning sketch.
- Only available for Premium Disk greater than P15
- Azure Shared Disks can be enabled as data disks only
- Azure Shared Disks do not support mounting across AZ’s
- Currently only supported in the West Central US region.
- All virtual machines sharing a disk must be deployed in the same proximity placement group
More details –> https://azure.microsoft.com/en-us/blog/announcing-the-preview-of-azure-shared-disks-for-clustered-applications/
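Under the same assumptions and placeholders as the managed disk sketch above, enabling sharing is essentially a matter of setting the maxShares property when the premium disk is created:

```python
# pip install azure-identity azure-mgmt-compute
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute = ComputeManagementClient(DefaultAzureCredential(),
                                  "00000000-0000-0000-0000-000000000000")  # placeholder subscription

# A 1 TiB premium data disk (larger than P15) that up to two VMs can attach at the same time
shared_disk = compute.disks.begin_create_or_update(
    resource_group_name="storage-rg",       # placeholder resource group
    disk_name="shared-data-disk",
    disk={
        "location": "westcentralus",          # the preview region mentioned above
        "sku": {"name": "Premium_LRS"},
        "disk_size_gb": 1024,
        "creation_data": {"create_option": "Empty"},
        "max_shares": 2,                      # number of VMs allowed to attach the disk concurrently
    },
).result()
print(shared_disk.provisioning_state)
```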
Azure Data Lake Generation 2
Azure Data Lake Storage Generation 2 is built upon Azure Blob Storage but provides an HDFS-compatible file system on top, to allow integration with analytics services such as HDInsight, Databricks and Synapse. Unlike regular Blob Storage, Data Lake adds a hierarchical namespace to Blob Storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
Data Lake Storage also supports collecting data from IoT Hub and Azure Data Box.
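For a quick feel of working against the hierarchical namespace, here is a small sketch using the azure-storage-file-datalake Python package. The account, file system, directory and file names are placeholders.

```python
# pip install azure-storage-file-datalake azure-identity
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder Data Lake Storage Gen2 endpoint (note the dfs endpoint, not blob)
account_url = "https://mydatalake.dfs.core.windows.net"
service = DataLakeServiceClient(account_url, credential=DefaultAzureCredential())

# File systems map to blob containers; directories are real thanks to the hierarchical namespace
fs = service.get_file_system_client("analytics")
fs.create_file_system()

directory = fs.get_directory_client("raw")
directory.create_directory()

# Create a file in the directory and upload local data into it
file_client = directory.create_file("events.json")
with open("events.json", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```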
Azure HPC Cache
Azure HPC Cache is essentially a high performance cache for HPC workloads. The HPC Cache is part of your virtual network and requires its own dedicated cache subnet for the service. The subnet cannot host any other VMs, even for related services like client machines. Since this is a cache service, it sits in front of other backend storage solutions to handle the large I/O.
HPC Cache supports Azure Blob containers or NFS for backend storage, where the NFS storage can either be hosted as part of NetApp Files or through an NFS solution running on-premises, accessed over VPN or ExpressRoute.
HPC Cache presents itself as NFS to the frontend clients.
Azure Data Box – Gateway & Azure Stack Edge
- Azure Data Box Gateway – Data Box Gateway is a virtual device based on a virtual machine provisioned in your virtualized environment or hypervisor. The virtual device resides on your premises and you write data to it using the NFS and SMB protocols. The gateway then transfers the data to Azure block blobs, page blobs, or Azure Files.
- Azure Stack Edge – Azure Stack Edge is a hardware-as-a-service solution, a 1U rack-mounted server supplied by Microsoft. It integrates with Azure Machine Learning and also has the option to set up a file share using NFS or SMB, which then integrates with Azure Blob Storage.
Support for Azure Services
| | Data Factory | Azure Backup | Private Link | AKS | Power BI |
| --- | --- | --- | --- | --- | --- |
| Azure Blob | Source/Sink | N/A (own snapshot) | Yes | Yes | Yes |
| Azure Files | Source/Sink | Yes (preview) | Yes | Yes | No |
| Azure Data Lake | Source/Sink | N/A (own snapshot) | Yes | N/A | Yes |
| Azure NetApp Files | N/A | N/A (own snapshot) | N/A | Yes (Trident) | No |
| Azure Managed Disk | Source/Sink (through VM and extension) | Yes | N/A | Yes | No |
Summary
So in this post I have covered a set of the different storage options that Microsoft Azure has to offer. There is some overlap in terms of supported protocols and workloads, but it is important to consider what kind of performance and throughput you are also paying for as part of it. In the table below I’ve tried to create a summary of each storage option and what kind of workload and protocols each of them supports.
| Service | Primary use case | Supported protocols |
| --- | --- | --- |
| Azure Files | File based storage | SMB |
| Azure NetApp Files | File based storage | SMB / NFS |
| Azure Blob | Object based storage | REST API / NFS (preview) |
| Azure File Sync | File based storage replication / tiering | Windows Server supported |
| Azure Managed Disk | VM based storage for Azure | OS supported |
| Azure Data Lake | Big data | Multi-protocol (HDFS) |
| Azure Data Box Gateway | Hybrid storage / tiering | SMB / NFS |
| Azure Shared Disk | VM based shared disk | Directly mounted using SCSI |
| Azure HPC Cache | Cache for HPC workloads | NFS |
If you have any feedback to the table or if there is something wrong, please reach out to me on [email protected] so I can update the content.