Considering GPU in the Cloud for VDI deployments? Not so fast…

At Citrix Synergy, Phillip Jones and I gave a presentation at the NVIDIA booth on the first day of the conference, where we talked about different deployment scenarios for NVIDIA GRID, both for HCI (which Phillip covered) and in the cloud (which I covered). It is also something I discussed at the hot topics roundtable later that week.


Now, for deploying GPUs in the cloud there are some limits people should be aware of. Note that I only focused on the larger cloud providers: IBM, Azure, AWS and GCP. When it comes to GPUs, only two of them can provide native GPUs with the GRID architecture: Azure and IBM.

IBM SoftLayer (IBM's IaaS offering) can provide customers with a bare-metal deployment with a choice of M60 or K2 cards, which can be rented on a monthly basis. Since this is a bare-metal deployment, you can use the different NVIDIA deployment options such as vGPU or pass-through if needed.

Microsoft Azure
Microsoft Azure provides M60 cards on its NV virtual machine instances. This is made possible by DDA (Discrete Device Assignment) in Windows Server 2016, so it is in essence pass-through mode.

The issue with these instances is the limited IOPS. In Microsoft Azure there are multiple storage options available for virtual machines. First off, we have the Storage Account, a general-purpose storage option which can be used for multiple kinds of storage objects and has always been the default storage option for virtual machines. Storage Accounts have a limitation when it comes to IOPS for VHDs (virtual hard disks): VHDs are stored as storage blobs, which means each one is limited to about 500 IOPS at an 8 KB I/O size.
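To put that cap in perspective, here is a rough back-of-the-envelope calculation in Python. It only uses the figures quoted above (about 500 IOPS at an 8 KB I/O size); the function name is made up for this illustration.

```python
def max_throughput_mib_s(iops: int, io_size_kib: int) -> float:
    """Upper bound on throughput for a disk capped at `iops` operations per second."""
    return iops * io_size_kib / 1024


# Figures from the paragraph above: about 500 IOPS at an 8 KB I/O size.
print(f"{max_throughput_mib_s(500, 8):.2f} MiB/s")
```

In other words, a single standard disk driven with small I/Os tops out at only a few MiB/s, which is very little for a graphics workstation workload.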


Microsoft advertises the NV-series with SSD, but this is ONLY available on the D:\ drive, which in Azure is a temporary drive (never store persistent data on that drive!). We do have the option of combining multiple disks in Azure using Storage Spaces to get higher combined IOPS and throughput, but that does not fix the latency issue, so even with the higher throughput/IOPS it will not drastically improve the application experience.
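A simplified model shows why striping helps the ceiling but not the experience. This is only a sketch: the 500 IOPS per disk comes from the text, while the 10 ms latency and the function names are assumptions for illustration. It uses Little's law (achievable IOPS is roughly outstanding I/Os divided by latency).

```python
def striped_iops_ceiling(disks: int, iops_per_disk: int) -> int:
    """Aggregate IOPS ceiling when striping across `disks` data disks."""
    return disks * iops_per_disk


def achievable_iops(queue_depth: int, latency_ms: float, ceiling: int) -> float:
    """IOPS an application can actually reach: concurrency / latency, capped at the ceiling."""
    return min(queue_depth / (latency_ms / 1000), ceiling)


LATENCY_MS = 10.0  # assumed per-I/O latency, illustrative only

for disks in (1, 4, 8):
    ceiling = striped_iops_ceiling(disks, 500)
    # queue_depth=1 models an application that waits for each I/O to finish
    print(disks, ceiling, achievable_iops(1, LATENCY_MS, ceiling))
```

Under these assumptions the ceiling grows with every disk added, but an application that issues one I/O at a time stays at the same rate, because each request still waits the same amount of time.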

Microsoft also has something called Premium Storage, an SSD-backed storage option which allows us to add data disks with a higher level of IOPS. However, the N-series does not support this feature in Azure. This means that if we, for instance, use the NV6 instance for our XenApp servers, we are bound by the limits of the standard data disks for our applications.

Microsoft has announced a new series of virtual machines, coming later this year with a new generation of NVIDIA cards, which will support Premium Disk options. However, their scale and GPU type (the P40) are geared more towards AI and deep learning workloads than XenApp / RDSH deployments.

Amazon Web Services
Amazon Web Services supports GPU instances as well, using an older card, the K520, on G2 instances.


AWS GPU options (pricing table; only these entries are recoverable):
G2.2xlarge, EBS optimized: $0.80 / h
[instance name missing]: $2.80 / h
Flexible GPU (Preview)


AWS, however, offers EBS-optimized (Elastic Block Store) instances, which make a high level of IOPS available to the instance. The downside is that the K520 is no longer supported, because the current GRID drivers do not cover cards this old.

AWS is also working on something called Flexible GPU. This is not a dedicated GPU as such but a custom service that AWS is adding, which will be available to all instance types in AWS. It is a way to give dedicated video memory to an instance, but it is only OpenGL-aware and will not work natively with the operating system. AWS is therefore working with ISVs and software vendors to get applications to work natively with its GPU offering (using a set of custom APIs).

Google Cloud Compute
Google's GPU option is still in preview. As of now, NVIDIA® Tesla® K80 GPUs are available, and soon we will be able to deploy instances with the AMD FirePro S9300 x2 and the NVIDIA® Tesla® P100. All of these cards are aimed purely at HPC and rendering, and at workloads such as TensorFlow. The GPUs are also attached directly to a virtual machine using pass-through mode. Also, you must have GPU quota before you can create instances with GPUs on GCP.

In Google Cloud you can attach a GPU to ANY type of virtual machine instance. However, instances with lower numbers of GPUs are limited to a maximum number of vCPUs. In general, higher numbers of GPUs allow you to create instances with higher numbers of vCPUs and system memory.
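The rule above can be sketched as a simple pre-flight check. Note that the limits in this table are hypothetical placeholders, not Google's published numbers, and the function name is made up; look up the real values in the provider's documentation before relying on them.

```python
# Hypothetical limits for illustration only: GPU count -> maximum vCPUs.
MAX_VCPUS_BY_GPU_COUNT = {1: 8, 2: 16, 4: 32}


def validate_shape(gpus: int, vcpus: int) -> None:
    """Raise ValueError if the requested vCPU count exceeds the limit for this GPU count."""
    if gpus not in MAX_VCPUS_BY_GPU_COUNT:
        raise ValueError(f"unsupported GPU count: {gpus}")
    limit = MAX_VCPUS_BY_GPU_COUNT[gpus]
    if vcpus > limit:
        raise ValueError(f"{gpus} GPU(s) allow at most {limit} vCPUs, requested {vcpus}")


validate_shape(gpus=2, vcpus=16)  # within the assumed limit, so no exception
```

Checking the shape before calling the provider avoids a failed deployment when a small GPU count is paired with a large machine type.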

The cloud today does not provide the same level of deployment options as we have in traditional on-premises scenarios using, for instance, HCI. On-premises we can set up pass-through, NVIDIA vGPU, RemoteFX vGPU and even offerings such as vDGA.

Which is a shame, actually. Just think about being able to rent a GPU per minute with blazing-fast disks, use it only for as long as the rendering takes, and have a power control mechanism which shuts the VM down after the user logs out (and can power it on again on demand). That is the kind of flexibility the cloud should provide, but we aren't quite there yet. Hopefully the cloud providers will see that VDI workloads are something people want and come up with offerings that can support them.
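To make that wish a bit more concrete, here is a minimal sketch of such a power policy together with a rough cost comparison. Everything in it is an assumption for illustration: the function names, the 10-minute grace period and the hourly rate are not taken from any provider's API or price list.

```python
from datetime import datetime, timedelta


def should_power_off(active_sessions: int, last_logoff: datetime, now: datetime,
                     grace: timedelta = timedelta(minutes=10)) -> bool:
    """Power off only when nobody is logged on and the grace period has passed."""
    return active_sessions == 0 and (now - last_logoff) >= grace


def usage_cost(rate_per_hour: float, hours: float) -> float:
    """Pay-per-use cost: billed only for the hours the VM is running."""
    return rate_per_hour * hours


RATE = 0.80  # assumed hourly rate in USD, illustrative only
always_on = usage_cost(RATE, 24 * 30)    # VM left running for a 30-day month
on_demand = usage_cost(RATE, 8 * 22)     # 8 hours on each of 22 working days
print(f"always on: ${always_on:.2f}, powered on demand: ${on_demand:.2f}")
```

With these assumed numbers, a VM that only runs while someone is working costs roughly a quarter of one that is left on all month, which is exactly why the power control piece matters.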
