In my previous blog post I did a short overview of the different cloud vendors, a bit about their focus areas and also a bit about strengths and weaknesses. The blogpost can be found here –>http://bit.ly/2CrBgZA .In this post I want to focus more on IaaS and the offerings surrounding it, first I want describe a bit about each vendor and then ill go into a bit more comparison and also include the price/performance factor here as well and end it with some focus on automation functionality and additional services.
As mentioned in my previous blogpost, IBM with its Softlayer capabilities has had extremely focus on bare-metal, with the addition on traditional IaaS and also with the extended partnership with VMware, they can also provide vCloud Foundation package (which also is a prerequisite package for VMware HCX) or just plain ESXi with vCenter deployment. On the bare-metal options we can choose between hourly or monthly pre-configured servers or customize with single to quad processing solutions that range from 4 to 72 cores. We can also order bare-metal servers with dedicated GPU offerings such as K2, K80, M60, P100). One of the cool features in terms of pure IaaS is that they offer pure Block storage as an option as well, using iSCSI or just plain file storage using NFS which also is an option. In terms of scale ability they can only offer up to 56 cores and 242 GB RAM for a single virtual machine which is a lot smaller then most offerings in Azure, Google and AWS. IBM like AWS and Azure also offers pre defined instances sizes which can be used and when setting up an instance you can also define what kind of network connectivity you want to have, by default you get 100 mbps uplink and private connectivity which is free, but if you want to up it to 1 GB you need to pay a cost. The main issue is that of all much of the concepts such as availability zones and other options for HA is not an option in IBM compared to GCP, AWS and Azure.
In terms of Automation, IBM has a feature called Cloud Schematics which is natively based upon Terraform, so it is basically wrapping REST API calls using a IBM Provider in Terraform, https://ibm-cloud.github.io/tf-ibm-docs/ we also have the ability to run provision scripts which can be run at boot as part of a deployment. One of the things I feel is missing on IBM when it comes to automation is the ability to provider more overall system management capability such as Azure automation or AWS systems manager.
Google compared to the others have the most simplified deployment of virtual machines. Also they are the only vendor that has the option to defined custom instance sizes (for a bit higher prices of course) and also the flexibility when it comes to GPU flexibility for instance we can add GPU instances to any type of instance and also when it comes to disk type and sizes.
With Automation, Google has an API framework called Google Cloud Deployment Manager, which uses a YAML based syntax, but can also be using providers from Terraform or Puppet to do the deployment as well. Google also has the option do run start-up scripts on each virtual machine which allows for scripting of software and services inside the virtual machines. Google provides up to 96 vCPU and 1433 GB of memory on their largest instances. They however do not have any form of bare metal options, compared to IBM but that is not their focus either but like AWS, Google has gone into a partnership with Nutanix on a Hybric Cloud model which is going to be interesting to see how it turns out. Another cool thing about Google is that they provide live migrations of instances as default to handle maintance updates on their infrastructure.
For deployment of redudant solutions you need to be able to deploy instances across multiple zones within a region (Which is a simliar setup as Amazon Web Services do and Azure does with Availability Zones)
From a management perspetive, Google has been really good at developing their Cloud Shell solutions which allows for easy access to virtual instances directly from their browser and also allows for simple access with auto inserting the SSH key as part of the setup. One of the coolest things about Google is their core infrastructure and the network backbone which is called Andromeda https://cloudplatform.googleblog.com/2017/11/Andromeda-2-1-reduces-GCPs-intra-zone-latency-by-40-percent.html which now has allowed them to provide low latency high bandwidth connections on east-west traffic. Also that they SDN is also worldwide meaning that if you create a virtual network by default it will be available on all the different regions (where different subnets are placed within each region but are all interconnected)
Microsoft has also been doing a lot of work recently and investing heavily into new options such as new GPU offerings with the P100 and P40 cards but also with the introductions of availability zones (Still in preview for most services) which now allows for a great level of redundancy which is now pretty similar to Zones on GCP and AWS. Microsoft has also introduced loads of different new instances types with the burstable compute (B-series) and also now with the introduction of GA on Accelerated networking which allows for SRV-IO based network deployment of instances in Azure.
From a management perspective Microsoft has been doing alot around regular operations, such as with Log Analytics which can now do patch management and provide multiple pieces of monitoring across different platforms and also integrating different PaaS serivces to allow for a single hub to do monitoring across most of the services. Also with simple EDI based tools such as Logic Apps and Azure Automation allows us to setup simple and down to more complex automation jobs to do automated deployments and start/stop virtual instanced based upon a trigger or schedule. Also that they provide alot more tools when it come to migration and backup tools compared to the other vendors, with Azure Migrate and Azure Site Recovery.
Also Microsoft has been doing alot of investment into their Cloud shell solution as well which allows us to run az cli (bash based) and Azure PowerShell cmdlets directly from the browser. (As as of t01.02 they now also support Ansible directly from the cloud shell interface)
One of the issues with Azure from an IaaS perspective is the lacking flexibility to mixing like GPU cards with different instances, scaleable IOPS together with disk size. Also Microsft is focusing alot on building partners in the ecosystem to support automation and have been doing alot when it comes to Terraform which now covers alot of the resources in Azure directly.
When it comes to IaaS, Amazon provides most of the services both when it comes to bare-metal(coming) and support with VMware. Also different options depending on if needed reserved instances or just need to get reserve capacity or godzilla virtual machines. They also provide different storage options and scale options on IOPS depending on the size of the storage. Now with the upcoming support with VMware will also provide a whole new level of infrastructure solutions (the service is now available but still limited to certain regions in the US)
AWS also provides multiple management tools to make things easier such as AWS systems manager (which can also target on-premises virtual machines), and they even provide their own AWS Managed Services where they manage the IaaS solutions for you . AWS also has a service called OpsWorks which provides automation solution based upon Puppet and Chef as a managed service which can be then used to deliver configuration management against your own enviroment in AWS. AWS also has CloudWatch and CloudTrail to track events, logs and activity and API usage across AWS subscriptions.
AWS also has multiple options when it comes to GPU offerings such as P2 and G2 series which comes with a dedicated GPU card, or use the flexible GPU which is a software-defined GPU offering which allows us to add a GPU card to almost any type of instance.
Now the fun part is that most providers are now delivering more and more services to help with automation and system management, such as managed container engine cluster and also different advisor roles which can detect cost or security issues. This can be to check for best-pratices according to the cloud provider.
Now the interesting part is mainly around the container solutions that most providers are now fighting about. Both Microsoft and AWS have their own Container instance solution, where you just provision a container based upon an image and don’t have to worry about the infrastructure beneath (AWS Fargate and Azure Container Instance) and both of them also provide other container solutions such as Amazon Container Engine and Azure Container Engine. The fun part is that all 4 providers supports Kubernetes as the container orchestration engine and have supported features to build upon it, this can be a container registry solution or CI/CD solutions.
Technical Comparison: So the intention here is to have a short table to compare some of the different infrastrucutre services from each vendor, it does not measure the quality of service but just defines that they have a service and service name.
|High Performance Computing Services||Azure Batch||Amazon Batch||IBM Spectrum, IBM Aspera|
|Reserve Capacity instances||Low Priority VM’s||Preemptible instance||Spot instances|
|Reserved Instances||Reserved Instances||Committed use||EC2 Reserved Instances|
|Dedicated instances||EC2 Dedicated Instances|
|Bare Metal hosts||Yes (Announced)||Yes|
|VM Metadata support||Yes||Yes||Yes||Yes|
|Custom Instance Sizes||Yes||Yes|
|Compute Service Identity||Yes||Yes||Yes||No
|High performance disk||Premium Disk||SSD persistent disk, Local SSD||SSD EBS||SSD Octane|
|GPU-instances||N-series (NV, NC, ND)||Flexible GPU||P2 instances / Flexible GPU||Only as bare metal|
|Nested virtualization support||Yes||Yes (Beta)||Yes|
|Hybrid Story||Azure Stack||Nutanix||VMware||VMware|
|GPU cards support||M60, K80, P40, P100||K80, P100, AMD S9300||M60, Custom GPU, V100||P100, M60, K80|
|Desktop as a service||Third Party||Third party||Workspaces & AppStream||Third Party|
|Scale set||VM Scale Set||Instance Group||Auto Scaling||Auto Scale|
|Godzilla VM||Standard_M128 128vCPU, 3800 GB||N1-highmem 96vCPU, 1433 GB||X1.32large 128vCPU, 4 TB||56 vCPU, 242 GB Memory (Other Bare Metal)|
|VMware support||Yes (Announced)||Yes (Limited to the US)||Yes|
|Billing for VM||Per minute||Per Second||Per Second (For some)||Per Hour|
|Deployment & Automation service||Azure Resource Manager||Google Deployment Manager||Cloudformation||IBM Cloud Schematics|
|CLI||PowerShell, AzureCLI||GCloud CLI, Cloud Tools for PowerShell||AWS CLI, AWS Tools for PowerShell||Bluemix CLI|
|Monitoring & Logging||Microsoft Log Analytics, Azure Monitor||StackDriver||CloudWatch, Cloudtrail||Monitoring and Analytics|
|Optimization||Azure Advisor||Native Service in UI||Trusted Advisor|
|Automation tools||Azure Automation||Amazon CloudOps for Chef and Puppet||Cloud Automation, Workload Scheduler|
|Support for third party configuration and infrastructure tools||Chef, Puppet, Terraform, Ansible, SaltStack||Chef, Puppet, Terraform, Ansible, SaltStack||Chef, Puppet, Terraform, Ansible, SaltStack||Terraform|
|Cloud Shell support||Yes||Yes|
|EDI Tools||Azure Logic Apps|
In the next blog post I will take a closer look at some price comparison and comparing apples and apples in some benchmarks which measures speed of deployment using the different deployment tools and in VM spped on different levels.