Monthly Archives: November 2015

Testing SSL from Netscaler–Issues with SSL handshake

From time to time we need to set up load balancing to an SSL-based service, or set up a connection to a secure Storefront (which is the default). There is one thing that a lot of people miss in the configuration, which results in weird issues or SSL handshake errors from the monitors. In most cases it is because of one of two things:

  • Missing Root CA
  • Wrong or unsupported ciphers

So how can we verify from the Netscaler that it is missing the root CA, or that we have the right CA in place?

This is where we use OpenSSL, a toolkit that is used on the Netscaler and that also has a command-line interface which allows us to test different parameters.

So we enter Shell on the Netscaler and then CD to /nsconfig/ssl (this is where all the NS certificates are stored by default), and from there we can use OpenSSL.

By using the command

openssl s_client -connect FQDN:443

First of all, this will show us the certificate that is presented and the certificate chain. It will also list what kind of connection is being used towards the FQDN (in the case below we are using TLS 1.2 against a Storefront server).

depth=1 C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
verify error:num=20:unable to get local issuer certificate
verify return:0

Certificate chain
   i:/C=US/O=DigiCert Inc/CN=DigiCert SHA2 Secure Server CA
1 s:/C=US/O=DigiCert Inc/CN=DigiCert SHA2 Secure Server CA
   i:/C=US/O=DigiCert Inc/ Global Root CA

Server certificate
issuer=/C=US/O=DigiCert Inc/CN=DigiCert SHA2 Secure Server CA

No client certificate CA names sent

SSL handshake has read 3034 bytes and written 479 bytes

New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-SHA384
    Session-ID: EB46000017E1621AA1BB5491BDFD3EDB2C273F35E73DB2029651C5B00DEC62BC
    Master-Key: 65CA41A8B811869F0C005469E20578BB3C876AB7207AB5D2D42370B7779FD1EB                                 7F971DC3A0001EF9B54963D1D2B080BD
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1448336973
    Timeout   : 300 (sec)
    Verify return code: 20 (unable to get local issuer certificate)

What we can see here is that we have an error message at the top stating that

 verify error:num=20:unable to get local issuer certificate

This error occurs if

  • The certificate chain for the certificate wasn’t provided by the other side or it doesn’t have one (it is self-signed).
  • The root certificate is not in the local database of trusted root certificates.
  • The local database of trusted root certificates was not given and thus not queried by OpenSSL.

To verify against a certificate chain that includes a root CA, an intermediate CA, or both, we can append the -CAfile or -CApath parameter to the command and test the connection against that CA.

There are many other parameters that we can use with OpenSSL. For instance, we can test with specific protocols using -ssl3, -tls1, -no_ssl3, -no_tls1, -no_tls1_1, -no_tls1_2

This allows us to test using SSL3, for instance. The full list of options is documented on the OpenSSL site.
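The different s_client options can be combined in one command. Here is a minimal Python sketch that only assembles the command line, so it can be reviewed before being pasted into the Netscaler shell. The host name and CA file path are hypothetical placeholders, and the helper name `s_client_command` is my own, not part of OpenSSL.

```python
import shlex


def s_client_command(host, port=443, ca_file=None, ca_path=None, protocol_flag=None):
    """Build an `openssl s_client` argument list for a manual handshake test."""
    args = ["openssl", "s_client", "-connect", f"{host}:{port}"]
    if ca_file:
        # Verify the presented chain against a single PEM file of trusted CAs.
        args += ["-CAfile", ca_file]
    if ca_path:
        # Verify against a directory of trusted CA certificates.
        args += ["-CApath", ca_path]
    if protocol_flag:
        # For example "-tls1" or "-no_tls1_2", as listed above.
        args.append(protocol_flag)
    return args


# Hypothetical host and CA file, for illustration only.
cmd = s_client_command(
    "storefront.example.com",
    ca_file="/nsconfig/ssl/rootca.pem",
    protocol_flag="-tls1",
)
print(shlex.join(cmd))
```

If the right root CA is supplied this way, the verify return code at the bottom of the output should change from 20 to 0 (ok).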

Load balancing port ranges with Netscaler

I got a question earlier today about whether it is possible to load balance a specific port range within Netscaler. By default we cannot specify a port range when setting up a load balanced vServer or services.

To ensure that traffic from a specific endpoint goes to the same backend service, we have some options.

1: Persistency groups. In this case we need to define load balanced vServers and services for each port number and then define a persistency group afterwards, which I have blogged about earlier.

2: Another option is to set the port number to * and the service type to ANY, which means that we can have a single vServer and a single service object. The issue is that it will intercept ANY traffic to ANY port, which is a bad thing.

What we can do with option 2 is define a listen policy. A listen policy allows us to customize which ports a vServer should respond to. For instance, we can define a vServer with the ANY service type and port * and then attach a listen policy that defines, for instance, a port range from 80 to 8080. Even though the vServer is set up with ANY port and ANY service, it will only listen to requests coming in within the port range in the listen policy.
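To make the effect of the listen policy concrete, here is a small Python model of the decision it makes. This is only an illustration of the logic, not Netscaler policy syntax, and the function name `make_listen_policy` is made up for the example.

```python
def make_listen_policy(low_port, high_port):
    """Return a predicate that mimics a listen policy restricted to a port range."""
    def accepts(dst_port):
        # The wildcard vServer only answers when the destination port is in range.
        return low_port <= dst_port <= high_port
    return accepts


# Port range 80 to 8080, as in the example above.
accepts = make_listen_policy(80, 8080)

for port in (22, 80, 443, 8080, 8443):
    verdict = "handled by the vServer" if accepts(port) else "ignored"
    print(f"port {port}: {verdict}")
```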

Listen policies can be defined within the vServer, such as shown in the screenshot below


Software-defined networking difference between VXLAN and NVGRE

Being quite new to software-defined networking and all the different network virtualization technologies out there, I thought I would summarize the differences between the largest vendors in this market. What differentiates them (from a protocol perspective), and why on earth would we use them?

First off, network virtualization is not new; it has been around for a long time, ever since we started with computer virtualization and had some sort of networking capabilities. But extending these capabilities required something more. We started out with

* Virtual Network adapters and dummy switches

And then we moved along into more cool stuff like

* Virtual VLAN
* Managed L2 Switches virtually
* Firewall and load balancing capabilities
* Virtual routing capabilities and virtual routing tables

In later years came VXLAN and NVGRE (two different tunneling protocols), which were primarily aimed at the scalability issues of large cloud computing platforms, such as VLAN issues and overlapping IP address segments, and at the problems with STP leaving a large number of links disabled. They also reflect the idea that management should be a part of the virtualization layer and not separate.


VXLAN (part of NSX) is in essence a tunneling protocol which wraps layer 2 frames on top of a layer 3 network. The network is split into different segments, and only VMs within the same VXLAN segment can communicate with each other. Each segment has its own 24-bit segment ID. VXLAN uses IP multicast to deliver broadcast, multicast and unknown-destination VM MAC address traffic to all access switches participating in a given VXLAN.

A traditional VLAN packet would look like this

Using VXLAN we wrap the Ethernet packet within a UDP packet, so first we have the inner (original) Ethernet header

So using VXLAN adds another 50 bytes of overhead for the protocol, which in essence means that frames will exceed the standard MTU of 1500. There is a tech post from Vmware which states that the MTU should be adjusted to 1600, but you should rather consider jumbo frames

So it gives more overhead, and all packets need to be unwrapped from the VXLAN encapsulation before being delivered to the other VM. This is also an issue when sending small packets, such as Telnet/SSH, which transmits a packet for each keystroke and will see a large amount of overhead for each packet, even though it is not a very common workload.
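The 50 bytes can be broken down per header, which also shows why small packets suffer most. The sketch below assumes an outer IPv4 header without options and no outer VLAN tag; the two inner frame sizes are just examples I picked (a minimum-size frame and a full 1500-byte payload plus its Ethernet header).

```python
# Headers that VXLAN adds in front of the original (inner) Ethernet frame.
# Assumption: outer IPv4 without options and no outer VLAN tag.
VXLAN_OVERHEAD_BYTES = {
    "outer Ethernet header": 14,
    "outer IPv4 header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
}

TOTAL_OVERHEAD = sum(VXLAN_OVERHEAD_BYTES.values())


def overhead_ratio(inner_frame_bytes):
    """Encapsulation overhead relative to the size of the inner frame."""
    return TOTAL_OVERHEAD / inner_frame_bytes


print(f"VXLAN adds {TOTAL_OVERHEAD} bytes per frame")
for size in (64, 1514):
    print(
        f"{size}-byte inner frame -> {size + TOTAL_OVERHEAD} bytes encapsulated, "
        f"{overhead_ratio(size):.1%} overhead"
    )
```

For the small frame the added headers are a large fraction of what is sent, while for a full-size frame they are only a few percent, but enough to push it past a 1500-byte MTU.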

To allow communication between a VXLAN-enabled host and a non-VXLAN host, you need a VXLAN-capable device in between which acts as a gateway.

A nice thing about VXLAN is that more and more devices are coming with VXLAN support, so by using VXLAN in our cloud infrastructure we can define access and management from the virtualization layer and move all VXLAN traffic over just one transport VLAN.


NVGRE, on the other hand, is primarily a tunneling protocol that Microsoft is pushing, which uses GRE to tunnel L2 packets across an IP fabric. It uses 24 bits of the GRE header to identify the network ID
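Both VXLAN and NVGRE use a 24-bit identifier, and a quick calculation shows what that means compared to the 12-bit VLAN ID of a traditional 802.1Q tag. This is plain arithmetic, sketched in Python:

```python
VLAN_ID_BITS = 12      # 802.1Q VLAN ID field
SEGMENT_ID_BITS = 24   # VXLAN segment ID / NVGRE network ID

vlan_ids = 2 ** VLAN_ID_BITS
segment_ids = 2 ** SEGMENT_ID_BITS

print(f"VLAN IDs:          {vlan_ids:,}")
print(f"24-bit segment IDs: {segment_ids:,}")
print(f"Factor:            {segment_ids // vlan_ids:,}x")
```

This is where the scalability argument for large multi-tenant cloud platforms comes from.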

The positive thing about using GRE is that much existing hardware already has full support for GRE (hence switching and NIC offloading). On the other hand, wrapping L2 packets within a GRE layer means that regular devices like firewalls or load balancers are not able to “see” the packets, unlike with UDP. Therefore the load balancer or firewall would need to act as a gateway and remove the GRE wrapper in order to do packet inspection.

For instance, Windows Server 2016 TP4 includes its own load balancing and firewall capabilities to be able to do this without unwrapping the packets. Here are some features that are included in TP4

Network Function Virtualization (NFV). In today’s software defined datacenters, network functions that are being performed by hardware appliances (such as load balancers, firewalls, routers, switches, and so on) are increasingly being deployed as virtual appliances. This “network function virtualization” is a natural progression of server virtualization and network virtualization. Virtual appliances are quickly emerging and creating a brand new market. They continue to generate interest and gain momentum in both virtualization platforms and cloud services. The following NFV technologies are available in Windows Server 2016 Technical Preview.

  • Software Load Balancer (SLB) and Network Address Translation (NAT). The north-south and east-west layer 4 load balancer and NAT enhances throughput by supporting Direct Server Return, with which the return network traffic can bypass the Load Balancing multiplexer.

  • Datacenter Firewall. This distributed firewall provides granular access control lists (ACLs), enabling you to apply firewall policies at the VM interface level or at the subnet level.

  • RAS Gateway. You can use RAS Gateway for routing traffic between virtual networks and physical networks; specifically, you can deploy site-to-site IPsec or Generic Routing Encapsulation (GRE) VPN gateways and forwarding gateways. In addition, M+N redundancy of gateways is supported, and Border Gateway Protocol (BGP) provides dynamic routing between networks for all gateway scenarios (site-to-site, GRE, and forwarding).

The future

It might be that both of these protocols will be replaced by another tunneling protocol called Geneve, which is a joint effort by Intel, Vmware, Microsoft and Red Hat, and which in my eyes looks a lot like VXLAN, using UDP as the wrapping protocol.

Either way, the tunneling protocol that is used needs to be properly adopted by the management layer in order to be integrated with the compute virtualization layer, to ensure that traffic policies and security management are in place.

Setting up NFS Direct Veeam against Nutanix cluster

So the last couple of days I have tried to wrap my head around the Direct NFS support which is coming in Veeam v9. The cool thing about this feature is that Veeam has a custom-built NFS agent, which goes directly to the NFS share (it only needs READ access) and exports the snapshot data when doing a backup.

It is important that Veeam is configured against a vCenter server (I tried many times against an ESX host directly, and then NFS Direct didn’t really work).

When setting up a Direct NFS backup solution, we first need to set up a Veeam Backup Proxy as we would in other scenarios. We need to include the Veeam Backup Proxy in the virtual vSwitch that Nutanix provisions within ESX (Note: Do not change the vSwitch, just add the VM to the vSwitch network)


Then define an IP address to the Veeam Backup Proxy within the vSwitch so it can communicate with the Controller VM.


Note that since the vSwitch is an internal-only switch, we should set up a backup proxy per node to maximize performance. In this scenario it will also work to do NFS Direct from this node against other nodes, but then we will be pushing the traffic across the Controller VM network. So when setting up backup jobs, try to make them use the local proxy on the host where the virtual machines reside, as this will give the best throughput.

We also need to whitelist the IP address of the proxy so that it is allowed access to the NFS share (which in the case of Nutanix will be the storage container that the virtual machines reside on). This can be done at the container level or at the cluster level.


Next we need to “force” Veeam to use the storage network on the proxies for backup traffic, which can be done in the central management pane within Veeam.


Lastly we need to rescan the storage attached to the infrastructure which will allow Veeam to see the new NFS datastores and see that they can access it using NFS direct. This can be done here.


We can see from the statistics of this job that it is using NFS in the first screenshot


We can also see in the backup job log file for the VM


and that we are using regular hotadd in the second one.


New Netscaler books available!

Just a small post about what I have been busy with lately. Those who have been following me on Twitter/LinkedIn/Blog will have noticed that from time to time I blog about Netscaler, which has kind of become my little baby.
Anyhow… 2 years back I started working on my second book for Packt Publishing called Netscaler VPX, which was the first technical book (outside of Citrix Education) on Netscaler available on Amazon.

Now two years later, I have done a bit more.

Implementing Netscaler VPX Second Edition:

Which is an upgraded/polished version of my first book. The first book was written for version 10; this one is based on V11 and contains more content around security, troubleshooting, Azure/Amazon deployments and front-end optimization.
And my latest project Mastering Netscaler VPX

Mastering Netscaler VPX:

This is a book which I co-wrote with Rick Roetenberg (Note: He did most of the work), and which goes a bit deeper into the material. I did some chapters on network optimization, troubleshooting, content switching, GSLB, Datastream and security features.

So if you are unsure what to get for Christmas, this might be a good idea

Putting ThinWire and Framehawk to the test!

Framehawk and Thinwire – It’s all about the numbers

Recently Mikael (@mikael_modin) and I attended a Citrix User Group conference in Norway, where Mikael held a session on when and when not to use Framehawk. Mikael has written up the session in a blog post of his own, and I have already covered some details regarding Framehawk from a networking perspective.

The main point in Mikael’s presentation was that although Framehawk is tremendously better in situations with packet loss, Thinwire Advance will often be “enough”, or even more useful, when there is only latency involved. This is because of the use of CPU, RAM and most of all bandwidth.
Another thing he pointed out was that Framehawk needs “a lot” of bandwidth to be at its best.
The recommendation for Thinwire is a minimum of 1.5 Mbps + 150 kbps per user, while the recommendation for Framehawk is a minimum of 4-5 Mbps + 150 kbps per user.
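As a rough sizing aid, the recommendation can be turned into a small calculation. This is a minimal sketch that assumes the base figure applies once per link and the 150 kbps is added per concurrent user; the user count of 20 is just an example, and the function name is my own.

```python
PER_USER_KBPS = 150


def minimum_bandwidth_kbps(base_kbps, users):
    """Base requirement for the link plus 150 kbps for each concurrent user."""
    return base_kbps + PER_USER_KBPS * users


users = 20  # example value
thinwire = minimum_bandwidth_kbps(1500, users)
framehawk_low = minimum_bandwidth_kbps(4000, users)
framehawk_high = minimum_bandwidth_kbps(5000, users)

print(f"Thinwire, {users} users:  {thinwire / 1000:.1f} Mbps")
print(f"Framehawk, {users} users: {framehawk_low / 1000:.1f} to {framehawk_high / 1000:.1f} Mbps")
```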

There are a lot of naming conventions when it comes to Thinwire. Although we can see Thinwire as one protocol, there are different versions of it.
Thinwire is all about compressing data before sending it. The methods for this are:

· Legacy Thinwire (Pre win8 / Server 2012R2)

· Thinwire Compatibility Mode (new with FP3, also known as Thinwire +, for Win8 / Server 2012R2 and later). This version takes advantage of how newer operating systems construct the graphics.
For more info read the following blog post written by Muhammad Dawood

· Thinwire Advance (uses H.264 to compress the data)

For a more detailed overview when to use each technology, you can refer to the following table:


When we came back home we decided to take a closer look at what impact Thinwire and Framehawk have on CPU, RAM and bandwidth, and we found some very interesting data.

Our tests include the following user workload:

· Log in and wait 1 minute for uberAgent to gather data and for the session to be up and ready.

· Open a PDF file and scroll up and down for 1 minute. (The PDF is located locally on the VM to exclude network I/O)

· Connect to a webpage, a Norwegian newspaper which contains a lot of different objects and heavy graphics, and scroll up and down for 1 minute.

· Open Microsoft Word and type randomly for 1 minute.

· Last but not least, our favorite: opening the Avengers trailer in fullscreen using Chrome for its full duration of 2 minutes.

This allows us to see which workloads generate how much bandwidth, CPU- and RAM usage with each of the different protocols.

To collect and analyze the data we were using the following tools

· Splunk – Uberagent (Get info we didn’t even think was possible!)

· Netbalancer (Show bandwidth, set packet loss, define bandwidth limits and define latency)

· Citrix Director

– Displaystatus (to verify the protocol status)

The sample video below shows how the tests are run. This also allows us to analyze the sample data from Netbalancer more closely.

NOTE: There might be some slight variations from test to test, since this is not an automated test but was run as a typical end-user session. These were so minor that we can conclude that the numbers are within +/-5%

We had two Windows 10 VDIs running the latest release of XenDesktop 7.6 FP3 during the testing phase.

· MCS1002 is for the test02 user, which is not using Framehawk

· MCS1003 is for the test01 user, which has Framehawk enabled using policies

· Use of the codec was deactivated through policy to ensure that Thinwire was used

The internet connection is a solid 100 Mbps, and the average latency to the Citrix environment is about 10 – 20 ms.

Some notes so far: some Framehawk sessions get stuck on the Netscaler. We can see existing connections not being dropped correctly in the Netscaler GUI under Gateway –> DTLS sessions

After we changed the TCP profiles on the Netscaler we were unable to use Framehawk.
We then needed to reconfigure the DTLS and Certificate settings on the vServer and setup a new connection and Framehawk worked again as expected.

So after the initial run, we can note the following from the Netbalancer data;

We begin with looking at how Framehawk handles bandwidth.

We can see that over the total session, which lasted about 7 minutes, Framehawk used about 240 MB of bandwidth to deliver the graphics.
However, it was during the PDF and Webpage part of the test which really pushed it in terms of bandwidth, not the Youtube trailer.


Thinwire, on the other hand, used only 47 MB of bandwidth, and as we would expect, more data was used when showing the trailer than in the PDF and webpage sections.
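To relate these totals to link speed, they can be converted into an average throughput over the session. This is a rough sketch that assumes the session lasted exactly 7 minutes and that 1 MB means one million bytes; real traffic is bursty, so peaks will be well above these averages.

```python
SESSION_SECONDS = 7 * 60  # the test session lasted about 7 minutes


def average_mbps(total_megabytes, seconds=SESSION_SECONDS):
    """Average throughput in Mbps for a total transfer given in MB."""
    return total_megabytes * 8 / seconds


framehawk_mb = 240
thinwire_mb = 47

print(f"Framehawk: {average_mbps(framehawk_mb):.2f} Mbps on average")
print(f"Thinwire:  {average_mbps(thinwire_mb):.2f} Mbps on average")
print(f"Framehawk moved {framehawk_mb / thinwire_mb:.1f}x as much data")
```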


Using Splunk we are able to get a closer look at the Framehawk numbers.
Average CPU usage for the VDA agent was close to 16%.


While using ThinWire the CPU usage was only 6% on average.


But the maximum amount of CPU usage came from Framehawk, which was close to 50% CPU usage at one point.


While ThinWire on the other hand, was only up to 18%


We can conclude that Framehawk uses many more CPU cycles in order to process the bandwidth, but from our testing we could see that the PDF part, which generated a lot more traffic, gave a much smoother experience, not just when scrolling the document but also when zooming in.

On the other side, we can also see that Framehawk uses a bit more RAM than ThinWire does; about 400 MB was the maximum


While Thinwire was about 300 MB


So this was the initial test, which shows that Thinwire uses less bandwidth, less memory and less CPU, but that Framehawk delivers a better user experience in applications like PDF viewing. So now, let us see how they fare when taking latency and packet loss into account.

2% Packet loss

We started by testing Framehawk at 2% packet loss.
Looking at the bandwidth test we could see that it uses about 16 MB less bandwidth with the packet loss. It’s still the PDF and webpage that consume the most resources, and now it is down to 224 MB of bandwidth usage

The Maximum CPU usage peaked at 45%

And the average CPU usage was 19%

RAM usage increased slightly, by 4 MB






Now here comes the interesting part, using Thinwire at 2% packet loss, (up and down) will trigger a lot of TCP retransmissions because of the packet drops


(Remember that this is using an optimized Netscaler.) We can see that ThinWire uses only 12 MB of bandwidth! This is because of the TCP retransmissions: it is never able to send large enough packets before the packet loss occurs.

So with Thinwire and 2% packet loss we could see that the bandwidth usage dropped by about 59 MB when we had the packet loss. The maximum bandwidth used in this session was 12 Mbps

The maximum CPU usage was also 50% lower than in the reference test and showed only 3%

The average CPU usage was now only 3% (that is 50% of the reference test)

The RAM usage was about 30MB more than earlier





5% Packet loss

At 5% packet loss we can see that it uses about 50 MB of extra bandwidth. It’s still the PDF and webpage that consume the most resources, but now it is up to 300 MB of bandwidth

We can also see that, from a resource perspective, it still uses almost the same maximum CPU percentage. This might vary from test to test, but it is close to 50%.

On average CPU usage we can see that it went up 4% from the initial testing, which makes sense since it needs to send more network packets which uses CPU cycles.

The RAM usage is the same as with 2% packet loss





5% Packet loss

Looking at the bandwidth usage with 5% packet loss and Thinwire, the number is slightly lower and is now 11 MB

This can also be seen in the CPU usage of the protocol. Because of the packet loss, the VDA does not need to send as many packets, and hence the CPU usage is lower and peaks at 7%

Average CPU usage is now just under 3%

RAM however is a bit larger with 330MB





End-user perspective
From an end-user perspective we can safely say that Framehawk delivered a much better experience. When we tried to follow the test minute by minute, the ThinWire test took about 40 seconds longer, simply because of the delay after each mouse click, and things like zooming into a PDF file took so long that the test took longer to complete.

Winner: Framehawk!

10% Packet loss


With 10% packet loss, we could see that the bandwidth usage went down a bit. Again, this might be because the packet loss was so high that it was unable to process all the data, so the total bandwidth usage was lower than with 5%. With the decrease in bandwidth, we can also see the CPU usage go down.

The max CPU usage was about the same with 47%

The average CPU usage was 19%

The RAM usage is the same at 404 MB




10% Packet loss

With 10% packet loss Thinwire was down to 6 MB, and the CPU usage also reflected this by only using 4% at peak and 1.6% on average
RAM usage was still about the same as earlier and peaked at 326MB





End-user perspective
What we noticed here is that most of the graphics-intensive tests became unresponsive and that the ICA connection froze. The only thing that was really workable was Word. The PDF, the webpage and YouTube became so unresponsive that they were not really workable.

Winner: Framehawk!

CPU Stats on Framehawk and Thinwire
NOTE: We have taken multiple samples of the CPU statistics on the Netscaler, so these screenshots represent the average numbers we saw.
What we can see is that Framehawk, which uses more bandwidth, also increases the CPU usage on the packet engines. From an idle state, the Netscaler uses about 0 – 1.5% CPU, which can be seen here


NOTE: This is a VPX 1000 with 2 vCPUs (where we have only 1 packet engine). Starting an ICA proxy session with the defaults over Thinwire and running the process that generates the most bandwidth (PDF scrolling and zooming), the packet engine CPU rises to just under 1%


So it’s a minor increase, which is expected since ThinWire uses a small amount of bandwidth. Framehawk, on the other hand, uses about 4% of the packet engine CPU. Note again that this was while we kept working with the PDF document.
We can conclude that using Framehawk will put a lot more strain on the Netscaler packet engine and therefore we cannot have as many users on the Netscaler.


RDP usage:
We also wanted to test RDP under different scenarios. We had some issues extracting CPU and memory usage, since RDP uses DWM and MSTSC, which can appear as sub-processes of svchost.
We therefore skipped that part and focused only on bandwidth usage and end-user experience.

First we started with a test without any limitations in the form of latency and packet loss. (This was using regular RDP against Windows 10 with TCP/UDP.)

The initial test shows as we expected, RDP uses 53 MB of bandwidth


We also noticed that during the YouTube part the progressive rendering engine kicked in to ensure optimal delivery, but the graphics were OK.

RDP, 2% Packet loss

With 2% packet loss the bandwidth usage was basically halved, to 26 MB


Keystrokes and some operations were a bit delayed but still workable. On the other hand, the progressive rendering engine made the graphics in the YouTube part nearly impossible to follow, even though audio worked fine.

RDP 5% Packet loss

RDP used about 17 MB of bandwidth. PDF scrolling and zooming caused a huge delay in how the end user could work. Surfing the webpage, which has a huge amount of graphics, froze for a couple of seconds. YouTube itself, well, it didn’t work very well.


We can conclude that RDP uses more bandwidth than Thinwire under normal circumstances, but it does not deal well with packet loss.
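To compare the protocols side by side, the bandwidth totals reported in the tests above can be put into one small table and expressed as a change against each protocol's own baseline without packet loss. The sketch below only uses figures stated in this post; the Framehawk run at 10% packet loss is left out because no total was given for it.

```python
# Total bandwidth per test run in MB, keyed by packet loss in percent.
BANDWIDTH_MB = {
    "Framehawk": {0: 240, 2: 224, 5: 300},
    "Thinwire": {0: 47, 2: 12, 5: 11, 10: 6},
    "RDP": {0: 53, 2: 26, 5: 17},
}


def change_vs_baseline(protocol, loss_percent):
    """Percentage change in bandwidth compared to the run without packet loss."""
    baseline = BANDWIDTH_MB[protocol][0]
    return (BANDWIDTH_MB[protocol][loss_percent] - baseline) / baseline * 100


for protocol, results in BANDWIDTH_MB.items():
    for loss, used in results.items():
        if loss == 0:
            continue
        print(f"{protocol:<10} {loss:>2}% loss: {used:>3} MB ({change_vs_baseline(protocol, loss):+.0f}%)")
```

The TCP-based protocols send far less data as packet loss increases, which matches the sluggish sessions we saw, while Framehawk keeps sending about the same amount or more.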

So what does all this data tell us?
We can clearly see that Framehawk and Thinwire each have their own use cases.
While Thinwire is the preferred method of delivering graphics, even with high latency, as soon as we experience packet loss of 3% or higher, Framehawk will definitely give a better user experience. Just remember to keep an eye on the resource usage on the VDI.
This applies especially when using it with XenApp, since a spike in CPU usage will have a great impact on the users who are logged on and will decrease the number of users you can have on each server.

Enterprise Data Protection policy options in Microsoft Intune

Earlier this week, Microsoft released their enterprise data protection feature within Microsoft Intune. I have blogged about this earlier.

It is a security feature which allows us to filter data based on whether it is private data or business data.

We will be able to define 4 different levels of security.

  • Block (We can say that users are NOT allowed to share data from a business file to, for instance, social media)
  • Override (Users get a warning but are allowed to override, events are logged)
  • Silent (Everything is logged)
  • Off

The feature which was released into Intune is aimed at Windows 10 Enterprise (mobile/desktop) and allows policies aimed at applications, either desktop apps or Universal apps.

So if we go into Intune and choose to create a new policy, we have a new option called Enterprise data protection


From here we can then create the different levels of security and define which application we want to scope this policy on. We can of course use wildcard levels to exclude/include different software


We can also define security level, domains which users are allowed to store data on and such.


So what is the magic sauce running beneath which allows this to happen?
Stay tuned as I add more detail to this blog post, since I am still testing it