Monthly Archives: December 2015

The “legacy” network security layer and the move to software-defined security/networking

Lacking a suitable subject for this post, I just used the topic “legacy” to shine some light on a modern problem in our infrastructure.

Modern firewalls have evolved from devices that simply handle ACLs into what we now call next-generation firewalls (NGFW), with features like

  • Application firewall & Application awareness
  • In-line deep packet inspection
  • IPS
  • QoS / Bandwidth management
  • Antivirus inspection
  • VPN/Secure Access MGMT

These NGFWs typically sit between the infrastructure and the “rest of the world”, protecting resources from the “bad guys” out there.

Whenever I think about firewalls, I think about the walls of Troy, which were put in place to protect the residents of Troy from enemies coming from the outside.

NGFWs are a bit like the walls of Troy, but with more traps, maybe even pools of sharks with lasers on their heads to deal with incoming attacks.

Having a huge NGFW with huge bandwidth capabilities basically means you have a huge gate through which many people can go in and out at the same time, where the ACLs are the guards that check whether the traffic is legit or not.

So what is the problem with this approach? The guards have done their job for many years now, and it seems to have been working. The issue occurs when security incidents are already happening inside the gate: the guards at the gates are immobile and can’t help. Or the guards are simply unable to detect the threats passing through the gates (such as the Trojan horse).

This is the issue we are facing: more and more attacks on the infrastructure happen from the inside rather than from the outside. Another issue is that many new attacks are so new that the signature-based engine on the network firewall is unable to detect them. In some cases we can still detect signature-based attacks, since traffic is split into different subnets and we can force it to flow through the firewall between different zones.

Now there is a new way to approach this, with a zero-trust model, where virtual machines get their own zone even if they are part of the same subnet, moving away from the traditional model of subnets and ACLs.

Products such as VMware NSX and vArmour allow us to implement this kind of micro-segmentation.

Another advantage of this approach is that these features are tightly integrated into the virtualization layer, which allows us to control security at the virtual machine level rather than at the IP level, making it a lot more flexible and dynamic.
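To make that difference concrete, here is a small conceptual sketch (plain Python, not the NSX or vArmour API; the VM names, tags and addresses are made up) contrasting a subnet-based ACL with a rule keyed on VM identity:

    # Conceptual sketch only - not the NSX or vArmour API. It contrasts a subnet/IP-based
    # ACL with a micro-segmentation rule keyed on VM identity (tags), which stays valid
    # even when two VMs share the same subnet.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class VM:
        name: str
        ip: str
        tags: frozenset

    web = VM("web01", "10.0.1.10", frozenset({"web"}))
    db = VM("db01", "10.0.1.20", frozenset({"db"}))  # same subnet as web01

    # Classic ACL: anything inside 10.0.1.0/24 may talk to anything else in it.
    def subnet_acl_allows(src: VM, dst: VM) -> bool:
        return src.ip.startswith("10.0.1.") and dst.ip.startswith("10.0.1.")

    # Micro-segmentation: only flows explicitly whitelisted per workload tag are allowed.
    ALLOWED_FLOWS = {("web", "db")}  # the web tier may reach the db tier, nothing else

    def microseg_allows(src: VM, dst: VM) -> bool:
        return any((s, d) in ALLOWED_FLOWS for s in src.tags for d in dst.tags)

    print(subnet_acl_allows(db, web))  # True  - lateral movement is wide open
    print(microseg_allows(db, web))    # False - the db tier cannot initiate traffic to the web tier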

So with this transition into software-defined, why do we still need regular firewalls at the edge? Why do we still need the big walls?

The software-defined datacenter is, at the moment, aimed at pretty much that: the datacenter. It is not aimed at protecting the regular client computers and other peripherals in the business. We still need those guards to protect the regular residents from traffic going in and out. Also, in case of DDoS, I would not want a lot of traffic being processed through a hypervisor or a virtual appliance; I would much rather have it processed on a physical appliance built to handle that kind of traffic.

So before adopting an SDN technology, make sure you have a proper strategy for all types of clients inside your infrastructure.

ICA vs PCoIP

First off, let me state that the title of this blog post is purely to get more viewers… But there is some truth to it: over the last weeks a lot of people have been talking about RDP / ICA / PCoIP and whether the protocol wars are over.

There are multiple articles on the subject, but this one started the idea –> http://www.brianmadden.com/blogs/guestbloggers/archive/2015/11/25/are-the-display-protocol-wars-finally-over.aspx

And here as well –> https://twitter.com/michelroth/status/670288837730541568

A good friend of mine, @Mikael_modin, and I have already done a lot of testing on Framehawk vs ThinWire (with RDP in the mix): https://msandbu.wordpress.com/2015/11/06/putting-thinwire-and-framehawk-to-the-test/

In that test we measured performance under different packet-loss levels using uberAgent for Splunk and NetBalancer. Each run lasted about 5 minutes: 1 minute of idle workload, 1 minute of web browsing with Chrome on a newspaper site, 1 minute of PDF scrolling and zooming, 1 minute of typing in Word, and the Avengers trailer on YouTube. The tests were conducted on the same virtual infrastructure with the same amount of resources available to the guest VM (Windows 10), no firewall-related issues, and just one connection server with a VDI instance. So this is purely a test of resource usage and bandwidth, and of how each protocol adapts to network changes; there are of course other factors that affect performance one way or the other.

Another thing: I am by no means an expert on View, so if someone disagrees with the data or I have stated something wrong, please let me know.

So I figured it was about time to put PCoIP to the test as well. I know there are other protocol options (HTML5/Blast and its own for GPU), but I am testing the native PCoIP protocol with no changes besides the default setup.

For those who don’t know, PCoIP uses TCP & UDP port 4172, where TCP is used for the session handshake and UDP is used as the transport for session data. The issue with UDP is that it is hard to control the traffic flow.
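A quick way to sanity-check connectivity when troubleshooting is to verify that the TCP side of port 4172 is reachable from the client network. A minimal sketch, assuming a made-up desktop hostname (the UDP side has no handshake, so it cannot be confirmed the same way):

    import socket

    HOST = "view-desktop.example.local"  # hypothetical View desktop/security server
    PCOIP_PORT = 4172                    # PCoIP: TCP 4172 (handshake), UDP 4172 (session data)

    def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within the timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(f"TCP {PCOIP_PORT} to {HOST}:", "open" if tcp_port_open(HOST, PCOIP_PORT) else "blocked/unreachable")
    # If TCP is open but UDP 4172 is blocked, the typical symptom is a black screen
    # right after a successful logon.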


PCoIP is a quite chatty protocol, which means a better experience if the line can handle it, so it will be interesting to see how it handles congestion.

Here are the results from the initial test (with no limits whatsoever).


It consumed about 168 MB of bandwidth in total, with a peak of 933 KB/s, mostly during the YouTube playback in Chrome.

The View agent only used about 7% average CPU during the test.


The maximum CPU usage at one point was about 23%, which was during the YouTube testing.


It was not a particularly heavy user of RAM either.


In our earlier test of Framehawk and ThinWire on the same workload, Framehawk used about 224 MB of bandwidth with a peak of 1.2 MB/s, and oddly enough it was the PDF scrolling and zooming that generated the most bandwidth.


On a side note, Framehawk delivered the best experience when it came to the PDF part; it was lightning fast! ThinWire, on the other hand, used only 47 MB of bandwidth, with most of it during the YouTube part. ThinWire used about the same amount of CPU.


As part of the same test we also turned up the packet loss to a level that would reflect a real-life scenario. At 5% packet loss I saw a lot of changes.

PCoIP now used only about 38 MB of bandwidth, looking quite similar to ThinWire’s usage… but the degradation was quite noticeable from an end-user perspective. I am not quite sure if there is a built-in mechanism to handle QoS under packet loss.


When we did the same test with ThinWire and Framehawk we got the following results: ThinWire was down to about 11 MB of bandwidth.


Framehawk used about 300 MB of bandwidth; I’m guessing it got its ass in gear when it noticed the packet loss and tried to compensate by maxing out the bandwidth available to me.


So in terms of packet loss, Framehawk handles it a lot better than PCoIP, and ICA, which uses TCP, still manages to give a decent user experience, but because of TCP’s retransmission and congestion algorithms it is not quite as usable. And since there was packet loss, and hence less bandwidth to transmit, the CPU also had less to do.


With 10% packet loss we saw a further decrease in bandwidth usage, which means PCoIP had a hard time keeping up with what I wanted to do. It was now down to 27 MB of bandwidth, it struggled during the PDF part, and browsing wasn’t really good.


So, as a first quick summary:

  • The View agent is “lighter”, meaning it uses less CPU and memory on each host.
  • PCoIP is a chatty protocol, and I’m guessing it won’t work that well in a highly congested network; ICA is also chatty, but since it uses TCP it can adapt to the congestion.
  • The plus side is that the steady flow of packets delivers a good user experience.
  • It cannot handle packet loss as well as Framehawk; it was better than ThinWire under packet loss, but ThinWire was never aimed at lossy networks.

Conclusion: Well I’m not gonna post any conclusions related to this post, since in some social media circles..

Well, let’s just say that you can draw your own conclusions from this blog post; I’ll just end the post with the picture of these two cars and let you point out which is which.

Software-defined networking: the difference between VXLAN and NVGRE

Being quite new to software-defined networking and all the different network virtualization technologies out there, I thought I would do a summary of the largest vendors in this market: what differentiates them (from a protocol perspective) and why on earth would we use them?

First off, network virtualization is not new; it has been around for a long time, ever since we started with compute virtualization and got some sort of networking capabilities. But extending those capabilities required something more. We started out with

* Virtual Network adapters and dummy switches

And then we moved along into more cool stuff like

* Virtual VLANs
* Virtually managed L2 switches
* Firewall and load-balancing capabilities
* Virtual routing capabilities and virtual routing tables

In later years came VXLAN and NVGRE (two different tunneling protocols), which were primarily aimed at the scalability issues of large cloud computing platforms: the problems with STP leaving a large number of links disabled, VLAN limitations and overlapping IP-address segments, and the idea that network management should be part of the virtualization layer rather than separate.

VXLAN

VXLAN (part of NSX) is in essence a tunneling protocol which wraps layer 2 frames inside a layer 3 network. The network is split into different segments, and only VMs within the same VXLAN segment can communicate with each other. Each segment has its own 24-bit segment ID. VXLAN uses IP multicast to deliver broadcast/multicast/unknown-destination frames to all access switches participating in a given VXLAN.

In a traditional VLAN, the Ethernet frame is simply tagged and forwarded as-is.

Using VXLAN we instead wrap the original Ethernet frame inside a UDP packet: the inner (original) Ethernet header and payload are encapsulated behind a VXLAN header, an outer UDP header, an outer IP header and an outer Ethernet header.

So using VXLAN adds roughly another 50 bytes of overhead per packet, which in essence means frames will exceed the standard 1500-byte MTU. There is a tech paper from VMware which states that the MTU should be adjusted to 1600, but you should rather consider jumbo frames: http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-VXLAN-Perf.pdf

So it adds more overhead, and all packets need to be unwrapped from the VXLAN encapsulation before being delivered to the destination VM. This is also an issue when sending small packets, such as Telnet/SSH which transmits a packet for each keystroke, where the overhead per packet becomes relatively large, even though that is not a very common workload.
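To make the 50-byte figure concrete, here is a back-of-the-envelope sketch; the header sizes are the commonly cited ones for IPv4 transport without VLAN tags, not numbers taken from the VMware paper:

    # Rough VXLAN encapsulation overhead (IPv4 underlay, no VLAN tags assumed).
    ENCAP_HEADERS = {
        "outer Ethernet": 14,
        "outer IPv4": 20,
        "outer UDP": 8,
        "VXLAN": 8,  # flags + 24-bit VNI + reserved bits
    }

    overhead = sum(ENCAP_HEADERS.values())  # 50 extra bytes per frame
    inner_mtu = 1500                        # what the VM thinks it can send
    print(f"Extra bytes per frame: {overhead}")
    print(f"Underlay MTU should be at least: {inner_mtu + overhead}")  # 1550 - hence the 1600 recommendation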

In order to allow communication between a VXLAN-enabled host and a non-VXLAN host, you need a VXLAN-capable device in between that acts as a gateway.

A nice thing about VXLAN is that more and more devices are adding support for it, so using VXLAN in our cloud infrastructure we can define access and management from the virtualization layer and move all VXLAN traffic over just one transport VLAN.

NVGRE

NVGRE, on the other hand, is a tunneling protocol primarily pushed by Microsoft, which uses GRE to tunnel L2 packets across an IP fabric and uses 24 bits of the GRE key to identify the virtual network ID.

The positive thing about using GRE is that a lot of existing hardware already has full support for it (hence switching and NIC offloading work), but on the other hand, wrapping L2 packets inside a GRE layer means that regular features like firewalls or load balancers are not able to “see” the inner packets the way they can with UDP. Therefore the load balancer/firewall would need to act as a gateway and remove the GRE wrapper in order to do packet inspection.
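As a small illustration of that 24-bit network ID, here is a sketch based on the NVGRE draft’s layout, where the 32-bit GRE key field carries a 24-bit Virtual Subnet ID (VSID) plus an 8-bit FlowID:

    import struct

    def pack_nvgre_key(vsid: int, flow_id: int = 0) -> bytes:
        """Pack a 24-bit Virtual Subnet ID and an 8-bit FlowID into the 32-bit GRE key field."""
        assert 0 <= vsid < 2**24 and 0 <= flow_id < 2**8
        return struct.pack("!I", (vsid << 8) | flow_id)

    def unpack_nvgre_key(key: bytes) -> tuple:
        """Return (vsid, flow_id) from the 32-bit GRE key field."""
        value, = struct.unpack("!I", key)
        return value >> 8, value & 0xFF

    key = pack_nvgre_key(vsid=5001, flow_id=3)
    print(unpack_nvgre_key(key))  # (5001, 3) - roughly 16 million virtual networks vs 4094 VLANs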

For instance, Windows Server 2016 TP4 includes its own load-balancing and firewall capabilities to be able to do this without unwrapping the packets. Here are some of the features included in TP4:

Network Function Virtualization (NFV). In today’s software defined datacenters, network functions that are being performed by hardware appliances (such as load balancers, firewalls, routers, switches, and so on) are increasingly being deployed as virtual appliances. This “network function virtualization” is a natural progression of server virtualization and network virtualization. Virtual appliances are quickly emerging and creating a brand new market. They continue to generate interest and gain momentum in both virtualization platforms and cloud services. The following NFV technologies are available in Windows Server 2016 Technical Preview.

  • Software Load Balancer (SLB) and Network Address Translation (NAT). The north-south and east-west layer 4 load balancer and NAT enhances throughput by supporting Direct Server Return, with which the return network traffic can bypass the Load Balancing multiplexer.

  • Datacenter Firewall. This distributed firewall provides granular access control lists (ACLs), enabling you to apply firewall policies at the VM interface level or at the subnet level.

  • RAS Gateway. You can use RAS Gateway for routing traffic between virtual networks and physical networks; specifically, you can deploy site-to-site IPsec or Generic Routing Encapsulation (GRE) VPN gateways and forwarding gateways. In addition, M+N redundancy of gateways is supported, and Border Gateway Protocol (BGP) provides dynamic routing between networks for all gateway scenarios (site-to-site, GRE, and forwarding).

The future

It might be that both of these protocols will be replaced by another tunneling protocol called Geneve, which is a joint effort by Intel, VMware, Microsoft and Red Hat --> http://tools.ietf.org/html/draft-gross-geneve-00#ref-I-D.ietf-nvo3-dataplane-requirements, and which in my eyes looks a lot like VXLAN with its UDP wrapping.

Either way, whichever tunneling protocol is used needs to be properly adopted by the management layer and integrated with the compute virtualization layer to ensure that traffic policies and security management are in place.

The mysterious case of 1110 and 1030 errors and PVS

I have been spending a couple of days troubleshooting an environment where external users “sometimes” had issues connecting (it was completely random) and were getting the 1110 and 1030 error messages,

 

and Wyse thin clients also had issues (sometimes). The problem only occurred with specific hosts. So this is a post about things to check :)

First off, this was a Hyper-V environment running PVS, and the NetScaler was in another environment. There were two DDCs which also had StoreFront installed. For some reason users had issues connecting to the environment. As in most cases, I double-checked the STA settings on StoreFront and the NetScaler, and didn’t notice any error messages on StoreFront. The next thing I noticed was that the Hyper-V hosts had old HP network drivers, which in many cases have issues with VMQ. This would explain the sudden drop of an existing connection from the inside, so we installed the latest NIC drivers and verified that VMQ was working as intended. At that point I concluded the case was resolved.

But the next day there were still users having 1110 and 1030 issues. After some more troubleshooting I noticed that there was a hosts file entry (on one of the hosts) that conflicted with the FQDN of the StoreFront server, so when the Wyse clients were fetching the PNclient settings they were redirected in a loop, which meant they were never able to get the config.

I also noticed that this meant the callback to the NetScaler was not working from that particular controller. The only problem was that even though this was easy to fix, it did not resolve the 1110 and 1030 issues we were experiencing.

What I noticed was that the 1030 and 1110 errors occurred at random, so I did a check while connecting through the NetScaler to make sure that the NetScaler was actually communicating with the correct VDA servers. Then I saw it, in the DNS records.

By default the NetScaler caches the DNS records for any VDA for 20 minutes, and by default the STA will respond with hostnames. So when an external user was trying to gain access, the ICA file would contain which VDA to contact, and the NetScaler would then ask DNS for the IP addresses of that VDA. For some reason the NetScaler got two addresses per VDA from DNS. This would explain why external users got the error message at random: of the two IP addresses for each host, only one was active.
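A quick way to spot this kind of problem is to resolve each VDA hostname and flag any that return more than one address. A minimal sketch, with made-up hostnames standing in for the FQDNs the STA hands out:

    import socket

    VDA_HOSTS = ["xa-vda01.contoso.local", "xa-vda02.contoso.local"]  # hypothetical names

    for host in VDA_HOSTS:
        try:
            # Collect the unique IPv4 addresses DNS returns for this host.
            addresses = sorted({info[4][0] for info in socket.getaddrinfo(host, None, socket.AF_INET)})
        except socket.gaierror as err:
            print(f"{host}: lookup failed ({err})")
            continue
        warning = "  <-- multiple A records, the gateway may cache a dead one" if len(addresses) > 1 else ""
        print(f"{host}: {', '.join(addresses)}{warning}")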

So why were the PVS servers registered with two IP addresses? Well, that was the easy part… Notice I mentioned that this was running Hyper-V? :)

Setting up PVS on Hyper-V requires two NICs on each server: a legacy NIC for PXE boot and a synthetic NIC for the real traffic. With PVS 7.1, Citrix will “switch” from the legacy to the synthetic NIC after the OS has finished booting and the synthetic NIC is up and running. The only issue was that the legacy NIC was able to register itself in DNS before “being disabled”.


So by updating the image to remove DNS registration on that NIC, and setting the Desktop Service to a delayed start, I made sure that the registration to the DDC was working as intended and that no bad records ended up in DNS.

Netscaler and AAA with CSW One VIP

As part of the latest Citrix NetScaler v11 release, there was an interesting feature added to the firmware, which in essence allows us to add a non-addressable (no-IP) AAA vServer. This lets us put multiple resources behind a CSW vServer where we only use one VIP.

Highlander there can be only one - There can be only one VIP

This is part of the latest feature release from Citrix (build 11.63 from October).
It can be set up either using the CLI or the GUI.


So when setting up the AAA vServer we can then use the option Non Addressable.


Note that when binding it to the CS vServer you need to specify that it should use 401-based authentication, since forms-based authentication requires an externally reachable HTTP session to function.


So from an end-user perspective, a user tries to go to LB1, which sits behind the CSW vServer; this triggers an AAA request to the non-addressable 401-based authentication vServer, and the user is then authenticated.

New award, Nutanix Technology Champion!

Today Nutanix announced their list of Nutanix Technology Champions for 2016, and I am honored to be among the people on the list. Nutanix is doing a lot of cool things, and there is a lot more to come :)

http://next.nutanix.com/t5/Nutanix-Connect-Blog/Welcome-to-the-2016-Nutanix-Technology-Champions/ba-p/6382?utm_content=buffer05396&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

I am delighted to announce the 2016 group of Nutanix Technology Champions. This year has seen enormous demand from the community to participate in this program, and on behalf of the entire community team here at Nutanix, we are grateful for and honored by the volume and the quality of the feedback we’ve had.
The Nutanix Technology Champion program spans the globe and is comprised of IT professionals from every cloud, application group, and technology. They are committed to new ways of thinking that will power the next generation of enterprise computing.
I am looking forward to getting to know you all and will be contacting our new NTC members shortly with more details. Congratulate one another, and if you are sharing on social, please do use #NutanixNTC so others can engage in the conversation. Thank you for believing in us and the larger community.