Monthly Archives: May 2016

Getting started with Storage Replica in Windows Server 2016

Storage Replica is a new feature introduced in Windows Server 2016 (Datacenter edition only!). It is a volume-based storage replication feature that can work in two modes: synchronous or asynchronous. It also opens up a bunch of different use cases, the biggest one being the ability to deploy stretched clusters!

Synchronous

It works like this: when an application writes to a Storage Replica volume, SR writes the data into a log file (on a dedicated log volume) on the primary host. While the data is written to the log, it is also replicated to the other host/cluster in the setup, where it is written to the log volume as well. Once the data is written on the remote host, an acknowledgement is sent back to the primary host and then to the application, which can then continue its business.
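To make the cost of that round trip concrete, here is a minimal latency model in Python. It assumes (a simplification, not SR internals) that the local log write and the replication to the remote site start at the same moment, so the application waits for whichever path is slower. The timing values are hypothetical inputs, not measurements.

```python
from dataclasses import dataclass


@dataclass
class SyncWriteTimings:
    """Hypothetical per-write timings in milliseconds (illustrative, not measured)."""
    local_log_write_ms: float   # write into the log volume on the primary host
    network_rtt_ms: float       # round trip between the primary and secondary
    remote_log_write_ms: float  # write into the log volume on the secondary host


def sync_ack_latency_ms(t: SyncWriteTimings) -> float:
    """Time until the application is acknowledged in synchronous mode.

    Assumes the local log write and the replication start in parallel.
    The remote path costs one network round trip plus the remote log write.
    """
    remote_path_ms = t.network_rtt_ms + t.remote_log_write_ms
    return max(t.local_log_write_ms, remote_path_ms)


if __name__ == "__main__":
    # Hypothetical numbers: fast SSD logs, 2 ms round trip between sites.
    print(sync_ack_latency_ms(SyncWriteTimings(0.5, 2.0, 0.25)))  # 2.25
```

In this model the network round trip dominates every single write, which is why the link requirements below matter so much for synchronous mode.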


Synchronous replication comes with some infrastructure recommendations.
Network:

  • Bandwidth! If you do not have sufficient bandwidth between sites, the link will become the bottleneck and slow down the running services, even if the compute and storage in the primary site have more than enough horsepower.
  • Latency! Round-trip latency should be no more than 5 ms.
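The two network checks above, plus a best-case estimate of how long the initial copy takes, can be sketched in Python. The peak write rate is a hypothetical input you would measure on your own workload, and the estimate ignores protocol overhead, competing traffic and how much of the volume is actually in use.

```python
def check_sync_link(rtt_ms: float, link_mbps: float, peak_write_mbps: float) -> list:
    """Return warnings for a proposed synchronous link; an empty list means no warnings."""
    warnings = []
    if rtt_ms > 5.0:
        warnings.append(f"round-trip latency of {rtt_ms} ms exceeds the 5 ms recommendation")
    if link_mbps < peak_write_mbps:
        # Every write must cross the link before it is acknowledged, so a slower
        # link throttles the application no matter how fast the local storage is.
        warnings.append(
            f"link of {link_mbps} Mbit/s is slower than peak writes of {peak_write_mbps} Mbit/s"
        )
    return warnings


def initial_sync_hours(volume_gb: float, link_mbps: float) -> float:
    """Best-case initial replication time: decimal GB -> megabits, divided by link speed."""
    return volume_gb * 8000 / link_mbps / 3600


if __name__ == "__main__":
    print(check_sync_link(rtt_ms=2.0, link_mbps=1000, peak_write_mbps=400))  # []
    print(round(initial_sync_hours(1000, 1000), 2))  # 2.22
```

Even in this ideal case, 1 TB over a 1 Gbit/s link takes a little over two hours, which is one reason to test with smaller volumes first.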

Disks and storage: regardless of deployment mode, a Storage Replica setup also has some disk and storage requirements.

  • Four disks are required: a source data disk, a source log disk, a destination data disk and a destination log disk.
  • The data disks must be formatted as GPT, not MBR.
  • The source and destination data volumes have to be the same size.
  • The log disk should use SSD storage using mirrored Storage Spaces, RAID 1, RAID 10 or similar resiliency.
  • The data disks can be on HDD, SSD or tiered, using mirror Storage Spaces, parity Storage Spaces, RAID 1, RAID 10, RAID 5, RAID 50 or equivalent configurations.
  • It is recommended to test with less than 1 TB to reduce initial replication time, but the volume can be as large as we want as long as we have the bandwidth and I/O to handle our requirements.
  • The log volumes should be at least 8 GB, which is the default log size, and larger if you need to survive longer outages. The log can be configured as small as 512 MB.
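A minimal Python sketch of a pre-flight check for the four-volume layout listed above. The `Volume` fields are hypothetical inputs you would gather yourself (for example from diskpart or the `PartitionStyle` property returned by `Get-Disk`). It only checks partition style, data volume sizes and log sizes; SSD usage and resiliency are not covered.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass
class Volume:
    size_bytes: int
    partition_style: str  # "GPT" or "MBR"


def validate_sr_layout(src_data: Volume, src_log: Volume,
                       dst_data: Volume, dst_log: Volume) -> list:
    """Return a list of problems with the layout; an empty list means it looks OK."""
    problems = []
    # Data disks must be GPT, not MBR.
    for name, vol in (("source data", src_data), ("destination data", dst_data)):
        if vol.partition_style.upper() != "GPT":
            problems.append(f"{name} disk is {vol.partition_style}, must be GPT")
    # Source and destination data volumes must match in size.
    if src_data.size_bytes != dst_data.size_bytes:
        problems.append("source and destination data volumes differ in size")
    # Logs: 512 MB is the floor, 8 GB is the default log size.
    for name, vol in (("source log", src_log), ("destination log", dst_log)):
        if vol.size_bytes < 512 * MB:
            problems.append(f"{name} volume is below the 512 MB minimum log size")
        elif vol.size_bytes < 8 * GB:
            problems.append(f"{name} volume is smaller than the 8 GB default log size")
    return problems


if __name__ == "__main__":
    data = Volume(500 * GB, "GPT")
    log = Volume(10 * GB, "GPT")
    print(validate_sr_layout(data, log, data, log))  # []
```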

So why should the log disks be SSD? Think about what happens during an outage: a fast log disk allows for faster recovery. Larger logs also allow faster recovery from longer outages and less rollover, but they cost more disk space! And of course the log SSD will in effect act as a write cache for all incoming I/O to the storage.

Asynchronous

Asynchronous replication works almost the same way, but with one major difference: data is written locally, and as soon as it has been added to the log, an acknowledgement is sent back to the application. Only then is the data replicated across to the other site.
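A toy Python simulation of that behaviour, assuming fixed-size time steps and a link that drains a fixed amount per step (both hypothetical simplifications, not how SR actually schedules I/O). The application is never held back by the link; instead a backlog of unreplicated data builds up and drains again.

```python
def simulate_async(writes_mb_per_tick, link_mb_per_tick):
    """Return the replication backlog (MB not yet on the destination) after each tick."""
    backlog = 0.0
    history = []
    for written in writes_mb_per_tick:
        # The application is acknowledged once the write is in the local log,
        # so new writes are never delayed by the link.
        backlog += written
        # The link then replicates as much of the backlog as it can this tick.
        backlog -= min(backlog, link_mb_per_tick)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # A burst of writes followed by quiet time, over a link moving 60 MB per tick.
    print(simulate_async([100, 100, 0, 0], 60))  # [40.0, 80.0, 20.0, 0.0]
    # Link down (0 MB per tick): the backlog only grows.
    print(simulate_async([50, 50, 50], 0))       # [50.0, 100.0, 150.0]
```

The backlog at any point is roughly the data you would lose if the primary site failed at that moment, which is the trade-off for not waiting on the link.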


This means asynchronous replication does not have the same network requirements as synchronous. For instance, if the link goes down, applications will still be able to run; the writes that have not been replicated yet accumulate in the log, and when the link comes back up, replication will catch up again.
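How long an outage the log can absorb is simple arithmetic: log size divided by write rate. A rough Python sketch, assuming a steady (hypothetical) write rate and treating the whole log as available for unreplicated writes:

```python
def log_coverage_minutes(log_size_gb: float, write_rate_mb_s: float) -> float:
    """Rough minutes of sustained writes a log of this size can hold (binary GB -> MB)."""
    return log_size_gb * 1024 / write_rate_mb_s / 60


if __name__ == "__main__":
    # Hypothetical workload writing a steady 20 MB/s while the link is down.
    print(round(log_coverage_minutes(8, 20), 1))   # 6.8
    print(round(log_coverage_minutes(64, 20), 1))  # 54.6
```

With the 8 GB default, a busy volume fills the log in minutes, which is why the log should be larger if you expect longer outages.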

Storage Replica can also be deployed in different ways depending on the need. For instance, we can have:

  • Server-to-server replica
  • Server-to-self (From one Volume to another)
  • Cluster-to-cluster (two separate clusters, DR scenario)
  • Stretched cluster (single cluster, automatic failover)

So how do we set this up? As an example, I have two virtual machines in a server-to-server replica.

First we need to install the required roles and features on each of the servers:

$Servers = 'DEMO01','DEMO02'

$Servers | ForEach-Object { Install-WindowsFeature -ComputerName $_ -Name Storage-Replica,FS-FileServer -IncludeManagementTools -Restart }

Then make sure that the disks are configured according to the requirements above. A simple way to check whether they are GPT is to use diskpart: list disk marks GPT disks with an asterisk in the Gpt column.


Once the disks are configured (note that they should have some form of redundancy!), we can use the Storage Replica cmdlet Test-SRTopology, which runs tests against the disks we are going to use.

Test-SRTopology -SourceComputerName SR-SRV05 -SourceVolumeNames e: -SourceLogVolumeName f: -DestinationComputerName SR-SRV06 -DestinationVolumeNames e: -DestinationLogVolumeName f: -DurationInMinutes 30 -ResultPath C:\Temp -Verbose

NOTE: If you encounter any issues, refer to the TechNet article here –> https://technet.microsoft.com/en-us/library/mt126101(v=ws.12).aspx


After the test is complete, we can configure the Storage Replica partnership.

New-SRPartnership -SourceComputerName sr-srv05 -SourceRGName rg01 -SourceVolumeName e: -SourceLogVolumeName f: -DestinationComputerName sr-srv06 -DestinationRGName rg02 -DestinationVolumeName e: -DestinationLogVolumeName f:

After the configuration, you can use the Get-SRGroup cmdlet to see the properties of the replication group.


On the primary node, we will also notice that the log disk fills up with about 8 GB (the default size of the log) and that the replication mode is synchronous.
NOTE: If you want to set up asynchronous replication, use New-SRPartnership -ReplicationMode Asynchronous.


On the secondary node, however, you will notice that the data disk is inaccessible.


Any data I place on the E: volume on the primary node is automatically replicated to the other side. If I later remove the replication group, I get access to my disks again, with the replicated data intact on the E: volume on both nodes.

If the replication breaks and you need to remove the leftover replication metadata, you can use this command:

Clear-SRMetadata -AllPartitions

So stay tuned for more about cluster setups and advanced configurations.

Citrix NetScaler–TCP profiles

After getting a huge number of questions about this topic over the last couple of weeks, I decided to write a blog post to clear up some of the misconceptions about this NetScaler feature.

NOTE: TCP profiles can be found under System –> Profiles –> TCP

TCP profiles allow us to customize TCP parameters on a NetScaler and bind them to a specific object: globally, to a virtual server, or to a service (or service group). A TCP profile bound at the global level affects all TCP communication on the NetScaler, but we can also create a custom TCP profile and bind it to a virtual server, which then overrides the global TCP profile for that particular virtual server.

The same goes for services: if we have a TCP profile bound globally and then bind a custom TCP profile to a service, the custom profile overrides the global TCP settings for that service.
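The override rule can be sketched as a small Python function. This is a simplified model, not NetScaler code: it assumes the virtual server profile governs the client-side connection and the service profile governs the server-side connection, which is consistent with the NetScaler Gateway note further down. `nstcp_default_profile` is the built-in default profile; `tcp_mobile_custom` is a hypothetical custom profile name.

```python
from typing import Optional, Tuple


def effective_tcp_profiles(global_profile: str,
                           vserver_profile: Optional[str] = None,
                           service_profile: Optional[str] = None) -> Tuple[str, str]:
    """Return (client_side, server_side) profile names.

    A profile bound to the virtual server overrides the global profile on the
    client-side connection; a profile bound to the service (or service group)
    overrides it on the server-side connection. Unbound sides fall back to global.
    """
    client_side = vserver_profile or global_profile
    server_side = service_profile or global_profile
    return client_side, server_side


if __name__ == "__main__":
    print(effective_tcp_profiles("nstcp_default_profile",
                                 vserver_profile="tcp_mobile_custom",
                                 service_profile="nstcp_default_tcp_lan"))
    # ('tcp_mobile_custom', 'nstcp_default_tcp_lan')
```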

So why should we customize TCP settings for different objects?


Our end users access resources in different ways. On one hand we might have users running Citrix Receiver, who depend on having a good experience wherever they are and on many different devices. On the other hand we might have mobile users working from their phones, accessing resources through an app, usually over wireless connections, roaming between 3G/4G and Wi-Fi as well as between access points, often with a high amount of packet loss.

On the other side of the puzzle are the internal resources that the NetScaler needs to talk to, which are often connected over high-speed 1 Gb/10 Gb Ethernet with little to no packet loss.

Think of it this way: talking with a friend who sits right next to you is like internal traffic, with no latency and few retransmissions. Now try talking to someone riding a bicycle far away; you might need to repeat a lot of words or sentences, and you might also need to speak more slowly to make sure that the other person receives what you are saying.

So TCP should also behave differently depending on where the user is and what their connection is like. The default TCP profile on the NetScaler has not been adjusted for a long time, so at the virtual server level it communicates the same way with internal and external resources; of course, it is there to ensure compatibility.

Another thing to remember is that there are many TCP settings that, if enabled, might also hurt TCP performance. So if you are customizing TCP settings on your own, be sure to test and validate TCP performance.
For most of us it is a lot simpler: Citrix NetScaler has pre-created TCP profiles for different use cases.
Some of the tuning involves features like SACK and DSACK, Nagle, MPTCP and so on. Another important factor is the choice of congestion algorithm, and knowing when to choose which.
The chart in the Citrix article below can be used as a guideline on which congestion algorithm to choose.

 
source: http://support.citrix.com/article/CTX211877
Some important points to note:
⦁    NetScaler Gateway does not have the concept of Services, hence a TCP profile can only be bound to the Virtual Server. All other internal traffic will be using the default TCP profile.
⦁    Virtual servers such as Content Switching, Load Balancing and so on can each have their own TCP profile attached. For instance, for a virtual server that serves content to mobile users, I would consider using another congestion algorithm, and using MPTCP if the devices/applications support it.
⦁    All services and service groups that communicate with internal resources can also have their own TCP profile; in most cases nstcp_default_tcp_lan can be used for internal communication.

So hopefully you now have a better understanding of TCP profiles!

The future of VMware NSX–Transformers and licensing update

A couple of days ago, VMware announced the new release of NSX-MH (Multi-Hypervisor), also known as Transformers (codename Bumblebee). It now supports both KVM and vSphere as hypervisors, and they can share the same transport zone. This picture displays what it looks like.

You can also install the Edge on bare metal, which allows us to create our own hardware VTEP.

VMware has also introduced new licensing for NSX-V (which is still for VMware ESXi only), split into three editions:

  • Standard Edition: Automates IT workflows, bringing agility to the data center network and reducing network operating costs and complexity.
  • Advanced Edition: Standard Edition plus a fundamentally more secure data center with micro-segmentation. Helps secure the data center to the highest levels, while automating IT provisioning of security.
  • Enterprise Edition: Advanced Edition plus networking and security across multiple domains. Enables the data center network to extend across multiple sites and connect to high-throughput physical workloads.
  • Also, some interesting points about the licensing: all three editions are available per socket on a perpetual basis. The Advanced edition is available as a per-user offering (to align with virtual desktop deployments). The Enterprise edition is also available on a per-VM term basis.

    Existing customers will be moved into the Enterprise edition when they move to the new licensing model. You can see the differences between the editions here –> http://www.vmware.com/products/nsx/compare.html?mid=1824&eid=CVMW2000000369654

    And more detailed information about the editions and feature matrix here –> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2145269

    Also according to: http://vcloud.sx/new-nsx-license-tier-thoughts-transformers/

    Where previously the list (USD) for NSX was $5,996 per Socket the new editions come in at $1,995, $4,995 and $6,995 per Socket. The Standard edition is well priced but taking a look through the Matrix in the official KB you are getting an extremely slimmed down version of NSX…short of the bells and whistles that make it the awesome SDN platform that it is, however I’m sure the feature set will be attractive for some.
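To put the quoted per-socket list prices into perspective, here is a small Python calculation. The prices are the USD figures quoted above; the cluster size is a hypothetical example, and real quotes will differ from list price.

```python
OLD_LIST_PER_SOCKET = 5996  # previous NSX list price per socket (USD), as quoted above
EDITION_PER_SOCKET = {"Standard": 1995, "Advanced": 4995, "Enterprise": 6995}


def license_cost(sockets: int) -> dict:
    """List-price cost per edition (plus the old single price) for a socket count."""
    costs = {edition: price * sockets for edition, price in EDITION_PER_SOCKET.items()}
    costs["Old list"] = OLD_LIST_PER_SOCKET * sockets
    return costs


if __name__ == "__main__":
    # Hypothetical cluster: 4 ESXi hosts with 2 sockets each.
    for edition, cost in license_cost(4 * 2).items():
        print(f"{edition:<10} ${cost:,}")
    # Standard   $15,960
    # Advanced   $39,960
    # Enterprise $55,960
    # Old list   $47,968
```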

    One of the main reasons people buy NSX is the firewall part, which is only available from the Advanced edition upward; that also makes sense for VDI. Standard gives you distributed switching and routing, which is also something people should take a closer look at!

    NetScaler Security Insight

    With NetScaler build 11.65, Citrix introduces Security Insight as part of the Insight Appliance. This feature is used to give real-time insight into attacks, and recommendations on countermeasures.

    NOTE: Security Insight requires Application Firewall enabled, since it reports based upon AppFw violations and rules. You also need the same version of Insight and the NetScaler appliance for this to be supported and for it to work.

    To set up Security Insight there are basically three steps.

    1: Set up NetScaler Insight build 11.65 and a NetScaler appliance with the same build number, then add the NetScaler appliance to Insight

    2: Alter the AppFlow parameters on the NetScaler appliance to send Security Insight data

    set appflow param -SecurityInsightRecordInterval 60 -SecurityInsightTraffic ENABLED

    This enables it to send the IPFIX templates and data to Insight.

    3: Enable Application Firewall rules and bind them either globally or to a particular virtual server. The simplest way to set it up could look like this:

    add appfw profile pr_appfw -defaults advanced

    set appfw profile pr_appfw -startURLAction log stats learn

    add appfw policy pr_appfw_pol "HTTP.REQ.HEADER(\"Host\").EXISTS" pr_appfw

    bind appfw global pr_appfw_pol 1

    This enables the Start URL check in the Application Firewall module with the learn action, which learns the start URLs of a virtual server. If someone then enters the virtual server through a different URL, for instance http://test.fqdn.no/test2.html, they violate the Start URL rule, which triggers an alert.

    This will then generate an AppFlow alert which will be sent to Insight Center and processed there.
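A heavily simplified Python sketch of what the Start URL check decides for a request that enters the site. The learned patterns are hypothetical, and the real Application Firewall check has additional options that this ignores; the point is only the allow-list logic that flags /test2.html.

```python
import re

# Hypothetical start URLs learned for the demo virtual server.
LEARNED_START_URLS = (r"^/$", r"^/index\.html$")


def is_start_url_violation(path: str, patterns=LEARNED_START_URLS) -> bool:
    """True if a request entering the site hits a URL that no start URL pattern allows."""
    return not any(re.match(p, path) for p in patterns)


if __name__ == "__main__":
    for path in ("/", "/index.html", "/test2.html"):
        # With the action "log stats learn" nothing is blocked: a violation is only
        # logged, counted and reported (for example to Insight through AppFlow).
        print(path, "violation" if is_start_url_violation(path) else "ok")
```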

    In my case I have a simple virtual server which represents an HTTP server (this could also be Exchange, SharePoint or an eCommerce site, for instance).

    After the configuration is done, log in to Insight and go to Security Insight; you will see that a device with Security Insight AppFlow appears in the list.


    As of now I only have one Application called “test” which has an Application firewall policy attached to it.

    NOTE: Yeah I love descriptive application names.

    After triggering some AppFw violations by accessing the virtual server on a URL that is not the start URL, I get a bit more information, including details about my Application Firewall policy and NetScaler System Security policies. I'll come back to that in a bit.


    When clicking on the virtual server from within Insight, I get more information about what is configured for it in terms of AppFw signatures, security checks and so on. A threat index is also generated based upon the violations: the more critical the attacks on an application, the higher the threat index for that application. Values range from 1 through 7.

    If I click on the threat index, I get a detailed overview of what kinds of violations have been triggered.


    From here I can also click on the violation type to get information about the client IP address, and if I have a GeoIP database added, Insight will draw a map of where the request originated.


    Back in the earlier pane, we noticed that my Application Firewall configuration was at level 1 and my NetScaler System Security at level 3. This means Insight noticed that I haven't made any changes beyond the defaults, and that I should take a closer look at the system configuration to harden it further.

    So if I now go into the NetScaler system policy configuration I get feedback on what I should do, to ensure that the NetScaler is more locked down


    The NetScaler System Security is built up of different categories.

    • Access
    • Monitoring
    • Logging
    • Cryptography
    • Others

    The different categories still show limited information, and the Cryptography pane only checks whether SSL/TLS is enabled and whether AES256 is used as the default encryption.

    So what’s missing?

    Even though this is a good starting point, I would love for Citrix to go even deeper with this feature, because so many have no clue about how secure their external services are. Some things I would love to see in the product moving forward:

    • Cipher groups indexing
    • Certificates indexing
    • HSTS enabled?
    • DDoS attacks?
    • AAA bruteforcing ?

    I would love for them to incorporate features from what SSL Labs is running and display a better SSL/TLS overview of each virtual server.

    Also, all the recommendations shown in Insight should be configurable directly from Insight, instead of Insight just showing what you should do and you then having to log into the appliance and configure it from there.

    New award, Aviator!

    So what is an Aviator? Well, it's not what you think… Those of you who have been following me may have noticed that I have been working with Avi Networks from time to time. Why? Because they think differently from other ADC vendors in the market, and they deliver a pretty cool product!

    For instance, look at Google's load balancer Maglev –> http://research.google.com/pubs/pub44824.html and at Microsoft's own load balancer Duet –> http://research.microsoft.com/pubs/220640/sigcomm14-duet-final.pdf

    Both Microsoft's and Google's load balancers (which they use for their public clouds) share many of the same characteristics of cloud-scale load balancing: load balancing resources can easily be scaled up and down, and rich analytics are built in and easily distributed across multiple servers. And of course they run on plain x86 hardware with powerful automation features.

    This is the same vision Avi Networks wants to deliver to on-premises datacenters, providing the same feature set and analytics that Microsoft and Google have in their own datacenters. Also, instead of having a single appliance that does all the features plus management, which is typically the case for ADC vendors that come from the physical space and later moved to virtualization, Avi Networks is built for virtualization: management and packet handling are separated into their own virtual appliances and can be integrated into the virtualization layer.

    So back to the subject: what is an Aviator? Aviator is a new community program from Avi Networks, and as of now I am actually its first official member! I am honored that Avi chose me as their first member, and I look forward to participating in the program and trying out new features in the upcoming beta bits!

    Troubleshooting a slow ICA-proxy session NetScaler

    This article is meant as a way to troubleshoot network issues on a NetScaler appliance. Of course, troubleshooting approaches differ, so if you have any comments on what you typically do in this type of scenario, please post a comment below!

    The other day I was tasked with troubleshooting a NetScaler issue where the customer's ICA sessions were slow and unreliable. A big problem was that file transfers were not working at all, with bandwidth usage fluctuating between 0 KBps and 200 KBps. When doing an initial assessment I noticed the following:

    • Running NetScaler VPX 50
    • Running on VMware
    • Using LACP on the vDS on VMware
    • Firewall between the NetScaler and the external users, where they were using NAT for incoming requests

    First, a couple of things worth checking if ICA sessions are slow:

    • Number of SSL transactions (depending on the CPU performance and compute resources available to the NetScaler, this affects the performance of the appliance). If this is high, the resources available to the NetScaler might simply be saturated.
    • Bandwidth use (was it consuming so many resources that it couldn't handle the number of users trying to access the solution?)
    • Packet CPU usage (on NetScaler the packet CPUs are responsible for all packet handling, and there is also one dedicated vCPU for management). On a VPX 50 you can only have 2 vCPUs (1 for management and 1 for packet processing).

    I noticed that the VPX had plenty of resources: the number of SSL transactions was low (this could also be related to why the customer had issues with unreliable connections), and the packet CPU usage was low (I checked this with stat cpu in the CLI).

    Once we saw that there was nothing wrong with the VPX, we took a closer look at the virtual infrastructure. I checked whether the VMware host running the NetScaler was saturated, or whether there were any performance issues on the virtual network the NetScaler was placed on.

    Since the issue was persistent and affected both client drive transfers and plain ICA proxy sessions, we guessed that the external traffic, rather than the internal traffic, was causing the issue. We also checked that there were no bandwidth policies set on the XenApp farm that might affect file transfers.

    Since the bandwidth of the NetScaler was going up and down, I was thinking that there might be congestion somewhere. The simplest way to find out was to take a trace from the NetScaler to see what kind of traffic was going back and forth, and whether there were any issues.

    After using Wireshark for a while, you get used to looking for the most common indicators. If you have congestion somewhere, you might see a lot of RSTs or retransmissions because of a full buffer. Think about it: file transfer using client drive mapping will try to use as much bandwidth as possible. Another thing that had been done before my test was to change the TCP profile to nstcp_xa_xd_tcp_profile, which enables features like SACK and Nagle to reduce the number of TCP segments and the need for ACK messages in case of packet drops.

    NOTE: A good tip when starting trace files from NetScaler for SSL connections is to enable “Decrypt SSL packets”.


    From the trace file we noticed a couple of things.

    1: A lot of retransmissions from the XenApp server to the NetScaler SNIP

    2: TCP ZeroWindow

    Which are two symptoms which are often connected.


    This meant that the NetScaler was not able to receive more data at that time, and the TCP transmission is halted until it can process the data in its receive buffer. My immediate assumption was that the TCP buffer size had been adjusted or altered somehow, but that was not the case, since it was still using the default size.
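In Wireshark, the display filters `tcp.analysis.retransmission` and `tcp.analysis.zero_window` find these two symptoms directly. The Python sketch below shows the basic idea behind them on already-parsed segments. It uses deliberately simple heuristics (Wireshark's own analysis is far more thorough), and the IP addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    src: str
    dst: str
    seq: int
    length: int  # TCP payload length in bytes
    window: int  # advertised receive window


def find_symptoms(segments):
    """Return (retransmission indexes, zero-window indexes) for a list of segments.

    Simplified heuristics: a data segment is a retransmission if the same sender
    already sent the same sequence number and length to the same receiver;
    a zero window is any segment advertising a receive window of 0.
    """
    seen = set()
    retransmissions, zero_windows = [], []
    for i, seg in enumerate(segments):
        if seg.window == 0:
            zero_windows.append(i)
        if seg.length > 0:
            key = (seg.src, seg.dst, seg.seq, seg.length)
            if key in seen:
                retransmissions.append(i)
            seen.add(key)
    return retransmissions, zero_windows


if __name__ == "__main__":
    xenapp, snip = "10.0.0.10", "10.0.0.5"  # hypothetical addresses
    trace = [
        Segment(xenapp, snip, 1000, 1460, 65535),  # data towards the SNIP
        Segment(snip, xenapp, 1, 0, 0),            # SNIP advertises a zero window
        Segment(xenapp, snip, 1000, 1460, 65535),  # same data sent again
    ]
    print(find_symptoms(trace))  # ([2], [1])
```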

    So why was this happening?

    A quick Google search showed that this was a known issue in that NetScaler build, which has since been resolved in a later build –> http://support.citrix.com/article/CTX205656

    So, some quick tips when troubleshooting a NetScaler VPX:

    • Check if the appliance has enough compute resources
    • Check if the hypervisor / virtualization layer it is running on has enough resources, or if it is a problem affecting other parts of the virtual network as well
    • Draw a topology map of the network and eliminate other possible components in the network path
    • Check TCP settings, remember that ICA-proxy is using TCP to the end-users
    • Check a trace file, and use filters in Wireshark to easily filter out traffic (https://wiki.wireshark.org/DisplayFilters). Even though you can set filters on the NetScaler, doing so can consume more resources on the NetScaler, and you might not see the whole picture from a networking perspective; for instance, the NetScaler might be flooded with network traffic from another IP source, which would then not be displayed in the trace file.
    • Last but not least, check if there are any known bugs in the current build and that the build is supported for the hypervisor that is being used. (http://support.citrix.com/en/products/netscaler)

    NOTE: You can read more about TCP Window Scaling in this article –> http://support.citrix.com/article/CTX113656
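Why the receive window matters so much comes down to one piece of arithmetic: a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip. The Python sketch below uses hypothetical values (a 50 ms round trip and the 65,535-byte maximum window without the window scale option).

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP connection's throughput: one receive window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def required_window_bytes(target_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain target_mbps."""
    return target_mbps * 1_000_000 / 8 * (rtt_ms / 1000)


if __name__ == "__main__":
    # Hypothetical path: 50 ms round trip, 64 KB window without scaling.
    print(round(max_tcp_throughput_mbps(65535, 50), 2))  # 10.49
    print(max_tcp_throughput_mbps(0, 50))                # 0.0 (zero window: nothing in flight)
    print(round(required_window_bytes(100, 50)))         # 625000
```

Sustaining 100 Mbit/s over that path needs about 625,000 bytes in flight, far above 65,535, which is why window scaling matters; and a receiver stuck at a zero window drops the bound to nothing, no matter how much bandwidth is available.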