Building a highly available Azure AD Connect

One of the big obstacles with hybrid identity in Microsoft Azure these days is synchronization, and ensuring availability for the bridge between on-premises Active Directory and Azure AD.
Azure AD Connect has evolved a lot over the last couple of years and has added many new features and ways to authenticate, but high availability and built-in redundancy for the different components remain a weak spot.

Overview of Azure AD Connect features

Azure AD Connect started out as a sync engine whose only job was to synchronize users from local Active Directory to Azure Active Directory. It was used in combination with ADFS, which handled federation and authentication to resources. It has since evolved so that it can replace ADFS and let Azure AD handle authentication, which also reduces the attack surface that we had with ADFS.

This is now possible with the new Pass-through Authentication or Password Hash Sync in combination with Seamless SSO.

Overview of Pass-through Authentication
Agent registration

Azure AD Connect also has writeback options, and other options such as Hybrid Azure AD Join.

The sync engine itself consists of two namespaces that store the identity information: the connector space (CS) and the metaverse (MV). Each connector space is tied to a connected directory (CD), which in this case can be either Active Directory or Azure AD. Data coming from a connected directory is staged in its connector space and then synchronized into the metaverse.
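To make the data flow concrete, here is a toy Python model of the two namespaces. It is only an illustration: the class names, the `join_key` parameter and the sample user are hypothetical, and the real sync engine stores and joins objects in a far more elaborate way.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectorSpace:
    """Staging area holding objects imported from one connected directory."""
    directory: str
    objects: dict = field(default_factory=dict)

    def full_import(self, directory_objects):
        # An import only copies directory objects into the connector space.
        self.objects = {o["id"]: dict(o) for o in directory_objects}


@dataclass
class Metaverse:
    """Consolidated view that joins objects from all connector spaces."""
    entries: dict = field(default_factory=dict)

    def synchronize(self, cs, join_key="userPrincipalName"):
        # Synchronization joins connector space objects into the metaverse,
        # keyed here on a single attribute (an assumption for this sketch).
        for obj in cs.objects.values():
            entry = self.entries.setdefault(obj[join_key], {"sources": []})
            if cs.directory not in entry["sources"]:
                entry["sources"].append(cs.directory)
            entry.update({k: v for k, v in obj.items() if k != "id"})


ad_cs = ConnectorSpace("Active Directory")
aad_cs = ConnectorSpace("Azure AD")
ad_cs.full_import(
    [{"id": "1", "userPrincipalName": "anna@contoso.com", "displayName": "Anna"}]
)
aad_cs.full_import([{"id": "a", "userPrincipalName": "anna@contoso.com"}])

mv = Metaverse()
mv.synchronize(ad_cs)
mv.synchronize(aad_cs)
print(mv.entries["anna@contoso.com"]["sources"])
```

The point of the sketch is the separation of steps: an import only fills a connector space, and nothing reaches the metaverse until a synchronization run.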


What is the issue with Azure AD Connect?

From a high-availability perspective there is no built-in redundancy for the sync engine, which leaves us with only an active/passive setup using the staging mode feature in Azure AD Connect.

Staging mode makes the server active for import and synchronization, but it does not run any exports. A server in staging mode does not run password sync or password writeback, even if you selected these features during installation. So if you have Azure AD Connect with the Password Hash Synchronization feature enabled and you enable staging mode, the server stops synchronizing password changes from on-premises AD.
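The difference between an active server and a staging server can be summed up in a few lines of Python. This is a minimal sketch of the behaviour described above, not anything Azure AD Connect exposes: the `SyncServer` class and the server names `aadc01` and `aadc02` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SyncServer:
    name: str
    staging_mode: bool

    def allowed_operations(self):
        # Import and synchronization run in both modes.
        ops = ["import", "synchronization"]
        if not self.staging_mode:
            # Only the active server exports and handles password features.
            ops += ["export", "password hash sync", "password writeback"]
        return ops


primary = SyncServer("aadc01", staging_mode=False)
standby = SyncServer("aadc02", staging_mode=True)

print(primary.name, primary.allowed_operations())
print(standby.name, standby.allowed_operations())
```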


It is also possible to have more than one staging server when you want multiple backups in different datacenters to provide full redundancy, and you are not required to have a backend SQL cluster to handle high availability for Azure AD Connect. This allows for easier portability across multiple locations.

Monitoring and failover

An important step in monitoring Azure AD Connect is to set up Azure AD Connect Health, which sends notifications to service desks and email lists in case of failure. Note that using Azure AD Connect Health requires Azure AD Premium licenses:

  • The first Connect Health Agent requires at least one Azure AD Premium license.
  • Each additional registered agent requires 25 additional Azure AD Premium licenses.
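As a quick worked example of the licensing rule above, the following sketch computes how many Azure AD Premium licenses a given number of registered Connect Health agents would require. The function name is hypothetical and the numbers come straight from the two bullets; check the current licensing terms before relying on it.

```python
def connect_health_licenses(agents: int) -> int:
    """Licenses needed for a number of registered Connect Health agents."""
    if agents < 0:
        raise ValueError("agents must not be negative")
    if agents == 0:
        return 0
    # First agent: 1 license. Every additional agent: 25 more.
    return 1 + 25 * (agents - 1)


# One active server plus one staging server, each with its own agent.
print(connect_health_licenses(2))
```

With one active and one staging server that is 1 + 25 = 26 licenses.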

Having this feature enabled gives you insight and email notifications if the sync has stopped. Note, however, that even with an active/passive Azure AD Connect setup there is no automatic failover if something happens to the active Azure AD Connect server.

Setting up staging mode on a separate server is a simple process. It is done using the Azure AD Connect wizard, where you select “Enable staging mode” in the last configuration pane.

Once the setup is complete you can see the following (synchronization is currently disabled).

Now we can run some simulations and import the AD users to the metaverse.

1. Select Connectors, and select the first Connector with the type Active Directory Domain Services. Click Run, select Full import, and OK. Do these steps for all Connectors of this type.
2. Select the Connector with type Azure Active Directory (Microsoft). Click Run, select Full import, and OK.
3. Make sure the tab Connectors is still selected. For each Connector with type Active Directory Domain Services, click Run, select Delta Synchronization, and OK.
4. Select the Connector with type Azure Active Directory (Microsoft). Click Run, select Delta Synchronization, and OK.
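The order of the manual runs matters, so it can help to write it down as a plan before clicking through the connectors. The sketch below only builds that ordered list in Python; it does not talk to the sync service. The function name and the connector names are hypothetical, while the connector types and run profiles are the ones named in the steps above.

```python
AD_DS = "Active Directory Domain Services"
AAD = "Azure Active Directory (Microsoft)"


def staging_run_plan(connectors):
    """Return (connector name, run profile) pairs in the order to run them.

    connectors is a list of (name, type) tuples.
    """
    ad = [name for name, kind in connectors if kind == AD_DS]
    aad = [name for name, kind in connectors if kind == AAD]
    plan = []
    # Full import on every AD DS connector, then on the Azure AD connector,
    # followed by delta synchronization in the same connector order.
    for profile in ("Full import", "Delta Synchronization"):
        plan += [(name, profile) for name in ad + aad]
    return plan


connectors = [
    ("contoso.local", AD_DS),
    ("fabrikam.local", AD_DS),
    ("contoso.onmicrosoft.com", AAD),
]
for step, (name, profile) in enumerate(staging_run_plan(connectors), start=1):
    print(step, name, profile)
```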

And we can see that the jobs have been run in staging mode.

In case of a failure, when you want to promote the staging server to primary, you just rerun the Azure AD Connect wizard and clear the “Enable staging mode” option.

Then enable synchronization, and you will notice that the Azure AD Connect server starts to synchronize.

It is important that if the former primary comes back online, its sync service is stopped and the server is changed to staging mode, or else you will be running in a topology that Microsoft does not support.
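The rule behind this is simple: at any time exactly one server should be out of staging mode. A failover runbook could include a sanity check along these lines. This is a hedged sketch; `check_topology` and the server names are invented for illustration, and the staging flags would have to be collected from each server by other means.

```python
def check_topology(servers):
    """Return a status string for a mapping of server name -> staging_mode."""
    active = sorted(name for name, staging in servers.items() if not staging)
    if len(active) == 1:
        return f"ok: {active[0]} is the only exporting server"
    if not active:
        return "warning: every server is in staging mode, nothing is exported"
    return "unsupported: more than one active server: " + ", ".join(active)


# The old primary came back online before it was switched to staging mode.
print(check_topology({"aadc01": False, "aadc02": False}))
```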

Enabling High-Availability for Pass-through Authentication

If you are also using Pass-through Authentication, and that has been defined in the Azure AD Connect configuration, an authentication agent is automatically installed and enabled as part of Azure AD Connect. The active/passive model described above for the sync engine does not apply to the Pass-through Authentication agent, since the agents work in active/active.

In production environments, Microsoft recommends that you have a minimum of 3 Authentication Agents running on your tenant. There is a system limit of 40 Authentication Agents per tenant. You can download the Authentication Agent from https://aka.ms/getauthagent

It is also important that these machines are scaled properly to handle authentication requests. A single Authentication Agent can handle 300 to 400 authentications per second on a standard 4-core CPU, 16-GB RAM server. It is, however, important that the first Authentication Agent is installed directly on the Azure AD Connect server.
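The figures above are enough for a rough sizing estimate. The sketch below assumes the conservative end of the range (300 authentications per second per agent), the recommended minimum of 3 agents and the limit of 40 per tenant. The function name and the peak load numbers are hypothetical, and the result does not include extra headroom for agents that are down for maintenance.

```python
import math

MIN_AGENTS = 3         # recommended minimum in production
MAX_AGENTS = 40        # system limit per tenant
AUTHS_PER_AGENT = 300  # conservative end of the 300 to 400 range


def agents_needed(peak_auths_per_second, per_agent=AUTHS_PER_AGENT):
    """Estimate the number of Authentication Agents for a given peak load."""
    required = math.ceil(peak_auths_per_second / per_agent)
    agents = max(MIN_AGENTS, required)
    if agents > MAX_AGENTS:
        raise ValueError("load exceeds what 40 agents can handle at this rate")
    return agents


print(agents_needed(500))
print(agents_needed(2000))
```

A peak of 500 authentications per second still gives 3 agents because of the recommended minimum, while 2000 per second works out to 7.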

NOTE: Remember to keep a cloud-only administrator account in reserve, with Global Administrator rights, in case authentication fails and you need to revert to password hash sync.

Summary

I really hope that in the future Microsoft will create an Azure AD availability group, or a group of sync engines, like we have with the Pass-through Authentication agents. Azure AD Connect with Pass-through Authentication is becoming a more crucial part of the infrastructure for hybrid identity, but it is still missing an important aspect that ADFS had, which was high availability.
