Deep Dive into Azure AD and AWS SSO Integration – Part 3

Back for more are you?

Over the past few posts I’ve been covering the new integration between Azure AD and AWS SSO.  The first post covered high level concepts of both platforms and some of the problems with the initial integration which used the AWS app in the Azure Marketplace.  In the second post I provided a deep dive into the traditional integration with AWS using a non-Azure AD security token service like AD FS (Active Directory Federation Services), what the challenges were, how the new integration between Azure AD and AWS SSO addresses those challenges, and the components that make up both the traditional and the new solution.  If you haven’t read the prior posts, I highly recommend you at least read through the second post.

Azure AD and AWS SSO Integration

New Azure AD and AWS SSO Integration

In this post I’m going to get my hands dirty and step through the implementation steps to establish the SAML trust between the two platforms.  I’ve set up a fairly simple lab environment in Azure.  The lab environment consists of a single VNet (virtual network) with four virtual machines with the following functions:

  • dc1 – Windows Active Directory domain controller for jogcloud.com domain
  • adcs – Active Directory Certificate Services
  • aadc1 – Azure Active Directory Connect (AADC)
  • adfs1 – Active Directory Federation Services

AADC has been configured to synchronize to the jogcloud.com Azure Active Directory tenant.  I’ve configured federated authentication in Azure AD with the AD FS server acting as an identity provider and Windows Active Directory as the credential services provider.

visio of lab environment

Lab Environment

On the AWS side I have three AWS accounts set up and associated with an AWS Organization.  AWS SSO has not yet been set up in the master account.

Let’s set it up, shall we?

The first thing you’ll need to do is log into the AWS Organization master account as a user with appropriate permissions to enable AWS SSO for the organization.  If you’ve never enabled AWS SSO before, you’ll be greeted by the following screen.

1.png

Click the Enable AWS SSO button and let the magic happen in the background.  That magic is provisioning of a service-linked role for AWS SSO in each AWS account in the organization.  This role has a set of permissions which include the permission to write to the AWS IAM instance in the child account.  This is used to push the permission sets configured in AWS SSO to IAM roles in the accounts.

Screenshot of AWS SSO IAM Role

AWS SSO Service-Linked IAM Role
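For those curious what that service-linked role’s trust relationship looks like, here’s a minimal sketch of the trust policy that lets the AWS SSO service assume the role in each member account.  This is an illustrative reconstruction, not a dump from a live account; the document AWS actually provisions may include additional statements.

```python
import json

# Hedged sketch of the trust policy on the AWSServiceRoleForSSO
# service-linked role; the real document may differ.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Only the AWS SSO service itself may assume this role.
            "Principal": {"Service": "sso.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The key takeaway is the principal: it’s the AWS SSO service, not any human identity, which is what allows AWS SSO to push permission sets into member accounts without you distributing credentials.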

After about a minute (this could differ depending on how many AWS accounts you have associated with your organization), AWS SSO is enabled and you’re redirected to the page below.

Screenshot of AWS SSO successfully enabled page

AWS SSO Successfully Enabled

Now that AWS SSO has been configured, it’s time to hop over to the Azure Portal.  You’ll need to log into the portal as a user with sufficient permissions to register new enterprise applications.  Once logged in, go into the Azure Active Directory blade and select the Enterprise Applications option.

Register new Enterprise Application

Register new Enterprise Application

Once the new blade opens select the New Application option.

Register new application

Register new application

Choose the Non-gallery application option since we don’t want to use the AWS app in the Azure Marketplace due to the issues I covered in the first post.

Choose Non-gallery application

Choose Non-gallery application

Name the application whatever you want; I went with AWS SSO to keep it simple.  The registration process will take a minute or two.

Registering application

Registering application

Once the process is complete, you’ll want to open the new application, go to the Single sign-on menu item, and select the SAML option.  This is the menu where you will configure the Azure AD end of the federated trust between your Azure AD tenant and AWS SSO.

SAML Configuration Menu

SAML Configuration Menu

At this point you need to collect the federation metadata containing all the information necessary to register Azure AD with AWS SSO.  To make it easy, Azure AD provides you with a link to directly download the metadata.

Download federation metadata

Download federation metadata
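If you’re curious what’s inside that file before importing it, it’s a standard SAML 2.0 metadata document.  The snippet below pulls out the two values the service provider side cares most about, the entityID and the single sign-on URL, from a trimmed, hypothetical sample (a real download also carries the token-signing certificate):

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample of Azure AD federation metadata;
# the tenant GUID and URLs below are placeholders.
metadata = """<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata"
  entityID="https://sts.windows.net/00000000-0000-0000-0000-000000000000/">
  <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <SingleSignOnService
      Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
      Location="https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/saml2"/>
  </IDPSSODescriptor>
</EntityDescriptor>"""

ns = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}
root = ET.fromstring(metadata)

# The identity provider's entity ID and SSO endpoint are what AWS SSO
# records when you upload this file.
entity_id = root.attrib["entityID"]
sso_url = root.find("md:IDPSSODescriptor/md:SingleSignOnService", ns).attrib["Location"]
print(entity_id)
print(sso_url)
```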

Now that the new application is registered in Azure AD and you’ve gotten a copy of the federation metadata, you need to hop back over to AWS SSO.  Here you’ll need to go to Settings.  In the settings menu you can adjust the identity source, authentication, and provisioning methods for AWS SSO.  By default AWS SSO is set to use its own local directory as an identity source and itself for the other two options.

AWS SSO Settings

AWS SSO Settings

Next up, you select the Change option next to the identity source.  As seen in the screenshot below, AWS SSO can use its own local directory, an instance of Managed AD or BYOAD using the AD Connector, or an external identity provider (the new option).  Selecting the External Identity Provider option opens up the option to configure a SAML trust with AWS SSO.

Like any good authentication expert, you know that you need to configure the federated trust on both the identity provider and the service provider.  To do this we need the federation metadata from AWS SSO, which AWS has been lovely enough to also provide via a simple download link.  Grab a copy of that metadata; we’ll import it into Azure AD later.

Now you’ll need to upload the federation metadata you downloaded from Azure AD in the Identity provider metadata section.  This establishes the trust in AWS SSO for assertions created from Azure AD.  Click the Next: Review button and complete the process.

AWS SSO Identity Sources

Configure SAML trust

You’ll be asked to confirm changing the identity source.  There are a few key points I want to call out on the confirmation page.

  • AWS SSO will preserve your existing users and assignments -> If you have created AWS SSO users in the local directory and permission sets to go along with them, they will remain even after you switch the identity source, but those users will no longer be able to log in.
  • All existing MFA configurations will be deleted when customer switches from AWS SSO to IdP.  MFA policy controls will be managed on IdP -> Yes folks, you’ll now need to handle MFA.  Thankfully you’re using Azure AD, so you have plenty of options there.
  • All items about provisioning – You have the option to manually provision identities into AWS SSO or use the SCIM endpoint to automatically provision accounts.  I won’t be covering it here, but I tested manual provisioning and the single sign-on aspect worked flawlessly.  Know it’s an option if you opt to use another IdP that isn’t as fully featured as Azure AD.
Confirmation prompt

Confirmation prompt

Because I had to, I popped open the federation metadata to see what AWS requires in the way of claims in the SAML assertion.  In the screenshot below we see it is requesting a single claim with the nameid-format of emailAddress.  The value of this claim will be used to map the user to the relevant identity in AWS SSO.

AWS SSO Metadata
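If you’d rather inspect the metadata yourself than squint at a screenshot, a few lines of Python will pull the requested NameID format out of the document.  The sample below is a trimmed, hypothetical reconstruction of the AWS SSO service provider metadata (a real document also carries the AssertionConsumerService URL and certificate):

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample of AWS SSO service provider metadata.
sp_metadata = """<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata"
  entityID="https://us-east-1.signin.aws.amazon.com/platform/saml/example">
  <SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</NameIDFormat>
  </SPSSODescriptor>
</EntityDescriptor>"""

ns = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}
root = ET.fromstring(sp_metadata)

# The NameIDFormat element is how the service provider tells the IdP
# what to put in the subject of the assertion.
name_id_format = root.find("md:SPSSODescriptor/md:NameIDFormat", ns).text
print(name_id_format)
```

In Azure AD terms, this means the Unique User Identifier claim in the SAML configuration must resolve to the user’s email address.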

Back to the Azure Portal once again where you’ll want to hop back to Single sign-on blade of the application you registered.  Here you’ll click the Upload metadata file button and upload the AWS metadata.

Uploading AWS federation metadata

Uploading AWS federation metadata

After the upload is successful you’ll receive a confirmation screen.  You can simply hit the Save button here and move on.

Confirming SAML

Confirming SAML

At this stage you’ve now registered your Azure AD tenant as an identity provider to AWS SSO.  If you were using a non-Azure AD security token service, you could now manually provision your users into AWS SSO, create the necessary groups and permission sets, and administer away.

I’ll wrap up there and cover the SCIM provisioning in the next post.  To sum it up, in this post we configured AWS SSO in the AWS Organization and established the SAML federated trust between the Azure AD tenant and AWS SSO.

See you next post!

Deep Dive into Azure AD and AWS SSO Integration – Part 2

Welcome back folks.

Today I’ll be continuing my series on the new integration between Azure AD and AWS SSO.  In my last post I covered the challenges with the prior integration between the two platforms, core AWS concepts needed to understand the new integration, and how the new integration addresses the challenges of the prior integration.

In this post I’m going to give some more context to the challenges covered in the first post and then provide an overview of what the old and new patterns look like.  This will help clarify the value proposition of the integration for those of you who may still not be convinced.

The two challenges I want to focus on are:

  1. The AWS app was designed to synchronize identity data between AWS and Azure AD for a single AWS account
  2. The SAML trust between Azure AD and an AWS account had to be established separately for each AWS account.

Challenge 1 was unique to the Azure Marketplace AWS app because they were attempting to solve the identity lifecycle management problem.  Your security token service (STS) needs to pass a SAML assertion which includes the AWS IAM roles you are asserting for the user.  Those roles need to be mapped to the user somewhere for your STS to tap into them.  This is a problem you’re going to feel no matter what STS you use, so I give the team that put the AWS app together credit for trying.

The folks over at AWS came up with an elegant solution requiring some transformation in the claims passed in the SAML token and another solution to store the roles in commonly unused attributes in Active Directory.  However, both solutions suffered from the same problem in that you’re forced to work around that mapping, which becomes considerably difficult as you begin to scale to hundreds of AWS accounts.

Challenge 2 plagues all STSs because the SAML trust needs to be created for each and every AWS account.  Again, something that begins to get challenging as you scale.

AWS Past Integration

AWS Past Integration

In the image above, we see an example of how some enterprises addressed these problems.  We see that there is some STS in use acting as an identity provider (IdP) (could be Azure AD, Okta, Ping, AD FS, whatever) that has a SAML trust with each AWS account.  The user to AWS IAM role mappings are included in an attribute of the user’s Active Directory user account.  When the user attempts to access AWS, the STS queries Active Directory for the information.  There is a custom process (manual or automated) that queries each AWS account for a list of AWS IAM Roles that are associated with the IdP in the AWS account.  These roles are then populated in the attribute for each relevant user account.  Lastly, CloudFormation is used to push IAM Roles to each AWS account.  This could be done through a manual process or a CI/CD pipeline.
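To make the mapping problem concrete, the Role claim in the SAML assertion carries comma-separated pairs of an IAM role ARN and the IAM identity provider ARN in that account.  Here’s a small sketch of how those values break apart (the account IDs and names are made up for illustration):

```python
# Each value of the SAML Role attribute pairs an IAM role ARN with the
# IAM SAML provider ARN in the same account, separated by a comma.
def parse_role_claims(values):
    mappings = []
    for value in values:
        role_arn, provider_arn = value.split(",")
        # The account ID is the 5th colon-delimited field of an ARN.
        account_id = role_arn.split(":")[4]
        mappings.append(
            {"account": account_id, "role": role_arn, "provider": provider_arn}
        )
    return mappings

# Hypothetical claim values as an STS might emit them.
claims = [
    "arn:aws:iam::111111111111:role/Admins,arn:aws:iam::111111111111:saml-provider/ADFS",
    "arn:aws:iam::222222222222:role/ReadOnly,arn:aws:iam::222222222222:saml-provider/ADFS",
]
for m in parse_role_claims(claims):
    print(m["account"], m["role"])
```

Every one of those pairs has to be maintained somewhere, which is exactly the bookkeeping burden that grows painful at hundreds of accounts.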

Yeah this works, but who wants all that overhead?  Let’s look at the new method.

Azure AD and AWS SSO Integration

Azure AD and AWS SSO Integration

In the new integration where we use Azure AD and AWS SSO together, we now only need to establish a single SAML trust with AWS SSO.  Since AWS SSO is integrated with AWS Organizations it can be used as a centralized identity source for all AWS accounts within the organization.  Additionally, we can now leverage Azure AD to manage the synchronization of identity data (users and groups) from Azure AD to AWS SSO.  We then map our users or groups to permission sets (collections of IAM policies) in AWS SSO which are then provisioned as IAM roles in the relevant AWS accounts.  If we want to add a user to a role in AWS IAM, we can add that user to the relevant group in Azure AD and wait for the synchronization process to occur.  Once it’s complete, that user will have access to that IAM role in the relevant accounts.  A lot less work, right?

Let’s sum up what changes here:

  • We can use existing processes already in place to move users in and out of groups either on-premises in Windows AD (that is syncing to Azure AD with Azure AD Connect) or directly in Azure AD (if we’re not syncing from Windows AD).
  • Group to role mappings are now controlled in AWS SSO
  • Permission sets (or IAM policies for the IAM roles) are now centralized in AWS SSO
  • We no longer have to provision the IAM roles individually into each AWS account, we can centrally control it in AWS SSO

Cool right?

In my next few posts I’ll begin walking through the integration and demonstrating the solution.

Thanks!

Deep Dive into Azure AD and AWS SSO Integration – Part 1

Hello fellow geeks!

Back in 2017 I did a series of posts on how to integrate Azure AD using the AWS app available in the Azure Marketplace with AWS IAM in order to use Azure AD as an identity provider for an AWS account.  The series has remained quite popular over the past two years, largely because the integration has remained the same without much improvement.  All of this changed last week when AWS released support for integration between Azure AD and AWS SSO.

The past integration between the two platforms functioned, but suffered from three primary challenges:

  1. The AWS app was designed to synchronize identity data between AWS and Azure AD for a single AWS account
  2. The SAML trust between Azure AD and an AWS account had to be established separately for each AWS account.
  3. The application manifest file used by the AWS app to establish a mapping of roles between Azure AD and synchronized AWS IAM roles had a limit of 1,200 entries, which didn’t scale for organizations with a large AWS footprint.

To understand these challenges, I’m going to cover some very basic AWS concepts.

The most basic component of an AWS presence is an AWS account.  Like an Azure subscription, it represents a billing relationship, establishes limitations for services, and acts as an authorization boundary.  Where it differs from an Azure subscription is that each AWS account has a separate identity and authentication boundary.

While multiple Azure subscriptions can be associated with a single instance of Azure AD to centralize identity and authentication, the same is not true for AWS.  Each AWS account has its own instance of AWS IAM with its own security principals and no implicit trust with any other account.

Azure Subscription Identity vs AWS Account Identity

Azure Subscription Identity vs AWS Account Identity

Since there is no implicit trust between accounts, that trust needs to be manually established by the customer.  For example, if a customer wants to bring their own identities using SAML, they need to establish a SAML trust with each AWS account.

SAML Trusts with each AWS Account

SAML Trusts with each AWS Account

This is nice from a security perspective because you have a very clear security boundary that you can use effectively to manage blast radius.  This is paramount in the cloud from a security standpoint.  In fact, AWS best practice calls for separate accounts to mitigate risks to workloads of different risk profiles.  A common pattern to align with this best practice is demonstrated in the AWS Landing Zone documentation.  If you’re interested in a real life example of what happens when you don’t establish a good blast radius, I encourage you to read the cautionary tale of Code Spaces.

AWS Landing Zone

AWS Landing Zone

However, it doesn’t come without costs because each AWS IAM instance needs to be managed separately.  Prior to the introduction of AWS SSO (which we’ll cover later), you as the customer would be on the hook for orchestrating the provisioning of security principals (IAM Users, groups, roles, and identity providers) in every account.  Definitely doable, but organizations skilled at identity management are few and far between.

Now that you understand the importance of having multiple AWS accounts and that each AWS account has a separate instance of AWS IAM, we can circle back to the challenges of the past integration.  The AWS App available in the Azure Marketplace has a few significant gaps.

The app is designed to simplify the integration with AWS by providing the typical “wizard” type experience Microsoft so loves to provide.  Plug in a few pieces of information and the SAML trust between Azure AD and your AWS account is established on the Azure AD end to support an identity provider initiated SAML flow.  This process is explained in detail in my past blog series.

In addition to easing the SAML integration, it also provides a feature to synchronize AWS IAM roles from an AWS account to the application manifest file used by the AWS app.  The challenges here are two-fold: one is the application manifest file has a relatively small limit of entries; the other is the synchronization process only supports a single AWS account.  These two gaps make it unusable by most enterprises.

Azure AWS Application Sync Process

Azure Marketplace AWS Application Sync Process

Both Microsoft and AWS have put out workarounds to address the gaps.  However, the workarounds require the customer to either develop or run custom code and additional processes, and neither addresses the limitation of the application manifest.  This led many organizations to stick with their on-premises security token service (AD FS, Ping, etc.) or to go with another 3rd party IDaaS (Okta, Centrify, etc.).  This caused them to miss out on the advanced features of Azure AD, some of which they were more than likely already paying for via the use of Office 365.  These features include adaptive authentication, contextual authorization, and modern multi-factor authentication.

AWS recognized the challenge organizations were having managing AWS accounts at scale and began introducing services to help enterprises manage the ever growing AWS footprint.  The first service was AWS Organizations.  This service allowed enterprises to centralize some management operations, consolidate billing, and group accounts together for billing or security and compliance.  For those of you from the Azure world, the concept is similar to the benefits of using Azure Management Groups and Azure Policy.  This was a great start, but the platform still lacked a native solution for centralized identity management.

AWS Organization

AWS Organization

At the end of 2017, AWS SSO was introduced.  Through integration with AWS Organizations, AWS SSO has the ability to enumerate all of the AWS accounts associated with an Organization and act as a centralized identity, authentication, and authorization plane.

While the product had potential, at the time of its release it only supported scenarios where users and groups were created directly in the AWS SSO directory or were sourced from an AWS Managed AD or customer-managed AD using the AD Connector.  It lacked support for acting as a SAML service provider to a third-party identity provider.  Since the service lacked the features of most major on-premises security token services and IDaaS providers, many organizations kept to the standard pattern of managing identity across their AWS accounts using their own solutions and processes.

Fast forward to last week and AWS announced two new features for AWS SSO.  The first feature is that it can now act as a SAML service provider to Azure AD (YAY!).  By federating directly with AWS SSO, there is no longer a requirement to federate Azure AD with each individual AWS account.

The second feature got me really excited and that was support for the System for Cross-domain Identity Management (SCIM) specification through the addition of SCIM endpoints.  If you’re unfamiliar with SCIM, it addresses a significant gap in IAM in the cloud world, and that is identity management.  If you’ve ever integrated with any type of cloud service, you are more than likely aware of the pains of having to upload CSVs or install custom vendor connectors in order to provision security principals into a cloud identity store.  SCIM seeks to solve that problem by providing a specification for a REST API that allows for management of the lifecycle of security principals.

Support for this feature, along with Azure AD’s longtime support for SCIM, allows Azure AD to handle the identity lifecycle management of the shadow identities in AWS SSO which represent Azure AD Users and Groups.  This is an absolutely awesome feature of Azure AD and I’m thrilled to see that AWS is taking advantage of it.
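To give you a feel for the specification, a SCIM provisioning request is just JSON over REST.  Below is a hedged sketch of the kind of payload an identity provider POSTs to a SCIM /Users endpoint, following the RFC 7643 core user schema; the user details are hypothetical and a real Azure AD request may carry additional attributes.

```python
import json

# Minimal SCIM 2.0 user payload per the RFC 7643 core schema.
# The user here is a made-up example from the jogcloud.com lab domain.
scim_user = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "jane.doe@jogcloud.com",
    "name": {"givenName": "Jane", "familyName": "Doe"},
    "emails": [{"value": "jane.doe@jogcloud.com", "primary": True}],
    "active": True,
}

# Serialized, this is the body of a POST to the provider's /Users endpoint.
body = json.dumps(scim_user)
print(body)
```

Deactivating a user is just as simple: the IdP flips `active` to `False` with a PATCH, which is what lets Azure AD drive the full lifecycle of the shadow identities in AWS SSO.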

Well folks, that will close out this entry in the series.  Over the next few posts I’ll walk through the integration and look behind the curtains a bit with my go-to tool Fiddler.

See you next post!


DNS in Microsoft Azure Part 3 – Azure Private DNS Resolver

Updates:

7/2025 – Updated post with DNS Security Policy support for DNS query logging. Tagged scenario 4 as deprecated. Corrected image and description of flow in 3b

This is part of my series on DNS in Microsoft Azure.

Today I’ll be continuing my series on DNS in Microsoft Azure.  In my first post I covered fundamental concepts of DNS resolution in Azure such as the 168.63.129.16 virtual IP and Azure-provided DNS.  In the second post I went over the Azure Private DNS service and its benefits over the default virtual network namespaces. In this post I’m going to cover the Azure Private DNS Resolver.

A majority of organizations have an existing on-premises IT footprint. During the move into Azure, these organizations need to support communication between users and applications on-premises or in other clouds with services deployed into Azure. An important piece of this communication includes name resolution of DNS namespaces hosted on-premises and DNS namespaces hosted in Azure Private DNS Zones. This is where the Azure Private DNS Resolver comes in.

The Azure Private DNS Resolver was introduced into general availability in October 2022. It was developed to address two gaps of the Azure-provided DNS service. As I’ve covered in my prior posts, the Azure-provided DNS service does not support resolution of DNS namespaces hosted in on-premises or other cloud DNS services when the DNS query is sourced from Azure. Neither does it support resolution of DNS namespaces hosted in Azure Private DNS Zones from machines on-premises or in another cloud without the customer implementing a 3rd-party DNS proxy. The Azure Private DNS Resolver fills both gaps without the customer having to implement a 3rd-party DNS proxy.

The Resolver (Azure Private DNS Resolver) consists of three different components which include inbound endpoints, outbound endpoints, and forwarding rule sets. Inbound endpoints provide a routable IP address that services running on-premises, in another cloud, or even in Azure can communicate with to resolve DNS namespaces hosted in Azure Private DNS Zones. Outbound endpoints provide a network egress point for DNS traffic to external DNS services running on-premises or within Azure. Forwarding rulesets are groups of DNS forwarding rules (conditional forwarders) that give direction to DNS traffic leaving a virtual network through the 168.63.129.16 virtual IP.

Let’s take a look at a few different scenarios and how this all works together.

Scenario 1 – On-premises machine needs to resolve an Azure Virtual Machine IP address where the DNS namespace is hosted in an Azure Private DNS Zone.

In this scenario an Azure Private DNS Resolver instance has been deployed to a shared services virtual network. An Azure Private DNS Zone named mydomain.com has been linked to the virtual network. Connectivity to on-premises has been implemented using either an ExpressRoute or VPN connection. The on-premises DNS service has been configured with a conditional forwarder for mydomain.com with queries being sent to the inbound endpoint IP address at 10.1.0.4.

Let’s look at the steps that are taken for an on-premises machine to resolve the IP address of vm1.mydomain.com.

  1. The on-premises machine creates a DNS query for vm1.mydomain.com after validating it does not have a cached entry. The machine has been configured to use the on-premises DNS server at 192.168.0.10 as its DNS server. The DNS query is passed to the on-premises DNS server.
  2. The on-premises DNS server receives the query, validates it does not have a cached entry and that it is not authoritative for the mydomain.com namespace. It determines it has a conditional forwarder for mydomain.com pointing to 10.1.0.4 which is the IP address of the inbound endpoint for the Azure Private DNS Resolver running in Azure. The query is recursively passed on to the inbound endpoint over the ExpressRoute or Site-to-Site VPN connection.
  3. The inbound endpoint receives the query and recursively passes it on to the Azure-provided DNS service through the 168.63.129.16 virtual IP.
  4. The Azure-provided DNS service determines it has an Azure Private DNS Zone linked to the shared services virtual network for mydomain.com and resolves the hostname to its IP address of 10.0.0.4.

Scenario 2 – Azure virtual machine needs to resolve an on-premises service IP address where the DNS namespace is hosted on an on-premises DNS server

There are two approaches to using the Resolver as a DNS resolution service for Azure services: a centralized architecture and a distributed architecture. My peer Adam Stuart has done a wonderful analysis of the benefits and considerations of these two patterns. You will almost always use the centralized architecture. The exceptions will be for workloads that have a high number of DNS queries such as VDI. Both the inbound and outbound endpoints have a limit to the queries per second (QPS) they support, so by using a decentralized architecture for resolution of on-premises namespaces you can mitigate the risk of hitting the QPS limit on the inbound endpoint. I suggest reading Adam’s post; it has some great details.

Scenario 2a – Centralized architecture for connected virtual networks

Let me first cover the centralized architecture because it’s the more common architecture and will work for most use cases.

Centralized architecture for resolution of on-premises DNS namespaces

In the centralized architecture all virtual networks in the environment have network connectivity to a shared services virtual network through direct or indirect (such as Azure Virtual WAN or a traditional hub and spoke) virtual network peering. The Resolver and its endpoints are deployed to the shared services virtual network, a DNS Forwarding Rule Set is linked to the shared services virtual network, and it has connectivity back on-premises through an ExpressRoute or VPN connection. This architecture centralizes all DNS queries across your Azure environment, pushing them through the inbound endpoint (beware of those QPS limits!).

In the example scenario above, a rule has been configured in the DNS Forwarding Rule Set to forward traffic destined for the onpremises.com domain to the on-premises DNS service at 192.168.0.10. The on-premises DNS service is authoritative for the onpremises.com domain.

Let’s look at the query path for this scenario, where VM1 in Azure is trying to resolve the IP address of an application running on-premises named service.onpremises.com, where the namespace onpremises.com is hosted on an on-premises DNS server.

  1. VM1 creates a DNS query for service.onpremises.com. VM1 does not have a cached entry for it so the query is passed on to the DNS server configured for the VM’s virtual network interface (VNIC). The DNS server has been configured by the Azure DHCP service to the Resolver inbound endpoint IP address of 10.1.0.4 via the DNS Server settings of the virtual network. The query is passed on to the inbound endpoint.
  2. The inbound endpoint receives the query and recursively passes it on to the Azure-provided DNS service through the 168.63.129.16 virtual IP. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone named onpremises.com linked to the virtual network the Resolver endpoints are in. Since there is not, the DNS Forwarding Rule Set linked to the virtual network is processed and the rule for onpremises.com is matched and triggered, causing the recursive DNS query to be sent out of the outbound endpoint and over the ExpressRoute or VPN connection to the on-premises DNS server at 192.168.0.10.
  3. The on-premises DNS server resolves the hostname to the IP address and returns the result.
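The forwarding decision in step 2 boils down to suffix matching the query name against the rules in the linked ruleset. Here’s a simplified simulation of that logic, using the lab addresses from the scenario above (this is a conceptual sketch, not how the service is actually implemented):

```python
# Simulate how a DNS Forwarding Ruleset rule is selected: match the query
# name against rule domains by suffix, with "." acting as a catch-all.
def match_rule(query, rules):
    name = query.rstrip(".")
    best = None
    for domain, target in rules.items():
        if domain == "." or name == domain or name.endswith("." + domain):
            # Prefer the most specific (longest) matching domain.
            if best is None or len(domain) > len(best[0]):
                best = (domain, target)
    return best[1] if best else None

# Ruleset from the scenario: forward onpremises.com to the on-prem DNS server.
rules = {"onpremises.com": "192.168.0.10"}

print(match_rule("service.onpremises.com", rules))  # 192.168.0.10 (forwarded)
print(match_rule("vm1.mydomain.com", rules))        # None (resolved in Azure)
```

A `None` result here corresponds to the query staying with the Azure-provided DNS service, which is exactly what happens for names hosted in linked Azure Private DNS Zones.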

Scenario 2b – Distributed architecture for connected virtual networks

Let’s now cover the distributed architecture, which as Adam notes in his blog may be a pattern required if you’re hitting the QPS limits on the inbound endpoint.

Distributed architecture for resolution of on-premises DNS namespaces

In this distributed architecture all virtual networks in the environment have network connectivity to a shared services virtual network through direct or indirect (such as Azure Virtual WAN or a traditional hub and spoke) virtual network peering. The Resolver endpoints are deployed to the shared services virtual network which has connectivity back on-premises through an ExpressRoute or VPN connection. The workload virtual network DNS Server settings are set to the 168.63.129.16 virtual IP to use Azure-provided DNS. DNS Forwarding Rulesets are linked directly to the workload virtual networks and are configured with the necessary rules to direct DNS queries to the on-premises destinations.

In the above example, there is one rule in the DNS Forwarding Ruleset which is configured to forward DNS queries for onpremises.com to the DNS Server on-premises at 192.168.0.10.

Let’s look at the query path for this scenario, where VM1 in Azure is trying to resolve the IP address of an application running on-premises named service.onpremises.com, where the namespace onpremises.com is hosted on an on-premises DNS server.

  1. VM1 creates a DNS query for service.onpremises.com. VM1 does not have a cached entry for it so the query is passed on to the DNS server configured for the VM’s virtual network interface (VNIC). The DNS server has been configured by the Azure DHCP service to the 168.63.129.16 virtual IP, which is the default configuration for virtual network DNS Server settings, and the query is passed on to the Azure-provided DNS service.
  2. The recursive query is received by the Azure-provided DNS service and the rule for onpremises.com in the linked DNS Forwarding Ruleset is triggered passing the recursive query out of the outbound endpoint to the on-premises DNS server at 192.168.0.10. The recursive query is passed on over the ExpressRoute or Site-to-site VPN connection to the on-premises DNS server.
  3. The on-premises DNS server resolves the hostname to the IP address and returns the result.

Scenario 2c – Distributed architecture for isolated virtual networks

There is another pattern for the use of the distributed architecture which could be used for isolated virtual networks. Say for example you have a workload that needs to exist in an isolated virtual network due to a compliance requirement, but you still have a requirement to centrally manage DNS and log all queries.

Distributed architecture for isolated virtual networks for resolution of on-premises DNS namespaces

In this variation of the distributed architecture the virtual network does not have any connectivity to the shared services virtual network through direct or indirect peering. The DNS Forwarding Ruleset is linked to the isolated virtual network and it contains a single rule for “.” which tells the Azure-provided DNS service to forward all DNS queries to the configured IP address.

Let’s look at the resolution of a virtual machine in the isolated virtual network trying to resolve the IP address of a publicly-facing API.

  1. VM1 creates a DNS query for my-public-api.com. VM1 does not have a cached entry for my-public-api.com, so the query is passed on to the DNS Server configured for the VM's virtual network interface (VNIC). The DNS Server has been set by the Azure DHCP Service to the 168.63.129.16 virtual IP, which is the default configuration for virtual network DNS Server settings and which passes the query on to the Azure-provided DNS service.
  2. The recursive query is received by the Azure-provided DNS service and the rule for “.” in the linked DNS Forwarding Ruleset is triggered passing the recursive query out of the outbound endpoint to the on-premises DNS server at 192.168.0.10. The recursive query is passed on over the ExpressRoute or Site-to-site VPN connection to the on-premises DNS server.
  3. The on-premises DNS server checks its own cache, validates it’s not authoritative for the zone, and then recursively passes the query to its standard forwarder.
  4. The public DNS service resolves the hostname to an IP address.

One thing to note about this architecture is that some reserved Microsoft namespaces are excluded from the wildcard rule. This means those zones will still be resolved directly by the Azure-provided DNS service in this configuration.
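The wildcard behavior can be modeled as forwarding everything except queries matching a reserved suffix. A quick sketch, with the caveat that the reserved list below is a sample I chose for the example, not the authoritative set Azure uses:

```python
# Illustrative model of the "." wildcard rule: everything is forwarded
# on-premises except reserved Microsoft namespaces, which the
# Azure-provided DNS service resolves itself.
# RESERVED_SUFFIXES is a sample list for illustration, NOT the authoritative set.
RESERVED_SUFFIXES = ["internal.cloudapp.net", "core.windows.net"]

def is_forwarded_by_wildcard(query_name):
    query = query_name.rstrip(".").lower()
    for suffix in RESERVED_SUFFIXES:
        if query == suffix or query.endswith("." + suffix):
            return False  # resolved directly by the Azure-provided DNS service
    return True  # matched by the "." rule and forwarded to the on-premises target

print(is_forwarded_by_wildcard("my-public-api.com"))          # True
print(is_forwarded_by_wildcard("vm1.internal.cloudapp.net"))  # False
```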

Scenario 3 – Azure virtual machine needs to resolve a record in an Azure Private DNS Zone.

I’ll now cover DNS resolution of namespaces hosted in Azure Private DNS Zones by compute services running in Azure. There are a number of ways to do this, so I’ll walk through a few of them.

Scenario 3a – Distributed architecture where Azure Private DNS Zones are directly linked to each virtual network

One architecture a lot of customers attempt for Azure to Azure resolution is an architecture where the Azure Private DNS Zones are linked directly to each virtual network instead of a common centralized virtual network.

Distributed architecture for resolution of Azure Private DNS Zones for Azure compute with direct links

In this architecture each virtual network is configured to use the Azure-provided DNS service by setting the virtual network's DNS Server setting to the 168.63.129.16 virtual IP. The Azure Private DNS Zones are individually linked to each virtual network. Before I get into the reasons why I don't like this pattern, let me walk through how a query is resolved.

  1. VM2 creates a DNS query for vm1.mydomain.prod.com. VM2 does not have a cached entry for vm1.mydomain.prod.com, so the query is passed on to the DNS Server configured for the VM's virtual network interface (VNIC). The DNS Server has been set by the Azure DHCP Service to the 168.63.129.16 virtual IP, which is the default configuration for virtual network DNS Server settings and which passes the query on to the Azure-provided DNS service.
  2. The recursive query is received by the Azure-provided DNS service and the service determines there is a linked Azure Private DNS Zone for the namespace mydomain.prod.com. The service resolves the hostname and returns the IP.

Alright, why don’t I like this architecture? Well, for multiple reasons:

  1. There is a limit to the number of virtual networks an Azure Private DNS Zone can be linked to. As I described in my last post, Azure Private DNS Zones should be treated as global resources and you should be using one zone per namespace and linking that zone to virtual networks in multiple regions. If you begin expanding into multiple Azure regions you could run into the limit.
  2. DNS resolution in this model becomes confusing to troubleshoot because each virtual network carries its own set of zone links, so the answer a VM receives depends on which links happen to exist for its specific virtual network.

So yeah, be aware of those considerations if you end up going this route.

Scenario 3b – Distributed architecture where Azure Private DNS Zones are linked to a central DNS resolution virtual network

This is another alternative for the distributed architecture that can be used if you want to centralize queries to address the considerations of the prior pattern (note that query logging isn’t addressed in the visual below unless you insert a customer-managed DNS service or Azure Firewall instance as I discuss later in this post).

Distributed architecture for resolution of Azure Private DNS Zones for Azure compute with central resolution

This architecture is somewhat of a combination of a distributed and centralized architecture. All virtual networks in the environment have network connectivity to a shared services virtual network through direct or indirect (such as Azure Virtual WAN or a traditional hub and spoke) virtual network peering. The Resolver and its endpoints are deployed to the shared services virtual network. All Azure Private DNS Zones are linked to the shared services virtual network. A DNS Forwarding Ruleset is linked to each workload virtual network with a single rule forwarding all DNS traffic (except the reserved namespaces covered earlier) to the Resolver inbound endpoint.

Let me walk through a scenario with this architecture where VM1 wants to resolve the IP address for the hostname vm2.mydomain.nonprod.com.

  1. VM1 creates a DNS query for vm2.mydomain.nonprod.com. VM1 does not have a cached entry for vm2.mydomain.nonprod.com, so the query is passed on to the DNS Server configured for the VM's virtual network interface (VNIC). The DNS Server has been set by the Azure DHCP Service to the 168.63.129.16 virtual IP, which is the default configuration for virtual network DNS Server settings and which passes the query on to the Azure-provided DNS service.
  2. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone for the mydomain.nonprod.com domain linked to the workload virtual network and validates there is none. The service then checks the DNS Forwarding Ruleset linked to the virtual network and finds a rule matching the domain, which points queries to the inbound endpoint IP address. The query is passed through the Resolver outbound endpoint to the inbound endpoint, which passes the query to the 168.63.129.16 virtual IP and on to the Azure-provided DNS service.

  3. The Azure-provided DNS service checks the shared services virtual network and determines there is a linked Azure Private DNS Zone with the name mydomain.nonprod.com. The service resolves the hostname to the IP address and returns the results.
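The precedence described above — the Azure-provided DNS service consults linked Azure Private DNS Zones before evaluating the DNS Forwarding Ruleset — can be sketched as follows. This is a simplified single-evaluation model; the zone contents and rule target are invented for illustration.

```python
# Hypothetical sketch of the resolution precedence within one virtual
# network's evaluation: a linked Azure Private DNS Zone wins over the
# DNS Forwarding Ruleset, which wins over plain Azure-provided recursion.
# Zone data and rule targets are invented for illustration.

def resolve(query_name, linked_zones, ruleset):
    query = query_name.rstrip(".").lower()
    # 1. A linked Azure Private DNS Zone matching the namespace wins.
    for zone, records in linked_zones.items():
        if query == zone or query.endswith("." + zone):
            return ("private-zone", records.get(query))
    # 2. Otherwise the DNS Forwarding Ruleset is evaluated.
    for domain, target_ip in ruleset.items():
        suffix = domain.rstrip(".")
        if query == suffix or query.endswith("." + suffix):
            return ("ruleset", target_ip)
    # 3. Fall back to ordinary Azure-provided DNS recursion.
    return ("azure-dns", None)

zones = {"mydomain.nonprod.com": {"vm2.mydomain.nonprod.com": "10.2.0.5"}}
rules = {"onpremises.com.": "192.168.0.10"}
print(resolve("vm2.mydomain.nonprod.com", zones, rules))  # ('private-zone', '10.2.0.5')
print(resolve("service.onpremises.com", zones, rules))    # ('ruleset', '192.168.0.10')
```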

Scenario 3c – Centralized architecture where Azure Private DNS Zones are linked to a central DNS resolution virtual network

This is the more common of the Azure-to-Azure resolution architectures that I come across. Here queries are sent directly to the inbound resolver IP address via the direct or transitive connectivity to the shared services virtual network.

Centralized architecture for resolution of Azure Private DNS Zones

This is a centralized architecture where all virtual networks in the environment have network connectivity to a shared services virtual network through direct or indirect (such as Azure Virtual WAN or a traditional hub and spoke) virtual network peering. The Resolver and its endpoints are deployed to the shared services virtual network. All Azure Private DNS Zones are linked to the shared services virtual network. Each workload virtual network has its DNS Server settings configured to point to the resolver's inbound endpoint.

Let me walk through the resolution.

  1. VM1 creates a DNS query for vm2.mydomain.nonprod.com. VM1 does not have a cached entry for vm2.mydomain.nonprod.com, so the query is passed on to the DNS Server configured for the VM's virtual network interface (VNIC). The DNS Server has been set by the Azure DHCP Service to the resolver's inbound endpoint at 10.0.0.4. The query is routed over the virtual network peering to the resolver's inbound endpoint.
  2. The inbound endpoint passes the query to the 168.63.129.16 virtual IP and on to the Azure-provided DNS service. The service determines that an Azure Private DNS Zone named mydomain.nonprod.com is linked to the shared services virtual network. The service then resolves the hostname to the IP address and returns the results.

I’m a fan of this pattern because it’s very simple and easy to understand, which is exactly what DNS should be.
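Part of what keeps this pattern simple is that no DNS Forwarding Ruleset is involved for Azure-to-Azure resolution: the workload VM's configured DNS server is simply the inbound endpoint, which hands the query straight to the Azure-provided DNS service and its linked zones. A tiny sketch of that hop sequence (zone contents invented for illustration):

```python
# Sketch of the centralized pattern's hop sequence: workload VM -> resolver
# inbound endpoint -> Azure-provided DNS -> linked Azure Private DNS Zone.
# Zone contents are invented for illustration.

def azure_provided_dns(query_name, linked_zones):
    # Evaluate the Azure Private DNS Zones linked to the shared services
    # virtual network.
    for zone, records in linked_zones.items():
        if query_name == zone or query_name.endswith("." + zone):
            return records.get(query_name)
    return None  # would fall back to public recursion in the real service

def inbound_endpoint(query_name, linked_zones):
    # The inbound endpoint hands the query to the Azure-provided DNS
    # service via the 168.63.129.16 virtual IP; no ruleset is consulted.
    return azure_provided_dns(query_name, linked_zones)

shared_services_zones = {"mydomain.nonprod.com": {"vm2.mydomain.nonprod.com": "10.2.0.5"}}
print(inbound_endpoint("vm2.mydomain.nonprod.com", shared_services_zones))  # 10.2.0.5
```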

Scenario 4 – Centralized architecture that supports adding DNS query logging (Deprecated)

Now that you have an understanding of the benefits of the Azure Private DNS Resolver, let’s talk about some of the gaps. Prior to July 2025 you couldn’t achieve DNS query logging when using Azure Private DNS Resolver and Azure-provided DNS without the use of an additional server in the middle.

Centralized architecture for supporting DNS query logging

The architecture below was a common way customers addressed the gap when they had plans to eventually move fully into the Private Resolver pattern. DNS Security Policy was introduced in July of 2025 and closed this gap, making this architecture unnecessary if the additional DNS server was solely providing DNS query logging. In this architecture all Azure Private DNS Zones and DNS Forwarding Rulesets were linked to the shared services virtual network and a customer-managed DNS service was also deployed there. All workload virtual networks were connected to the shared services virtual network through direct or indirect (Azure Virtual WAN or traditional hub and spoke) virtual network peering. Each workload virtual network had its DNS Server settings configured to use the customer-managed DNS service IP address.

Let me walk through a scenario where VM1 wants to resolve the IP address for a record in an on-premises DNS namespace.

  1. VM1 creates a DNS query for service.onpremises.com. VM1 does not have a cached entry for service.onpremises.com, so the query is passed on to the DNS Server configured for the VM's virtual network interface (VNIC). The DNS Server has been set by the Azure DHCP Service to the IP address of the customer-managed DNS service at 10.1.2.4. The query is passed over the virtual network peering to the customer-managed DNS service.
  2. The customer-managed DNS service checks its local cache, validates it’s not authoritative for the zone, and passes the query on to its standard forwarder which has been configured to the resolver’s inbound endpoint at 10.1.0.4.
  3. The inbound endpoint passes the query to the 168.63.129.16 virtual IP and on to the Azure-provided DNS service. The Azure-provided DNS service checks the Azure Private DNS Zones linked to the shared services virtual network and determines that no zone for the onpremises.com namespace is linked to it. The DNS Forwarding Ruleset linked to the virtual network is then processed and the matching rule is triggered, passing the query out of the outbound endpoint, over the ExpressRoute or VPN connection, and on to the on-premises DNS service.
  4. The on-premises DNS service checks its local cache and doesn't find a cached entry. It then checks whether it is authoritative for the zone, which it is, resolves the hostname to an IP address, and returns the results.
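The only role the extra hop played in this deprecated pattern was capturing a log of every query before forwarding it on. A minimal sketch of that idea, with the resolver chain collapsed into a single stub and all addresses invented for illustration:

```python
# Minimal sketch of the deprecated pattern's extra hop: the customer-managed
# DNS service sits in the query path solely to log each query before
# forwarding it to its standard forwarder (the resolver's inbound endpoint).

query_log = []

def customer_managed_dns(query_name, standard_forwarder):
    query_log.append(query_name)           # the query logging this hop provided
    return standard_forwarder(query_name)  # then forward like any DNS forwarder

def inbound_endpoint_stub(query_name):
    # Stand-in for the real chain: resolver inbound endpoint ->
    # Azure-provided DNS -> forwarding ruleset -> on-premises DNS.
    return "10.5.0.20"  # invented answer for illustration

ip = customer_managed_dns("service.onpremises.com", inbound_endpoint_stub)
print(ip)         # 10.5.0.20
print(query_log)  # ['service.onpremises.com']
```

With DNS Security Policy providing query logging natively, this whole hop can be removed and clients can point directly at the inbound endpoint.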

An alternative to using a customer-managed DNS service was using the Azure Firewall DNS proxy service using the pattern documented here.

The primary reason I didn't remove this architecture completely in my July 2025 update is that you'll still see it in the wild for customers that may not have fully transitioned to DNS Security Policy. Additionally, there may be use cases for using a third-party DNS server to supplement gaps in Azure Private DNS Resolver, such as acting as a DNS cache or providing advanced DNS features like virtualized DNS zones.

Summing it up

So yeah, there are a lot of patterns and each one has its own benefits and considerations. My recommendation is for customers to centralize DNS wherever possible because it makes for a fairly simple integration, unless you have concerns over hitting queries-per-second (QPS) limits. If you have an edge use case such as an isolated virtual network, consider the patterns referenced above.

It’s critically important to understand how DNS resolution works from a processing perspective when you have linked Azure Private DNS Zones and DNS Forwarding Rule Sets. The detail is here.

Alexis Plantin put together a great write-up with fancy animated diagrams that put my diagrams to shame. Definitely take a read through his write-up, if only to give him some traffic for creating the animated diagrams. I'm jealous!

There is some good guidance here which talks about considerations for forwarding timeouts when using a third-party DNS server that is forwarding queries to the Azure Private DNS Resolver or to Azure-provided DNS.

Lastly, let me end this with some benefits and considerations of the product.

  • Benefits
    • No infrastructure to manage. Microsoft is responsible for management of the compute powering the service.
    • Unlike Azure-provided DNS alone, it supports conditional forwarding to on-premises.
    • Unlike Azure-provided DNS alone, it supports resolution from on-premises to Azure without the need for a DNS proxy.
    • Supports multiple patterns for its implementation including centralized and decentralized architectures, even supporting isolated virtual networks.
    • DNS query logging can be achieved using DNS Security Policy as of 7/2025.
  • Considerations
    • The Private DNS Resolver MAY NOT support requests from non-RFC 1918 IP addresses to the inbound endpoint. This was a documented limitation that has since been removed from the documentation; however, customers of mine still report it does not work. If you have this use case, your best bet is to try it and open a support ticket if you have issues.
    • The Private DNS Resolver DOES NOT support iterative DNS queries. It only supports recursive DNS queries.
    • In high volume DNS environments, such as very large VDI deployments, query per second limits could be an issue.
    • The Private DNS Resolver does not support authenticated dynamic DNS updates. If you have this use case for a VDI deployment, you will need to use a DNS service that does support it.