DNS in Microsoft Azure Part 5 – PrivateLink Endpoints and Private DNS

DNS in Microsoft Azure Part 5 – PrivateLink Endpoints and Private DNS

Updates:

7/2025 – Updated to remove mentions of DNS query logging limitations with Private DNS Resolver due to introduction of DNS Security Policies

This is part of my series on DNS in Microsoft Azure.

Hello again!

In this post I’ll be continuing my series on Azure Private Link and DNS with my 5th entry into the DNS series.  In my last post I gave some background into Private Link, how it came to be, and what it offers.  For this post I’ll be covering how the DNS integration works for Azure native PaaS services behind Private Link Private Endpoints.

Before we get into the details of how it all works let’s first look at the components that make up an Azure Private Endpoint created for an Azure native service that is integrated with Azure Private DNS. These components include (going from left to right):

  1. Virtual Network Interface – The virtual network interface (VNI) is deployed into the customer’s virtual network and reserves a private IP address that is used as the path to the Private Endpoint.
  2. Private Endpoint – The Azure resource that represents the private connectivity to the resource establishes the relationships to the other resources.
  3. Azure PaaS Service Instance – This could be a customer’s instance of an Azure SQL Server, blob endpoint for a storage account, and any other Microsoft PaaS that supports private endpoints. The key thing to understand is the Private Endpoint facilitates connectivity to a single instance of the service.
  4. Private DNS Zone Group – The Private DNS Zone Group resource establishes a relationship between the Private Endpoint and an Azure Private DNS Zone automating the lifecycle of the A record(s) registered within the zone. You may not be familiar with this resource if you’ve only used the Azure Portal.
  5. Azure Private DNS Zone – Each Azure PaaS service has a specific namespace or namespaces it uses for Private Endpoints.
Azure Private Endpoint and DNS integration components

An example of the components involved with a Private Endpoint for the blob endpoint for an Azure Storage Account would be similar to what pictured below.

Example of components for blob endpoint of Azure Storage Account

I’ll now walk through some scenarios to understand how these components work together.

Scenario 1 – Default DNS Pattern Without Private Link Endpoint with a single virtual network

DNS resolution without a Private Endpoint

In this example an Azure virtual machine needs to resolve the name of an Azure SQL instance named db1.database.windows.net. No Private Endpoint has been configured for the Azure SQL instance and the VNet is configured to use the 168.63.129.16 virtual IP and Azure-provided DNS. 

The query resolution is as follows:

  1. VM1 creates a DNS query for db1.database.windows.net. VM1 does not have a cached entry for it so the query is passed on to the DNS Server configured for the operating system. The virtual network DNS Server settings has be set to to the default of the virtual IP of 168.63.129.16 and pushed to the VNI by the Azure DHCP Service . The recursive query is sent to the virtual IP and passed on to the Azure-provided DNS service.
  2. The Azure-provided DNS services checks to see if there is an Azure Private DNS Zone named database.windows.net linked to the virtual network. Once it validates it does not, the recursive query is resolved against the public DNS namespace and the public IP 55.55.55.55 of the Azure SQL instance is returned.

Scenario 2 – DNS Pattern with Private Link Endpoint with a single virtual network

DNS Resolution with a Private Endpoint

In this example an Azure virtual machine needs to resolve the name of an Azure SQL instance named db1.database.windows.net. A Private Endpoint has been configured for the Azure SQL instance and the VNet is configured to use the 168.63.129.16 virtual IP which will use Azure-provided DNS. An Azure Private DNS Zone named privatelink.database.windows.net has been created and linked to the machine’s virtual network. Notice that a new CNAME has been created in public DNS named db1.privatelink.database.windows.net.

The query resolution is as follows:

  1. VM1 creates a DNS query for db1.database.windows.net. VM1 does not have a cached entry for it so the query is passed on to the DNS Server configured for the operating system. The virtual network DNS Server settings has be set to to the default of the virtual IP of 168.63.129.16 and pushed to the VNI by the Azure DHCP Service . The recursive query is sent to the virtual IP and passed on to the Azure-provided DNS service.
  2. The Azure-provided DNS services checks to see if there is an Azure Private DNS Zone named database.windows.net linked to the virtual network. Once it validates it does not, the recursive query is resolved against the public DNS namespace. During resolution the CNAME of privatelink.database.windows.net is returned. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone named privatelink.database.windows.net linked to the virtual network and determines there is. The query is resolved to the private IP address of 10.0.2.4 of the Private Endpoint.

Scenario 2 Key Takeaway

The key takeaway from this scenario is the Azure-provided DNS service is able to resolve the query to the private IP address because the virtual network zone link is established between the virtual network and the Azure Private DNS Zone. The virtual network link MUST be created between the Azure Private DNS Zone and the virtual network where the query is passed to the 168.63.129.16 virtual IP. If that link does not exist, or the query hits the Azure-provided DNS service through another virtual network, the query will resolve to the public IP of the Azure PaaS instance.

Great, you understand the basics. Let’s apply that knowledge to enterprise scenarios.

Scenario 3 – Azure-to-Azure resolution of Azure Private Endpoints

First up I will cover resolution of Private Endpoints within Azure when it is one Azure service talking to another in a typical enterprise Azure environment with a centralized DNS service.

Scenario 3a- Azure-to-Azure resolution of Azure Private Endpoints with a customer-managed DNS service

Azure resolution of Azure Private Endpoints using customer-managed DNS service

First I will cover how to handle this resolution using a customer-managed DNS service running in Azure. Customers may choose to do this over the Private DNS Resolver pattern because they have an existing 3rd-party DNS service (InfoBlox, BlueCat, etc) they already have experience on.

In this scenario the Azure environment has a traditional hub and spoke where there is a transit network such as a VWAN Hub or a traditional virtual network with some type of network virtual appliance handling transitive routing. The customer-managed DNS service is deployed to a virtual network peered with the transit network. The customer-managed DNS service virtual network has a virtual network link to the Private DNS Zone for privatelink.database.windows.net namespace. An Azure SQL instance named db1.database.windows.net has been deployed with a Private Endpoint in a spoke virtual network. An Azure VM has been deployed to another spoke virtual network and the DNS server settings of the virtual network has been configured with the IP address of the customer-managed DNS service.

Here, the VM running in the spoke is resolving the IP address of the Azure SQL instance private endpoint.

The query resolution path is as follows:

  1. VM1 creates a DNS query for db1.database.windows.net. VM1 does not have a cached entry for it so the query is passed on to the DNS Server configured for the operating system. The virtual network DNS Server settings has be set to 10.1.0.4 which is the IP address of the customer-managed DNS service and pushed to the virtual network interface by the Azure DHCP Service . The recursive query is passed to the customer-managed DNS service over the virtual network peerings.
  2. The customer-managed DNS service receives the query, validates it does not have a cached entry and that it is not authoritative for the database.windows.net namepsace. It then forwards the query to its standard forwarder which has been configured to the be the 168.63.129.16 virtual IP address for the virtual network in order to pass the query to the Azure-provided DNS service.
  3. The Azure-provided DNS services checks to see if there is an Azure Private DNS Zone named database.windows.net linked to the virtual network. Once it validates it does not, the recursive query is resolved against the public DNS namespace. During resolution the CNAME of privatelink.database.windows.net is returned. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone named privatelink.database.windows.net linked to the virtual network and determines there is. The query is resolved to the private IP address of 10.0.2.4 of the Private Endpoint.

Scenario 3b – Azure-to-Azure resolution of Azure Private Endpoints with the Azure Private DNS Resolver

Azure resolution of Azure Private Endpoints using Azure Private DNS Resolver

In this scenario the Azure environment has a traditional hub and spoke where there is a transit network such as a VWAN Hub or a traditional virtual network with some type of network virtual appliance handling transitive routing. An Azure Private DNS Resolver inbound and outbound endpoint has been deployed into a shared services virtual network that is peered with the transit network. The shared services virtual network has a virtual network link to the Private DNS Zone for privatelink.database.windows.net namespace. An Azure SQL instance named db1.database.windows.net has been deployed with a Private Endpoint in a spoke virtual network. An Azure VM has been deployed to another spoke virtual network and the DNS server settings of the virtual network has been configured with the IP address of the Azure Private DNS Resolver inbound endpoint IP.

Here, the VM running in the spoke is resolving the IP address of the Azure SQL instance private endpoint.

  1. VM1 creates a DNS query for db1.database.windows.net. VM1 does not have a cached entry for it so the query is passed on to the DNS Server configured for the operating system. The virtual network DNS Server settings has be set to 10.1.0.4 which is the IP address of the Azure Private DNS Resolver Inbound Endpoint IP and pushed to the virtual network interface by the Azure DHCP Service . The recursive query is passed to the Azure Private DNS Resolver Inbound Endpoint via the virtual network peerings.
  2. The inbound endpoint receives the query and passes it into the virtual network through the outbound endpoint which passes it on to the Azure-provided DNS service through the 168.63.129.16 virtual IP.
  3. The Azure-provided DNS services checks to see if there is an Azure Private DNS Zone named database.windows.net linked to the virtual network. Once it validates it does not, the recursive query is resolved against the public DNS namespace. During resolution the CNAME of privatelink.database.windows.net is returned. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone named privatelink.database.windows.net linked to the virtual network and determines there is. The query is resolved to the private IP address of 10.0.2.4 of the Private Endpoint.

Scenario 3 Key Takeaways

  1. When using the Azure Private DNS Resolver, there are a number of architectural patterns for both the centralized model outlined here and a distributed model. You can reference this post for those details.
  2. It’s not necessary to link the Azure Private DNS Zone to each spoke virtual network as long as you have configured the DNS Server settings of the virtual network to the IP address of your centralized DNS service which should be running in a virtual network which has virtual network links to all of the Azure Private DNS Zones used for PrivateLink.

Scenario 4 – On-premises resolution of Azure Private Endpoints

Let’s now take a look at DNS resolution of Azure Private Endpoints from on-premises machines. As I’ve covered in past posts Azure Private DNS Zones are only resolvable using the Azure-provided DNS service which is only accessible through the 168.63.129.16 virtual IP which is not reachable outside the virtual network. To solve this challenge you will need an endpoint within Azure to proxy the DNS queries to the Azure-provided DNS service and connectivity from-premises into Azure using Azure ExpressRoute or a VPN.

Today you have two options for the DNS proxy which include bringing your own DNS service or using the Azure Private DNS Resolver. I’ll cover both for this scenario.

Scenario 4a – On-premises resolution of Azure Private Endpoints using a customer-managed DNS Service

On-premises resolution of Azure Private Endpoints using customer-managed DNS service

In this scenario the Azure environment has a traditional hub and spoke where there is a transit network such as a VWAN Hub or a traditional virtual network with some type of network virtual appliance handling transitive routing. The customer-managed DNS service is deployed to a virtual network peered with the transit network. The customer-managed DNS service virtual network has a virtual network link to the Private DNS Zone for privatelink.database.windows.net namespace. An Azure SQL instance named db1.database.windows.net has been deployed with a Private Endpoint in a spoke virtual network.

An on-premises environment is connected to Azure using an ExpressRoute or VPN. The on-premises DNS service has been configured with a conditional forwarder for database.windows.net which points to the customer-managed DNS service running in Azure.

The query resolution path is as follows:

  1. The on-premises machine creates a DNS query for db1.database.windows.net. After validating it does not have a cached entry it sends the DNS query to the on-premises DNS server which is configured as its DNS server.
  2. The on-premises DNS server receives the query, validates it does not have a cached entry and that it is not authoritative for the database.windows.net namespace. It determines it has a conditional forwarder for database.windows.net pointing to 10.1.0.4 which is the IP address of the customer-managed DNS service running in Azure. The query is recursively passed on to the customer-managed DNS service via the ExpressRoute or Site-to-Site VPN connection
  3. The customer-managed DNS service receives the query, validates it does not have a cached entry and that it is not authoritative for the database.windows.net namepsace. It then forwards the query to its standard forwarder which has been configured to the be the 168.63.129.16 virtual IP address for the virtual network in order to pass the query to the Azure-provided DNS service.
  4. The Azure-provided DNS services checks to see if there is an Azure Private DNS Zone named database.windows.net linked to the virtual network. Once it validates it does not, the recursive query is resolved against the public DNS namespace. During resolution the CNAME of privatelink.database.windows.net is returned. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone named privatelink.database.windows.net linked to the virtual network and determines there is. The query is resolved to the private IP address of 10.0.2.4 of the Private Endpoint.

Scenario 4b – On-premises resolution of Azure Private Endpoints using Azure Private DNS Resolver

On-premises resolution of Azure Private Endpoints using Azure Private DNS Resolver

Now let me cover this pattern when using the Azure Private DNS Resolver. I’m going to assume you have some basic knowledge of how the Azure Private DNS Resolver works and I’m going to focus on the centralized model. If you don’t have baseline knowledge of the Azure Private DNS Resolver or you’re interested in the distributed mode and the pluses and minuses of it, you can reference this post.

In this scenario the Azure environment has a traditional hub and spoke where there is a transit network such as a VWAN Hub or a traditional virtual network with some type of network virtual appliance handling transitive routing. The Private DNS Resolver is deployed to a virtual network peered with the transit network. The Private DNS Resolver virtual network has a virtual network link to the Private DNS Zone for privatelink.database.windows.net namespace. An Azure SQL instance named db1.database.windows.net has been deployed with a Private Endpoint in a spoke virtual network.

An on-premises environment is connected to Azure using an ExpressRoute or VPN. The on-premises DNS service has been configured with a conditional forwarder for database.windows.net which points to the Private DNS Resolver inbound endpoint.

The query resolution path is as follows:

  1. The on-premises machine creates a DNS query for db1.database.windows.net. After validating it does not have a cached entry it sends the DNS query to the on-premises DNS server which is configured as its DNS server.
  2. The on-premises DNS server receives the query, validates it does not have a cached entry and that it is not authoritative for the database.windows.net namespace. It determines it has a conditional forwarder for database.windows.net pointing to 10.1.0.4 which is the IP address of the inbound endpoint for the Azure Private DNS Resolver running in Azure. The query is recursively passed on to the inbound endpoint over the ExpressRoute or Site-to-Site VPN connection
  3. The inbound endpoint receives the query and passes it into the virtual network through the outbound endpoint which passes it on to the Azure-provided DNS service through the 168.63.129.16 virtual IP.
  4. The Azure-provided DNS services checks to see if there is an Azure Private DNS Zone named database.windows.net linked to the virtual network. Once it validates it does not, the recursive query is resolved against the public DNS namespace. During resolution the CNAME of privatelink.database.windows.net is returned. The Azure-provided DNS service checks to see if there is an Azure Private DNS Zone named privatelink.database.windows.net linked to the virtual network and determines there is. The query is resolved to the private IP address of 10.0.2.4 of the Private Endpoint.

Scenario 4 Key Takeaways

The key takeaways from this scenario are:

  1. You must setup a conditional forwarder on the on-premises DNS server for the PUBLIC namespace of the service. While using the privatelink namespace may work with your specific DNS service based on how the vendor has implemented, Microsoft recommends using the public namespace.
  2. Understand the risk you’re accepting with this setup. All DNS resolution for the public namespace will now be sent up to the Azure Private DNS Resolver or customer-managed DNS service. If your connectivity to Azure goes down, or those DNS components are unavailable, your on-premises endpoints may start having failures accessing websites that are using Azure services (think images being pulled from an Azure storage account).
  3. If your on-premises DNS servers use non-RFC1918 address space, you will not be able to use scenario 3b. The Azure Private DNS Resolver inbound endpoint DOES NOT support traffic received from non-RFC1918 address space.

Other Gotchas

Throughout these scenarios you have likely observed me using the public namespace when referencing the resources behind a Private Endpoint (example: using db1.database.windows.net versus using db1.privatelink.database.windows.net). The reason for doing this is because the certificates for Azure PaaS services does not include the privatelink namespace in the certificate provisioned to the instance of the service. There are exceptions for this, but they are few and far between. You should always use the public namespace when referencing a Private Endpoint unless the documentation specifically tells you not to.

Let me take a moment to demonstrate what occurs when an application tries to access a service behind a Private Endpoint using the PrivateLink namespace. In this scenario there is a virtual machine which has been configured with proper resolution to resolve Private Endpoints to the appropriate Azure Private DNS Zone.

Resolution of Private Endpoint to private IP address

Now I’ll attempt to make an HTTPS connection to the Azure Key Vault instance using the PrivateLink namespace of privatelink.vaultcore.azure.net. In the image below you can see the error returned states the PrivateLink namespace is not included in the subject alternate name field of the certificate presented by the Azure Key Vault instance. What this means is the client can’t verify the identity of the server because the identities presented in the certificate doesn’t match the identity that was requested. You’ll often see this error as a certificate name mismatch in most browsers or SDKs.

Certificate name mismatch error

Final Thoughts

There are some key takeaways for you with this post:

  1. Know your DNS resolution path. This is absolutely critical when troubleshooting Private Endpoint DNS resolution.
  2. Always check your zone links. 99% of the time you’re going to be used the centralized model for DNS described in this post. After you verify your DNS resolution path, validate that you’ve linked the Private DNS Zone to your DNS Server / Azure Private DNS Resolver virtual network.
  3. Create your on-premises conditional forwarders for the PUBLIC namespaces for Azure PaaS services, not the Private Link namespace.
  4. Call your services behind Private Endpoints using the public hostname not the Private Link hostname. Using the Private Link hostname will result in certificate mismatches when trying to establish secure sessions.
  5. Don’t go and link your Private DNS Zone to every single virtual network. You don’t need to do this if you’re using the centralized model. There are very rare instances where the zone must be linked to the virtual network for some deployment check the product group has instituted, but that is rare.
  6. Centralize your Azure Private DNS Zones in a single subscription and use a single zone for each PrivateLink service across your environments (prod, non-prod, test, etc). If you try to do different zones for different environments you’re going to run into challenges when providing on-premises resolution to those zones because you now have two authorities for the same namespace.

Before I close out I want to plug a few other blog posts I’ve assembled for Private Endpoints which are helpful in understanding the interesting way they work.

  • This post walks through the interesting routes Private Endpoints inject in subnet route tables. This one is important if you have a requirement to inspect traffic headed toward a service behind a Private Endpoint.
  • This post covers how Network Security Groups work with Private Endpoints and some of the routing improvements that were recently released to help with inspection requirements around Private Endpoints.

Thanks!

DNS in Microsoft Azure Part 4 – Private Link DNS

DNS in Microsoft Azure Part 4 – Private Link DNS

Updated October 2023

This is part of my series on DNS in Microsoft Azure.

Hi there geeks!

Azure Private Link is a common topic for customers. It provides a service provider with the ability to inject a path to the instance of their service into a customer’s virtual network. It should come as no surprise that Microsoft makes full use this service to provide its customers with the same capability for it’s PaaS services. While the DNS configuration is straightforward for 3rd-party-provided Private Link services, there is some complexity to DNS when using it for access to Microsoft PaaS services behind a Private Link Private Endpoint.

Before I dive into the complexities of DNS, I want to cover how and why the service came to be.

The service was introduced in September 2019.  One of the primary drivers behind the introduction of the service was to address the customer demand for secure and private connectivity to native Azure services and 3rd-party services.  Native Azure PaaS services used to be accessible only via public IP addresses which required traffic to traverse the public Internet. If the customer wanted the traffic to take a known path to that public IP and have some assurance of consistency in the latency, the customer was forced to implement ExpressRoute with Microsoft Peering (formerly known as ExpressRoute Public Peering) which can be complex and comes with lots of considerations.

Microsoft first tried to address this technical challenge with Service Endpoints, which were introduced in February 2018.  For you AWS folk, the Service Endpoints are probably closest to VPC Gateway Endpoints.  Service Endpoints provide a means for services deployed with the virtual network to access an Azure PaaS service directly over the Microsoft backbone while also tagging the traffic egressing the Service Endpoint with the virtual network identity. The customer’s instance of that PaaS service could then be locked down access from a specific virtual network.

Service Endpoints
Access to public IP of Azure PaaS services

Service Endpoints came with a few gaps. One gap was Service Endpoints are not routable from on-premises so machines coming from on-premises can’t benefit from their deployment. The other gap was a Service Endpoint creates a data exfiltration risk because it is a more efficient route to ALL instances of the Azure PaaS service. Customers were often implementing Service Endpoints directly on the compute subnets which would cause traffic to bypass an organization’s security appliance (think traditional hub-and-spoke) making that exfiltration risk that much greater.

Microsoft made an attempt to mitigate the exfiltration risk Service Endpoint policies which are similar to VPC Gateway Endpoint Policies in controls could be applied via the policy to limit which instances of a PaaS service the Service Endpoint would allow traffic to. Unfortunately, Service Endpoint Policies never seemed to catch on and they are limited to Azure Storage.

Microsoft needed a solution to these two gaps and that’s how Azure Private Link came to be. Azure Private Link includes the concept of an Azure Private Link Service and Private Link Endpoint. I won’t be digging into the details of the Private Link Service component, because the focus of this two-part series is on DNS and its role in providing name resolution for Microsoft native PaaS Private Link Private Endpoints. I do want to cover the benefits Private Link brings to the table because it will reinforce why it’s so important to under the DNS integration.

Private Link addresses the major gaps in Service Endpoints by doing the following:

  • Private access to services running on the Azure platform through the provisioning of a virtual network interface within the customer VNet that is assigned one of the VNet IP addresses from the RFC1918 address space.
  • Makes the services routable and accessible over private IP space to resources running outside of Azure such as machines running in an on-premises data center or virtual machines running in other clouds.
  • Protects against data exfiltration by the Private Endpoint providing access to only a specific instance of a PaaS service.
Azure Private Link
Azure Private Link architecture

Now that you understand what Private Link brings to the table, I’ll focus on the DNS integration required to support Azure native PaaS services deployed behind a Private Link Private Endpoint.

The series is continued in my second post.

Deep Dive into Azure AD and AWS SSO Integration – Part 4

Deep Dive into Azure AD and AWS SSO Integration – Part 4

Today we continue exploring the new integration between Microsoft’s Azure AD (Azure Active Directory) and AWS (Amazon Web Services) SSO (Single Sign-On).  Over the past three posts I’ve covered the high level concepts of both platforms, the challenges the integration seeks to solve, and how to enable the federated trust which facilitates the single sign-on experience.  If you haven’t read through those posts, I recommend you before you dive into this one.  In this post I’ll be covering the neatest feature of the new integration, which is the support for automated provisioning.

If you’ve ever worked in the identity realm before, you know the pains that come with managing the life cycle of an identity from initial provisioning, changes resulting to the identity such as department and position changes, to the often forgotten stage of de-provisioning.  On-premises these problems were used solved by cobbled together scripts or complex identity management solution such as SailPoint Identity IQ or Microsoft Identity Manager.  While these tools were challenging to implement and operate, they did their job in the world of Windows Active Directory, LDAP, SQL databases and the like.

Then came cloud, and all bets were off.  Identity data stores skyrocketed from less than a hundred to hundreds and sometimes thousands (B2C has exploded far beyond event that).  Each new cloud service introduced into the enterprise introduced yet another identity management challenge.  While some of these offerings have APIs that support identity management operations, most do not, and those that do are proprietary in nature.  Writing custom code to each of the APIs is a huge challenge that most enterprises can’t keep up.  The result is often manual management of an identity life cycle, through uploading exported CSV files or some poor soul pointing and clicking a thousand times in a vendor portal.

Wouldn’t it be great if there was some mythical standard out that would help to solve this problem, use a standard API through REST, and support the JSON format?  Turns out there is and that standard is SCIM (System for Cross-domain Identity Management).  You may be surprised to know the standard has been around for a while now (technically 2011).  I recall hearing about it at a Gartner conference many many hears ago.  Unfortunately, it’s taken a long time to catch on but support is steadily increasing.

Thankfully for us, Microsoft has baked support into Azure AD and AWS recognized the value and took advantage of the feature.  By doing this, the identity life cycle challenges of managing an Azure AD and AWS integration has been heavily re-mediated and our lives made easier.

Azure AD Provisioning - Example

Azure AD Provisioning – Example

Let’s take a look at how set it up, shall we?

The first place you’ll need to go is into the AWS account which is the master for the organization and into the AWS SSO Settings.  In Settings you’ll see the provisioning option which is initially set as manual.  Select to enable automatic provisioning.

AWS SSO Settings - Provisioning

AWS SSO Settings – Provisioning

Once complete, a SCIM endpoint will be created.  This is the endpoint in AWS (referred to as the SCIM service provider in the SCIM standard) that the SCIM service on Azure AD (referred to as the client in the SCIM standard) will interact with to search for, create, modify, and delete AWS users and groups.  To interact with this endpoint, Azure AD must authenticate to it, which it does with a bearer access token that is issued by AWS SSO.  Be aware that the access token has a one year life span, so ensure you set some type of reminder.  A quick search through the boto3 API doesn’t show a way to query for issued access tokens (yes you can issue more than one at at time) so you won’t be able to automate the process as of yet.

awssso-scimendpoint.png

After SCIM is enabled, AWS SSO Settings for provisioning now reports SCIM in use.

awssso-scimenabled.png

Next you’ll need to bounce over to Azure AD and go into the enterprise app you created (refer to my third post for this process).   There you’ll navigate to the Provisioning blade and select Automatic as the provisioning method.

azuread-scimprov.png

You’ll then need to configure the URL and access token you collected from AWS and test the connection.  This will cause Azure AD to test querying the endpoint for a random user and group to validate functionality.

azuread-scimtest.png

If your test is successful you can then save the settings.

azuread-scimtestsucccess.PNG

You’re not done yet.  Next you have to configure a mapping which map attributes in Azure AD to the resource and attributes in the SCIM schema.  Yes folks, SCIM does have a schema for attributes and resources (like users and groups).  You can extend it as needed, but in this integration it looks to be using the default user and group resources.

azuread-scimmappings

Let’s take a look at what the group mappings look like.

azuread-scimgroupmappings.PNG

The attribute names on the left are the names of the attributes in Azure AD and the attributes on the right are the names of the attributes Azure AD will write the values of the attributes to in AWS SSO.  Nothing too surprising here.

How about the user mappings?

azuread-scimusermappings1azuread-scimusermappings2

Lots more attributes in the user mappings by default.  Now I’m not sure how many of these attributes AWS SSO supports.  According to the SCIM standard, a client can attempt to write whatever it wants and any attributes the service provider doesn’t understand is simply discarded.  The best list of attributes I could find were located here, and it’s not near this number.  I can’t speak to what the minimum required attributes are to make AWS work, because their official instructions on this integration doesn’t say.  I know some of the product team sometimes reads the blog, so maybe we’ll luck out and someone will respond with that answer.

The one tweak you’ll need to make here is to delete the mailNickName mapping and replace it with a mapping of objectId to externalId.  After you make the change, click the save icon.

I don’t know why AWS requires this so I can only theorize.  Maybe they’re using this attribute as a primary key in the back end database or perhaps they’re using it to map the users to the groups?  I’m not sure how Azure AD is writing the members attribute over to AWS.  Maybe in the future I’ll throw together a basic app to visualize what the service provider end looks like.

newmapping.PNG

Now you need to decide what users and groups you want to sync to AWS SSO.  Towards the bottom of the provisioning blade, you’ll see the option to toggle the provisioning status.  The scope drop down box has an option to sync all users and groups or to sync only assigned users and groups.  Best practice here is basic security, only sync what you need to sync, so leave the option on sync only assigned users and group.

The assigned users and groups refers to users that have been assigned to the enterprise application in Azure AD.  This is configured on the Users and Groups blade for the enterprise app.  I tested a few different scenarios using an Azure AD dynamic group, standard group, and a group synchronized from Windows AD.  All worked successfully and synchronized the relevant users over.

Once you’re happy with your settings, toggle the provisioning status and save the changes.  It may take some time depending on how much you’re syncing.

syncsuccess.PNG

If the sync is successful, you’ll be able to hop back over to AWS SSO and you’ll see your users and groups.

awssyncedusersawssyncedgroup

Microsoft’s official documentation does a great job explaining the end to end cycle.  The short of it is there’s an initial cycle which grabs all users and groups from Azure AD, then filters the list down to the users and groups assigned to the application.  From there it queries the target system to match the user with the matching attribute and if it isn’t found creates it, and if found and needs updating, updates it.

Incremental cycles are down from that point forward every 40 minutes.  I couldn’t find any documentation on how to adjust the synchronization frequency.  Be aware of that 40 minute sync and consider the end to end synchronization if you’re sourcing from Windows Active Directory.  In that case making changes in Windows AD could take just over an hour (assuming you’re using the 30 minute sync interval in Azure AD Connect) to fully synchronize.

awsssotime.PNG

As I described in my third post, I have a lab environment setup where a Windows Active Directory domain is syncing to Azure AD.  I used that environment to play out a few scenarios.

In the first scenario I disabled Marge Simpson’s account.  After waiting some time for changes to synchronize across both platforms, I saw in AWS SSO that Marge Simpson was now disabled.

margedisabled.PNG

For another scenario, I removed Barney Gumble from the Network Operators Active Directory group.  After waiting time for the sync to complete, the Network Operators group is now empty reflecting Barney’s removal from the group.

networkoperators.PNG

Recall that I assigned four groups to the app in Azure AD, Network Operators, Security Admins, Security Auditors, and Systems Operators.  These are the four groups syncing to AWS SSO.  Barney Gumble was only a member of the Network Operators group, which means removing him put him out of scope for the app assignment.  In AWS SSO, he now reports as being disabled.

barneydisabled.PNG

For our final scenario, let’s look at what happens when I deleted Barney Gumble from Windows Active Directory.  After waiting the required replication time, Barney Gumble’s user account was still present in AWS SSO, but set as disabled.  While Barney wouldn’t be able to login to AWS SSO, there would still be cleanup that would need to happen on the AWS SSO directory to remove the stale identity records.

barneydisabled.PNG

The last thing I want to cover is the logging capabilities of the SCIM service in Azure AD.  There are two separate logs you can reference.  The first are the Provisioning Logs which are currently in preview.  These logs are going to be your go to to troubleshoot issues with the provisioning process.  They’re available with an Azure AD P1 or above license and are kept for 30 days.  Supposedly they’re kept for free for 7 days, but the documentation isn’t clear whether or not you have the ability to consume them.  I also couldn’t find any documentation on if it’s possible to pull the logs from an API for longer term retention or analysis in Log Analytics or a 3rd party logging solution.

If you’ve ever used Azure AD, you’ll be familiar with the second source of logs.  In the Azure AD Audit logs, you get additional information, which while useful, is more catered to tracking the process vs troubleshooting the process like the provisioning logs.

Before I wrap up, let’s cover a few key findings:

  • The access token used to access the SCIM endpoint in AWS SSO has a one year lifetime.  There doesn’t seem to be a way to query what tokens have been issued by AWS SSO at this time, so you’ll need to manage the life cycle in another manner until the capability is introduced.
  • Users that are removed from the scope of the sync, either by unassigning them from the app or deleting their user object, become disabled in AWS SSO.  The records will need to be cleaned up via another process.
  • If synchronizing changes from a Windows AD the end to end synchronization process can take over an hour (30 minutes from Windows AD to Azure AD and 40 minutes from Azure AD to AWS SSO).

That will wrap up this post.  In my opinion the SCIM service available in Azure AD is extremely under utilized.  SCIM is a great specification that needs more love.  While there is a growing adoption from large enterprise software vendors, there is a real opportunity for your organization to take advantage of the features it offers in the same way AWS has.  It can greatly ease the pain your customers and enterprise users experience having to manage the life cycle of an identity and makes for a nice belt and suspenders to modern identity capabilities in an application.

In the last post of my series I’ll demonstrate a few scenarios showing how simple the end to end experience is for users.  I’ll include some examples of how you can incorporate some of the advanced security features of Azure AD to help protect your multi-cloud experience.

See you next post!

 

Deep Dive into Azure AD and AWS SSO Integration – Part 3

Deep Dive into Azure AD and AWS SSO Integration – Part 3

Back for more are you?

Over the past few posts I’ve been covering the new integration between Azure AD and AWS SSO.  The first post covered high level concepts of both platforms and some of the problems with the initial integration which used the AWS app in the Azure Marketplace.  In the second post I provided a deep dive into the traditional integration with AWS using a non-Azure AD security token service like AD FS (Active Directory Federation Services), what the challenges were, how the new integration between Azure AD and AWS SSO addresses those challenges, and the components that make up both the traditional and the new solution.  If you haven’t read the prior posts, I highly recommend you at least read through the second post.

Azure AD and AWS SSO Integration

New Azure AD and AWS SSO Integration

In this post I’m going to get my hands dirty and step through the implementation steps to establish the SAML trust between the two platforms.  I’ve setup a fairly simple lab environment in Azure.  The lab environment consists of a single VNet (virtual network) with a four virtual machines with the following functions:

  • dc1 – Windows Active Directory domain controller for jogcloud.com domain
  • adcs – Active Directory Certificate Services
  • aadc1 – Azure Active Directory Connect (AADC)
  • adfs1 – Active Directory Federation Services

AADC has been configured to synchronize to the jogcloud.com Azure Active Directory tenant.  I’ve configured federated authentication in Azure AD with the AD FS server acting as an identity provider and Windows Active Directory as the credential services provider.

visio of lab environment

Lab Environment

On the AWS side I have three AWS accounts setup associated with an AWS Organization.  AWS SSO has not yet been setup in the master account.

Let’s setup it up, shall we?

The first thing you’ll need to do is log into the AWS Organization master account with an account with appropriate permissions to enable AWS SSO for the organization.  If you’ve never enabled AWS SSO before, you’ll be greeted by the following screen.

1.png

Click the Enable AWS SSO button and let the magic happen in the background.  That magic is provisioning of a service-linked role for AWS SSO in each AWS account in the organization.  This role has a set of permissions which include the permission to write to the AWS IAM instance in the child account.  This is used to push the permission sets configured in AWS SSO to IAM roles in the accounts.

Screenshot of AWS SSO IAM Role

AWS SSO Service-Linked IAM Role

After about a minute (this could differ depending on how many AWS accounts you have associated with your organization), AWS SSO is enabled and you’re redirected to the page below.

Screenshot of AWS SSO successfully enabled page

AWS SSO Successfully Enabled

Now that AWS SSO has been configured, it’s time to hop over to the Azure Portal.  You’ll need to log into the portal as a user with sufficient permissions to register new enterprise applications.  Once logged in, go into the Azure Active Directory blade and select the Enterprise Applications option.

Register new Enterprise Application

Register new Enterprise Application

Once the new blade opens select the New Application option.

Register new application

Register new application

Choose the Non-gallery application potion since we don’t want to use the AWS app in the Azure Marketplace due to the issues I covered in the first post.

Choose Non-gallery application

Choose Non-gallery application

Name the application whatever you want, I went with AWS SSO to keep it simple.  The registration process will take a minute or two.

Registering application

Registering application

Once the process is complete, you’ll want to open the new application and to go the Single sign-on menu item and select the SAML option.  This is the menu where you will configure the federated trust between your Azure AD tenant and AWS SSO on the Azure  AD end.

SAML Configuration Menu

SAML Configuration Menu

At this point you need to collect the federation metadata containing all the information necessary to register Azure AD with AWS SSO.  To make it easy, Azure AD provides you with a link to directly download the metadata.

Download federation metadata

Download federation metadata

Now that the new application is registered in Azure AD and you’ve gotten a copy of the federation metadata, you need to hop back over to AWS SSO.  Here you’ll need to go to Settings.  In the settings menu you can adjust the identity source, authentication, and provisioning methods for AWS SSO.  By default AWS SSO is set to use its own local directory as an identity source and itself for the other two options.

AWS SSO Settings

AWS SSO Settings

Next up, you select the Change option next to the identity source.  As seen in the screenshot below, AWS SSO can use its own local directory, an instance of Managed AD or BYOAD using the AD Connector, or an external identity provider (the new option).  Selecting the External Identity Provider option opens up the option to configure a SAML trust with AWS SSO.

Like any good authentication expert, you know that you need to configure the federated trust on both the identity provider and service provider.  To do this we need to get the federation metadata from AWS SSO, which AWS has been lovely enough to also provide it to us via a simple download link which you’ll want to use to get a copy of the metadata we’ll later import into Azure AD.

Now you’ll need to upload the federation metadata you downloaded from Azure AD in the Identity provider metadata section.  This establishes the trust in AWS SSO for assertions created from Azure AD.  Click the Next: Review button and complete the process.

AWS SSO Identity Sources

Configure SAML trust

You’ll be asked to confirm changing the identity source.  There a few key points I want to call out in the confirmation page.

  • AWS SSO will preserve your existing users and assignments -> If you have created existing AWS SSO users in the local directory and permission sets to go along with them, they will remain even after you enable it but those users will no longer be able to login.
  • All existing MFA configurations will be deleted when customer switches from AWS SSO to IdP.  MFA policy controls will be managed on IdP -> Yes folks, you’ll now need to handle MFA.  Thankfully you’re using Azure AD so you plenty of options there.
  • All items about provisioning – You have to option to manually provision identities into AWS SSO or use the SCIM endpoint to automatically provision accounts.  I won’t be covering it, but I tested manual provisioning and the single sign-on aspect worked flawless.  Know it’s an option if you opt to use another IdP that isn’t as fully featured as Azure AD.

Confirmation prompt

Confirmation prompt

Because I had to, I popped up the federation metadata to see what AWS requiring in the order of claims in the SAML assertion.  In the screenshot below we see is requesting the single claim of nameid-format:emailaddress.  This value of this claim will be used to map the user to the relevant identity in AWS SSO.

AWS SSO Metadata

Back to the Azure Portal once again where you’ll want to hop back to Single sign-on blade of the application you registered.  Here you’ll click the Upload metadata file button and upload the AWS metadata.

Uploading AWS federation metadata

Uploading AWS federation metadata

After the upload is successful you’ll receive a confirmation screen.  You can simple hit the Save button here and move on.

Confirming SAML

Confirming SAML

At this stage you’ve now registered your Azure AD tenant as an identity provider to AWS SSO.  If you were using a non-Azure AD security token service, you could now manually provision your users AWS SSO, create the necessary groups and permissions sets, and administer away.

I’ll wrap up there and cover the SCIM provisioning in the next post.  To sum it up, in this post we configured AWS SSO in the AWS Organization and established the SAML federated trust between the Azure AD tenant and AWS SSO.

See you next post!