Welcome back, fellow geeks!
During brainstorming sessions with peers or customer conversations, “what if” type scenarios pop up. These are typically scenarios that aren’t documented at all or are buried deep in the depths of the Internet. I thought it would be fun to create an ongoing series where I share some of the “what if” scenarios I run into and what I found when I labbed them out.
Lately I’ve been having a lot of internal and customer conversations around DNS and resolution with Azure Private Endpoints. Out of those conversations came some interesting “what if” scenarios. Those scenarios will be the subject of this post.
What if I create a second Private Endpoint for a single Azure resource and register it to the same Azure Private DNS zone?
This question popped up in some ongoing discussions around disaster recovery when Private Endpoints are in use. Specifically, the discussion was around Azure Storage accounts configured for GRS (geo-redundant storage). In this scenario, a customer is accessing a storage account from on-premises via a Private Endpoint and is blocking all access to the storage account over the public endpoint. The customer has configured the storage account to register its record in the Azure Private DNS zone.
In the event an entire region fails, the Private Endpoint in the primary region (seen above with the IP of 10.0.1.4) will become unavailable, causing traffic to drop. If the customer creates a second Private Endpoint (seen above with the IP of 10.1.1.4), either ahead of time or after the failure, what would happen when that Private Endpoint tried to register a duplicate record in Azure Private DNS?
Would the registration of the A record fail? Would it add an IP to the record set? Would it overwrite it?
The answer is that it will overwrite the record. This means that the A record seen in the above diagram for st1.privatelink.blob.core.windows.net would be overwritten to point to the new Private Endpoint address of 10.1.1.4.
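To make the lab result concrete, here is a toy Python model of the behavior. It is not the Azure Private DNS API: the `PrivateDnsZone` class and its `register` and `resolve` methods are hypothetical, and the only thing they encode is what I observed, namely that a second registration for the same name replaces the existing A record instead of failing or appending an IP.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateDnsZone:
    """Hypothetical in-memory stand-in for an Azure Private DNS zone.

    Models only the observed behavior: registering a Private Endpoint for a
    name that already has an A record overwrites that record.
    """

    name: str
    a_records: dict[str, list[str]] = field(default_factory=dict)

    def register(self, relative_name: str, ip: str) -> None:
        # Overwrite, don't append: the previous Private Endpoint IP is dropped.
        self.a_records[relative_name] = [ip]

    def resolve(self, relative_name: str) -> list[str]:
        # Return the current A record set (empty if the name isn't registered).
        return self.a_records.get(relative_name, [])


zone = PrivateDnsZone("privatelink.blob.core.windows.net")
zone.register("st1", "10.0.1.4")  # Private Endpoint in the primary region
zone.register("st1", "10.1.1.4")  # second Private Endpoint in the secondary region
print(zone.resolve("st1"))  # ['10.1.1.4']
```

In this model, the order of registration is the only thing that decides which address clients get back.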
What if I don’t want to depend solely on Azure Private DNS?
This conversation has popped up a lot lately with customers and in comments on my Azure DNS and Private Link blog series. As I discuss in that series, in its current state Private Endpoints depend heavily on Azure Private DNS. Azure Private DNS is relatively new, so it’s a very basic DNS service with no fancy geo-load balancing capabilities or probes. This can create administrative overhead in disaster recovery scenarios. Customers are eager to leverage their own DNS products, such as Infoblox or F5 GTMs, due to the advanced capabilities of those products.
The challenge is that customers are stuck using the privatelink namespaces (such as privatelink.database.windows.net) Microsoft has defined for the services. That wouldn’t be a problem if the customer could access the service directly by the privatelink FQDN (fully-qualified domain name), but that won’t work for encryption in transit to Microsoft PaaS (platform-as-a-service) offerings because the privatelink FQDN isn’t supported on the certificates provisioned to the services. This results in the dreaded certificate name mismatch scenario, where the client (your machine) can’t verify the identity of the server because the server’s certificate doesn’t contain the identity you’re trying to access (mystuff.privatelink.database.windows.net). This requires you to access the server using the public FQDN (mystuff.database.windows.net), which goes through the resolution path I discuss in my private link DNS series.
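The name mismatch comes down to how TLS clients match a hostname against a certificate’s names. Below is a minimal sketch of simplified RFC 6125-style wildcard matching in Python. It assumes, for illustration only, that the service certificate carries a wildcard name such as `*.database.windows.net`; check the actual certificate for your service. Because a wildcard stands in for exactly one left-most label, the extra `privatelink` label can never match.

```python
def hostname_matches(hostname: str, pattern: str) -> bool:
    """Simplified RFC 6125-style check: '*' may only replace the whole left-most label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    # A wildcard covers exactly one label, so the label counts must agree.
    if len(host_labels) != len(pattern_labels):
        return False
    first, *rest = pattern_labels
    if first != "*" and first != host_labels[0]:
        return False
    return rest == host_labels[1:]


# Hypothetical certificate names, for illustration only.
cert_names = ["*.database.windows.net"]

for name in ("mystuff.database.windows.net", "mystuff.privatelink.database.windows.net"):
    ok = any(hostname_matches(name, p) for p in cert_names)
    print(f"{name}: {'match' if ok else 'name mismatch'}")
```

The first name matches and the second reports a mismatch, which is why the client has to connect with the public FQDN even though DNS ultimately hands back the Private Endpoint’s IP.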
Unfortunately, there aren’t a ton of great options. I walk through some of the options in my series and Dan Mauser has done a wonderful job walking through some others in his postings. In his post, he discusses two solutions:
- Creating a conditional forwarder for the FQDN of the Azure resource
- Creating a forward lookup zone for the FQDN of the Azure resource
The most common on-premises integration pattern looks something like the below. In this pattern, on-premises DNS servers are configured with a conditional forwarder to send all DNS queries for Azure PaaS services (such as database.windows.net) to a DNS resolver in Microsoft Azure. That resolver is configured with a standard forwarder to send all of its queries to the 168.63.129.16 virtual IP.
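Here is a rough Python model of that resolution path, for illustration only. The record data is hypothetical: it assumes public DNS returns a CNAME from `mydb.database.windows.net` to `mydb.privatelink.database.windows.net`, and that the resolver behind 168.63.129.16 answers the `privatelink` name from a linked Azure Private DNS zone. The public-side target name (`public-gateway.example.net`) and all IPs are placeholders.

```python
# Hypothetical public DNS data; names and IPs are placeholders.
PUBLIC_DNS = {
    "mydb.database.windows.net": ("CNAME", "mydb.privatelink.database.windows.net"),
    "mydb.privatelink.database.windows.net": ("CNAME", "public-gateway.example.net"),
    "public-gateway.example.net": ("A", "203.0.113.10"),
}

# Hypothetical Azure Private DNS zone linked to the resolver's virtual network.
PRIVATE_ZONE = {"mydb.privatelink.database.windows.net": "10.0.1.5"}


def resolve(name: str, via_azure_vip: bool, max_hops: int = 8) -> str:
    """Follow CNAMEs; when resolving through 168.63.129.16, names present in the
    linked Private DNS zone are answered from that zone instead of public DNS."""
    for _ in range(max_hops):
        if via_azure_vip and name in PRIVATE_ZONE:
            return PRIVATE_ZONE[name]
        rtype, value = PUBLIC_DNS[name]
        if rtype == "A":
            return value
        name = value  # follow the CNAME to the next name in the chain
    raise RuntimeError("CNAME chain too long")


# On-premises DNS -> conditional forwarder -> Azure resolver -> 168.63.129.16
print(resolve("mydb.database.windows.net", via_azure_vip=True))   # 10.0.1.5
# The same query answered by public DNS instead
print(resolve("mydb.database.windows.net", via_azure_vip=False))  # 203.0.113.10
```

The second call mirrors what a client gets when its query never reaches the Azure resolver and is answered publicly instead.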
One of the downfalls of this pattern is that you’re forwarding queries for all of the public namespaces of the Azure PaaS services you’re using up to Azure. If your ExpressRoute circuit or S2S (Site-to-Site) VPN drops, those queries will either time out and fail to resolve or time out and resolve to the public IP addresses.
In scenario 1, customers try to avoid that problem by creating a conditional forwarder for each Azure resource’s private endpoint. For example, if you have a database named mydb.database.windows.net, you would create a conditional forwarder on-premises with the name mydb.database.windows.net and point it to your upstream Azure DNS server, which would go through the standard resolution path to resolve to the A record in Azure Private DNS.
In scenario 2, customers try to avoid the same problem by creating a forward lookup zone for each Azure resource’s private endpoint. The concept is the same as the conditional forwarder, in that the zone is named the same as your resource (mydb.database.windows.net), with the difference that your DNS server resolves the record authoritatively.
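To see how these two scenarios narrow what gets sent to Azure, here is a simplified Python model of how a DNS server might pick what to do with a query: find the most specific (longest) matching name among its local zones and conditional forwarders, and otherwise fall back to normal recursion. The forwarder address 10.0.0.4 (standing in for the Azure resolver) and the record IP are hypothetical, and real products (Windows DNS, BIND, Infoblox) have their own precedence rules, so treat this as a sketch rather than any product’s actual logic.

```python
def is_within(qname: str, domain: str) -> bool:
    """True if qname equals domain or is a subdomain of it (label-aligned)."""
    qname, domain = qname.lower().rstrip("."), domain.lower().rstrip(".")
    return qname == domain or qname.endswith("." + domain)


def route(qname: str, zones: dict[str, dict[str, str]],
          forwarders: dict[str, str]) -> tuple[str, str]:
    """Return (action, detail) for a query using longest-suffix matching."""
    qname = qname.lower().rstrip(".")
    candidates = [(d, "zone") for d in zones if is_within(qname, d)]
    candidates += [(d, "forward") for d in forwarders if is_within(qname, d)]
    if not candidates:
        return ("recurse", "public resolution path")
    # Most labels wins; on a tie, max() keeps the first candidate (a local zone).
    domain, kind = max(candidates, key=lambda c: len(c[0].split(".")))
    if kind == "zone":
        # Forward lookup zone: answer authoritatively from local data.
        return ("authoritative", zones[domain].get(qname, "NXDOMAIN"))
    return ("forward", forwarders[domain])


AZURE_RESOLVER = "10.0.0.4"  # hypothetical DNS resolver in Azure

# Standard pattern: the whole namespace is forwarded to Azure.
print(route("otherdb.database.windows.net", {}, {"database.windows.net": AZURE_RESOLVER}))
# Scenario 1: one conditional forwarder per resource.
per_resource = {"mydb.database.windows.net": AZURE_RESOLVER}
print(route("mydb.database.windows.net", {}, per_resource))
print(route("otherdb.database.windows.net", {}, per_resource))
# Scenario 2: one forward lookup zone per resource.
zone = {"mydb.database.windows.net": {"mydb.database.windows.net": "10.0.1.5"}}
print(route("mydb.database.windows.net", zone, {}))
```

Every new resource needs its own entry in `zones` or `forwarders`; any resource you haven’t added falls through to the public resolution path.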
So in short, both of these options work, but in no way are they scalable to manage from a lifecycle perspective because of the scale and ephemeral nature of cloud. Creating or deleting a forward zone or conditional forwarder is a server-level change, which makes it more likely that someone will break something than when modifying an A record, increasing the risk of this pattern. Finally, if you are using Active Directory-integrated DNS zones, you get the added shi*t show of bloating your Active Directory DIT with the creation and deletion of all these records at scale.
My recommendation is to stick with the standard pattern I outlined above if you can. The Azure Private DNS service will evolve over time and more than likely new capabilities will be added to help address these gaps.