Hello folks! It’s been a busy past few months. I’ve been neck deep in summer activities, customer work, and building some learning labs for the wider Azure community. I finally had some time today to dig into the NSG and improved routing features for Private Endpoints that hit GA (general availability) last month. Although I wrote about the routing changes while the features were in public preview, I wanted to do a bit more digging now that they are officially GA. In this post I’ll take a closer look at the routing changes and try to clear up some of the confusion I’ve come across about what this feature actually does.
If you work for a company using Azure, likely you’ve come across Private Endpoints. I’ve written extensively about the feature over the course of the past few years covering some of the quirks that are introduced using it at scale in an enterprise. I’d encourage you to review some of those other posts if you’re unfamiliar with Private Endpoints or you’re interested in knowing the challenges that drove feature changes such as the NSG and improved routing features.
At the most basic level, Private Endpoints are a way to control network access to instances of PaaS (platform-as-a-service) services you consume in Microsoft Azure (they can also be used for PrivateLink Services you build yourself). Like most public clouds, every instance of a PaaS service in Azure is by default available over a public IP. While there are some basic layer 3 controls, such as IP restrictions offered for Azure App Services or the basic firewall that comes with Azure Storage, the service is still only accessible directly via its public IP address. From an operations perspective, this can lead to inconsistencies with performance when users access these services since the access is over an Internet connection. On the security side of the fence, it can make requirements to inspect and mediate the traffic with full featured security appliances problematic. There can even be a risk of data exfiltration if you are forced to allow access to the Internet for an entire service (such as *.blob.core.windows.net). Additionally, you may have internal policies driven by regulation that restrict sensitive data to being accessible only within your more heavily controlled private network.
Private Endpoints help solve the issues above by creating a network endpoint (virtual network interface) for the instance of your PaaS service inside of your Azure VNet (virtual network). This can help provide consistent performance when accessing the application because the traffic can now flow over an ExpressRoute Private Peering versus the user’s Internet connection. Now that traffic is flowing through your private network, you can direct that traffic to security appliances such as a Palo Alto to centrally mediate, log, and optionally inspect traffic up to and including at layer 7. Each endpoint is also for a specific instance of a service, which can mitigate the risk of data exfiltration since you could block all access to a specific Azure PaaS service if accessed through your Internet connection.
While this was possible prior to the new routing improvements that went into GA in August, it was challenging to manage at scale. I cover the challenge in detail in this post, but the general gist of it is the Azure networking fabric creates a /32 system route in each subnet within the virtual network where the Private Endpoint is placed as well as any directly peered VNets. If you’re familiar with the basics of Azure routing you’ll understand how this could be problematic in the situation where the traffic needs to be routed through a security appliance for mediation, logging, or inspection. To get around this problem customers had to create /32 UDRs (user-defined routes) to override this system route. In a hub and spoke architecture with enough Private Endpoints, this can hit the limit of routes allowed on a route table.
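To make the scaling problem concrete, here is a minimal Python sketch of the arithmetic: one /32 UDR per Private Endpoint, added on top of whatever routes the table already holds. The function names and the limit of 400 routes used in the example are assumptions for illustration only; check the currently documented route table limit for your own subscription.

```python
def routes_needed(private_endpoints: int, other_routes: int) -> int:
    """Total routes on a route table when every Private Endpoint
    needs its own /32 UDR (the pre-GA workaround)."""
    return private_endpoints + other_routes


def fits_route_table(private_endpoints: int, other_routes: int, route_limit: int) -> bool:
    """True if the /32 UDRs plus the existing routes stay within the limit.

    route_limit is passed in on purpose: the value is an assumption
    you should verify against the documented limit.
    """
    return routes_needed(private_endpoints, other_routes) <= route_limit


# Hypothetical numbers: 20 existing routes, assumed limit of 400.
print(fits_route_table(350, 20, 400))  # 370 routes, still fits
print(fits_route_table(390, 20, 400))  # 410 routes, over the assumed limit
```

The point is simply that the workaround grows linearly with the number of Private Endpoints, while a single UDR for a subnet or VNet CIDR does not.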
An example of an architecture that historically solved for this is shown below. If you have a user on-premises (A) trying to get to a Private Endpoint in the spoke (H) through the Application Gateway (L) and you have a requirement to inspect that traffic via a security appliance (F, E), you need to create a /32 route on the Application Gateway’s subnet to direct the traffic back to the security appliance. If that traffic is instead for some other type of service that isn’t fronted by an App Gateway (such as a Log Analytics Workspace or an Azure SQL instance), those UDRs need to be placed on the route table of the Virtual Network Gateway (B). The latter scenario is where scale and SNAT (see my other post for detail on this) can quickly become a problem.
To demonstrate the feature, I’m going to use my basic hub and spoke lab with the addition of an App Service running a very basic Python Flask application I wrote to show header and IP information from a web request. I’ve additionally set up a S2S VPN connection with a pfSense appliance I have running at home which is exchanging routes via BGP with the Virtual Network Gateway. The resulting lab looks like the below.
Since Microsoft still has no simple way to enumerate effective routes without a VM’s NIC being in the subnet, and I wanted to see the system routes that the Virtual Network Gateway was getting (az network vnet-gateway list-learned-routes will not do this for you), I created a new subnet and plopped a VM into it. Looking at the route table, the /32 route for the Private Endpoint was present.
Since this was temporary and I didn’t want to mess with DNS in my on-premises lab, I created a host file entry on the on-premises machine for the App Service’s FQDN pointing to the Private Endpoint IP address. I then accessed the service from a web browser on that machine. The contents of the web request show the IP address of my machine as expected because my traffic is entering the Azure networking plane via my S2S VPN and going immediately to the Private Endpoint for the App Service.
As I covered earlier, prior to these new features being introduced, to get this traffic going through my Azure Firewall instance I would have had to create a /32 UDR on the Virtual Network Gateway’s route table and I would have had to SNAT at the firewall to ensure traffic symmetry (the SNAT component is covered in a prior post). The new feature lifts the requirement for the /32 route, but in a very interesting way.
The golden rule for networking has long been that the most specific route is the preferred route. For example, in Azure the /32 system route for the Private Endpoint will be the preferred route even if you put in a static route for the subnet’s CIDR block (/24 for example). The new routing feature for Private Endpoints does not follow this rule as we’ll see.
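The specificity rule can be sketched in a few lines of Python using the standard ipaddress module. This is only a model of longest prefix match, not how the Azure fabric is implemented, and the addresses and next hop labels are made up for illustration.

```python
import ipaddress


def most_specific_route(destination, routes):
    """Return the (prefix, next_hop) pair with the longest matching prefix,
    or None if no route covers the destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network.prefixlen, prefix, next_hop))
    if not matches:
        return None
    # The highest prefix length is the most specific route.
    _, prefix, next_hop = max(matches)
    return prefix, next_hop


# Hypothetical routes: a /24 static route and the /32 Private Endpoint route.
routes = [
    ("10.1.0.0/24", "VirtualAppliance"),
    ("10.1.0.4/32", "InterfaceEndpoint"),
]

print(most_specific_route("10.1.0.4", routes))  # the /32 wins
print(most_specific_route("10.1.0.9", routes))  # only the /24 matches
```

Under the classic rule the /24 never gets a chance to carry traffic for the Private Endpoint's address, which is exactly why the /32 UDR workaround existed.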
Support for NSGs and routing improvements for Private Endpoints is disabled by default. There is a property of each subnet in a VNet called privateEndpointNetworkPolicies which is set to disabled by default. Swapping this property from disabled to enabled kicks off the new features. One thing to note is you only have to enable this on the subnet containing the Private Endpoint.
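As a rough sketch of what that change looks like on the subnet resource, the Python below flips the property on a dictionary shaped like a subnet definition. The dictionary layout, the "Enabled" and "Disabled" value casing, and the address prefix are assumptions for illustration; only the privateEndpointNetworkPolicies property name and the snet-app subnet name come from this post. It does not call Azure, it just shows which field changes.

```python
import copy


def enable_private_endpoint_policies(subnet: dict) -> dict:
    """Return a copy of a subnet definition with
    privateEndpointNetworkPolicies switched on.

    The input is left untouched so the before and after can be compared.
    """
    updated = copy.deepcopy(subnet)
    updated.setdefault("properties", {})["privateEndpointNetworkPolicies"] = "Enabled"
    return updated


# Hypothetical subnet definition; the address prefix is made up.
snet_app = {
    "name": "snet-app",
    "properties": {
        "addressPrefix": "10.1.0.0/24",
        "privateEndpointNetworkPolicies": "Disabled",
    },
}

updated = enable_private_endpoint_policies(snet_app)
print(updated["properties"]["privateEndpointNetworkPolicies"])
```

Only the subnet holding the Private Endpoint needs this change; the other subnets and peered VNets are left as they are.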
In my lab environment I swapped the property for the snet-app subnet in the workload VNet. Looking back at the route table for the VM in the transit virtual network, we now see that the /32 route has been made invalid. The /16 route that sends all traffic destined for the workload VNet to the Azure Firewall is now the route the traffic will take, which allows me to mediate and optionally inspect the traffic.
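The behavior observed here can be modeled with a small Python sketch. It is my own simplification of what the effective routes show, not a description of how the Azure fabric works internally: when the policy is enabled and any UDR covers the destination, the Private Endpoint's /32 system route is treated as invalid; otherwise normal longest prefix match applies. The prefixes and next hop labels are hypothetical.

```python
import ipaddress


def select_route(destination, pe_system_routes, udrs, policies_enabled):
    """Pick the effective route for a destination.

    Returns (prefix, next_hop, source) or None. Ties on prefix length
    go to the UDR, since user routes override system routes.
    """
    dest = ipaddress.ip_address(destination)

    def matching(routes, source):
        found = []
        for prefix, next_hop in routes:
            network = ipaddress.ip_network(prefix)
            if dest in network:
                found.append((network.prefixlen, source == "User", prefix, next_hop, source))
        return found

    user = matching(udrs, "User")
    system = matching(pe_system_routes, "Default")

    if policies_enabled and user:
        # The exception: a less specific UDR invalidates the /32 system route.
        candidates = user
    else:
        candidates = user + system

    if not candidates:
        return None
    _, _, prefix, next_hop, source = max(candidates)
    return prefix, next_hop, source


# Hypothetical values standing in for the lab's routes.
pe_routes = [("10.1.0.4/32", "InterfaceEndpoint")]
udrs = [("10.1.0.0/16", "AzureFirewall")]

print(select_route("10.1.0.4", pe_routes, udrs, policies_enabled=False))
print(select_route("10.1.0.4", pe_routes, udrs, policies_enabled=True))
```

With the policy disabled the /32 wins as usual; with it enabled the /16 UDR takes over even though it is less specific.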
Refreshing the web page from the on-premises VM now shows a source IP of 10.0.2.5 which is one of the IPs included in the Azure Firewall subnet. Take note that I have an application rule in place in Azure Firewall which means it uses its transparent proxy feature to ensure traffic symmetry. If I had a network rule in place, I’d have to ensure Azure Firewall is SNATing my traffic (which it won’t do by default for RFC1918 traffic). While some services (Azure Storage being one of them) will work without SNAT with Private Endpoints, it’s best practice to SNAT since all other services require it. The requirement will likely be addressed in a future release.
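The default SNAT behavior mentioned above can be sketched as a simple check against the three RFC 1918 blocks. This only models the statement that traffic to RFC 1918 destinations is not SNATed by default; the function names are mine, and the firewall's real configuration may include other ranges or be overridden, so treat it as an illustration.

```python
import ipaddress

# The three private address blocks defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """True if the address falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def snat_by_default(destination: str) -> bool:
    """Model of the default described above: no SNAT when the
    destination is an RFC 1918 address."""
    return not is_rfc1918(destination)


# A Private Endpoint lives on a private address (hypothetical here),
# so with a network rule SNAT has to be configured explicitly.
print(snat_by_default("10.1.0.4"))
print(snat_by_default("8.8.8.8"))
```

Since Private Endpoint addresses are private by definition, they always land on the "no SNAT by default" side of this check.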
While the support for NSGs for Private Endpoints is awesome, the routing improvements are a feature that shouldn’t be overlooked. Let me summarize the key takeaways:
- Routing improvements (docs call it UDR support which I think is a poor and confusing description) for Private Endpoints are officially generally available.
- SNAT is still required and best practice for traffic symmetry to ensure return traffic from Private Endpoints takes the same route back to the user.
- The privateEndpointNetworkPolicies property only needs to be set on the subnet containing the Private Endpoints. The routing improvements will then be active for those Private Endpoints for any route table assigned to a subnet within the Private Endpoint’s VNet or any directly peered VNets.
- Even though the /32 route is still there, it is now invalidated by a less specific UDR when this setting is set on a Private Endpoint’s subnet. You could create a UDR for the subnet CIDR containing the Private Endpoints or the entire VNet as I did in this lab. Remember this is an exception to the route specificity rule.
Well folks, that sums up this post. Hopefully you got some value out of it!
If the UDR is effective, why do we still need SNAT? I am not clear here.
My customer requirement is all traffic between subnets also should be
For example, after enabling privateEndpointNetworkPolicies on the PE subnet, according to my UDR the traffic flows like this: Client VM -> NVA -> PE
Return traffic as per the UDR: PE -> NVA -> Client.
Are you saying SNAT is needed even with the above config? But why?
Hi there. Private Endpoints do not obey user defined routes on route tables. You cannot force return traffic from a Private Endpoint back through the NVA unless you SNAT. Excluding SNAT will result in an asymmetric traffic flow.
As I covered in my blog, the UDR improvements were introduced to reduce the overhead of managing the /32 routes.
To make it even more confusing, not every private endpoint behaves the same. For example, storage magically takes the correct route back, but no other service does to my knowledge. This is why SNAT is required.
You can check out my prior post for the details https://journeyofthegeek.com/2020/09/10/interesting-behaviors-with-private-endpoints-new/.
Cloud is dynamic, so there is always the possibility this has changed. However, I am not aware of a change to this behavior. Your best bet would be to submit a ticket to support if you need confirmation.