Interesting behaviors with Private Endpoints

Hi folks!

Working for and with organizations in highly regulated industries like federal and state governments and commercial banks often necessitates diving REALLY deep into products and technologies. This means peeling back layers of the onion that most people do not. This comes up because these organizations tend to have extremely complex environments due to the length of time the organization has existed and the strict laws and regulations they must abide by. This is probably the reason why I’ve always gravitated towards these industries.

I recently ran into an interesting use case where that willingness to dive deep was needed.

A customer I was working with was wrapping up its Azure landing zone deployment and was beginning to deploy its initial workloads. A number of these workloads used Microsoft Azure PaaS (platform-as-a-service) services such as Azure Storage and Azure Key Vault. The customer had made the wise choice to consume the services through Azure Private Endpoints. I’m not going to go into detail on the basics of Azure Private Endpoints. There is plenty of official Microsoft documentation that can cover the basics and give you the marketing pitch. You can also check out my past posts on the topic, such as my series on Azure Private DNS and Azure Private Endpoints.

This particular customer chose to use them to consume the services over a private connection from both within Azure and on-premises, as well as to mitigate the risk of data exfiltration that exists when egressing the traffic to Internet public endpoints or using Azure Service Endpoints. One of the additional requirements the customer had was to mediate the traffic to Azure Private Endpoints using a security appliance. The security appliance was acting as a firewall to control traffic to the Private Endpoints, as well as to perform deep packet inspection sometime in the future. This is the requirement that drove me down into the weeds of Private Endpoints and led to a lot of interesting observations about the behaviors of network traffic flowing to and back from Private Endpoints. Those are the observations I’ll be sharing today.

For this lab, I’ll be using a slightly modified version of my simple hub and spoke lab. I’ve modified and added the following items:

  • Virtual machine in hub runs Microsoft Windows DNS and is configured to forward all DNS traffic to Azure DNS (168.63.129.16)
  • Virtual machine in spoke is configured to use virtual machine in hub as a DNS server
  • Removed the route table from the spoke data subnet
  • Azure Private DNS Zone hosting the privatelink.blob.core.windows.net namespace
  • Azure Storage Account named mftesting hosting some sample objects in blob storage
  • Private Endpoint for the mftesting storage account blob storage placed in the spoke data subnet
Lab environment

The first interesting observation I made was that there was a /32 route for the Private Endpoint. While this is documented, I had never noticed it. In fact most of my peers I ran this by were never aware of it either, largely because the only way you would see it is if you enumerated effective routes for a VM and looked closely for it. Below I’ve included a screenshot of the effective routes on the VM in the spoke Virtual Network where the Private Endpoint was provisioned.

Effective routes on spoke VM

Notice the next hop type of InterfaceEndpoint. I was unable to find the next hop type of InterfaceEndpoint documented in public documentation, but it is indeed related to Private Endpoints. The magic behind that next hop type isn’t something that Microsoft documents publicly.

Now this route is interesting for a few reasons. It doesn’t just propagate to the route tables of all subnets within the Virtual Network, it also propagates to the route tables of all subnets in directly peered Virtual Networks. In the hub and spoke architecture that is recommended for Microsoft Azure, this means that every Private Endpoint you create in a spoke Virtual Network is propagated as a system route to the route table of each subnet in the hub Virtual Network. Below you can see a screenshot of the effective routes on the VM running in the hub Virtual Network.

Effective routes on hub VM

This can make things complicated if you have a requirement like my customer’s, where network traffic to the Private Endpoint must be controlled. The only way to do that completely is to create a /32 UDR (user defined route) for each Private Endpoint in every route table in both the hub and spoke. With a limit of 400 UDRs per route table, you can quickly see how this may break down at scale.
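To put a number on that scaling problem, here is a minimal Python sketch that builds one /32 override route per Private Endpoint and checks the count against the 400 UDR limit mentioned above. The IP addresses, the endpoint count, and the dictionary field names are all made up for illustration. They are not Azure SDK objects, and this does not call any Azure API.

```python
import ipaddress

UDR_LIMIT_PER_ROUTE_TABLE = 400  # the per-route-table limit quoted in this post


def build_override_routes(private_endpoint_ips, next_hop_ip):
    """Return one /32 route per Private Endpoint, pointing at the firewall."""
    routes = []
    for ip in private_endpoint_ips:
        address = ipaddress.ip_address(ip)  # raises ValueError on a bad address
        routes.append({
            "address_prefix": f"{address}/32",
            "next_hop_type": "VirtualAppliance",
            "next_hop_ip": next_hop_ip,
        })
    return routes


def remaining_capacity(routes, existing_route_count=0):
    """How many more UDRs fit in the route table (negative means over the limit)."""
    return UDR_LIMIT_PER_ROUTE_TABLE - existing_route_count - len(routes)


# Hypothetical environment: 450 Private Endpoints and a firewall at 10.0.0.4.
first_endpoint = ipaddress.ip_address("10.1.2.4")
endpoint_ips = [str(first_endpoint + offset) for offset in range(450)]
routes = build_override_routes(endpoint_ips, next_hop_ip="10.0.0.4")

print(len(routes), "routes needed,", remaining_capacity(routes), "slots remaining")
```

Keep in mind the same set of routes has to be repeated in every route table in the hub and the spokes, so the limit is hit per table, not once overall.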

There is another interesting thing about this route. Recall from the effective routes for the spoke VM that there is a /32 system route for the Private Endpoint. Since this is the most specific route, all traffic should be routed directly to the Private Endpoint, right? Let’s check that out. Here I ran a port scan against the Private Endpoint with nmap using the ICMP, UDP, and TCP protocols. I then opened the Log Analytics Workspace and ran a query across the Azure Firewall logs for any traffic to the Private Endpoint from the VM and lo and behold, there is the ICMP and UDP traffic nmap generated.

Captured UDP and ICMP traffic

Yes folks that /32 route is protocol aware and will only apply to TCP traffic. UDP and ICMP traffic will not be affected. Software defined networking is grand isn’t it? 🙂
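If it helps to picture it, here is a toy Python model of what I observed. It is only a mental model, not how the Azure SDN is actually implemented (that isn’t publicly documented). It does a longest-prefix match where a route can optionally be limited to certain protocols. The `Route` class, the addresses, and the broader route pointing at a firewall are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    protocols: Optional[Tuple[str, ...]] = None  # None means "any protocol"


def select_route(routes, destination, protocol):
    """Pick the most specific route that matches both destination and protocol."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        route for route in routes
        if dest in ipaddress.ip_network(route.prefix)
        and (route.protocols is None or protocol in route.protocols)
    ]
    return max(
        candidates,
        key=lambda route: ipaddress.ip_network(route.prefix).prefixlen,
        default=None,
    )


# Hypothetical route table: the /32 only matches TCP, the broader route matches anything.
routes = [
    Route("10.1.2.4/32", "InterfaceEndpoint", protocols=("tcp",)),
    Route("10.1.0.0/16", "VirtualAppliance"),
]

for protocol in ("tcp", "udp", "icmp"):
    chosen = select_route(routes, "10.1.2.4", protocol)
    print(protocol, "->", chosen.next_hop)
```

In this model TCP takes the /32 straight to the Private Endpoint, while UDP and ICMP fall through to the less specific route, which matches what showed up in the firewall logs.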

You may be asking why the hell I decided to test this particular piece. The reason I followed this breadcrumb was that my customer had set up a UDR to route traffic from the VM to an NVA in the hub and attempted to send an ICMP ping to the Private Endpoint. In reviewing their firewall logs, they saw only the ICMP traffic. This finding was what drove me to test all three protocols and make the observation that the route only affects TCP traffic.

Microsoft’s public documentation mentions that Private Endpoints only support TCP at this time, but the documentation does not specify that this system route does not apply to UDP and ICMP traffic. This can result in confusion, as it did for this customer.

So how did we resolve this for my customer? Well in a very odd coincidence, a wonderful person over at Microsoft recently published some patterns on how to approach this problem. You can (and should) read the documentation for the full details, but I’ll cover some of the highlights.

There are four patterns that are offered up. Scenario 3 is not applicable for any enterprise customer given that those customers will be using a hub and spoke pattern. Scenario 1 may work but in my opinion is going to architect you into a corner over the long term so I would avoid it if it were me. That leaves us with Scenario 2 and Scenario 4.

Scenario 2 is one I want to touch on first. Now if you have a significant background in networking, this scenario will leave you scratching your head a bit.

Microsoft Documentation Scenario 2

Notice how a UDR is applied to the subnet with the VM, which routes traffic to Azure Firewall; however, there is no corresponding UDR applied to the Private Endpoint. This makes sense because Private Endpoints don’t support UDRs at this time and would ignore it anyway. You old networking geeks probably see the problem here. If the packet from the VM has to travel from A (the VM) to B (the stateful firewall) to C (the Private Endpoint), the stateful firewall will make a note of that connection in its cache and expect the return packets from the Private Endpoint. The problem is that the Private Endpoint doesn’t know it needs to take the path from C (Private Endpoint) to B (stateful firewall) to A (VM) because it isn’t aware of that route, so you’d have an asymmetric routing situation.

If you’re like me, you’d assume you’d need to SNAT in this scenario. Oddly enough, due to the magic of software defined routing, you do not. This struck me as very odd because in scenario 3, where everything is in the same Virtual Network, you do need to SNAT. I’m not sure why this is, but sometimes accepting magic is part of living in a software defined world.

Finally, we come to scenario 4. This is a common scenario for most customers because who doesn’t want to access Azure PaaS services over an ExpressRoute connection vs an Internet connection? For this scenario, you again need to SNAT. So honestly, I’d just SNAT for both scenario 2 and 4 to maintain consistency. I have successfully tested scenario 2 with SNAT so it does indeed work as you expect it would.
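For anyone who wants to see why SNAT keeps the flow symmetric, here is a small Python sketch of the idea. It assumes a rule that SNATs whenever the destination falls inside the address space hosting Private Endpoints. The prefixes, the firewall IP, and the packet dictionaries are all hypothetical, and this is a conceptual model only, not a firewall configuration.

```python
import ipaddress

# Hypothetical values: the spoke range hosting Private Endpoints and the firewall's IP.
PRIVATE_ENDPOINT_PREFIXES = [ipaddress.ip_network("10.1.0.0/16")]
FIREWALL_IP = "10.0.0.4"


def needs_snat(destination):
    """True when the destination sits in a range that hosts Private Endpoints."""
    dest = ipaddress.ip_address(destination)
    return any(dest in prefix for prefix in PRIVATE_ENDPOINT_PREFIXES)


def forward(packet):
    """Return the packet as it leaves the firewall, rewriting the source if needed."""
    if needs_snat(packet["dst"]):
        return {**packet, "src": FIREWALL_IP}
    return dict(packet)


def reply_destination(packet_seen_by_endpoint):
    """The endpoint answers whatever source address it saw."""
    return packet_seen_by_endpoint["src"]


original = {"src": "10.2.0.5", "dst": "10.1.2.4", "protocol": "tcp"}

print("without SNAT, reply goes to:", reply_destination(original))
print("with SNAT, reply goes to:", reply_destination(forward(original)))
```

Without the rewrite the reply is addressed to the VM directly and never passes back through the firewall. With it, the reply is addressed to the firewall, which can match it to the connection it already tracked.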

Well folks I hope you found this information helpful. While much of it is mentioned in public documentation, it lacks the depth that those of us working in complex environments need and those of us who like to geek out a bit want.

See you next post!

12 thoughts on “Interesting behaviors with Private Endpoints”

  1. Thanks for the blog post. This is really helpful. One quick question, have you ever noticed any issues with directing private endpoint traffic via Firewall? Majorly network and/or speed issues.

    • I have not but I could certainly see it occurring since the firewall becomes the bottleneck. I typically don’t recommend my customers route private endpoint traffic through a firewall. It’s operationally burdensome right now due to the /32 routes. You are better off documenting the security exception for now. There are changes on the horizon that will make this easier and more viable.

      • Thanks for the quick reply and I agree. I am also skeptical to use it as it becomes single point of failure too. I could not find any documentation on firewall’s performance data to reach a logical conclusion.

  2. Just ran into this same behavior. Thinking about solving it with a separate vnet with only a peering to the HUB. Possibly with a transit vnet in between so we don’t get all the /32 routes in the HUB. Firewall is a requirement so traffic has to be routed through the Hub.

      • With a third party firewall, you’ll want to check with your vendor. Each vendor will have a different space where that is configured. However, you’ll configure your device to SNAT in the same manner, where you are SNATing anytime traffic hits your firewall with a destination IP of a PE. You can either use the /32 (not scalable but fine for testing) or the larger address space for the spoke(s) the PEs live in

  3. Hi,
    Can you please explain, for scenario 2, where exactly you place and configure the SNAT and which method you’ve used? Would it be in the LB that exposes the NVA to the internet? I am using a 3rd party firewall (CloudGuard). I understand from previous comments that “you’ll configure your device to SNAT in the same manner, where you are SNATing anytime traffic hits your firewall with a destination IP of a PE. You can either use the /32 (not scalable but fine for testing) or the larger address space for the spoke(s) the PEs live in”. Can you please describe for dummies 🙂 ? Thanks.

    • That’s not a dummy question at all!

      The typical pattern for a 3rd party firewall in Azure involves giving the VM three network interfaces. One network interface is reserved for management, one for “private traffic” (east/west) which is behind an internal load balancer, and one for “public traffic” (north/south) which is behind an external load balancer. Both the “private” and “public” interface are assigned IPs from the CIDR block assigned to the virtual network.

      Cloud load balancers such as the Azure Load Balancer use some magic in encapsulation to preserve the public IP of the machine making the incoming call to Azure. This means the packet that arrives to the firewall’s “public” interface will see the source IP as a public IP. You would likely need to SNAT there.

      The “public” interface is typically blackholed to the rest of Azure. This means the firewall will need to internally route the traffic from the “public” interface to the “private” interface SNATing to the private interface such that the compute within the virtual network will see the private interface IP as the source and will be able to appropriately return traffic.

      Using a PE, that pattern shouldn’t change. Take note that the assumption I’m making here is you’re ingressing non HTTP/HTTPS traffic, because that type of traffic is typically ingressed via a layer 7 load balancer like App Gateway. If you have a requirement to funnel that traffic through the firewall after it ingresses through the App Gateway, that’s a different discussion because the pattern is more complex. Take a look at my most recent blog post that covers a possible pattern for that.

      A great resource for understanding how 3rd party firewalls integrate with Azure is the Palo Alto reference architecture for Azure document. Palo does a stellar job walking through these flow patterns in detail. You can also check out a Git repo I’ve been assembling that walks through common Azure networking patterns (https://github.com/mattfeltonma/azure-networking-patterns).

      Hope that helps!

  4. “Scenario 1 may work but in my opinion is going to architect you into a corner over the long term so I would avoid it if it were me.”

    Can you please give me some examples of how this would architect us into a corner?

    • There are a few reasons I’m not a fan of that pattern.

      1. You are placing multiple Private Endpoints from different workloads in the same Virtual Network. This means that Private Endpoint resources (the private endpoints themselves, not the services they are fronting) need to be placed in the same subscription as the Virtual Network. This would likely need to be some type of core services or shared services subscription.

      Unless you’re far down the path of automation, this means that some centralized team will need to manage the lifecycle of these Private Endpoint resources separate from the resources themselves. This potentially increases the operational overhead, decreases developer agility, and separates workload resources across subscriptions which could cause issues with lifecycle management.

      2. Traffic from a workload that needs to communicate with the Private Endpoint (say a Key Vault dedicated to a workload) needs to traverse two peerings (we may do something funky at the SDN layer to make this a single peering, but let’s assume two for the purposes of this convo), one to the hub and one to the spoke. This will result in additional costs. These costs could be significant if the Private Endpoint is sitting in front of an Azure SQL database and the app just executed a bad query that is returning GBs of data.

      3. Moving away from this pattern and to a pattern where the private endpoints exist in the spoke means downtime. Each Private Endpoint would need to be recreated which would result in a change of IP address. TTLs of cached DNS records pointing to the old IPs of these Private Endpoints would add to the potential pain of testing and validation. Scale this across a few hundred applications, and this becomes a fairly large migration undertaking with downtime and risk of service disruption.

      Are all of these items surmountable? Most definitely, with proper planning and preparation anything can be done. However, at scale, this could turn into a lot of work and a lot of risk that no one wants to fund and sign off on. We have all been part of those decisions we believed to be temporary, but ended up being permanent due to the cost and risk of reversing them. This is less of an architectural corner caused by a technical constraint, and more of an architectural corner caused by a business constraint (cost and risk).

      Customers do move forward with this pattern and it is a Microsoft supported pattern. However, you need to be aware of the potential work sitting in front of you if you go this route, and you need to decide internally if this is a corner for your org or if it’s something you can eventually work around.

      Again, the above is just my opinion. I’m sure you can find many folks who would disagree. However, that’s the great part of this blog, I get to toss out my opinion. 🙂
