Interesting behaviors with Private Endpoints

Interesting behaviors with Private Endpoints

Hi folks!

Working for and with organizations in highly regulated industries like federal and state governments and commercial banks often necessitates diving REALLY deep into products and technologies. This means peeling back the layers of the onion most people do not. The reason this pops up is because these organizations tend to have extremely complex environments due the length of time the organization has existed and the strict laws and regulations they must abide by. This is probably the reason why I’ve always gravitated towards these industries.

I recently ran into an interesting use case where that willingness to dive deep was needed.

A customer I was working with was wrapping up its Azure landing zone deployment and was beginning to deploy its initial workloads. A number of these workloads used Microsoft Azure PaaS (platform-as-a-service) services such as Azure Storage and Azure Key Vault. The customer had made the wise choice to consume the services through Azure Private Endpoints. I’m not going to go into detail on the basics of Azure Private Endpoints. There is plenty of official Microsoft documentation that can cover the basics and give you the marketing pitch. You can check out my pasts posts on the topic such as my series on Azure Private DNS and Azure Private Endpoints.

This particular customer chose to use them to consume the services over a private connection from both within Azure and on-premises as well as to mitigate the risk of data exfiltration that exists when egressing the traffic to Internet public endpoints or using Azure Service Endpoints. One of the additional requirements the customer had as to mediate the traffic to Azure Private Endpoints using a security appliance. The security appliance was acting as a firewall to control traffic to the Private Endpoints as well to perform deep packet inspection sometime in the future. This is the requirement that drove me down into the weeds of Private Endpoints and lead to a lot of interesting observations about the behaviors of network traffic flowing to and back from Private Endpoints. Those are the observations I’ll be sharing today.

For this lab, I’ll be using a slightly modified version of my simple hub and spoke lab. I’ve modified and added the following items:

  • Virtual machine in hub runs Microsoft Windows DNS and is configured to forward all DNS traffic to Azure DNS (168.63.129.16)
  • Virtual machine in spoke is configured to use virtual machine in hub as a DNS server
  • Removed the route table from the spoke data subnet
  • Azure Private DNS Zone hosting the privatelink.blob.core.windows.net namespace
  • Azure Storage Account named mftesting hosting some sample objects in blob storage
  • Private Endpoint for the mftesting storage account blob storage placed in the spoke data subnet
Lab environment

The first interesting observation I made was that there was a /32 route for the Private Endpoint. While this is documented, I had never noticed it. In fact most of my peers I ran this by were never aware of it either, largely because the only way you would see it is if you enumerated effective routes for a VM and looked closely for it. Below I’ve included a screenshot of the effective routes on the VM in the spoke Virtual Network where the Private Endpoint was provisioned.

Effective routes on spoke VM

Notice the next hop type of InterfaceEndpoint. I was unable to find the next hop type of InterfaceEndpoint documented in public documentation, but it is indeed related to Private Endpoints. The magic behind that next hop type isn’t something that Microsoft documents publicly.

Now this route is interesting for a few reasons. It doesn’t just propagate to all of the route tables of subnets within the Virtual Network, it also propagates to all of the route tables in directly peered Virtual Networks. In the hub and spoke architecture that is recommended for Microsoft Azure, this means that every Private Endpoint you create in a spoke Virtual Network is propagated to as a system route to route tables of each subnet in the hub Virtual Network. Below you can see a screen of the VM running in the hub Virtual Network.

Effective routes on hub VM

This can make things complicated if you have a requirement such as the customer I was working with where the customer wants to control network traffic to the Private Endpoint. The only way to do that completely is to create a /32 UDRs (user defined routes) in every route table in both the hub and spoke. With a limit of 400 UDRs per route table, you can quickly see how this may break down at scale.

There is another interesting thing about this route. Recall from effective routes for the spoke VM, that there is a /32 system route for the Private Endpoint. Since this is the most specific route, all traffic should be routed directly to the Private Endpoint right? Let’s check that out. Here I ran a port scan against the Private Endpoint using nmap using the ICMP, UDP, and TCP protocols. I then opened the Log Analytics Workspace and ran a query across the Azure Firewall logs for any traffic to the Private Endpoint from the VM and lo and behold, there is the ICMP and UDP traffic nmap generated.

Captured UDP and ICMP traffic

Yes folks that /32 route is protocol aware and will only apply to TCP traffic. UDP and ICMP traffic will not be affected. Software defined networking is grand isn’t it? ūüôā

You may be asking why the hell I decided to test this particular piece. The reason I followed this breadcrumb was my customer had setup a UDR to route traffic from the VM to an NVA in the hub and attempted to send an ICMP Ping to the Private Endpoint. In reviewing their firewall logs they saw only the ICMP traffic. This finding was what drove me to test all three protocols and make the observation that the route only affects TCP traffic.

Microsoft’s public documentation mentions that Private Endpoints only support TCP at this time, but the documentation does not specify that this system route does not apply to UDP and ICMP traffic. This can result in confusion such as it did for this customer.

So how did we resolve this for my customer? Well in a very odd coincidence, a wonderful person over at Microsoft recently published some patterns on how to approach this problem. You can (and should) read the documentation for the full details, but I’ll cover some of the highlights.

There are four patterns that are offered up. Scenario 3 is not applicable for any enterprise customer given that those customers will be using a hub and spoke pattern. Scenario 1 may work but in my opinion is going to architect you into a corner over the long term so I would avoid it if it were me. That leaves us with Scenario 2 and Scenario 4.

Scenario 2 is one I want to touch on first. Now if you have a significant background in networking, this scenario will leave you scratching your head a bit.

Microsoft Documentation Scenario 2

Notice how a UDR is applied to the subnet with the VM which will route traffic to Azure Firewall however, there is no corresponding UDR applied to the Private Endpoint. Now this makes sense since the Private Endpoint would ignore the UDR anyway since they don’t support UDRs at this time. Now you old networking geeks probably see the problem here. If the packet from the VM has to travel from A (the VM) to B (stateful firewall) to C (the Private Endpoint) the stateful firewall will make a note of that connection in its cache and be expecting packets coming back from the Private Endpoint representing the return traffic. The problem here is the Private Endpoint doesn’t know that it needs to take the C (Private Endpoint) to B (stateful firewall) to A (VM) because it isn’t aware of that route and you’d have an asymmetric routing situation.

If you’re like me, you’d assume you’d need to SNAT in this scenario. Oddly enough, due the magic of software defined routing, you do not. This struck me as very odd because in scenario 3 where everything is in the same Virtual Network you do need to SNAT. I’m not sure why this is, but sometimes accepting magic is part of living in a software defined world.

Finally, we come to scenario 4. This is a common scenario for most customers because who doesn’t want to access Azure PaaS services over an ExpressRoute connection vs an Internet connection? For this scenario, you again need to SNAT. So honestly, I’d just SNAT for both scenario 2 and 4 to make maintain consistency. I have successfully tested scenario 2 with SNAT so it does indeed work as you expect it would.

Well folks I hope you found this information helpful. While much of it is mentioned in public documentation, it lacks the depth that those of us working in complex environments need and those of us who like to geek out a bit want.

See you next post!

Helpful hints for resolving AD FS problems – Part 1

Hi everyone.

Over the past week I’ve been building a lab for an upcoming deep dive into Microsoft’s Web Application Proxy. ¬†During the course of building the lab I ran into a few interesting issues with AD FS and the Web Application Proxy that I wanted to cover. ¬†Some were similar to issues I’ve run into in production environments and some were new to me.

These issues are interesting in that there aren’t any obvious indicators of the problem in any of the typical logs. ¬†Two out of three required some trial and error to determine root cause, while the third drove me quite insane for a good two weeks before getting an answer from an “official” source. ¬†Over the course of this series of blogs I’ll cover each issue in detail with the hopes that it will help others troubleshoot these issues in the future.

Issue 1: AD FS Certificate authentication fails

I’m going to start with the problem that took me the longest to resolve and eventually required getting the answer directly from an official source.

For those of you that are unfamiliar, AD FS provides the capability to offer multi-factor authentication methods both native and third-party. ¬†Out of the box, it supports certificate-based authentication as an option for a multi-factor or “step-up” authentication mechanism.

A few months back I wanted to take advantage of the certificate authentication feature to provide a¬†two-factor authentication solution for applications integrated with AD FS. ¬†Like a good engineer I did my Googling, read the Microsoft articles and various blogs out there to understand how the feature worked and what the requirements were. ¬†I built a lab in Azure, setup an AD FS server, and ensured port 49443 was open in addition to the the typical ports required by AD FS. ¬†I created my instance of AD CS, issued a user certificate containing the user’s UPN in the subject alternate name field, and setup a sample SAML app and configured it to require Certificate authentication.

How easy it all sounds right? ¬†I navigated to the sample application and got the screen below…

Screen Shot 2017-06-04 at 9.29.35 PM

and I waited…. ¬†and waited…. and waited… ¬†Ummm, what went wrong? ¬†Well surely the AD FS log will tell me what happened.

Screen Shot 2017-06-04 at 9.34.03 PM.png

Well isn’t that odd. ¬†No errors or warnings in the AD FS Admin log. ¬†A quick check of the Application and System logs showed no errors either. ¬†Maybe the AD FS Debug log would show me something? ¬†I flipped on the log and attempted another authentication.

Screen Shot 2017-06-04 at 9.38.07 PM

Nothing as well? ¬†Maybe the server can’t query the revocation lists designated in the certificates CDP? ¬†Nope, not that either the server can successfully contact the CDP endpoints. ¬†At this point I began to get quite frustrated and attempted packet captures, Fiddler captures, and anything and everything I could think of. ¬†Nothing I tried revealed the answer.

I finally gave in (which I can tell you is incredibly challenging for me) and reached out to an “official” source. ¬†We chatted back and forth and went through much of the same steps as outlined above to ensure I didn’t miss anything.¬† However, we ran into another dead end. ¬†He then¬†reached out to some other engineers he knew and eventually we got a hit. ¬†We were told to check to see if there were any intermediary certificates stored within the trusted root certificate authorities store. ¬†Sounds like an odd circumstance, but sure why not.

Upon opening up the certificates MMC, opening the machine store, and exploring the trusted root certificate authorities store low and behold I see an intermediary certificate within the store.  I deleted the certificate, restarted the AD FS server and attempted another login to the sample claim application and hit the screen below.

Screen Shot 2017-06-04 at 9.50.16 PM

Boom, I’m finally receiving the certificate prompt. ¬†Clicking the OK button brings about the successful login below.

Screen Shot 2017-06-04 at 9.51.23 PM

So what was the issue? ¬†Apparently AD FS certificate authentication fails without generating an error in any logical location (maybe nowhere at all?)¬†if there is an intermediary certificate in the trusted root certificate authority machine store. ¬†I’ve verified this is an issue in both AD FS 2012 R2 and AD FS 2016. ¬†Now why this occurs is unknown to me. ¬†It could be the underlining HTTPS.SYS driver that pukes and doesn’t report any errors to the event logs. ¬†I didn’t get a straight answer as to why this occurs, just that it will due to some type of integrity check on the machine certificate store. ¬†Odd right?

That completes the rundown of the first of three problems I’ll be outlining in this series of blogs. ¬†Hopefully this helps save someone else some time and aggravation.

See you next post!