A Deep Dive Into Azure Route Server

Hello again fellow geeks.

I recently had a customer reach out to me with an interest in learning more about ARS (Azure Route Server). The customer hoped it might ease some of the burden of managing routing across a large Azure implementation. I had yet to mess around with ARS since it only recently went GA (generally available), so I figured this would be a great opportunity to build out a lab and give it a whirl. Let’s get to it!

ARS is one of Microsoft’s new networking offerings in Azure offering a managed routing service that is hooked into Azure’s SDN (software defined network). The way I like to think of the service is a couple of VMs (virtual machines) that are managed by Microsoft, running a BGP (Border Gateway Protocol) service, and which have the ability to program routes directly into VNets (virtual networks). In addition to introducing some pretty cool new networking patterns such as an SD WAN and ExpressRoute pattern and dual-homed network, the feature that most interested my customer was its capability of BGP peering with a customer’s NVA (network virtual appliance) such as a Palo Alto or Cisco appliance and injecting those learned dynamic routes into the VNet. This is the feature which I’ll cover in this blog post.

I did some thinking about how I wanted to lab this out and what I wanted to use as an NVA. Since the behavior I wanted to test was primarily BGP, I figured I’d keep it simple and run a Linux VM which would host a lightweight BGP service. I found a wonderful post from Adam Stuart (seriously a great read and really cool Azure networking pattern in his post) which mentioned Exabgp. After doing a bit of research on it, it looked relatively easy to setup and use so the choice was made (Thanks Adam!) In addition to the NVA, I decided to build out a hub-and-spoke architecture and make use of my home pfSense appliance for S2S VPN (site-to-site virtual private network) connectivity and to replicate an on-premises environment. The result was the lab pictured below (image 1).

Image 1 – Azure Route Server Lab

One thing you may notice in the above image is ARS has a public IP address associated to it. Remember when I mentioned this is a couple of Microsoft-managed VMs? Well this public IP facilitates Microsoft management of the VMs similar to other managed services such as Azure SQL MI (Managed Instance). Before you ask, no you can’t associate an NSG (Network Security Group) to the subnet ARS is provisioned into, and the subnet has to be named RouteServerSubnet.

I setup a S2S VPN connection and BGP peering between my pfSense appliance and Azure VPN Gateway and advertised the set of routes documented in the lab diagram (image 1). I then provisioned a couple of Ubuntu VMs with two in the hub, one in the first spoke, and one in the second spoke.

Exabgp was a bit painful to setup because the documentation out there is a bit sparse and I had to piece it together from multiple sources. Given that, I’m going to spend a few minutes walking through the setup.

I first setup my ARS using the instructions in this link. Ensure you use a valid ASN for your NVA (autonomous system number).

Once ARS is setup, you can begin setting up Exabgp on the VM using the commands below.

sudo apt update
sudo apt install exabgp

You’ll then need to modify the service file located in /lib/systemd/system/exabgp.service to uncomment the lines below.

...
[Service]
#User=exabgp
#Group=exabgp
Environment=exabgp_daemon_daemonize=false
PermissionsStartOnly=true
ExecStartPre=-mkfifo /run/exabgp.in
...

Now you’ll need to create a configuration file for exabgp and save it to /etc/exabgp/exabgp.conf. Below is the configuration file I used for the lab.

neighbor 10.0.2.4 {
        router-id 10.0.0.4;
        local-address 10.0.0.4;
        local-as 65010;
        peer-as 65515;

        static {
                route 192.168.10.0/24 next-hop 10.0.0.4;
        #       route 192.168.1.0/24 next-hop 10.0.0.4 as-path [65010 65010] community [65010:2];
                route 192.168.1.0/24 next-hop 10.0.0.4 community [65010:2];
                route 10.1.0.0/16 next-hop 10.0.0.4;
                route 10.10.0.0/16 next-hop 10.0.0.4;
                route 0.0.0.0/0 next-hop 10.0.0.4;
        }
}
neighbor 10.0.2.5 {
        router-id 10.0.0.4;
        local-address 10.0.0.4;
        local-as 65010;
        peer-as 65515;

        static {
                route 192.168.10.0/24 next-hop 10.0.0.4;
        #       route 192.168.1.0/24 next-hop 10.0.0.4 as-path [65010 65010] community [65010:2];
                route 192.168.1.0/24 next-hop 10.0.0.4 community [65010:2];
                route 10.1.0.0/16 next-hop 10.0.0.4;
                route 10.10.0.0/16 next-hop 10.0.0.4;
                route 0.0.0.0/0 next-hop 10.0.0.4;
        }
}

Each instance of ARS is configured to be highly available and is deployed across availability zones if the region supports it. It comes with two BGP peering IPs you’ll need to peer with and advertise your routes to. If you only peer one with or advertise different routes, you’ll get funky behaviors. In the configuration file above, each JSON object represents one of these peers. You’ll notice I’m advertising a number of routes which will demonstrate different behaviors I’ll walk through with this post.

Once you’ve done the above you can start the service and validate that it successfully started.

sudo systemctl start exabgp
sudo systemctl status exabgp

Give it a minute and then you can run the following command to output the routes the exabgp service has learned from ARS.

sudo exabgpcli show adj-rib in extensive

In the output below (image 2) you can see ARS advertising the address space of the hub and spoke1 Vnet.

Image 2 – Exabgp output

So why isn’t spoke 2 being advertised? Well ARS requires the peerings between the VNets to be configured with UseRemoteGateway and AllowGatewayTransit properties (referred to in the Portal as Use the remote virtual network’s gateway or Route Server and Use this virtual network’s gateway or Route Server) in order for ARS to propagate routes between the Vnets. As you’ll note from the lab image, I enabled these settings for spoke 1 but not spoke 2.

This requirement does present a challenge in that when it’s enabled ARS propagates routes to the peered Vnets, but so does your ExpressRoute or VPN Gateway. My customer base typically requires all traffic leaving the spoke to flow through a security appliance. If you do not enable this setting the spoke only knows about itself and the hub VNet. To direct the traffic through an appliance in a hub you only need two UDRs (one for 0.0.0.0/0 and one for the hub VNet address space). This makes it easy to stamp out spokes from a networking perspective and optionally audit and enforce with Azure Policy.

Looking at the effective routes of the VM in spoke 1 (image 3), you can see routes highlighted in red are routes being propagated into the spoke by my VPN Gateway which is receiving them from my pfSense appliance. To ensure traffic between on-premises and Azure flows through a security appliance you’d need to define all of the routes you’re propagating over your ExpressRoute/VPN in your NVA. This way ARS propagates them to the peered VNets and overrides the ones coming in from the gateway.

Image 3 – Spoke 1 Effective Routes

Recall from my lab image (image 1), I’m sending 192.168.0.1/24, 192.168.2.0/24, 192.168.3.0/24, and 192.168.4.0/24 over BGP to Azure. In the image above (image 3) you’ll notice three out of four of these routes are coming from my VPN Gateway (10.0.3.5), but 192.168.0.1/24 is coming from Exabgp VM (10.0.0.4). The documentation says that when two routes for the same address space but different AS PATH lengths are received from different NVAs, ARS will only program the route with the shorter AS PATH. Apparently this isn’t just a trait of ARS and is a trait of the Azure SDN.

To validate this is the case, I edited my Exabgp config file and appended a longer AS PATH length to the route as seen below.

Exabgp config with modified AS PATH

After about a minute, the route table on my spoke 1 VM now displays the route for 192.168.0.1 coming from the VPN Gateway because the route coming from my Exabgp now has a longer AS PATH. This indicates that when two routes for the same address space are advertised, the route with the shortest AS PATH gets programmed to the VNet while the route with the longer AS PATH is seemingly discarded. I would have expected to see the 192.168.0.1/24 coming from on-premises with an Active value of Invalid, but instead it’s discarded.

Image 4 – VPN Gateway with shorter AS PATH

The next route I want to look at it is 10.1.0.0/16 which is the address space of spoke 1. I configured the Exabgp VM to advertise this route to ARS. Looking at the effective routes for the VM in the hub (VM-DEMO) the only route for this address space is the system route for the peering. This is because system routes for the VNet, VNet peering, or service endpoints are the preferred routes, even if BGP routes are more specific. This means you can’t use ARS to push routes that would force traffic from a spoke to flow through an NVA in the hub because of the VNet peering system route. For that use case it looks like you will need to continue to use UDRs (user defined routes)

I have the default route of 0.0.0.0/0 which I am advertising from the Exabgp VM. You can see in the above images is propagating to both the hub and spoke. Take note that if you advertise this route, you’re going to want to ensure you place a route table and UDR on the subnet your NVAs are in. Otherwise you’ll run into a scenario where you’ll have a routing loop because the default route would be received by the NVAs subnet.

Let’s pull the VPN Gateway into the mix. ARS does support BGP peering with an ExpressRoute or VPN Gateway. This allows you to propagate the routes ARS is learning from the NVA back on-premises. You enable this functionality by enabling the Branch-to-branch feature of ARS. In my lab environment, I commented out the default route in my Exabgp conf file because I didn’t want to send that back on-premises because I don’t know how to filter out routes in FRRoute (the BGP service running on pfSense). I then ran an az network vnet-gateway list-learned-routes and confirmed my VNG is now receiving routes from ARS as seen in the image below (image 5).

Image 5 – VPN Gateway learned routes

On the pfSense side I was now able to see the routes coming from my NVA coming over the VPN Gateway and back my on-premises appliance. These are the routes with an AS PATH of 65510 65515 65010 in the image below (image 6).

Image 6 – pfSense learned routes

The above shows that the routes coming from the NVA are being propagated back on-premises. What about the routes coming from on-premises? Are they being propagated all the way to the NVA? Let’s check it out!

To verify this I printed the routes ARS is advertising to the NVA about using az network routeserver peering list-advertised-routes. The routes in bold are the routes coming from the pfSense appliance showing that ARS is receiving the routes and advertising them to the Exabgp VM.

RouteServiceRole_IN_0:
- asPath: '65515'
  localAddress: 10.0.2.4
  network: 10.0.0.0/16
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: '65515'
  localAddress: 10.0.2.4
  network: 10.1.0.0/16
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: 65515-65510-65501
  localAddress: 10.0.2.4
  network: 192.168.4.0/24
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: 65515-65510-65501
  localAddress: 10.0.2.4
  network: 192.168.2.0/24
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: 65515-65510-65501
  localAddress: 10.0.2.4
  network: 192.168.3.0/24
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0

Lastly, I wanted to see the routes and details on the Exabpg VM. For that I ran the command below on the VM.

sudo exabgpcli show adj-rib in extensive
Image 7 – Exabgp learned routes

In the image above (image 7) the on-premises routes for the 192.168.0 address spaces have been learned and have the AS PATH leading back on-premises. Note the community value of 65517:65517 being attached to the routes. That is not coming from my pfSense appliance but rather the VPN Gateway. The community tag is used to identify these routes as coming from a VPN Gateway and are used by Microsoft for filtering.

Well folks that about wraps it up. The key takeaways of my time with ARS were the following:

  • ARS requires the UseRemoteGateway and AllowGatewayTransit properties be configured on VNet peering. This makes managing traffic flow between an spoke and on-premises a bit more complicated. Instead of defining a 0.0.0.0/0 route and calling it a day, you would need to define all the routes in your NVA and propagate them using ARS.

    This isn’t necessarily a bad thing, you’re just shifting management of routing outside the Azure control plane and into the NVA data plane. That may be your preference.
  • When multiple routes for the same address space and different length AS PATHs are propagated to a VNet, the Azure SDN will only program the route the shortest AS PATH. The other route with the longer AS PATH is discarded.
  • You can’t use an NVA and ARS to propagate the hub and spoke VNet address spaces back on-premises because system routes for the VNet, VNet peering, and Service Endpoints supersede routes propagated via BGP. Instead, you could use a larger summarized route encompassing the entire address space for the VNets in your region and propagate that back on-premises using the NVA and ARS.
  • You can use an NVA and ARS to propagate routes from Azure back to on-premises. A use case might be that you want all Internet-bound traffic to egress out of Azure because you get better performance than you’re getting from your ISP on-premises. I’ve never seen it, but nevertheless. 🙂

Hopefully some of the above helps you become more familiar with Azure Route Server and what the benefits and considerations are.

See you next post!

Identifying Orphaned Managed Identities

Hello again fellow geeks!

Recently I was giving a customer an overview of Azure Managed Identities and came across an interesting find while building a demo environment. If you’re unfamiliar with managed identities, check out my prior series for an overview. Long story short, managed identities provide a solution for non-human identities where you don’t have to worry about storing, securing, and rotating the credentials. For those of you coming from AWS, managed identities are very similar to AWS Roles. They come in two flavors, user-assigned and system-assigned. For the purposes of this post, I’ll be focusing on system-assigned.

Under the hood, a managed identity is essentially a service principal with some orchestration on top of it. Interestingly enough, there are a number of different service principal types. Running the command below will spit back the different types of service principals that exist in your Azure AD tenant.

az ad sp list --query='[].servicePrincipalType' --all | sort | uniq
Service principal types

If you’re interested in seeing the service principals associated with managed identities in your Azure AD tenant, you can run the command below.

az ad sp list --query="[?servicePrincipalType=='ManagedIdentity']" --all

Managed identities include a property called alternativeNames which is an array. In my testing I observed two values within this array. The first value is “isExplicit=True” or “isExplicit=False” which is set to True for user-assigned managed identities and False when it’s a system-assigned managed identity. If you want to see all system-assigned managed identities for example, you can run the command below.

az ad sp list --query="[?servicePrincipalType=='ManagedIdentity' && alternativeNames[?contains(@,'isExplicit=False')]]" --all

The other value in this array is the resource id of the managed identity in the case of a user-assigned managed identity. With a system-assigned managed identity this is the resource id of the Azure resource the system-assigned managed identity is associated with.

System-assigned managed identity

So why does any of this matter? Before we get to that, let’s cover the major selling point of a system-assigned managed identity when compared to a user-assigned managed identity. With a system-assigned managed identity, the managed identity (and its service principal) share the lifecycle of the resource. This means that if you delete the resource, the service principal is cleaned up… well most of the time anyway.

Sometimes this cleanup process doesn’t happen and you’re left with orphaned service principals in your directory. The most annoying part is you can’t delete these service principals (I’ve tried everything including calls direct to the ARM API) and the only way to get them removed is to open a support ticket. Now there isn’t a ton of risk I can think of with having these orphaned service principals left in your tenant since I’m not aware of any means to access the credential associated with it. Without the credential no one can authenticate as it. Assuming the RBAC permissions are cleaned up, it’s not really authorized to do anything within Azure either. However, beyond dirtying up your directory, it’s an identity with a credential that shouldn’t be there anymore.

I wanted an easy way to identify these orphaned system-assigned managed identities so I could submit a support ticket and get it cleaned up before it started cluttering up my demonstration tenant. This afternoon I wrote a really ugly bash script to do exactly that. The script uses some of the az cli commands I’ve listed above to identify all the system-assigned managed identities and then uses az cli to determine if the resource exists. If the resource doesn’t exist, it logs the displayName property of the system-assigned managed identity to a text file. Quick and dirty, but does the job.

Orphaned system-assigned managed identities

Interestingly enough, I had a few peers run the script on their tenants and they all had some of these orphaned system-assigned managed identities, so it seems like this problem isn’t restricted to my tenants. Again, I personally can’t think of a risk of these identities remaining in the directory, but it does point to an issue with the lifecycle management processes Microsoft is using in the backend.

Well folks that’s it! Have a great night!

Behavior of Azure Event Hub Network Security Controls

Behavior of Azure Event Hub Network Security Controls

Welcome back fellow geeks.

In this post I’ll be covering the behavior of the network security controls available for Azure Event Hub and the “quirk” (trying to be nice here) in enforcement of those controls specifically for Event Hub. This surfaced with one of my customers recently, and while it is publicly documented, I figured it was worthy of a post to explain why an understanding of this “quirk” is important.

As I’ve mentioned in the past, my primary customer base consists of customers in regulated industries. Given the strict laws and regulations these organizations are subject to, security is always at the forefront in planning for any new workload. Unless you’ve been living under a rock, you’ve probably heard about the recent “issue” identified in Microsoft’s CosmosDB service, creatively named ChaosDB. I’ll leave the details to the researchers over at Wiz. Long story short, the “issue” reinforced the importance of practicing defense-in-depth and ensuring network controls are put in place where they are available to supplement identity controls.

You may be asking how this relates to Event Hub? Like CosmosDB and many other Azure PaaS (platform-as-a-service) services, Event Hub has more than one method to authenticate and authorize to access to both the control plane and the data plane. The differences in these planes are explained in detail here, but the gist of it is the control plane is where interactions with the Azure Resource Manager (ARM) API occurs and involves such operations as creating an Event Hub, enabling network controls like Private Endpoints, and the like. The data plane on the other hand consists of interactions with the Event Hub API and involves operations such as sending an event or receiving an event.

Azure AD authentication and Azure RBAC authorization uses modern authentication protocols such as OpenID Connect and OAuth which comes with the contextual authorization controls provided by Azure AD Conditional Access and the granularity for authorization provided by Azure RBAC. This combination allows you to identify, authenticate, and authorize humans and non-humans accessing the Event Hub on both planes. Conditional access can be enforced to get more context about a user’s authentication (location, device, multi-factor) to make better decisions as to the risk of an authentication. Azure RBAC can then be used to achieve least privilege by granting the minimum set of permissions required. Azure AD and Azure RBAC is the recommended way to authenticate and authorize access to Event Hub due to the additional security features and modern approach to identity.

The data plane has another method to authenticate and authorize into Event Hubs which uses shared access keys. There are few PaaS services in Azure that allow for authentication using shared access keys such as Azure Storage, CosmosDB, and Azure Service Bus. These shared keys are generated at the creation time of the resource and are the equivalent to root-level credentials. These keys can be used to create SAS (shared access signatures) which can then be handed out to developers or applications and scoped to a more limited level of access and set with a specific start and expiration time. This makes using SAS a better option than using the access keys if for some reason you can’t use Azure AD. However, anyone who has done key management knows it’s an absolute nightmare of which you should avoid unless you really want to make your life difficult, hence the recommendation to use Azure AD for both the control plane and data plane.

Whether you’re using Azure AD or SAS, the shared access keys remain a means to access the resource with root-like privileges. While access to these keys can be controlled at the control plane using Azure RBAC, the keys are still there and available for use. Since the usage of these keys when interacting with the data plane is outside Azure AD, it means conditional access controls aren’t an option. Your best bet in locking down the usage of these keys is to restrict access at the control plane as to who can retrieve the access keys and use the network controls available for the service. Event Hubs supports using Private Endpoints, Service Endpoint, and the IP-based restrictions via what I will refer to as the service firewall.

If you’ve used any Azure PaaS service such as Key Vault or Azure Storage, you should be familiar with the service firewall. Each instance of a PaaS service in Azure has a public IP address exposes that service to the Internet. By default, the service allows all traffic from the Internet and the method of controlling access to the service is through identity-based controls provided by the supported authentication and authorization mechanism.

Access to the service through the public IP can be restricted to a specific set of IPs, specific Virtual Networks, or to Private Endpoints. I want quickly to address a common point of confusion for customers; when locking down access to a specific set of Virtual Networks such as available in this interface, you are using Service Endpoints. This list should be empty because there are few very situations where you are required to use a Service Endpoint now because of the availability of Private Endpoints. Private Endpoints are the strategic direction for Microsoft and provide a number of benefits over Service Endpoints such as making the service routable from on-premises over an ExpressRoute or VPN connection and mitigating the risk of data exfiltration that comes with the usage of Service Endpoints. If you’ve been using Azure for any length of time, it’s worthwhile auditing for the usage of Service Endpoints and replacing them where possible.

Service Firewall options

Now this is where the “quirk” of Event Hubs comes in. As I mentioned earlier, most of the Azure PaaS services have a service firewall with a similar look and capabilities as above. If you were to set the the service firewall to the setting above where the “Selected Networks” option is selected and no IP addresses, Virtual Networks, or Private Endpoints have been exempted you would assume all traffic is blocked to the service right? If you were talking about a service such as Azure Key Vault, you’d be correct. However, you’d be incorrect with Event Hubs.

The “quirk” in the implementation of the service firewall for Event Hub is that it is still accessible to the public Internet when the Selected Networks option is set. You may be thinking, well what if I enabled a Private Endpoint? Surely it would be locked down then right? Wrong, the service is still fully accessible to the public Internet. While this “quirk” is documented in the public documentation for Event Hubs, it’s inconsistent to the behaviors I’ve observed in other Microsoft PaaS services with a similar service firewall configuration. The only way to restrict access to the public Internet is to add a single entry to the IP list.

Note in public documentation

So what does this mean to you? Well this means if you have any Event Hubs deployed and you don’t have a public IP address listed in the IP rules, then your Event Hub is accessible from the Internet even if you’ve enabled a Private Endpoint. You may be thinking it’s not a huge deal since “accessible” means open for TCP connections and that authentication and authorization still needs to occur and you have your lovely Azure AD conditional access controls in place. Remember how I covered the shared access key method of accessing the Event Hubs data plane? Yeah, anyone with access to those keys now has access to your Event Hub from any endpoint on the Internet since Azure AD controls don’t come into play when using keys.

Now that I’ve made you wish you wore your brown pants, there are a variety of mitigations you can put in place to mitigate the risk of someone exploiting this. Most of these are taken directly from the security baseline Microsoft publishes for the service. These mitigations include (but are not limited to):

  • Use Azure RBAC to restrict who has access to the share access keys.
  • Take an infrastructure-as-code approach when deploying new Azure Event Hubs to ensure new instances are configured for Azure AD authentication and authorization and that the service firewall is properly configured.
  • Use Azure Policy to enforce Event Hubs be created with correctly configured network controls which include the usage of Private Endpoints and at least one IP address in the IP Rules. You can use the built-in policies for Event Hub to enforce the Private Endpoint in combination with this community policy to ensure Event Hubs being created include at least one IP of your choosing. Make sure to populate the parameter with at least one IP address which could be a public IP you own, a non-publicly routable IP, or a loopback address.
  • Use Azure Policy to audit for existing Event Hubs that may be publicly available. You can use this policy which will look for Event Hub hubs with a default action of Allow or Event Hubs with an empty IP list.
  • Rotate the access keys on a regular basis and whenever someone who had access to the keys changes roles or leaves the organization. Note that rotating the access keys will invalidate any SAS, so ensure you plan this out ahead of time. Azure Storage is another service with access keys and this article provides some advice on how to handle rotating the access keys and the repercussions of doing it.

If you’re an old school security person, not much of the above should be new to you. Sometimes it’s the classic controls that work best. 🙂 For those of you that may want to test this out for yourselves and don’t have the coding ability to leverage one of the SDKs, take a look at the Event Hub add-in for Visual Studio Code. It provides a very simplistic interface for testing sending and receiving messages to an Event Hub.

Well folks, I hope you’ve found this post helpful. The biggest piece of advice I can give you is to ensure you read documentation thoroughly whenever you put a new service in place. Never assume one service implements a capability in the same way (I know, it hurts the architect in me as well), so make sure you do your own security testing to validate any controls which fall into the customer responsibility column.

Have a great week!

A Pattern for App Services with Private Endpoints

A Pattern for App Services with Private Endpoints

Hello again fellow geeks.

Last year Microsoft announced the general availability of Private Endpoint support for App Services. This was a feature I was particularly excited about because when combined with regional Vnet integration, it offered an alternative to an ASE (App Service Environment) for customers who wanted to run internally-facing applications or who didn’t feel comfortable exposing their publicly facing application directly via a public IP.

Over the past year I created a few different labs that demonstrated these features including a web app and function app. Each lab was built around a hub and spoke architecture where the application was serving as an internally-facing application. Given my recent post on Azure Firewall Premium, and the fact I still had the lab environment up and running, I thought it would be interesting to switch the virtual machine out and switch App Services in, which resulted in the lab environment pictured below.

Lab environment

I established the following goals for the lab environment:

  1. Perform IDPS on traffic to the web app
  2. Mediate and inspect traffic initiated from the web app to third-party APIs
  3. Expose a web application to the Internet without giving it a direct public IP address

To accomplish these goals I was able to re-use much of the pattern I described in my last post.

The first step in the process was to create a very simple hub and spoke architecture as pictured above. The environment would consist of a transit Vnet (virtual network), shared services Vnet, and spoke Vnet. The transit Vnet would contain the guts of the solution with an Azure Firewall Premium SKU instance and Azure Application Gateway v2 instance. In the shared services Vnet I used a Windows Server VM (virtual machine) running the DNS Server service and providing DNS resolution for the environment. I could have taken a shortcut here and used Azure Firewall’s DNS proxy capability, but I like to leave the options open for conditional forwarding which Azure Firewall and Azure Private DNS do not support at this time. Finally, in the spoke Vnet I created two subnets. One subnet hosted the private endpoint for App Services and the other subnet was delegated to App Services to support regional Vnet integration.

Once all components were in place, I deployed a simple Python web application I wrote. All it does it queries two public APIs, one for the current time and the other for a random Breaking Bad quote (greatest show of all time!). I took the cheating way out and deployed the app directly via Visual Studio using the Azure App Service add-in prior to deploying the private endpoint and regional Vnet integration capabilities. This allowed me to test the app to validate it was still working as intended while also allowing me to deploy the app from my home machine. Deploying a private endpoint for app services locks down not only access to the running application but also to the SCM (source control management) endpoint that is used for deploying code to the app.

The next step was to create the Azure Private DNS Zone for app services which is named privatelink.azurewebsites.net. I then linked this zone to my shared services Vnet and setup the DNS server running in that Vnet with a standard forward to 168.63.129.16. This setup allows DNS queries made to the Private DNS zone to be resolved by the DNS server. I’ve written extensively about how DNS works with Azure Private Link so I won’t go into any more detail on that flow. Last step in the DNS process is to configure each Vnet to use the DNS server IP in its DNS Server settings. This also needs to configured directly on the Azure Firewall.

Now that the infrastructure would be able to resolve queries to the Private DNS Zone used by App Services, it was time to create the private endpoint for the web app. This registered the appropriate DNS records in the Private DNS Zone for my web app. With the private endpoint created, the last step was to enable Vnet integration.

Private DNS Zone with records for App Service instance

At this point I had all the guts of my solution and it was time to configure traffic to flow the way I wanted it to flow. Before I could begin work in Azure I needed to do the work outside of Azure. This involved creating an A record in my DNS hosting provider to point the public DNS name of my web app to the AGW’s (Application Gateway) public IP. This name can be whatever you want as long as you provide a certificate to the AGW such that it will identify itself as that name to the user. The AGW configuration is very straightforward and this tutorial will get you most of the way there. One thing to note is you’ll need to set the backend to point to the name given to the app service. This allows you to take advantage of the certificate Microsoft automatically provisions to the given app service.

App Gateway backend config

Since I wanted to route traffic destined for the web app through the Azure Firewall Premium instance, I needed to ensure the AGW trusted the certificate served up by Azure Firewall. This is done by modifying the HTTP setting used in the AGW rule for the app. Here you can upload the root certificate that issued the Azure Firewall Premium intermediate certificate.

App Gateway HTTP Setting

Now that the AGW is configured, I needed to create a route table for the subnet the AGW has its private IP address in. In this route table I disabled BGP (border gateway protocol) propagation to ensure the default route since this AGW v2 requires a default route pointing directly to the Internet. Now this is where it gets interesting from a routing perspective. Whenever you create a private endpoint for a service, a system route is added to the route table of all the subnets within the private endpoint’s vnet (virtual network) as well as any vnet the private endpoint’s vnet is peered with. If I want traffic from the AGW to route through the Azure Firewall, I need to override that system route with a /32 UDR. As you can imagine, this can become extremely tedious at scale and even risks hitting the max routes per route table depending on the scale we’re talking about. On the positive side, the issues around this are something Microsoft is aware of so hopefully that means this will be addressed at some point. In the meantime you’ll need to use the /32 route in a pattern such as this.

App Gateway route table

Azure Firewall Premium needs to be configured to allow traffic from the AGW to the web app and inspect this traffic. You can check out my last post for instructions on configured Azure Firewall to perform these activities.

Excellent, work is done right? Nope! If we influence routing on one side, we have to ensure the other side routes the same way. Toss a route table on the private endpoint subnet you say? Sorry, that isn’t supported my friend. Instead I needed to enable the Azure Firewall instance to SNAT (source NAT) traffic to the spoke Vnet. This ensures routing is symmetric and will eliminate any potential connection issues by creating only the UDR on the AGW subnet.

Incoming traffic flow

Lastly, I needed to create and apply a route table to the subnet I delegated to App Services for regional Vnet integration. This route table can be very simple and be configured with a single UDR for the default route pointing to the Azure Firewall private IP. This results in the traffic flow below:

Outgoing Internet traffic flow

This one was another fun pattern to solve. I got a chance to mess around more with AGW and also get a pattern I had theorized would work actually working. I find implementing the pretty pictures I draw helps drive home the benefits and considerations of such a pattern. With this pattern specifically, it suffers from similar considerations as the pattern in my last post. These include:

  • Challenges with observability
  • Operational overhead of certificate management
  • Possible latency issues depending on latency requirements and traffic patterns

This pattern tacks on more operational overhead by requiring that /32 route be added to the AGW route table each time a new app service is provisioned with a private endpoint. Observability further suffers when tracing a packet flow end-to-end due to the additional layer of SNAT required via Azure Firewall. One thing to note about this is you may be able to get around the SNAT requirement for web-specific traffic because of the transparent proxy functionality behind Azure Firewall application rules. I want to highlight I have not tested this myself and I typically tend to SNAT for this use case in my lab because many of my customers may use similar patterns but with a 3rd party firewall.

So there you go folks, another pattern to add to your inventory. Hopefully we’ll see the /32 issue issue private endpoints resolved sometime in the near future. Have a great weekend!