“Infra/Security Stuff” In the Azure OpenAI Service

Welcome back fellow geeks!

The past few months have been crazy busy. My customer load has doubled and customers who went into hibernation for holidays have decided to wake up in full force. With that new demand comes interesting new use cases and blog topics.

Unless you’ve been living under a rock, you’re well aware of the insane amount of innovation and technical development in the AI space. It seems every day there are 10 articles on OpenAI’s models (hilarious South Park episode on ChatGPT recently). Microsoft decided to dive straight into the deep end and formed a partnership with OpenAI. Out of this partnership came the Azure OpenAI Service, which runs OpenAI models like ChatGPT on Azure infrastructure. As you can imagine, this offering has big appeal to new and existing Azure customers.

Given the demand I was seeing within my own customers, I decided to take a look at the security controls (or infra/security stuff as one of my data counterparts calls it) available within the service. Before jumping into the service, I did some basic experimentation with OpenAI’s own service using this wonderful tutorial by Part Time Larry. I found his step-by-step walkthrough of some of the sample code to be absolutely stellar in understanding just how simple it is to interact with the service.

With a very basic (and I do stress basic) understanding of how to interact with OpenAI’s API, I decided upon a use case: using the summarization capability of the davinci GPT-3 model to summarize the NIST document on Zero Trust. I was interested in which key points it would extract from the document and whether those would align with what I drew from the document after reading through it fully (re-reading the doc is still on my todo list!).
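For reference, once you have access to a deployed model, a summarization request is just an HTTP POST against the completions endpoint. Below is a rough sketch using curl; the resource name, deployment name, and api-version are placeholders I made up, so check the current API reference before reusing it.

```shell
# Hypothetical resource and deployment names - substitute your own.
RESOURCE="my-openai-account"
DEPLOYMENT="davinci-summarize"
API_KEY="<your-api-key>"

curl "https://${RESOURCE}.openai.azure.com/openai/deployments/${DEPLOYMENT}/completions?api-version=2022-12-01" \
  -H "Content-Type: application/json" \
  -H "api-key: ${API_KEY}" \
  -d '{
    "prompt": "Summarize the key points of the following text:\n\n<excerpt from NIST SP 800-207>\n\nSummary:",
    "max_tokens": 250,
    "temperature": 0.2
  }'
```

The generated summary comes back in the `choices` array of the JSON response.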

Before I could do any of the cool stuff I had to get onboarded to the service. At this time, customers must request their subscriptions be onboarded into the service using the process described in Microsoft’s public documentation. While I waited for my subscription to be onboarded, I read through the public documentation with a focus on the “infra/security” stuff. Like most of the data services in Azure, the information on the levers customers can pull around security controls like network, encryption-at-rest, and identity was very high level and not very useful. Lots of mentions of features, but no real explanation of how those features would “look” when enabled in the service. There is also the matter of how Microsoft is handling and securing the customer data processed by the service.

Like every cloud provider, Microsoft operates within the shared responsibility model where Microsoft is responsible for the security of the cloud and you, the customer, are responsible for security within the cloud. Simply put, there are controls Microsoft manages behind the scenes and there are controls Microsoft puts in the customer’s hands, and it’s on the customer to enable those controls. Microsoft describes how data is processed and secured for the Azure OpenAI Service in the public documentation. Customers should additionally review the Microsoft Products and Services Data Protection Addendum and the specific product terms.

Another great resource to review is the documentation within the Microsoft Services Trust Portal. In the Trust Portal you can find all the compliance-related documentation such as the SOC 2 Type II report, which provides detail on the processes and controls Microsoft uses to protect data. For a much deeper dive, you can review the FedRAMP SSP (System Security Plan). I typically find myself scanning through the SOC 2 first and then very often diving deeper by reading through the relevant sections of the FedRAMP SSP. I’ll let you read through and consume the documentation above (and you should be doing that for every service you consume). For the purposes of this blog post, I’m going to look at the “security within the cloud”.

I’m a big fan of taking a step back and looking at things from a high-level architectural view. After reading through the documentation, I envisioned the following Azure components being the key components required in any implementation of the service within a regulated industry.

Azure OpenAI Azure Components

Let’s walk through each of these components.

The first component is the Cognitive Services Account. The Azure OpenAI Service falls under the Azure Cognitive Services umbrella, which includes existing services like speech-to-text, image analysis, and the like. This was a great idea by Microsoft because it allows the Product Group (PG) managing the Azure OpenAI Service to leverage existing architectural standards already adopted for other services under the Cognitive Services umbrella. Think of the Cognitive Services Account instance as your slice of the service, serving as a limits and authorization boundary.

The next component is the Azure Key Vault instance. Within an instance of Azure OpenAI Service there are three types of data stored on the Microsoft-managed side. This data includes training data you may provide to fine-tune models, the fine-tuned models themselves, and prompts and completions. This data is encrypted-at-rest by default with Microsoft-managed keys when stored within the Microsoft-managed boundary. This means that Microsoft manages the authorization and rotation of the keys. Many regulated customers have regulatory requirements or internal policies that require the customer to manage authorization and rotation of any keys used to encrypt data in their environment. For that reason, cloud providers such as Microsoft provide the option to use CMKs (Customer Managed Keys). In Azure, these CMKs are stored within an Azure Key Vault instance within a customer’s subscription and the customer controls authorization and access to the keys.
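As a sketch of the customer-side prerequisites (the resource names below are made up), the Key Vault holding a CMK needs purge protection enabled; soft delete is enabled by default on newly created vaults:

```shell
# Key Vault for the CMK; purge protection is required for CMK scenarios.
az keyvault create \
  --name kv-openai-cmk \
  --resource-group rg-openai \
  --enable-purge-protection true

# Create the RSA key that will encrypt the Azure OpenAI data.
az keyvault key create \
  --vault-name kv-openai-cmk \
  --name openai-cmk \
  --kty RSA \
  --size 2048
```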

The Azure OpenAI Service supports the use of CMKs to protect at least two out of three of these sets of data. The documentation is unclear as to whether the prompts and completions can be encrypted with CMKs. If you happen to know, let me know in the comments. Take note that for now you need to request access to get your subscription approved for CMKs with the Azure OpenAI Service.

For those unfamiliar with the OpenAI models (like myself), prompts are the questions or tasks you issue to the AI and completions are its responses. By default, Microsoft stores this data for 30 days to review for abuse (there are lots of malicious use cases one could undertake with these models). Customers do have the option to opt out using the process outlined in this piece of public documentation.

Next up we have virtual networks, private endpoints and Azure Private DNS. Like the rest of the services in the Cognitive Services umbrella, the OpenAI service supports private endpoints as a means to lock down network access to your private IP space. The DNS namespace for the service is privatelink.openai.azure.com. Best practice would have you hosting this zone in Azure Private DNS which we’ll see later on when I share a sample architecture.
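A rough Azure CLI sketch of wiring this up is below. All resource names are hypothetical; `account` is the sub-resource (group id) used for Cognitive Services accounts:

```shell
# Look up the resource id of the Azure OpenAI (Cognitive Services) account.
ACCOUNT_ID=$(az cognitiveservices account show \
  --name my-openai-account --resource-group rg-openai --query id -o tsv)

# Create the private endpoint in the workload virtual network.
az network private-endpoint create \
  --name pe-openai \
  --resource-group rg-openai \
  --vnet-name vnet-workload \
  --subnet snet-privateendpoints \
  --private-connection-resource-id "$ACCOUNT_ID" \
  --group-id account \
  --connection-name pe-openai-conn

# Host the privatelink zone in Azure Private DNS and link it to the VNet.
az network private-dns zone create \
  --resource-group rg-openai \
  --name privatelink.openai.azure.com

az network private-dns link vnet create \
  --resource-group rg-openai \
  --zone-name privatelink.openai.azure.com \
  --name link-workload \
  --virtual-network vnet-workload \
  --registration-enabled false
```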

Next, we have Azure Storage. When uploading training data to the Azure OpenAI Service, you have the option of uploading the data directly from your computer or sourcing it from Azure Storage. There are some limitations to the controls you can exercise over this data which I will cover later in this post.

Lastly, we have managed identities and Azure RBAC. For the service, managed identities are used to access the CMKs stored in the customer Key Vault instance. Azure RBAC is used to control access to the Azure OpenAI Service instance and the keys used to call the service APIs.

Stepping back and looking at the components above and how they fit together to provide security controls across identity, network, and encryption, I see it like the below.

Azure OpenAI Service Security Options

For the Azure OpenAI Service instance running the models, you lock down the service using Azure RBAC. You secure network access by restricting access to the service using private endpoints. Data is optionally encrypted with CMKs stored in a customer-managed Key Vault instance to enable the customer to control access to the keys, rotate keys, and audit usage of those keys.

The Azure Key Vault instance used when customers opt to use CMKs can have access to the keys controlled using Azure RBAC (when the Key Vault instance is enabled for Azure RBAC authorization instead of vault access policies) and managed identities. The Azure OpenAI Service instance will access the CMK using the managed identity assigned to the service. Take note that as of today, you cannot use the Key Vault service firewall to restrict network access. Azure Cognitive Services is not considered a Trusted Azure Service for Key Vault and thus can’t be allowed network access when the service firewall is enabled.
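Assuming a system-assigned managed identity and an RBAC-enabled vault (all names below are made up), the wiring might look like this:

```shell
# Assign a system-assigned managed identity to the account.
az cognitiveservices account identity assign \
  --name my-openai-account \
  --resource-group rg-openai

PRINCIPAL_ID=$(az cognitiveservices account show \
  --name my-openai-account --resource-group rg-openai \
  --query identity.principalId -o tsv)

KV_ID=$(az keyvault show --name kv-openai-cmk --query id -o tsv)

# Grant the identity wrap/unwrap rights on the CMK via Azure RBAC.
az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Crypto Service Encryption User" \
  --scope "$KV_ID"
```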

If the customer chooses to store training data in an Azure Storage Account before uploading it to the service, the account can be secured for user access with Azure RBAC or SAS tokens. Since SAS tokens are a nightmare to manage for humans, you’ll want to control human access to the data using Azure RBAC. The Azure OpenAI Service itself does not support the use of a managed identity for access to Azure Storage today, which means you’ll need to secure the data with a SAS token for non-human access during upload. Because there is no managed identity, the service also can’t take advantage of resource instance network rules, and allowing just the trusted services for Azure Storage didn’t work in my testing either. This means you’ll need to allow public network access to the storage account and rely largely on SAS tokens to secure the data for access coming from the Azure OpenAI Service. Not ideal, but hey, the service is very new.
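If you go the SAS route, a short-lived user delegation SAS scoped to read/list on the training container limits the blast radius. A sketch with hypothetical names (the date syntax assumes GNU date on Linux):

```shell
# SAS expiring two hours from now.
EXPIRY=$(date -u -d "+2 hours" '+%Y-%m-%dT%H:%MZ')

# User delegation SAS (backed by Azure AD rather than the account key),
# limited to read and list on the training data container.
az storage container generate-sas \
  --account-name stopenaitraining \
  --name training-data \
  --permissions rl \
  --expiry "$EXPIRY" \
  --auth-mode login \
  --as-user \
  -o tsv
```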

So putting together everything we’ve learned, what could this look like architecturally?

Azure OpenAI Service Sample Architecture

Above is an example architecture that is common in regulated organizations that have adopted Azure VWAN. In this pattern, all service instances related to the deployment would be placed in a dedicated workload subscription as indicated by the orange outline. This includes the virtual network containing the Azure OpenAI Service private endpoint, the Azure OpenAI Service instance, user-assigned managed identity used by the Azure OpenAI Service instance, the workload key vault containing the CMK used to encrypt the data held by the Azure OpenAI Service, and the Azure Storage Account used to stage training data to be uploaded to the service.

The Azure OpenAI Service would have its network access restricted to the private endpoint. Both the Azure Key Vault instance and Storage Account would have their network access open to public networks. Access to data in Azure Key Vault would be secured with Azure AD authentication and Azure RBAC for authorization. The Azure Storage account would use Azure AD authentication and Azure RBAC to control access for human users and SAS tokens to control access from the Azure OpenAI Service instance.

Lastly, although not listed in the images, it should go without saying that Azure Policy should be put in place to ensure all of the resources look the way you and your security team have decided the resources need to look.

As the service grows and matures, I expect some of these gaps in network controls to be addressed through support for managed identities to access storage accounts and the addition of the service to Azure Key Vault’s trusted services. I also wouldn’t be surprised to see some type of VNet-injection or VNet-integration to be introduced similar to what is available in Azure Machine Learning.

Well folks, I hope this helped you infra and security folks do your “infra/security stuff” for the day and that you now better understand some of the levers and switches available to you to secure the service. As I progress in my learning of the service and AI in general, I plan on adding some posts that walk through the implementation in action with a deeper dive into how this architecture looks when implemented. I have it running in my demo environment, but time is a very limited thing these days.

Thanks folks and I hope your journey into AI has been as fun as mine has been so far!

Application Gateway and Private Link

Welcome back fellow geeks!

Over the past few years I’ve written a ton on Private Endpoints for PaaS (platform-as-a-service) services Microsoft provides. I haven’t written anything about the Private Link service that powers the Private Endpoints. There is a fair amount of community knowledge and documentation on building a Private Link service behind an Azure Load Balancer, but far less on how to do it behind an Application Gateway (Adam Stuart’s video on it is a wonderful resource). Today, I’m going to make an attempt at furthering that collective community knowledge with a post on the feature and give you access to a deployable lab you can use to replicate what I’ll be writing about in this post. Keep in mind the service is still in public preview, so remember to check the latest documentation to validate the correctness of what I discuss below.

Let’s get to it!

I’ll be using a lab environment that I’ve built which mimics a typical enterprise environment. The lab uses a hub-and-spoke architecture where on-premises connectivity and centralized mediation and optional inspection are provided in a transit virtual network which is peered to all spoke virtual networks. A shared services virtual network provides core infrastructure services such as DNS. The other spoke contains the workload, which is a simple Python application deployed in Azure App Services.

The App Service has been configured to inject both its ingress and egress traffic into the virtual network using a combination of Private Endpoints and Regional VNet Integration. An Application Gateway has been placed in front of the App Service and has been deployed with both a public listener (listening on 8443) and a private listener (listening on 443). The application is accessible to internal clients (such as the VMs in the shared service virtual network) by issuing an HTTP request to https://www.jogcloud.com. Azure Private DNS provides the necessary DNS resolution for internal clients.

The deployed Python application retrieves the current time from a public API (assuming the API is up) and returns the source IP on the HTTP request as well as the X-Forwarded-For header. I’ll use this application to show some of the caveats of this pattern that are worth knowing if you ever plan to operationalize it.

To maintain visibility and control of traffic coming in either publicly or privately to the application, the route table assigned to the Application Gateway subnet is configured to route traffic through the Azure Firewall instance in the hub before allowing the traffic to the App Service. This pattern allows for democratization of Application Gateway while maintaining the ability to exercise additional IDS/IPS (intrusion detection/intrusion prevention) via the security appliance in the hub.

Lab Environment

Imagine this application is serving up confidential data and you need to provide a partner organization with access. Your information security team does not want the partner accessing the application over the Internet due to the sensitivity of the information the partner will be accessing. While direct connectivity with the partner is an option, it would likely result in a significant amount of design to ensure the partner’s network only knows about the application IP space and appropriate firewall rules are in place to limit access to the Application Gateway endpoint. In this scenario, your organization will be the provider and the customer’s organization will be the consumer. I don’t know about you, but I’ve been in this situation a lot of times in my past. Back in the day (yeah I’m old, what of it?) you’d have to go the direct connectivity route and you’d spend months putting together a design and getting it approved by the powers that be. Let’s now look at how the new Private Link feature of Application Gateway can make this whole problem a lot easier to solve.

Assume this partner has a presence in Azure so we don’t have to get into the complexity of alternatives (such as building an isolated virtual network with VPN Gateway the partner connects to). The service could be exposed to the customer using the architecture below. Note that I’ve trimmed down the provider environment to show only the workload virtual network and illustrated a few compute services on the consumer end that are capable of accessing services exposed through Private Endpoints.

Goal State

In the above image you will notice a new subnet in the provider’s virtual network. This subnet is used for the Private Link configuration. Traffic entering the provider environment will be NATed to an IP within this subnet. You can opt to use an existing subnet, but I’d recommend dedicating a subnet rather than mixing it into any of the application tier subnets.

There are considerations when sizing the subnet. Each IP allocated to the subnet can be used to service 64,000 connections and you can have up to eight IP addresses as of today allowing you to escape with a /28 (5 IP addresses reserved by Azure + 8 IPs for PrivateLink configuration). Just remember this is preview so that limit could be changed in the future. For the purposes of this post I used a /24 since I’m terrible at subnetting.
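Creating the dedicated subnet is a one-liner; the names and address space below are made up:

```shell
# Dedicated /28 for the Private Link configuration: 5 Azure-reserved IPs
# plus up to 8 NAT IPs fits comfortably in the 16 available addresses.
az network vnet subnet create \
  --resource-group rg-provider \
  --vnet-name vnet-provider \
  --name snet-privatelink \
  --address-prefixes 10.0.5.0/28
```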

New subnet for Private Link Configuration

It’s time to create the Private Link configuration now that the subnet is in place. This can be done in all the usual ways (Portal, CLI, PowerShell, REST). When using the Portal you will need to navigate to the Application Gateway instance you’re using, select the Private Link menu item and select the option to add a new Private Link configuration.

Private Link Configuration Setup

On the next screen you will need to select the subnet you’ll use for the Private Link configuration. You will also pick the listener you want to expose and determine the number of IPs you want to allocate to the service. Note that both the public and private listeners are available. If you’re exposing a service within your virtual network, you’ll likely be creating these with private listeners almost exclusively. A use case for a public listener might be a single client that wants the more consistent network experience provided by their ExpressRoute or VPN connectivity into Azure versus going over the Internet.

Private Link configuration
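The same configuration can be scripted. The command below is a sketch based on the preview `az network application-gateway private-link` command group; flag names may shift while the feature is in preview, and the gateway, configuration, and frontend IP configuration names here are assumptions:

```shell
# Add a Private Link configuration to an existing Application Gateway,
# tied to the private frontend and the dedicated subnet created earlier.
az network application-gateway private-link add \
  --resource-group rg-provider \
  --gateway-name agw-provider \
  --name plink-config \
  --frontend-ip appGwPrivateFrontendIp \
  --subnet snet-privatelink
```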

Once completed, you can freely create Private Endpoints for your service within the same tenant. Within the same tenant, your Private Link service will be detected when creating a Private Endpoint as seen below. All that is left for you to do is create a DNS entry that matches the FQDN you are presenting within the certificates loaded on your Application Gateway. At this point you should be saying, “That’s all well and good Matt, but my use case is providing this to a consumer in a DIFFERENT tenant.” Let’s explore that scenario.

Creating Private Endpoint in same tenant

I switched to a subscription in a separate Azure AD tenant which would represent the consumer. In this tenant I created a virtual network with a single subnet whose IP space overlaps with the provider’s network, demonstrating that overlapping IP space doesn’t matter with Private Link. In that subnet I placed a VM running Ubuntu that I could SSH into. I created these resources in the Australia East region to demonstrate that a service exposed via Private Link can have Private Endpoints created for it in any other Azure region. Connections made through the Private Endpoint will ride the Azure backbone to the destination service.

Once the basics were in place for testing, I then created the Private Endpoint for the provider service within the consumer’s network. This can be done through the Private Link Center blade using the Private Endpoint menu item in the Azure Portal as seen below.

Creation of Private Endpoint

On the resource screen you will need to provide the resource id of the Application Gateway and the listener name. This is additional information you would need to pass to the consumer of any Application Gateway Private Link enabled service.

Private Endpoint Creation – Resource
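On the CLI side, the consumer’s private endpoint creation might look like the below, with the provider’s Application Gateway resource id and listener name plugged in (everything else here is a made-up name). The `--manual-request` flag drives the request-and-approval workflow since this crosses tenants:

```shell
# Consumer side: private endpoint pointed at the provider's App Gateway.
# The provider supplies the resource id and the listener name (group id).
az network private-endpoint create \
  --name pe-partner-app \
  --resource-group rg-consumer \
  --vnet-name vnet-consumer \
  --subnet snet-consumer \
  --private-connection-resource-id "/subscriptions/<provider-sub>/resourceGroups/rg-provider/providers/Microsoft.Network/applicationGateways/agw-provider" \
  --group-id <listener-name> \
  --connection-name partner-app-conn \
  --manual-request true
```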

Bouncing back to the provider tenant, I navigated back to the Application Gateway resource and the Private Link menu item under the Private endpoint connections section. Private Endpoint creation for Private Link services across tenants works via a request and approval process. Here I was able to approve the association of the consumer’s Private Endpoint with the Private Link service in the provider tenant.

Approval of Private Endpoint association

Once approved, I bounced back to the consumer tenant and grabbed the IP address assigned to the Private Endpoint that was created. I then SSH’d into the Ubuntu VM and created a DNS entry in the host file of the VM for the service I was consuming. In this scenario, I had created a listener on the Application Gateway which handles all requests from *.jogcloud.com. Once the DNS record was created, I then used curl to issue a request to the application. Success!

Successful access of application from consumer
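The test itself is nothing fancy; with a placeholder private endpoint IP it amounts to:

```shell
# On the consumer VM: point the service FQDN at the private endpoint IP
# (10.10.0.5 is a placeholder) and test with curl.
echo "10.10.0.5 www.jogcloud.com" | sudo tee -a /etc/hosts
curl -v https://www.jogcloud.com/
```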

The application spits back the client IP and X-Forwarded-For header of the HTTP request. Ignore the client IP; it appears due to the load balancer component of the App Service. Focus instead on the X-Forwarded-For header. Notice that the first value in the header is the NATed IP from the subnet that was dedicated to the Private Link service. The next IP in line is the private IP address of the Azure Firewall instance. As I mentioned earlier, the Application Gateway is configured to send incoming traffic through the Azure Firewall instance for additional inspection before passing it on to the App Service instance. The Azure Firewall is configured for NAT to ensure traffic symmetry in this scenario.

What I want you to take away from the above is that the Private Link service is NATing the traffic, so unless the consumer has a forward web proxy on the other end appending to the X-Forwarded-For header (or potentially other headers to aid with identification), troubleshooting a user’s connection will take careful correlation of requests across App Gateway, Azure Firewall, and the underlying application logs. In the below image, you can see I used curl to add a value to the X-Forwarded-For header which was carried on through the request.

Request with X-Forwarded-For value added by consumer

What I love about this integration is that it’s very simple to set up and it allows a whole bunch of additional security controls to be introduced into the flow, such as the Application Gateway WAF or a firewall’s IDS/IPS features.

Here are some key takeaways for you to ponder on over this holiday break:

  • For HTTP/HTTPS traffic, the Application Gateway Private Link pattern allows for the introduction of additional security controls into the network flow beyond what you’d get with a Private Link service fronted by an Azure Standard Load Balancer
  • Setup for the consumer is very simple. All you need to do is provide them with the resource id of the Application Gateway and the listener name. They can then use the native Private Endpoint creation experience to set up access to your service.
  • Don’t forget the importance of ensuring the customer trusts the certificate the Application Gateway is providing and can reach applicable CRL/OCSP endpoints if you’re using them. Best bet is to use a trusted 3rd party certificate authority.
  • DNS DNS DNS. The customer will need to manage the relevant DNS records on their end. You will want to ensure they know which FQDNs you are including within your certificate so the records they create match those FQDNs. If there is a mismatch, any secure session setup will fail.

With that said, feel free to give the feature a try. You can use the lab I’ve posted on GitHub and the steps I’ve outlined in this blog to experiment with the service yourself.

Have a happy holiday!

Protecting Azure Backups with Resource Guard – Part 1

Hello geeks!

I recently was asked to talk about Azure Backup with a customer. Whenever I’m asked about a service my order of operations is to read through the public documentation, lab it out, talk to peers about it, and then put together key findings, best practices, and a deployable lab. I’ve published the package I put together for Azure Backup on GitHub.

When doing my research into Azure Backup, I came across an interesting limitation. The Recovery Services Vaults (RSVs), which orchestrate and manage storage of the Virtual Machine (VM) backup, must be created in the same subscription as the VMs being backed up. This surprised me, because it puts the resource being backed up and the backup itself within the same authorization boundary.

If you’ve done any work in AWS, you know best practice is to store the backups of EC2 instances in a separate AWS account to ensure you aren’t stacking both the resources and backups in the same security boundary. The Code Spaces hack is a great example of what happens when you don’t do this. In the Azure scenario, I’m forced to take the risk of an attacker gaining Owner-level permissions on the subscription and locking/destroying both my resource and its backup, creating quite a nasty ransomware scenario. What the heck Microsoft?

Thankfully, in 2021 Microsoft introduced a really creative feature to address this risk in the form of Resource Guard. A Resource Guard is an Azure resource that can be created in the same subscription, a different subscription in the same Azure AD tenant, or even a subscription in a different Azure AD tenant! When associated to an RSV, a user looking to make risky modifications to it (such as removing soft delete) must have permissions on BOTH the Resource Guard and the RSV. This means it can support separate authorization boundaries at the subscription level or even completely separate identity and authentication boundaries at the tenant level.

Resource Guard isn’t something I often hear discussed by Microsoft folks when explaining Azure Backup to customers. Given what I’ve explained above, it should become quite obvious this is a critical feature to incorporate into your design if you plan on using Azure Backup.

In the next post in this short series, I’ll walk through a demonstration of the feature in action using the lab in this repository with the addition of a second Azure AD tenant as pictured in the image below. See you next post!

Azure Backup Lab for this series

Revisiting UDR improvements for Private Endpoints

Hello folks! It’s been a busy past few months. I’ve been neck deep in summer activities, customer work, and building some learning labs for the wider Azure community. I finally had some time today to dig into the NSG and improved routing features for Private Endpoints that hit GA (general availability) last month. I had written about the routing changes while the features were in public preview, but I wanted to do a bit more digging now that they are officially GA. In this post I’ll take a closer look at the routing changes and try to clear up some of the confusion I’ve come across about what this feature actually does.

If you work for a company using Azure, likely you’ve come across Private Endpoints. I’ve written extensively about the feature over the course of the past few years covering some of the quirks that are introduced using it at scale in an enterprise. I’d encourage you to review some of those other posts if you’re unfamiliar with Private Endpoints or you’re interested in knowing the challenges that drove feature changes such as the NSG and improved routing features.

At the most basic level, Private Endpoints are a way to control network access to instances of PaaS (platform-as-a-service) services you consume in Microsoft Azure (they can also be used for Private Link services you build yourself). Like most public clouds, every instance of a PaaS service in Azure is by default available over a public IP. While there are some basic layer 3 controls, such as the IP restrictions offered for Azure App Services or the basic firewall that comes with Azure Storage, the service is only accessible directly via its public IP address. From an operations perspective, this can lead to inconsistent performance when users access the services since the access is over an Internet connection. On the security side of the fence, it can make requirements to inspect and mediate the traffic with full-featured security appliances problematic. There can even be a risk of data exfiltration if you are forced to allow access to the Internet for an entire service (such as *.blob.core.windows.net). Additionally, you may have internal policies driven by regulation that restrict sensitive data to being accessible only within your more heavily controlled private network.

PaaS with no Private Endpoint

Private Endpoints help solve the issues above by creating a network endpoint (virtual network interface) for the instance of your PaaS service inside of your Azure VNet (virtual network). This can help provide consistent performance when accessing the application because the traffic can now flow over an ExpressRoute Private Peering versus the user’s Internet connection. Now that traffic is flowing through your private network, you can direct that traffic to security appliances such as a Palo Alto to centrally mediate, log, and optionally inspect traffic up to and including at layer 7. Each endpoint is also for a specific instance of a service, which can mitigate the risk of data exfiltration since you could block all access to a specific Azure PaaS service if accessed through your Internet connection.

PaaS with Private Endpoint

While this was possible prior to the new routing improvements that went into GA in August, it was challenging to manage at scale. I cover the challenge in detail in this post, but the general gist of it is the Azure networking fabric creates a /32 system route in each subnet within the virtual network where the Private Endpoint is placed as well as any directly peered VNets. If you’re familiar with the basics of Azure routing you’ll understand how this could be problematic in the situation where the traffic needs to be routed through a security appliance for mediation, logging, or inspection. To get around this problem customers had to create /32 UDRs (user-defined route) to override this system route. In a hub and spoke architecture with enough Private Endpoints, this can hit the limit of routes allowed on a route table.

An example of an architecture that historically solved for this is shown below. If you have a user on-premises (A) trying to get to a Private Endpoint in the spoke (H) through the Application Gateway (L), and you have a requirement to inspect that traffic via a security appliance (F, E), you need to create a /32 route on the Application Gateway’s subnet to direct the traffic back to the security appliance. If that traffic is instead destined for some other type of service that isn’t fronted by an App Gateway (such as a Log Analytics Workspace or Azure SQL instance), those UDRs need to be placed on the route table of the Virtual Network Gateway (B). The latter scenario is where scale and SNAT (see my other post for detail on this) can quickly become a problem.
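For illustration, one of those per-endpoint UDRs would have looked something like the below (resource names and IPs are placeholders):

```shell
# Historical workaround: a /32 UDR per Private Endpoint forcing traffic
# back through the hub firewall appliance.
az network route-table route create \
  --resource-group rg-hub \
  --route-table-name rt-appgw \
  --name pe-app1 \
  --address-prefix 10.1.2.4/32 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.0.1.4
```

Multiply that by every Private Endpoint in the environment and the route table limits mentioned above come into play quickly.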

Common workaround for inspection of Private Endpoint traffic

To demonstrate the feature, I’m going to use my basic hub and spoke lab with the addition of an App Service running a very basic Python Flask application I wrote to show header and IP information from a web request. I’ve additionally setup a S2S VPN connection with a pfSense appliance I have running at home which is exchanging routes via BGP with the Virtual Network Gateway. The resulting lab looks like the below.

Lab environment

Since Microsoft still has no simple way to enumerate effective routes without a VM’s NIC being in the subnet, and I wanted to see the system routes that the Virtual Network Gateway was getting (az network vnet-gateway list-learned-routes will not do this for you), I created a new subnet and plopped a VM into it. Looking at the route table, the /32 route for the Private Endpoint was present.
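If you want to repeat this check in your own lab, the effective routes can be pulled from the throwaway VM's NIC with the CLI. The resource names here are placeholders for whatever you called the temporary VM's NIC and its resource group:

```shell
# Effective routes can only be read from a NIC deployed in the subnet,
# hence the need for the temporary VM in the first place.
az network nic show-effective-route-table \
  --resource-group rg-transit \
  --name vm-routecheck-nic \
  --output table
```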

Private Endpoint /32 route

Since this was temporary and I didn’t want to mess with DNS in my on-premises lab, I created a host file entry on the on-premises machine for the App Service’s FQDN pointing to the Private Endpoint IP address. I then accessed the service from a web browser on that machine. The contents of the web request show the IP address of my machine as expected because my traffic is entering the Azure networking plane via my S2S VPN and going immediately to the Private Endpoint for the App Service.
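For anyone recreating the test, the hosts file hack looks something like the below on a Linux box (the FQDN and IP are placeholders for your App Service's default hostname and Private Endpoint IP; Windows users would edit C:\Windows\System32\drivers\etc\hosts instead):

```shell
# Fakes the DNS answer so the App Service FQDN resolves to the
# Private Endpoint IP. The scm entry is only needed if you also
# want to reach the Kudu site over the Private Endpoint.
echo "10.1.2.4  app-demo.azurewebsites.net" | sudo tee -a /etc/hosts
echo "10.1.2.4  app-demo.scm.azurewebsites.net" | sudo tee -a /etc/hosts
```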

Request without new Private Endpoint features turned on

As I covered earlier, prior to these new features being introduced, getting this traffic to flow through my Azure Firewall instance would have required creating a /32 UDR on the Virtual Network Gateway's route table and SNATing at the firewall to ensure traffic symmetry (the SNAT component is covered in a prior post). The new feature lifts the requirement for the /32 route, but in a very interesting way.

The golden rule for networking has long been that the most specific route is the preferred route. For example, in Azure the /32 system route for the Private Endpoint will be the preferred route even if you put in a static route for the subnet's CIDR block (a /24, for example). The new routing feature for Private Endpoints does not follow this rule, as we'll see.

Support for NSGs and the routing improvements for Private Endpoints are disabled by default. Each subnet in a VNet has a property called privateEndpointNetworkPolicies, which is set to Disabled by default. Flipping this property from Disabled to Enabled turns on the new behavior. One thing to note is you only have to enable it on the subnet containing the Private Endpoint.
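Flipping the property can be done through the CLI with something like the below (resource names are placeholders for your own; depending on your CLI version the flag may instead be a direct privateEndpointNetworkPolicies parameter, so check az network vnet subnet update --help):

```shell
# Enables NSG and routing support for Private Endpoints in this subnet
# by setting privateEndpointNetworkPolicies to Enabled.
az network vnet subnet update \
  --resource-group rg-workload \
  --vnet-name vnet-workload \
  --name snet-app \
  --disable-private-endpoint-network-policies false
```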

In my lab environment I swapped the property for the snet-app subnet in the workload VNet. Looking back at the route table for the VM in the transit virtual network, we now see that the /32 route has been invalidated. The /16 route pointing all traffic destined for the workload VNet to the Azure Firewall is now the route the traffic takes, which allows me to mediate and optionally inspect that traffic.

Route table after privateEndpointNetworkPolicies property enabled on Private Endpoint subnet

Refreshing the web page from the on-premises VM now shows a source IP from the Azure Firewall subnet. Take note that I have an application rule in place in Azure Firewall, which means it uses its transparent proxy feature to ensure traffic symmetry. If I had a network rule in place instead, I'd have to ensure Azure Firewall is SNATing my traffic (which it won't do by default for RFC1918 traffic). While some services (Azure Storage being one of them) will work without SNAT with Private Endpoints, it's best practice to SNAT since all other services require it. That requirement will likely be addressed in a future release.
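If you go the network rule route and need the firewall to SNAT RFC1918 destinations, one approach is to narrow the firewall's "private ranges" so nothing matches and everything gets SNATed. A sketch, with placeholder names (verify the parameter against your CLI version and against Microsoft's SNAT private ranges documentation before relying on it):

```shell
# Narrows Azure Firewall's private address ranges to a single unused
# /32, so traffic to RFC1918 destinations falls outside them and is
# SNATed like any other traffic.
az network firewall update \
  --resource-group rg-transit \
  --name fw-hub \
  --private-ranges 255.255.255.255/32
```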

Request with new routing features enabled

While the support for NSGs for Private Endpoints is awesome, the routing improvements are a feature that shouldn’t be overlooked. Let me summarize the key takeaways:

  • Routing improvements (the docs call it UDR support, which I think is a poor and confusing description) for Private Endpoints are officially generally available.
  • SNAT remains required and a best practice for traffic symmetry, ensuring return traffic from Private Endpoints takes the same route back to the user.
  • The privateEndpointNetworkPolicies property only needs to be set on the subnet containing the Private Endpoints. The routing improvements will then be active for those Private Endpoints for any route table assigned to a subnet within the Private Endpoint's VNet or any directly peered VNets.
  • Even though the /32 route is still there, it is now invalidated by a less specific UDR when this setting is enabled on a Private Endpoint's subnet. You could create a UDR for the subnet CIDR containing the Private Endpoints or for the entire VNet as I did in this lab. Remember, this is an exception to the route specificity rule.

Well folks, that sums up this post. Hopefully you got some value out of it!