Azure OpenAI Service – Controlling Outbound Access

Hello again folks! Work has been complete insanity and has been preventing me from posting as of late. Finally, I was able to carve out some time to get a quick blog post done that I’ve been sitting on for a while.

I have blogged extensively on the Azure OpenAI Service (AOAI) and today I will continue with another post in that series. These days you folks are more likely using AOAI through the new Azure AI Services resource versus using the AOAI resource. The good news is the guidance I will provide tonight will be relevant to both resources.

One topic I often get asked to discuss with customers is the “old person” aspects of the service, such as the security controls Microsoft makes available to customers when using the AOAI service. Besides the identity-based controls, the most common security control that pops up in the regulated space is the available networking controls. While inbound network controls exercised through Private Endpoints or the service firewall (sometimes called the IP firewall) are common, one of the often-missed controls within the AOAI service is the outbound network controls. Oddly enough, this control is frequently overlooked in non-compute PaaS.

You may now be asking yourself, “Why the heck would the AOAI service need to make its own outbound network connections and why should I care about it?”. Excellent question, and honestly, not one I thought about much when I first started working with the service, because the use cases I’m going to discuss either didn’t exist yet or weren’t commonly used. There are two main use cases I’m going to cover (note there are likely others and these are simply the most common):

The first use case is easily the most common right now. The “chat with your data” feature within AOAI allows you to pass some extra information in your ChatCompletion API call that instructs the AOAI service to query an AI Search index you have populated with data from your environment, extending the model’s knowledge base without completely retraining it. This is essentially a simple way to experiment with a retrieval-augmented generation (RAG) pattern without having to write the code to orchestrate the interaction between the two services such as detailed in the link above. Instead, the “chat with your data” feature handles the heavy lifting for you if you have a populated index and are willing to add a few additional lines of code. In a future article I’ll go into more depth on this pattern because the complete network and identity interaction is pretty interesting and often misconfigured. For now, understand the basics of it with the flow diagram below. Here also is some sample code if you want to play around with it yourself.

The second use case is when using a multimodal model like GPT-4 or GPT-4o. These models allow you to pass them other types of data besides text, such as images and audio. When requesting that an image be analyzed, you have the option of passing the image as base64 or passing a URL. If you pass a URL, the AOAI service will make an outbound network connection to the endpoint specified in the URL to retrieve the image for analysis.

import os

from openai import AzureOpenAI

## Client setup assumes the endpoint, key, and deployment name are in environment variables
client = AzureOpenAI(
    azure_endpoint=os.getenv('AOAI_ENDPOINT'),
    api_key=os.getenv('AOAI_KEY'),
    api_version='2024-02-01'
)

response = client.chat.completions.create(
    ## Model must be a multimodal model
    model=os.getenv('LLM_DEPLOYMENT_NAME'),
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the image"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "{{SOME_PUBLIC_URL}}"}
                }
            ]
        }
    ],
    max_tokens=100
)

In both of these scenarios the AOAI service establishes a network connection from the Microsoft public backbone to the resource (such as AI Search in scenario 1 or a public blob in scenario 2). Unlike compute-based PaaS (App Services, Functions, Azure Container Apps, AKS, etc.), today Microsoft does not provide a means for you to send this outbound traffic from AOAI through your virtual network with virtual network injection or virtual network integration. Given that you can’t pass this traffic through your virtual network, how can you mitigate potential data exfiltration risks or poisoning attacks? For example, let’s say an attacker compromises an application’s code and modifies it such that the “chat with your data” feature uses an attacker’s instance of AI Search to capture sensitive data in the queries or poison the responses back to the user with bad data. Maybe an attacker decides to use your AOAI instance to process images stolen from another company and placed on a public endpoint. I’m sure someone more creative could come up with a plethora of attacks. Either way, you want to control what your resources communicate with. The positive news is there is a way to do this today and likely an even better way to do this tomorrow when it comes to the AOAI service.

The AOAI (and AI Services) resources fall under the Cognitive Services framework. The benefit of falling within the scope of this framework is that they inherit some of its security capabilities, such as support for Private Endpoints or the ability to disable local key-based authentication. Another capability available to AOAI is an outbound network control. On an AOAI or AI Services resource, you can configure two properties to lock down the service’s ability to make outbound network calls. These two properties are:

  • restrictOutboundNetworkAccess – Boolean and set to true to block outbound access to everything but the exceptions listed in the allowedFqdnList property
  • allowedFqdnList – A list of FQDNs the service should be able to communicate with for outbound network calls

Using these two controls you can prevent your AOAI or AI Services resource from making outbound network calls except to the list of FQDNs you include. For example, you might whitelist your AI Search instance FQDN for the “chat with your data” feature or your blob storage account endpoint for image analysis. This is a feature I’d highly recommend you enable by default on any new AOAI or AI Services resource you provision moving forward.
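If you’re not working in Terraform, the same two properties can be set with a PATCH against the account’s ARM resource. Here’s a minimal Python sketch of just the request body; the function name and FQDN values are illustrative, not part of any SDK:

```python
def build_outbound_restriction_body(allowed_fqdns):
    """Build the properties payload for a PATCH against a
    Cognitive Services (AOAI / AI Services) account ARM resource."""
    return {
        "properties": {
            # Block all outbound calls except to the allow list below
            "restrictOutboundNetworkAccess": True,
            # FQDNs the service may still reach (e.g. your AI Search instance)
            "allowedFqdnList": allowed_fqdns,
        }
    }

body = build_outbound_restriction_body([
    "mysearch.search.windows.net",       # "chat with your data" target
    "mystorage.blob.core.windows.net",   # image analysis source
])
```

You could pass this body via `az rest`, the REST API directly, or an SDK of your choosing; the property names are what matter.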

The good news for those of you in the Terraform world is this feature is available directly within the azurerm provider as seen in a sample template below.

resource "azurerm_cognitive_account" "openai" {
  name                = "${local.openai_name}${var.purpose}${var.location_code}${var.random_string}"
  location            = var.location
  resource_group_name = var.resource_group_name
  kind                = "OpenAI"

  custom_subdomain_name = "${local.openai_name}${var.purpose}${var.location_code}${var.random_string}"
  sku_name              = "S0"

  public_network_access_enabled = var.public_network_access
  outbound_network_access_restricted = true
  fqdns = var.allowed_fqdn_list

  network_acls {
    default_action = "Deny"
    ip_rules = var.allowed_ips
    bypass = "AzureServices"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags

  lifecycle {
    ignore_changes = [
      tags["created_date"],
      tags["created_by"]
    ]
  }
}

If a user attempts to circumvent these controls, they will receive a descriptive error stating that outbound access is restricted. For those of you operating in a regulated environment, you should be slapping this setting on every new AOAI or AI Services instance you provision, just like you’re controlling inbound access with a Private Endpoint.

Alright folks, that sums up this quick blog post. Let me summarize the lessons learned:

  1. Be aware of which PaaS Services in Azure are capable of establishing outbound network connectivity and explore the controls available to you to restrict it.
  2. For AOAI and AI Services use the restrictOutboundNetworkAccess and allowedFqdnList properties to block outbound network calls except to the endpoints you specify.
  3. Make controlling outbound access of your PaaS a baseline control. Don’t just focus on inbound, focus on both.

Before I close out, recall that I mentioned a way today (the above) and a way in the future. The latter will be the new feature Microsoft recently announced into public preview: Network Security Perimeters. As that feature matures and its list of supported services expands, controlling inbound and outbound network access for PaaS (and getting visibility into it via logs) is going to get far easier. Expect a blog post on that feature in the near future.

Thanks folks!

Deploying Resources Across Multiple Azure tenants

Hello fellow geeks! Today I’m going to take a break from my AI Foundry series and save your future self some time by walking you through a process I had to piece together from disparate links, outdated documentation, and experimentation. Despite what you hear, I’m a nice guy like that!

The Problem

Recently, I’ve been experimenting with AVNM (Azure Virtual Network Manager) IPAM (IP address management) solution which is currently in public preview. In the future I’ll do a blog series on the product, but today I’m going to focus on some of the processes I went through to get a POC (proof-of-concept) working with this feature across two tenants. The scenario was a management and managed tenant concept where the management tenant is the authority for the pools of IP addresses the managed tenant can draw upon for the virtual networks it creates.

Let’s first level set on terminology. When I say Azure tenant, what I’m really talking about is the Entra ID tenant the Microsoft Azure subscriptions are associated with. Every subscription in Azure can be associated with only one Entra ID tenant. Entra ID provides identity and authentication services to associated Azure subscriptions. Note that I excluded authorization, because Azure has its own authorization engine in Azure RBAC (role-based access control).

Relationship between Entra ID tenant and Azure resources

Without deep diving into AVNM, its IPAM feature uses the concept of “pools,” which are collections of IP CIDR blocks. Pools can have a parent and child relationship where one large pool can be carved into smaller pools. Virtual networks in the same region as the pool can be associated with these pools (either before or after creation of the virtual network) to draw down on the CIDR blocks associated with the pool. You also have the option of creating an object called a Static CIDR, which can be used to represent the consumption of IP space on-premises or in another cloud. As resources are provisioned into associated virtual networks, IPAM will report how many of the allocated IP addresses in a specific pool are being used. This allows you to track how much IP space you’ve consumed across your Azure estate.
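AVNM does all of this bookkeeping for you, but the parent/child carving concept is easy to picture with Python’s standard ipaddress module. This is purely an illustration of the idea, not the AVNM API:

```python
import ipaddress

# A parent pool, carved into child pools the way AVNM IPAM lets one
# large CIDR block be subdivided across business units or tenants.
parent_pool = ipaddress.ip_network("10.0.0.0/16")

# Carve the /16 into four /18 child pools
child_pools = list(parent_pool.subnets(new_prefix=18))

# Each child pool can then hand out address space to virtual networks;
# a /18 gives each child 16,384 addresses to draw down on.
for child in child_pools:
    print(child, child.num_addresses)
```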

AVNM IPAM Resources

My primary goal in this scenario was to create a virtual network in TenantB that would draw down on an AVNM address pool in TenantA. This way I could emulate a parent company managing the IP allocation and usage across its many child companies, which could be spread across multiple Azure tenants. To do this I needed to:

  1. Create an AVNM instance in TenantA
  2. Setup the cross-tenant AVNM feature in both tenants.
  3. Create a multi-tenant service principal in TenantB.
  4. Create a stub service principal in TenantA representing the service principal in TenantB.
  5. Grant the stub service principal the IPAM Pool User Azure RBAC role.
  6. Create a new virtual network in TenantB and reference the IPAM pool in TenantA.

My architecture would look similar to the image below.

Multi-Tenant AVNM IPAM Architecture

With all that said, I’m now going to get into the purpose of this post, which is focusing on steps 3, 4, and 6.

Multi-Tenant Service Principals

Service principals are objects in Entra ID used to represent non-human identities. They are similar to an AWS IAM user but cannot be used for interactive login. The whole purpose is non-interactive login by a non-human. Yes, even the Azure Managed Identity you’ve been using is a service principal under the hood.

Unlike Entra ID users, service principals can’t be added to another tenant through the Entra B2B feature. To make a service principal available across multiple tenants you need to create what is referred to as a multi-tenant service principal. A multi-tenant service principal has an authoritative tenant (I’ll refer to this as the trusted tenant) where the service principal exists as an object with a credential. The service principal has an attribute named appId, which is a unique GUID representing the app across all of Entra. Other tenants (I’ll refer to these as trusting tenants) can then create a stub service principal in their tenant by specifying this appId at creation. Entra ID will represent this stub service principal in the trusting tenant as an Enterprise Application.

Multi-Tenant Service Principal in Trusted Tenant

For my use case I wanted to have my multi-tenant service principal stored authoritatively in TenantB (the managed tenant) because that is where I would be deploying my virtual network via Terraform. I had an existing service principal I was already using, so I ran the command below to update it to support multi-tenancy. The signInAudience attribute is what dictates whether a service principal is single-tenant (AzureADMyOrg) or multi-tenant (AzureADMultipleOrgs).

az login --tenant <TENANTB_ID>
az ad app update --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --set signInAudience=AzureADMultipleOrgs

Once my service principal was updated to a multi-tenant service principal I next had to provision it into TenantA (management tenant) using the command below.

az login --tenant <TENANTA_ID>
az ad sp create --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX"

The id parameter in each command is the appId property of the service principal. By creating a new service principal in TenantA with the same appId I am creating the stub service principal for my service principal in TenantB.

Many of the guides you’ll find online will tell you that you need to grant Admin Consent. I did not find this necessary. I’m fairly certain it’s not necessary because the service principal does not need any tenant-wide permissions and won’t be acting on behalf of any user. Instead, it will exercise its direct permissions against the ARM API (Azure Resource Manager) based on the Azure RBAC role assignments created for it.

Once these commands were run, the service principal appeared as an Enterprise Application in TenantA. From there I was able to log into TenantA and create an RBAC role assignment associating the IPAM Pool User role to the service principal.

Creating The New VNet in TenantB with Terraform… and Failing

At this point I was ready to create the new virtual network in TenantB with an address space allocated from an IPAM pool in TenantA. Like any sane human being writing Terraform code, my first stop was the reference document for the AzureRm provider. Sadly, my spirits were quickly crushed (as often happens with that provider): the provider version (4.21.1) for virtual networks does not yet support the ipamPoolPrefixAllocations property. I chatted with the product group and support for it will be coming soon.

When the AzureRm provider fails (as it feels like it often does with any new feature), my fallback was the AzApi provider. Given that AzApi is a very light overlay on top of the ARM REST API, I was confident I’d be able to use it with the proper ARM REST API version to create my virtual network. I wrote my code and ran my terraform apply only to run into an error.

Forbidden({"error":{"code":"LinkedAuthorizationFailed","message":"The client has permission to perform action 'Microsoft.Network/networkManagers/ipamPools/associateResourcesToPool/action' on scope '/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting', however the current tenant '6c80de31-d5e4-4029-93e4-XXXXXXXXXXXX' is not authorized to access linked subscription '11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX'."}})

When performing cross-tenant activities via the ARM API, the platform needs to authenticate the security principal to both Entra tenants. From a raw REST API call this can be accomplished by adding the x-ms-authorization-auxiliary header to the headers in the API call. In this header you include a bearer token for the second Entra ID tenant that you need to authenticate to.

Both the AzureRm and AzApi providers support this feature through the auxiliary_tenant_ids property of the provider. Passing that property will cause REST calls to be made to the Entra ID login endpoints for each tenant to obtain an access token. The tenant specified in auxiliary_tenant_ids has its bearer token passed in the API calls in the x-ms-authorization-auxiliary header. Well, that’s the way it’s supposed to work. However, after some Fiddler captures, I noticed it was not happening with AzApi v2.1.0 and v2.2.0. After some Googling I turned up a GitHub issue where someone was reporting this as far back as February 2024. It was supposedly resolved in v1.13.1, but I noticed a person posting just a few weeks ago that it was still broken. My testing also seemed to indicate it is still busted.
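To make the mechanics concrete, here’s a rough Python sketch of what the provider is supposed to do under the hood: the primary tenant’s token goes in the standard Authorization header and the auxiliary tenant’s token goes in x-ms-authorization-auxiliary. Token acquisition is stubbed out here; in practice you’d obtain each token from its tenant (e.g., with azure-identity) and pass these headers to an HTTP client:

```python
def build_cross_tenant_headers(primary_token, auxiliary_token):
    """Headers for an ARM call that touches resources in two tenants.
    The auxiliary token authenticates the caller to the second tenant."""
    return {
        # Token from the tenant the deployment targets
        "Authorization": f"Bearer {primary_token}",
        # Token from the tenant holding the linked resource (the IPAM pool)
        "x-ms-authorization-auxiliary": f"Bearer {auxiliary_token}",
        "Content-Type": "application/json",
    }

# Placeholder tokens for illustration only
headers = build_cross_tenant_headers("<tenantB-token>", "<tenantA-token>")
```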

What to do now? My next thought was to use the AzureRm provider and pass an ARM template using the azurerm_resource_group_deployment module. I dug deep into the recesses of my brain to surface my ARM template skills and whipped up a template. I typed in terraform apply and crossed my fingers. My Fiddler capture showed both access tokens being retrieved (YAY!) and being passed in the underlying API call, but I had forgotten that ARM templates do not support referencing resources outside the Entra ID tenant the deployment is being pushed to. Foiled again.

My only avenue left was a REST API call. For that I used the az rest command (greatest thing since sliced bread for hitting ARM endpoints). Unlike PowerShell, the az CLI does not support any special option for auxiliary tenants. Instead, I needed to run az login against each tenant and store the second tenant’s bearer token in a variable.

az login --service-principal --username "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --password "CLIENT_SECRET" --tenant "<TENANTB_ID>"

az login --service-principal --username "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --password "CLIENT_SECRET" --tenant "<TENANTA_ID>"

auxiliaryToken=$(az account get-access-token \
--resource=https://management.azure.com/ \
--tenant "<TENANTA_ID>" \
--query accessToken -o tsv)

Once I had my bearer tokens, the next step was to pass my REST API call.

az rest --method put \
--uri "https://management.azure.com/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting?api-version=2022-07-01" \
--headers "x-ms-authorization-auxiliary=Bearer ${auxiliaryToken}" \
--body '{
  "location": "centralus",
  "properties": {
    "addressSpace": {
      "ipamPoolPrefixAllocations": [
        {
          "numberOfIpAddresses": "100",
          "pool": {
            "id": "/subscriptions/11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX/resourceGroups/rg-avnm-test/providers/Microsoft.Network/networkManagers/test/ipamPools/main"
          }
        }
      ]
    }
  }
}'

I received a 200 status code, which meant my virtual network was created successfully. Sure enough, the new virtual network appeared in TenantB, and in TenantA I saw the virtual network associated with the IPAM pool.

Summing It Up

Hopefully the content above saves someone from wasting far too much time trying to get cross-tenant stuff to work in a non-interactive manner. While my end solution isn’t what I’d prefer to do, it was my only option due to the issues with the Terraform providers. Hopefully, the issue with the AzApi provider is remediated soon. For AVNM IPAM specifically, AzureRm provider support will be here soon, so the usage of AzApi will likely not be necessary.

What I hope you took out of this is a better understanding of how cross tenant actions like this work under the hood from an identity, authentication, and authorization perspective. You should also have a better understanding of what is happening (or not happening) under the hood of those Terraform providers we hold so dear.

TLDR;

When performing cross tenant deployments here is your general sequence of events:

  1. Create a multi-tenant service principal in Tenant A.
  2. Create a stub service principal in Tenant B.
  3. Grant the stub service principal in Tenant B the appropriate Azure RBAC permissions.
  4. Obtain an access token from both Tenant A and Tenant B.
  5. Package one of the access tokens in the x-ms-authorization-auxiliary header in your request and make your request. You can use the az rest command like I did above or use another tool. Key thing is to make sure you pass it in addition to the standard Authorization header.
  6. ???
  7. Profit!

Thanks for reading!

AI Foundry – The Basics

This is a part of my series on AI Foundry:

Happy New Year! Over the last few months of 2024, I was buried in AI Foundry (FKA Azure AI Studio. Hey marketing needs to do something a few times a year.) proof-of-concepts with my best buddy Jose. The product is very new, very powerful, but also very complex to get up and running in a regulated environment. Through many hours labbing scenarios out and troubleshooting with customers, I figured it was time to share some of what I learned.

So what is AI Foundry? If you want the Microsoft explanation of what it is, read the documentation. Here you’ll get the Matt Felton opinion on what it is (god help you). AI Foundry is a toolset intended to help AI Engineers build Generative AI applications. It allows them to interact with the LLMs (large language models) and build complex workflows (via Prompt Flows) in a no code (like Chat Playground) or low code (prompt flow interface) environment. As you can imagine, this is an attractive tool to get the people who know Generative AI quick access to the LLMs so use cases can be validated before expensive development cycles are spent building the pretty front-end and code required to make it a real application.

Before we get into the guts of the use cases Jose and I have run into, I want to start with the basics of how the hell this service is setup. This will likely require a post or two, so grab your coffee and get ready for a crash course.

AI Foundry is built on top of AML (Azure Machine Learning). If you’ve ever built out a locked-down AML instance, you have some understanding of the many services that work together to provide the service. AI Foundry inherits these components, provides a sleek user interface on top, and typically requires additional resources like Azure OpenAI and AI Search (for RAG use cases). Like AML, there are lots and lots of pieces that you need to think about, plan for, and implement in the correct manner to make a secured instance of AI Foundry work.

One specific feature of AML plays an important role within AI Foundry, and that is the concept of a hub workspace. A hub is an AML workspace that centrally manages the security, networking, compute resources, and quota for child AML workspaces. These child AML workspaces are referred to as projects. The whole goal of the hub is to make it easier for your various business units to do the stuff they need to do with AML/AI Foundry without having to manage the complex pieces like security and networking. My guidance would be to give each business unit an AI Foundry hub where they can group projects of similar environment type (prod or non-prod) and data sensitivity.

General relationship between AI Foundry Hub and AI Foundry Projects

Ok, so you get the basic gist of this. When you deploy an AI Foundry instance, behind the scenes an AML workspace designated as a hub is created. Each project you create is a “child” AML workspace of the hub workspace and will inherit some resources from the hub. Now that you’re grounded in that basic piece, let’s talk about the individual components.

The many resources involved with a secured AI Foundry instance

As you can see in the above image, there are a lot of components of this solution that you will likely use if you want to deploy an AI Foundry instance that has the necessary security and networking controls. Let me give you a quick and dirty explanation of each component. I’ll dive deeper into the identity, authorization, and networking aspects of these components in future posts.

  • Managed identities
    • There are lots of managed identities in use with this product. There is a managed identity for the hub, a managed identity for the project, and managed identities for the various compute. One of the challenges of AI Foundry is knowing which managed identity (and if not a managed identity, the user’s identity) is being used to access a specific resource.
  • Azure Storage Account
    • Just like in AML, there is a default storage account associated with the workspace. Unlike with a traditional AML workspace you may be familiar with, the hub feature allows all project workspaces to leverage the same storage account. The storage account is used by the workspaces to store artifacts like logs in blob storage and artifacts like prompt flows in file storage. The hub and projects isolate their data to specific containers (for blob) and folders (for file) with Azure ABAC (holy f*ck a use case for this feature finally) setup such that the managed identities for the workspaces can only access containers/folders for data related to their specific workspace.
  • Azure Key Vault
    • The Azure Key Vault will store any keys used for connections created from within the AI Foundry project. This could be keys for the default storage account or keys used for API access to models you deploy from the model catalog.
  • Azure Container Registry
While this is deemed optional, I’d recommend you plan on deploying it. When you deploy a prompt flow that uses certain functionality on the managed compute, the container image used isn’t the default runtime, and an ACR instance will be spun up automatically without all the security controls you’ll likely want.
  • Azure OpenAI Service
    • This is used for deployment of OpenAI and some Microsoft chat and embedding models
  • Azure AI Service
This can be used as an alternative to the Azure OpenAI Service. It has some additional functionality beyond just hosting the models, such as speech and the like.
  • Azure AI Search
    • This will be used for anything RAG related. Most likely you’ll see it used with the Chat With Your Data feature of the Chat Playground.
  • Managed Network
    • This is used to host the compute instances, serverless endpoints and managed online endpoints spun up for compute operations within AI Foundry. I’ll do a deep dive into networking within the service in a future post.
  • Azure Firewall
    • If building a secured instance, you’re going to use a managed virtual network that is locked down to all Internet access via outbound rules. Under the hood an Azure Firewall instance (standard for almost all use cases) will be spun up in the managed virtual network. You interact with it through the creation of the outbound rules and can’t directly administer it. However, you will be paying for it.
  • Role Assignments
    • So so so many role assignments. I’ll cover these in a future post.
  • Azure Private DNS
    • Used heavily for interacting with the AI Foundry instance and models/endpoints you deploy. I’ll cover this in an upcoming post.

Are you frightened yet? If not you should be! Don’t worry though, over this series I’ll walk you through the pain points of getting this service up and running. Once you get past the complex configuration, it’s a crazy valuable service that you’ll have a high demand for from your business units.

In the next post I’ll walk through the complexities of authorization within AI Foundry.

Azure Networking – Inspecting traffic to Private Endpoints Revisited… Again. Maybe for the last time?

Update 11/4/2024 – Added limitations
Update 10/11/2024 – Updated with generally available announcement

Welcome back! Today I’m going to step back from the Generative AI world and talk about some good ole networking. Networking is one of those technical components of every solution that gets glossed over until the rubber hits the road and the application graduates to “production-worthy”. Sitting happily beside security, it’s the topic I’m most often asked to help out with at Microsoft. I’m going to share a new feature that has gone generally available under the radar that is pretty damn cool, even if a bit confusing.

Organizations in the regulated space frequently have security controls where a simple 5-tuple-based firewall rule at OSI layer 4 won’t suffice and traffic inspection needs to occur to analyze layer 7. Take for example a publicly facing web application deployed to Azure. These applications can be subject to traffic inspection at multiple layers, like an edge security service (Akamai, CloudFlare, Front Door, etc.) and again when the traffic enters the customer’s virtual network through a security appliance (F5, Palo Alto, Application Gateway, Azure Firewall, etc.). Most of the time you can get away with those two inspection points (edge security service and security appliance deployed into the virtual network) for public traffic and one inspection point for private traffic (security appliance deployed into the virtual network and the umpteenth number of security appliances on-premises). However, that isn’t always the case.

Many customers I work with have robust inspection requirements that may require multiple inspection points within Azure. The two most common patterns where this pops up is when traffic first moves through an Application Gateway or APIM (API Management) instance. In these scenarios some customers want to funnel the traffic through an additional inspection point such as their third-party firewall for additional checks or a centralized choke point managed by information security (in the event Application Gateways / APIM have been democratized). When the backend is a traditional virtual machine or virtual network injected/integrated (think something like an App Service Environment v3) the routing is quite simple and looks like something like the below.

Traffic inspection with traditional virtual machine or VNet Injected/VNet integrated service

In the above image we slap a custom route table on the Application Gateway subnet, and add a user-defined route that says when contacting the subnet containing the frontend resources of the application, it needs to go the firewall first. To ensure the symmetry of return traffic, we put a route table on the frontend subnet with a user-defined route that says communication to the Application Gateway subnet needs to also go to the firewall. The routes in these two route tables are more specific than the system route for the virtual network and take precedence forcing both the incoming and return traffic to flow symmetrically through the firewall. Easy enough.

The routing gets more challenging when inspecting traffic to services which receive their inbound traffic via a Private Endpoint (such as an App Service running in a Premium App Services Plan, a Storage Account, a Key Vault, etc.). These challenges exist for both controlling the traffic to the Private Endpoint and controlling the return traffic.

When a Private Endpoint is provisioned in a virtual network, a new system route is injected into the route tables of each subnet in that virtual network AND any peered virtual networks. This route is a /32 for the IP address assigned to the network interface associated with the Private Endpoint as seen in the image below.

System route added by the creation of a Private Endpoint in a virtual network

Historically, to work around this you had to drop /32 routes everywhere to override those system routes and push the incoming traffic to the Private Endpoints through an inspection point. As you can imagine, this was a nightmare at scale. Back in August 2023, Microsoft introduced what they call Private Endpoint Network Policies, a property of a subnet that allows you to better manage this routing (in addition to optionally enforcing Network Security Groups on Private Endpoints) by allowing less specific routes to override the more specific Private Endpoint /32 routes. You set this property to Enabled (both the routing feature and network security group enforcement) or RouteTableEnabled (just the routing feature), and it is set on the subnet you place the Private Endpoints into. Yeah I know, it's confusing because that is not how routing is supposed to work (a less specific route should never override a more specific one), but this is an SDN (software-defined network), so they'll do what they please and you'll like it.
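One way to picture the effect of this property is to treat it as a filter applied before normal longest-prefix matching: when the policy is active and a user-defined route covers the destination, the Private Endpoint's /32 system route is invalidated. This is my conceptual model of the observable behavior, not the actual SDN implementation, and the addresses are hypothetical:

```python
import ipaddress

def effective_next_hop(dest_ip, routes, route_table_enabled=False):
    """Longest-prefix match over (prefix, next_hop, kind) routes. When the
    subnet's Private Endpoint Network Policies property is Enabled or
    RouteTableEnabled, a matching UDR invalidates the more specific
    InterfaceEndpoint /32 system routes (conceptual model only)."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if route_table_enabled and any(kind == "UDR" for _, _, kind in matches):
        matches = [r for r in matches if r[2] != "InterfaceEndpoint"]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)[1]

routes = [
    ("10.0.0.0/16", "VirtualNetwork", "System"),
    ("10.0.2.0/24", "10.0.1.4", "UDR"),  # PE subnet -> firewall
    ("10.0.2.25/32", "InterfaceEndpoint", "InterfaceEndpoint"),
]

print(effective_next_hop("10.0.2.25", routes))                            # InterfaceEndpoint
print(effective_next_hop("10.0.2.25", routes, route_table_enabled=True))  # 10.0.1.4
```

With the policy off, the /32 wins and the firewall is bypassed; with it on, the less specific UDR takes over, which is the "backwards" behavior described above.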

Private Endpoint route invalid because Private Endpoint Network Policy property set to RouteTableEnabled

While this feature helped to address traffic to the Private Endpoint, handling the return traffic wasn’t so simple. Wrapping a custom route table around a subnet containing Private Endpoints does nothing to control return traffic from the Private Endpoints. They do not care about your user-defined routes and won’t honor them. This created an asymmetric traffic flow where incoming traffic was routed through the inspection point but return traffic bypassed it and went direct to the calling endpoint.

This misconfiguration was very common in customer environments and was rarely noticed because many TCP sessions with Private Endpoints are short lived, so the calling client isn't affected by the TCP RST sent by the firewall after X number of minutes. Customers could work around this by SNATing to the NVA's (inspection point's) IP address, ensuring the return traffic was sent back to the NVA before being passed back to the calling client. What made it more confusing was that some services "just worked" because Microsoft was handling that symmetry in the data plane of the SDN; Azure Storage was an example of such a service. If you're interested in understanding the old behavior, check out this post.

Prior asymmetric behavior without SNAT at NVA

You’ll notice I said “prior” behavior. Yes folks, SNATing is no longer required when using a 3rd-party NVA in a virtual network (the announcement is specific to 3rd-party NVAs). Those of you using Azure Firewall in a virtual network, Azure Firewall in a VWAN Secure Hub, or a 3rd-party NVA in a VWAN Secure Hub will need to continue to SNAT for now (as of 11/2024) until this feature is extended to those use cases.

I bet you’re thinking, “Oh cool, Microsoft is now having Private Endpoints honor user-defined routes in route tables.” Ha, that would make far too much sense! Instead, Microsoft has chosen to require resource tags on the NICs of the NVAs to remove the SNAT requirement. Yeah, it wouldn’t have been my choice either, but here we are. Additionally, in my testing I was able to get a symmetric flow of traffic even without the resource tags in place. My assumption (and a total assumption, as an unimportant person at Microsoft) is that this may be the default behavior on some of the newer SDN stacks, while older SDN stacks may require the tags. Either way, do what the documentation says and put the tags in place.

As of today (10/11/2024) the generally available documentation is confusing as to what you need to do. I’ve provided some feedback to the author to fix some of the wording, but in the meantime let me explain. You need to create a resource tag on either the NIC (non-VMSS) or the VM instance (VMSS) with a key of disableSnatOnPL and a value of true.
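A small sketch of what that tagging step looks like. The tag key and value (disableSnatOnPL / true) come from the documentation referenced above; the resource group and NIC names are hypothetical, and the Azure CLI equivalent is shown in a comment rather than executed:

```python
def with_snat_disabled(existing_tags):
    """Merge the documented disableSnatOnPL tag into a NIC's existing tags
    without clobbering them. Tag key/value per the Microsoft docs; the
    surrounding resource names in this sketch are illustrative."""
    return {**(existing_tags or {}), "disableSnatOnPL": "true"}

# Applied to a standalone NVA NIC, the same change with the Azure CLI might
# look like this (resource names are hypothetical):
#
#   az network nic update \
#     --resource-group rg-network --name nic-nva-0 \
#     --set tags.disableSnatOnPL=true

print(with_snat_disabled({"env": "prod"}))
```

Merging rather than replacing matters here: most environments already carry tags (cost center, environment, owner) on the NVA's NIC, and you don't want to wipe them out while adding this one.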

Magic of SDN ensuring symmetric flow without SNAT

TL;DR: SNAT should no longer be required to ensure symmetric traffic flow when placing an NVA between an endpoint and a Private Endpoint, provided you have the proper resource tag in place. My testing of the new feature was done in Central US and Canada Central with both Azure Key Vault and Azure SQL. I tested when the calling endpoint was within the same virtual network, when it was in a peered virtual network in a hub-and-spoke environment, and when the calling machine was on-premises calling a Private Endpoint in a spoke. In all scenarios, a packet capture on the NVA showed a symmetric flow of traffic.