Microsoft Foundry – BYO AI Gateway – Part 3

Posted on April 19, 2026 by mattfeltonma

This is part of my series on Microsoft Foundry:

Hello once again folks! Today I’m going to add yet another post to my BYO AI Gateway feature of Microsoft Foundry series. In my first post I gave a background on the use case for this feature, in the second post I walked the concepts required to understand the feature, the resources involved in the setup, and the schema of those resource objects. In this post I’m going to walk through the architecture I setup to play with this feature, why I made the choices I did, and dig into some of the actual Terraform code I put together to set this whole thing up. Let’s dive in!

The foundational architecture

When I wanted to experiment with this feature I wanted to test it in an architecture that is typical to my customer base. For this I chose the classic tried and true hub and spoke architecture. I opted out of VWAN and went with a traditional virtual network model because I prefer the visibility and control to that model during experimentation. When the hub becomes a managed VWAN Hub, I get that fancy overlay which makes invisible some of the magic of what is happening underneath. This model enables me to do packet captures at every step and manage routing at a very granular level, which is a must when playing with cutting edge features.

For this setup I have a lab I built out in Terraform which gives me that hub and spoke architecture, centralized DNS resolution, logging, and access to multiple regions. The multiple regions piece of the puzzle is key because feature availability across Foundry features and APIM v2 SKUs are still in flux. The lab also uses three spoke virtual networks. This gives allows me to plop pieces in different spokes to see how things behave and track traffic patterns. It also gives me flexibility when I need to wait for purge operations like when purging a Microsoft Foundry resource configured with a standard agent setup and clearing the lock on the delegated subnet for the VNet injection model. If you’ve mucked around with this you know sometimes it can be 15 minutes and sometimes it can be 2 days.

I drop one of three spokes into one of the “hero” regions. This is a region that gets new features sooner than ours. For example, in this lab I drop it into East US 2 while the hub and other two spokes go in West US 3 (where I’m less likely to run into an quota or capacity issues). East US 2 gives me the option to deploy APIM v2 Standard SKU. In the next section I’ll explain why I’m going with v2 for this experimentation.

AI Gateway Architecture

For an AI Gateway I decided to use APIM. My buddy Piotr Karpala has a great repository of 3rd-party AI Gateway solutions if you want to test this with something outside of APIM. I’m going to plop this into the “hero” region spoke in East US 2 to so I can deploy a v2 Standard SKU. The reason I’m using a v2 SKU is it provides another networking model that the classic SKUs do not, and that is Private Endpoint and VNet integration. In this model I block public traffic to the APIM service, create a Private Endpoint to enable private inbound access, and setup VNet integration to a delegated subnet to keep outbound traffic from any of the APIM instances flowing through my virtual network so I can mediate it and optionally inspect it. While the Private Endpoint is only supported for the Gateway and not the Developer Portal, I don’t care in this instance because I don’t plan on using the Developer Portal on an APIM acting as an AI Gateway. You can also create a private endpoint for a APIM v2 service instance that uses VNet injection, but it requires the Premium SKU and I’m super cheap, so I opted out of that.

**APIM v2 with Private Endpoint and VNet Integration**

The reason I picked this networking model for APIM is it makes it easy for me to inject the service into a Microsoft Foundry account configured with a standard agent and the managed virtual network model. In a future post I’ll dive more into the managed virtual network model. For now, just be aware that is exists, it’s in preview, and it doesn’t have many of the limitations the Foundry Agent Service VNet injection model has. There are considerations no doubt, but my personal take is it’s the better of the two strategically.

On the APIM instance I configured two backend objects, one for each Foundry instance. The backends are organized into a pooled backend so I could load balance across the two Foundry instances to maximize my TPM (tokens per minute). I defined four APIs. Two APIs support the Azure OpenAI inferencing and authoring API, one supports the Azure OpenAI v1 API, and the last is a simple custom Hello World API I use to test connectivity. I use two APIs for the Azure OpenAI inferencing and authoring API because one is designed to support APIM as an AI Gateway uses some custom policy snippets and the other is very generic and is used to test model gateway connections from Foundry purely so I’m familiar with the basics of them.

Foundry Architecture

The Foundry architecture is quite simple. I deployed a single instance of Foundry configured to support standard agents and using a VNet injection model. A subnet is delegated in a different spoke to support the agent vnet injection and supporting Private Endpoints are deployed to a separate subnet in that same virtual network.

The whole setup looks something like the below:

Setting up the AI Gateway

At this point you should have a good understanding of what I’m working with. Let’s talk button pushing. The first thing you’ll need to do is get your AI Gateway setup. To setup the APIM instance I using the Terraform AzureRM and AzApi providers. Like I mentioned above, it was setup as a v2 with the standard SKU public network access disabled, inbound access restricted to private endpoints and outbound access configured for VNet integration. You can find the whole of the code in my lab repository if you’re curious. For the purposes of the post, I’ll only be including the relevant snippets.

One critical thing to take note of is whatever networking model you choose for APIM for this integration, you need to use a certificate issued by a trusted public CA (certificate authority). This is required because at the date of this post, the agent service does not support certificates issued by private CAs. Reason being, you have no ability to inject that root and intermediate certs into the trusted store of the agent compute. For this lab I used the Terraform Acme and Cloudflare providers. It’s actually not bad at all to have a fresh cert provisioned directly as part of the pipeline for labbing and the like, and best part is it’s free for cheap people like myself. There is a sample of that code in the repo.

As I mentioned in my last post, the BYO AI Gateway integration with Foundry supports static or dynamic setup. In the static model you define the models directly in the connection metadata you want to be made available to the connection (see my last post for an example). In the dynamic model the models can be fetched by an API call to the management.azure.com API. This latter option requires additional operations be defined in the API such as what you see below.

			
## Create an operation to support getting a specific deployment by name when using the Foundry APIM connection
##
resource "azurerm_api_management_api_operation" "apim_operation_openai_original_get_deployment_by_name" {
  depends_on = [
    azurerm_api_management_api.openai_original
  ]
  operation_id        = "get-deployment-by-name"
  api_name            = azurerm_api_management_api.openai_original.name
  api_management_name = azurerm_api_management.apim.name
  resource_group_name = azurerm_resource_group.rg_ai_gateway.name
  display_name        = "Get Deployment by Name"
  method              = "GET"
  url_template        = "/deployments/{deploymentName}"
  template_parameter {
    name     = "deploymentName"
    required = true
    type     = "string"
  }
}
## Create an operation to support enumerating deployments when using the Foundry APIM connection
##
resource "azurerm_api_management_api_operation" "apim_operation_openai_original_list_deployments_by_name" {
  depends_on = [
    azurerm_api_management_api_operation_policy.apim_policy_openai_original_get_deployment_by_name
  ]
  operation_id        = "list-deployments"
  api_name            = azurerm_api_management_api.openai_original.name
  api_management_name = azurerm_api_management.apim.name
  resource_group_name = azurerm_resource_group.rg_ai_gateway.name
  display_name        = "List Deployments"
  method              = "GET"
  url_template        = "/deployments"
}

		

You then define a policy for that operation to configure it to call the correct endpoint via the ARM API like below. Notice I used the authentication-managed-identity policy snippet to use the APIM managed identity to call the Foundry resource to fetch deployment information. If you’re sharing the API across backends, make sure all backends have all the same models deployed. If not, you’ll need to incorporate some additional logic to hit the backend for each pool to ensure you don’t return models that don’t exist in a specific backend. This will require your APIM instance managed identity to have at least the Azure RBAC Reader role over the Foundry resources.

			
## Create an policy for the get deployment by name operation to route to the Foundry APIM connection
##
resource "azurerm_api_management_api_operation_policy" "apim_policy_openai_original_get_deployment_by_name" {
  depends_on = [
    azurerm_api_management_api_operation.apim_operation_openai_original_get_deployment_by_name,
  ]
  api_name            = azurerm_api_management_api.openai_original.name
  operation_id        = azurerm_api_management_api_operation.apim_operation_openai_original_get_deployment_by_name.operation_id
  api_management_name = azurerm_api_management.apim.name
  resource_group_name = azurerm_resource_group.rg_ai_gateway.name
  xml_content = <<XML
    <policies>
      <inbound>
        <authentication-managed-identity resource="https://management.azure.com/" />
        <rewrite-uri template="/deployments/{deploymentName}?api-version=${local.ai_services_arm_api_version}" copy-unmatched-params="false" />
        <!--Specify a Foundry deployment that has the models deployed -->
        <set-backend-service base-url="https://management.azure.com${azurerm_cognitive_account.ai_foundry_accounts[keys(local.ai_foundry_regions)[0]].id}" />
      </inbound>
      <backend>
        <base />
      </backend>
      <outbound>
        <base />
      </outbound>
      <on-error>
        <base />
      </on-error>
    </policies>
  XML
}
## Create an policy for the list deployments operation to route to the Foundry APIM connection
##
resource "azurerm_api_management_api_operation_policy" "apim_policy_openai_original_list_deployments_by_name" {
  depends_on = [
    azurerm_api_management_api_operation.apim_operation_openai_original_list_deployments_by_name
  ]
  api_name            = azurerm_api_management_api.openai_original.name
  operation_id        = azurerm_api_management_api_operation.apim_operation_openai_original_list_deployments_by_name.operation_id
  api_management_name = azurerm_api_management.apim.name
  resource_group_name = azurerm_resource_group.rg_ai_gateway.name
  xml_content = <<XML
    <policies>
      <inbound>
        <authentication-managed-identity resource="https://management.azure.com/" />
        <rewrite-uri template="/deployments?api-version=${local.ai_services_arm_api_version}" copy-unmatched-params="false" />
        <!--Azure Resource Manager-->
        <set-backend-service base-url="https://management.azure.com${azurerm_cognitive_account.ai_foundry_accounts[keys(local.ai_foundry_regions)[0]].id}" />
      </inbound>
      <backend>
        <base />
      </backend>
      <outbound>
        <base />
      </outbound>
      <on-error>
        <base />
      </on-error>
    </policies>
  XML
}

		

In my lab, I defined these two operations for both the classic (OpenAI Inferencing and Authoring API) and v1 API. This allowed me to mess around with both static and dynamic APIM and Model Gateway connections.

Once you get Foundry hooked into APIM using this integration (and I’ll cover the Foundry part in the next post), you get access to some pretty neat information in the headers. As of the date of this post, these will be some of the headers you’ll see. You’ll notice my x-forwarded-for path includes my endpoint’s IP address as well as the IP of the container running in the managed Microsoft-compute environment (notice that is using CGNAT IP space which clears up why CGNAT is unsupported to be used by the customer when using agent with VNet injection). The x-ms-foundry-project-id is the unique project GUID of the project the agent was created under (could be useful for throttling and logging). The x-ms-foundry-agent-id is the unique agent identifier of the specific revision of the agent (again useful for logging and throttling). The x-ms-client-request-id is actually the Foundry project managed identity, not the agent identity which is important to note. If you want to use Entra for the BYO AI Gateway APIM connection, you’re going to be limited to this or API key. There is a connection authentication option to use the agent’s actual Entra ID Agent Identity, but I’ve only used that for the MCP Server feature of Foundry, never for this so I’m not sure if it works or is supported.

			
{
  "Authorization": "Bearer REDACTED",
  "Content-Length": "474",
  "Content-Type": "application/json; charset=utf-8",
  "Host": "apimeusXXXXX.azure-api.net",
  "Max-Forwards": "10",
  "Correlation-Context": "leaf_customer_span_id=173926958944XXXXXX",
  "traceparent": "00-62ff160923b2c1724242c037be40e7cb-4f1b402461aXXXXX-01",
  "X-Request-ID": "96534855-a35a-481a-886d-XXXXXXXXXXXX",
  "x-ms-client-request-id": "76ddf586-260b-4e37-8f4c-XXXXXXXXXXXX",
  "openai-project": "sampleproject1",
  "x-ms-foundry-agent-id": "TestAgent-ai-gateway-static:5",
  "x-ms-foundry-model-id": "conn1apimgwstaticopenai/gpt-4o",
  "x-ms-foundry-project-id": "455cbebf-a0bc-425e-99f6-XXXXXXXXXXX",
  "x-forwarded-for": "100.64.9.87;10.0.9.213:10095",
  "x-envoy-external-address": "100.64.9.87",
  "x-envoy-expected-rq-timeout-ms": "1800000",
  "x-k8se-app-name": "j8820ec0658b4aeXXXXX-dataproxy--vuww7ja",
  "x-k8se-app-namespace": "wonderfulsky-a2fXXXXX",
  "x-k8se-protocol": "http1",
  "x-k8se-app-kind": "web",
  "x-ms-containerapp-name": "j8820ec0658b4aeXXXXX-dataproxy",
  "x-ms-containerapp-revision-name": "j8820ec0658b4aeXXXXX-dataproxy--vuww7ja",
  "x-arr-ssl": "2048|256|CN=Microsoft Azure RSA TLS Issuing CA 04;O=Microsoft Corporation;C=US|CN=*.azure-api.net;O=Microsoft Corporation;L=Redmond;S=WA;C=US",
  "x-forwarded-proto": "https",
  "x-forwarded-path": "/v1/https/apimeusXXXXX.azure-api.net/openai/deployments/gpt-4o/chat/completions?api-version=2025-03-01-preview",
  "X-ARR-LOG-ID": "76ddf586-260b-4e37-8f4c-XXXXXXXXXXXX",
  "CLIENT-IP": "10.0.9.213:10095",
  "DISGUISED-HOST": "apimeusXXXXX.azure-api.net",
  "X-SITE-DEPLOYMENT-ID": "apimwebappXXXXXX6OTVsZqxOcTZLpubQ9iNmzQ8kzMOmkEhw",
  "WAS-DEFAULT-HOSTNAME": "apimwebappXXXXXX6otvszqxoctzlpubq9inmzq8kzmomkehw.apimaseXXXXXXX6otvszqxoctz.appserviceenvironment.net",
  "X-AppService-Proto": "https",
  "X-Forwarded-TlsVersion": "1.3",
  "X-Original-URL": "/openai/deployments/gpt-4o/chat/completions?api-version=2025-03-01-preview",
  "X-WAWS-Unencoded-URL": "/openai/deployments/gpt-4o/chat/completions?api-version=2025-03-01-preview",
  "X-Azure-JA4-Fingerprint": "t13d1113h2_d3731e0d3936_XXXXXXXXXXXX"
}

		

Using the information above, I crafted the policy below. It’s nothing fancy, but shows an example of throttling based on the project id and logging the agent identifier via the token metrics policy to potentially make chargeback more granular. Either way, these additional headers give you more to play with.

			
## Create an API Management policy for the OpenAI v1 API
##
resource "azurerm_api_management_api_policy" "apim_policy_openai_v1" {
  depends_on = [
    azurerm_api_management_api.openai_v1
  ]
  api_name            = azurerm_api_management_api.openai_v1.name
  api_management_name = azurerm_api_management.apim.name
  resource_group_name = azurerm_resource_group.rg_ai_gateway.name
  xml_content = <<XML
    <policies>
      <inbound>
          <base />
          <!-- Evaluate the JWT and ensure it was issued by the right Entra ID tenant -->
          <validate-jwt header-name="Authorization" failed-validation-httpcode="403" failed-validation-error-message="Forbidden">
              <openid-config url="https://login.microsoftonline.com/${var.entra_id_tenant_id}/v2.0/.well-known/openid-configuration" />
              <issuers>
                  <issuer>https://sts.windows.net/${var.entra_id_tenant_id}/</issuer>
              </issuers>
          </validate-jwt>
          <!-- Extract the Entra ID application id from the JWT -->
          <set-variable name="appId" value="@(context.Request.Headers.GetValueOrDefault("Authorization",string.Empty).Split(' ').Last().AsJwt().Claims.GetValueOrDefault("appid", "none"))" />
          <!-- Extract the Agent ID from the x-ms-foundry-agent-id header. This is only relevant for Foundry native agents -->
          <set-variable name="agentId" value="@(context.Request.Headers.GetValueOrDefault("x-ms-foundry-agent-id", "none"))" />
          <!-- Extract the project GUID from the x-ms-foundry-project-id header. This is only relevant for Foundry native agents -->
          <set-variable name="projectId" value="@(context.Request.Headers.GetValueOrDefault("x-ms-foundry-project-id", "none"))" />
          <!-- Extract the Foundry Project name from the "openai-project" header. This is only relevant for Foundry native agents -->
          <set-variable name="projectName" value="@(context.Request.Headers.GetValueOrDefault("openai-project", "none"))" />
          <!-- Extract the deployment name from the uri path -->
          <set-variable name="uriPath" value="@(context.Request.OriginalUrl.Path)" />
          <set-variable name="deploymentName" value="@(System.Text.RegularExpressions.Regex.Match((string)context.Variables["uriPath"], "/deployments/([^/]+)").Groups[1].Value)" />
          <!-- Set the X-Entra-App-ID header to the Entra ID application ID from the JWT -->
          <set-header name="X-Entra-App-ID" exists-action="override">
              <value>@(context.Variables.GetValueOrDefault<string>("appId"))</value>
          </set-header>
          <set-header name="X-Foundry-Agent-ID" exists-action="override">
              <value>@(context.Variables.GetValueOrDefault<string>("agentId"))</value>
          </set-header>
          <set-header name="X-Foundry-Project-Name" exists-action="override">
              <value>@(context.Variables.GetValueOrDefault<string>("projectName"))</value>
          </set-header>
          <set-header name="X-Foundry-Project-ID" exists-action="override">
              <value>@(context.Variables.GetValueOrDefault<string>("projectId"))</value>
          </set-header>
          <choose>
            <!-- If the request isn't from a Foundry native agent and is instead an application or external agent -->
            <when condition="@(context.Variables.GetValueOrDefault<string>("agentId") == "none" && context.Variables.GetValueOrDefault<string>("projectId") == "none")">
              <!-- Throttle token usage based on the appid -->
              <llm-token-limit counter-key="@(context.Variables.GetValueOrDefault<string>("appId","none"))" estimate-prompt-tokens="true" tokens-per-minute="10000" remaining-tokens-header-name="x-apim-remaining-token" tokens-consumed-header-name="x-apim-tokens-consumed" />
              <!-- Emit token metrics to Application Insights -->
              <llm-emit-token-metric namespace="openai-metrics">
                  <dimension name="model" value="@(context.Variables.GetValueOrDefault<string>("deploymentName","None"))" />
                  <dimension name="client_ip" value="@(context.Request.IpAddress)" />
                  <dimension name="appId" value="@(context.Variables.GetValueOrDefault<string>("appId","00000000-0000-0000-0000-000000000000"))" />
              </llm-emit-token-metric>
            </when>
            <!-- If the request is from a Foundry native agent -->
            <otherwise>
              <!-- Throttle token usage based on the agentId -->
              <llm-token-limit counter-key="@($"{context.Variables.GetValueOrDefault<string>("projectId")}_{context.Variables.GetValueOrDefault<string>("agentId")}")" estimate-prompt-tokens="true" tokens-per-minute="10000" remaining-tokens-header-name="x-apim-remaining-token" tokens-consumed-header-name="x-apim-tokens-consumed" />
              <!-- Emit token metrics to Application Insights -->
              <llm-emit-token-metric namespace="llm-metrics">
                  <dimension name="model" value="@(context.Variables.GetValueOrDefault<string>("deploymentName","None"))" />
                  <dimension name="client_ip" value="@(context.Request.IpAddress)" />
                  <dimension name="agentId" value="@(context.Variables.GetValueOrDefault<string>("agentId","00000000-0000-0000-0000-000000000000"))" />
                  <dimension name="projectId" value="@(context.Variables.GetValueOrDefault<string>("projectId","00000000-0000-0000-0000-000000000000"))" />
              </llm-emit-token-metric>
            </otherwise>
          </choose>
          <choose>
            <!-- If the request is from a Foundry native agent -->
            <when condition="@(context.Variables.GetValueOrDefault<string>("agentId") != "none" && context.Variables.GetValueOrDefault<string>("projectId") != "none")">
            <authentication-managed-identity resource="https://cognitiveservices.azure.com/" />
            </when>
          </choose>
          <set-backend-service backend-id="${module.backend_pool_aifoundry_instances_openai_v1.name}" />
      </inbound>
      <backend>
          <forward-request />
      </backend>
      <outbound>
          <base />
      </outbound>
  </policies>
XML
}

		

Summing it up

I was going to go crazy and incorporate the Foundry setup and testing into this post as well but decided against it. There is a point when the brain melts and if mine is already melting, yours may be as well. I’ll walk through those pieces in the next post. You have a few main takeaways. First, let’s review the high level setup of your AI Gateway.

Create your backends that point to the Microsoft Foundry endpoints.
Import the relevant API. If at all possible, go with the v1 API. It will support access to other models besides OpenAI models and additional features.
Add the GET and LIST operations and define the relevant policies if you’re planning on supporting dynamic models vs static. Dynamic seems to make more sense to me, but I haven’t seen enough orgs adopt this yet to form a good opinion.
Craft your custom policies. I highly recommend you regularly review the headers being passed. They could change and even better data may be added to them.

Next, let’s talk about key gotchas.

The certificate used on your AI Gateway MUST be issued from a well-known public CA in order for it to be trusted by the agent running in Foundry comptue. If it isn’t, this integration will fail and may not fail in a way that is obvious the TLS session failure between the agent compute and the AI Gateway is to blame.
If you’re using APIM, think about the Private Endpoint and VNet integration pattern if you’re capable of using v2. If it won’t work for you, or you’re still using the classic SKU, if you want to support managed VNet you’ll need to incorporate an Application Gateway in front of your AI Gateway likely. This means more operational overhead and costs.
While every Foundry Agent (v2) is given an Entra ID Agent Identity created from the Entra ID Agent Blueprint associated to the project, when using the ProjectManagedIdentity authentication type, you’ll see the project’s managed identity in the logs. If you’re able to test with the agent identity authentication type, let me know.
Really noodle on how you can use the project headers for throttling and possibly chargeback. It makes a ton of sense if you’re aligning your Foundry account and project model correctly.

See you next post!

Network Security Perimeters – NSPs in Action – Key Vault Example

Posted on September 10, 2025 by mattfeltonma

This is part of my series on Network Security Perimeters:

Welcome back to my third post in my NSP (Network Security Perimeter) series. In this post I’m going to start covering some practical use cases for NSPs and demonstrating how they work. I was going to group these use cases in a single post, but it would have been insanely long (and I’m lazy). Instead, I’ll be covering one per post. These use cases are likely scenarios you’ve run into and do a good job demonstrating the actual functionality.

A Quick Review

In my first post I broke PaaS services into compute-based PaaS and service-based PaaS. NSPs are focused on solving problems for service-based PaaS. These problems include a lack of outbound network controls to mitigate data exfiltration, inconsistent offerings for inbound network controls across PaaS services, scalability with inbound IP whitelisting, difficulty configuring and managing these controls at scale, and inconsistent quality of logs across services for simple fields you’d expect in a log like calling inbound IP address.

**Compute-based PaaS vs Service-based PaaS**

My second post walked through the components that make up a Network Security Perimeter and their relationships to each other. I walked through each of the key components including the Network Security Perimeter, profiles, access policies, and resource associations. If you haven’t read that post, you need to read it before you tackle this one. My focus in this post will be where those resources are used and will assume you grasp their function and relationships.

**Network Security Perimeter components and their relationships**

With that refresher out of the way, let’s get to the good stuff.

Use Case 1: Securing Azure Key Vaults

Azure Key Vault is Microsoft’s native PaaS offering for secure key, secret, and certificate storage. Secrets sometimes need to be accessed by Microsoft SaaS (software-as-a-service) and compute-based PaaS that do not support virtual network injection or a managed virtual network concept, such as some use cases for PowerBI. Vaults are used by 3rd-party products outside of Azure or from another CSP (cloud service provider). There are also use cases where a customer may be new to Azure and doesn’t yet have the necessary hybrid connectivity and Private DNS configuration to support the usage of Private Endpoints. In these scenarios, Private Endpoints are not an option and the traffic needs to come in the public endpoint of the vault. Here is our use case for NSPs.

**Services accessing Key Vault for secrets**

Historically, folks would try to solve this with IP whitelisting on the Key Vault service firewall. As I covered in my first post, this is a localized configuration to the resource and can be mucked with by the resource owner unless permissions are properly scoped or an Azure Policy is used to enforce a specific configuration. This makes it difficult to put network control in the hands of security while leaving the rest of the configuration of the resource to the resource owner. Another issue that sometimes pops up with this pattern is hitting the maximum of 400 prefixes for the service firewall rules.

NSPs provide us with a few advantages here:

Network security can be controlled by the NSP and that NSP can be controlled by the security team while leaving the rest of the resource configuration to the resource owner.
You can allow more than 400 inbound IP rules (up to 500 in my testing). Sometimes a few extra IP prefixes is all you needed back with the service firewall.

In this type of use case, we could do something like the below. Here we have a single Network Security Perimeter for our two categories of Key Vaults for a product line. Category 1 Key Vaults need to be accessed by 3rd-party SaaS and applications that do not have the necessary network path to use Private Endpoints. Category 2 are Key Vaults used by internally facing application within Azure and those Key Vaults need to be restricted to a Private Endpoint.

Public Key Vault

For this scenario we can build something like in the image above where we have a single NSP with two profiles. One profile will be used by our “public” Key Vaults. This profile will be associated with an access rule that allows a set of trusted IP addresses from a 3rd-party SaaS solution. The other profile will have no associated access rules, thus blocking all access to the Key Vault over the public endpoint. Both resource associations will be set to enforced to ensure the the NSP rules override the local service firewall rules.

Let’s take a look at this in action.

For this scenario, I have an NSP design setup exactly as above. The access rule applied to my public vault has the IP address of my machine as seen below:

**Profile access rule for publicly-facing Key Vault**

At this point my vault isn’t associated with the NSP yet and it has been configured to allow all public network access. Attempts at accessing the vault from a random public IP shows successful as would be expected.

**Successful retrieval of secret from untrusted IP address prior to NSP association**

Next I associate the vault to the NSP and set it to enforced mode. By default it will be configured in Transition mode (see my second post for detail on this) which means it will log whether the traffic would be allowed or denied but it won’t block the traffic. Since I want the NSP to override the local service firewall, I’m going to set it to enforced.

When trying to pull the secret from the vault using a machine with the trusted public IP listed in the access rule associated to the profile, I’m capable of getting the secret.

**Successful call from trusted IP listed in NSP profile access rule**

If I attempt to access the secret from an untrusted IP, even with the service firewall on the vault configured to allow all public network access, I’m rejected with the message below.

**Denied call from an untrusted IP due to NSP**

Review of the logs (NSPAccessLogs table) shows that the successful call was due to the access rule I put in place and the denied call triggered DefaultDenyAll rule.

Now what about my private vault? Let’s take a look at that one next.

Private Key Vault

For this scenario I’m going to use the second profile in the NSP. This profile doesn’t have any associated access rules which effectively blocks all traffic to the public endpoint originating from outside the NSP. My goal is to make this vault accessible only from a private endpoint.

First, I associate the resource to the NSP profile and configure it in enforced mode.

**Private Key Vault associated to profile in enforced mode**

This is another vault where I’ve configured the service firewall to allow all public network access. Attempting to access the resource throws the message indicating the NSP is blocking access.

**Denied call from a public IP when NSP denies all public access**

I’ve created a Private Endpoint for this vault as well. As I covered earlier in this series, NSPs are focused on public access and do not limit Private Endpoint access, so that means it doesn’t log access from a Private Endpoint, right? Wrong! A neat feature of NSP wrapped resources is those the NSPs will allow the traffic and log it as seen below.

**NSP log entry showing access through Private Endpoint**

In the above log entry you’ll see the traffic is labeled as private indicating it’s traffic being allowed through the DefaultAllowAll rule and the TrafficType set to Private because it’s coming in through a Private Endpoint. Interestingly enough, you also get the operation that was being performed in the request. I could have sworn these logs used to include the specific Private Endpoint resource ID the traffic ingressed from, but perhaps I imagined that or it was removed when the service graduated to GA (generally available).

Summing it up

In this post I gave an overview of a simple use case that many folks may have today. You could easily sub out Key Vault for any of the other supported PaaS that has a similar public endpoint access model and the setup will be the same. Here are some key takeaways:

NSPs allow you to enforce public access network controls regardless how the resource owner configures the service firewall on the resource.
Profiles seem to support a maximum of 500 IP prefixes for inbound and 500 for outbound. This is more than the 400 available in the service firewall. This is based on my testing and no idea if it’s a soft or hard limit.
NSPs provide a standardized log format for network access. No more looking at 30 different log schemas across different resources, half of which don’t contain network information or someone drank too much tequila and decided to mask an octet of the IP. Additionally, they will log network access attempts through Private Endpoints.

In my next post I’ll cover a use where we have two resources in the same NSP that communicate with each other.

See you next post!

Network Security Perimeters – NSP Components

Posted on August 26, 2025 by mattfeltonma

This is part of my series on Network Security Perimeters:

Welcome back fellow geeks!

Today I will be continuing my series on NSPs (Network Security Perimeters). In the last post I outlined the problems NSPs were built to solve. I covered how users of Azure have historically controlled inbound and outbound traffic for PaaS (platform-as-a-service) in Azure and the gaps and challenges of those designs. In this post I’m going to dig into the components that make up an NSP, what their function is, how they’re related, and how they work together.

A Quick Review

Before I dive into the gory details of NSP primitives, I want to do a quick refresh on terminology I’ll be using in this post. As I covered in the last post, I divide PaaS services in Azure into what I call compute-based PaaS and service-based PaaS. Service-based PaaS is PaaS where you upload data but don’t control the code executed by the PaaS whereas with compute-based PaaS you control the code executed within the service. NSPs shine in the service-based PaaS realm and that will be my focus for this series.

Securing service-based PaaS with the traditional tooling of IP whitelisting, service whitelisting, resource whitelisting, resource-specific outbound controls (and they are very rare) presented the problems below:

Issues at scale (IP whitelisting).
Certain features were not available in all PaaS (resource-based whitelisting or product-specific outbound controls).
The configuration for inbound network control features lived as properties of the resource and could be configured differently by different teams resulting in inconsistent configurations.
Logging widely differed across products.

All the challenges above demanded a better more standardized solution, and that’s where NSPs come in.

Network Security Perimeter Components

I’m a huge fan of breaking down any technology into the base components that make it up. It makes it way easier to understand how the hell these things work together holistically to solve a problem. Let’s do that.

Network Security Perimeter Hierarchy

At the highest level is the Network Security Perimeter resource. This is what I refer to as a top-level resource, meaning a resource that exists directly under an Azure Resource Provider and is visible under a resource group. The Network Security Perimeter resource exists in the Microsoft.Network resource provider, is regional in nature (vs global), and serve as the outer container for the logic of the NSP.

The resource is very simple and the only properties of note are name, location, and tags.

Profiles

Underneath the Network Security Perimeter resource is the Profile resource. The easiest way to think about a profile is that it’s a collection of inbound and outbound network access rules that you plan to tie to a resource or resources. Each Network Security Perimeter resource can have 1 or more profiles. It’s recommended you have less than 200 profiles, however, I have trouble thinking of a use case for that many unless you’re in an older and more legacy subscription model where you’re packing everything into as few subscriptions as possible (not where you should be these days).

From mucking around with profiles, I can see the use for putting resources in the same NSP into different profiles. For example, I may wrap an NSP around a Storage Account and the Key Vault which holds the CMK (customer-managed key) used to encrypt the Storage Account. I’d likely want one profile for the storage account with a large set of inbound rules and another profile for the Key Vault with a profile with no inbound or outbound rules restricting the Key Vault to be accessed by the Storage Account. My CI/CD pipeline could reach the Key Vault via a Private Endpoint as NSPs DO NOT affect traffic ingressing through the resource’s Private Endpoint. Resources within the same NSP can communicate with each other as long as they’re associated with a managed identity (according to docs, still want to test and validate this myself).

Access Rules

Next up you have the Access Rule resource which exists underneath the Profile resource. This is where the the rubber hits the road. Access rules should be familiar to you old folks like myself as they are very similar to firewall rules (at least today). You have a direction (inbound or outbound) and some type of type of property to filter by. For example, as of the date of this post, inbound traffic can support filtering on subscriptions (very similar to the resource whitelisting in Azure Storage) and IP-based rules (similar to IP whitelisting of traditional service firewall). Outbound traffic can be filtered by FQDN (fully qualified domain name). You can have up to 200 rules per profile (this is a hard limit).

If you’re nosy like I am, you may have glanced at the API reference. Inside the API reference, there are additional items that may someday come to fruition such as email addresses and phone numbers (really curious as to the use cases for these) and service tags (DAMN handy but not yet usable as of the date of this post).

**Network Security Perimeter Profile Access Rules**

Resource Association

The Resource Association resource exists underneath the Network Security Perimeter. Resource associations connect a supported Azure resource to Network Security Perimeter’s profile and dictate an access mode. There are two documented access modes today which include learning (since renamed transition mode in documentation), and enforced. Transition mode (or learning mode in the APIs) is the default mode and is the mode you’ll start with to understand the inbound and outbound traffic of the resources associated with the NSP. Only after you understand those patterns should you switch to enforced mode. In enforced mode the access rules applied the relevant profile will take effect and all inbound and outbound traffic outside the NSP will be blocked unless explicitly allowed.

Audit mode is a third mode that is available via the API but isn’t mentioned in main public documentation. Your use case for audit mode as far as I can see is if you (say you’re information security) want to audit inbound and outbound traffic out of PaaS resources that are associated to a Network Security Perimeter you do not control.

To answer a few questions that I know immediately popped in my head:

Yes, you can associate resources to a network security perimeter that is in different a subscription.
No, you cannot associate a resource to multiple profiles in enforced mode in the same Network Security Group. Associations to other profiles must be configured in access mode.
No, you cannot associate a resource to multiple profiles in enforced mode in different Network Security Groups. Associations to other profiles must be configured in access mode.

**Network Security Perimeter Resource Associations**

Transition and Enforcement Mode

I want to dig a bit more into transition and enforcement mode and how they affect the resource-level service firewall controls.

In transition mode (which is the default mode) the NSP will evaluate the access rules of the profile the resource is associated with and will log whether the traffic would be allowed or denied (if diagnostic logging is enabled which you most definitely should enable and I’ll cover quickly below but more in depth in a future post) by the NSP and will then fallback to obeying the rules defined in the resource’s service firewall. This is a stellar way for you to get a baseline on how much shit will break with your new rules (and oh yes will shit break for some folks out there).

Once you understand the traffic patterns and design rules around what you want to allow in via the public side of the PaaS, you can flip the resource association to enforced mode. Once you flip that switch, the associated resources will have public access blocked both inbound and outbound. If you have IP whitelisting defined, those rules are ignored. If you checked off the “Allow Microsoft trusted services”, those rules are ignored. As I mentioned above, access through a Private Endpoint is never affected by NSPs as NSPs are concerned with traffic ingressing or egressing from the public sides of the PaaS resource.

It’s worth noting that a new value of the publicNetworkAccess property has been introduced. The publicNetworkAccess property is a property standard to PaaS (I haven’t seen one without it yet). Typically, the two values are either Enabled or Disabled. The property will be typically be set to Enabled if you’re allowing all public access or are restricting it to specific IP addresses, trusted services, specific resources, or service endpoints. It will be typically Disabled when you are blocking all public access and restricting it to Private Endpoints. Note that I use the word typically because this is Azure and there seem to always be exceptions. The new value introduced is SecuredByPerimeter. If you set the publicNetworkAccess property to this the resource is completely locked down to public traffic except for what is allowed in the NSP (even if no NSP is applied). If you plan on transitioning to NSPs fully for controlling public access (once all the resources you need have been onboarded and cross-NSP linking has been introduced) this is the setting you’ll want to go with.

Logging

The best feature of NSPs (in this dude’s opinion) is the logging feature. Like other Azure resources, NSPs support diagnostic settings. These are configured on a per Network Security Perimeter resource and include a plethora of logs categories. These diagnostic settings allow you to turn on detailed logging of traffic that comes in and out of an NSP. This provides you a standard log for all inbound and outbound traffic vs relying on the resource-level logs which can vary greatly per service (again, looking at you AI Search!).

Imagine being able to confirm that a user’s machine is accessing resource over the Microsoft public backbone vs a Private Endpoint and being able to see exactly what IP they’re using to do it. I can’t tell you how helpful this would have been over the past few years where forward web proxies were in use and were incorrectly configured for DNS affecting access to Private Endpoints.

This is such a damn cool feature that it is deserving of its own deep dive post. I’ll be adding this over the next few weeks.

Summing it up

For the purposes of this post, I really want you to take away an understanding of the NSP primitives, what they do, and how they’re related. Do some noodling on how you think you’d use these primitives and what use cases you might apply them to.

Here are some key points to walk away with:

Network Security Perimeters rules apply only to public traffic, this means traffic incoming to the service’s public IP on the Microsoft public backbone or leaving the service the Microsoft public backbone. They do not affect traffic coming in through a Private Endpoint.
Resources can be associated to Network Security Perimeters across subscriptions.
Leverage transition mode (learning mode in the API) to understand what traffic is coming in and going out of your resources over the public endpoints and how your access rules may effect them.
For resources that support NSPs (and are generally available) think about deploying those resources (it’s only a few as of the date of this post) with the publicAccessProperty set to SecuredByPerimeter at creation to ensure all public network access is controlled by a Network Security Perimeter. This will force you to use NSPs to control that traffic. Auditing or enforcing with Azure Policy would be nice control on top.
Do some experimentation with the logging provided by NSPs. It will truly blow your mind how useful the logs are to troubleshooting networking issues and identifying network security threats.
Keep in mind that NSPs will block diagnostic logs from being sent to a Log Analytics Workspace, Storage Account, or Event Hub for any resources within the NSPs. If you use a centralized Log Analytics Workspace model today, you are best off keeping NSPs in transition mode until cross-NSP linking becomes available.

Before I close this out, you’re probably wondering why I didn’t cover the Links resource. Well today, this resource doesn’t do anything and isn’t mentioned in the documentation. If you’re a close observer, you’ll notice the API properties of this resource hint to a possible upcoming feature (I ain’t spoiling anything here since the REST documentation is out there) that looks to provide a feature for NSP to NSP communication. We’ll have to wait and see!

In my next post I’ll walk through three common use cases for NSPs. I’ll cover how the problem was previously solved with service firewall controls and how it can be solved with NSPs. I’ll also walk through the configuration in the Portal and through Terraform. I’ll finish off this series (unless I think of anything else) with a deep dive into NSP logging.

See you next post!

Network Security Perimeters – The Problem They Solve

Posted on August 24, 2025 by mattfeltonma

This is part of my series on Network Security Perimeters:

Hello folks!

Last month a much anticipated new feature became available. No, it wasn’t AI-related (hence why little fanfare) but instead one of the biggest network security improvements in Azure. In early August, Microsoft announced that Network Security Perimeters were finally generally available. If you’ve ever had the pain of making PaaS (platform-as-a-service) to PaaS work in Azure, or had an organization that hadn’t yet fully adopted PrivateLink Private Endpoints, or even had a use case where public access was required and there wasn’t a way around it, then Network Security Perimeters (NSPs) will be one of your favorite new features.

This is gonna be lengthy, so I’m going to divide this content into 2 -3 separate posts. In this first post I’m going to cover a bit of history and the problem NSPs were created to solve.

Types of PaaS

Like any cloud platform, Azure has many PaaS-based services. I like to mentally divide these services into compute-based PaaS (where you upload code and control the actions the code performs) and service-based PaaS (where you upload data but don’t control the code that executes the actions). Examples of compute-based PaaS would be App Services, Functions, AKS, and the like. Service-based PaaS would be things like Storage, Key Vault, and AI Search. I’m sure there are other more effective ways to divide PaaS, but this is the way I’m gonna do it and you’ll have to like it for the purposes of this post.

Compute-based PaaS has historically been easier to control both inbound and outbound given because the feature set to do that was built directly into the product. These control mechanisms consisted of Private Endpoints or VNet injection to control inbound traffic and VNet integration and VNet injection to control outbound traffic.

Controlling service-based PaaS was a much different story.

Prior to the introduction of PrivateLink Private Endpoint support for service-based PaaS customers controlled inbound traffic destined for the public IP of the service instance using what I refer to as the service firewall. The service firewall has a few capabilities to control inbound traffic to the public IP address including:

IP-based whitelisting
Service-based whitelisting (via Trusted Microsoft Services)
Subnet whitelisting (via Service Endpoints)
Resource-based whitelisting

Not every service-based PaaS supports all of these capabilities. For example, resource-based whitelisting via the service firewall is only available in Storage today.

Service Firewall – IP Whitelisting

IP-based whitelisting is exactly what you think it is. You plug in an individual public IP or public IP CIDR (consider this a rule) and those sources are allowed through the service firewall and can create TCP connections. This feature was commonly used to allow specific customer IP addresses, often linked to forward web proxies, access to the service-based PaaS prior to the organization implementing Private Endpoints. The major consideration of this feature is there is a finite number of rules (400 last I checked) you can have. While this worked well for the forward web proxy use case, it would often be insufficient for PaaS to PaaS communication (such as Storage to Key Vault to retrieve a CMK for encryption or attempting to whitelist the entire Power BI service prior to its own vnet integration support).

Service Firewall – Service Whitelisting

Next up you had service-based whitelisting through toggling the “Trusted Microsoft Services” option. This toggle would allow all of the public IP addresses belonging a specific set of Microsoft services which differs on a per service-based PaaS basis. This means that toggling that switch for Storage allows a different set of services versus toggling that for Key Vault. The differing of what constitutes a trusted service made this option painful because the documentation on what’s trusted for each service has never been documented well. Additionally, this allows ALL public IPs from that service used by any Microsoft customer not just the public IPs used by your instances of these services. This particular risk was always a pain point for most organizational security teams. Unfortunately, in the past, this was required to be enabled to facilitate any service-based PaaS to service-based PaaS communication. Even worse is the listing of trusted services didn’t include the service you actually needed.

**Service Firewall – Trusted Microsoft Services exception**

Service Firewall – Virtual Network Service Endpoints

Next up you have Virtual Network Service Endpoints (or subnet whitelisting as I think of them). Service Endpoints were the predecessor to Private Endpoints. They allowed you to limit access to the public IP address of an Azure PaaS instance based on the virtual network (really the subnet in the virtual network) id. They do this by injecting specific routes into the virtual network’s subnet where they are deployed. When the traffic egresses the subnet, it is tagged with an identifier which is used in the ACL (access control list) of the PaaS instance. These routes can be seen when viewing the effective routes on a NIC (network interface card) of a virtual machine running in the subnet.

**Service Firewall – Service Endpoints Routes**

You would then whitelist the virtual network’s subnet on the PaaS instance allowing that traffic to flow the PaaS instance’s public IP address.

**Service Firewall – Service Endpoint Rules**

The major security issue with Service Endpoints is they create a wide open data exfiltration point. Any traffic leaving a compute within that subnet will traverse directly to the PaaS instance and will ignore any UDRs (since the UDRs are less specific) bypassing any customer NVA (network virtual appliance like a firewall). Unlike Private Endpoints, Service Endpoints aren’t limited to a specific instance of your PaaS but allow network access to all instances of that PaaS. This means an attacker could take advantage of a Service Endpoint to exfiltrate data to their malicious PaaS instance with zero network visibility to the customer. Gross. There were attempts to address this gap with Service Endpoint Policies which allow you to limit the egress to specific PaaS instances, but these never saw wider adoption than storage.

Operationally, these things are a complete shit show. First off, they are non-routeable outside the subnet so they do you no good from on-premises. The other pain point with them is customers will often implement them without understanding how they actually work causing confusion on why Azure traffic is routing the way it is or trying to figure out why traffic isn’t getting to where it needs to go.

These days service endpoints have very limited use cases and you should avoid them where possible in favor of Private Endpoints. The exception being cost optimization. There is an inbound and outbound network charge for Private Endpoints which can get considerable when talking about Azure Backup or Azure Site Recovery at scale. Service Endpoints do not have that same inbound/outbound network cost and can help to reduce costs in those circumstances.

While Service Endpoints don’t really have much to do with NSPs, I did want to cover them because of the amount of confusion around how they work (and how much I hate them) and because they have traditionally been an inbound network control.

Service Firewall – Resource Whitelisting

Last but not least, we have resource whitelisting. Resource whitelisting is a service-firewall capability unique to Azure Storage. It allows you to permit inbound network connectivity to Azure Storage to a specific instance of an Azure PaaS service, all instances of a PaaS service in a subscription, or all within the customer tenant. The resource then uses it managed identity and RBAC assignments to control what it can do with the resource after it creates its network connection. This was a good example of early attempt to authorize network access based on a service identity. It was commonly used when an instance of Azure Machine Learning (AML), AI Foundry Hubs, AI Search, and the like needed to connect to storage. It provided a more restricted method of network control vs the traditional Allow Trusted Microsoft Services option.

Ok… and what about outbound?

Another major issue you should notice is the above are all inbound. What the hell do you do about outbound controls for these service-based PaaS? Well, the answer has historically been jack shit for almost every PaaS service. The focus was all about controlling the inbound network traffic and authorization of the resource to hope no one does anything shady with the resource by having it make outbound calls to other services to exfiltrate data or do something else malicious. Not ideal right?

Some service-based PaaS services came up with creative solutions around this. Resources that fall under the Cognitive Services umbrella, such as the Azure OpenAI Service got data exfiltration controls. AI Search introduced the concept of Shared Private Access which didn’t really control the outbound access, but did allow you to further secure the downstream resources AI Search accessed. Each Product Group was doing their best within the confines of their bubble.

Network Security Perimeters to the rescue

You should now see the challenges that faced inbound and outbound network controls in service-based PaaS.

Sure you had lots of options for inbound controls, but they had issues at scale (IP whitelisting) or certain features were not available in all PaaS (resource-based whitelisting).
The configuration for inbound network control features lived as properties of the resource and could be configured differently by different teams resulting in access challenges.
Outbound, you had very few controls, and if there were controls, they differed on a per-service basis.
Some services would give great details as to the inbound traffic allowed or denied (such as Azure Storage) while other services didn’t give you any of that network information (I’m looking at you AI Search).

In my next post I’ll walk through how NSPs help to solve each of these issues and why you should be planning a migration over to them as soon as possible. I’ll walk through a few common examples of how NSPs can ease the burden using common service-based PaaS to service-based PaaS. I’ll also cover how they can be extremely useful in troubleshooting network connectivity caused by routing issues or even DNS issues.

See you next post!

Journey Of The Geek

The chronicles of a Bostonian tech geek navigating through life and technology

Tag Archives: networking

Microsoft Foundry – BYO AI Gateway – Part 3

The foundational architecture

AI Gateway Architecture

Foundry Architecture

Setting up the AI Gateway

Summing it up