Microsoft Foundry – APIM and Model Gateway Connections Part 2

Hello again! Today I’m going to continue my series on Microsoft Foundry’s new support for the BYO AI Gateway. In my past few posts I’ve walked through the evolution of Foundry and covered at a high level what an AI Gateway is and the problem this feature solves. In this post we’re gonna get down and dirty with the technical details on setting this up. Grab your coffee and put on your thinking music (for me that is some Blink and Third Eye Blind!).

Let’s get to it!

Current State Architecture

My customer base is primarily in regulated industries, so most of my customers are still at the experimentation stage with the Foundry Agent Service. Given these customers have strict security requirements, they are largely using the agent service with the standard agent configuration. In this configuration the outbound traffic (subsets of it, but that is a much larger conversation) can be tunneled through the customer virtual network for centralized logging, mediation, and facilitating access to private resources (again, with limitations today) through what the product group calls VNet injection but I’d say is more closely described as VNet integration via a delegated subnet. Threads (conversations in v2 agents) and agent metadata are stored in Cosmos DB, vector stores created by an agent from tools such as the File Search tool are stored in AI Search, and files uploaded to the Foundry resource are stored in a Storage Account. These resources are all provisioned by the customer into the customer subscription and fully managed by the customer (RBAC, encryption, HA settings, etc.). Private Endpoints for each resource are created within the customer’s virtual network and made accessible from the agent delegated subnet. The whole environment looks similar to what you see below.

Foundry Agent Service – Standard Agent Configuration

As I covered in my last post, in the generally available (aka fully supported and recommended for production) configuration, agents can only consume models deployed to the Foundry account the agents exist in. This creates an issue for customers wanting to inject the governance, visibility, and operational improvements an AI Gateway can provide when it sits between the agent and the model. For now, customers are working around this using third-party agents. The downside is that these “external” agents live on compute customers have to manage, and these agents can’t access many of the tools available to Foundry-native agents. This is the problem the BYO AI Gateway feature is attempting to fix.

No BYO Gateway vs BYO Gateway

Foundry resource architecture

Here is where the new connection type introduced in Foundry comes to the rescue. Before I dive into the details of that, I think it’s helpful to level set a bit on the resource hierarchy within Foundry. At the top is the top-level Azure resource referred to as the Foundry account, which under the hood is a Cognitive Services account. The relevant resources for this discussion beneath that are projects, deployments, and connections. Projects are containers (at the management plane) for agents (at the data plane); deployments expose models, which are made available to all projects within the account; and connections, which can be created at the project level or at the account level and shared across all projects.

Relevant resource hierarchy

For the purposes of this discussion, I’m going to focus on the connection objects. Connection objects can be created at the account level and project level. In the standard agent configuration, you’ll create a number of different connections out of the gate, including connections to Cosmos DB, AI Search, and Azure Storage. Additional common connections could be to an App Insights instance for tracing or a Grounding With Bing Search resource. Connection objects contain some type of pointer, like a URI, and a credential. That credential is usually an API key, some Entra ID-based authentication mechanism, or generic OAuth.

Connections are created at the account level when the Foundry account itself needs to access them. This could be for the usage of Content Understanding, a Key Vault for storing connection secrets (API keys) in a customer subscription, or an App Insights instance used for tracing. From what I’ve observed, you create connections at the account level if they need to be shared across all projects OR they’re used by the Foundry resource in general rather than by some type of project construct. Connections are also created at the project level. When you provision a standard agent, for example, you’ll create connection objects to the Cosmos DB, Storage Account, and AI Search resources mentioned above. The new category of connections covered in this post will be created at the project level.

APIM and Model Gateway Connections

The BYO AI Gateway feature introduces two new connection categories: ApiManagement and ModelGateway. These objects are the glue that allows Foundry agents to funnel requests for models through the AI Gateway the connection points to. When you’re connecting to an APIM instance you should use the ApiManagement category, and when you’re connecting to a third-party gateway you’ll use the ModelGateway category.

As of the date of this blog post, these connection objects have the following schema (relevant properties to this discussion only):

name: The name of the connection (needs to be less than 60 characters in my testing)
properties: {
category: ApiManagement or ModelGateway
target: The URI you want the agent to connect to
authType: For ApiManagement this can be ApiKey or ProjectManagedIdentity
credentials: This will be populated with the value of the API key if using that authType
isSharedToAll: true or false if you want this shared across all projects
# ApiManagement category with static models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
# Models discussed in detail below
models: "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
}
# ApiManagement category with dynamic discovery
metadata: {
deploymentAPIVersion: ARM API version for CognitiveServices/accounts/deployments API calls
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
}
# ModelGateway category with static models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
# Models discussed in detail below
models: "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
}
# ModelGateway category with dynamic models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
deploymentAPIVersion: ARM API version for CognitiveServices/accounts/deployments API calls
modelDiscovery: "{\"deploymentProvider\":\"AzureOpenAI\",\"getModelEndpoint\":\"/deployments/{deploymentName}\",\"listModelsEndpoint\":\"/deployments\"}"
}
}

I’ll walk through each of these properties in as much detail as I’ve been able to glean from my testing.

The category property is self-explanatory. You either set this to ApiManagement (if using APIM) or ModelGateway (if using a third-party AI Gateway like Kong or LiteLLM).

The target property is the URI you want the agent to connect to. As an example, if I create an API on my APIM instance for the OpenAI v1 API named openai-v1, my target would look like “https://myapim.azure-api.net/openai-v1/v1”. As of the date of this blog post, you MUST use the azure-api.net FQDN for the APIM. If you try to use a custom domain you’ll get an error back telling you that it’s not supported. I’ve got a request in to the product group to lift this limitation and I’ll update this post if that happens. For a third-party model gateway the URI would be similar.

The authType property is going to be either ApiKey or ProjectManagedIdentity for an APIM connection. ProjectManagedIdentity will authenticate to the upstream APIM using the agent’s project’s Entra ID managed identity. When using ProjectManagedIdentity you must also specify the audience property and set it to https://cognitiveservices.azure.com if connecting to a backend Foundry resource hosting models. Today it doesn’t seem possible to pass the agent’s own Entra ID identity, at least from what I’ve seen. For a model gateway connection this will be either ApiKey or OAuth. Details on the OAuth setup can be found in the samples GitHub (I haven’t mucked with it yet). If you’re using the authType of ApiKey you additionally need to pass the credentials property, which includes a property of key with the API key, similar to what you see below.

authType: ApiKey
credentials: {
  key: MYAPIKEY
}

I haven’t messed extensively with the isSharedToAll property as of yet. For my use case I set this to false so each project got its own connection object. You may be able to create this object at the account level and set the isSharedToAll property, but I haven’t tested that yet. If you have, def let me know if that works.

Ok, now on to the property that can bring the most pain: the metadata property. This property is the main guts that makes this whole thing work. A few considerations: if you’re doing this with Terraform or REST (can’t speak to Bicep or ARM), each of the properties I’m going to cover is CASE SENSITIVE. If you use the wrong casing, your connection object will not work. When connecting to an APIM or model gateway, you can have Foundry either enumerate the models available (called dynamic discovery) or you can provide the exact models you want to expose (called static models).

Let’s first cover static models. Here is an example of me creating a connection to an APIM instance with static models using the authType of ProjectManagedIdentity. One thing to note is that in the backend object in my APIM I’m appending /v1 to the backend path rather than doing it in this connection object.

{
  "id": "/subscriptions/X/resourceGroups/X/providers/Microsoft.CognitiveServices/accounts/X/projects/sampleproject1/connections/conn1apimgwstaticopenai-v1",
  "name": "conn1apimgwstaticopenai-v1",
  "properties": {
    "audience": "https://cognitiveservices.azure.com",
    "authType": "ProjectManagedIdentity",
    "category": "ApiManagement",
    "isSharedToAll": false,
    "metadata": {
      "deploymentInPath": "false",
      "inferenceAPIVersion": null,
      "models": "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
    },
    "target": "https://X.azure-api.net/openai-v1"
  }
}

Since I’m using the v1 Azure OpenAI API, I don’t need to specify an inferenceAPIVersion. If I was using the classic API I’d need to specify the version (such as 2025-04-01-preview). Notice also I have set deploymentInPath to false. When set to true the connection will add /deployments/deployment_name to the path. For the v1 API this isn’t required. Finally, you’ve got the models property. With a static model setup I list out the models I’m exposing to the connection. If you’re using Terraform, you MUST jsonencode the models property. If you don’t, it will not work. Static models are pretty helpful if you want to strictly control exactly which models the project gets access to.
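If you’re curious what that looks like in Terraform, below is a minimal sketch using the azapi provider. Treat it as illustrative only: the API version, resource references, and names are assumptions on my part, so lean on the official samples for the authoritative setup.

# Minimal sketch (azapi v2 syntax). The API version and parent_id reference
# are placeholders; only the properties mirror the schema shown above.
resource "azapi_resource" "apim_static_connection" {
  type      = "Microsoft.CognitiveServices/accounts/projects/connections@2025-06-01" # version is an assumption
  name      = "conn1apimgwstaticopenai-v1"
  parent_id = azapi_resource.foundry_project.id # placeholder reference to your Foundry project

  body = {
    properties = {
      category      = "ApiManagement"          # CASE SENSITIVE
      authType      = "ProjectManagedIdentity" # CASE SENSITIVE
      audience      = "https://cognitiveservices.azure.com"
      target        = "https://myapim.azure-api.net/openai-v1"
      isSharedToAll = false
      metadata = {
        deploymentInPath = "false"
        # models must be a JSON *string*, hence the jsonencode
        models = jsonencode([
          {
            name = "gpt-4o"
            properties = {
              model = {
                format  = "OpenAI"
                name    = "gpt-4o"
                version = "2024-08-06"
              }
            }
          }
        ])
      }
    }
  }
}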

Let’s now switch over to dynamic discovery. Dynamic discovery requires you to define a few additional operations inside of your API. The details can be found in this GitHub repo, but the basics are that you define an operation for a GET on a specific model and a LIST to find all the models available. These operations mirror the management plane operations of the ARM API used to retrieve deployment information. Here is an example of a setup with dynamic discovery using an APIM connection.

{
  "id": "/subscriptions/X/resourceGroups/X/providers/Microsoft.CognitiveServices/accounts/X/projects/sampleproject1/connections/conn1apimgwdynamicopenai-v1",
  "location": null,
  "name": "conn1apimgwdynamicopenai-v1",
  "properties": {
    "audience": "https://cognitiveservices.azure.com",
    "authType": "ProjectManagedIdentity",
    "category": "ApiManagement",
    "group": "AzureAI",
    "isSharedToAll": false,
    "metadata": {
      "deploymentAPIVersion": "2024-10-01",
      "deploymentInPath": "false",
      "inferenceAPIVersion": null
    },
    "target": "https://X.azure-api.net/openai-v1"
  },
  "type": "Microsoft.CognitiveServices/accounts/projects/connections"
}

When doing dynamic discovery, you’ll see the deploymentAPIVersion property set to the API version for the GET and LIST deployment operations of the ARM REST API. I added these operations into the API after I imported the v1 OpenAI spec. You can see an example in Terraform I put together in my lab repo. Dynamic discovery is a great solution when you want the developer to have access to any new deployments you may push to the Foundry resource.
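To make that concrete, here’s a rough Terraform sketch of what those two extra operations could look like with the azurerm provider. The operation IDs and URL templates are my assumptions modeled on the GET/LIST endpoints described above; the official repo has the authoritative definitions and policies.

# Sketch only: LIST operation to enumerate deployments exposed through the gateway
resource "azurerm_api_management_api_operation" "list_deployments" {
  operation_id        = "list-deployments" # hypothetical name
  api_name            = azurerm_api_management_api.openai_v1.name
  api_management_name = azurerm_api_management.example.name
  resource_group_name = azurerm_resource_group.example.name
  display_name        = "List model deployments"
  method              = "GET"
  url_template        = "/deployments"
}

# Sketch only: GET operation for a single deployment by name
resource "azurerm_api_management_api_operation" "get_deployment" {
  operation_id        = "get-deployment" # hypothetical name
  api_name            = azurerm_api_management_api.openai_v1.name
  api_management_name = azurerm_api_management.example.name
  resource_group_name = azurerm_resource_group.example.name
  display_name        = "Get model deployment"
  method              = "GET"
  url_template        = "/deployments/{deploymentName}"

  template_parameter {
    name     = "deploymentName"
    required = true
    type     = "string"
  }
}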

I’m not going to run through the ModelGateway connection categories because they will largely emulate what you see above with some minor differences. The official Foundry samples GitHub repo has the gory details. I also have examples in Terraform available in my own repo (if you dare subject yourself to reading my code).

Ok, so now you understand the basics of setting up the connection and what you need to do on the APIM side. For more details on setting up APIM you can reference this official repo.

Summing It Up

Ok, so now you understand the basic connection object, how to set it up, and how it works. I’m going to cut it here and continue in another post where I’ll dig into the dirty details of how it looks to use this, because I don’t want to overload your brain (and mine) with a super long post.

Before I jet I want to provide some critical resources:

  1. My AMAZING peer Piotr Karpala has put together a repository with examples of this pattern (and some 3rd-party integrations) with Bicep. The stuff in there is gold. He was also my late night buddy helping me work through the quirks of this integration. Couldn’t have gotten it done without him (or at least would have broken many keyboards).
  2. The Product Group’s official samples and explanations of the setup are located here. I’d highly recommend referencing them because they will always have more up-to-date instructions than my blog.
  3. I’ve put together some Terraform samples for my own purposes which you are welcome to reference, loot for your own means, and laugh at my pathetic coding ability. Check out this one for the Foundry portion and this one for the APIM portion.

And here are your tips for this post:

  1. RTFM. Seriously, read the official documentation. Today, this integration is challenging to put in place. If you try to lone wolf it, let me know how many keyboards end up being thrown through your window.
  2. If you’re coding in Terraform or making REST calls to create these connections, remember CASE SENSITIVITY matters. If you get the casing wrong, the resource will still create but it won’t work, and you’ll get very frustrated trying to troubleshoot it.
  3. If you’re coding in Terraform don’t forget to use the jsonencode function on the models property. If you skip that, the resource will create but shit will not work.
  4. This is only supported for prompt agents today.
  5. Don’t forget this is public preview. So test it, but expect things to change and don’t throw this into production.

In the next post I’ll walk through how you can test the integration, some of the quirks and considerations for identity and authentication, and some of the neat APIM policy you can craft given some of the new information that is sent in the request.

See you next post!

Microsoft Foundry – The Evolution (Revisited)

Hi folks! In the past I did a series on the Azure OpenAI Service and Microsoft Foundry Hubs (FKA AI Foundry Hubs FKA AI Studio). Instead of going through and updating all those posts and losing the historical content and context (I don’t know about you, but I love having the historical context of a service), I’m instead going to preserve it as is and spin up a new series on the latest iteration of Microsoft Foundry. I’ll likely keep much of the general framework of the older series because it seemed to work. One additional piece I’ll be including in this series is some of the quirks of the service I’ve run into, to potentially save you the pain of having to troubleshoot them. For this first post, I’m going to start off explaining how the service has evolved. As always, my persona focus here is my fellow folks in the central IT and infrastructure space.

The history

Way back in 2023 the hype behind generative AI really started to go insane. Microsoft managed to negotiate rights to host OpenAI’s models in Azure and introduced the Azure OpenAI Service. The demand across customers was insane, where every business unit (BU) wanted it yesterday. Microsoft initially offered the service within the Cognitive Services framework under the Cognitive Services resource provider. This meant it inherited many of the controls native to Cognitive Services, which included Private Endpoints, a limited set of outbound controls, support for API key and Entra ID authentication, and support for Azure RBAC for authorization. Getting the service deployed was pretty straightforward, with the hold-ups to deployment being more about concerns around LLM security in general. Deployment typically looked like the architecture below.

Azure OpenAI Service

As folks started to build their AI applications, they tapped into other services under the Cognitive Services umbrella like Content Safety, Speech-to-Text, and the like. These services fit in nicely as they also fell under the Cognitive Services umbrella and had a similar architecture as the above, requiring deployment of the resource and the typical private endpoint and authentication/authorization (authN/authZ) configuration.

I like to think of this as stage 1 of Microsoft’s AI offerings.

Microsoft then wanted to offer more models, including models they have built such as Phi and third-party models such as Mistral. This drove them to create a new resource called an AI Service resource. This resource fell under the Cognitive Services resource provider, and again inherited similar architectures as the above. Beyond hosting third-party models, it also included an endpoint to consume OpenAI models and some of the pool of Cognitive Services. This is where we begin to see the collapse of Microsoft’s AI services under a single top-level resource.

What about building AI apps though? This is where Foundry Hubs (FKA AI Studio) were introduced. The intent of Foundry Hubs was to be the one stop shop for developers to create their AI apps. Here developers could experiment with LLMs using the playgrounds, build AI apps with Prompt Flow, build agents, or deploy 3rd party LLMs from Hugging Face. Foundry Hubs were a light overlay on top of the Azure Machine Learning (AML) service, utilizing a new feature of AML built specifically for Foundry called AML Hubs. Foundry Hubs inherited a number of capabilities of AML such as its managed compute (to host 3rd party models and run prompt flows) and its managed virtual network (to host the managed compute).

Microsoft Foundry Hubs

While this worked, anyone who has built a secure AML deployment knows that shit ain’t easy. Getting the service working requires extensive knowledge of its identity and networking configuration. This was a pain point for many customers in my experience. Many struggled to get it up and running due to the complexity.

Example of complexity of Microsoft Foundry IAM model

I think of the combination of AI Services and Microsoft Foundry Hubs as stage 2 of Microsoft’s journey.

Ok, shit was complicated, I ain’t gonna lie. Given this complexity and feedback from the customers, Microsoft got ambitious and decided to further consolidate and simplify. This introduced the concept of a new top-level resource called Microsoft Foundry Accounts. In public documentation and conversation this may be referred to as Foundry Projects or Foundry Resources. Since this is my blog I’m going to use my term, which is Microsoft Foundry Accounts. With Microsoft Foundry Accounts, Microsoft collapsed the AI Services and Foundry Hubs into a single top-level resource. Not only did they consolidate these two resources, they also shifted Foundry Hubs from the Azure Machine Learning resource provider into the Cognitive Services resource provider. This move cemented the Cognitive Services resource provider as the “AI” resource provider in my brain. It resulted in a new architecture which often looks something like the below.

Microsoft Foundry Accounts common architecture

This is what I like to refer to as stage 3, which is the current stage we are in with Microsoft’s AI offerings. We will continue to see this stage evolve, with more features built and integrated into the Microsoft Foundry Account. I wouldn’t be surprised at all to see other services collapse into it as just another endpoint on the singular resource.

Why do you care?

You might be asking, “Matt, why the hell do I care about this?” The reason you should care is because there are many customers who jumped into these products at different stages. I run across a ton of customers still playing in Foundry Hubs with only a vague understanding that Foundry Hubs are an earlier stage and that they should begin transitioning to stage 3. This evolution is also helpful to understand because it gives an idea of the direction Microsoft is taking its generative AI services, which is key to how you should be planning your future use of these services within Azure.

I’ll dive into far more detail in future posts about stage 3. I’ll share some of my learnings (and my many pains), some reference architectures that I’ve seen work, and how I’ve seen customers successfully secure and scale usage of Foundry Accounts.

For now, I leave you with this evolution diagram I like to share with customers. For me, it really helps land the stages and the evolution, what is old and what is new, and what services I need to think about focusing on and which I should think about migrating off of.

Foundry evolution

Well folks, that wraps it up. Your takeaways today are:

  1. Assess which stage your implementation of generative AI is right now in Azure.
  2. Begin plans to migrate to stage 3 if you haven’t already. Know that there will be gaps in functionality between Foundry Hubs and Foundry Accounts. A good example is no more Prompt Flow. There are others, but many will eventually land in Foundry projects.

See you next post!

Network Security Perimeters – NSPs for Troubleshooting

This is part of my series on Network Security Perimeters:

  1. Network Security Perimeters – The Problem They Solve
  2. Network Security Perimeters – NSP Components
  3. Network Security Perimeters – NSPs in Action – Key Vault Example
  4. Network Security Perimeters – NSPs in Action – AI Workload Example
  5. Network Security Perimeters – NSPs for Troubleshooting

Hello folks! The past 3 months have been completely INSANE with customer work, building demos, experimentation, and learning. It’s been awesome, but goddamn has it been grueling. I’m back to finally close out my series on Network Security Perimeters. In my past posts I’ve covered NSPs from a conceptual level, the components that make them up, and two separate examples. Today I’m going to cover my favorite use case for NSPs, and that is using them for troubleshooting.

The Setup

At a very high level most enterprises have an architecture similar to what you see below. In this architecture, network boundaries are set up around endpoints and services. These boundaries are erected to separate hardware and services based on the data stored in that environment, the security controls enforced in that environment, whether that environment has devices connected to the Internet, what type of trust level the humans and non-humans running in those environments have, and other similar variables. For the purposes of this post, we’re gonna keep it simple and stick to LAN (trusted) and DMZ (untrusted). DMZ is where devices connected to the Internet live and LAN is where devices restricted to the private network live.

Very basic network architecture

Environments like these typically restrict access to the Internet through a firewall or appliance/service that is performing a forward web proxy function. This allows enterprises to control what traffic leaves their private network through traditional 5-tuple means, deep packet inspection to inspect and control traffic at layer 7, and control of access to specific websites based on the user or endpoint identity. Within the LAN there is typically a private DNS service that provides name resolution for internal domains. The DMZ has either separate dedicated DNS infrastructure for DNS caching and limited conditional forwarding, or it utilizes some third-party hosted DNS service like a Cloudflare. The key takeaway from the DNS resolution piece is the machines within the LAN and DMZ use different DNS services, and typically the DMZ is limited or completely incapable of resolving domains hosted on the internal DNS servers.

You’re likely thinking, “Cool Matt, thanks for the cloud 101. WTF is your point?” The place where this type of setup really bites customers is when consuming Private Endpoints. I can’t tell you how many times over the years I’ve been asked to help a customer struggling with Private Endpoints only to find out the problem is a DNS issue tied to this infrastructure. The solution is typically a simple proxy bypass, but getting to that resolution often takes hours of troubleshooting. I’m going to show you how NSPs can make troubleshooting this problem way easier.

The Problem

Before we dive into the NSP piece, let’s look at how the problem described above manifests. Take an organization that uses a high-level architecture to integrate with Azure such as the one pictured below.

Same architecture as above but now with Azure connectivity

Here we have an organization that has connectivity with Azure configured through an ExpressRoute connection with a S2S VPN as fallback (I hate this fallback method, but that’s a blog for another day). In the ideal world, VM1 hits PaaS Service 1 on that Private Endpoint with the traffic being sent through the VPN or ExpressRoute connection. To do that, the DNS server in the LAN must properly resolve to the private IP address of the Private Endpoint deployed for PaaS Service 1 (check out my series on DNS if you’re unfamiliar with how that works). Let’s assume the enterprise has properly configured DNS so VM1 resolves to the correct IP address.

Now it’s Monday morning and you are on the team that manages Azure. You get a call from a user complaining they can’t connect to the Private Endpoint for their Azure Storage account for blob access, giving you the typical “Azure is broken!”. You hop on a call with the user and ask the user to do some nslookups from their endpoint, which return the correct private IP address. You even have the user run a curl against the FQDN and still you get back the correct IP address.

This is typically the scenario that at some point gets escalated beyond support, and someone from the account team says, “Hey Matt, can you look at this?” I’ll hop on a call, get the lowdown on what the customer is doing, double-check what the customer checked, maybe take a glance at routing and Network Security Groups, and then jump to what is almost always the problem in these scenarios. The next question out of my mouth is, “Do you have a proxy?”

So why does this matter? This question is important whenever we consider HTTP/HTTPS traffic because it is almost always sent through an appliance or service that is performing the architectural function of a forward web proxy before it egresses to the Internet, as I covered earlier. This could be a service hosted within the organization’s boundary or it could be a third-party service like a ZScaler. Where it’s hosted isn’t super important (but can play a role in DNS); the key things to understand are whether the enterprise is using one and whether the application being used to access the Azure PaaS service is going through it.

When HTTP/HTTPS traffic is configured to be proxied, the connection from the endpoint is made to the proxy service. The proxy service examines the endpoint’s request, executes any controls configured, and initiates a connection to the destination service. This last piece is what we care about because to make that connection, the proxy service needs to do its own DNS query, which means it will use the DNS server configured on the proxy. This is typically the problem because, as I covered earlier, this DNS service doesn’t typically have resolution to internal domains, which would include resolution to the privatelink domains used by Azure PaaS services.

DNS resolution when using proxy

When you did the curl (without specifying proxy settings) or nslookup, the machine was hitting the internal DNS service, which does have the ability to resolve privatelink domains, giving you the false sense that everything looks good from a DNS perspective. The resolution to this problem is to work with the proxy team to put in the appropriate proxy bypass so the endpoint will connect directly to the Azure PaaS service (thus using its own DNS resolution).

All of this sounds simple, right? The reality is getting to the point of identifying the issue was the proxy tends to take hours, if not days, and tons of people from a wide array of teams across the enterprise. This means a lot of money spent diagnosing and resolving the issue.

What if there was an easier way? In comes NSPs.

NSPs to the rescue!

If you were a good Azure citizen, you would have wrapped this service in an NSP (assuming that service has been onboarded by Microsoft to NSPs) and turned on logging as I covered in my prior posts. If you had done that, you could have leveraged the NSP logs (the NSPAccessLogs table) to identify incoming traffic being blocked by the NSP. Below is an example of what the log entry looks like.

NSP Log Sample

In the above log entry I get detail as to the operation the user attempted to perform (in this instance listing the keys in a Key Vault), the effect of the NSP (traffic is denied), and the category of traffic (Public). If we go back to the troubleshooting steps from earlier, I may have been able to identify this problem WAY earlier and with far fewer people involved, even if the Azure resource platform logs obfuscated the full IP address or didn’t list it at all. I can’t stress enough the value this presents, especially having a standardized log format. The amount of hours my customers could have saved by enabling and using these logs makes me sad.
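If you’re standing this up with infrastructure-as-code, turning on the logging is a one-resource affair. Here’s a minimal Terraform sketch, assuming an existing NSP and Log Analytics workspace; the resource references and the category group are placeholders, so double-check the diagnostic categories your NSP exposes.

# Minimal sketch: ship NSP access logs to Log Analytics so the
# NSPAccessLogs table gets populated. Resource references are placeholders.
resource "azurerm_monitor_diagnostic_setting" "nsp_logs" {
  name                       = "nsp-access-logs"
  target_resource_id         = azapi_resource.nsp.id # your NSP resource
  log_analytics_workspace_id = azurerm_log_analytics_workspace.central.id

  enabled_log {
    category_group = "allLogs" # assumption: pulls in all NSP access log categories
  }
}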

Even more uses!

Beyond troubleshooting the proxy problem, there are many scenarios where this comes in super handy. Another such example is PaaS to PaaS traffic. Oftentimes the documentation around when a PaaS talks to another PaaS is not clear. It may not be obvious that one PaaS is trying to communicate with another over the Microsoft public backbone. This is another area NSPs can help because this traffic can also be logged and used for troubleshooting.

Troubleshooting PaaS to PaaS

We’re not done yet! Some Azure compute services can integrate into a customer virtual network using a combination of Private Endpoints for inbound traffic and regional VNet integration for outbound access (traffic initiated by the PaaS and destined for customer endpoints or endpoints in the public IP space). Good examples are Azure App Services and the new API Management v2 SKUs. Oftentimes I’ll work with customers who think they enabled this correctly but actually only enabled Private Endpoints and either missed configuring regional VNet integration (causing outbound traffic to leave the Microsoft public backbone and hit the PaaS over public IPs) or misconfigured DNS for the virtual network the PaaS service has been integrated with. NSP logs can help here as well, as shown in the sketch below.

NSPs helping to diagnose regional VNet integration issues
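For reference, here’s a minimal Terraform sketch of the piece customers often miss on an App Service, assuming placeholder resource names: the virtual_network_subnet_id property is what enables regional VNet integration for outbound traffic (the Private Endpoint, not shown, only covers inbound).

# Sketch: regional VNet integration for outbound traffic on an App Service.
# Without virtual_network_subnet_id, outbound calls egress over public IPs.
resource "azurerm_linux_web_app" "example" {
  name                      = "app-vnet-demo" # placeholder
  resource_group_name       = azurerm_resource_group.example.name
  location                  = azurerm_resource_group.example.location
  service_plan_id           = azurerm_service_plan.example.id
  virtual_network_subnet_id = azurerm_subnet.app_integration.id # the often-missed piece

  site_config {}
}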

Summing it up

Here are the key takeaways for you for this post:

  1. Enable NSPs wherever they are supported. If you’re not comfortable enforcing them, at least turn them on for the logging.
  2. Don’t forget NSPs capture both the inbound AND outbound traffic. You’d be amazed how many Azure PaaS services (service based) can make outbound network calls that you probably aren’t tracking or controlling.
  3. Like platform logs, NSP logs are not simply a security tool. Don’t lock them away from operations behind a SIEM. Make them available to both security and operations so everyone can benefit.
  4. Remember that enforcing NSPs WILL block the associated resources from delivering their logs to a Log Analytics Workspace, Storage Account, or Event Hub. If you’re using a centralized Log Analytics Workspace model, you’ll want to leave your NSPs in transition mode for now.

NSPs are more than just a tool to block and get visibility into incoming and outbound traffic for security purposes, but also an important tool in your toolbox to help with day-to-day operational headaches. If you’re not using NSPs for supported services today, you should be. There is absolutely zero reason not to do it, and your late night troubleshooting sessions will only consume 1 Mountain Dew vs 10!

That’s it for me. Off to snowblow!

Network Security Perimeters – NSPs in Action – AI Workload Example

This is part of my series on Network Security Perimeters:

  1. Network Security Perimeters – The Problem They Solve
  2. Network Security Perimeters – NSP Components
  3. Network Security Perimeters – NSPs in Action – Key Vault Example
  4. Network Security Perimeters – NSPs in Action – AI Workload Example
  5. Network Security Perimeters – NSPs for Troubleshooting

UPDATE 2/23/2026 – NSP support for Microsoft Foundry resources is generally available!

Hello again! Today I’ll be covering another NSP (Network Security Perimeters) use case, this time focused on AI (gotta drive traffic, am I right?). This will be the fourth entry in my NSP series. If you haven’t read at least the first and second post, you’ll want to do that before jumping into this one because, unlike my essays back in college, I won’t be padding the page count by repeating myself. Let’s get to it!

Use Case Background

Over the past year I’ve worked with peers helping a number of customers get a quick and simple RAG (retrieval augmented generation) workload into PoC (proof-of-concept). The goal of these PoCs was often to validate that LLMs (large language models) could provide some level of business value when supplementing them with corporate data through a RAG-based pattern. Common use cases included things like building a chatbot for support staff which was supplemented with support’s KB (knowledge base) or a chatbot for a company’s GRC (governance, risk, and compliance) team which was supplemented with corporate security policies and controls. You get the gist of it.

In the Azure realm this pattern is often accomplished using three core services. These services include the Azure OpenAI Service (now more typically AI Foundry), AI Search, and Azure Storage. In this pattern AI Search acts as the search index and optional vector database, Azure Storage stores the data in blob storage before it’s chunked and placed inside AI Search, and Azure OpenAI or AI Foundry hosts the LLM. Usage of this pattern requires the data be chunked (think chopped up into smaller parts before it’s stored as a record in a database while still maintaining the important context of the data). There are many options for chunking which are far beyond the scope of this post (and can be better explained by much smarter people), but in Azure there are three services (that I’m aware of anyway) that can help with chunking vs doing it manually. These include:

  1. Azure AI Document Intelligence’s layout model and chunking features
  2. Azure OpenAI / AI Foundry’s chat with your data
  3. Azure AI Search’s skillsets and built-in vectorization

Of these three options, the simplest (and most point-and-click) are options 2 and 3. Since many of these customers had limited Azure experience and very limited time, these options tended to serve for initial PoCs that then graduated to more complex chunking strategies such as the use of option 1.

The customer base that was asking for these PoCs fell into one or more of these categories:

  1. Limited staff, resources, and time
  2. Limited Azure knowledge
  3. Limited Azure presence (no hybrid connectivity, no DNS infrastructure set up to support Private Endpoints)

All of these customers had a minimum set of security requirements that included basic network security controls.

RAG prior to NSPs

While there are a few different ways to plumb these services together, these PoCs would typically have the services establish network flows as pictured below. There are variations to this pattern where the consumer may be going through some basic chatbot app, but in many cases consumers would interact directly with the Azure OpenAI / AI Foundry Chat Playground (again, quick and dirty).

Network flows with minimalist RAG pattern

As you can see above, there is a lot of talk between the PaaS services. Let’s tackle that before we get into human access. PaaS communication almost exclusively happens through the Microsoft public backbone (some services have special features, as I’ll talk about in a minute). This means control of that inbound traffic is going to be done through the PaaS service firewall and the trusted Azure services exception for Azure OpenAI / AI Foundry, AI Search, and Azure Storage (optionally using a resource exception for Storage). If you’re using the AI Search Standard or above SKU you get access to the Shared Private Access feature, which allows you to inject a managed Private Endpoint (a Private Endpoint that gets provisioned into the Microsoft-managed virtual network where the AI Search compute runs, allowing connectivity to a resource in your subscription) giving AI Search the ability to reach the resource using a Private Endpoint. While cool, this is more cost and complexity.
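If you do go the Shared Private Access route, here’s a minimal Terraform sketch of what that managed Private Endpoint looks like from AI Search to a Storage Account’s blob endpoint; resource references are placeholders.

# Sketch: AI Search Shared Private Access (managed Private Endpoint) to blob storage.
resource "azurerm_search_shared_private_link_service" "blob" {
  name               = "search-to-storage-blob" # placeholder
  search_service_id  = azurerm_search_service.example.id
  subresource_name   = "blob"
  target_resource_id = azurerm_storage_account.example.id
  request_message    = "Please approve" # approval happens on the storage account side
}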

Outbound access controls are limited in this pattern. There are some data exfiltration controls that can be used for Azure OpenAI / AI Foundry which are inherited from the Cognitive Services framework which I describe in detail in this post. AI Search and Azure Storage don’t provide any native outbound network controls that I’m aware of. This lack of outbound network controls was a sore point for customers in these patterns.

For inbound network flows from human actors (or potentially non-human actors if there is an app between the consumer and the Azure OpenAI / AI Foundry service) you were limited to the service firewall’s IP whitelist feature. Typically, you would whitelist the IP addresses of the forward web proxy in use by the company or another IP address where company traffic egresses to the Internet, as sketched after the diagram below.

RAG design network controls prior to NSPs
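As a concrete example, here’s roughly what that IP whitelisting looks like in Terraform on the Storage Account side, with a placeholder egress IP and names:

# Sketch: service firewall restricted to the company's egress IP,
# with the trusted Azure services bypass for PaaS-to-PaaS calls.
resource "azurerm_storage_account" "example" {
  name                     = "stragdemo001" # placeholder
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  network_rules {
    default_action = "Deny"
    ip_rules       = ["203.0.113.10"]  # company egress IP (placeholder)
    bypass         = ["AzureServices"] # trusted Azure services exception
  }
}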

Did this work? Yeah it did, but oh boy, it was never simple to get approved by organizational security teams. While IP whitelisting is pretty straightforward to explain to a new-to-Azure customer, the same can’t be said for the trusted services exception, Shared Private Access, and resource exceptions. The lack of outbound network controls for AI Search and Storage went over like a lead balloon every single time. Lastly, the lack of a consistent log schema, sometimes subpar network-based logging (I’m looking at you AI Search), and the complete lack of outbound network traffic logs made the conversations even more difficult.

Could NSPs make this easier? Most definitely!

RAG with NSPs

NSPs remove every single one of the pain points described above. With an NSP you get:

  1. One tool for controlling both inbound and outbound network traffic (kinda)
  2. Standardized log schema for network flows
  3. Logging of outbound network calls

We go from the mess above to the much simpler design pictured below.

The design using NSPs

In this new design we create a Network Security Perimeter with a single profile. In this profile there is an access rule which allows the customer egress IP addresses for human users or non-human actors (in case users interact with an app which interacts with the LLM). Each resource is associated to that profile within the NSP, which allows non-human traffic between the PaaS services since it’s all within the same NSP. No additional rules are required, which prevents the PaaS services from accepting or initiating any network flows outside of what the access rules allow and their communication with each other within the NSP.

In this design you control your inbound IP access with a single access rule and you get a standard manner to manage outbound access. No more worries about whether the product group baked in an outbound network control, every service in the NSP gets one. Logging? Hell yeah we got your logging for both inbound and outbound in a standard schema.
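Here’s a rough Terraform sketch of that structure using azapi. The ARM resource types match what NSPs expose, but the API version, access mode value, address prefixes, and resource references are assumptions; adjust them against the current NSP API before using.

# Sketch: NSP with a single profile, one inbound IP access rule,
# and a resource association (repeat the association per PaaS resource).
resource "azapi_resource" "nsp" {
  type      = "Microsoft.Network/networkSecurityPerimeters@2024-07-01" # version is an assumption
  name      = "nsp-rag-poc"
  location  = "eastus2"
  parent_id = azurerm_resource_group.example.id
}

resource "azapi_resource" "profile" {
  type      = "Microsoft.Network/networkSecurityPerimeters/profiles@2024-07-01"
  name      = "rag-profile"
  parent_id = azapi_resource.nsp.id
}

resource "azapi_resource" "inbound_ips" {
  type      = "Microsoft.Network/networkSecurityPerimeters/profiles/accessRules@2024-07-01"
  name      = "allow-company-egress"
  parent_id = azapi_resource.profile.id
  body = {
    properties = {
      direction       = "Inbound"
      addressPrefixes = ["203.0.113.0/24"] # placeholder company egress range
    }
  }
}

resource "azapi_resource" "assoc_search" {
  type      = "Microsoft.Network/networkSecurityPerimeters/resourceAssociations@2024-07-01"
  name      = "assoc-aisearch"
  parent_id = azapi_resource.nsp.id
  body = {
    properties = {
      privateLinkResource = { id = azurerm_search_service.example.id }
      profile             = { id = azapi_resource.profile.id }
      accessMode          = "Learning" # transition/learning mode; flip to "Enforced" when ready
    }
  }
}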

Once it’s set up, you can monitor both inbound and outbound network calls using the NSPAccessLogs table. It’s a great way to understand under the hood how these patterns work because the NSP logs surface the source resource, destination resource, and the operation being performed, as seen below.

NSP logs surfacing operations

One thing to note, at least in East US 2 where I did my testing: outbound calls that are actually allowed (since all resources are within the NSP) falsely record as hitting the DenyAll rule. Looking back at my notes, this has been an issue since back in March 2025, so maybe that’s just the way it records or the issue hasn’t yet been remediated.

The other thing to note is when I initially set this all up I got an error in both AI Foundry’s chunking/loading method and AI Search’s. The error complains that an additional header of xms_az_nwperimid was passed and the consuming app wouldn’t allow it. Oddly enough, a second attempt didn’t hit the same error. If you run into this error, try again and open a support ticket so whatever feature on the backend is throwing that error can be cleaned up.

Summing it up

So yeah… NSPs make PaaS to PaaS flows like this way easier for all customers. They especially make implementing basic network security controls far simpler for customers new to Azure who may not have a mature platform landing zone sitting around.

Here are your takeaways for today:

  1. NSPs give you standard inbound/outbound network controls for PaaS and standardized log schema.
  2. NSPs are especially beneficial to new customers who need to execute quickly with basic network security controls.
  3. Take note that as of the date of this blog post, Azure OpenAI Service support for NSPs is in public preview. You will need to enable the preview flag on the subscription before you go mucking with it in a PoC environment. Do not use it in production until it’s generally available. Instructions are in the link.
  4. I did basic testing for this post covering ingestion, searching, and submitting prompts that reference the extra data source property. Ensure you do your own more robust testing before you go counting on this working for every one of your scenarios.
  5. If you want to muck around with it yourself, you can use the code in this repo to deploy a similar lab as I’ve built above. Remember to enable the preview flag and wait a good day before attempting to deploy the code.

Well folks, that wraps up this post. In my final post on NSPs, I’ll cover a use case for NSPs to help assist with troubleshooting common connectivity issues.

Thanks!