Microsoft Foundry – APIM and Model Gateway Connections Part 2

Microsoft Foundry – APIM and Model Gateway Connections Part 2

Hello again! Today I’m going to continue my series on Microsoft Foundry’s new support for the BYO AI Gateway. In my past few posts I’ve walked through the evolution of Foundry and covered at a high level what an AI Gateway is and the problem this feature solves. In this post we’re gonna get down and dirty with the technical details on setting this up. Grab your coffee and put on your thinking music (for me that is some Blink and Third Eye Blind!).

Let’s get to it!

Current State Architecture

My customer base is primarily in the regulated industry so most of my customers are still at the experimentation state with the Foundry Agent Service. Given these customers have strict security requirements they are largely using the agent service with the standard agent configuration. In this configuration the outbound traffic (subsets of it, but that is a much larger conversation) can be tunneled through the customer virtual network for centralized logging, mediation, and facilitating access to private resources (again, with limitations today) through what the product group calls VNet injection but I’d say is more closely described as VNet integration via a delegated subnet. Threads (conversations in v2 agents) and agent metadata are stored in a Cosmos DB, vector stores created by an agent from tools such as the File Search tool are stored in AI Search, and files uploaded to the Foundry resource are stored in a Storage Account. These resources are all provisioned by the customer into the customer subscription and fully managed by the customer (RBAC, encryption, HA settings, etc). Private Endpoints for each resource are created within the customer’s virtual network and made accessible from the agent delegated subnet. The whole environment looks similar to what you see below.

Foundry Agent Service – Standard Agent Configuration

As I covered in my last post, in a generally available (aka fully supported and is recommended for production) agents can only consume models deployed to the Foundry account the agents exist in. This creates an issue for customers wanting to inject the governance, visibility, and operational improvements an AI Gateway can provide when it sits between the agent and the model. For now, customers are working around doing this using third-party agents. Downfall of that is these “external” agents live on compute customers have to manage and these agents can’t access many of the tools available to Foundry-native agents. This is the problem the BYO AI Gateway feature is attempting to fix.

No BYO Gateway vs BYO Gateway

Foundry resource architecture

Here is where the new connection type introduced in Foundry comes to the rescue. Before I dive into the details of that, I think it’s helpful to level set a bit on the resource hierarchy within Foundry. At the top is the top-level Azure resource referred to as the Foundry service which under the hood is a Cognitive Services account. The relevant resources for this discussion beneath that are projects, deployments, and connections. Projects are containers connections (at the management plane) for agents (at the data plane), deployments for models which are made available to all projects within the account, and connections which can also be created at the account level and shared across all projects.

Relevant resource hierarchy

For the purposes of this discussion, I’m going to focus on the connection objects. Connection objects can be created at the account level and project level. In the standard agent configuration, you’ll create a number of different connections out of the gates including connections to Cosmos, AI Search, and Azure Storage. Additional common connections could be to an App Insights instance for tracing or a Grounding With Bing Search resource. Connection objects will contain some type of pointer, like a URI and a credential. That credential is usually API Key, some Entra ID-based authentication mechanism, or general OAuth.

Connections are created at the account level when the Foundry account itself needs to access them. This could be for the usage of Content Understanding, to a Key Vault for storing connection secrets (API keys) in a customer subscription, an an App Insights instance used for tracing. From what I’ve observed, you will create connections at the account level if they need to be shared across all projects OR they’re used by the Foundry resource in general vs some type of project construct. These connections are also created at the project level. When you provision a standard agent for example, you’ll create connection objects to the Cosmos DB, Storage Account, and AI Search resources mentioned above. The new category of connections for this post will be created at the project level.

APIM and Model Gateway Connections

The BYO AI Gateway feature uses a new type of connection category of ApiManagement and ModelGateway. These objects are the glue that allow the Foundry agents to funnel requests for models through the AI Gateway the connections point to. When we’re connecting to an APIM instance, you should ideally use the ApiManagement category and when you’re connecting to a third-party category you’ll use the ModelGateway category.

As of the date of this blog post, these connection objects have the following schema (relevant properties to this discussion only):

name: The name of the connection (needs to be less than 60 characters in my testing)
properties: {
category: ApiManagement or ModelGateway
target: The URI you want the agent to connect to
authType: For ApiManagement this can be ApiKey or ProjectManagedIdentity
credentials: This will be populated with the value of the API key if using that authType
isSharedToAll: true or false if you want this shared across all projects
# ApiManagement category with static models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
# Models discussed in detail below
models: "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
}
# ApiManagement category with dynamic discovery
metadata: {
deploymentAPIVersion: ARM API version for CognitiveServices/accounts/deployments API calls
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
}
# ModelGateway category with static models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
# Models discussed in detail below
models: "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
}
# ModelGateway category with dynamic models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
deploymentAPIVersion: ARM API version for CognitiveServices/accounts/deployments API calls
modelDiscovery: "{\"deploymentProvider\":\"AzureOpenAI\",\"getModelEndpoint\":\"/deployments/{deploymentName}\",\"listModelsEndpoint\":\"/deployments\"}"
}

I’ll walk through each of these properties in as much detail as I’ve been able to glean from them with my testing.

The category property is self-explanatory. You either set to this to ApiManagement (if using APIM) or Model Gateway (if using a third-party AI Gateway like a Kong or LiteLLM).

The target property is the URI you want the agent to try to connect to. As an example, if I create an API on my APIM instance for the v1 OpenAPI named openai-v1 my target would look like “https://myapim.azure-api.net/openai-v1/v1”. As of the date of this blog post, you MUST use the azure-api-net FQDN for the APIM. If you try to do a custom domain you’ll get an error back telling you that it’s not supported. I got a request into the product group to lift this limitation. I’ll update this if that is done. For third-party model gateway, the URI would be similar.

The authType property is going to be either ApiKey or ProjectManagedIdentity for an APIM connection. ProjectManagedIdentity will authenticate to the upstream APIM using the agent’s project’s Entra ID managed identity. When using ProjectManagedIdentity you must also specify the audience property and set it to cognitive services.azure.com if connecting to a backend Foundry resource hosting models. Today, it doesn’t seem possible to pass the agent’s Entra ID agent identity that I’ve seen. For a model gateway connection this will either be ApiKey or OAuth. Details on the OAuth setup can be found in the samples GitHub (I haven’t mucked with it yet). If you’re using the authType of ApiKey you additional need to pass the credentials property which includes a property of key with the API key similar to what you see below.

authType: ApiKey
credentials = {
key = MYAPIKEY
}

I haven’t messed extensively with the isSharedToAll property as of yet. For my use case I set this to false so each project got its own connection object. You may be able to create this object at the account level and set the isSharedToAll property, but I haven’t tested that yet. If you have, def let me know if that works.

Ok, now on the property that can bring the most pain. Here we have the metadata property. This property is going to the main guts that makes this whole thing work. A few considerations, if doing this with Terraform or REST (can’t speak to Bicep or ARM), each of the properties I’m going to cover are CASE SENSITIVE. If you do the wrong casing, your connection object will not work. When connecting to an APIM or model gateway your can have Foundry either enumerate the models available (called dynamic discovery) or you can provide the exact models you want to expose (called static models).

Let’s first cover static models. Here is an example of me creating a connection to an APIM instance with static models using the authType or ProjectManagedIdentity. One thing to note is in my backend object in my APIM I’m appending /v1 to the backend path vs doing it in this connection object.

{
"id": "/subscriptions/X/resourceGroups/X/providers/Microsoft.CognitiveServices/accounts/X/projects/sampleproject1/connections/conn1apimgwstaticopenai-v1",
"name": "conn1apimgwstaticopenai-v1",
"properties": {
"audience": "https://cognitiveservices.azure.com",
"authType": "ProjectManagedIdentity",
"category": "ApiManagement",
"isSharedToAll": false,
"metadata": {
"deploymentInPath": "false",
"inferenceAPIVersion": null,
"models": "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
},
"target": "https://X.azure-api.net/openai-v1",
}

Since I’m using the v1 Azure OpenAI API, I don’t need to specify an inferenceAPIVersion. If I was using the classic API I’d need to specify the version (such as 2025-04-01-preview). Notice also I have set deploymentInPath to false. When set to true the connection will add the /deployments/deployment_name to the path. For the v1 API this isn’t required. Finally you got the models property. With a static model setup I list out the models I’m exposing to the connection. If you’re using Terraform, you MUST jsonencode the models property. If you don’t, it will not work. Static models is pretty helpful if you want to strictly control exactly what models the project is getting access to.

Let’s now switch over to dynamic discovery. Dynamic discovery requires you define a few additional operations inside of your API. The details can be found in this GitHub repo, but the basics of is you define an operation for a GET on a specific model and a LIST to find all the models available. These operations are management plane operations at the ARM API to retrieve deployment information. Here is an example of a setup with dynamic discovery using an APIM connection.

{
"id": "/subscriptions/X/resourceGroups/X/providers/Microsoft.CognitiveServices/accounts/X/projects/sampleproject1/connections/conn1apimgwdynamicopenai-v1",
"location": null,
"name": "conn1apimgwdynamicopenai-v1",
"properties": {
"audience": "https://cognitiveservices.azure.com",
"authType": "ProjectManagedIdentity",
"category": "ApiManagement",
"group": "AzureAI",
"isSharedToAll": false,
"metadata": {
"deploymentAPIVersion": "2024-10-01",
"deploymentInPath": "false",
"inferenceAPIVersion": null
},
"target": "https://X.azure-api.net/openai-v1",
},
"type": "Microsoft.CognitiveServices/accounts/projects/connections"
}

When doing the dynamic discovery, you’ll see the deploymentAPIVersion property set to the API version for the GET and LIST deployment operations of the ARM REST API. I added these operations into the API as after I imported the v1 OpenAI spec. You can see an example in Terraform I put together in my lab repo. Dynamic discovery is a great solution when you want to the developer to have access to any new deployments you may push to the Foundry resources.

I’m not going to run through the ModelGateway connection categories because they will largely emulate what you see above with some minor differences. The official Foundry samples GitHub repo has the gory details. I also have examples in Terraform available in my own repo (if you dare subject yourself to reading my code).

Ok, so now you understand the basics of setting up the connection and what you need to do on the APIM side. For more details on setting up APIM you can reference this official repo.

Summing It Up

Ok, so you now you understand the basic connection object, how to set it up, and how it works. I’m going to cut it here and continue in another post where I’ll dig into the dirty details of how it looks to use this because I don’t want to overload your brain (and mine) with a super long post.

Before I jet I will want to provide some critical resources:

  1. My AMAZING peer Piotr Karpala has put together a repository with examples of this pattern (and some 3rd-party integrations) with Bicep. The stuff in there is gold. He was also my late night buddy helping me work through the quirks of this integration late at night. Couldn’t have gotten it done without him (or at least would have broken many keyboards).
  2. The Product Group’s official samples and explanations of the setup are located here. I’d highly recommending referencing them because they will always have more up to date instructions than my blog.
  3. I’ve put together some Terraform samples for my own purposes which are you welcome to reference, loot for your own means, and laugh at my pathetic coding ability. Check out this one for the Foundry portion and this one for the APIM portion.

And here are your tips for this post:

  1. RTFM. Seriously, read the official documentation. Today, this integration is challenging to put in place. If you try to lone wolf it, let me know how many keyboards end up being thrown through your window.
  2. If you’re coding in Terraform or making REST calls to create these connections, remember CASE SENSITIVITY matters. If you do wrong case sensitivity, the resource will still create but it won’t work. You’ll get very frustrated trying to troubleshoot it.
  3. If you’re coding in Terraform don’t forget to use the jsonencode function on the models property. If you skip that, the resource will create but shit will not work.
  4. This is only supported for prompt agents today.
  5. Don’t forget this is public preview. So test it, but expect things to change and don’t throw this into production.

In the next post I’ll walk through how you can test the integration, some of the quirks and considerations for identity and authentication, and some of the neat APIM policy you can craft given some of the new information that is sent in the request.

See you next post!

Network Security Perimeters – NSPs in Action – AI Workload Example

Network Security Perimeters – NSPs in Action – AI Workload Example

This is part of my series on Network Security Perimeters:

  1. Network Security Perimeters – The Problem They Solve
  2. Network Security Perimeters – NSP Components
  3. Network Security Perimeters – NSPs in Action – Key Vault Example
  4. Network Security Perimeters – NSPs in Action – AI Workload Example
  5. Network Security Perimeters – NSPs for Troubleshooting

UPDATE 2/23/2026 – NSP support for Microsoft Foundry resources is generally available!

Hello again! Today I’ll be covering another NSP (Network Security Perimeters) use case, this time focused on AI (gotta drive traffic, am I right?). This will be the fourth entry in my NSP series. If you haven’t read at least the first and second post, you’ll want to do that before jumping into this one because, unlike my essays back in college, I won’t be padding the page count by repeating myself. Let’s get to it!

Use Case Background

Over the past year I’ve worked with peers helping a number of customers get a quick and simple RAG (retrieval augmented generation) workload into PoC (proof-of-concept). The goal of these PoCs were often to validate that the LLMs (large language models) could provide some level of business value when supplementing them with corporate data through a RAG-based pattern. Common use cases included things like building a chatbot for support staff which was supplemented with support’s KB (knowledge base) or chatbot for a company’s GRC (governance risk and compliance) team which was supplemented with corporate security policies and controls. You get the gist of it.

In the Azure realm this pattern is often accomplished using three core services. These services include the Azure OpenAI Service (now more typically AI Foundry), AI Search, and Azure Storage. In this pattern AI Search acts as the as the search index and optional vector database, Azure Storage stores the data in blob storage before it’s chunked and placed inside AI Search, and Azure OpenAI or AI Foundry hosts the LLM. Usage of this pattern requires the data be chunked (think chopped up into smaller parts before it’s stored as a record in a database while still maintaining the important context of the data). There are many options for chunking which are far beyond the scope of this post (and can be better explained by much smarter people), but in Azure there are three services (that I’m aware of anyway) that can help with chunking vs doing it manually. These include:

  1. Azure AI Document Intelligence’s layout model and chunking features
  2. Azure OpenAI / AI Foundry’s chat with your data
  3. Azure AI Search’s skillsets and built-in vectorization

Of these three options, the most simple (and point and click) options are options 2 and 3. Since many of these customers had limited Azure experience and very limited time, these options tended to serve for initial PoCs that then graduated to more complex chunking strategies such as the use of option 1.

The customer base that was asking for these PoCs fell into one or more of the these categories:

  1. Limited staff, resources, and time
  2. Limited Azure knowledge
  3. Limited Azure presence (no hybrid connectivity, no DNS infrastructure setup for support of Private Endpoints

All of these customers had minimum set of security requirements that included basic network security controls.

RAG prior to NSPs

While there are a few different ways to plumb these services together, these PoCs would typically have the services establish network flows as pictured below. There are variations to this pattern where the consumer may be going through some basic ChatBot app, but in many cases consumers would interact direct with the Azure OpenAI / AI Foundry Chat Playground (again, quick and dirty).

Network flows with minimalist RAG pattern

As you can see above, there is a lot of talk between the PaaS. Let’s tackle that before we get into human access. PaaS communication almost exclusively happens through the Microsoft public backbone (some services have special features as I’ll talk about in a minute). This means control of that inbound traffic is going to be done through the PaaS service firewall and trusted Azure service exception for Azure OpenAI / AI Foundry, AI Search, and Azure Storage (optionally using resource exception for storage). If you’re using the AI Search Standard or above SKU you get access to the Shared Private Access feature which allows you to inject a managed Private Endpoint (this is a Private Endpoint that gets provisioned into a Microsoft-managed virtual network allowing connectivity to a resource in your subscription) into a Microsoft-managed virtual network where AI Search compute runs giving it the ability to reach the resource using a Private Endpoint. While cool, this is more cost and complexity.

Outbound access controls are limited in this pattern. There are some data exfiltration controls that can be used for Azure OpenAI / AI Foundry which are inherited from the Cognitive Services framework which I describe in detail in this post. AI Search and Azure Storage don’t provide any native outbound network controls that I’m aware of. This lack of outbound network controls was a sore point for customers in these patterns.

For inbound network flows from human actors (or potentially non-human if there is an app between the consumer and the Azure OpenAI / AI Foundry service) you were limited to the service firewall’s IP whitelist feature. Typically, you would whitelist the IP addresses of forward web proxy in use by the company or another IP address where company traffic would egress to the Internet.

RAG design network controls prior to NSPs

Did this work? Yeah it did, but oh boy, it was never simple to approved by organizational security teams. While IP whitelisting is pretty straightforward to explain to a new-to-Azure customer, the same can’t be said for the trusted services exception, shared private access, and resource exceptions. The lack of outbound network controls for AI Search and Storage went over like a lead balloon every single time. Lastly, the lack of consistent log schema and sometimes subpar network-based logging (I’m looking at you AI Search) and complete lack of outbound network traffic logs made the conversations even more difficult.

Could NSPs make this easier? Most definitely!

RAG with NSPs

NSPs remove every single one of the pain points described above. With an NSP you get:

  1. One tool for controlling both inbound and outbound network controls (kinda)
  2. Standardized log schema for network flows
  3. Logging of outbound network calls

We go from the mess above to the much more simple design pictured below.

The design using NSPs

In this new design we create a Network Security Perimeter with a single profile. In this profile there is an access rule which allows customer egress IP addresses for human users or non-human (in case users interact with an app which interacts with LLM). Each resource is associated to that profile within the NSP which allows non-human traffic between PaaS services since it’s all within the same NSP. No additional rules are required which prevents the PaaS services from accepting or initiating any network flows outside of what the access rules and communication with each other within the NSP.

In this design you control your inbound IP access with a single access rule and you get a standard manner to manage outbound access. No more worries about whether the product group baked in an outbound network control, every service in the NSP gets one. Logging? Hell yeah we got your logging for both inbound and outbound in a standard schema.

Once it’s setup you get you can monitor both inbound and outbound network calls using the NSPAccessLogs. It’s a great way to understand under the hood how these patterns work because the NSP logs surface the source resource, destination resource, and the operation being performed as seen below.

NSP logs surfacing operations

One thing to note, at least in East US 2 where I did my testing, outbound calls that are actually allowed since all resources are within the NSP falsley record as hitting the DenyAll rule. Looking back at my notes, this has been an issue since back in March 2025 so maybe that’s just the way it records or the issue hasn’t yet been remediated.

The other thing to note is when I initially set this all up I got an error in both AI Foundry’s chunking/loading method and AI Search’s. The error complains that an additional header of xms_az_nwperimid was passed and the consuming app wouldn’t allow it. Oddly enough, a second attempt didn’t hit the same error. If you run into this error, try again and open a support ticket so whatever feature on the backend is throwing that error can be cleaned up.

Summing it up

So yeah… NSPs make PaaS to PaaS flows like this way easier for all customers. It especially makes implementing basic network security controls far more simple for customers new to Azure that may not have a mature platform landing zone sitting around.

Here are your takeaways for today:

  1. NSPs give you standard inbound/outbound network controls for PaaS and standardized log schema.
  2. NSPs are especially beneficial to new customers who need to execute quickly with basic network security controls.
  3. Take note as of the date of this blog Azure OpenAI Service support for NSPs in public preview. You will need to enable the preview flag on the subscription before you go mucking with it in a POC environment. Do not use it in production until it’s generally available. Instructions are in the link.
  4. I did basic testing for this post testing ingestion, searching, and submitting prompts that reference the extra data source property. Ensure you do your own more robust testing before you go counting on this working for every one of your scenarios.
  5. If you want to muck around with it yourself, you can use the code in this repo to deploy a similar lab as I’ve built above. Remember to enable the preview flag and wait a good day before attempting to deploy the code.

Well folks, that wraps up this post. In my final post on NSPs, I’ll cover a use case for NSPs to help assist with troubleshooting common connectivity issues.

Thanks!

Azure Authorization – Azure ABAC (Attribute-based Access Control)

This is part of my series on Azure Authorization.

  1. Azure Authorization – The Basics
  2. Azure Authorization – Azure RBAC Basics
  3. Azure Authorization – actions and notActions
  4. Azure Authorization – Resource Locks and Azure Policy denyActions
  5. Azure Authorization – Azure RBAC Delegation
  6. Azure Authorization – Azure ABAC (Attribute-based Access Control)

Welcome back fellow geeks.

I do a lot of learning and educational sessions with my customer base. The volume pretty much demands reusable content which means I gotta build decks and code samples… and worse maintain them. The maintenance piece typically consists of me mentally promising myself to update the content and kicking the can down the road for a few months. Eventually, I get around to updating the content.

This month I was doing some updates to my content around Azure Authorization and decided to spend a bit more time with Azure ABAC (Attribute-based access control). For those of you unfamiliar with Azure ABAC, well it’s no surprise because the use cases are so very limited as of today. Limited as the use cases are, it’s a worthwhile functionality to understand because Microsoft does use it in its products and you may have use cases where it makes sense.

The Dream of ABAC

Let’s first touch briefly on the differences between (RBAC) role-based access control and (ABAC) attribute-based access control. Attribute-based access control has been the dream for the security industry for as long as I can remember. RBAC has been the predominant authorization mechanism in a majority of applications over the years. The challenge with RBAC is it has typically translated to basic group membership where an application authorizes a user solely on whether or not the user is in a group. Access to these groups would typically come through some type of request for membership and implementation by a central governance team. Those processes have tended to be not super user friendly and the access has tended to be very course-grained.

ABAC meanwhile promised more fine-grained access based upon attributes of the security principal, resource, or whatever your mind can dream up. Sounds awesome right? Well it is, but it largely remained a dream in the mainstream world with a few attempts such as Windows Dynamic Access Control (Before you comment, yeah I get you may have had some cool apps doing this stuff years ago and that is awesome, but let’s stick with the majority). This began to change when cloud came around with the introduction of more modern protocols and standards such as SAML, OIDC, and OAuth. These protocols provide more flexibility with how the identity provider packages attributes about the user in the token delivered to the service provider/resource provider/what have you.

When it came to the Azure cloud, Microsoft went the traditional RBAC path for much of the platform. User or group gets placed in Azure RBAC role and user(s) gets access. I explain Azure RBAC in my other posts on RBAC. There is a bit of flexibility on the Entra ID side for the initial access token via Entra ID Conditional Access, but RBAC in the Azure realm. This was the story for many years of Azure.

In 2021 Microsoft decided something more flexible was needed and introduced Azure ABAC. The world rejoiced… right? Nah, not really. While the introduction of ABAC was awesome, its scope of use was and still is extremely limited. As of the date of this blog, ABAC is only usable for Azure Storage blob and queue operations. All is not lost though, there are some great use cases for this feature so it’s important to understand how it works.

How does ABAC work?

Alright, history lesson and complaining about limited scope aside, let’s now explore how the feature works.

ABAC is facilitated through an additional property on Azure RBAC Role Assignment resources. I’m going to assume you understand the ins and out of role assignments. If you don’t, check out my prior post on the topic. In its most simple sense, an Azure RBAC role assignment is the association of a role to a security principal granting that principal the permissions defined in the role over a particular scope of resources. As I’ve covered previously, role assignments are an Azure resource that have defined sets of properties. The properties we care about for the scope of this discussion are the conditionVersion and condition properties. The conditionVersion property will always have a value of 2.0 for now. The condition property is where we work our ABAC magic.

The condition property is made up of a series of conditions which each consist of an action and one or more expressions. The logic for conditions is kinda weird, so I’m walk you through it using some of the examples from documentation as well as complex condition I throw together. First, let’s look at the general structure.

Structure of conditions used in ABAC

In the above image you can see the basic building blocks of a condition. Looks super confusing and complicated right? I know it did to me at first. Thankfully, the kind souls who write the public documentation broke this down in a more friendly programming-like way.

Far more simple explanation of conditions

In each condition property we first have the action line where the logic looks to see if the action being performed by the security principal doesn’t (note the exclamation point which negates whats in the parentheses) match the action we’re applying the conditions to. You’ll commonly see a line like:

!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})

This line is saying if the action isn’t blobs/read (which would be data plane call to read the contents of the blob) then the line should evaluate to true. If it evaluates to true, then the access is allowed and the expressions are not evaluated any further.

After this line we have the expression which is only evaluated when the first line evaluates to false (which in the example I just covered would mean the security principal is trying to read the content of a blob). The expressions support four categories of what Microsoft refers to as condition features. There are currently four features in various states of GA (general availability) and preview (refer to the documentation for those details). These four categories include:

  • Requests
  • Environment
  • Resource
  • Principal (security principal)

These four categories give you a ton of flexibility. Requests covers the details of the request to storage, for example such as limiting a user to specific blob prefixes based on the prefix within the request. Environment can be used to limit the user to accessing the resource from a specific Private Link Private Endpoint over Private Link in general (think defense-in-depth here). The resource feature exposes properties of the resource being accessed, which I find the most flexible thing to be blob index tags. Lastly, we have security principal and this is where you can muck around with custom security attributes in Entra ID (very cool feature if you haven’t touched it).

In a given condition we can have multiple expressions and within the condition property we can string together multiple conditions with AND and OR logic. I’m a big believer in going big or going home, so let’s take a look at a complex condition.

Diving into the Deep End

Let’s say I have a whole bunch of data I need to make available via a blobs in an Azure Storage Account. I have a strict requirement to use a single storage account and the blobs I’m going to store have different data classifications denoted by a blob index tag key named access_level. Blobs without this key are accessible by everyone while blobs classified high, medium, or low are only accessible by users with approval for the same or higher access levels (example: user with high access level can access high, medium, low, and data with no access level). Lastly, I have a requirement that data at the high access level can only be accessed during business hours.

I use a custom security attribute in Entra ID called accesslevel under an attribute set named organization to denote a user’s approved access level.

Here is how that policy would break down.

My first condition is built to allow users to read any blobs that don’t have the access_level tag.

# Condition that allows users within scope of the assignment access to documents that do not have an access level tag
(
  (
    # If the action being performed doesn't match blobs/read then result in true and allow access
    !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
  )
  OR 
  (
    # If the blob doesn't have a blob index tag with a key of access_level then allow access
    NOT @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags&$keys$&] ForAnyOfAnyValues:StringEquals {'access_level'}
  )
)

If the blob does have an access tag, I want to start incorporating my logic. The next condition I include allows users with the accesslevel security attribute set to high to read blobs with a blob index tag of access_level equal to low or medium. I also also allow them to read blobs tagged with high if it’s between 9AM and 5PM EST.

# Condition that allows users within scope of the assignment to access medium and low tagged data if they have a custom 
# security attribute of accesslevel set to high. High data can also be read within working hours
OR
(
 (
   # If the action being performed doesn't match blobs/read then result in true and allow access
   !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
 )
 OR 
 (
   # If the blob has an index tag of access_level with a value of medium or low allow the user access if they have a custom security
   # attribute of organization_accesslevel set to high
   @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'medium', 'low'}
   AND
   @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'high'
 )
 OR
 (
   # If the blob has an index tag of access_level with a value of high allow the user access if they have a custom security
   # attribute of organization_accesslevel set to high and it's within working hours
   @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'high'}
   AND
   @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'high'
   AND
   @Environment[UtcNow] DateTimeGreaterThan '2025-06-09T12:00:00.0Z'
   AND
   @Environment[UtcNow] DateTimeLessThan '2045-06-09T21:00:00.0Z'
 )
)

Next up is users with medium access level. These users are granted access to data tagged medium or low.

# Condition that allows users within scope of the assignment to access medium and low tagged data if they have a custom 
# security attribute of accesslevel set to medium
OR
(
  (
    # If the action being performed doesn't match blobs/read then result in true and allow access
    !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
  )
  OR 
  (
    # If the blob has an index tag of access_level with a value of medium or low allow the user access if they have a custom security
    # attribute of organization_accesslevel set to medium
    @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'medium', 'low'}
    AND
    @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'medium'
 )
)

Finally, I allow users with low access level to access data tagged as low.

# Condition that allows users within scope of the assignment to access low tagged data if they have a custom 
# security attribute of accesslevel set to low
OR
(
 (
   # If the action being performed doesn't match blobs/read then result in true and allow access
   !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
 )
 OR 
 (
   # If the blob has an index tag of access_level with a value of low allow the user access if they have a custom security
   # attribute of organization_accesslevel set to low
   @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'low'}
   AND
   @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'low'
 )
)

Notice how I separated each condition using OR. If the first condition resolves to false, then the next condition is evaluated until all access is granted or all conditions are exhausted. Neat right?

Summing it up

So why should you care about this if its use case is so limited? Well, you should care because that is ABAC’s use case today, and it would be expanded in the future. Furthermore, ABAC allows you to be more granular in how you grant access to data in Azure Storage (again, blob or queue only). You likely have use cases where this can provide another layer of security to further constrain a security principal’s access. You’ll also see these conditions used in Microsoft’s products such as AI Foundry.

The other reason it’s helpful to understand this language used for the condition, is conditions are expanding into other services such as Azure RBAC Delegation (which if you aren’t using you should be). While the language can be complex, it does make sense if you muck around with it a bit.

A final bit of guidance here, don’t try to write conditions by hand. Use the visual builder in the Azure Portal as seen below. It will help you get some basic conditions in place that you can further modify directly via the code view.

Azure Portal Condition Builder

Next time you’re locking down an Azure storage account, think about whether or not you can further restrict humans and non-humans alike based on the attributes discussed today. The main places I’ve seen this used are for user profiles, further restricting user access to specific subsets of data (similar to the one I walked through above), or even adding an additional layer of network security baked directly into the role assignment itself.

See you next post!

Generative AI in Azure for the Generalist – Prompt and Response Logging with API Management

Hello folks!

The rate of change in tech is the most crazy I’ve experienced in my career. What you knew yesterday is quickly replaced with major changes a week or two later. The generative AI space is one of those areas that seems to change on a daily basis, and with these changes comes updated and new patterns and products. Given some major changes over the past few months, I’ve decided to kick off a new blog series that will cover generative AI in Azure for the generalist. The focus will be on folks like myself that sit squarely in the generalist vertical. In this series I’ll cover new topics as well as revisiting topics I’ve covered in the past and how they have changed.

In spirit of that latter point, tonight I’ll be covering an AWESOME new feature in Azure API Management (APIM).

The Background

I’ve talked pretty extensively about APIM’s role in the generative AI space where it provides the features and functionality of the architectural component of a Generative AI Gateway (GenAI Gateway). So what is a GenAI Gateway? Well, you see, someone at Forrester/Gartner needed to create a new phrase that vendors could adopt and sell existing products under, they had a pitch meeting, and yadda yadda yadda. But seriously, in its most simple sense a GenAI Gateway is essentially an API Gateway with additional functionality and features specific to the challenges of doing Generative AI at scale. These challenges can include fine-grained authorization, rate limiting, usage tracking, load balancing, caching, additional logging and monitoring and more.

Common GenAI Gateway functionality

Cloud providers jumped at the chance to add this functionality to their existing native API Gateway products. Microsoft began integrating this functionality into APIM first with load balancing, then with throttling based upon token usage and token tracking for charge backs and sharing model quota across an enterprise, and semantic caching for cost reduction and improved response times. One of the areas that was somewhat of a gap was prompt and response logging.

Back in 2023 I wrote an article about the challenges of prompt and response logging when using a generative AI gateway pattern, and specifically some of the challenges around when APIM was used as the gateway. The history of how folks tried to tackle the issue is pretty interesting context to understand how we ended up where we were.

Before I jump into that history, it’s worth understanding why you should care about prompt and response logging. Those cares are typically grouped in two buckets:

  1. Operational
  2. Security

In the operational bucket we care about these things because they provide great insight into how our users are using these tools to identify commonly asked questions. For example, if we see a question pop up a lot, maybe it’s something we need to add to a user-facing FAQ. Or perhaps we build a workflow into our app that checks commonly asked questions and provides an answer before we call an LLM in order to save some costs and time. There are many creative uses to having these things saved and available.

In the security bucket we care because we want to ensure the LLMs are used responsibly. We don’t want people abusing the LLMs and getting instructions on how to malicious things and we also want to monitor them to ensure we don’t see odd behavior that might be indicative of an attacker who may have compromised a chat bot. Lastly, we capture this because it’s only a matter of a time before some government somewhere in the world pushes legislation that requires us to. It’s coming folks.

Now let’s talk the history of how folks tried to solve this problem.

First, we tried logging requests and responses to Application Insights using the built-in integration with APIM. This worked great until the max tokens for prompts grew too large such that requests and responses started getting truncated. Next, we tried using APIM’s integration with Event Hub (logger) in combination with complex custom APIM policy to parse the request and response, extract the prompt and completion, and deliver to an Event Hub for it to get picked up by some type of automated function and stored in some type of backend data store like a CosmosDB. This worked for a short time where folks were largely experimenting with how the LLMs (large language models) worked with their data but started to fall apart when these LLMs were baked into a chat bot handed out to users (they were also a nightmare to maintain due to frequent API changes to the structure of requests and responses). The reason for this is chat bots demand streaming based completions which deliver the tokens as they generated (which seems more human like) vs the user waiting for the entire completion to be generated. APIM would end up buffering the response and breaking the user experience. To solve this problem, folks were introducing custom code to do the parsing outside of APIM (such as this creative solution by my peer Shaun Callighan). Writing custom code, running it somewhere, and integrating it into APIM was a tough pill to swallow. Most of my customer base either accepted prompt and response logging would be dependent on the developer baking it into their application or they would simply accept not getting that information for the time being.

What’s New

Kind of a shitty situation to be in, right? Well, I have good news for you. Last week the APIM Product Group (PG) released a stellar new feature to support prompt and response logging (both streaming and non-streaming) with a few clicks of the mouse (or slight modifications of code). This morning I had a chance to muck around with it and I wanted to get out this quick article to share with folks the basics of setting this up and provide a bit of detail into how it works (I’ll be updating post this as I experiment more).

The feature is now available directly as an additional log emitted by the APIM instance via the diagnostic settings. This means you can stream these logs to a Log Analytics Workspace, Azure Storage Account, or on to an Event Hub where you send it to any place your heart desires.

Setup

Setting this feature up is pretty cake and requires only a few steps to get it done.

First up you’ll need to enable the additional log in diagnostic settings as seen below.

New diagnostic setting

Once the new diagnostic setting is enabled, you then need to enable it for your API that represents your instance of the an Azure OpenAI resource(s) or Azure AI Foundry (FKA Azure AI Service) instance hosting your LLMs.

Enabling feature in API

Once the feature is enabled in both places, the events should begin to get captured in around 15 or so minutes. I chose to send mine to a Log Analytics Workspace and had a new table named ApiManagementGatewayLlmLogs appear (took about 15 minutes to finally appear) which contains events related to my operations against the LLMs. Each log entry represents a 32KB chunk of the request and response for up to 2MB. The SequenceNumber field is used to denote the order of the chunks as seen in the image below with the CorrelationId field requesting the unique identifier for each request and response.

Expanding an event gives you the ability to review the prompt and response in full detail. This particular request spanned three separate events (sequence 0-2) with the first sequence (0) containing the prompt, completion, and total tokens and the second sequence (1) including the prompt and last sequence (2) containing the model’s response.

Example prompt and response logging event

I tested with both a multi-modal model (gpt-4o) and a reasoning model (o1) and both sets of events were captured. I haven’t seen an authoritative list for which models are supported, but when I do I’ll update this post with a link.

I’m also waiting to hear back from the PG as to how APIM determines it’s a call to an LLM. My guess is by operation name, but waiting on that response as well. I haven’t tested other operations such as creating embeddings yet, so if you do, feel free to reply to this post with your findings. If I’m able to get a full list of operations supported by this logging, I’ll update the post.

Wrapping It Up

That about sums up this quick post. My main goal here was to publicize this new feature because it’s a real game changer for APIM and addresses a major pain point of Generative AI Gateways in general. It’s been really cool to see this from the beginning, and I’m not sure about other folks, but I love understanding the journey a technology takes, the new problems that pop up, and the solutions that solve those problems. It really helps give context to why the solution looks the way it does.

See you next post!