AI Foundry – Identity, Authentication and Authorization

Posted on January 27, 2025 by mattfeltonma

This is a part of my series on AI Foundry:

Updates:

3/17/2025 – Updated diagrams to include new identities and RBAC roles that are recommended as a minimum

Yes, I’m going to re-use the outline from my Azure OpenAI series. You wanna fight about it? This means we’re going to now talk about one of the most important (as per usual) and complicated (oh so complicated) topic in AI Foundry: identity, authentication, and authorization. If you haven’t read my prior two posts, you should take a few minutes and read through them. They’ll give you the baseline you’ll need to get the most out of this post. So put on your coffee, break out the metal clips to keep your eyes open Clockwork Orange-style, and prepare for a dip into the many ways identity, authN, and authZ are handled within the service.

As I covered in my first post Foundry is made up of a ton of different services. Each of these services plays a part in features within Foundry, some may support multiple forms of authentication, and most will be accessed by the many types of identities used within the product. Understanding how each identity is used will be critical in getting authorization right. Missing Azure RBAC role assignments is the number one most common misconfiguration (right above networking, which is also complicated as we’ll see in a future post).

Let’s start first with identity. There will generally be four types of identities used in AI Foundry. These identities will be a combination of human identities and non-human identities. Your humans will be your AI Engineers, developers, and central IT and will use their Entra ID user identities. Your non-humans will include the AI Foundry hub, project, and compute you provision for different purposes. In general, identities are used in the following way (this is not inclusive of all things, just the ones I’ve noticed):

Humans
- Entra ID Users
  - Actions within Azure Portal
  - Actions within AI Foundry Studio
    - Running a prompt flow from the GUI
    - Using the Chat Playground to send prompts to an LLM
    - Running the Chat-With-Your-Data workflow within the Chat Playground
    - Creating a new project within a hub
  - Actions using Azure CLI such as sending an inference to a managed online endpoint that supports Entra ID authentication
Non-Humans
- AI Foundry Hub Managed Identity
  - Accessing the Azure Key Vault associated with the Foundry instance to create secrets or pull secrets when AI Foundry connections are created using credentials versus Entra ID
  - Modify properties of the default Azure Storage Account such as setting CORS policies
  - Creating managed private endpoints for hub resources if a managed virtual network is used
- AI Foundry Project Managed Identity
  - Accessing the Azure Key Vault associated with the Foundry instance to create secrets or pull secrets when AI Foundry connections are created using credentials versus Entra ID
  - Creating blob containers for project where project artifacts such as logs and metrics are stored
  - Creating file share for project where project artifacts such as user-created Prompt Flow files are stored
- Compute
  - Pulling container image from Azure Container Registry when deploying prompt flows that require custom environments
  - Accessing default storage account project blob container to pull data needed to boot
  - Much much more in this category. Really depends on what you’re doing

Alright, so you understand the identities that will be used and you have a general idea of how they’ll be used to perform different tasks within the Foundry ecosystem. Let’s now talk authentication.

Authentication in Foundry isn’t too complicated (in comparison to identity and authorization). Authenticating to the Azure Portal and the Foundry Studio is always going to be Entra ID-based authentication. Authentication to other Azure resources from the Foundry is where it can get interesting. As I covered in my prior post, Foundry will typically support two methods of authentication: Entra ID and API key based (or credentials as you’ll see it often referred to as in Foundry). If at all possible, you’ll want to lean into Entra ID-based authentication whenever you access a resource from Foundry. As we’ll see in the next section around authorization, this will have benefits. Besides authorization, you’ll also get auditability because the logs will show the actual security principal that accessed the resource.

If you opt to use credential-based authentication for your connections to Azure resources, you’ll lose out in a few different areas. When credential-based authentication is used, users will access connected resources within Foundry using the keys stored in the Foundry connection object. This means the user assumes whatever permissions the key has (which is typically all data-plane permissions but could be more restrictive in instances like a SAS token). Besides the authorization challenges, you’ll also lose out on traceability. AI Foundry (and the underlining Azure Machine Learning) has some authorization (via Azure RBAC roles) that is used to control access to connections, but very little in the way auditing who exercised what connection when. For these reasons, you should use Entra ID where possible.

Ready for authorization? Nah, not yet. Before we get into authorization, it’s important to understand that these identities can be used in generally two ways: direct or indirect (on-behalf-of). For example, let’s say you run a Prompt Flow from AI Foundry interface, while the code runs on a serverless compute provisioned in a Microsoft managed network (more on that in a future post), the identity context it uses to access downstream resources is actually yours. Now if you deploy that same prompt to a managed online-endpoint, the code will run on that endpoint and use the managed identity assigned to the compute instance. Not so simple is it?

So how do you know which identity will be used? Observe my general guidance from up above. If you’re running things from the GUI, likely your identity, if you’re deploying stuff to compute likely the identity associated with the compute. The are exceptions to the rule. For example, when you attempt to upload data for fine-tuning or using the on-your-own-data feature in the Chat Playground, and your default storage account is behind a private endpoint your identity will be used to access the data, but the managed identity associated with the project is used to access the private endpoint resource. Why it needs access to the Private Endpoint? I got no idea, it just does. If you don’t, good luck to you poor soul because you’re going to have hell of time troubleshooting it.

Another interesting deviation is when using the Chat Playground Chat With Your Data feature. If you opt to add your data and build the index directly within AI Foundry, there will be a mixed usage of the user identity, AI Search managed identity (which communicates with the embedding models deployed in the AI Services or Azure OpenAI instance to create the vector representations of the chunks in the index), and AI Services or Azure OpenAI managed identity (creates index and data sources in AI Search). It can get very complex.

The image below represents most of the flows you’ll come across.

**The many AI Foundry authentication flows and identity patterns**

Okay, now authorization? Yes, authorization. I’m not one for bullshitting, so I’ll just tell you up front authorization in Foundry can be hard. It gets even harder when you lock down networking because often the error messages you will receive are the same for blocked traffic and failed authorization. The complexities of authorization is exactly why I spent so much time explaining identity and authentication to you. I wish I could tell you every permission in every scenario, but it would take many, many, posts to do that. Instead, I’d advise you to do (sometimes I fail to do this) which is RTFM (go ahead and Google that). This particular product group has made strong efforts to document required permissions, so your first stop should always be the Foundry public documentation. In some instances, you will also need to access the Azure Machine Learning documentation (again, this is built on top of AML) because there are sometimes assumptions that you’ll do just that because you should know this is a feature its inheriting from AML (yeah, not fair but it’s reality).

In general, at an absolute minimum, the permissions assigned to the identities below will get you started as of the date of this post (updated 3/17/2025).

As I covered in my prior posts, the AI Foundry Hub can use either a system-assigned or user-assigned managed identity. You won’t hear me say this often, but just use the system-assigned managed identity if you can for the hub. The required permissions will be automatically assigned and it will be one less thing for you to worry about. Using the permissions listed above should work for a user-assigned managed identity as well (this is on my backburner to re-validate).

A project will always use a system assigned managed identity. The only permission listed above that you’ll need to manually grant is Reader over the Private Endpoint for the default storage account. This is only required if you’re using private endpoint for your default storage account. There may be additional permissions required by the project depending on the activities you are performing and data you are accessing.

On the user side the permissions above will put you in a good place for your typical developer or AI engineer to use most of the features within Foundry. If you’re interacting with other resources (such as an AI Search Index when using the on-your-own-data feature) you’ll need to ensure the user is granted appropriate permissions to those resources as well (typically Search Service Contributor – management plane to list indexes and create indexes and Search Index Data Contributor – data plane create and view records within an index. If your user is fine-tuning a model that is deployed within the Azure OpenAI or AI Service instance, they may additionally need the Azure OpenAI Service Contributor role (to upload the file via Foundry for fine-tuning). Yeah, lots of scenarios and lots of varying permissions for the user, but that covers the most common ones I’ve run into.

Lastly, we have the compute identities. There is no standard here. If you’ve deployed a prompt flow to a managed identity, the compute will need the permissions to connect to the resources behind the connections (again assuming Entra-ID is configured for the connection, if using credential Azure Machine Learning Workspace Secrets Reader on the project is likely sufficient). Using a prompt flow that requires a custom environment may require an image be pushed to the Azure Container Registry which the compute will pull so it will need the Acr Pull RBAC role assignment on the registry.

Complicated right? What happens when stuff doesn’t work? Well, in that scenario you need to look at the logs (both Azure Activity Log and diagnostic logging for the relevant service such as blob, Search, OpenAI and the like). That will tell you what the user is failing to do (again, only if you’re using Entra ID for your connections) and help you figure out what needs to be added from a permissions perspective. If you’re using credentials for your connections, the most common issue with them is with the default storage account where the storage account has had the storage access keys disabled.

Here are the key things I want you to take away from this:

Know the identity being used. If you don’t know which identity is being used, you’ll never get authorization right. Use the of the downstream service logs if you’re unsure. Remember, management plane stuff in Azure Activity Log and data plane stuff in diagnostic logs.
Use Entra ID authentication where possible. Yeah it will make your Azure RBAC a bit more challenging, but you can scope the access AND understand who the hell is doing what.
RTFM where possible. Most of this is buried in the public documentation (sometimes you need to put on your Indiana Jones hat). Remember that if you don’t find it in Foundry documentation, look to Azure Machine Learning.
Use the above information as general guide to get the basic environment setup. You’ll build from that basic foundation.

Alrighty folks, your eyes are likely heavy. I hope this helps a few souls out there who are struggling with getting this product up and running. If you know me, you know I’m no fan boy, but this particular product is pretty damn awesome to get us non-devs immediately getting value from generative AI. It may take some effort to get this product running, but it’s worth it!

Thanks and see you next post!

AI Foundry – Credential vs Identity Data Stores

Posted on January 13, 2025 by mattfeltonma

This is a part of my series on AI Foundry:

Hello again folks. Today, I’m going to continue my series on AI Foundry. I’ve been scratching my head on how best to tackle this series, because the service consists of so many foundational services plumbed together into a larger solution so there is a lot to talk about. The product can be complicated when implementing it with all the security bells and whistles. Getting it right requires a solid baseline understanding of the foundational components security capabilities (such as Azure Storage, Azure Key Vault, etc) and how these components work together for the purposes of AI Foundry.

**The many components of an AI Foundry deployment**

For the purposes of this post, I’m going to focus in on Azure Storage, specifically the storage account associated with the AI Foundry Hub. I will refer to this storage account as the default storage account. As I covered in my first post, AI Foundry is built on top of Azure Machine Learning. Like Azure Machine Learning, AI Foundry uses the default storage account to store artifacts created by the AI Foundry hub and projects. This includes files for the Prompt Flows you create, files used by the compute provisioned in the managed virtual network, and other artifacts related to the functionality of the product. This storage account is shared across the AI Foundry hub and all projects created within it.

The default storage account is critical to the functionality and if you muck up the identity or networking configuration, the product simply won’t work. The errors you’ll receive won’t always indicate an obvious problem with your storage account configuration. To help you avoid mucking up the identity portion, I’m going to use this post to explain your options for identity integration with the default storage account.

AI Foundry uses workspace connection resources to connect to external resources outside of the workspace. This includes the default storage account, AOAI (Azure OpenAI Service) or AI Service instance, and the like. When you create a connection in AI Foundry, you configure how the workspace should authenticate to the resource (determined by the authType property of the connection) when called by a user. This will most commonly be either Entra ID or an API key. In the example below, you see I have a connection object for an AI Search instance set to use Entra authentication by configuring the authType to AAD.

 {
      "id": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.MachineLearningServices/workspaces/aifhaifoundryeus296/connections/connaisaifoundryeus296",
      "location": null,
      "name": "connmysearchservice",
      "properties": {
        "authType": "AAD",
        "category": "CognitiveSearch",
        "createdByWorkspaceArmId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.MachineLearningServices/workspaces/aifhaifoundryeus296",
        "error": "Network Service does not have permission to check resource /subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.Search/searchServices/aisaifoundryeus296 details. Please consider grant Azure Machine Learning (appId: 0736f41a-0425-4b46-bdb5-1563eff02385) read or contributor access to connected resource.",
        "expiryTime": null,
        "group": "AzureAI",
        "isSharedToAll": true,
        "metadata": {
          "ApiType": "Azure",
          "ApiVersion": "2024-05-01-preview",
          "DeploymentApiVersion": "2023-11-01",
          "ResourceId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.Search/searchServices/mysearchservice"
        },
        "peRequirement": "NotApplicable",
        "peStatus": "NotApplicable",
        "sharedUserList": [],
        "target": "https://mysearchservice.search.windows.net",
        "useWorkspaceManagedIdentity": false
      },
      "systemData": {
        "createdAt": "2025-01-12T23:19:01.8005674Z",
        "createdBy": "d34d51b2-34b4-45d9-b6a8-cc5422eb400a",
        "createdByType": "Application",
        "lastModifiedAt": "2025-01-12T23:19:01.8005674Z",
        "lastModifiedBy": "d34d51b2-34b4-45d9-b6a8-cc5422eb400a",
        "lastModifiedByType": "Application"
      },
      "tags": null,
      "type": "Microsoft.MachineLearningServices/workspaces/connections"
    }

Creating this connection allows me to use the AI Search instance within the AI Foundry hub and projects such as using it within the ChatPlayground Chat With Your Data feature. When the connection object is called, an Entra ID identity will be used. This could be the user’s identity, it could a project’s managed identity, or it could even be a managed-online endpoint’s managed identity. In all cases, the identity will be an Entra ID identity that can be authenticated to the tenant and the actions it is authorized to do are determined by its Azure RBAC assignments. It’s critical to understand that if you choose Entra ID-based authentication, you need to have proper permissions in place.

When a new AI Foundry hub is created, it will either create new storage account or integrate with an existing storage account to be used as the default storage account. During setup via the Portal, in the identity section you’ll see the option to choose credential-based or identity-based authentication when connecting to the default storage account. By default, credential-based access will be used. If you are provisioning via Terraform (which as of right now will require you to use the AzApi resource provider) you would set the properties.systemDatastoresAuthMode property to either accesskey or identity. As of the date of this blog, this property still is not documented in the REST API documentation that I could find, however, it will work when referencing it with API version Microsoft.MachineLearningServices/workspaces@2024-10-01-preview.

So why would you choose identity-based access if you have to additionally provision the relevant security principals with access via RBAC? Before I answer that, let me do a quick recap on authorization in Azure. As I cover in my series on Azure authorization, services like storage have both a management plane and data plane. While the management plane is always Entra ID-based authentication and Azure RBAC, the data plane for most services (storage included) can use either Entra ID/Azure RBAC or API keys (via Storage Access Keys and SAS tokens). Usage of any type of static key typically grants the security principal using the key complete access to the data plane. Additionally, determining who is using the key at any given time is mostly impossible. For that reason, choosing to use Entra ID/Azure RBAC should be your preference wherever possible. Entra ID will give your traceability back to the security principal that touched the resource and Azure RBAC will give you the ability to assign granular permissions across the data plane.

If you instead select credential-based authentication a few things happen. When the new AI Foundry hub is created the connections made to the default storage account will be configured to use a SAS token. Any security principal with read access to the workspace can use that connection information for the storage account from within an AI Foundry project to connect to the storage account using those credentials. This means no audibility about what user is doing what with the storage account. This goes for any connection you share across projects that use an API key. Not good.

**Default storage account configured to use credential-based authentication**

It’s worth understanding the Key Vault resource used by AI Foundry in this scenario. When selecting credential-based authentication for the default storage account, the storage access keys for the storage account are stored in the Key Vault. Both the AI Foundry hub and projects under the hub are granted access to the secrets via Key Vault access policies. Yuck and yuck. Users do not get access to the Key Vault itself. Foundry simply enables them to exercise the use of the credential via permissions over teh connection object within the Foundry hub or project. When using identity-based authentication and Entra ID for your connections, the Azure Key Vault will be used minimally (such as being used if you deploy a model from the model catalog to managed online endpoint and select key-based authentication) to none.

Hopefully at this point I’ve sold you on the benefits of using the identity-based authentication to the default storage account (and Entra ID for connected resources). As a quick recap, if you care about least privilege and audibility, you’ll choose identity-based authentication. The main consideration of choosing identity-based authentication for the default storage account is that you need to get Azure RBAC right or else shit will break. Oh yes will it break.

If you configure your AI Foundry instance with a SMI (system-assigned managed identity) for the hub and projects, the required permissions on the default storage account will be granted for these identities. This includes:

Hub identity
- Storage Blob Data Contributor
- Storage File Data Privileged Contributor
Project identity
- Storage Account Contributor
- Storage Blob Data Contributor
- Storage File Data Privileged Contributor
- Storage Table Data Contributor

If you’re nosy like I am, you’ll notice the Azure RBAC assignments for both identities for the hub and project have an ABAC condition attached (yes an actual use case!). I plan on covering ABAC conditions in depth in my authorization series, but essentially they are a way of scoping the access to an attribute of the security principal, resource, or session. Within AI Foundry, they are used to limit the managed identities to accessing the blob containers specific to their underlining AML workspace. This helps to prevent the managed identity of one project from accessing artifacts produced by another project. For example, here are the conditions associated with my hub’s managed identity:

(
 (
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action'})
 )
 OR 
 (
  @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringStartsWithIgnoreCase '67b8ddaa-f77e-4d12-b9ca-440326274da9'
 )
)

If you opt to use a UMI (user-assigned managed identity) for the AI Foundry hub you’ll need to manually grant these permissions to the UMI prior to provisioning the hub. You should try to include these conditions.

As I mentioned earlier, there are three primary sets of identities that hit the resources for an AI Foundry. These include the hub/project identity, user identity, and compute identities. If you opt to use identity-based authentication to the storage account, you will need to ensure you grant your users appropriate permissions on the storage account. When a user does something like create a prompt flow, the user’s identity context is used to access the file endpoint in the storage account to create a file share that will contain prompt flows they create.

This typically includes:

Storage Blob Data Contributor
Storage File Data Privileged Contributor

If you’re spinning up a managed-online endpoint, you will need to grant that managed identity (if using an UMI, these are automatically added if using an SMI):

Storage Blob Data Reader
Storage Blob Data Contributor

The last thing I want to mention is specific to if you creating Private Endpoints for your default storage account (which for a secure AI Foundry, you should be). Ensure you grant each AI Foundry project managed identity Reader over the private endpoints (both file and blob) for the default storage account. This is required when previewing data from the AI Foundry Portal for use cases like uploading data for fine-tuning a model. I’m not sure where this requirement comes from, but if you don’t include it, your users will run into weird permission errors when attempting to upload data to the default storage account from within AI Foundry.

Let’s sum things up:

The default storage account configuration is critical to successful use of the product. Muck up authorization and prepared for pain.
Use identity-based authentication for connectivity to the default storage account. This will ensure auditability for who accesses what.
Use Entra ID authentication for your AI Foundry connections wherever possible. This will give you auditability and the ability to scope permissions via Azure RBAC.
If you using identity-based authentication, ensure you put in place the right permissions for the hub/project (done automatically if using SMI), user, and compute identities.
If you’re having trouble with users uploading data for fine-tuning via AI Foundry, your project is probably missing the read permissions over the default storage account private endpoints.
If you’re having trouble provisioning a managed online endpoint that is using an UMI, you are probably missing permissions on the default storage account.

That wraps up this post. Thanks folks!

AI Foundry – The Basics

Posted on January 8, 2025 by mattfeltonma

This is a part of my series on AI Foundry:

Happy New Year! Over the last few months of 2024, I was buried in AI Foundry (FKA Azure AI Studio. Hey marketing needs to do something a few times a year.) proof-of-concepts with my best buddy Jose. The product is very new, very powerful, but also very complex to get up and running in a regulated environment. Through many hours labbing scenarios out and troubleshooting with customers, I figured it was time to share some of what I learned.

So what is AI Foundry? If you want the Microsoft explanation of what it is, read the documentation. Here you’ll get the Matt Felton opinion on what it is (god help you). AI Foundry is a toolset intended to help AI Engineers build Generative AI applications. It allows them to interact with the LLMs (large language models) and build complex workflows (via Prompt Flows) in a no code (like Chat Playground) or low code (prompt flow interface) environment. As you can imagine, this is an attractive tool to get the people who know Generative AI quick access to the LLMs so use cases can be validated before expensive development cycles are spent building the pretty front-end and code required to make it a real application.

Before we get into the guts of the use cases Jose and I have run into, I want to start with the basics of how the hell this service is setup. This will likely require a post or two, so grab your coffee and get ready for a crash course.

AI Foundry is built on top of AML (Azure Machine Learning). If you’re ever built out a locked down AML instance, you have some understanding of the many services that work together to provide the service. AI Foundry inherits these components and provides a sleek user interface on top and typically requires additional resources like Azure OpenAI and AI Search (for RAG use cases). Like AML there are lots and lots of pieces that you need to think about, plan for, and implement in the correct manner to make the secured instance of AI Foundry work.

One specific feature of AML plays an important role within AI Foundry and that is the concept of a hub workspace. A hub is an AML workspace that centrally manages the security, networking, compute resources, and quota for children AML workspaces. These child AML workspaces are referred to as projects. The whole goal of the hub is to make it easier for your various business units to do the stuff they need to do with AML/AI Foundry without having to manage the complex pieces like the security and networking. My guidance would be to think about giving each business unit a Foundry Hub that they can group projects of similar environment (prod or non-prod) and data sensitivity.

**General relationship between AI Foundry Hub and AI Foundry Projects**

Ok, so you get the basic gist of this. When you deploy an AI Foundry instance, behind the scenes an AML workspace designated as a hub is created. Each project you create is a “child” AML workspace of the hub workspace and will inherit some resources from the hub. Now that you’re grounded in that basic piece, let’s talk about the individual components.

**The many resources involved with a secured AI Foundry instance**

As you can see in the above image, there are a lot of components of this solution that you will likely use if you want to deploy an AI Foundry instance that has the necessary security and networking controls. Let me give you a quick and dirty explanation of each component. I’ll dive deeper into the identity, authorization, and networking aspects of these components in future posts.

Managed identities
- There are lots of managed identities in use with this product. There is a managed identity for the hub, a managed identity for the project, and managed identities for the various compute. One of the challenges of AI Foundry is knowing which managed identity (and if not a managed identity, the user’s identity) is being used to access a specific resource.
Azure Storage Account
- Just like in AML, there is a default storage account associated with the workspace. Unlike with a traditional AML workspace you may be familiar with, the hub feature allows all project workspaces to leverage the same storage account. The storage account is used by the workspaces to store artifacts like logs in blob storage and artifacts like prompt flows in file storage. The hub and projects isolate their data to specific containers (for blob) and folders (for file) with Azure ABAC (holy f*ck a use case for this feature finally) setup such that the managed identities for the workspaces can only access containers/folders for data related to their specific workspace.
Azure Key Vault
- The Azure Key Vault will store any keys used for connections created from within the AI Foundry project. This could be keys for the default storage account or keys used for API access to models you deploy from the model catalog.
Azure Container Registry
- While this is deemed optional, I’d recommend you plan on deploying it. When deploy a prompt flow there uses certain functionality to the managed compute, the container image used isn’t the default runtime and an ACR instance will be spun up automatically without all the security controls you’ll likely want.
Azure OpenAI Service
- This is used for deployment of OpenAI and some Microsoft chat and embedding models
Azure AI Service
- This can be used as an alternative to the Azure OpenAI Service. It has some additional functionality beyond just hosting the models such as speed and the like.
Azure AI Search
- This will be used for anything RAG related. Most likely you’ll see it used with the Chat With Your Data feature of the Chat Playground.
Managed Network
- This is used to host the compute instances, serverless endpoints and managed online endpoints spun up for compute operations within AI Foundry. I’ll do a deep dive into networking within the service in a future post.
Azure Firewall
- If building a secured instance, you’re going to use a managed virtual network that is locked down to all Internet access via outbound rules. Under the hood an Azure Firewall instance (standard for almost all use cases) will be spun up in the managed virtual network. You interact with it through the creation of the outbound rules and can’t directly administer it. However, you will be paying for it.
Role Assignments
- So so so many role assignments. I’ll cover these in a future post.
Azure Private DNS
- Used heavily for interacting with the AI Foundry instance and models/endpoints you deploy. I’ll cover this in an upcoming post.

Are you frightened yet? If not you should be! Don’t worry though, over this series I’ll walk you through the pain points of getting this service up and running. Once you get past the complex configuration, it’s a crazy valuable service that you’ll have a high demand for from your business units.

In the next post I’ll walk through the complexities of authorization within AI Foundry.

Azure AI Studio – Chat Playground and API Management

Posted on September 12, 2024 by mattfeltonma

This is part of my series on GenAI Services in Azure:

Hello again folks!

Today, I’m going to be posting my first post in a series on Azure AI Studio. I’ll let the true AI professionals give you the gory details and features of the service. The way my small brain thinks of the service is a platform built on top of AML (Azure Machine Learning) to make building applications that use Generative AI more developer-friendly. You can build and test applications, deploy third-party models, and organize applications into “projects” which can be secured to a specific project team but share resources across an organization via the concept of a hub. I’ll cover more on those pieces in a future blog post, but for today I want to focus on a pattern I was messing around that I think would be appealing to most folks.

One of the neat features of AI Studio is the Chat Playground. The Chat Playground is a web interface for interacting with models you have deployed to Azure AI Studio. You can send prompts and receive completions, adjust parameters such as temperature, and even get a code sample of the code being run by the web interface. The models that can e deployed include OpenAI models deployed to an AOAI (Azure OpenAI Service) instance or third-party models like Meta’s Llama deployed to a serverless endpoint or self-managed compute (in AML called managed online endpoint). For the purposes of this post I’m going to be focusing on OpenAI models deployed to an AOAI instance.

You’re probably looking at this and thinking, “Yeah that is cool… a similar functionality exists in Azure OpenAI Studio and it does the same thing.” That’s correct it does, but for many organizations using the Azure OpenAI Studio’s Chat Playground isn’t an option for a number of different reasons both operational and security-related.

From an operational perspective, the Azure OpenAI Studio’s Chat Playground is designed to communicate directly with the endpoint for an AOAI instance. As I’ve covered in previous posts, this can be problematic. One reason is you’re limited to the quota within the instance which could cause you to hit limits quickly if you direct a whole ton of users to it. Typically, you will load balance across multiple instances deployed to multiple regions across multiple subscriptions as I discuss in my post on load balancing AOAI. The other problem is dealing with internal chargebacks. If I have multiple BUs (business units) hammering away at an instance, I don’t have any easy to determine who which folks in what BU consumed what. While metrics are token usage are captured in the metrics streamed from an instance, there is no way to associate that usage with an individual.

On the security side, communicating directly with the AOAI instance means I can’t review the prompts and responses being sent and received by the service. Many regulated organizations have requirements for these to be captured for review to ensure the service is being used appropriately and sensitive data isn’t being sent that hasn’t been approved to be sent. Additionally, availability of the AOAI instance could be affected by one user going nuts and consuming the full quota.

The challenges outlined above have driven many customers to insert a control point. The industry seems determined to coin this architectural component a Gen AI Gateway so I’ll play along. For you fellow old folks, all a Gen AI Gateway really is an API Gateway with some Gen AI-related features slapped on top of it. It sits between the front-facing user application and the models processing the prompts and responses. The GenAI-specific features available within the gateway help to address the operational and security challenges I’ve outlined above. If you’re curious about the specifics on this, you can check out my post on load balancing, logging, tracking token usage, rate limiting, and extracting useful information from the conversation such as prompts and responses.

**Example design and process flow of a Gen AI Gateway**

In the image above I’ve included an example of how APIM (Azure API Management) could be used to provide such functionality. Within the customer base I work with at Microsoft, many customers have built something that functions similar to what you see above. A design like this helps to address the operational and security challenges I’ve outlined above.

Wonderful right? Now what the **** does this have to do with AI Studio’s Chat Playground? Well, unlike the Azure OpenAI Studio’s Chat Playground, AI Studio’s offering does support modifying the endpoint to point to your generative AI gateway. How you do this isn’t super intuitive, but it does work. Whether you go this route is totally up to you. Ok, disclaimer is done, let’s talk about how you do this.

One thing to understand about using AI Studio’s Chat Playground is it works the same way that Azure OpenAI Studio’s version works in regards to where the TCP connections are sourced from when making calls to the model. As can be seen in the Fiddler capture below, the TCP connections made when you submit a prompt from the Chat Playground are sourced from the user’s endpoint.

**Fiddler capture showing Chat Completion coming from user endpoint**

This makes our life much easier because we likely control the path that user’s packet takes and the DNS the user uses which means we can direct that user’s packet to a Gen AI Gateway. For the purposes of this post, my goal is to funnel these prompts and completions through an APIM instance I have in place which has some APIM policy snippets that do some checks and balances and a call a small app (based off an awesome solution assembled by my buddy Shaun Callighan) which logs prompts and responses and calculates token metrics. The data processed by the app are then sent to an Event Hub, processed by Stream Analytics, and dumped into CosmosDB.

**APIM between Chat Playground and AOAI**

When you want to connect to an AOAI instance from AI Studio’s Chat Playground you add it as a connection. These connections can created at the hub level (think of this as a logical container for the projects) and then shared across projects. When adding the connection you can browse for the instance you want to connect to or enter manually.

**Adding a connection to an AOAI instance**

If you were to do that you won’t be able to create a deployment of a model or access a deployment of a model deployed in the instances behind it. This is because AI Studio is making calls to the Azure management plane to enumerate the deployments within the instance. Since there isn’t an AOAI with the hostname of your AOAI instance, you’ll be unable to add deployments or pick a deployment from the Chat Playground.

To work around this, you need to add a connection to one of your AOAI instances. This will be your “stub” instance that we’ll modify the endpoint of to point to API Management. If you’re load balancing across multiple AOAI instances behind APIM, you need to ensure that you’ve already created your model deployments and you’ve named them consistently across all of the AOAI instances you’re load balancing to. In the image below, I modify the endpoint to point to my APIM instance. The azure-openai-log-helper path is added to send it to a specific API I have setup on APIM that handles logging. For your environment, you’ll likely just need the hostname.

Now before you go running and trying to use the Chat Playground, you’ll have to make a change to the APIM policy. Since the user’s browser is being told to make the call to this endpoint from a different domain (AI Studio’s domain) we need to ensure there is a CORS policy in place on the APIM instance to allow for this, otherwise it will be blocked by APIM. If you forget about this policy you’ll get a back a 200 from the APIM instance but nothing will be in the response.

Your CORS policy could look like the below:

        <cors>
            <allowed-origins>
                <origin>https://ai.azure.com/</origin>
                <origin>https://ai.azure.com</origin>
            </allowed-origins>
            <allowed-methods preflight-result-max-age="300">
                <method>POST</method>
                <method>OPTIONS</method>
            </allowed-methods>
            <allowed-headers>
                <header>authorization</header>
                <header>content-type</header>
                <header>request-id</header>
                <header>traceparent</header>
                <header>x-ms-client-request-id</header>
                <header>x-ms-useragent</header>
            </allowed-headers>
        </cors>

Once you’ve modified your APIM policy with the CORS update, you’ll be good to go! Your requests will now flow through APIM for all the GenAI Gateway goodness.

**Chat Completion from AI Studio Chat Playground flowing through APIM**

When messing with this I ran into a few things I want to call out:

Do not forget the CORS policy. If you run into a 200 response from APIM with no content, it’s probably the CORS snippet.
If you have a validate-jwt snippet in your APIM policy that includes validating the claim includes cognitivesservices, remove that. The claim passed by AI Studio includes a trailing forward slash which won’t likely match what you get back if you’re using the MSAL library in code. You could certainly include some logic to handle it, but honestly the security benefit is so little from checking the claim just make it easy on yourself and remove the check for the claim. Keep the check that validate-jwt snippet but restrict it to checking the tenant ID in the token.
Chat Playground will pass the content property as the prompt as an array (this is the more modern approach to allow for multi-modal models like GPT-4o which can handle images and audio). If you have an APIM policy in place to parse the request body and extract information you’ll need to update it to also handle when content is passed as an array.
Chat Playground allows for the user to submit an image along with text in the prompt. Ensure your APIM policy is capable of handling prompts like that. Dealing with human users being able to submit images to an LLM and ensuring you’re reviewing that image for DLP and calculating token consumption for streaming Chat Completion is a whole other blog topic that I’m not going to do today. Key thing is you want to account for that. Block images or ensure your policy is capable of handling it if you’re deploying 4o or 4 Vision.

Well folks that sums up this post. I realize this solution is a bit funky, and I’m not gonna tell you to use it. I’m simply putting it out there as an option if you have a business need strong enough to provide a ChatGPT-style solution but don’t have the bandwidth or time to whip up your own application.

Enjoy!

Journey Of The Geek

The chronicles of a Bostonian tech geek navigating through life and technology

Tag Archives: security