AI Foundry – Identity, Authentication and Authorization

This is a part of my series on AI Foundry:

Updates:

  • 3/17/2025 – Updated diagrams to include new identities and RBAC roles that are recommended as a minimum

Yes, I’m going to re-use the outline from my Azure OpenAI series. You wanna fight about it? This means we’re going to now talk about one of the most important (as per usual) and complicated (oh so complicated) topic in AI Foundry: identity, authentication, and authorization. If you haven’t read my prior two posts, you should take a few minutes and read through them. They’ll give you the baseline you’ll need to get the most out of this post. So put on your coffee, break out the metal clips to keep your eyes open Clockwork Orange-style, and prepare for a dip into the many ways identity, authN, and authZ are handled within the service.

As I covered in my first post Foundry is made up of a ton of different services. Each of these services plays a part in features within Foundry, some may support multiple forms of authentication, and most will be accessed by the many types of identities used within the product. Understanding how each identity is used will be critical in getting authorization right. Missing Azure RBAC role assignments is the number one most common misconfiguration (right above networking, which is also complicated as we’ll see in a future post).

Azure AI Foundry Components

Let’s start first with identity. There will generally be four types of identities used in AI Foundry. These identities will be a combination of human identities and non-human identities. Your humans will be your AI Engineers, developers, and central IT and will use their Entra ID user identities. Your non-humans will include the AI Foundry hub, project, and compute you provision for different purposes. In general, identities are used in the following way (this is not inclusive of all things, just the ones I’ve noticed):

  • Humans
    • Entra ID Users
      • Actions within Azure Portal
      • Actions within AI Foundry Studio
        • Running a prompt flow from the GUI
        • Using the Chat Playground to send prompts to an LLM
        • Running the Chat-With-Your-Data workflow within the Chat Playground
        • Creating a new project within a hub
      • Actions using Azure CLI such as sending an inference to a managed online endpoint that supports Entra ID authentication
  • Non-Humans
    • AI Foundry Hub Managed Identity
      • Accessing the Azure Key Vault associated with the Foundry instance to create secrets or pull secrets when AI Foundry connections are created using credentials versus Entra ID
      • Modify properties of the default Azure Storage Account such as setting CORS policies
      • Creating managed private endpoints for hub resources if a managed virtual network is used
    • AI Foundry Project Managed Identity
      • Accessing the Azure Key Vault associated with the Foundry instance to create secrets or pull secrets when AI Foundry connections are created using credentials versus Entra ID
      • Creating blob containers for project where project artifacts such as logs and metrics are stored
      • Creating file share for project where project artifacts such as user-created Prompt Flow files are stored
    • Compute
      • Pulling container image from Azure Container Registry when deploying prompt flows that require custom environments
      • Accessing default storage account project blob container to pull data needed to boot
      • Much much more in this category. Really depends on what you’re doing

Alright, so you understand the identities that will be used and you have a general idea of how they’ll be used to perform different tasks within the Foundry ecosystem. Let’s now talk authentication.

The many identities of AI Foundry

Authentication in Foundry isn’t too complicated (in comparison to identity and authorization). Authenticating to the Azure Portal and the Foundry Studio is always going to be Entra ID-based authentication. Authentication to other Azure resources from the Foundry is where it can get interesting. As I covered in my prior post, Foundry will typically support two methods of authentication: Entra ID and API key based (or credentials as you’ll see it often referred to as in Foundry). If at all possible, you’ll want to lean into Entra ID-based authentication whenever you access a resource from Foundry. As we’ll see in the next section around authorization, this will have benefits. Besides authorization, you’ll also get auditability because the logs will show the actual security principal that accessed the resource.

If you opt to use credential-based authentication for your connections to Azure resources, you’ll lose out in a few different areas. When credential-based authentication is used, users will access connected resources within Foundry using the keys stored in the Foundry connection object. This means the user assumes whatever permissions the key has (which is typically all data-plane permissions but could be more restrictive in instances like a SAS token). Besides the authorization challenges, you’ll also lose out on traceability. AI Foundry (and the underlining Azure Machine Learning) has some authorization (via Azure RBAC roles) that is used to control access to connections, but very little in the way auditing who exercised what connection when. For these reasons, you should use Entra ID where possible.

Ready for authorization? Nah, not yet. Before we get into authorization, it’s important to understand that these identities can be used in generally two ways: direct or indirect (on-behalf-of). For example, let’s say you run a Prompt Flow from AI Foundry interface, while the code runs on a serverless compute provisioned in a Microsoft managed network (more on that in a future post), the identity context it uses to access downstream resources is actually yours. Now if you deploy that same prompt to a managed online-endpoint, the code will run on that endpoint and use the managed identity assigned to the compute instance. Not so simple is it?

So how do you know which identity will be used? Observe my general guidance from up above. If you’re running things from the GUI, likely your identity, if you’re deploying stuff to compute likely the identity associated with the compute. The are exceptions to the rule. For example, when you attempt to upload data for fine-tuning or using the on-your-own-data feature in the Chat Playground, and your default storage account is behind a private endpoint your identity will be used to access the data, but the managed identity associated with the project is used to access the private endpoint resource. Why it needs access to the Private Endpoint? I got no idea, it just does. If you don’t, good luck to you poor soul because you’re going to have hell of time troubleshooting it.

Another interesting deviation is when using the Chat Playground Chat With Your Data feature. If you opt to add your data and build the index directly within AI Foundry, there will be a mixed usage of the user identity, AI Search managed identity (which communicates with the embedding models deployed in the AI Services or Azure OpenAI instance to create the vector representations of the chunks in the index), and AI Services or Azure OpenAI managed identity (creates index and data sources in AI Search). It can get very complex.

The image below represents most of the flows you’ll come across.

The many AI Foundry authentication flows and identity patterns

Okay, now authorization? Yes, authorization. I’m not one for bullshitting, so I’ll just tell you up front authorization in Foundry can be hard. It gets even harder when you lock down networking because often the error messages you will receive are the same for blocked traffic and failed authorization. The complexities of authorization is exactly why I spent so much time explaining identity and authentication to you. I wish I could tell you every permission in every scenario, but it would take many, many, posts to do that. Instead, I’d advise you to do (sometimes I fail to do this) which is RTFM (go ahead and Google that). This particular product group has made strong efforts to document required permissions, so your first stop should always be the Foundry public documentation. In some instances, you will also need to access the Azure Machine Learning documentation (again, this is built on top of AML) because there are sometimes assumptions that you’ll do just that because you should know this is a feature its inheriting from AML (yeah, not fair but it’s reality).

In general, at an absolute minimum, the permissions assigned to the identities below will get you started as of the date of this post (updated 3/17/2025).

As I covered in my prior posts, the AI Foundry Hub can use either a system-assigned or user-assigned managed identity. You won’t hear me say this often, but just use the system-assigned managed identity if you can for the hub. The required permissions will be automatically assigned and it will be one less thing for you to worry about. Using the permissions listed above should work for a user-assigned managed identity as well (this is on my backburner to re-validate).

A project will always use a system assigned managed identity. The only permission listed above that you’ll need to manually grant is Reader over the Private Endpoint for the default storage account. This is only required if you’re using private endpoint for your default storage account. There may be additional permissions required by the project depending on the activities you are performing and data you are accessing.

On the user side the permissions above will put you in a good place for your typical developer or AI engineer to use most of the features within Foundry. If you’re interacting with other resources (such as an AI Search Index when using the on-your-own-data feature) you’ll need to ensure the user is granted appropriate permissions to those resources as well (typically Search Service Contributor – management plane to list indexes and create indexes and Search Index Data Contributor – data plane create and view records within an index. If your user is fine-tuning a model that is deployed within the Azure OpenAI or AI Service instance, they may additionally need the Azure OpenAI Service Contributor role (to upload the file via Foundry for fine-tuning). Yeah, lots of scenarios and lots of varying permissions for the user, but that covers the most common ones I’ve run into.

Lastly, we have the compute identities. There is no standard here. If you’ve deployed a prompt flow to a managed identity, the compute will need the permissions to connect to the resources behind the connections (again assuming Entra-ID is configured for the connection, if using credential Azure Machine Learning Workspace Secrets Reader on the project is likely sufficient). Using a prompt flow that requires a custom environment may require an image be pushed to the Azure Container Registry which the compute will pull so it will need the Acr Pull RBAC role assignment on the registry.

Complicated right? What happens when stuff doesn’t work? Well, in that scenario you need to look at the logs (both Azure Activity Log and diagnostic logging for the relevant service such as blob, Search, OpenAI and the like). That will tell you what the user is failing to do (again, only if you’re using Entra ID for your connections) and help you figure out what needs to be added from a permissions perspective. If you’re using credentials for your connections, the most common issue with them is with the default storage account where the storage account has had the storage access keys disabled.

Here are the key things I want you to take away from this:

  1. Know the identity being used. If you don’t know which identity is being used, you’ll never get authorization right. Use the of the downstream service logs if you’re unsure. Remember, management plane stuff in Azure Activity Log and data plane stuff in diagnostic logs.
  2. Use Entra ID authentication where possible. Yeah it will make your Azure RBAC a bit more challenging, but you can scope the access AND understand who the hell is doing what.
  3. RTFM where possible. Most of this is buried in the public documentation (sometimes you need to put on your Indiana Jones hat). Remember that if you don’t find it in Foundry documentation, look to Azure Machine Learning.
  4. Use the above information as general guide to get the basic environment setup. You’ll build from that basic foundation.

Alrighty folks, your eyes are likely heavy. I hope this helps a few souls out there who are struggling with getting this product up and running. If you know me, you know I’m no fan boy, but this particular product is pretty damn awesome to get us non-devs immediately getting value from generative AI. It may take some effort to get this product running, but it’s worth it!

Thanks and see you next post!

AI Foundry – Credential vs Identity Data Stores

This is a part of my series on AI Foundry:

Hello again folks. Today, I’m going to continue my series on AI Foundry. I’ve been scratching my head on how best to tackle this series, because the service consists of so many foundational services plumbed together into a larger solution so there is a lot to talk about. The product can be complicated when implementing it with all the security bells and whistles. Getting it right requires a solid baseline understanding of the foundational components security capabilities (such as Azure Storage, Azure Key Vault, etc) and how these components work together for the purposes of AI Foundry.

The many components of an AI Foundry deployment

For the purposes of this post, I’m going to focus in on Azure Storage, specifically the storage account associated with the AI Foundry Hub. I will refer to this storage account as the default storage account. As I covered in my first post, AI Foundry is built on top of Azure Machine Learning. Like Azure Machine Learning, AI Foundry uses the default storage account to store artifacts created by the AI Foundry hub and projects. This includes files for the Prompt Flows you create, files used by the compute provisioned in the managed virtual network, and other artifacts related to the functionality of the product. This storage account is shared across the AI Foundry hub and all projects created within it.

The default storage account is critical to the functionality and if you muck up the identity or networking configuration, the product simply won’t work. The errors you’ll receive won’t always indicate an obvious problem with your storage account configuration. To help you avoid mucking up the identity portion, I’m going to use this post to explain your options for identity integration with the default storage account.

AI Foundry uses workspace connection resources to connect to external resources outside of the workspace. This includes the default storage account, AOAI (Azure OpenAI Service) or AI Service instance, and the like. When you create a connection in AI Foundry, you configure how the workspace should authenticate to the resource (determined by the authType property of the connection) when called by a user. This will most commonly be either Entra ID or an API key. In the example below, you see I have a connection object for an AI Search instance set to use Entra authentication by configuring the authType to AAD.

 {
      "id": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.MachineLearningServices/workspaces/aifhaifoundryeus296/connections/connaisaifoundryeus296",
      "location": null,
      "name": "connmysearchservice",
      "properties": {
        "authType": "AAD",
        "category": "CognitiveSearch",
        "createdByWorkspaceArmId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.MachineLearningServices/workspaces/aifhaifoundryeus296",
        "error": "Network Service does not have permission to check resource /subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.Search/searchServices/aisaifoundryeus296 details. Please consider grant Azure Machine Learning (appId: 0736f41a-0425-4b46-bdb5-1563eff02385) read or contributor access to connected resource.",
        "expiryTime": null,
        "group": "AzureAI",
        "isSharedToAll": true,
        "metadata": {
          "ApiType": "Azure",
          "ApiVersion": "2024-05-01-preview",
          "DeploymentApiVersion": "2023-11-01",
          "ResourceId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.Search/searchServices/mysearchservice"
        },
        "peRequirement": "NotApplicable",
        "peStatus": "NotApplicable",
        "sharedUserList": [],
        "target": "https://mysearchservice.search.windows.net",
        "useWorkspaceManagedIdentity": false
      },
      "systemData": {
        "createdAt": "2025-01-12T23:19:01.8005674Z",
        "createdBy": "d34d51b2-34b4-45d9-b6a8-cc5422eb400a",
        "createdByType": "Application",
        "lastModifiedAt": "2025-01-12T23:19:01.8005674Z",
        "lastModifiedBy": "d34d51b2-34b4-45d9-b6a8-cc5422eb400a",
        "lastModifiedByType": "Application"
      },
      "tags": null,
      "type": "Microsoft.MachineLearningServices/workspaces/connections"
    }

Creating this connection allows me to use the AI Search instance within the AI Foundry hub and projects such as using it within the ChatPlayground Chat With Your Data feature. When the connection object is called, an Entra ID identity will be used. This could be the user’s identity, it could a project’s managed identity, or it could even be a managed-online endpoint’s managed identity. In all cases, the identity will be an Entra ID identity that can be authenticated to the tenant and the actions it is authorized to do are determined by its Azure RBAC assignments. It’s critical to understand that if you choose Entra ID-based authentication, you need to have proper permissions in place.

When a new AI Foundry hub is created, it will either create new storage account or integrate with an existing storage account to be used as the default storage account. During setup via the Portal, in the identity section you’ll see the option to choose credential-based or identity-based authentication when connecting to the default storage account. By default, credential-based access will be used. If you are provisioning via Terraform (which as of right now will require you to use the AzApi resource provider) you would set the properties.systemDatastoresAuthMode property to either accesskey or identity. As of the date of this blog, this property still is not documented in the REST API documentation that I could find, however, it will work when referencing it with API version Microsoft.MachineLearningServices/workspaces@2024-10-01-preview.

Credential or Identity-based access

So why would you choose identity-based access if you have to additionally provision the relevant security principals with access via RBAC? Before I answer that, let me do a quick recap on authorization in Azure. As I cover in my series on Azure authorization, services like storage have both a management plane and data plane. While the management plane is always Entra ID-based authentication and Azure RBAC, the data plane for most services (storage included) can use either Entra ID/Azure RBAC or API keys (via Storage Access Keys and SAS tokens). Usage of any type of static key typically grants the security principal using the key complete access to the data plane. Additionally, determining who is using the key at any given time is mostly impossible. For that reason, choosing to use Entra ID/Azure RBAC should be your preference wherever possible. Entra ID will give your traceability back to the security principal that touched the resource and Azure RBAC will give you the ability to assign granular permissions across the data plane.

Management plane versus data plane

If you instead select credential-based authentication a few things happen. When the new AI Foundry hub is created the connections made to the default storage account will be configured to use a SAS token. Any security principal with read access to the workspace can use that connection information for the storage account from within an AI Foundry project to connect to the storage account using those credentials. This means no audibility about what user is doing what with the storage account. This goes for any connection you share across projects that use an API key. Not good.

Default storage account configured to use credential-based authentication

It’s worth understanding the Key Vault resource used by AI Foundry in this scenario. When selecting credential-based authentication for the default storage account, the storage access keys for the storage account are stored in the Key Vault. Both the AI Foundry hub and projects under the hub are granted access to the secrets via Key Vault access policies. Yuck and yuck. Users do not get access to the Key Vault itself. Foundry simply enables them to exercise the use of the credential via permissions over teh connection object within the Foundry hub or project. When using identity-based authentication and Entra ID for your connections, the Azure Key Vault will be used minimally (such as being used if you deploy a model from the model catalog to managed online endpoint and select key-based authentication) to none.

Hopefully at this point I’ve sold you on the benefits of using the identity-based authentication to the default storage account (and Entra ID for connected resources). As a quick recap, if you care about least privilege and audibility, you’ll choose identity-based authentication. The main consideration of choosing identity-based authentication for the default storage account is that you need to get Azure RBAC right or else shit will break. Oh yes will it break.

If you configure your AI Foundry instance with a SMI (system-assigned managed identity) for the hub and projects, the required permissions on the default storage account will be granted for these identities. This includes:

  • Hub identity
    • Storage Blob Data Contributor
    • Storage File Data Privileged Contributor
  • Project identity
    • Storage Account Contributor
    • Storage Blob Data Contributor
    • Storage File Data Privileged Contributor
    • Storage Table Data Contributor

If you’re nosy like I am, you’ll notice the Azure RBAC assignments for both identities for the hub and project have an ABAC condition attached (yes an actual use case!). I plan on covering ABAC conditions in depth in my authorization series, but essentially they are a way of scoping the access to an attribute of the security principal, resource, or session. Within AI Foundry, they are used to limit the managed identities to accessing the blob containers specific to their underlining AML workspace. This helps to prevent the managed identity of one project from accessing artifacts produced by another project. For example, here are the conditions associated with my hub’s managed identity:

(
 (
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action'})
 )
 OR 
 (
  @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringStartsWithIgnoreCase '67b8ddaa-f77e-4d12-b9ca-440326274da9'
 )
)

If you opt to use a UMI (user-assigned managed identity) for the AI Foundry hub you’ll need to manually grant these permissions to the UMI prior to provisioning the hub. You should try to include these conditions.

As I mentioned earlier, there are three primary sets of identities that hit the resources for an AI Foundry. These include the hub/project identity, user identity, and compute identities. If you opt to use identity-based authentication to the storage account, you will need to ensure you grant your users appropriate permissions on the storage account. When a user does something like create a prompt flow, the user’s identity context is used to access the file endpoint in the storage account to create a file share that will contain prompt flows they create.

This typically includes:

  • Storage Blob Data Contributor
  • Storage File Data Privileged Contributor

If you’re spinning up a managed-online endpoint, you will need to grant that managed identity (if using an UMI, these are automatically added if using an SMI):

  • Storage Blob Data Reader
  • Storage Blob Data Contributor

The last thing I want to mention is specific to if you creating Private Endpoints for your default storage account (which for a secure AI Foundry, you should be). Ensure you grant each AI Foundry project managed identity Reader over the private endpoints (both file and blob) for the default storage account. This is required when previewing data from the AI Foundry Portal for use cases like uploading data for fine-tuning a model. I’m not sure where this requirement comes from, but if you don’t include it, your users will run into weird permission errors when attempting to upload data to the default storage account from within AI Foundry.

Let’s sum things up:

  • The default storage account configuration is critical to successful use of the product. Muck up authorization and prepared for pain.
  • Use identity-based authentication for connectivity to the default storage account. This will ensure auditability for who accesses what.
  • Use Entra ID authentication for your AI Foundry connections wherever possible. This will give you auditability and the ability to scope permissions via Azure RBAC.
  • If you using identity-based authentication, ensure you put in place the right permissions for the hub/project (done automatically if using SMI), user, and compute identities.
  • If you’re having trouble with users uploading data for fine-tuning via AI Foundry, your project is probably missing the read permissions over the default storage account private endpoints.
  • If you’re having trouble provisioning a managed online endpoint that is using an UMI, you are probably missing permissions on the default storage account.

That wraps up this post. Thanks folks!

AI Foundry – The Basics

This is a part of my series on AI Foundry:

Happy New Year! Over the last few months of 2024, I was buried in AI Foundry (FKA Azure AI Studio. Hey marketing needs to do something a few times a year.) proof-of-concepts with my best buddy Jose. The product is very new, very powerful, but also very complex to get up and running in a regulated environment. Through many hours labbing scenarios out and troubleshooting with customers, I figured it was time to share some of what I learned.

So what is AI Foundry? If you want the Microsoft explanation of what it is, read the documentation. Here you’ll get the Matt Felton opinion on what it is (god help you). AI Foundry is a toolset intended to help AI Engineers build Generative AI applications. It allows them to interact with the LLMs (large language models) and build complex workflows (via Prompt Flows) in a no code (like Chat Playground) or low code (prompt flow interface) environment. As you can imagine, this is an attractive tool to get the people who know Generative AI quick access to the LLMs so use cases can be validated before expensive development cycles are spent building the pretty front-end and code required to make it a real application.

Before we get into the guts of the use cases Jose and I have run into, I want to start with the basics of how the hell this service is setup. This will likely require a post or two, so grab your coffee and get ready for a crash course.

AI Foundry is built on top of AML (Azure Machine Learning). If you’re ever built out a locked down AML instance, you have some understanding of the many services that work together to provide the service. AI Foundry inherits these components and provides a sleek user interface on top and typically requires additional resources like Azure OpenAI and AI Search (for RAG use cases). Like AML there are lots and lots of pieces that you need to think about, plan for, and implement in the correct manner to make the secured instance of AI Foundry work.

One specific feature of AML plays an important role within AI Foundry and that is the concept of a hub workspace. A hub is an AML workspace that centrally manages the security, networking, compute resources, and quota for children AML workspaces. These child AML workspaces are referred to as projects. The whole goal of the hub is to make it easier for your various business units to do the stuff they need to do with AML/AI Foundry without having to manage the complex pieces like the security and networking. My guidance would be to think about giving each business unit a Foundry Hub that they can group projects of similar environment (prod or non-prod) and data sensitivity.

General relationship between AI Foundry Hub and AI Foundry Projects

Ok, so you get the basic gist of this. When you deploy an AI Foundry instance, behind the scenes an AML workspace designated as a hub is created. Each project you create is a “child” AML workspace of the hub workspace and will inherit some resources from the hub. Now that you’re grounded in that basic piece, let’s talk about the individual components.

The many resources involved with a secured AI Foundry instance

As you can see in the above image, there are a lot of components of this solution that you will likely use if you want to deploy an AI Foundry instance that has the necessary security and networking controls. Let me give you a quick and dirty explanation of each component. I’ll dive deeper into the identity, authorization, and networking aspects of these components in future posts.

  • Managed identities
    • There are lots of managed identities in use with this product. There is a managed identity for the hub, a managed identity for the project, and managed identities for the various compute. One of the challenges of AI Foundry is knowing which managed identity (and if not a managed identity, the user’s identity) is being used to access a specific resource.
  • Azure Storage Account
    • Just like in AML, there is a default storage account associated with the workspace. Unlike with a traditional AML workspace you may be familiar with, the hub feature allows all project workspaces to leverage the same storage account. The storage account is used by the workspaces to store artifacts like logs in blob storage and artifacts like prompt flows in file storage. The hub and projects isolate their data to specific containers (for blob) and folders (for file) with Azure ABAC (holy f*ck a use case for this feature finally) setup such that the managed identities for the workspaces can only access containers/folders for data related to their specific workspace.
  • Azure Key Vault
    • The Azure Key Vault will store any keys used for connections created from within the AI Foundry project. This could be keys for the default storage account or keys used for API access to models you deploy from the model catalog.
  • Azure Container Registry
    • While this is deemed optional, I’d recommend you plan on deploying it. When deploy a prompt flow there uses certain functionality to the managed compute, the container image used isn’t the default runtime and an ACR instance will be spun up automatically without all the security controls you’ll likely want.
  • Azure OpenAI Service
    • This is used for deployment of OpenAI and some Microsoft chat and embedding models
  • Azure AI Service
    • This can be used as an alternative to the Azure OpenAI Service. It has some additional functionality beyond just hosting the models such as speed and the like.
  • Azure AI Search
    • This will be used for anything RAG related. Most likely you’ll see it used with the Chat With Your Data feature of the Chat Playground.
  • Managed Network
    • This is used to host the compute instances, serverless endpoints and managed online endpoints spun up for compute operations within AI Foundry. I’ll do a deep dive into networking within the service in a future post.
  • Azure Firewall
    • If building a secured instance, you’re going to use a managed virtual network that is locked down to all Internet access via outbound rules. Under the hood an Azure Firewall instance (standard for almost all use cases) will be spun up in the managed virtual network. You interact with it through the creation of the outbound rules and can’t directly administer it. However, you will be paying for it.
  • Role Assignments
    • So so so many role assignments. I’ll cover these in a future post.
  • Azure Private DNS
    • Used heavily for interacting with the AI Foundry instance and models/endpoints you deploy. I’ll cover this in an upcoming post.

Are you frightened yet? If not you should be! Don’t worry though, over this series I’ll walk you through the pain points of getting this service up and running. Once you get past the complex configuration, it’s a crazy valuable service that you’ll have a high demand for from your business units.

In the next post I’ll walk through the complexities of authorization within AI Foundry.

Azure Authorization – Azure RBAC Delegation

This is part of my series on Azure Authorization.

  1. Azure Authorization – The Basics
  2. Azure Authorization – Azure RBAC Basics
  3. Azure Authorization – actions and notActions
  4. Azure Authorization – Resource Locks and Azure Policy denyActions
  5. Azure Authorization – Azure RBAC Delegation
  6. Azure Authorization – Azure ABAC (Attribute-based Access Control)

Hello again fellow geeks!

I typically avoid doing multiple posts in a month for sanity purposes, but quite possibly one of the most awesome Azure RBAC features has gone generally available under the radar. Azure RBAC Delegation is going to be one of those features that after reading this post, you will immediately go and implement. For those of you coming from AWS, this is going to be your IAM Permissions Boundary-like feature. It addresses one of the major risks to Azure’s authorization model and fills a big gap the platform has had in the authorization space.

Alright, enough hype let’s get to it.

Before I dive into the new feature, I encourage you to read through the prior posts in my Azure Authorization series. These posts will help you better understand how this feature fits into the bigger picture.

As I covered in my Azure RBAC Basics post, only security principals with sufficient permissions in the Microsoft.Authorization resourced provider can create new Azure RBAC Role Assignments. By default, once a security principal is granted that permission it can then assign any Azure RBAC Role to itself or any other security principal within the Entra ID tenant to it’s immediate scope of access and all child scopes due to Azure RBAC’s inheritance model (tenant root -> management group -> subscription -> resource group -> resource). This means a human assigned a role with sufficient permissions (such as an IAM support team) could accidentally / maliciously assign another privileged role to themselves or someone else and wreak havoc. Makes for a not good late night for anyone on-call.

While the human risk exists, the greater risk is with non-human identities. When an organization passes beyond the click-ops and imperative (az cli, PowerShell, etc) stage and moves on to the declarative stage with IaC (infrastructure-as-code), delivery of those IaC templates (ARM, Bicep, Terraform) are put through a CI/CD (continuous integration / continuous delivery) pipeline. To deploy the code to the cloud platform, these pipeline’s compute need an identity to authenticate to the cloud management plane. In Azure, this is accomplished through a service principal or managed identity. That identity must be granted the specific permissions it needs which is done through Azure RBAC Role assignments.

In the ideal world, as much as can be is put through the pipeline, including role assignments. This means the pipeline needs to be able to create Azure RBAC Role assignments which means it needs permissions for the Microsoft.Authorization resource provider (or relevant built-in roles with Owner being common).

To mitigate the risk of one giant blast radius with one pipeline, organizations will often create multiple pipelines with separate pipelines for production and non-production, pipelines for platform components (networking, logging, etc) and others for the workload (possibly one for workload components such as an Event Hub and another separate pipeline for code). Pipelines will be given separate security principals with permissions at different scopes with Central IT typically owning pipelines at higher scopes (management groups) and business units owning pipelines at lower scopes (subscription or resource group).

Example of multiple pipelines and identities

At the end of the day you end up with lots of pipelines and lots of non-humans that hold the Owner role at a given scope. This multiplies the risk of any one of those pipeline identities being misused to grant someone or something permissions beyond what it needs. Organizations typically mitigate through this automated and manual gates which can get incredibly complex at scale.

This is where Azure RBAC Delegation really shines. It allows you to wrap restrictions around how a security principal can exercise its Microsoft.Authorization permissions. These restrictions can include:

  • Restricting to a specific set of Azure RBAC Roles
  • Restricting to a specific security principal type (user, service principal, group)
  • Restricting whether it can create new assignments, update existing assignments, or delete assignments
  • Restricting it to a specific set of security principals (specific set of groups, users, etc)

So how does it do it? Well if you read my prior post, you’ll remember I mentioned the new property included in Azure RBAC Role assignments called conditions. RBAC Delegation uses this new property to wrap those restrictions around the role. Let’s look at an example role using one of the new built-in role Microsoft has introduced called the Role Based Access Control Administrator.

Let’s take a look at the role definition first.

{
    "id": "/subscriptions/b3b7aae7-c6c1-4b3d-bf0f-5cd4ca6b190b/providers/Microsoft.Authorization/roleDefinitions/f58310d9-a9f6-439a-9e8d-f62e7b41a168",
    "properties": {
        "roleName": "Role Based Access Control Administrator",
        "description": "Manage access to Azure resources by assigning roles using Azure RBAC. This role does not allow you to manage access using other ways, such as Azure Policy.",
        "assignableScopes": [
            "/"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.Authorization/roleAssignments/write",
                    "Microsoft.Authorization/roleAssignments/delete",
                    "*/read",
                    "Microsoft.Support/*"
                ],
                "notActions": [],
                "dataActions": [],
                "notDataActions": []
            }
        ]
    }
}

In the above role definition, you can see that the role has been granted only permissions necessary to create, update, and delete role assignments. This is more restrictive than an Owner or User Access Administrator which have a broader set of permissions in the Microsoft.Authorization resource provider. It makes for a good candidate for a business unit pipeline role versus Owner since business units shouldn’t need to be managing Azure Policy or Azure RBAC Role Definitions. That responsibility should within Central IT’s scope.

You do not have to use this built-in role or can certainly design your own. For example, the role above does not include the permissions to manage a resource lock and this might be something you want the business unit to be able to manage. This feature is also supported for custom role. In the example below, I’ve cloned the Role Based Access Control Administrator but added additional permissions to manage resource locks.

{
    "id": "/subscriptions/b3b7aae7-c6c1-4b3d-bf0f-5cd4ca6b190b/providers/Microsoft.Authorization/roleDefinitions/dd681d1a-8358-4080-8a37-9ea46c90295c",
    "properties": {
        "roleName": "Privileged Test Role",
        "description": "This is a test role to demonstrate RBAC delegation",
        "assignableScopes": [
            "/subscriptions/b3b7aae7-c6c1-4b3d-bf0f-5cd4ca6b190b"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.Authorization/roleAssignments/write",
                    "Microsoft.Authorization/roleAssignments/delete",
                    "*/read",
                    "Microsoft.Support/*",
                    "Microsoft.Authorization/locks/read",
                    "Microsoft.Authorization/locks/write",
                    "Microsoft.Authorization/locks/delete"
                ],
                "notActions": [],
                "dataActions": [],
                "notDataActions": []
            }
        ]
    }
}

If I were to attempt to create a new role assignment for this custom role, I’ve given the ability to associate the role with a set of conditions.

Adding conditions to a custom role

Let’s take an example where I want a security principal to be able to create new role assignments but only for the built-in role of Virtual Machine Contributor. The condition in my role would like the below:

    (
        (
            !(ActionMatches{'Microsoft.Authorization/roleAssignments/write'})
        )
        OR 
        (
            @Request[Microsoft.Authorization/roleAssignments:RoleDefinitionId] ForAnyOfAnyValues:GuidEquals {9980e02c-c2be-4d73-94e8-173b1dc7cf3c}
            AND
            @Request[Microsoft.Authorization/roleAssignments:PrincipalType] StringEqualsIgnoreCase 'User'
        )
    )
    AND
    (
        (
            !(ActionMatches{'Microsoft.Authorization/roleAssignments/delete'})
        )
        OR 
        (
            @Request[Microsoft.Authorization/roleAssignments:PrincipalType] StringEqualsIgnoreCase 'User'
        )
    )

Yes, I know the conditional language is ugly as hell. Thankfully, you won’t have to write this yourself which I will demonstrate in a few. First, I want to walk you through the conditional language.

Azure RBAC conditional language

When using RBAC Delegation, you can associate one or more conditions to an Azure RBAC role assignment. Like most conditional logic in programming, you can combine conditions with AND and OR. Within each condition you have an action and expression. When the condition is evaluated the action is first checked for a match. In the example above I have !(ActionMatches{‘Microsoft.Authorization/roleAssignments/write’}) which will match any permission that isn’t write which will cause the condition to result in true and will allow the access without evaluating the expression below. If the action is write, then this evaluated to false and the expression is then evaluated. In the example above I have two expressions. The first expression checks whether the request I’m making to create a role assignment is for the role definition id for the Virtual Machine Contributor. The second expression checks to see if the principal type in the role assignment is of type user. If either of these evaluate to false, then access is denied. If it evaluates to true, it moves on to the next condition which limits the security principal to deleting role assignments assigned to users.

No, you do not need to learn this syntax. Microsoft has been kind enough to provide a GUI-based conditions tool which you can use to build your conditions and view as code you can include in your templates.

GUI-based condition builder

Pretty cool right? The public documentation walks through a number of different scenarios where you can use this, so I’d encourage you to read it to spur ideas beyond the pipeline example I have given in this post. However, the real value out of this feature is stricter control of what how those pipeline identities can affect authorization.

So what are you takeaways for this post?

  • Get this feature implemented YESTERDAY. This is minimal overhead with massive security return.
  • Use the GUI-based condition builder to build your conditions and then copy the JSON into your code.
  • Take some time to learn the conditional syntax. It’s used in other places in Azure RBAC and will likely continue to grow in usage.
  • Start off using the built-in Role Based Access Control Administrator role. If your business units need more than what is in there (such as managing resource locks) clone it and add those permissions.

Well folks, I hope you got some value out of this post. Add a to-do to get this in place in your Azure estate as soon as possible!