Azure Authorization – Azure ABAC (Attribute-based Access Control)

This is part of my series on Azure Authorization.

  1. Azure Authorization – The Basics
  2. Azure Authorization – Azure RBAC Basics
  3. Azure Authorization – actions and notActions
  4. Azure Authorization – Resource Locks and Azure Policy denyActions
  5. Azure Authorization – Azure RBAC Delegation
  6. Azure Authorization – Azure ABAC (Attribute-based Access Control)

Welcome back fellow geeks.

I do a lot of learning and educational sessions with my customer base. The volume pretty much demands reusable content which means I gotta build decks and code samples… and worse maintain them. The maintenance piece typically consists of me mentally promising myself to update the content and kicking the can down the road for a few months. Eventually, I get around to updating the content.

This month I was doing some updates to my content around Azure Authorization and decided to spend a bit more time with Azure ABAC (Attribute-based access control). For those of you unfamiliar with Azure ABAC, well it’s no surprise because the use cases are so very limited as of today. Limited as the use cases are, it’s a worthwhile functionality to understand because Microsoft does use it in its products and you may have use cases where it makes sense.

The Dream of ABAC

Let’s first touch briefly on the differences between role-based access control (RBAC) and attribute-based access control (ABAC). Attribute-based access control has been the dream of the security industry for as long as I can remember. RBAC has been the predominant authorization mechanism in the majority of applications over the years. The challenge with RBAC is it has typically translated to basic group membership, where an application authorizes a user solely on whether or not the user is in a group. Access to these groups would typically come through some type of membership request fulfilled by a central governance team. Those processes have tended to be not super user friendly, and the access has tended to be very coarse-grained.

ABAC meanwhile promised more fine-grained access based upon attributes of the security principal, resource, or whatever your mind can dream up. Sounds awesome right? Well it is, but it largely remained a dream in the mainstream world with a few attempts such as Windows Dynamic Access Control (Before you comment, yeah I get you may have had some cool apps doing this stuff years ago and that is awesome, but let’s stick with the majority). This began to change when cloud came around with the introduction of more modern protocols and standards such as SAML, OIDC, and OAuth. These protocols provide more flexibility with how the identity provider packages attributes about the user in the token delivered to the service provider/resource provider/what have you.

When it came to the Azure cloud, Microsoft went the traditional RBAC path for much of the platform. A user or group gets placed in an Azure RBAC role and the user(s) get access. I explain Azure RBAC in my other posts on RBAC. There is a bit of flexibility on the Entra ID side for the initial access token via Entra ID Conditional Access, but authorization in the Azure realm was plain RBAC. This was the story for many years of Azure.

In 2021 Microsoft decided something more flexible was needed and introduced Azure ABAC. The world rejoiced… right? Nah, not really. While the introduction of ABAC was awesome, its scope of use was and still is extremely limited. As of the date of this blog, ABAC is only usable for Azure Storage blob and queue operations. All is not lost though, there are some great use cases for this feature so it’s important to understand how it works.

How does ABAC work?

Alright, history lesson and complaining about limited scope aside, let’s now explore how the feature works.

ABAC is facilitated through an additional property on Azure RBAC Role Assignment resources. I’m going to assume you understand the ins and outs of role assignments. If you don’t, check out my prior post on the topic. In its simplest sense, an Azure RBAC role assignment is the association of a role to a security principal granting that principal the permissions defined in the role over a particular scope of resources. As I’ve covered previously, role assignments are an Azure resource that have defined sets of properties. The properties we care about for the scope of this discussion are the conditionVersion and condition properties. The conditionVersion property will always have a value of 2.0 for now. The condition property is where we work our ABAC magic.
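
To make this concrete, both properties can be set directly when creating a role assignment with the Azure CLI. Here’s a minimal sketch, assuming the condition text lives in a file you’ve authored or copied out of the Portal’s code editor (the scope and principal object ID are placeholders):

# Read the ABAC condition from a file you've written yourself
condition=$(cat my-abac-condition.txt)

az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee-object-id "<PRINCIPAL_OBJECT_ID>" \
  --assignee-principal-type "User" \
  --scope "/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>" \
  --condition "$condition" \
  --condition-version "2.0"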

The condition property is made up of a series of conditions which each consist of an action and one or more expressions. The logic for conditions is kinda weird, so I’ll walk you through it using some of the examples from the documentation as well as a complex condition I threw together. First, let’s look at the general structure.

Structure of conditions used in ABAC

In the above image you can see the basic building blocks of a condition. Looks super confusing and complicated right? I know it did to me at first. Thankfully, the kind souls who write the public documentation broke this down in a more friendly programming-like way.

Far more simple explanation of conditions

In each condition property we first have the action line where the logic looks to see if the action being performed by the security principal doesn’t (note the exclamation point, which negates what’s in the parentheses) match the action we’re applying the conditions to. You’ll commonly see a line like:

!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})

This line is saying if the action isn’t blobs/read (which would be a data plane call to read the contents of the blob) then the line should evaluate to true. If it evaluates to true, then the access is allowed and the expressions are not evaluated any further.

After this line we have the expression, which is only evaluated when the first line evaluates to false (which in the example I just covered would mean the security principal is trying to read the contents of a blob). The expressions support four categories of what Microsoft refers to as condition features, which are in various states of GA (general availability) and preview (refer to the documentation for those details). These four categories include:

  • Requests
  • Environment
  • Resource
  • Principal (security principal)

These four categories give you a ton of flexibility. Requests covers the details of the request to storage, for example limiting a user to specific blob prefixes based on the prefix within the request. Environment can be used to limit the user to accessing the resource from a specific Private Endpoint or over Private Link in general (think defense-in-depth here). The resource feature exposes properties of the resource being accessed, with blob index tags being what I find the most flexible. Lastly, we have security principal, and this is where you can muck around with custom security attributes in Entra ID (very cool feature if you haven’t touched it).

In a given condition we can have multiple expressions and within the condition property we can string together multiple conditions with AND and OR logic. I’m a big believer in going big or going home, so let’s take a look at a complex condition.

Diving into the Deep End

Let’s say I have a whole bunch of data I need to make available via blobs in an Azure Storage Account. I have a strict requirement to use a single storage account, and the blobs I’m going to store have different data classifications denoted by a blob index tag key named access_level. Blobs without this key are accessible by everyone, while blobs classified high, medium, or low are only accessible by users with approval for the same or higher access levels (example: a user with a high access level can access high, medium, low, and data with no access level). Lastly, I have a requirement that data at the high access level can only be accessed during business hours.
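
For context, blob index tags are set on individual blobs, either at upload time or afterwards. Here’s a quick sketch of tagging an existing blob with the Azure CLI (the names are placeholders, and depending on your CLI version this may require the storage-blob-preview extension):

# Tag an existing blob with an access_level blob index tag
az storage blob tag set \
  --account-name "<STORAGE_ACCOUNT>" \
  --container-name "data" \
  --name "reports/quarterly-summary.csv" \
  --tags access_level=high \
  --auth-mode login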

I use a custom security attribute in Entra ID called accesslevel under an attribute set named organization to denote a user’s approved access level.
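
For reference, custom security attributes are assigned to users through Microsoft Graph. Here’s a sketch of how you could stamp that attribute on a user with az rest (the user object ID is a placeholder, and the caller needs an Entra role such as Attribute Assignment Administrator):

# Assign the organization/accesslevel custom security attribute to a user via Microsoft Graph
az rest --method patch \
  --url "https://graph.microsoft.com/v1.0/users/<USER_OBJECT_ID>" \
  --body '{
    "customSecurityAttributes": {
      "organization": {
        "@odata.type": "#microsoft.graph.customSecurityAttributeValue",
        "accesslevel": "high"
      }
    }
  }'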

Here is how that policy would break down.

My first condition is built to allow users to read any blobs that don’t have the access_level tag.

# Condition that allows users within scope of the assignment access to documents that do not have an access level tag
(
  (
    # If the action being performed doesn't match blobs/read then result in true and allow access
    !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
  )
  OR 
  (
    # If the blob doesn't have a blob index tag with a key of access_level then allow access
    NOT @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags&$keys$&] ForAnyOfAnyValues:StringEquals {'access_level'}
  )
)

If the blob does have an access tag, I want to start incorporating my logic. The next condition I include allows users with the accesslevel security attribute set to high to read blobs with a blob index tag of access_level equal to low or medium. I also allow them to read blobs tagged with high if it’s between 9AM and 5PM EST.

# Condition that allows users within scope of the assignment to access medium and low tagged data if they have a custom 
# security attribute of accesslevel set to high. High data can also be read within working hours
OR
(
 (
   # If the action being performed doesn't match blobs/read then result in true and allow access
   !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
 )
 OR 
 (
   # If the blob has an index tag of access_level with a value of medium or low allow the user access if they have a custom security
   # attribute of organization_accesslevel set to high
   @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'medium', 'low'}
   AND
   @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'high'
 )
 OR
 (
   # If the blob has an index tag of access_level with a value of high allow the user access if they have a custom security
   # attribute of organization_accesslevel set to high and it's within working hours
   @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'high'}
   AND
   @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'high'
   AND
   @Environment[UtcNow] DateTimeGreaterThan '2025-06-09T12:00:00.0Z'
   AND
   @Environment[UtcNow] DateTimeLessThan '2045-06-09T21:00:00.0Z'
 )
)

Next up is users with medium access level. These users are granted access to data tagged medium or low.

# Condition that allows users within scope of the assignment to access medium and low tagged data if they have a custom 
# security attribute of accesslevel set to medium
OR
(
  (
    # If the action being performed doesn't match blobs/read then result in true and allow access
    !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
  )
  OR 
  (
    # If the blob has an index tag of access_level with a value of medium or low allow the user access if they have a custom security
    # attribute of organization_accesslevel set to medium
    @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'medium', 'low'}
    AND
    @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'medium'
 )
)

Finally, I allow users with low access level to access data tagged as low.

# Condition that allows users within scope of the assignment to access low tagged data if they have a custom 
# security attribute of accesslevel set to low
OR
(
 (
   # If the action being performed doesn't match blobs/read then result in true and allow access
   !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'} AND !SubOperationMatches{'Blob.List'})
 )
 OR 
 (
   # If the blob has an index tag of access_level with a value of low allow the user access if they have a custom security
   # attribute of organization_accesslevel set to low
   @Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:access_level<$key_case_sensitive$>] ForAnyOfAnyValues:StringEquals {'low'}
   AND
   @Principal[Microsoft.Directory/CustomSecurityAttributes/Id:organization_accesslevel] StringEquals 'low'
 )
)

Notice how I separated each condition using OR. If the first condition resolves to false, then the next condition is evaluated until access is granted or all conditions are exhausted. Neat right?

Summing it up

So why should you care about this if its use case is so limited? Well, you should care because that is ABAC’s use case today, and it will likely be expanded in the future. Furthermore, ABAC allows you to be more granular in how you grant access to data in Azure Storage (again, blob or queue only). You likely have use cases where this can provide another layer of security to further constrain a security principal’s access. You’ll also see these conditions used in Microsoft’s products such as AI Foundry.

The other reason it’s helpful to understand the language used for conditions is that conditions are expanding into other services such as Azure RBAC Delegation (which if you aren’t using, you should be). While the language can be complex, it does make sense if you muck around with it a bit.

A final bit of guidance here, don’t try to write conditions by hand. Use the visual builder in the Azure Portal as seen below. It will help you get some basic conditions in place that you can further modify directly via the code view.

Azure Portal Condition Builder
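
Once a condition is in place, you can also pull it back out for review outside the Portal. Something along these lines should dump any conditions attached to a principal’s role assignments (the principal object ID is a placeholder):

# List role assignments for a principal and show only those carrying an ABAC condition
az role assignment list \
  --assignee "<PRINCIPAL_OBJECT_ID>" \
  --all \
  --query '[?condition != `null`].{role:roleDefinitionName, scope:scope, condition:condition}' \
  --output json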

Next time you’re locking down an Azure storage account, think about whether or not you can further restrict humans and non-humans alike based on the attributes discussed today. The main places I’ve seen this used are for user profiles, further restricting user access to specific subsets of data (similar to the one I walked through above), or even adding an additional layer of network security baked directly into the role assignment itself.

See you next post!

Deploying Resources Across Multiple Azure tenants

Hello fellow geeks! Today I’m going to take a break from my AI Foundry series and save your future self some time by walking you through a process I had to piece together from disparate links, outdated documentation, and experimentation. Despite what you hear, I’m a nice guy like that!

The Problem

Recently, I’ve been experimenting with the AVNM (Azure Virtual Network Manager) IPAM (IP address management) solution which is currently in public preview. In the future I’ll do a blog series on the product, but today I’m going to focus on some of the processes I went through to get a POC (proof-of-concept) working with this feature across two tenants. The scenario was a management and managed tenant concept where the management tenant is the authority for the pools of IP addresses the managed tenant can draw upon for the virtual networks it creates.

Let’s first level set on terminology. When I say Azure tenant, what I’m really talking about is the Entra ID tenant the Microsoft Azure subscriptions are associated with. Every subscription in Azure can be associated with only one Entra ID tenant. Entra ID provides identity and authentication services to associated Azure subscriptions. Note that I excluded authorization, because Azure has its own authorization engine in Azure RBAC (role-based access control).

Relationship between Entra ID tenant and Azure resources

Without deep diving into AVNM, its IPAM feature uses the concept of “pools”, which are collections of IP CIDR blocks. Pools can have a parent and child relationship where one large pool can be carved into smaller pools. Virtual networks in the same region as the pool can be associated with these pools (either before or after creation of the virtual network) to draw down upon the CIDR block associated with the range. You also have the option of creating an object called a Static CIDR which can be used to represent the consumption of IP space on-premises or in another cloud. For virtual networks, as resources are provisioned into the virtual networks, IPAM will report how many of the allocated IP addresses in a specific pool are being used. This allows you to track how much IP space you’ve consumed across your Azure estate.

AVNM IPAM Resources
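
If you want to stand one of these up yourself, a pool is just another ARM resource under the network manager. Here’s a rough sketch using az rest; note that the API version and the addressPrefixes body are my assumptions from the preview, so check the current REST reference before relying on it:

# Create a top-level IPAM pool under an existing network manager (schema/API version assumed)
az rest --method put \
  --uri "https://management.azure.com/subscriptions/<SUB_ID>/resourceGroups/rg-avnm-test/providers/Microsoft.Network/networkManagers/test/ipamPools/main?api-version=2024-05-01" \
  --body '{
    "location": "centralus",
    "properties": {
      "addressPrefixes": ["10.0.0.0/16"]
    }
  }'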

My primary goal in this scenario was to create a virtual network in TenantB that would draw down on an AVNM address pool in TenantA. This way I could emulate a parent company managing the IP allocation and usage across its many child companies, which could be spread across multiple Azure tenants. To do this I needed to:

  1. Create an AVNM instance in TenantA
  2. Setup the cross-tenant AVNM feature in both tenants.
  3. Create a multi-tenant service principal in TenantB.
  4. Create a stub service principal in TenantA representing the service principal in TenantB.
  5. Grant the stub service principal the IPAM Pool User Azure RBAC role.
  6. Create a new virtual network in TenantB and reference the IPAM pool in TenantA.

My architecture would look similar to the image below.

Multi-Tenant AVNM IPAM Architecture

With all that said, I’m now going to get into the purpose of this post, which is focusing on steps 3, 4, and 6.

Multi-Tenant Service Principals

Service principals are objects in Entra ID used to represent non-human identities. They are similar to an AWS IAM user but cannot be used for interactive login. The whole purpose is non-interactive login by a non-human. Yes, even the Azure Managed Identity you’ve been using is a service principal under the hood.

Unlike Entra ID users, service principals can’t be added to another tenant through the Entra B2B feature. To make a service principal available across multiple tenants you need to create what is referred to as a multi-tenant service principal. A multi-tenant service principal exists in an authoritative tenant (I’ll refer to this as the trusted tenant) where the service principal exists as an object with a credential. The service principal has an attribute named appId which is a unique GUID representing the app across all of Entra. Other tenants (I’ll refer to these as trusting tenants) can then create a stub service principal in their tenant by specifying this appId at creation. Entra ID will represent this stub service principal in the trusting tenant as an Enterprise Application.

Multi-Tenant Service Principal in Trusted Tenant

For my use case I wanted to have my multi-tenant service principal stored authoritatively in TenantB (the managed tenant) because that is where I would be deploying my virtual network via Terraform. I had an existing service principal I was already using, so I ran the command below to update the existing service principal to support multi-tenancy. The signInAudience attribute is what dictates whether a service principal is single-tenant (AzureADMyOrg) or multi-tenant (AzureADMultipleOrgs).

az login --tenant <TENANTB_ID>
az ad app update --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --set signInAudience=AzureADMultipleOrgs
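
If you want to confirm the change took effect, something along these lines should show the new value (the --query path assumes the Graph application object returned by the CLI):

# Optional sanity check - confirm the app registration is now multi-tenant
az ad app show --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --query signInAudience -o tsv
# Expected output: AzureADMultipleOrgs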

Once my service principal was updated to a multi-tenant service principal I next had to provision it into TenantA (management tenant) using the command below.

az login --tenant <TENANTA_ID>
az ad sp create --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX"

The id parameter in each command is the appId property of the service principal. By creating a new service principal in TenantA with the same appId I am creating the stub service principal for my service principal in TenantB.

Many of the guides you’ll find online will tell you that you need to grant Admin Consent. I did not find this necessary. I’m fairly certain it’s not necessary because the service principal does not need any tenant-wide permissions and won’t be acting on behalf of any user. Instead, it will exercise its direct permissions against the ARM API (Azure Resource Manager) based on the Azure RBAC role assignments created for it.

Once these commands were run, the service principal appeared as an Enterprise Application in TenantA. From there I was able to log into TenantA and create an RBAC role assignment associating the IPAM Pool User role to the service principal.
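
If you prefer to script that assignment, a minimal sketch with the Azure CLI would look something like the below (I’m reusing the pool resource ID from my lab and the service principal’s appId as the assignee; adjust for your environment):

# Run while logged into TenantA (the management tenant)
az role assignment create \
  --role "IPAM Pool User" \
  --assignee "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" \
  --scope "/subscriptions/11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX/resourceGroups/rg-avnm-test/providers/Microsoft.Network/networkManagers/test/ipamPools/main"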

Creating The New VNet in TenantB with Terraform… and Failing

At this point I was ready to create the new virtual network in TenantB with an address space allocated from an IPAM pool in TenantA. Like any sane human being writing Terraform code, my first stop was the reference documentation for the AzureRm provider. Sadly, my spirits were quickly crushed (as often happens with that provider): the provider version (4.21.1) does not yet support the ipamPoolPrefixAllocations property for virtual networks. I chatted with the product group and support for it will be coming soon.

When the AzureRm provider fails (as it feels like it often does with any new feature), my fallback is the AzApi provider. Given that AzApi is a very light overlay on top of the ARM REST API, I was confident I’d be able to use it with the proper ARM REST API version to create my virtual network. I wrote my code and ran my terraform apply only to run into an error.

Forbidden({"error":{"code":"LinkedAuthorizationFailed","message":"The client has permission to perform action 'Microsoft.Network/networkManagers/ipamPools/associateResourcesToPool/action' on scope '/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting', however the current tenant '6c80de31-d5e4-4029-93e4-XXXXXXXXXXXX' is not authorized to access linked subscription '11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX'."}})

When performing cross-tenant activities via the ARM API, the platform needs to authenticate the security principal to both Entra tenants. From a raw REST API call this can be accomplished by adding the x-ms-authorization-auxiliary header to the headers in the API call. In this header you include a bearer token for the second Entra ID tenant that you need to authenticate to.

Both the AzureRm and AzApi providers support this feature through the auxiliary_tenant_ids property of the provider. Passing that property will cause REST calls to be made to the Entra ID login endpoints for each tenant to obtain an access token. The tenant specified in auxiliary_tenant_ids has its bearer token passed in the API calls in the x-ms-authorization-auxiliary header. Well, that’s the way it’s supposed to work. However, after some Fiddler captures I noticed it was not happening with AzApi v2.1.0 and 2.2.0. After some Googling I turned up a GitHub issue where someone reported this as far back as February 2024. It was supposedly resolved in v1.13.1, but I noticed a person posting just a few weeks ago that it was still broken. My testing also seemed to indicate it is still busted.

What to do now? My next thought was to use the AzureRm provider and pass an ARM template using the azurerm_resource_group_template_deployment resource. I dug deep into the recesses of my brain to surface my ARM template skills and I whipped up a template. I typed in terraform apply and crossed my fingers. My Fiddler capture showed both access tokens being retrieved (YAY!) and being passed in the underlying API call, but sadly I had forgotten that ARM templates do not support referencing resources outside the Entra ID tenant the deployment is being pushed to. Foiled again.

My only avenue left was a REST API call. For that I used the az rest command (greatest thing since sliced bread for hitting ARM endpoints). Unlike PowerShell, the az CLI does not support any special option for auxiliary tenants. Instead, I needed to run az login against each tenant and store the second tenant’s bearer token in a variable.

az login --service-principal --username "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --password "CLIENT_SECRET" --tenant "<TENANTB_ID>"

az login --service-principal --username "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --password "CLIENT_SECRET" --tenant "<TENANTA_ID>"

auxiliaryToken=$(az account get-access-token \
--resource=https://management.azure.com/ \
--tenant "<TENANTA_ID>" \
--query accessToken -o tsv)

Once I had my bearer tokens, the next step was to pass my REST API call.

az rest --method put \
--uri "https://management.azure.com/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting?api-version=2022-07-01" \
--headers "x-ms-authorization-auxiliary=Bearer ${auxiliaryToken}" \
--body '{
"location": "centralus",
"properties": {
"addressSpace": {
"ipamPoolPrefixAllocations": [
{
"numberOfIpAddresses": "100",
"pool": {
"id": "/subscriptions/11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX/resourceGroups/rg-avnm-test/providers/Microsoft.Network/networkManagers/test/ipamPools/main"
}
}
]
}
}
}'

I received a 200 status code which meant my virtual network was created successfully. Sure enough, the new virtual network appeared in TenantB, and in TenantA I saw the virtual network associated with the IPAM pool.

Summing It Up

Hopefully the content above saves someone from wasting far too much time trying to get cross-tenant stuff to work in a non-interactive manner. While my end solution isn’t what I’d prefer to do, it was my only option due to the issues with the Terraform providers. Hopefully, the issue with the AzApi provider is remediated soon. For AVNM IPAM specifically, AzureRm provider support will be here soon, so the use of AzApi will likely not be necessary.

What I hope you took out of this is a better understanding of how cross tenant actions like this work under the hood from an identity, authentication, and authorization perspective. You should also have a better understanding of what is happening (or not happening) under the hood of those Terraform providers we hold so dear.

TLDR;

When performing cross tenant deployments here is your general sequence of events:

  1. Create a multi-tenant service principal in Tenant A.
  2. Create a stub service principal in Tenant B.
  3. Grant the stub service principal in Tenant B the appropriate Azure RBAC permissions.
  4. Obtain an access token from both Tenant A and Tenant B.
  5. Package the auxiliary tenant’s access token in the x-ms-authorization-auxiliary header of your request and make your request. You can use the az rest command like I did above or use another tool. The key thing is to make sure you pass it in addition to the standard Authorization header.
  6. ???
  7. Profit!

Thanks for reading!

AI Foundry – Identity, Authentication and Authorization

This is a part of my series on AI Foundry:

Updates:

  • 3/17/2025 – Updated diagrams to include new identities and RBAC roles that are recommended as a minimum

Yes, I’m going to re-use the outline from my Azure OpenAI series. You wanna fight about it? This means we’re going to now talk about one of the most important (as per usual) and complicated (oh so complicated) topics in AI Foundry: identity, authentication, and authorization. If you haven’t read my prior two posts, you should take a few minutes and read through them. They’ll give you the baseline you’ll need to get the most out of this post. So pour your coffee, break out the metal clips to keep your eyes open Clockwork Orange-style, and prepare for a dip into the many ways identity, authN, and authZ are handled within the service.

As I covered in my first post, Foundry is made up of a ton of different services. Each of these services plays a part in features within Foundry, some may support multiple forms of authentication, and most will be accessed by the many types of identities used within the product. Understanding how each identity is used will be critical in getting authorization right. Missing Azure RBAC role assignments are the most common misconfiguration (right above networking, which is also complicated as we’ll see in a future post).

Azure AI Foundry Components

Let’s start first with identity. There will generally be four types of identities used in AI Foundry. These identities will be a combination of human identities and non-human identities. Your humans will be your AI Engineers, developers, and central IT and will use their Entra ID user identities. Your non-humans will include the AI Foundry hub, project, and compute you provision for different purposes. In general, identities are used in the following way (this is not inclusive of all things, just the ones I’ve noticed):

  • Humans
    • Entra ID Users
      • Actions within Azure Portal
      • Actions within AI Foundry Studio
        • Running a prompt flow from the GUI
        • Using the Chat Playground to send prompts to an LLM
        • Running the Chat-With-Your-Data workflow within the Chat Playground
        • Creating a new project within a hub
      • Actions using Azure CLI such as sending an inference to a managed online endpoint that supports Entra ID authentication
  • Non-Humans
    • AI Foundry Hub Managed Identity
      • Accessing the Azure Key Vault associated with the Foundry instance to create secrets or pull secrets when AI Foundry connections are created using credentials versus Entra ID
      • Modifying properties of the default Azure Storage Account such as setting CORS policies
      • Creating managed private endpoints for hub resources if a managed virtual network is used
    • AI Foundry Project Managed Identity
      • Accessing the Azure Key Vault associated with the Foundry instance to create secrets or pull secrets when AI Foundry connections are created using credentials versus Entra ID
      • Creating blob containers for the project where project artifacts such as logs and metrics are stored
      • Creating the file share for the project where project artifacts such as user-created Prompt Flow files are stored
    • Compute
      • Pulling container image from Azure Container Registry when deploying prompt flows that require custom environments
      • Accessing default storage account project blob container to pull data needed to boot
      • Much much more in this category. Really depends on what you’re doing

Alright, so you understand the identities that will be used and you have a general idea of how they’ll be used to perform different tasks within the Foundry ecosystem. Let’s now talk authentication.

The many identities of AI Foundry

Authentication in Foundry isn’t too complicated (in comparison to identity and authorization). Authenticating to the Azure Portal and the Foundry Studio is always going to be Entra ID-based authentication. Authentication to other Azure resources from Foundry is where it can get interesting. As I covered in my prior post, Foundry will typically support two methods of authentication: Entra ID and API key-based (or credentials, as you’ll often see it referred to in Foundry). If at all possible, you’ll want to lean into Entra ID-based authentication whenever you access a resource from Foundry. As we’ll see in the next section around authorization, this will have benefits. Besides authorization, you’ll also get auditability because the logs will show the actual security principal that accessed the resource.

If you opt to use credential-based authentication for your connections to Azure resources, you’ll lose out in a few different areas. When credential-based authentication is used, users will access connected resources within Foundry using the keys stored in the Foundry connection object. This means the user assumes whatever permissions the key has (which is typically all data plane permissions but could be more restrictive in instances like a SAS token). Besides the authorization challenges, you’ll also lose out on traceability. AI Foundry (and the underlying Azure Machine Learning) has some authorization (via Azure RBAC roles) that is used to control access to connections, but very little in the way of auditing who exercised what connection when. For these reasons, you should use Entra ID where possible.

Ready for authorization? Nah, not yet. Before we get into authorization, it’s important to understand that these identities can be used in generally two ways: direct or indirect (on-behalf-of). For example, let’s say you run a Prompt Flow from the AI Foundry interface; while the code runs on serverless compute provisioned in a Microsoft-managed network (more on that in a future post), the identity context it uses to access downstream resources is actually yours. Now if you deploy that same prompt flow to a managed online endpoint, the code will run on that endpoint and use the managed identity assigned to the compute instance. Not so simple is it?

So how do you know which identity will be used? Observe my general guidance from up above: if you’re running things from the GUI, it’s likely your identity; if you’re deploying stuff to compute, it’s likely the identity associated with the compute. There are exceptions to the rule. For example, when you attempt to upload data for fine-tuning or use the on-your-own-data feature in the Chat Playground and your default storage account is behind a private endpoint, your identity will be used to access the data, but the managed identity associated with the project is used to access the private endpoint resource. Why does it need access to the Private Endpoint? I have no idea, it just does. If you don’t grant it, good luck to you poor soul because you’re going to have a hell of a time troubleshooting it.

Another interesting deviation is when using the Chat Playground Chat With Your Data feature. If you opt to add your data and build the index directly within AI Foundry, there will be a mixed usage of the user identity, AI Search managed identity (which communicates with the embedding models deployed in the AI Services or Azure OpenAI instance to create the vector representations of the chunks in the index), and AI Services or Azure OpenAI managed identity (creates index and data sources in AI Search). It can get very complex.

The image below represents most of the flows you’ll come across.

The many AI Foundry authentication flows and identity patterns

Okay, now authorization? Yes, authorization. I’m not one for bullshitting, so I’ll just tell you up front authorization in Foundry can be hard. It gets even harder when you lock down networking because often the error messages you will receive are the same for blocked traffic and failed authorization. The complexities of authorization are exactly why I spent so much time explaining identity and authentication to you. I wish I could tell you every permission in every scenario, but it would take many, many posts to do that. Instead, I’d advise you to do what I sometimes fail to do, which is RTFM (go ahead and Google that). This particular product group has made strong efforts to document required permissions, so your first stop should always be the Foundry public documentation. In some instances, you will also need to check the Azure Machine Learning documentation (again, this is built on top of AML) because the Foundry docs sometimes assume you’ll know a feature is inherited from AML (yeah, not fair but it’s reality).

In general, at an absolute minimum, the permissions assigned to the identities below will get you started as of the date of this post (updated 3/17/2025).

As I covered in my prior posts, the AI Foundry Hub can use either a system-assigned or user-assigned managed identity. You won’t hear me say this often, but just use the system-assigned managed identity if you can for the hub. The required permissions will be automatically assigned and it will be one less thing for you to worry about. Using the permissions listed above should work for a user-assigned managed identity as well (this is on my backburner to re-validate).

A project will always use a system-assigned managed identity. The only permission listed above that you’ll need to manually grant is Reader over the Private Endpoint for the default storage account. This is only required if you’re using a private endpoint for your default storage account. There may be additional permissions required by the project depending on the activities you are performing and data you are accessing.

On the user side, the permissions above will put you in a good place for your typical developer or AI engineer to use most of the features within Foundry. If you’re interacting with other resources (such as an AI Search index when using the on-your-own-data feature) you’ll need to ensure the user is granted appropriate permissions to those resources as well (typically Search Service Contributor on the management plane to list and create indexes, and Search Index Data Contributor on the data plane to create and view records within an index). If your user is fine-tuning a model that is deployed within the Azure OpenAI or AI Service instance, they may additionally need the Azure OpenAI Service Contributor role (to upload the file via Foundry for fine-tuning). Yeah, lots of scenarios and lots of varying permissions for the user, but that covers the most common ones I’ve run into.

Lastly, we have the compute identities. There is no standard here. If you’ve deployed a prompt flow to a managed online endpoint, the compute will need the permissions to connect to the resources behind the connections (again, assuming Entra ID is configured for the connection; if using credentials, Azure Machine Learning Workspace Secrets Reader on the project is likely sufficient). Using a prompt flow that requires a custom environment may require an image be pushed to the Azure Container Registry which the compute will pull, so it will need the AcrPull RBAC role assignment on the registry.

Complicated right? What happens when stuff doesn’t work? Well, in that scenario you need to look at the logs (both the Azure Activity Log and diagnostic logging for the relevant service such as blob, Search, OpenAI and the like). That will tell you what the user is failing to do (again, only if you’re using Entra ID for your connections) and help you figure out what needs to be added from a permissions perspective. If you’re using credentials for your connections, the most common issue is the default storage account having its storage access keys disabled.
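
For the management plane side, here’s a quick sketch of pulling recent failed operations out of the Activity Log with the CLI (the resource group name is a placeholder; data plane failures live in the diagnostic logs you’ve configured for each service):

# List failed management plane operations over the last day for a resource group
az monitor activity-log list \
  --resource-group "<RG_NAME>" \
  --status Failed \
  --offset 1d \
  --query '[].{operation:operationName.value, caller:caller, message:properties.statusMessage}' \
  --output table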

Here are the key things I want you to take away from this:

  1. Know the identity being used. If you don’t know which identity is being used, you’ll never get authorization right. Use the downstream service logs if you’re unsure. Remember, management plane stuff is in the Azure Activity Log and data plane stuff is in diagnostic logs.
  2. Use Entra ID authentication where possible. Yeah it will make your Azure RBAC a bit more challenging, but you can scope the access AND understand who the hell is doing what.
  3. RTFM where possible. Most of this is buried in the public documentation (sometimes you need to put on your Indiana Jones hat). Remember that if you don’t find it in Foundry documentation, look to Azure Machine Learning.
  4. Use the above information as a general guide to get the basic environment set up. You’ll build from that basic foundation.

Alrighty folks, your eyes are likely heavy. I hope this helps a few souls out there who are struggling with getting this product up and running. If you know me, you know I’m no fan boy, but this particular product is pretty damn awesome at getting us non-devs immediate value from generative AI. It may take some effort to get this product running, but it’s worth it!

Thanks and see you next post!

AI Foundry – Credential vs Identity Data Stores

This is a part of my series on AI Foundry:

Hello again folks. Today, I’m going to continue my series on AI Foundry. I’ve been scratching my head on how best to tackle this series, because the service consists of so many foundational services plumbed together into a larger solution, so there is a lot to talk about. The product can be complicated when implementing it with all the security bells and whistles. Getting it right requires a solid baseline understanding of the foundational components’ security capabilities (such as Azure Storage, Azure Key Vault, etc.) and how these components work together for the purposes of AI Foundry.

The many components of an AI Foundry deployment

For the purposes of this post, I’m going to focus in on Azure Storage, specifically the storage account associated with the AI Foundry Hub. I will refer to this storage account as the default storage account. As I covered in my first post, AI Foundry is built on top of Azure Machine Learning. Like Azure Machine Learning, AI Foundry uses the default storage account to store artifacts created by the AI Foundry hub and projects. This includes files for the Prompt Flows you create, files used by the compute provisioned in the managed virtual network, and other artifacts related to the functionality of the product. This storage account is shared across the AI Foundry hub and all projects created within it.

The default storage account is critical to the functionality and if you muck up the identity or networking configuration, the product simply won’t work. The errors you’ll receive won’t always indicate an obvious problem with your storage account configuration. To help you avoid mucking up the identity portion, I’m going to use this post to explain your options for identity integration with the default storage account.

AI Foundry uses workspace connection resources to connect to external resources outside of the workspace. This includes the default storage account, AOAI (Azure OpenAI Service) or AI Service instance, and the like. When you create a connection in AI Foundry, you configure how the workspace should authenticate to the resource (determined by the authType property of the connection) when called by a user. This will most commonly be either Entra ID or an API key. In the example below, you see I have a connection object for an AI Search instance set to use Entra authentication by configuring the authType to AAD.

 {
      "id": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.MachineLearningServices/workspaces/aifhaifoundryeus296/connections/connaisaifoundryeus296",
      "location": null,
      "name": "connmysearchservice",
      "properties": {
        "authType": "AAD",
        "category": "CognitiveSearch",
        "createdByWorkspaceArmId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.MachineLearningServices/workspaces/aifhaifoundryeus296",
        "error": "Network Service does not have permission to check resource /subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.Search/searchServices/aisaifoundryeus296 details. Please consider grant Azure Machine Learning (appId: 0736f41a-0425-4b46-bdb5-1563eff02385) read or contributor access to connected resource.",
        "expiryTime": null,
        "group": "AzureAI",
        "isSharedToAll": true,
        "metadata": {
          "ApiType": "Azure",
          "ApiVersion": "2024-05-01-preview",
          "DeploymentApiVersion": "2023-11-01",
          "ResourceId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rgaifeus296/providers/Microsoft.Search/searchServices/mysearchservice"
        },
        "peRequirement": "NotApplicable",
        "peStatus": "NotApplicable",
        "sharedUserList": [],
        "target": "https://mysearchservice.search.windows.net",
        "useWorkspaceManagedIdentity": false
      },
      "systemData": {
        "createdAt": "2025-01-12T23:19:01.8005674Z",
        "createdBy": "d34d51b2-34b4-45d9-b6a8-cc5422eb400a",
        "createdByType": "Application",
        "lastModifiedAt": "2025-01-12T23:19:01.8005674Z",
        "lastModifiedBy": "d34d51b2-34b4-45d9-b6a8-cc5422eb400a",
        "lastModifiedByType": "Application"
      },
      "tags": null,
      "type": "Microsoft.MachineLearningServices/workspaces/connections"
    }

Creating this connection allows me to use the AI Search instance within the AI Foundry hub and projects, such as using it within the Chat Playground Chat With Your Data feature. When the connection object is called, an Entra ID identity will be used. This could be the user’s identity, it could be a project’s managed identity, or it could even be a managed online endpoint’s managed identity. In all cases, the identity will be an Entra ID identity that can be authenticated to the tenant, and the actions it is authorized to do are determined by its Azure RBAC assignments. It’s critical to understand that if you choose Entra ID-based authentication, you need to have the proper permissions in place.

When a new AI Foundry hub is created, it will either create a new storage account or integrate with an existing storage account to be used as the default storage account. During setup via the Portal, in the identity section you’ll see the option to choose credential-based or identity-based authentication when connecting to the default storage account. By default, credential-based access will be used. If you are provisioning via Terraform (which as of right now will require you to use the AzApi provider) you would set the properties.systemDatastoresAuthMode property to either accesskey or identity. As of the date of this blog, this property is still not documented in the REST API documentation that I could find; however, it will work when referencing it with API version Microsoft.MachineLearningServices/workspaces@2024-10-01-preview.
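
To make that concrete, here’s a rough sketch of what setting that property looks like if you drop down to a raw az rest call (which is effectively what AzApi does). The resource IDs are placeholders, I’ve trimmed the hub properties to a bare minimum, and the exact body shape is my assumption, so validate it against the current ARM/AzApi references before relying on it:

az rest --method put \
  --uri "https://management.azure.com/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.MachineLearningServices/workspaces/<HUB_NAME>?api-version=2024-10-01-preview" \
  --body '{
    "location": "eastus2",
    "kind": "Hub",
    "identity": { "type": "SystemAssigned" },
    "properties": {
      "friendlyName": "<HUB_NAME>",
      "storageAccount": "/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>",
      "keyVault": "/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.KeyVault/vaults/<KEY_VAULT>",
      "systemDatastoresAuthMode": "identity"
    }
  }'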

Credential or Identity-based access

So why would you choose identity-based access if you have to additionally provision the relevant security principals with access via RBAC? Before I answer that, let me do a quick recap on authorization in Azure. As I cover in my series on Azure authorization, services like storage have both a management plane and a data plane. While the management plane is always Entra ID-based authentication and Azure RBAC, the data plane for most services (storage included) can use either Entra ID/Azure RBAC or API keys (via storage access keys and SAS tokens). Usage of any type of static key typically grants the security principal using the key complete access to the data plane. Additionally, determining who is using the key at any given time is mostly impossible. For that reason, choosing to use Entra ID/Azure RBAC should be your preference wherever possible. Entra ID will give you traceability back to the security principal that touched the resource and Azure RBAC will give you the ability to assign granular permissions across the data plane.

Management plane versus data plane

If you instead select credential-based authentication, a few things happen. When the new AI Foundry hub is created, the connections made to the default storage account will be configured to use a SAS token. Any security principal with read access to the workspace can use that connection from within an AI Foundry project to connect to the storage account using those credentials. This means no auditability of which user is doing what with the storage account. This goes for any connection you share across projects that uses an API key. Not good.

Default storage account configured to use credential-based authentication

It’s worth understanding the Key Vault resource used by AI Foundry in this scenario. When selecting credential-based authentication for the default storage account, the storage access keys for the storage account are stored in the Key Vault. Both the AI Foundry hub and projects under the hub are granted access to the secrets via Key Vault access policies. Yuck and yuck. Users do not get access to the Key Vault itself. Foundry simply enables them to exercise the credential via permissions over the connection object within the Foundry hub or project. When using identity-based authentication and Entra ID for your connections, the Azure Key Vault will be used minimally (such as when you deploy a model from the model catalog to a managed online endpoint and select key-based authentication) or not at all.

Hopefully at this point I’ve sold you on the benefits of using identity-based authentication to the default storage account (and Entra ID for connected resources). As a quick recap, if you care about least privilege and auditability, you’ll choose identity-based authentication. The main consideration of choosing identity-based authentication for the default storage account is that you need to get Azure RBAC right or else shit will break. Oh yes, will it break.

If you configure your AI Foundry instance with a SMI (system-assigned managed identity) for the hub and projects, the required permissions on the default storage account will be granted for these identities. This includes:

  • Hub identity
    • Storage Blob Data Contributor
    • Storage File Data Privileged Contributor
  • Project identity
    • Storage Account Contributor
    • Storage Blob Data Contributor
    • Storage File Data Privileged Contributor
    • Storage Table Data Contributor

If you’re nosy like I am, you’ll notice the Azure RBAC assignments for both the hub and project identities have an ABAC condition attached (yes, an actual use case!). I plan on covering ABAC conditions in depth in my authorization series, but essentially they are a way of scoping access to an attribute of the security principal, resource, or session. Within AI Foundry, they are used to limit the managed identities to accessing the blob containers specific to their underlying AML workspace. This helps prevent the managed identity of one project from accessing artifacts produced by another project. For example, here are the conditions associated with my hub’s managed identity:

(
 (
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action'})
  AND
  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action'})
 )
 OR 
 (
  @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringStartsWithIgnoreCase '67b8ddaa-f77e-4d12-b9ca-440326274da9'
 )
)

If you opt to use a UMI (user-assigned managed identity) for the AI Foundry hub you’ll need to manually grant these permissions to the UMI prior to provisioning the hub. You should try to include these conditions.

As I mentioned earlier, there are three primary sets of identities that hit the resources for an AI Foundry. These include the hub/project identity, user identity, and compute identities. If you opt to use identity-based authentication to the storage account, you will need to ensure you grant your users appropriate permissions on the storage account. When a user does something like create a prompt flow, the user’s identity context is used to access the file endpoint in the storage account to create a file share that will contain prompt flows they create.

This typically includes:

  • Storage Blob Data Contributor
  • Storage File Data Privileged Contributor

If you’re spinning up a managed online endpoint, you will need to grant its managed identity the following roles (if using a UMI; these are added automatically if using an SMI):

  • Storage Blob Data Reader
  • Storage Blob Data Contributor

The last thing I want to mention is specific to creating Private Endpoints for your default storage account (which, for a secure AI Foundry, you should be doing). Ensure you grant each AI Foundry project managed identity Reader over the private endpoints (both file and blob) for the default storage account. This is required when previewing data from the AI Foundry Portal for use cases like uploading data for fine-tuning a model. I’m not sure where this requirement comes from, but if you don’t include it, your users will run into weird permission errors when attempting to upload data to the default storage account from within AI Foundry.

Let’s sum things up:

  • The default storage account configuration is critical to successful use of the product. Muck up authorization and prepare for pain.
  • Use identity-based authentication for connectivity to the default storage account. This will ensure auditability for who accesses what.
  • Use Entra ID authentication for your AI Foundry connections wherever possible. This will give you auditability and the ability to scope permissions via Azure RBAC.
  • If you’re using identity-based authentication, ensure you put in place the right permissions for the hub/project (done automatically if using an SMI), user, and compute identities.
  • If you’re having trouble with users uploading data for fine-tuning via AI Foundry, your project is probably missing the read permissions over the default storage account private endpoints.
  • If you’re having trouble provisioning a managed online endpoint that is using a UMI, you are probably missing permissions on the default storage account.

That wraps up this post. Thanks folks!