Deploying Resources Across Multiple Azure Tenants

Hello fellow geeks! Today I’m going to take a break from my AI Foundry series and save your future self some time by walking you through a process I had to piece together from disparate links, outdated documentation, and experimentation. Despite what you hear, I’m a nice guy like that!

The Problem

Recently, I’ve been experimenting with the AVNM (Azure Virtual Network Manager) IPAM (IP address management) solution, which is currently in public preview. In the future I’ll do a blog series on the product, but today I’m going to focus on some of the processes I went through to get a POC (proof-of-concept) working with this feature across two tenants. The scenario used a management tenant and a managed tenant, where the management tenant is the authority for the pools of IP addresses that the managed tenant can draw upon for the virtual networks it creates.

Let’s first level set on terminology. When I say Azure tenant, what I’m really talking about is the Entra ID tenant the Microsoft Azure subscriptions are associated with. Every subscription in Azure can be associated with only one Entra ID tenant. Entra ID provides identity and authentication services to associated Azure subscriptions. Note that I excluded authorization, because Azure has its own authorization engine in Azure RBAC (role-based access control).

Relationship between Entra ID tenant and Azure resources

Without deep diving into AVNM, its IPAM feature uses the concept of “pools”, which are collections of IP CIDR blocks. Pools can have a parent and child relationship where one large pool can be carved into smaller pools. Virtual networks in the same region as a pool can be associated with it (either before or after the virtual network is created) to draw down on the pool’s CIDR blocks. You also have the option of creating an object called a Static CIDR, which can be used to represent the consumption of IP space on-premises or in another cloud. As resources are provisioned into the associated virtual networks, IPAM reports how many of the IP addresses allocated from a specific pool are in use. This allows you to track how much IP space you’ve consumed across your Azure estate.
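
To make the parent/child pool idea a bit more concrete, here is the plain IPv4 CIDR math behind carving a large pool into smaller ones. This is a small shell sketch of generic arithmetic, not an AVNM API; the helper names are my own, and it doesn't account for any addresses the platform reserves inside a range.

```shell
# Generic IPv4 CIDR arithmetic, not an AVNM API (helper names are hypothetical).
# It ignores any addresses the platform reserves inside a range.

# Number of addresses in a CIDR block, e.g. 10.0.0.0/16 -> 65536
cidr_size() {
  case "$1" in
    */*) ;;
    *) printf 'expected a CIDR such as 10.0.0.0/16, got: %s\n' "$1" >&2; return 1 ;;
  esac
  local prefix="${1#*/}"   # keep only the part after the slash
  if [ "$prefix" -gt 32 ]; then
    printf 'prefix /%s is out of range\n' "$prefix" >&2
    return 1
  fi
  printf '%s\n' "$(( 1 << (32 - prefix) ))"
}

# How many /child pools fit inside one /parent pool, e.g. 16 20 -> 16
child_pool_count() {
  local parent="$1" child="$2"
  if [ "$child" -lt "$parent" ]; then
    printf 'child /%s is larger than parent /%s\n' "$child" "$parent" >&2
    return 1
  fi
  printf '%s\n' "$(( 1 << (child - parent) ))"
}
```

For example, carving a /16 parent pool into /20 child pools yields 16 children (`child_pool_count 16 20`), each holding 4096 addresses (`cidr_size 10.0.0.0/20`).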

AVNM IPAM Resources

My primary goal in this scenario was to create a virtual network in TenantB that would draw down on an AVNM address pool in TenantA. This way I could emulate a parent company managing IP allocation and usage across its many child companies, which could be spread across multiple Azure tenants. To do this, I needed to:

  1. Create an AVNM instance in TenantA.
  2. Set up the cross-tenant AVNM feature in both tenants.
  3. Create a multi-tenant service principal in TenantB.
  4. Create a stub service principal in TenantA representing the service principal in TenantB.
  5. Grant the stub service principal the IPAM Pool User Azure RBAC role.
  6. Create a new virtual network in TenantB and reference the IPAM pool in TenantA.

My architecture would look similar to the image below.

Multi-Tenant AVNM IPAM Architecture

With all that said, I’m now going to get into the purpose of this post, which is focusing on steps 3, 4, and 6.

Multi-Tenant Service Principals

Service principals are objects in Entra ID used to represent non-human identities. They are similar to an AWS IAM user but cannot be used for interactive login. The whole purpose is non-interactive login by a non-human. Yes, even the Azure Managed Identity you’ve been using is a service principal under the hood.

Unlike Entra ID users, service principals can’t be added to another tenant through the Entra B2B feature. To make a service principal available across multiple tenants, you need to create what is referred to as a multi-tenant service principal. A multi-tenant service principal has an authoritative tenant (I’ll refer to this as the trusted tenant) where the service principal exists as an object with a credential. The service principal has an attribute named appId, which is a GUID that uniquely identifies the app across all of Entra. Other tenants (I’ll refer to these as trusting tenants) can then create a stub service principal in their tenant by specifying this appId at creation. Entra ID represents this stub service principal in the trusting tenant as an Enterprise Application.

Multi-Tenant Service Principal in Trusted Tenant

For my use case I wanted my multi-tenant service principal stored authoritatively in TenantB (the managed tenant) because that is where I would be deploying my virtual network via Terraform. I had an existing service principal I was already using, so I ran the command below to update it to support multi-tenancy. The signInAudience attribute is what dictates whether a service principal is single-tenant (AzureADMyOrg) or multi-tenant (AzureADMultipleOrgs).

az login --tenant <TENANTB_ID>
az ad app update --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --set signInAudience=AzureADMultipleOrgs
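
If you want to confirm the update took, you can read signInAudience back from the app registration. This is a minimal sketch, assuming you're still logged into TenantB; `require_multitenant` is a hypothetical helper name, while `az ad app show` with `--query` is the standard az CLI command.

```shell
# Confirm an app registration is multi-tenant.
# Assumes `az login --tenant <TENANTB_ID>` has already been run.
# require_multitenant is a hypothetical helper, not an az command.
require_multitenant() {
  local app_id="$1" audience
  audience="$(az ad app show --id "$app_id" --query signInAudience -o tsv)" || return 1
  if [ "$audience" = "AzureADMultipleOrgs" ]; then
    printf '%s\n' "multi-tenant"
  else
    printf 'not multi-tenant, signInAudience is: %s\n' "$audience" >&2
    return 1
  fi
}
```

Running `require_multitenant "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX"` should print multi-tenant once the change has landed.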

Once my service principal was updated to a multi-tenant service principal, I next had to provision it into TenantA (the management tenant) using the command below.

az login --tenant <TENANTA_ID>
az ad sp create --id "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX"

The id parameter in each command is the appId property of the service principal. By creating a new service principal in TenantA with the same appId I am creating the stub service principal for my service principal in TenantB.
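
A quick way to confirm the stub really is a stub is to check where Entra ID says the application lives. A sketch, assuming you're logged into TenantA: `appOwnerOrganizationId` is the Microsoft Graph property on a service principal that holds the tenant ID where the application is registered, so for the stub it should be TenantB's ID. The helper name is hypothetical.

```shell
# Confirm the stub service principal in TenantA points back to TenantB.
# Assumes `az login --tenant <TENANTA_ID>` has been run; helper name is hypothetical.
check_stub_home_tenant() {
  local app_id="$1" expected_tenant="$2" owner
  # appOwnerOrganizationId = tenant where the application object is registered
  owner="$(az ad sp show --id "$app_id" --query appOwnerOrganizationId -o tsv)" || return 1
  if [ "$owner" = "$expected_tenant" ]; then
    printf 'stub service principal is homed in %s\n' "$owner"
  else
    printf 'unexpected home tenant: %s\n' "$owner" >&2
    return 1
  fi
}
```

Usage: `check_stub_home_tenant "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" "<TENANTB_ID>"`.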

Many of the guides you’ll find online will tell you that you need to grant Admin Consent. I did not find this necessary. I’m fairly certain it’s not necessary because the service principal does not need any tenant-wide permissions and won’t be acting on behalf of any user. Instead, it will exercise its direct permissions against the ARM API (Azure Resource Manager) based on the Azure RBAC role assignments created for it.

Once these commands were run, the service principal appeared as an Enterprise Application in TenantA. From there I was able to log into TenantA and create an RBAC role assignment associating the IPAM Pool User role with the service principal.
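
Step 5 can be sketched with the az CLI as below, assuming you're logged into TenantA with rights to create role assignments. Scoping the assignment to the individual pool is my own least-privilege assumption (the post doesn't pin down the scope), and `ipam_pool_id` and `grant_ipam_pool_user` are hypothetical helper names; the resource ID format mirrors the pool ID used in the REST call later in this post.

```shell
# Grant the stub service principal the IPAM Pool User role on a single pool.
# Assumes you are logged into TenantA; helper names are hypothetical.

# Build an IPAM pool resource ID from its parts
ipam_pool_id() {
  local sub="$1" rg="$2" manager="$3" pool="$4"
  printf '%s\n' "/subscriptions/${sub}/resourceGroups/${rg}/providers/Microsoft.Network/networkManagers/${manager}/ipamPools/${pool}"
}

grant_ipam_pool_user() {
  local app_id="$1" pool_id="$2"
  # --assignee accepts the appId of a service principal
  az role assignment create \
    --assignee "$app_id" \
    --role "IPAM Pool User" \
    --scope "$pool_id"
}
```

Usage: `grant_ipam_pool_user "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" "$(ipam_pool_id 11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX rg-avnm-test test main)"`.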

Creating The New VNet in TenantB with Terraform… and Failing

At this point I was ready to create the new virtual network in TenantB with an address space allocated from an IPAM pool in TenantA. Like any sane human being writing Terraform code, my first stop was the reference documentation for the AzureRm provider. Sadly, my spirits were quickly crushed (as often happens with that provider): the virtual network resource in provider version 4.21.1 does not yet support the ipamPoolPrefixAllocations property. I chatted with the product group, and support for it will be coming soon.

When the AzureRm provider fails (as it feels like it often does with any new feature), my fallback is the AzApi provider. Given that AzApi is a very light overlay on top of the ARM REST API, I was confident I’d be able to use it with the proper ARM REST API version to create my virtual network. I wrote my code and ran terraform apply, only to run into an error.

Forbidden({"error":{"code":"LinkedAuthorizationFailed","message":"The client has permission to perform action 'Microsoft.Network/networkManagers/ipamPools/associateResourcesToPool/action' on scope '/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting', however the current tenant '6c80de31-d5e4-4029-93e4-XXXXXXXXXXXX' is not authorized to access linked subscription '11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX'."}})

When performing cross-tenant activities via the ARM API, the platform needs to authenticate the security principal to both Entra tenants. With a raw REST API call, this is accomplished by adding the x-ms-authorization-auxiliary header to the request. In this header you include a bearer token for the second Entra ID tenant you need to authenticate to, while the standard Authorization header carries the token for the tenant that owns the target resource.
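
Building the header value itself is simple string work. A minimal sketch: `aux_header_value` is a hypothetical helper, and the limit of three comma-separated `Bearer <token>` entries reflects my reading of the ARM cross-tenant authentication docs, so treat it as an assumption.

```shell
# Build the value for the x-ms-authorization-auxiliary header.
# Assumption: ARM accepts up to three comma-separated "Bearer <token>" entries.
aux_header_value() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 3 ]; then
    printf 'expected 1 to 3 auxiliary tokens, got %s\n' "$#" >&2
    return 1
  fi
  local value="" token
  for token in "$@"; do
    # prepend ", " only when a previous entry already exists
    value="${value:+$value, }Bearer ${token}"
  done
  printf '%s\n' "$value"
}
```

With az rest, that becomes `--headers "x-ms-authorization-auxiliary=$(aux_header_value "$auxiliaryToken")"`.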

Both the AzureRm and AzApi providers support this feature through the auxiliary_tenant_ids property of the provider. Setting that property causes the provider to call the Entra ID login endpoint for each tenant to obtain an access token. Each tenant listed in auxiliary_tenant_ids has its bearer token passed in the x-ms-authorization-auxiliary header of the API calls. Well, that’s the way it’s supposed to work. However, after some Fiddler captures, I noticed it was not happening with AzApi v2.1.0 and v2.2.0. After some Googling, I turned up a GitHub issue where someone reported this as far back as February 2024. It was supposedly resolved in v1.13.1, but I noticed a person posting just a few weeks ago that it was still broken. My testing also seemed to indicate it is still busted.
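
When you're staring at a Fiddler capture, it helps to know which tenant a given bearer token was issued by. ARM access tokens are JWTs, and the `tid` claim carries the issuing tenant's ID, so a small decoder tells you whether the auxiliary token is there and for the right tenant. A sketch under stated assumptions: it only decodes the payload and does NOT validate the signature (fine for debugging, never for authorization decisions), it assumes GNU coreutils `base64 -d`, and `jwt_claim` is a hypothetical helper; jq would be sturdier than grep if you have it.

```shell
# Print one string claim (e.g. tid, aud) from a JWT, for debugging only.
# Decodes the payload; it does NOT validate the signature. Assumes GNU base64.
jwt_claim() {
  local token="$1" claim="$2" payload
  # The payload is the second dot-separated segment, base64url-encoded
  payload="$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops '=' padding; restore it so base64 -d accepts the input
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d 2>/dev/null \
    | grep -o "\"${claim}\":\"[^\"]*\"" | head -n 1 | cut -d'"' -f4
}
```

For example, `jwt_claim "$auxiliaryToken" tid` should print TenantA's ID.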

What to do now? My next thought was to use the AzureRm provider and pass an ARM template using the azurerm_resource_group_template_deployment resource. I dug deep into the recesses of my brain to surface my ARM template skills and whipped up a template. I typed in terraform apply and crossed my fingers. My Fiddler capture showed both access tokens being retrieved (YAY!) and being passed in the underlying API call, but I had forgotten that ARM templates do not support referencing resources outside the Entra ID tenant the deployment is being pushed to. Foiled again.

My only avenue left was a REST API call. For that I used the az rest command (the greatest thing since sliced bread for hitting ARM endpoints). Unlike PowerShell, the az CLI does not have a special option for auxiliary tenants. Instead, I needed to run az login against each tenant and store the second tenant’s bearer token in a variable.

az login --service-principal --username "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --password "CLIENT_SECRET" --tenant "<TENANTB_ID>"

az login --service-principal --username "d34d51b2-34b4-45d9-b6a8-XXXXXXXXXXXX" --password "CLIENT_SECRET" --tenant "<TENANTA_ID>"

auxiliaryToken=$(az account get-access-token \
--resource=https://management.azure.com/ \
--tenant "<TENANTA_ID>" \
--query accessToken -o tsv)

Once I had my bearer tokens, the next step was to make my REST API call.

az rest --method put \
--uri "https://management.azure.com/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting?api-version=2022-07-01" \
--headers "x-ms-authorization-auxiliary=Bearer ${auxiliaryToken}" \
--body '{
"location": "centralus",
"properties": {
"addressSpace": {
"ipamPoolPrefixAllocations": [
{
"numberOfIpAddresses": "100",
"pool": {
"id": "/subscriptions/11487ac1-b0f2-4b3a-84fa-XXXXXXXXXXXX/resourceGroups/rg-avnm-test/providers/Microsoft.Network/networkManagers/test/ipamPools/main"
}
}
]
}
}
}'

I received a 200 status code, which meant my virtual network was created successfully. Sure enough, the new virtual network showed up in TenantB, and in TenantA I saw the virtual network associated with the IPAM pool.
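
To double-check from the command line rather than the portal, you can read the virtual network back. A sketch, assuming the service principal is still logged into TenantB and reusing the resource ID and api-version from the PUT above; `show_vnet_address_space` is a hypothetical helper. As I understand it, az rest selects the logged-in account based on the subscription ID in the URL.

```shell
# Read back the address space of the VNet created by the cross-tenant PUT.
# Hypothetical helper; assumes a TenantB login is available to az rest.
show_vnet_address_space() {
  local vnet_id="$1" api_version="${2:-2022-07-01}"
  az rest --method get \
    --uri "https://management.azure.com${vnet_id}?api-version=${api_version}" \
    --query "properties.addressSpace"
}
```

Usage: `show_vnet_address_space "/subscriptions/97515654-3331-440d-8cdf-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm/providers/Microsoft.Network/virtualNetworks/vnettesting"`.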

Summing It Up

Hopefully the content above saves someone from wasting far too much time trying to get cross-tenant stuff to work in a non-interactive manner. While my end solution isn’t what I’d prefer, it was my only option due to the issues with the Terraform providers. Hopefully, the issue with the AzApi provider is remediated soon. For AVNM IPAM specifically, support in the AzureRm provider should be here soon, so using AzApi will likely not be necessary.

What I hope you took away from this is a better understanding of how cross-tenant actions like this work under the hood from an identity, authentication, and authorization perspective. You should also have a better understanding of what is happening (or not happening) under the hood of those Terraform providers we hold so dear.

TL;DR

When performing cross-tenant deployments, here is your general sequence of events:

  1. Create a multi-tenant service principal in the trusted tenant (TenantB in my example).
  2. Create a stub service principal in the trusting tenant (TenantA in my example).
  3. Grant the stub service principal the appropriate Azure RBAC permissions in the trusting tenant.
  4. Obtain an access token from both tenants.
  5. Pass the access token for the second tenant in the x-ms-authorization-auxiliary header and make your request. You can use the az rest command like I did above or use another tool. The key thing is to make sure you pass it in addition to the standard Authorization header.
  6. ???
  7. Profit!
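
Steps 4 and 5 can be wrapped into one reusable function. This is a sketch under a few assumptions: the service principal has already run `az login --service-principal` against both tenants, az rest supplies the standard Authorization header on its own (as I understand it, based on the subscription in the URL), and `cross_tenant_put` is a hypothetical helper name.

```shell
# PUT to ARM while also authenticating to a second (auxiliary) tenant.
# Assumes az login has been run for both tenants; helper name is hypothetical.
cross_tenant_put() {
  local uri="$1" body="$2" aux_tenant="$3" aux_token
  # Token for the *other* tenant; az rest adds the normal Authorization header itself
  aux_token="$(az account get-access-token \
    --resource https://management.azure.com/ \
    --tenant "$aux_tenant" \
    --query accessToken -o tsv)" || return 1
  az rest --method put \
    --uri "$uri" \
    --headers "x-ms-authorization-auxiliary=Bearer ${aux_token}" \
    --body "$body"
}
```

Usage: `cross_tenant_put "$vnet_uri" "$vnet_body" "<TENANTA_ID>"`, where the two variables hold the URI and JSON body from the az rest example earlier in the post.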

Thanks for reading!

Azure Authorization – Resource Locks and Azure Policy denyActions

This is part of my series on Azure Authorization.

  1. Azure Authorization – The Basics
  2. Azure Authorization – Azure RBAC Basics
  3. Azure Authorization – actions and notActions
  4. Azure Authorization – Resource Locks and Azure Policy denyActions
  5. Azure Authorization – Azure RBAC Delegation
  6. Azure Authorization – Azure ABAC (Attribute-based Access Control)

Welcome back! Today I have another post in my series on Azure Authorization. In my last post I covered how permissions listed in notActions and notDataActions in an Azure RBAC role definition are not an explicit deny but rather a subtraction from the permissions listed in the actions and dataActions sections of the definition. In this post I’m going to cover two features which help to address that gap: Resource Locks and the Azure Policy denyActions feature.

Resource Locks

Let’s start with Resource Locks. Resource Locks can be used to protect important resources from actions that could delete or modify them. They are Azure resources administered through the Microsoft.Authorization resource provider (specifically Microsoft.Authorization/locks) and come in two forms: delete locks (CanNotDelete) and modification locks (ReadOnly). Resource locks can be applied at the subscription scope, resource group scope, and resource scope. Resource locks applied at a higher scope are inherited by everything beneath it.
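
Creating locks is a one-liner with `az lock create`. A minimal sketch: `create_lock` is a hypothetical wrapper, while `--lock-type` accepting `CanNotDelete` or `ReadOnly` is the real CLI behavior. Leaving out `--resource-group` places the lock on the current subscription, and it is inherited from there.

```shell
# Create a management lock at subscription or resource group scope.
# create_lock is a hypothetical wrapper around `az lock create`.
create_lock() {
  local lock_type="$1" lock_name="$2" resource_group="${3:-}"
  case "$lock_type" in
    CanNotDelete|ReadOnly) ;;
    *) printf 'unknown lock type: %s\n' "$lock_type" >&2; return 1 ;;
  esac
  if [ -n "$resource_group" ]; then
    az lock create --name "$lock_name" --lock-type "$lock_type" --resource-group "$resource_group"
  else
    # No resource group: the lock applies to the current subscription
    az lock create --name "$lock_name" --lock-type "$lock_type"
  fi
}
```

For the ExpressRoute example, run `az account set --subscription <EXPRESSROUTE_SUBSCRIPTION_ID>` and then `create_lock ReadOnly lock-expressroute`.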

Resource locks are a wonderful defense-in-depth control because they are another authorization control in addition to Azure RBAC. A common use case might be to place a ReadOnly resource lock on an Azure subscription used to house ExpressRoute resources, since once set up, ExpressRoute circuits are relatively static from a configuration perspective. This could help mitigate the risk of an authorized user or CI/CD pipeline identity that holds the Azure RBAC Owner or Contributor role over the subscription mucking up the configuration, purposefully or by accident, and causing broad issues for your Azure landscape.

Example of a resource lock on a resource

Resource locks do have a number of considerations that you should be aware of before you go throwing them everywhere. One of the main considerations is that they can be removed by anyone with access to Microsoft.Authorization/*, which includes built-in Azure RBAC roles such as Owner and User Access Administrator. It’s common practice for organizations to grant the identity (service principal or managed identity) used by a CI/CD pipeline the Owner role so that the identity can create the role assignments required by the services it deploys (you can work around this with delegation, which I will cover in my next post!). This means that anyone with sufficient permissions on the pipeline could theoretically push some code that removes the lock.
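
To see who could pull a lock off through a built-in role, you can list the Owner and User Access Administrator assignments that apply at a scope. A sketch: `lock_removers` is a hypothetical helper, and it won't catch custom roles that grant Microsoft.Authorization/* or Microsoft.Authorization/locks/*, so treat the output as a starting point rather than a complete answer.

```shell
# List principals holding Owner or User Access Administrator at (or above) a scope.
# Hypothetical helper; custom roles with lock permissions are NOT covered.
lock_removers() {
  local scope="$1"
  az role assignment list \
    --scope "$scope" \
    --include-inherited \
    --query "[?roleDefinitionName=='Owner' || roleDefinitionName=='User Access Administrator'].[principalName, roleDefinitionName]" \
    -o tsv
}
```

Usage: `lock_removers /subscriptions/<SUBSCRIPTION_ID>`.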

Another consideration is resource locks only affect management plane operations (if you’re unfamiliar with this concept check out my first post on authorization). Using a Storage Account as an example, a ReadOnly lock would prevent someone from modifying the CMK used to encrypt a storage account, but it wouldn’t stop a user with sufficient permissions at the data plane from deleting a container or blob. Before applying a ReadOnly resource lock, make sure you understand some of the ramifications of blocking management plane operations. Using an App Service as an example, a ReadOnly lock would prevent you from scaling that App Service up or down.
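
You can see the management plane versus data plane split for yourself with a throwaway storage account that has a ReadOnly lock. A sketch, assuming the caller has both management rights and a data plane role such as Storage Blob Data Contributor; `demo_lock_planes` and all names passed to it are hypothetical, and it really does delete the blob you point it at. With the lock in place, I'd expect the update to be blocked and the blob delete to go through.

```shell
# Compare a management plane write with a data plane delete under a ReadOnly lock.
# Hypothetical helper; use only against a throwaway storage account.
demo_lock_planes() {
  local rg="$1" account="$2" container="$3" blob="$4"
  # Management plane: any write to the account resource should hit the lock
  if az storage account update --resource-group "$rg" --name "$account" \
       --min-tls-version TLS1_2 >/dev/null 2>&1; then
    printf '%s\n' "management plane: update allowed"
  else
    printf '%s\n' "management plane: update blocked"
  fi
  # Data plane: blob operations authorized with Entra ID are not governed by locks
  if az storage blob delete --account-name "$account" --container-name "$container" \
       --name "$blob" --auth-mode login >/dev/null 2>&1; then
    printf '%s\n' "data plane: delete allowed"
  else
    printf '%s\n' "data plane: delete blocked"
  fi
}
```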

I’m a big fan of using resource locks for mission-critical pieces of infrastructure such as ExpressRoute. Outside of that, you’ll want to weigh the considerations I’ve listed above along with those in the public documentation. If your primary goal is to prevent delete operations, then you may be better off using the next feature, Azure Policy denyActions.

Azure Policy denyActions

Azure Policy is Azure’s primary governance tool. For those of you coming from AWS, think of Azure Policy as if AWS IAM Policy conditions had a baby with AWS Config. I like to think of it as a way to enforce the way a resource needs to look, and if the resource doesn’t look that way, you can block the action altogether, log that it’s non-compliant, or remediate it.

It’s important to understand that Azure Policy sits in line with the ARM (Azure Resource Manager) API, allowing it to prevent or remediate a resource creation or modification before it ever gets processed by the API. This is a bonus compared to having to remediate it after the fact with something like AWS Config.

Azure Policy Architecture

In addition to the functionality above, Microsoft has added a bit of authorization logic into Azure Policy with the denyAction effect (yes, it was a bit confusing to me why authorization to do something was introduced, but you know what, it can come in handy!). As of the date of this post, the only action that can be denied is the DELETE action. While this may seem limited, it’s an awesome improvement and addresses some of the gaps in resource locks.

First, you can use the power of the Azure Policy language to filter to a specific resource type with a specific set of tags. This allows you to apply these rules at scale. A use case here might be denying the delete action on all Log Analytics Workspaces across my entire Azure estate. Second, the policy can be assigned at a management group scope. By assigning the policy at the management group scope, I can prevent deletions of these resources even by a user or service principal that holds the Owner role on the subscription. This helps me mitigate the risk present with resource locks when a CI/CD pipeline has been given the Owner role over a subscription.
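
Assigning a policy definition at management group scope looks like this with the az CLI. A sketch: `assign_at_management_group` and the names you pass to it are hypothetical, and the definition ID would be the one returned when you create the definition.

```shell
# Assign an existing policy definition at management group scope.
# Hypothetical helper; definition_id comes from creating the definition.
assign_at_management_group() {
  local assignment_name="$1" definition_id="$2" mg_name="$3"
  az policy assignment create \
    --name "$assignment_name" \
    --policy "$definition_id" \
    --scope "/providers/Microsoft.Management/managementGroups/${mg_name}"
}
```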

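An example policy could look something like the one below. This policy would prevent any Log Analytics Workspace tagged with a tag named rbac with a value of prod from being deleted. Because of the cascadeBehaviors setting, it would also block deletion of any resource group containing such a workspace.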

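$policy_name = "deny-delete-law-prod"               # hypothetical definition name
$management_group_name = "<MANAGEMENT_GROUP_NAME>"  # placeholder for your management group
$policy_id=(New-AzPolicyDefinition -Name $policy_name `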
    -Description "This is a policy used for RBAC demonstration that blocks the deletion of Log Analytics Workspaces tagged with a key of rbac and a value of prod" `
    -ManagementGroupName $management_group_name `
    -Mode Indexed `
    -Policy '{
            "if": {
                "allOf": [
                    {
                        "field": "type",
                        "equals": "Microsoft.OperationalInsights/workspaces"
                    },
                    {
                        "field": "tags.rbac",
                        "equals": "prod"
                    }
                ]
            },
            "then": {
                "effect": "denyAction",
                "details": {
                    "actionNames": [
                        "delete"
                    ],
                    "cascadeBehaviors": {
                        "resourceGroup": "deny"
                    }
                }
            }
        }')
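
Before relying on a policy like this, it can help to preview which workspaces its rule would match. A sketch for the current subscription only (the assignment itself may cover many subscriptions under the management group); `protected_workspaces` is a hypothetical helper, and the resource type filter runs client-side through `--query`.

```shell
# List workspaces in the current subscription tagged rbac=prod,
# i.e. what the denyAction rule above would match here.
protected_workspaces() {
  az resource list --tag rbac=prod \
    --query "[?type=='Microsoft.OperationalInsights/workspaces'].id" -o tsv
}
```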

Just like resource locks, Azure Policy denyActions have some considerations you should be aware of. For example, a denyAction doesn’t prevent deletion of the resources when the subscription itself is deleted. The full list of limitations is in the public documentation.

Conclusion

What I want you to take away from this post is that there are other tools beyond Azure RBAC that can help you secure your Azure resources. It’s important to practice a defense-in-depth approach and utilize each tool where it makes sense. Some general guidance would be the following:

  • Use resource locks when you want to prevent modification and Azure Policy denyActions when you want to prevent deletion. Using denyActions for deletion lets you do it at scale and mitigate the Owner risk.
  • Be careful where you use ReadOnly resource locks. Blocking management plane actions even when the user has the appropriate permission can be super helpful, but it can also bite you in weird ways. Pay attention to the limitations in the documentation.
  • If you know what the defined state of a resource needs to look like, and you’re going to deploy it that way, look at using Azure Policy with the deny effect instead of a ReadOnly resource lock. This way you can enforce that configuration in code and mitigate the Owner risk.

So this is all well and good, but neither of these features allows you to pick the SPECIFIC actions you want to deny. In an upcoming post I’ll cover the feature that is kind of there, but not really, to address that ask.