Recently I was giving a customer an overview of Azure Managed Identities and came across an interesting find while building a demo environment. If you’re unfamiliar with managed identities, check out my prior series for an overview. Long story short, managed identities provide a solution for non-human identities where you don’t have to worry about storing, securing, and rotating the credentials. For those of you coming from AWS, managed identities are very similar to AWS IAM roles. They come in two flavors, user-assigned and system-assigned. For the purposes of this post, I’ll be focusing on system-assigned.
Under the hood, a managed identity is essentially a service principal with some orchestration on top of it. Interestingly enough, there are a number of different service principal types. Running the command below will spit back the different types of service principals that exist in your Azure AD tenant.
az ad sp list --all --query '[].servicePrincipalType' --output tsv | sort | uniq
Service principal types
If you’re interested in seeing the service principals associated with managed identities in your Azure AD tenant, you can run the command below.
az ad sp list --query="[?servicePrincipalType=='ManagedIdentity']" --all
Managed identities include a property called alternativeNames which is an array. In my testing I observed two values within this array. The first value is “isExplicit=True” or “isExplicit=False” which is set to True for user-assigned managed identities and False when it’s a system-assigned managed identity. If you want to see all system-assigned managed identities for example, you can run the command below.
az ad sp list --query="[?servicePrincipalType=='ManagedIdentity' && alternativeNames[?contains(@,'isExplicit=False')]]" --all
The other value in this array is the resource id of the managed identity in the case of a user-assigned managed identity. With a system-assigned managed identity this is the resource id of the Azure resource the system-assigned managed identity is associated with.
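As a small illustration of the second value, here is a minimal Python sketch that pulls the resource id out of alternativeNames. It assumes you have the JSON produced by the az command above loaded as Python dictionaries; the function name and the sample record are hypothetical, and the sample subscription id is a placeholder.

```python
def linked_resource_id(service_principal):
    """Return the resource id stored in alternativeNames, or None if absent."""
    for name in service_principal.get("alternativeNames") or []:
        # Skip the isExplicit=True/False flag; the other entry is a resource id.
        if name.startswith("/subscriptions/"):
            return name
    return None


# Hypothetical record shaped like one element of the `az ad sp list` output.
sample = {
    "displayName": "demo-vm",
    "servicePrincipalType": "ManagedIdentity",
    "alternativeNames": [
        "isExplicit=False",
        "/subscriptions/0000/resourcegroups/demo/providers/"
        "Microsoft.Compute/virtualMachines/demo-vm",
    ],
}
print(linked_resource_id(sample))
```

For a system-assigned identity the returned id points at the resource using the identity, which is what makes the lifecycle check later in this post possible.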
System-assigned managed identity
So why does any of this matter? Before we get to that, let’s cover the major selling point of a system-assigned managed identity when compared to a user-assigned managed identity. With a system-assigned managed identity, the managed identity (and its service principal) share the lifecycle of the resource. This means that if you delete the resource, the service principal is cleaned up… well most of the time anyway.
Sometimes this cleanup process doesn’t happen and you’re left with orphaned service principals in your directory. The most annoying part is that you can’t delete these service principals yourself (I’ve tried everything, including direct calls to the ARM API), and the only way to get them removed is to open a support ticket. Now there isn’t a ton of risk I can think of in having these orphaned service principals left in your tenant, since I’m not aware of any means to access the credential associated with one. Without the credential no one can authenticate as it. Assuming the RBAC permissions are cleaned up, it’s not really authorized to do anything within Azure either. However, beyond dirtying up your directory, it’s an identity with a credential that shouldn’t be there anymore.
I wanted an easy way to identify these orphaned system-assigned managed identities so I could submit a support ticket and get it cleaned up before it started cluttering up my demonstration tenant. This afternoon I wrote a really ugly bash script to do exactly that. The script uses some of the az cli commands I’ve listed above to identify all the system-assigned managed identities and then uses az cli to determine if the resource exists. If the resource doesn’t exist, it logs the displayName property of the system-assigned managed identity to a text file. Quick and dirty, but does the job.
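For illustration, a rough Python sketch of the same approach might look like the following. This is not the bash script described above. It assumes the az CLI is on the PATH and already logged in, and it treats a non-zero exit code from az resource show as "the resource no longer exists", which can also happen when you simply lack permission to read the resource. The function names and the output file name are hypothetical.

```python
import json
import subprocess


def az_json(args):
    """Run an az CLI command and return its parsed JSON output."""
    result = subprocess.run(["az", *args, "--output", "json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def resource_exists(resource_id):
    """Assumption: a failing `az resource show` means the resource is gone."""
    result = subprocess.run(["az", "resource", "show", "--ids", resource_id],
                            capture_output=True, text=True)
    return result.returncode == 0


def find_orphans(service_principals, exists=resource_exists):
    """Return displayNames of system-assigned identities with no resource."""
    orphans = []
    for sp in service_principals:
        if sp.get("servicePrincipalType") != "ManagedIdentity":
            continue
        names = sp.get("alternativeNames") or []
        if "isExplicit=False" not in names:
            continue  # user-assigned identities have their own lifecycle
        resource_ids = [n for n in names if n.startswith("/subscriptions/")]
        if resource_ids and not exists(resource_ids[0]):
            orphans.append(sp.get("displayName"))
    return orphans


def main():
    """Query the tenant and log orphaned identities to a text file."""
    service_principals = az_json(["ad", "sp", "list", "--all"])
    with open("orphaned_identities.txt", "w") as handle:
        handle.write("\n".join(find_orphans(service_principals)))
```

Passing the existence check in as a parameter keeps the filtering logic testable without touching a real tenant; calling main() runs the whole thing against whatever tenant the CLI is logged in to.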
Orphaned system-assigned managed identities
Interestingly enough, I had a few peers run the script on their tenants and they all had some of these orphaned system-assigned managed identities, so it seems like this problem isn’t restricted to my tenants. Again, I personally can’t think of a risk of these identities remaining in the directory, but it does point to an issue with the lifecycle management processes Microsoft is using in the backend.
Azure Arc is Microsoft’s attempt to extend the Azure management plane and Azure capabilities to resources running on-premises and in other clouds. As of the date of this post, these resources include Windows and Linux machines and Kubernetes clusters running on-premises or in another cloud like AWS (Amazon Web Services). Integrating these resources with Azure Arc projects them into the Azure management plane, making them Azure resources.
Azure resources
Once the resources are projected into Azure, many capabilities of the platform can be used to assist with managing them. Examples include installing VM (virtual machine) extensions such as the Microsoft Monitoring Agent extension to deliver logs and metrics to Azure Monitor, tracking changes and the software inventory of a machine using Azure Automation, or even auditing a machine’s compliance with a specific set of controls using Azure Policy. On the Kubernetes side you can monitor your on-premises clusters using Azure Container Insights or even deploy an App Service on your on-premises cluster. These capabilities continue to grow, so check the official documentation for up-to-date information.
One of the interesting capabilities of Azure Arc that caught my eye was the ability to use a system-assigned managed identity. I’ve written in the past about managed identities so I’ll stick to covering the very basics of them today. A managed identity is simply some Microsoft-managed automation on top of an Azure AD service principal, which is best explained as a security principal used for non-humans. One of the primary benefits managed identities provide over traditional service principals is automatic credential rotation. If you come from the AWS world, managed identities are similar to AWS IAM roles.
Those of you that manage service principals at scale are surely familiar with the pain of having to work with application owners to rotate the credentials of service principals in use within corporate applications. Beyond the operational overhead, there is the security risk that an attacker obtains the service principal credentials and uses them outside of Azure. Since service principals are not yet subject to Azure AD Conditional Access, rotation of credentials becomes a critical control to have in place. Both this operational overhead and security risk make the automated credential rotation of managed identities a must-have.
Managed identities are normally only available for use by resources running in Azure like Azure VMs, App Service instances, and the like. In the past, if you were coming from outside of Azure such as on-premises or another cloud like AWS, you’d be forced to use a service principal and struggle with the challenges I’ve outlined above. Azure Arc introduces the ability to leverage managed identities from outside of Azure.
This made me very curious as to how this was being accomplished for a machine running outside of Azure. The official documentation goes into some detail. The documentation explains a few key items:
A system-assigned managed identity is provisioned for the Arc-enabled server when the server is onboarded to Arc
The Azure Instance Metadata Service (IMDS) is configured with the managed identity’s service principal client id and certificate
Code running on the machine can request an access token from the IMDS
The IMDS challenges the application code to prove it is privileged enough to obtain the access token by requiring it to provide a secret that attests it is highly privileged
This information left me with a few questions:
What exactly is the challenge it’s hitting the application with?
Since the service principal is using a certificate-based credential, the private key has to be stored on the machine. If so where and how easy would it be for an attacker to steal it?
How easy would it be to steal the identity of the Azure Arc machine in order to impersonate it on another machine?
To answer these questions I decided to deploy Azure Arc to a number of Windows and Linux machines I have running on my home instance of Hyper-V, using the service principal onboarding process. That would give me god access to each machine and the ability to deploy whatever toolset I wanted to take a peek behind the curtains. Since I’m stronger in Windows, I decided I’d try to extract the service principal credentials from a Windows machine and see if I could use them on an Azure Arc-enabled Linux machine.
The technical overview documentation covers how Windows and Linux machines connect into Azure Arc and communicate. There are three processes that run to support Arc connectivity: the Azure Hybrid Instance Metadata Service (HIMDS), the Guest Configuration Arc Service, and the Guest Configuration Extension Service. The service I was most interested in was the HIMDS service, which is in charge of authenticating and obtaining access tokens from Azure.
To better understand the service, I first referenced the service in the Services MMC (Microsoft Management Console). This gave me the path to the executable. Poking around the source directory showed some dsc logs and the location of the processes involved with Azure Arc but nothing too interesting.
HIMDS Service
My next stop was the configuration information. As documented in the technical overview documentation, the configuration data is stored in a json file in the %ProgramData%\AzureConnectedMachineAgent\Config directory. The key pieces of information we can extract from this file are the tenant id, subscription id, certificate thumbprint the service principal is using and most importantly the client id. That’s a start!
I then wondered what else might be stored in this directory hierarchy. Popping up a directory, I saw the Certs subdirectory. Could it really be this easy? Would this directory contain the certificate and its private key? Navigating into the directory does indeed show a certificate file. However, attempting to open the certificate using the Crypto Shell Extensions resulted in an error stating it’s not a valid certificate. No big deal, maybe the extension is wrong, right? I then took the file and ran it through a few Python scripts I have to enumerate certificate data, but unfortunately no certificate format I tried worked. After a quick discussion with my good friend Armen Kaleshian, we both came to the conclusion that the certificate is more than likely wrapped with some type of symmetric encryption.
Service Principal certificate
At this point I decided to take a step back and observe the HIMDS process to ensure this file is being used by the process, and if so, whether I could figure out where the key used to wrap the certificate was stored. For that I used Process Monitor (the old tools are still the best!). After restarting the HIMDS service and sifting through the capture, I found the process reading the certificate file after reading from the agent config. In the past I’ve seen applications store the symmetric key used to wrap information like this in the registry, but I could not find any obvious calls to the registry indicating this. After a further conversation with Armen, we theorized the symmetric key might be stored in the himds.exe executable itself, which would mean every Azure Arc instance would be using the same key to wrap the certificate. This would mean all one would have to do is move the certificate and configuration file to a new machine and one would be able to use the credentials to impersonate the machine’s managed identity (more on that later).
Procmon capture of himds.exe
At this point I was fairly certain I had answered question number 2, but I wanted to explore question number 1 a bit more. For this I leveraged the PowerShell code sample provided in the documentation. Breaking down the code, I observed that a web request is made to the IMDS endpoint. A header is extracted from the endpoint’s response which contains the location where a file is stored. The contents of this file are read and then included in an authorization header using basic authentication. The resulting response from the IMDS contains the access token used to communicate with the relevant Microsoft cloud API.
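The flow described above can be sketched in Python roughly as follows. This is my own illustration, not the documented PowerShell sample. The default endpoint, the api-version value, the IDENTITY_ENDPOINT environment variable, and the "Basic realm=" header format are assumptions based on my recollection of the Microsoft documentation, so verify them against the current docs before relying on this.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

# Assumptions: endpoint and api-version as recalled from the documentation.
ENDPOINT = os.environ.get(
    "IDENTITY_ENDPOINT",
    "http://localhost:40342/metadata/identity/oauth2/token")
API_VERSION = "2020-06-01"


def secret_path_from_challenge(www_authenticate):
    """Extract the secret file path from a 'Basic realm=<path>' header."""
    marker = "Basic realm="
    if not www_authenticate or marker not in www_authenticate:
        raise ValueError("no Basic realm challenge in response")
    return www_authenticate.split(marker, 1)[1].strip()


def get_access_token(resource="https://management.azure.com/"):
    """Answer the local endpoint's challenge and return an access token."""
    query = urllib.parse.urlencode(
        {"resource": resource, "api-version": API_VERSION})
    url = f"{ENDPOINT}?{query}"

    # The first request is expected to be rejected with a challenge header.
    challenge = None
    try:
        urllib.request.urlopen(
            urllib.request.Request(url, headers={"Metadata": "true"}))
    except urllib.error.HTTPError as err:
        challenge = err.headers.get("WWW-Authenticate")

    # Reading this file requires membership in a privileged group.
    with open(secret_path_from_challenge(challenge), "r") as handle:
        secret = handle.read().strip()

    retry = urllib.request.Request(
        url, headers={"Metadata": "true", "Authorization": f"Basic {secret}"})
    with urllib.request.urlopen(retry) as response:
        return json.load(response)["access_token"]
```

The interesting part is that the "challenge" is nothing more than a file read: whoever can read the file named in the header can complete the second request.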
By default the security principal doesn’t have any permissions in the ARM (Azure Resource Manager) API, so I granted it reader rights on the resource group and wrote some simple Python code to query for a list of resources in a resource group. I was able to successfully obtain the access token and got back a list of resources.
This indicated to me that the additional challenge introduced with the IMDS running on Arc requires the process to obtain a secret generated by the IMDS that is placed in the %PROGRAMDATA%\AzureConnectedMachineAgent\Tokens directory. Access to this directory is locked down to the security principal the himds service runs as, SYSTEM, the Administrators group, and most importantly the Hybrid agent extension applications group. After the secret is obtained from the file in this directory, it is used to perform basic authentication with the IMDS service to obtain the access token. As the official documentation states, this is the group to which you would need to add the security principal running any process that needs to use the local IMDS. Question 1 had now been answered.
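A query like that can be sketched as below. This is a minimal illustration, not the script I actually ran: it assumes you already hold a valid ARM access token, it uses the ARM "list resources by resource group" REST call with an api-version of 2021-04-01 (an assumption, check the current REST reference), and it ignores paging. The function names are hypothetical.

```python
import json
import urllib.request


def resources_url(subscription_id, resource_group, api_version="2021-04-01"):
    """Build the ARM URL that lists the resources in a resource group."""
    return (f"https://management.azure.com/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}/resources"
            f"?api-version={api_version}")


def list_resource_names(access_token, subscription_id, resource_group):
    """Return the names of the resources in the group (first page only)."""
    request = urllib.request.Request(
        resources_url(subscription_id, resource_group),
        headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return [item["name"] for item in body.get("value", [])]
```

With only reader rights on the resource group, a call like this is about all the identity can do, which is a nice demonstration of keeping the permissions minimal.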
Tokens directory permissions
That left me with question 3. Based on what I learned so far, an attacker who compromised the machine or a security principal with sufficient permissions on the machine to access the %PROGRAMDATA%\AzureConnectedMachineAgent\ subdirectories could obtain the configuration of the agent and the encrypted certificate. Compromise could also come in the form of compromise of a process that was running as a security principal which was a member of the Hybrid agent extension applications group. Even with the data collected from the agentconfig.json file and the certificate file, the attacker would still be short of the symmetric key used to unwrap the certificate to make it available for use.
Here I took a gamble and tested the theory Armen and I had that the symmetric key was stored in the himds executable. The test was to copy the agentconfig.json file and certificate file and move them to another machine. To eliminate the possibility of the configuration and credentials file being specific to the Windows operating system, I spun up an Ubuntu VM in my Hyper-V cluster. I then installed and configured the agent with a completely separate Azure AD tenant.
Once that was complete, I moved the agentconfig.json and myCert files from the existing Windows machine to the new Ubuntu VM and replaced the existing files. Re-running the same Python script (substituting out the variables for the actual URIs listed in the technical overview) I received back the same listing of resources from the resource group. This proved that if you’re able to get access to both the agentconfig.json and myCert files on an Azure Arc-enabled machine, you can move them to any other machine (regardless of operating system) and re-use them to impersonate the source machine and exercise any permissions it’s been granted.
Let’s sum up the findings:
The agentconfig.json file contains sensitive information including the tenant id, subscription id, and client id of the service principal associated with the Azure Arc-enabled machine.
The credential for the service principal is stored in the myCert file in the %PROGRAMDATA%\AzureConnectedMachineAgent\certs directory. This file is probably wrapped with some type of symmetric encryption.
The challenge the documentation speaks to involves the creation of a password by the HIMDS process which is then stored in the %PROGRAMDATA%\AzureConnectedMachineAgent\tokens directory. This directory is locked down to privileged users. The password contained in the file created by the HIMDS process is then passed back in a request to the IMDS service running on the machine using basic HTTP authentication, where it is validated and an access token is returned. This is used as a method to prove the process requesting the token is privileged.
The agentconfig.json and myCert file can be moved from one Arc-enabled machine to another to be used to impersonate the source machine. This provided further evidence that the symmetric key used to encrypt the certificate file (myCert) is hardcoded into the executable.
It should come as no surprise to you fellow old folks that the resulting findings stress the importance of traditional security controls. A few recommendations I’d make are as follows:
Least Privilege
If you’re going to use the managed identity available to an Arc-enabled server for applications running on that machine, ensure you’re granting that managed identity only the permissions it requires and nothing beyond it.
Limit the human and non-human actors who have administrator privileges on Azure Arc-enabled machines.
Tightly control which security principals are members of the Hybrid agent extension applications group.
Logging and Monitoring
Log managed identity sign-ins and monitor for suspicious activity. Until Conditional Access is extended to service principals, it can’t be leveraged to further restrict where these security principals can be used from.
Log and monitor the security events on the Azure Arc-enabled servers.
Patching and Updating
Ensure your Azure Arc-enabled machines are patched and updated. You can leverage the Azure Automation update management capabilities if you don’t have an existing system to do this already.
Identity and Access Management
Create a process to review the use case and security risks of applications that want to leverage this functionality.
Perform access reviews to validate any additional permissions granted to these managed identities are still warranted, and if not, remove them.
Nothing above is new or fancy; these are all things you should be doing today. The biggest takeaway here is that by extending the Azure management plane you get great new features, but you also introduce a potential new risk where a compromised Azure Arc-enabled machine outside of Azure could be used to impact the security of your Azure implementation. If you’re using service principals today (or hell, if you’re simply having users access Azure from their desktops or laptops) you already have this risk, but it’s always worthwhile understanding the different vectors available to attackers.
At the end of the day I love this feature. It is leagues better than how most organizations are handling service principal usage outside of Azure today, where the credentials are often hardcoded into code, dumped into a file in an unsecured directory, or never rotated for years. There are tradeoffs, but as I said earlier, none of the mitigations are things you shouldn’t already be doing today. Outside of this capability, there are a lot of benefits to Azure Arc and I’ll be very interested to see how Microsoft grows this offering and extends it beyond.
Welcome back fellow geeks for the second installment in my series on Azure Managed Identities. In the first post I covered the business problem and the risks Managed Identities address, and in this post I’ll be covering how managed identities are represented in Azure.
Let’s start by walking through the components that make managed identities possible.
The foundational component of any identity is the data store in which the identity lives. In the case of managed identities, like much of the rest of the identity data for the Microsoft cloud, the data store is Azure Active Directory. For those of you coming from the traditional on-premises environment who have had experience with traditional directories such as Active Directory or one of the many flavors of LDAP, Azure Active Directory (Azure AD) is an Identity-as-a-Service which includes a directory component we can think of as a next generation directory. This means it’s designed to be highly scalable, available, and resilient and is provided to you in an “as a service” model where a simple management layer sits in front of all the complexities of the compute, network, and storage infrastructure that makes up the directory. There are a whole bunch of other cool features such as modern authentication, contextual authorization, adaptive authentication, and behavioral analytics that come along with the solution, so check out the official documentation to learn about those capabilities. If you want to nerd out on the design of that infrastructure you can check out this whitepaper and this article.
It’s worthwhile to take a moment to cover Azure AD’s relationship to Azure. Every resource in Azure is associated with an Azure subscription. An Azure subscription acts as a legal and payment agreement (think type of Azure subscription, pay-as-you-go, Visual Studio, CSP, etc), boundary of scale (think limits to resources you can create in a subscription), and administrative boundary. Each Azure subscription is associated with a single instance of Azure AD. Azure AD acts as the security boundary for an organization’s space in Azure and serves as the identity backend for the Azure subscription. You’ll often hear it referred to as “your tenant” (if you’re not familiar with the general cloud concept of tenancy check out this CSA article).
Azure AD stores lots of different object types including users, groups, and devices. The object type we are interested in for the purposes of managed identities is the service principal. Service principals act as the security principals for non-humans (such as applications or Azure resources like a VM) in Azure AD. These service principals are then granted access to Azure resources such as an instance of Azure Key Vault or an Azure Storage account. Service principals are used for a number of purposes beyond just Managed Identities, such as identities for custom developed applications or third-party applications.
Given that the service principals can be used for different purposes, it only makes sense that the service principal object type includes an attribute called the serviceprincipaltype. For example, a third-party or custom developed application that is registered with Azure AD uses the service principal type of Application while a managed identity has the value set to ManagedIdentity. Let’s take a look at an example of the serviceprincipaltypes in a tenant.
In my Geek In The Weeds tenant I’ve created a few application identities by registering the applications and I’ve created a few managed identities. Everything else within the tenant is default out of the box. To list the service principals in the directory I used the AzureAD PowerShell module. The cmdlet that can be used to list out the service principals is Get-AzureADServicePrincipal. By default the cmdlet will only return the first 100 results, so you need to set the All parameter to true. Every application, whether it’s Exchange Online or Power BI, needs an identity in your tenant to interact with it and the resources you create that are associated with the tenant. Here are the serviceprincipaltypes in my Geek In The Weeds tenant.
Now we know the security principal used by a Managed Identity is stored in Azure AD and is represented by a service principal object. We also know that service principal objects have different types depending on how they’re being used and the type that represents a managed identity has a type of ManagedIdentity. If we want to know what managed identities exist in our directory, we can use this information to pull a list using the Get-AzureADServicePrincipal.
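If you would rather work with an exported list than with the cmdlet output directly, the same grouping and filtering can be sketched in a few lines of Python. This assumes the service principals have been exported to JSON with a servicePrincipalType key (the casing the az CLI emits; the PowerShell module capitalizes property names differently). The function names and sample data are hypothetical.

```python
import json
from collections import Counter


def count_types(service_principals):
    """Tally service principals by their servicePrincipalType value."""
    return Counter(sp.get("servicePrincipalType") or "Unknown"
                   for sp in service_principals)


def managed_identities(service_principals):
    """Keep only the service principals that back managed identities."""
    return [sp for sp in service_principals
            if sp.get("servicePrincipalType") == "ManagedIdentity"]


# Hypothetical export with three service principals.
exported = json.loads("""[
    {"displayName": "app1", "servicePrincipalType": "Application"},
    {"displayName": "mi1", "servicePrincipalType": "ManagedIdentity"},
    {"displayName": "mi2", "servicePrincipalType": "ManagedIdentity"}
]""")
print(dict(count_types(exported)))
print([sp["displayName"] for sp in managed_identities(exported)])
```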
We’re not done yet! Managed Identities also come in multiple flavors, either system-assigned or user-assigned. System-assigned managed identities are the cooler of the two in that they share the lifecycle of the resource they’re used by. For example, a system-assigned managed identity can be created when an Azure VM is created, such that the identity will be deleted once the Azure VM is deleted. This presents a great option for mitigating the challenge of identity lifecycle management. With Microsoft handling the lifecycle of these identities, each resource could potentially have its own identity, making it easier to troubleshoot issues with the identity, avoid potential outages caused by modifying the identity, adhere to least privilege by giving the identity only the permissions the resource requires, and cut back on support requests by developers to info sec for the creation of identities.
Sometimes it may be desirable to share a managed identity amongst multiple Azure resources such as an application running on multiple Azure VMs. This use case calls for the other type of managed identity, user-assigned. These identities do not share the lifecycle of the resources using them.
Let’s take a look at the differences between a service principal object for a user-assigned vs a system-assigned managed identity. Here I ran another Get-AzureADServicePrincipal and limited the results to serviceprincipaltype of ManagedIdentity.
In the above results we can see that the main difference between the user-assigned (testing1234) and system-assigned (systemmis) identities is within the AlternativeNames property. The system-assigned identity has isExplicit set to False and another value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis. Notice the Microsoft.Compute/virtualMachines/systemmis portion specifies this is being used by a virtual machine named systemmis. The user-assigned identity has isExplicit set to True and another value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234. Here we can see the identity is an “explicit” managed identity and is not directly linked to an Azure resource.
This difference gives us the ability to quickly report on the number of system-assigned and user-assigned managed identities in a tenant by using the following command.
True would give us user-assigned and False would give us system-assigned. Neat right?
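As a hedged illustration of that kind of report, here is a small Python sketch that tallies the two flavors from an exported list of managed identity service principals. It assumes an alternativeNames key with the casing the az CLI emits; the function name is hypothetical and the records are stand-ins modeled on the two identities above.

```python
from collections import Counter


def flavor(service_principal):
    """Classify a managed identity by the isExplicit flag it carries."""
    names = service_principal.get("alternativeNames") or []
    if "isExplicit=True" in names:
        return "user-assigned"
    if "isExplicit=False" in names:
        return "system-assigned"
    return "unknown"


# Stand-in records modeled on the two identities discussed above.
identities = [
    {"displayName": "testing1234", "alternativeNames": ["isExplicit=True"]},
    {"displayName": "systemmis", "alternativeNames": ["isExplicit=False"]},
]
report = Counter(flavor(sp) for sp in identities)
print(dict(report))
```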
Let’s summarize what we’ve learned:
An object in Azure Active Directory is created for each managed identity and represents its security principal
The type of object created is a service principal
There are multiple service principal types and the one used by a Managed Identity is called ManagedIdentity
There are two types of managed identities, user-assigned and system-assigned
System-assigned managed identities share the lifecycle of the resource they are associated with while user-assigned managed identities are created separately from the resource, do not share the resource lifecycle, and can be used across multiple resources
The object representing a user-assigned managed identity has a unique value of isExplicit=True for the AlternativeNames property while a system-assigned managed identity has that value of isExplicit=False.
That’s it for this post folks. In the next post I’ll walk through the process of creating a managed identity for an Azure VM and will demonstrate with a bit of Python code how we can use the managed identity to access a secret stored in Azure Key Vault.
“I love the overhead of password management” said no one ever.
Password management is hard. It’s even harder when you’re managing the credentials for non-humans, such as those used by an application. Back in the olden days when the developer needed a way to access an enterprise database or file share, they’d put in a request with help desk or information security to have an account (often referred to as a service account) provisioned in Windows Active Directory, an LDAP, or a SQL database. The request would go through a business approval and some support person would create the account, set the password, and email the information to the developer. This process came with a number of risks:
Risk of compromise of the account
Risk of abuse of the account
Risk of a significant outage
These risks arise due to the following gaps in the process:
Multiple parties knowing the password (the party who provisions the account and the developer)
The password for the account being communicated to the developer unencrypted such as plain text in an email
The password not being changed after it is initially set because changing it is impossible or difficult
The password not being regularly rotated due to concerns over application outages
The password being shared with other developers and the account then being used across multiple applications without the dependency being documented
Organizations tried to mitigate the risk of compromise by performing such actions as requiring a long and complex password, delivering the password in an encrypted format such as an encrypted Microsoft Office document, instituting policy requiring the password to be changed (exceptions with this one are frequent due to outage concerns), implementing password vaulting and management such as CyberArk Enterprise Password Vault or Hashicorp Vault, and instituting behavioral monitoring solutions to check for abuse. Password rotation and monitoring are some of the more effective mitigations but can also be extremely challenging and costly to institute at scale, even with a vaulting and management solution. Even then, there are always exceptions for legacy applications which are not compatible (sadly these are often some of the more critical systems).
When the public cloud came around the credential management challenge for application accounts exploded due to the most favored traits of a public cloud which include on-demand self-service and rapid elasticity and scalability. The challenge that was a few hundred application identities has grown quickly into thousands of applications and especially containers and serverless functions such as AWS Lambda and Azure Functions. Beyond the volume of applications, the public cloud also changes the traditional security boundary due to its broad network access trait. Instead of the cozy feeling multiple firewalls gave you, you now have developers using cloud services such as storage or databases which are directly administered via the cloud management plane which is exposed directly to the Internet. It doesn’t stop here folks, you also have developers heavily using SaaS-based version control solutions to store the code which may have credentials hardcoded into it potentially publicly exposing those credentials.
Thankfully the public cloud providers have heard the cries of us security folk and have been working hard to help address the problem. One method in use is the creation of security principals which are designed around the use of temporary credentials. This way there are no long standing credentials to share, compromise, or abuse. Amazon has robust use of this concept in AWS using IAM Roles. Instead of hardcoding a set of IAM User credentials in a Lambda or an application running on an EC2 instance, a role can be created with the necessary permissions required for the application and be assumed by either the Lambda service or EC2 instance.
For this series of posts I’m going to be focusing on one of Microsoft Azure’s solutions to this problem, which is called Managed Identities. For you folks that are more familiar with AWS, Managed Identities conceptually work the same way as IAM Roles. A security principal is created, permissions are granted, and the identity is assumed by a resource such as an Azure Web App or an Azure VM. There are some features that differ from IAM Roles that add to the appeal of Managed Identities, such as tying the lifecycle of the Managed Identity to the resource such that when the resource is created, the managed identity is created, and when the resource is destroyed, the identity is destroyed.
In the next entry I will do a deeper dive into what a managed identity looks like behind the scenes.