Authentication in Azure OpenAI Service

This is part of my series on the Azure OpenAI Service:

  1. Azure OpenAI Service – Infra and Security Stuff
  2. Azure OpenAI Service – Authentication
  3. Azure OpenAI Service – Authorization
  4. Azure OpenAI Service – Logging
  5. Azure OpenAI Service – Azure API Management and Entra ID
  6. Azure OpenAI Service – Granular Chargebacks
  7. Azure OpenAI Service – Load Balancing
  8. Azure OpenAI Service – Blocking API Key Access
  9. Azure OpenAI Service – Securing Azure OpenAI Studio
  10. Azure OpenAI Service – Challenge of Logging Streaming ChatCompletions
  11. Azure OpenAI Service – How To Get Insights By Collecting Logging Data
  12. Azure OpenAI Service – How To Handle Rate Limiting
  13. Azure OpenAI Service – Tracking Token Usage with APIM
  14. Azure AI Studio – Chat Playground and APIM
  15. Azure OpenAI Service – Streaming ChatCompletions and Token Consumption Tracking
  16. Azure OpenAI Service – Load Testing

Updates:

  • 1/18/2024 to reference considerable library changes with new API version. See below for details
  • 4/3/2023 with simpler way to authenticate with Azure AD via Python SDK

Hello again!

1/18/2024 Update – Hi folks! There were some considerable changes to the OpenAI Python SDK which offers an even simpler integration with the Azure OpenAI Service. While the code in this post is a bit dated, I feel the thought process is still important so I’m going to preserve it as is! If you’re looking for examples of how to authenticate with the Azure OpenAI Service using the Python SDK with different types of authentication (service principal vs managed identity) or using the REST API, I’ve placed a few examples in this GitHub repository. Hope it helps!

Days and nights have been busy diving deeper into the AI landscape. I’ve been reading a great book by Tom Taulli called Artificial Intelligence Basics: A Non-Technical Introduction. It’s been a huge help in getting down the vocabulary and understanding the background to the technology from the 1950s on. In combination with the book, I’ve been messing around a lot with Azure’s OpenAI Service and looking closely at the infrastructure and security aspects of the service.

In my last post I covered the controls available to customers to secure their specific instance of the service. I noted that authentication to the service could be accomplished using Azure Active Directory (AAD) authentication. In this post I’m going to take a deeper look at that. Be ready to put your geek hat on because this post will be getting down and dirty into the code and HTTP transactions. Let’s get to it!

Before I get into the details of how the service supports AAD authentication, I want to go over the concepts of management plane and data plane. Think of the management plane as being for administration of the resource and the data plane as being for administration of the data hosted within the resource. Many services in Azure have separate management planes and data planes. One such service is Azure Storage, which just so happens to have similarities with authentication to the OpenAI Service.

When a customer creates an Azure Storage Account they do this through interaction with the management plane, which is reached through the ARM API hosted behind the management.azure.com endpoint. They must authenticate against AAD to get an access token to access the API. Authorization via Azure RBAC then takes place to validate that the user, managed identity, or service principal has permissions on the resource. Once the storage account is created, the customer could modify the encryption key from a platform managed key (PMK, aka a key managed by Microsoft) to a customer managed key (CMK), enable soft delete, or enable network controls such as the storage firewall. These are all operations against the resource.

Once the customer is ready to upload blob data to the storage account, they will do this through a data plane operation. This is done through the Blob Service API. This API is hosted behind the blob.core.windows.net endpoint and operations include creation of a blob or deletion of a blob. To interact with this API the customer has two means of authentication. The first method is the older of the two and involves the use of static keys called storage account access keys. Every storage account gets two of these keys when it is provisioned. Used directly, these keys grant full access to all operations and all data hosted within the storage account (SAS tokens can be used to limit the operations, time, and scope of access, but that won’t be relevant when we talk about the OpenAI Service). Not ideal, right? The second method is the recommended method and involves AAD authentication. Here the security principal authenticates to AAD, receives an access token, and is then authorized for the operation via Azure RBAC. Remember, these are operations against the data hosted within the resource.
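To make the contrast concrete, here’s a minimal sketch of the two data plane authentication methods using the azure-storage-blob and azure-identity Python SDKs. The account, container, and blob names are hypothetical.

import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_url = "https://mystorageaccount.blob.core.windows.net"

# Method 1: static storage account access key (full access to everything)
key_client = BlobServiceClient(account_url, credential=os.getenv('STORAGE_ACCOUNT_KEY'))

# Method 2 (recommended): AAD authentication; the security principal gets an
# access token from AAD and is authorized for the operation via Azure RBAC
aad_client = BlobServiceClient(account_url, credential=DefaultAzureCredential())

# Both clients expose the same data plane operations
blob_client = aad_client.get_blob_client(container="mycontainer", blob="hello.txt")
blob_client.upload_blob(b"Hello, world!", overwrite=True)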

Authentication in Management Plane vs Data Plane in Azure Storage

Now why did I give you a 101 on Azure Storage authentication? Well, because the Azure OpenAI Service works in a very similar way.

Let’s first talk about the management plane of the Azure OpenAI Service. Like Azure Storage (and the rest of Azure’s services) it is administered through the ARM API behind the management.azure.com endpoint. Customers will use the management plane when they want to create an instance of the Azure OpenAI Service, switch it from a PMK to a CMK, or set up diagnostic settings to redirect logs (I’ll cover logging in a future post). All of these operations will require authentication to AAD and authorization via Azure RBAC (I’ll cover authorization in a future post).
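To make that concrete, here’s a hedged sketch of one such management plane operation, creating a service instance, using the azure-mgmt-cognitiveservices Python SDK. The subscription ID, resource group, and account names are hypothetical, and treat the kind and SKU values as a sketch rather than gospel.

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Authenticates to AAD and calls the ARM API behind management.azure.com
client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.accounts.begin_create(
    resource_group_name="rg-openai",
    account_name="myservice",
    account={
        "location": "eastus",
        "kind": "OpenAI",
        "sku": {"name": "S0"},
        # The custom subdomain becomes the unique label in myservice.openai.azure.com
        "properties": {"customSubDomainName": "myservice"},
    },
)
account = poller.result()
print(account.properties.endpoint)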

Simple right? Now let’s move to the complexity of the data plane.

Two API keys are created whenever a customer creates an Azure OpenAI Service instance. These API keys allow the customer full access to all data plane operations. These operations include managing a deployment of a model, managing training data that has been uploaded to the service instance and used to fine tune a model, managing fine tuned models, and listing available models. These operations are performed against the Azure OpenAI Service API, which lives behind a unique label with an FQDN of openai.azure.com (such as myservice.openai.azure.com). Pretty much all the stuff you would be doing through the Azure OpenAI Studio. If you opt to use these keys, remember that access to them is controlled by securing management plane authorization, aka Azure RBAC.

Azure OpenAI Service API Keys

In the above image I am given the option to regenerate the keys in the case of compromise or to comply with my organization’s key rotation process. Two keys are provided to allow for continued access to the service while the other key is being rotated.
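Since listing and regenerating these keys are themselves management plane operations, Azure RBAC on the resource governs who can get at them. Here’s a rough sketch using the azure-mgmt-cognitiveservices Python SDK; the resource group and account names are hypothetical.

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<subscription-id>")

# List the two data plane API keys (a management plane operation)
keys = client.accounts.list_keys("rg-openai", "myservice")

# Rotate key1 while clients continue to use key2
client.accounts.regenerate_key("rg-openai", "myservice", {"key_name": "Key1"})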

Here I have a simple bit of code using the OpenAI Python SDK. In the code I provide a prompt to the model, ask the model to complete it for me, and use one of the API keys to authenticate to the service.

import logging
import sys
import os
import openai

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:

        # Setup OpenAI Variables
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_version = "2022-12-01"
        openai.api_key = os.getenv('OPENAI_API_KEY')

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time'
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to respond to prompt: ', exc_info=True)


if __name__ == "__main__":
    main()

The model gets creative and provides me with the response below.

If you look closely you’ll notice a warning about the security of my session. The reason I’m getting that warning is that I shut off certificate verification in the OpenAI library in order to intercept the calls with Fiddler. Now let me tell you, shutting off certificate verification was a pain in the ass because the developers of the SDK are trying to protect users from the bad guys. Long story short, the OpenAI Python SDK doesn’t provide an option to turn off certificate checking like, say, the Azure Python SDK does (where you can pass a kwarg of verify=False to turn it off in the requests library used underneath). While the developers do provide a property called verify_ssl_certs, it doesn’t actually do anything. Since most Python SDKs use the requests library under the hood, I went through the library on my machine and found the api_requestor.py file. Within this file I modified the _make_session function, which creates a requests Session object. Here I commented out the developers’ code and added the verify=False property to the Session object being created.

Turning off certificate verification in OpenAI Python SDK
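For the curious, the modification looked roughly like the sketch below. This is a reconstruction of the hand-edit to the SDK’s api_requestor.py rather than a verbatim copy of either my change or the original code.

import requests

def _make_session() -> requests.Session:
    s = requests.Session()
    # Original SDK session setup commented out for the Fiddler experiment
    # s.verify = openai.verify_ssl_certs
    s.verify = False  # disables certificate verification for every request on this session
    return s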

Now don’t go and do this in any environment that matters. If you’re getting a certificate verification failure in your environment you should be notifying your information security team. Certificate verification is an absolute must to ensure the identity of the upstream server and to mitigate the risk of man-in-the-middle attacks.

Once I was able to place Fiddler in the middle of the HTTPS session I was able to capture the conversation. In the screenshot below, you can see the SDK passing the api-key header. Take note of that header name because it will become relevant when we talk AAD authentication. If you’re using OpenAI’s service already, then this should look very familiar to you. Microsoft was nice enough to support the existing SDKs when using one of the API keys.
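To make what’s on the wire concrete, here’s a sketch of the same call made directly against the REST API with the requests library, mirroring what the SDK sends (the same environment variables as the earlier code are assumed).

import os
import requests

response = requests.post(
    f"{os.getenv('OPENAI_API_BASE')}/openai/deployments/{os.getenv('DEPLOYMENT_NAME')}/completions",
    params={'api-version': '2022-12-01'},
    headers={
        'Content-Type': 'application/json',
        'api-key': os.getenv('OPENAI_API_KEY'),  # the static key travels in the api-key header
    },
    json={'prompt': 'Once upon a time'},
)
print(response.json()['choices'][0]['text'])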

At this point you’re probably thinking, “That’s all well and good Matt, but I want to use AAD authentication for all the security benefits AAD provides over a static key.” Yeah yeah, I’m getting there. You can’t blame me for nerding out a bit with Fiddler now can you?

Alright, so let’s now talk AAD authentication to the data plane of the Azure OpenAI Service. Possible? Yes, but with some caveats. The public documentation illustrates how to do this using curl. curl is great for demonstrating a concept, but it’s much more likely you’ll be using an SDK for your preferred programming language. Since Python is really the only programming language I know (PowerShell doesn’t count and I don’t want to show my age by acknowledging I know some Perl), let me demonstrate this process using our favorite AAD SDK, MSAL.

For this example I’m going to use a service principal, but if your code is running in Azure you should be using a managed identity. When creating the service principal I granted it the Cognitive Services User RBAC role on the resource group containing the Azure OpenAI Service instance as suggested in the documentation. This is required to authorize the service principal access to data plane operations. There are a few other RBAC roles for the service, but as I said earlier, I’ll cover authorization in a future post. Once the service principal was created and assigned the appropriate RBAC role, I modified my code to include a function which calls MSAL to retrieve an access token with the access scope of Cognitive Services, which the Azure OpenAI Service falls under. I then pass that token as the API key in my call to the Azure OpenAI Service API.

import logging
import sys
import os
import openai
from msal import ConfidentialClientApplication

def get_sp_access_token(client_id, client_credential, tenant_name, scopes):
    logging.info('Attempting to obtain an access token...')
    result = None
    app = ConfidentialClientApplication(
        client_id=client_id,
        client_credential=client_credential,
        authority=f"https://login.microsoftonline.com/{tenant_name}",
    )
    result = app.acquire_token_for_client(scopes=scopes)

    if "access_token" in result:
        logging.info('Access token successfully acquired')
        return result['access_token']
    else:
        logging.error('Unable to obtain access token')
        logging.error(f"Error was: {result['error']}")
        logging.error(f"Error description was: {result['error_description']}")
        logging.error(f"Error correlation_id was: {result['correlation_id']}")
        raise Exception('Failed to obtain access token')

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:
        # Obtain an access token
        token = get_sp_access_token(
            client_id = os.getenv('CLIENT_ID'),
            client_credential = os.getenv('CLIENT_SECRET'),
            tenant_name = os.getenv('TENANT_ID'),
            scopes = "https://cognitiveservices.azure.com/.default"
        )
    except:
        logging.error('Failed to obtain access token: ', exc_info=True)
        # Bail out since there is no token to use
        return

    try:
        # Setup OpenAI Variables
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_version = "2022-12-01"
        openai.api_key = token

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time'
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to respond to prompt: ', exc_info=True)


if __name__ == "__main__":
    main()

Let’s try executing that and see what happens.

Uh-oh! What happened? If you recall from earlier, the API key is passed in the api-key header. However, to use the access token provided by AAD we have to pass it in the Authorization header, as seen in the example in Microsoft’s public documentation.

curl ${endpoint%/}/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2022-12-01 \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $accessToken" \
-d '{ "prompt": "Once upon a time" }'

Thankfully there is a solution to this one that doesn’t require you to modify the OpenAI SDK. If you take a look at the api_requestor.py file again, you’ll see it provides the ability to override the headers passed in the request.

With this in mind, I made a few small modifications. I removed the api_key property and added an Authorization header to the request to the Azure OpenAI Service API which includes the access token received back from AAD.

import logging
import sys
import os
import openai
from msal import ConfidentialClientApplication

def get_sp_access_token(client_id, client_credential, tenant_name, scopes):
    logging.info('Attempting to obtain an access token...')
    result = None
    app = ConfidentialClientApplication(
        client_id=client_id,
        client_credential=client_credential,
        authority=f"https://login.microsoftonline.com/{tenant_name}",
    )
    result = app.acquire_token_for_client(scopes=scopes)

    if "access_token" in result:
        logging.info('Access token successfully acquired')
        return result['access_token']
    else:
        logging.error('Unable to obtain access token')
        logging.error(f"Error was: {result['error']}")
        logging.error(f"Error description was: {result['error_description']}")
        logging.error(f"Error correlation_id was: {result['correlation_id']}")
        raise Exception('Failed to obtain access token')

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:
        # Obtain an access token
        token = get_sp_access_token(
            client_id = os.getenv('CLIENT_ID'),
            client_credential = os.getenv('CLIENT_SECRET'),
            tenant_name = os.getenv('TENANT_ID'),
            scopes = "https://cognitiveservices.azure.com/.default"
        )
    except:
        logging.error('Failed to obtain access token: ', exc_info=True)
        # Bail out since there is no token to use
        return

    try:
        # Setup OpenAI Variables
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_version = "2022-12-01"

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time',
            headers={
                'Authorization': f'Bearer {token}'
            }
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to respond to prompt: ', exc_info=True)


if __name__ == "__main__":
    main()

Running the code results in success!

4/3/2023 Update – Poking around today looking at another aspect of the service, I came across documentation describing an even simpler way to authenticate with Azure AD without having to use an override. In the code below, I specify an openai.api_type of azure_ad, which allows me to pass the token directly via the openai.api_key property versus having to pass a custom header. Definitely a bit easier!

import logging
import sys
import os
import openai
from msal import ConfidentialClientApplication

def get_sp_access_token(client_id, client_credential, tenant_name, scopes):
    logging.info('Attempting to obtain an access token...')
    result = None
    app = ConfidentialClientApplication(
        client_id=client_id,
        client_credential=client_credential,
        authority=f"https://login.microsoftonline.com/{tenant_name}",
    )
    result = app.acquire_token_for_client(scopes=scopes)

    if "access_token" in result:
        logging.info('Access token successfully acquired')
        return result['access_token']
    else:
        logging.error('Unable to obtain access token')
        logging.error(f"Error was: {result['error']}")
        logging.error(f"Error description was: {result['error_description']}")
        logging.error(f"Error correlation_id was: {result['correlation_id']}")
        raise Exception('Failed to obtain access token')

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:
        # Obtain an access token
        token = get_sp_access_token(
            client_id = os.getenv('CLIENT_ID'),
            client_credential = os.getenv('CLIENT_SECRET'),
            tenant_name = os.getenv('TENANT_ID'),
            scopes = "https://cognitiveservices.azure.com/.default"
        )
    except:
        logging.error('Failed to obtain access token: ', exc_info=True)
        # Bail out since there is no token to use
        return

    try:
        # Setup OpenAI Variables
        openai.api_type = "azure_ad"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_key = token
        openai.api_version = "2022-12-01"

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time '
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to respond to prompt: ', exc_info=True)


if __name__ == "__main__":
    main()

Let me act like I’m ChatGPT and provide you a summary of what we learned today.

  • The Azure OpenAI Service has both a management plane and data plane.
  • The Azure OpenAI Service data plane supports two methods of authentication which include static API keys and Azure AD.
  • The static API keys provide full permissions on data plane operations. These keys should be rotated in compliance with organizational key rotation policies.
  • The OpenAI SDK for Python (and I’m going to assume the others) sends an api-key header by default. This behavior can be overridden to send an Authorization header which includes an access token obtained from Azure AD.
  • It’s recommended you use Azure AD authentication where possible to leverage all the bells and whistles of Azure AD including the usage of managed identities, improved logging, and conditional access for service principal-based access.

Well folks, that concludes this post. I’ll be uploading the code sample above to my GitHub later this week. In the next batch of posts I’ll cover the authorization and logging aspects of the service.

I hope you got some value and good luck in your AI journey!

What If… Volume 2

Hi there folks!

I’ve been busy lately buried in learning and practicing Kubernetes in preparation for the Certified Kubernetes Administrator exam. Tonight I’m taking a break to bring you another entry into the “What If” series I started a few months back.

Let’s get right to it.

What if I need to access a Private Endpoint in a subscription associated with a different Azure AD tenant and I have an existing Azure Private DNS Zone already?

I’ve been helping a good friend who recently joined Microsoft to support his customer as he gets up to speed on the Azure platform. This customer consists of two very large organizations which have a high degree of independence. Each of these organizations has its own Azure AD tenant and its own Azure footprint. One organization is further along in its cloud journey than the other.

Organization A (new to Azure) needed to consume some data that existed in an Azure SQL database in an Azure subscription associated with Organization B’s tenant. Both organizations have strict security and compliance requirements so they are heavy users of Azure PrivateLink Endpoints. A site-to-site VPN (virtual private network) connection was established between the two organizations to facilitate network communication between the Azure environments.

Customer Environment

The customer environment looked similar to the above where a machine on-premises in Organization A needed to access the Azure SQL database in Organization B. If you look closely, you probably see the problem already. From a DNS perspective, we have two Azure Private DNS Zones for privatelink.database.windows.net. This means we have two authorities for the same zone.

My peer and I went back and forth on a few different solutions. One solution seemed obvious: Organization A would manually create an A record in their Azure Private DNS zone pointing to the IP of the PrivateLink Endpoint in Organization B. Since the organizations had connectivity between the two environments, this would technically work. The challenge with this pattern is it would introduce a potential bottleneck depending on the size of the VPN pipe. It could also lead to egress costs for Organization A depending on how the VPN connection was implemented.

The other option we came up with was to create a Private Endpoint in Organization A’s Azure subscription which would be associated with the Azure SQL instance running in Organization B’s Azure subscription. This would avoid any egress costs, we wouldn’t be introducing a potential bottleneck, and we’d avoid the additional operational overhead of having to manually manage the A record in Organization A’s Azure Private DNS Zone. Neither of us had done this before, and while it seemed to be possible based on Microsoft’s documentation, the how was a bit lacking when talking PaaS services.

To test this I used two separate personal tenants I keep to test scenarios that aren’t feasible to test with internal resources. My goal was to build an architecture like the below.

Target architecture

So was it possible? Why yes it was, and as an added bonus I’m going to tell you how to do it.

When you create a Private Endpoint through the Azure Portal, there is a Connection Method radio button seen below. If you’re creating the Private Endpoint for a resource within the existing tenant, you can choose the Connect to an Azure resource in my directory option and you get a handy guided selection tool. If you want to connect to a resource outside your tenant, you instead have to select Connect to an Azure resource by resource ID or alias. In this field you enter the full resource ID of the resource you’re creating the Private Endpoint for, which in this case is the Azure SQL server resource ID. You’ll be prompted to enter the sub-resource, which for Azure SQL is SqlServer. Proceed to create the Private Endpoint.

Private Endpoint Creation
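If you’d rather script this than click through the portal, here’s a hedged sketch of the same cross-tenant request using the azure-mgmt-network Python SDK. All of the names and resource IDs are illustrative assumptions; the key detail is that the connection goes in manual_private_link_service_connections, which kicks off the approval workflow described next.

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<org-a-subscription-id>")

poller = client.private_endpoints.begin_create_or_update(
    resource_group_name="rg-orga-network",
    private_endpoint_name="pe-orgb-sql",
    parameters={
        "location": "eastus",
        "subnet": {"id": "<org-a-subnet-resource-id>"},
        # Manual connection because the target resource lives in another tenant;
        # someone with control of the resource in Organization B must approve it
        "manual_private_link_service_connections": [{
            "name": "orgb-sql-connection",
            "private_link_service_id": "<org-b-azure-sql-server-resource-id>",
            "group_ids": ["sqlServer"],  # the Azure SQL sub-resource
            "request_message": "Org A requesting access to the Org B SQL server",
        }],
    },
)
private_endpoint = poller.result()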

After the Private Endpoint has been created you’ll observe it has a Connection status of Pending. This is part of the approval workflow where someone with control over the resource in the destination tenant needs to approve the connection to the Azure SQL server.

Private Endpoint in pending status

If you jump over to the resource in the target tenant and select the Private endpoint connections menu option, you’ll see there is a pending connection that needs approval along with a message from the requestor.

Private Endpoint request

Select the endpoint to approve and click the Approve button. At that point, jump back to the Private Endpoint in the requestor tenant and you’ll see it has been approved and is ready for use.

This was a fun little problem to work through. I was always under the assumption this would work, and the documentation said it would work, but I’m a trust-but-verify type of person so I wanted to see and experience it for myself.

I hope you enjoyed the post and learned something new. Now back to practicing Kubernetes labs!

Deep Dive into Azure AD and AWS SSO Integration – Part 5

I’m back yet again with the fifth entry into my series on integrating Azure AD and AWS SSO.  It’s been a journey and the series has covered a lot of ground.  It started with outlining the challenge with the initial integration of Azure AD and AWS using the AWS app in the Azure Marketplace.  From there it took a deep dive into the components of the solution and how it compares to a standard integration using your SAML provider of choice.  It continued with the steps necessary to configure Azure AD and AWS SSO to support the federated trust to enable single sign-on.  The fourth post explored the benefits of SCIM and went step by step on how to configure SCIM between the two services.  For this final post I’m going to cover a few different scenarios to demonstrate what’s possible with this new integration.

Before I jump into the scenarios, there is one final task that needs to be completed now that the federated trust and SCIM have been setup.  That task is setting up the permission sets in AWS SSO.  Permission sets are simply IAM policies (either AWS-managed or custom policies you create).  For those of you from the Microsoft Azure world, an IAM policy is a collection of permissions which define what a security principal (such as a user or role) is authorized to do.  They are most similar to an Azure RBAC role definition but more flexible and granular due to advanced features such as condition keys.  Permission sets are projected into the AWS accounts they are assigned to as AWS IAM roles.  These are the IAM roles the security principal assumes.

As I mentioned above, AWS SSO supports both AWS-managed IAM policies and custom IAM policies for permission sets.  If you go into the AWS Accounts menu option of AWS SSO you’ll see the accounts associated with the AWS Organization and which permission sets have been associated with the AWS accounts, resulting in AWS IAM Roles being created within each account.  In the image below you can see that I’ve provisioned two permission sets to account1 and account2.

accountassignments.png

The permission sets tab displays the permission sets I’ve created and whether or not they’ve been provisioned to any accounts.  In the screenshot below you’ll see I’ve added four AWS-managed policies for Billing, SecurityAudit, AdministratorAccess, and NetworkAdministrator.  Additionally, I created a new permission set named SystemsAdmin which uses a custom IAM policy restricting the principal assuming the role to EC2, CloudWatch, and ELB activities.

permissionsets.png
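For illustration, a custom policy like the SystemsAdmin one could look roughly like the JSON below.  This is a hypothetical reconstruction, not the exact policy from my lab.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SystemsAdmin",
      "Effect": "Allow",
      "Action": [
        "ec2:*",
        "cloudwatch:*",
        "elasticloadbalancing:*"
      ],
      "Resource": "*"
    }
  ]
}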

Back on the AWS organization tab, if you click on an account you can see the AWS SSO Users or Groups that have been assigned to a permission set.  In the image below, you can see that I’ve assigned both the B2B Security Admins group and the Security Admins group to the AdministratorAccess permission set and the System Operators group to the SystemsAdmin permission set.

assignments.png

With permission sets out of the way, let’s jump into the scenarios.

Scenario 1 – Windows AD User, AD FS, Azure AD, AWS SSO

scenario1.PNG

In this scenario the user is Bart Simpson who is a member of the System Operators group on-premises and exists authoritatively in a Windows AD forest.  A federated trust has been established with Azure AD using an instance of AD FS running on-premises. Azure AD has been integrated with AWS SSO for both SSO (via SAML) and provisioning (via SCIM).

Once Bart was logged into a domain-joined machine, I popped open a browser and navigated to the My Apps portal at https://myapps.microsoft.com.  This redirected me to the Azure AD login screen.  Here I entered Bart’s user name.

bartazuread.PNG

Azure AD performed its home realm discovery process, identified that the domain jogcloud.com is configured for federated authentication, and redirected me to AD FS.  Take note I purposely broke integrated windows authentication here to show you each step.  In a correctly configured browser, you wouldn’t see this screen.

bartadfs.PNG

After I successfully authenticated to AD FS, I was bounced back over to Azure AD where the assertion was delivered.  Azure AD then whipped up a SAML assertion for AWS SSO, returned it to the browser, and redirected the browser to the AWS SSO assertion consumer URL.  AWS SSO consumed the assertion and authenticated Bart into AWS SSO displaying the AWS IAM Role selection page with the relevant roles he has permission to access.

bartawssso.PNG

Scenario 2 – Windows AD User, AD FS with Certificate MFA, Azure AD with Conditional Access, AWS SSO

scenario2.PNG

Scenario 1 is pretty simple, so let’s get fancy and layer on some security.  Here I added an access control policy into AD FS requiring certificate-based authentication for members of the Security Admins group.  Additionally, I added a conditional access policy in Azure AD requiring MFA for any user that is a member of that same group.

Since Homer Simpson regularly runs a nuclear reactor, he’s also the Security Admin for JOGCLOUD.  He has been made a member of the Windows AD Security Admins group.

As a first step I again popped open a browser and navigated to the My Apps portal.  After Homer’s username was plugged in, Azure AD redirected me to the AD FS server.  I again broke IWA to capture each step in the process.

signin2

After the password challenge was satisfied, I was prompted to provide the appropriate user certificate.

signin3.PNG

From there I was authenticated to Azure AD and served up the My Apps portal.

myapps.PNG

Wondering why I wasn’t prompted for Azure MFA?  No, I didn’t misconfigure it (at least this time).  A not well documented feature (at least in my opinion) of Azure AD is that you can pass a claim asserting a user has satisfied the MFA requirement thus making for a better user experience because the user isn’t required to authenticate multiple times.  Yes folks, this means you can layer your traditional certificate-based authentication on top of Azure AD and AWS. 

mfaonprem.png

After selecting the AWS SSO app, I was signed into AWS SSO and presented with the role selection screen.

awsssosignin1.PNG

I then selected one of the roles and was signed into the relevant AWS account assuming the AdministratorAccess IAM Role.

awsssosignin2

Scenario 3 – Azure AD B2B User, AWS SSO

scenario3.PNG

What if you have a multi-tenant situation due to an acquisition or merger, or perhaps you farm out operations to a managed service provider?  No worries there, B2B is also supported with this pattern.  In this scenario I’m using a user sourced from another tenant who has been invited via Azure AD B2B.  The user has been added to the B2B Security Admins group which exists authoritatively in the inviting tenant (jogcloud.com) and was synchronized to AWS SSO via SCIM.

Opening a browser and navigating to the My Apps portal kicks off Azure AD authentication and drops the user into their source tenant.  Once there I can change my tenant by selecting the profile icon and selecting the jogcloud tenant.

myappsmultiple.png

I’m then presented with the apps that I’m authorized to use in the jogcloud tenant, which includes the AWS SSO app.

guestmyapp.PNG

Azure AD kicks off the federated authentication and I’m presented with the AWS role selection page where I can choose to assume the AdministratorAccess role in two of the AWS accounts.

guestawsso.png

Scenario 4 – AWS CLI

I know what you’re saying now, “But what about the CLI?”  Well folks, for that you can leverage the AWS CLI v2.  It’s still in preview right now, but I did test it using the user from scenario 2 and it worked flawlessly.  The experience is pretty anticlimactic so I’m not going to dive into it.  The user experience is similar to using the Azure PowerShell cmdlets in that a web browser instance is opened and guides you through the authentication process.

That will sum up this series.

Few technologies get me excited enough to write five posts, but this integration is really amazing.  With AWS hooking into Azure AD as effectively as they have (especially love the CLI integration), it reduces operational overhead and improves security which is a combination you rarely see together.  Most importantly, it puts the customer first by optimizing the user experience.  If you weren’t convinced on Azure AD’s capabilities as an IDaaS, hopefully this series has helped educate you as to the value of the platform.

With that I’ll sign off.  A big thanks to the AWS product team that worked on this integration.  You did an amazing job that will greatly benefit our mutual customers.

To the rest of you, I wish you happy holidays!

Deep Dive into Azure AD and AWS SSO Integration – Part 4

Today we continue exploring the new integration between Microsoft’s Azure AD (Azure Active Directory) and AWS (Amazon Web Services) SSO (Single Sign-On).  Over the past three posts I’ve covered the high level concepts of both platforms, the challenges the integration seeks to solve, and how to enable the federated trust which facilitates the single sign-on experience.  If you haven’t read through those posts, I recommend you do so before you dive into this one.  In this post I’ll be covering the neatest feature of the new integration, which is the support for automated provisioning.

If you’ve ever worked in the identity realm before, you know the pains that come with managing the life cycle of an identity from initial provisioning, through changes to the identity such as department and position changes, to the often forgotten stage of de-provisioning.  On-premises, these problems were solved by cobbled-together scripts or complex identity management solutions such as SailPoint Identity IQ or Microsoft Identity Manager.  While these tools were challenging to implement and operate, they did their job in the world of Windows Active Directory, LDAP, SQL databases and the like.

Then came cloud, and all bets were off.  Identity data stores skyrocketed from less than a hundred to hundreds and sometimes thousands (B2C has exploded far beyond even that).  Each new cloud service introduced into the enterprise introduced yet another identity management challenge.  While some of these offerings have APIs that support identity management operations, most do not, and those that do are proprietary in nature.  Writing custom code against each of the APIs is a huge challenge that most enterprises can’t keep up with.  The result is often manual management of an identity life cycle, through uploading exported CSV files or some poor soul pointing and clicking a thousand times in a vendor portal.

Wouldn’t it be great if there was some mythical standard out there that would help solve this problem, use a standard REST API, and support the JSON format?  Turns out there is, and that standard is SCIM (System for Cross-domain Identity Management).  You may be surprised to know the standard has been around for a while now (technically 2011).  I recall hearing about it at a Gartner conference many, many years ago.  Unfortunately, it’s taken a long time to catch on, but support is steadily increasing.

Thankfully for us, Microsoft has baked SCIM support into Azure AD, and AWS recognized the value and took advantage of the feature.  By doing this, the identity life cycle challenges of managing an Azure AD and AWS integration have been heavily remediated and our lives made easier.

Azure AD Provisioning – Example

Let’s take a look at how set it up, shall we?

The first place you’ll need to go is into the AWS account which is the master for the organization and into the AWS SSO Settings.  In Settings you’ll see the provisioning option which is initially set as manual.  Select to enable automatic provisioning.

AWS SSO Settings – Provisioning

Once complete, a SCIM endpoint will be created.  This is the endpoint in AWS (referred to as the SCIM service provider in the SCIM standard) that the SCIM service in Azure AD (referred to as the client in the SCIM standard) will interact with to search for, create, modify, and delete AWS users and groups.  To interact with this endpoint, Azure AD must authenticate to it, which it does with a bearer access token issued by AWS SSO.  Be aware that the access token has a one year life span, so ensure you set some type of reminder.  A quick search through the boto3 API doesn’t show a way to query for issued access tokens (yes, you can issue more than one at a time), so you won’t be able to automate the process as of yet.

awssso-scimendpoint.png
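To demystify what Azure AD does against that endpoint, here’s a hedged sketch of a SCIM 2.0 user lookup using the Python requests library.  The endpoint URL, token, and userName are placeholders, and the request shape follows the SCIM standard (RFC 7644) rather than anything AWS-specific I’ve verified.

import requests

scim_endpoint = "<scim-endpoint-created-by-aws-sso>"
access_token = "<bearer-access-token-issued-by-aws-sso>"

# Look up a user by userName, the same style of query Azure AD issues when
# matching an Azure AD user to an existing AWS SSO user
response = requests.get(
    f"{scim_endpoint}/Users",
    params={"filter": 'userName eq "bart@jogcloud.com"'},
    headers={"Authorization": f"Bearer {access_token}"},
)
print(response.json())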

After SCIM is enabled, AWS SSO Settings for provisioning now reports SCIM in use.

awssso-scimenabled.png

Next you’ll need to bounce over to Azure AD and go into the enterprise app you created (refer to my third post for this process).   There you’ll navigate to the Provisioning blade and select Automatic as the provisioning method.

azuread-scimprov.png

You’ll then need to configure the URL and access token you collected from AWS and test the connection.  This will cause Azure AD to test querying the endpoint for a random user and group to validate functionality.

azuread-scimtest.png

If your test is successful you can then save the settings.

azuread-scimtestsucccess.PNG

You’re not done yet.  Next you have to configure mappings which map attributes in Azure AD to the resources and attributes in the SCIM schema.  Yes folks, SCIM has a schema for attributes and resources (like users and groups).  You can extend it as needed, but this integration looks to be using the default user and group resources.

azuread-scimmappings

Let’s take a look at what the group mappings look like.

azuread-scimgroupmappings.PNG

The attribute names on the left are the names of the attributes in Azure AD and the attributes on the right are the names of the attributes Azure AD will write the values of the attributes to in AWS SSO.  Nothing too surprising here.

How about the user mappings?

azuread-scimusermappings1azuread-scimusermappings2

Lots more attributes in the user mappings by default.  Now I’m not sure how many of these attributes AWS SSO supports.  According to the SCIM standard, a client can attempt to write whatever it wants and any attributes the service provider doesn’t understand are simply discarded.  The best list of attributes I could find is located here, and it’s nowhere near this number.  I can’t speak to what the minimum required attributes are to make AWS work, because the official instructions on this integration don’t say.  I know some of the product team sometimes reads the blog, so maybe we’ll luck out and someone will respond with that answer.

The one tweak you’ll need to make here is to delete the mailNickName mapping and replace it with a mapping of objectId to externalId.  After you make the change, click the save icon.

I don’t know why AWS requires this so I can only theorize.  Maybe they’re using this attribute as a primary key in the back end database or perhaps they’re using it to map the users to the groups?  I’m not sure how Azure AD is writing the members attribute over to AWS.  Maybe in the future I’ll throw together a basic app to visualize what the service provider end looks like.

newmapping.PNG

Now you need to decide what users and groups you want to sync to AWS SSO.  Towards the bottom of the provisioning blade, you’ll see the option to toggle the provisioning status.  The scope drop down box has an option to sync all users and groups or to sync only assigned users and groups.  Best practice here is basic security: only sync what you need to sync, so leave the option on sync only assigned users and groups.

The assigned users and groups option refers to users and groups that have been assigned to the enterprise application in Azure AD.  This is configured on the Users and Groups blade for the enterprise app.  I tested a few different scenarios using an Azure AD dynamic group, a standard group, and a group synchronized from Windows AD.  All worked successfully and synchronized the relevant users over.

Once you’re happy with your settings, toggle the provisioning status and save the changes.  It may take some time depending on how much you’re syncing.

syncsuccess.PNG

If the sync is successful, you’ll be able to hop back over to AWS SSO and you’ll see your users and groups.

awssyncedusersawssyncedgroup

Microsoft’s official documentation does a great job explaining the end to end cycle.  The short of it is there’s an initial cycle which grabs all users and groups from Azure AD, then filters the list down to the users and groups assigned to the application.  From there it queries the target system for each user using the matching attribute; if the user isn’t found it creates it, and if it is found and needs updating, it updates it.

Incremental cycles run from that point forward every 40 minutes.  I couldn’t find any documentation on how to adjust the synchronization frequency.  Be aware of that 40 minute sync and consider the end to end synchronization time if you’re sourcing from Windows Active Directory.  In that case, making changes in Windows AD could take just over an hour (assuming you’re using the 30 minute sync interval in Azure AD Connect) to fully synchronize.

awsssotime.PNG

As I described in my third post, I have a lab environment setup where a Windows Active Directory domain is syncing to Azure AD.  I used that environment to play out a few scenarios.

In the first scenario I disabled Marge Simpson’s account.  After waiting some time for changes to synchronize across both platforms, I saw in AWS SSO that Marge Simpson was now disabled.

margedisabled.PNG

For another scenario, I removed Barney Gumble from the Network Operators Active Directory group.  After waiting time for the sync to complete, the Network Operators group is now empty reflecting Barney’s removal from the group.

networkoperators.PNG

Recall that I assigned four groups to the app in Azure AD, Network Operators, Security Admins, Security Auditors, and Systems Operators.  These are the four groups syncing to AWS SSO.  Barney Gumble was only a member of the Network Operators group, which means removing him put him out of scope for the app assignment.  In AWS SSO, he now reports as being disabled.

barneydisabled.PNG

For our final scenario, let’s look at what happens when I deleted Barney Gumble from Windows Active Directory.  After waiting the required replication time, Barney Gumble’s user account was still present in AWS SSO, but set as disabled.  While Barney wouldn’t be able to login to AWS SSO, there would still be cleanup that would need to happen on the AWS SSO directory to remove the stale identity records.

barneydisabled.PNG

The last thing I want to cover is the logging capabilities of the SCIM service in Azure AD.  There are two separate logs you can reference.  The first are the Provisioning Logs, which are currently in preview.  These logs are going to be your go-to for troubleshooting issues with the provisioning process.  They’re available with an Azure AD P1 or above license and are kept for 30 days.  Supposedly they’re kept for free for 7 days, but the documentation isn’t clear on whether you have the ability to consume them.  I also couldn’t find any documentation on whether it’s possible to pull the logs from an API for longer term retention or analysis in Log Analytics or a 3rd party logging solution.

If you’ve ever used Azure AD, you’ll be familiar with the second source of logs.  In the Azure AD Audit logs, you get additional information, which while useful, is more catered to tracking the process vs troubleshooting the process like the provisioning logs.

Before I wrap up, let’s cover a few key findings:

  • The access token used to access the SCIM endpoint in AWS SSO has a one year lifetime.  There doesn’t seem to be a way to query what tokens have been issued by AWS SSO at this time, so you’ll need to manage the life cycle in another manner until the capability is introduced.
  • Users that are removed from the scope of the sync, either by unassigning them from the app or deleting their user object, become disabled in AWS SSO.  The records will need to be cleaned up via another process.
  • If synchronizing changes from Windows AD, the end to end synchronization process can take over an hour (30 minutes from Windows AD to Azure AD and 40 minutes from Azure AD to AWS SSO).

That will wrap up this post.  In my opinion the SCIM service available in Azure AD is extremely under utilized.  SCIM is a great specification that needs more love.  While there is a growing adoption from large enterprise software vendors, there is a real opportunity for your organization to take advantage of the features it offers in the same way AWS has.  It can greatly ease the pain your customers and enterprise users experience having to manage the life cycle of an identity and makes for a nice belt and suspenders to modern identity capabilities in an application.

In the last post of my series I’ll demonstrate a few scenarios showing how simple the end to end experience is for users.  I’ll include some examples of how you can incorporate some of the advanced security features of Azure AD to help protect your multi-cloud experience.

See you next post!