Securing Azure OpenAI Studio

Securing Azure OpenAI Studio

This is part of my series on GenAI Services in Azure:

  1. Azure OpenAI Service – Infra and Security Stuff
  2. Azure OpenAI Service – Authentication
  3. Azure OpenAI Service – Authorization
  4. Azure OpenAI Service – Logging
  5. Azure OpenAI Service – Azure API Management and Entra ID
  6. Azure OpenAI Service – Granular Chargebacks
  7. Azure OpenAI Service – Load Balancing
  8. Azure OpenAI Service – Blocking API Key Access
  9. Azure OpenAI Service – Securing Azure OpenAI Studio
  10. Azure OpenAI Service – Challenge of Logging Streaming ChatCompletions
  11. Azure OpenAI Service – How To Get Insights By Collecting Logging Data
  12. Azure OpenAI Service – How To Handle Rate Limiting
  13. Azure OpenAI Service – Tracking Token Usage with APIM
  14. Azure AI Studio – Chat Playground and APIM
  15. Azure OpenAI Service – Streaming ChatCompletions and Token Consumption Tracking
  16. Azure OpenAI Service – Load Testing

Updates

  • 5/28/2024 – Updated to mention that objectid of security principal is now included in native diagnostic logs

Hello again folks! Today I’m going to bounce back into AOAI (Azure OpenAI Service) mode and cover a topic that frequently comes up with my customers in regards to securing the service. Over the past year I’ve covered the infrastructure and security controls available within the AOAI. Much of that focus was on accessing the service through an SDK (software development kit) or directly through the API. Today I’m going to spend some time talking about the security controls available to secure the Azure OpenAI Studio, which I’m going to refer to as Studio for the rest of this post.

If you’re unfamiliar with Studio, it’s a GUI-based experience for interacting with the data plane of the AOAI service. In my authorization post, I cover the difference between the service’s management and data planes. At a high level, the management plane is for operations “on the service” while the data plane is for operations “in the service”.

Azure OpenAI Service Management Plane vs Data Plane

Microsoft recommends using an SDK or the API directly when interacting with the data plane of the service. It’s a good recommendation because there are lots of knobs for you to turn to lock down the service and address gaps in the service. When the service is using through the Azure OpenAI Studio, you lose the ability to inject some type of control component between the user’s endpoint and the instance of AOAI. The rest of this post will cover why that is, what controls are available to you, and what risks you’ll have to accept if you opt to make Studio available to your users.

Before I jump into the details of what you get and what you don’t get, I want to cover what some of the main use cases are for using Studio versus accessing the service through an SDK or direct through the API. First and foremost, it should be obvious that GUIs are much more accessible than having to write code to interact with the API. For example, say I’m performing a PoC (proof-of-concept) of the service and I want to quickly test the gpt 3.5 model’s ability to answer a question in my field. For that I can use the Completions interface within Studio to get a chat-like interface with zero coding.

Example of Chat Completion functionality in Azure OpenAI Studio

Another use case may be I want a simple way to test the models on my organization’s data to see if the models can provide value to that data. I don’t want to invest a ton of time coding to perform this functionality because I don’t yet know if the models will be able to provide any value on top of my data.

A lot of the use for Studio comes down to its simplicity of use. If you need to do some basic PoC with minimal funding, using Studio can be a nice shortcut to doing all the code you’d need to do in order to interact with the API to perform the actions.

Long story short, if you’re offering this service to your business units you’re likely going to be asked to provide Studio access to your users. My goal here is to help you understand the risks and mitigations of doing so.

So how does the Azure OpenAI Studio work? It appears to use an MVC (model-view-controller) architecture (or something similar to it for those application developer purists who are much smarter than me). In simple terms for non-developers like myself, the Studio application instructs the user’s browser which data plane endpoints to call and then provides a pretty view in the user’s browser of the responses received from those endpoints.

For you non-developers like myself, I find it helpful to perform an action within Studio and then review the Fiddler capture to observe what the browser did. In the Fiddler capture below, I used the Chat Completion interface in Studio to send a request a completion. You can see that the request was sent from my browser to the data plane endpoint of the service (openai.azure.com).

Fiddler capture of Chat Completion in Azure OpenAI Studio

This trait of the Studio can work to your advantage when you need to secure the Studio. If the calls are made from the user’s endpoint to the data plane, then that means network controls around the data plane can be used to enforce control over access to the Studio for the instance of AOAI. As I covered in my prior posts, AOAI is no different from other Microsoft PaaS services and provides the standard network controls which include the service firewall and support for Private Endpoints.

A common security standard for organizations using Azure is to use Private Endpoints for PaaS services. Private Endpoints allow you to restrict access to the public IP of a PaaS service and limit it to access through an endpoint deployed in the customer’s virtual network. Accessing the service through the Private Endpoint requires the user’s endpoint to be within your organization’s private network. This means by creating a Private Endpoint you can block access to access to the AOAI instance through Studio to endpoints within your private network. If the user attempts to access the AOAI instance through Studio outside of the private network, they’ll be blocked and will receive the error you see below.

Network controls blocking access through Azure OpenAI Studio

Placing your AOAI instance behind a Private Endpoint will be your primary means of controlling access to an AOAI instance through Studio. Creating the private endpoint and blocking public access keeps user’s from accessing the AOAI instance through Studio when hitting the public IP. However, users can still reach the AOAI instance through Studio if they are on the private network. You can lock that down by wrapping an NSG (Network Security Group) around the subnet containing the Private Endpoint, turning on Private Endpoint Network Policies in the subnet, and placing some type of mediator (such an Azure API Management instance) between the user’s endpoint and the AOAI instance. That will restrict the users to the API when interacting with the AOAI instance.

Example AOAI architecture

Outside of network controls, you don’t have much ability to control Studio access. There are no specific RBAC permissions that I’m aware of today that could be stripped from an RBAC role to prevent access to Studio. When it comes to authorization you should strive for least privilege as you’ve always done. My authorization blog has some guidance on how to handle that within the service.

Now that you understand what controls you have, let’s talk about the risks you’re going to need to accept if you plan on granting users access to Studio.

First and foremost, you’re going to need to accept the very basic logging provided by the diagnostics logging available within an AOAI instance. As I cover in the linked post above, the logging is minimal. Prompts and responses will not be logged, traceability in the logs will be limited, and you won’t get metrics as to token usage per call. The lack of visibility into prompts and responses becomes all that much more critical if shut off the built in content filtering and abuse monitoring.

Next up, you won’t have the ability to limit the usage of service on a per user or per app basis. AOAI has API limits around requests and tokens. There are capabilities today to control this on per user or per app basis within a single instance of AOAI today.

Let me summarize what we covered today:

  • Network controls are your primary means to securing access to an AOAI instance through the Azure OpenAI Studio
  • Placing an AOAI instance behind a Private Endpoint and blocking public access restrict Azure OpenAI Studio access to the AOAI instance to endpoints within your private network
  • Azure OpenAI Studio access to an AOAI instance can be blocked completely by placing the AOAI instance behind a private endpoint, inserting some sort of mediation solution (such as API Management, and wrapping an NSG around the subnet containing the Private Endpoint which blocks all access but traffic from the mediator.
  • Exercise least privilege using Azure RBAC but be aware there is no specific permission that allows access to the Azure OpenAI Studio
  • The diagnostic logs provided limited information. Prompts and responses are not logged to the diagnostic logs and neither are token consumption. The former will mean you don’t have visibility into the prompts users are making (think abuse, inclusion of PII, etc) and the latter means you won’t be able to tell who is creating the costs within the instance.

Considering all of the above, my recommendation to customers it to establish an approval process for usage of Studio and ensure there is a strong business need to justify accepting the risks outlined above. The lack of logging is the real gut punch for me. That is a lot of risk, especially since most regulated orgs opt out of content filtering and abuse monitoring.

Nothing to fancy in this post, but hopefully it helps some folks better understand the security options for Azure OpenAI Studio access.

Thanks!

Load Balancing in Azure OpenAI Service

Load Balancing in Azure OpenAI Service

This is part of my series on GenAI Services in Azure:

  1. Azure OpenAI Service – Infra and Security Stuff
  2. Azure OpenAI Service – Authentication
  3. Azure OpenAI Service – Authorization
  4. Azure OpenAI Service – Logging
  5. Azure OpenAI Service – Azure API Management and Entra ID
  6. Azure OpenAI Service – Granular Chargebacks
  7. Azure OpenAI Service – Load Balancing
  8. Azure OpenAI Service – Blocking API Key Access
  9. Azure OpenAI Service – Securing Azure OpenAI Studio
  10. Azure OpenAI Service – Challenge of Logging Streaming ChatCompletions
  11. Azure OpenAI Service – How To Get Insights By Collecting Logging Data
  12. Azure OpenAI Service – How To Handle Rate Limiting
  13. Azure OpenAI Service – Tracking Token Usage with APIM
  14. Azure AI Studio – Chat Playground and APIM
  15. Azure OpenAI Service – Streaming ChatCompletions and Token Consumption Tracking
  16. Azure OpenAI Service – Load Testing

Updates:

5/23/2024 – Check out my new post on this topic here!

Another Azure OpenAI Service post? Why not? I gotta justify the costs of maintaining the blog somehow!

The demand for the AOAI (Azure OpenAI Service) is absolutely insane. I don’t think I can compare the customer excitement over the service to any other service I’ve seen launch during my time working at cloud providers. With that demand comes the challenge to the cloud service provider of ensuring there is availability of the service for all the customers that want it. In order to do that, Microsoft has placed limits on the number of tokens/minute and requests/minute that can be made to a specific AOAI instance. Many customers are hitting these limits when moving into production. While there is a path for the customer to get the limits raised by putting in a request to their Microsoft account team, this process can take time and there is no guarantee the request can or will be fulfilled.

What can customers do to work around this problem? You need to spin up more AOAI instances. At the time of this writing you can create 3 instances per region per subscription. Creating more instances introduces the new problem distributing traffic across those AOAI instances. There are a few ways you could do this including having the developer code the logic into their application (yuck) ore providing the developer a singular endpoint which is doing the load balancing behind the scenes. The latter solution is where you want to live. Thankfully, this can be done really easily with a piece of Azure infrastructure you are likely already using with AOAI. That piece of infrastructure is APIM (Azure API Management).

Sample AOAI Architecture

As I’ve covered in my posts on AOAI and APIM and my granular chargebacks in AOAI, APIM provides a ton of value the AOIA pattern by providing a gate between the application and the AOAI instance to inspect and action on the request and response. It can be used to enforced Azure AD authentication, provide enhanced security logging, and capture information needed for internal chargebacks. Each of these enhancements is provided through APIM’s custom policy language.

APIM and AOAI Flow

By placing APIM into the mix and using a simple APIM policy we can introduce simple randomized load balancing. Let’s take a deeper look at this policy

<!-- This policy randomly routes (load balances) to one of the two backends -->
<!-- Backend URLs are assumed to be stored in backend-url-1 and backend-url-2 named values (fka properties), but can be provided inline as well -->
<policies>
    <inbound>
        <base />
        <set-variable name="urlId" value="@(new Random(context.RequestId.GetHashCode()).Next(1, 3))" />
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<int>("urlId") == 1)">
                <set-backend-service base-url="{{backend-url-1}}" />
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("urlId") == 2)">
                <set-backend-service base-url="{{backend-url-2}}" />
            </when>
            <otherwise>
                <!-- Should never happen, but you never know ;) -->
                <return-response>
                    <set-status code="500" reason="InternalServerError" />
                    <set-header name="Microsoft-Azure-Api-Management-Correlation-Id" exists-action="override">
                        <value>@{return Guid.NewGuid().ToString();}</value>
                    </set-header>
                    <set-body>A gateway-related error occurred while processing the request.</set-body>
                </return-response>
            </otherwise>
        </choose>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

In the policy above a random number is generated that is greater than or equal to 1 and less than 3 using the Next method. The application’s request is sent along to one of the two backends based upon that number. You could add additional backends by upping the max value in the Next method and adding another when condition. Pretty awesome right?

Before you ask, no you do not need a health probe to monitor a cloud service provider managed service. Please don’t make your life difficult by introducing an Application Gateway behind the APIM instance in front of the AOAI instance because Application Gateway allows for health probes and more complex load balancing. All you’re doing is paying Microsoft more money, making your operations’ team life miserable, and adding more latency. Ensuring the service is available and health is on the cloud service provider, not you.

But Matt, what about taking an AOAI instance out of the pool if it beings throttling traffic? Again, no you do not need to this. Eventually this APIM as a simple load balancer pattern will not necessary when the AOAI service is more mature. When that happens, your applications consuming the service will need to be built to handle throttling. Developers are familiar with handling throttling in their application code. Make that their responsibility.

Well folks, that’s this short and sweet post. Let’s summarize what we learned:

  • This pattern requires Azure AD authentication to AOAI. API keys will not work because each AOAI instance has different API keys.
  • You may hit the requests/minute and tokens/minute limits of an AOAI instance.
  • You can request higher limits but the request takes time to be approved.
  • You can create multiple instances of AOAI to get around the limits within a specific instance.
  • APIM can provide simple randomized load balancing across multiple instances of AOAI.
  • You DO NOT need anything more complicated than simple randomized load balancing. This is a temporary solution that you will eventually phase out. Don’t make it more complicated than it needs to be.
  • DO NOT introduce an Application Gateway behind APIM unless you like paying Microsoft more money.

Have a great week!

Authorization in Azure OpenAI Service

This is part of my series on GenAI Services in Azure:

  1. Azure OpenAI Service – Infra and Security Stuff
  2. Azure OpenAI Service – Authentication
  3. Azure OpenAI Service – Authorization
  4. Azure OpenAI Service – Logging
  5. Azure OpenAI Service – Azure API Management and Entra ID
  6. Azure OpenAI Service – Granular Chargebacks
  7. Azure OpenAI Service – Load Balancing
  8. Azure OpenAI Service – Blocking API Key Access
  9. Azure OpenAI Service – Securing Azure OpenAI Studio
  10. Azure OpenAI Service – Challenge of Logging Streaming ChatCompletions
  11. Azure OpenAI Service – How To Get Insights By Collecting Logging Data
  12. Azure OpenAI Service – How To Handle Rate Limiting
  13. Azure OpenAI Service – Tracking Token Usage with APIM
  14. Azure AI Studio – Chat Playground and APIM
  15. Azure OpenAI Service – Streaming ChatCompletions and Token Consumption Tracking
  16. Azure OpenAI Service – Load Testing

Hello folks!

The fun with the new Azure OpenAI Service continues! I’ve been lucky enough to have been tapped to help a number of Microsoft financial services customers with getting the Azure OpenAI Service in place with the appropriate infrastructure and security controls. In the process, I get to learn from a ton of smart people in the AI space. It’s truly been one of the highlights of my 20-year career.

Over the past few weeks I’ve been posting about what I’ve learned, and today I’m going to continue with that. In my first post on the service I gave a high level overview of the security controls Microsoft makes available to the customer to secure their instance of Azure OpenAI Service. In my second post I dove deep into how the service handles authentication and how Azure Active Directory (Azure AD) can be used to improve over the built-in API key-based authentication. Today I’m going to cover authorization and demonstrate how using Azure AD authentication lets you take advantage of granular authorization with Azure RBAC.

Let’s dig in!

As I covered in my last post, the Azure OpenAI Service has both a management plane and data plane. Each plane supports different types of authentication (process of verifying the identity of a user, process, or device, often as a prerequisite to allowing access to resources in an information system) and authorization (The right or a permission that is granted to a system entity to access a system resource). Operations such as swapping to a customer-managed key, enabling a private endpoint, or assigning a managed identity to the service occur within the management plane. Activities such as uploading training data or issuing a prompt to a model occur at the data plane. Each plane uses a different API endpoint. The image below will help you visualize the different planes.

Azure OpenAI Service Management and Data Planes

As illustrated above, authorization within the management plane is handled using Azure RBAC because authentication to that plane requires Azure AD-based authentication. Here we can limit the operations occurring at the management plane a security principal (user, service principal, managed identity, Azure Active Directory group (local or synchronized from on-premises) can perform by using Azure RBAC. For those of you coming from the AWS world, and where the Azure OpenAI Service may be your first venture into Azure, Azure RBAC is Azure’s authorization solution. It’s similar to an AWS IAM Policy. Let’s take a look at a built-in RBAC role that a customer might grant a data scientist who will be using the Azure OpenAI Service.

{
    "id": "/subscriptions/de90ea7d-a9c3-4957-8c96-XXXXXXXXXXXX/providers/Microsoft.Authorization/roleDefinitions/a97b65f3-24c7-4388-baec-2e87135dc908",
    "properties": {
        "roleName": "Cognitive Services User",
        "description": "Lets you read and list keys of Cognitive Services.",
        "assignableScopes": [
            "/"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.CognitiveServices/*/read",
                    "Microsoft.CognitiveServices/accounts/listkeys/action",
                    "Microsoft.Insights/alertRules/read",
                    "Microsoft.Insights/diagnosticSettings/read",
                    "Microsoft.Insights/logDefinitions/read",
                    "Microsoft.Insights/metricdefinitions/read",
                    "Microsoft.Insights/metrics/read",
                    "Microsoft.ResourceHealth/availabilityStatuses/read",
                    "Microsoft.Resources/deployments/operations/read",
                    "Microsoft.Resources/subscriptions/operationresults/read",
                    "Microsoft.Resources/subscriptions/read",
                    "Microsoft.Resources/subscriptions/resourceGroups/read",
                    "Microsoft.Support/*"
                ],
                "notActions": [],
                "dataActions": [
                    "Microsoft.CognitiveServices/*"
                ],
                "notDataActions": []
            }
        ]
    }
}

Let’s briefly walkthrough each property. The id property is the unique resource name assigned to this role definition. Next up we have the name property and description properties which need no explanations. The assignableScopes property determines at which scope an RBAC role can be assigned. Typical scopes include management groups, subscriptions, resource groups, and resources. Built-in roles will always have an assignable scope of “/” which denotes the RBAC role can be assigned to any management group, subscription, resource group, or role.

I’ll spend a bit of time on the permissions property. The permissions property contains a few different child properties including actions, notActions, dataActions, and notDataActions. The actions property lists the management plane operations allowed by the role while the dataActions lists the data plane operations allowed by the role. The notActions and notDataActions are interesting in that they are used to strip permissions out of the actions or dataActions. For example, say you granted a user full data plane operations to an Azure Key Vault but didn’t want them to have the ability to delete keys. You could to this by giving the user the dataAction of Microsoft.KeyVaults/* and notDataAction of Microsoft.KeyVaults/keys/purge/action. Take note this is NOT an explicit deny. If the user gets this permission in another way through assignment of a different RBAC role the user will be able to perform the action. At this time, Azure does not have a generally available feature that allows for an explicit deny like AWS IAM and what does exist in preview has an extremely narrow scope such that it isn’t very useful.

When you’re ready to assign a role to a security principal (user, service principal, managed identity, Azure Active Directory group (local or synchronized from on-premises) you create what is called a role assignment. A role assignment associates an Azure RBAC Role Definition to a security principal and scope. For example, in the below image I’ve created an RBAC Role Assignment for the Cognitive Services User Role at the resource group scope for the user Carl Carlson. This grants Carl the permission to perform the operations listed in the role definition above to any resource within the resource group, including the Azure OpenAI Resource.

Azure RBAC Role Assignment

Scroll back and take a look at the role definition, notice any risky permission? If you noticed the permission Microsoft.CognitiveServices/accounts/listkeys/action (remember that the Azure OpenAI Service falls under the Cognitive Services umbrella), grab yourself a cookie. As I’ve covered previously, every instance of the Azure OpenAI Service comes with two API keys. These API keys allow for authentication to the instance at the data plane level, can’t be limited in what they can do, and are very difficult to ever track back to who used them. You will want to very tightly control access to those API keys so be wary of who you give this role out to and may want to instead create a similar custom role but without this permission.

The are two other roles which are specific two the Azure OpenAI Service are the Cognitive Services OpenAI Contributor and Cognitive Services OpenAI User. Let’s look at the contributor role first.

{
    "id": "/providers/Microsoft.Authorization/roleDefinitions/a001fd3d-188f-4b5d-821b-XXXXXXXXXXXX",
    "properties": {
        "roleName": "Cognitive Services OpenAI Contributor",
        "description": "Full access including the ability to fine-tune, deploy and generate text",
        "assignableScopes": [
            "/"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.CognitiveServices/*/read",
                    "Microsoft.Authorization/roleAssignments/read",
                    "Microsoft.Authorization/roleDefinitions/read"
                ],
                "notActions": [],
                "dataActions": [
                    "Microsoft.CognitiveServices/accounts/OpenAI/*"
                ],
                "notDataActions": []
            }
        ]
    }
}

The big difference here is this role doesn’t grant much at the management plane. While this role may seem appealing to give to a data scientist because it doesn’t allow access to the API keys, it also doesn’t allow access to the instance metrics. I’ll talk about this more when I do a post on logging and monitoring in the service, but access to the metrics are important for the data scientists. These metrics allow them to see how much volume they’re doing with the service which can help them estimate costs and avoid hitting API limits.

Under the dataActions you can see this role allows all data plane operations. These operations include uploading training data for the creation of fine-tuned models. If you don’t want your users to have this access, then you can either strip the permissions Microsoft.CognitiveServices/accounts/OpenAI/files/import/action or grant the user the next role I’ll talk about.

One interesting thing to note is that while this role grants all data actions, which include data plane permissions around deployments, users with this role cannot deploy models to the instance. An error will be thrown that the user does not have the Microsoft.CognitiveServices/accounts/deployments/write permission. I’m not sure if this by design, but if anyone has a workaround for it, let me know in the comments. It would seem like if you want the user to deploy a model, you’ll need to model a custom role after this role and add that permissions.

The last role I’m going to cover is the Cognitive Services OpenAI User role. Let’s look at the permissions for this one.

{
    "id": "/providers/Microsoft.Authorization/roleDefinitions/5e0bd9bd-7b93-4f28-af87-XXXXXXXXXXXX",
    "properties": {
        "roleName": "Cognitive Services OpenAI User",
        "description": "Ability to view files, models, deployments. Readers can't make any changes They can inference",
        "assignableScopes": [
            "/"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.CognitiveServices/*/read",
                    "Microsoft.Authorization/roleAssignments/read",
                    "Microsoft.Authorization/roleDefinitions/read"
                ],
                "notActions": [],
                "dataActions": [
                    "Microsoft.CognitiveServices/accounts/OpenAI/*/read",
                    "Microsoft.CognitiveServices/accounts/OpenAI/engines/completions/action",
                    "Microsoft.CognitiveServices/accounts/OpenAI/engines/search/action",
                    "Microsoft.CognitiveServices/accounts/OpenAI/engines/generate/action",
                    "Microsoft.CognitiveServices/accounts/OpenAI/engines/completions/write",
                    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/search/action",
                    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/completions/action",
                    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/embeddings/action",
                    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/completions/write"
                ],
                "notDataActions": []
            }
        ]
    }
}

Like the contributor role, this role is very limited with management plane permissions. At the data plane level, this role really allows for issuing prompts and not much else. This role is great a non-human application role assigned via service principal or managed identity. It will allow the application to issue prompts and not much else. You don’t have to worry about a user exploiting this role to access training data you may have uploaded or making any modification to the Azure OpenAI Service instance.

Well folks that wraps this up. Let’s sum up what we’ve learned:

  • The Azure OpenAI Service supports fine-grained authorization through Azure RBAC at both the management plane and data plane when the security principal is authenticated through Azure AD.
  • Avoid using API keys where possible and leverage Azure RBAC for authorization. You can make it much more fine-grained, layer in the controls provided by Azure AD on top of it, and associate the usage of the service back to user (kinda as we’ll see in my post on logging).
  • Tightly control access to the API keys. I’d recommend any role you give to a data scientist or an application that you strip out the listkeys permissions.
  • I’d recommend creating a custom role for human users modeled after the Cognitive Services User role but without the listkeys permission. This will grant the user access to the full data plane and allow access to management plane pieces such as metrics. You can optionally be granular with your dataActions and leave out the files permissions to prevent human users from uploading training data.
  • I’d recommend using the built-in Cognitive Services OpenAI User role for service principals and managed identities assigned to applications. It grants only the permissions these applications are likely going to need and nothing more.
  • I’d avoid using notActions and notDataActions since it’s not an explicit deny and it’s very difficult to determine an effective user’s access in Azure without another tool like Entra Permissions Management.

Well folks, I hope this post has helped you better understand authorization in the service and how you could potentially craft it to align with least privilege.

Next post up will be around logging.

Have a great night!

Authentication in Azure OpenAI Service

This is part of my series on the Azure OpenAI Service:

  1. Azure OpenAI Service – Infra and Security Stuff
  2. Azure OpenAI Service – Authentication
  3. Azure OpenAI Service – Authorization
  4. Azure OpenAI Service – Logging
  5. Azure OpenAI Service – Azure API Management and Entra ID
  6. Azure OpenAI Service – Granular Chargebacks
  7. Azure OpenAI Service – Load Balancing
  8. Azure OpenAI Service – Blocking API Key Access
  9. Azure OpenAI Service – Securing Azure OpenAI Studio
  10. Azure OpenAI Service – Challenge of Logging Streaming ChatCompletions
  11. Azure OpenAI Service – How To Get Insights By Collecting Logging Data
  12. Azure OpenAI Service – How To Handle Rate Limiting
  13. Azure OpenAI Service – Tracking Token Usage with APIM
  14. Azure AI Studio – Chat Playground and APIM
  15. Azure OpenAI Service – Streaming ChatCompletions and Token Consumption Tracking
  16. Azure OpenAI Service – Load Testing

Updates:

  • 1/18/2024 to reference considerable library changes with new API version. See below for details
  • 4/3/2023 with simpler way to authenticate with Azure AD via Python SDK

Hello again!

1/18/2024 Update – Hi folks! There were some considerable changes to the OpenAI Python SDK which offers an even simpler integration with the Azure OpenAI Service. While the code in this post is a bit dated, I feel the thought process is still important so I’m going to preserve it as is! If you’re looking for examples of how to authenticate with the Azure OpenAI Service using the Python SDK with different types of authentication (service principal vs managed identity) or using the REST API, I’ve placed a few examples in this GitHub repository. Hope it helps!

Days and nights have been busy diving deeper into the AI landscape. I’ve been reading a great book by Tom Taulli called Artificial Intelligence Basics: A Non-Technical Introduction. It’s been a huge help in getting down the vocabulary and understanding the background to the technology from the 1950s on. In combination with the book, I’ve been messing around a lot with Azure’s OpenAI Service and looking closely at the infrastructure and security aspects of the service.

In my last post I covered the controls available to customers to secure their specific instance of the service. I noted that authentication to the service could be accomplished using Azure Active Directory (AAD) authentication. In this post I’m going to take a deeper look at that. Be ready to put your geek hat on because this post will be getting down and dirty into the code and HTTP transactions. Let’s get to it!

Before I get into the details of how supports AAD authentication, I want to go over the concepts of management plane and data plane. Think of management plane for administration of the resource and data plane for administration of the data hosted within the resource. Many services in Azure have separate management planes and data planes. One such service is Azure Storage which just so happens to have similarities with authentication to the OpenAI Service.

When a customer creates an Azure Storage Account they do this through interaction with the management plane which is reached through the ARM API hosted behind management.azure.come endpoint. They must authenticate against AAD to get an access token to access the API. Authorization via Azure RBAC then takes place to validate the user, managed identity, or service principal has permissions on the resource. Once the storage account is created, the customer could modify the encryption key from a platform managed key (PMK aka key managed by Microsoft) to a customer managed key (CMK), enable soft delete, or enable network controls such as the storage firewall. These are all operations against the resource.

Once the customer is ready to upload blob data to the storage account, they will do this through a data plane operation. This is done through the Blob Service API. This API is hosted behind the blob.core.windows.net endpoint and operations include creation of a blob or deletion of a blob. To interact with this API the customer has two means of authentication. The first method is the older method of the two and involves the use of static keys called storage account access keys. Every storage account gets two of these keys when a storage account is provisioned. Used directly, these keys grant full access to all operations and all data hosted within the storage account (SAS tokens can be used to limit the operations, time, and scope of access but that won’t be relevant when we talk the OpenAI service). Not ideal right? The second method is the recommended method and that involves AAD authentication. Here the security principal authenticates to AAD, receives an access token, and is then authorized for the operation via Azure RBAC. Remember, these are operations against the data hosted within the resource.

Authentication in Management Plane vs Data Plane in Azure Storage

Now why did I give you a 101 on Azure Storage authentication? Well, because the Azure OpenAI Service works in a very similar way.

Let’s first talk about the management plane of the Azure OpenAI Service. Like Azure Storage (and the rest of Azure’s services) it is administered through the ARM API behind the management.azure.com endpoint. Customers will use the management plane when they want to create an instance of the Azure OpenAI Service, switch it from a PMK to CMK, or setup diagnostic settings to redirect logs (I’ll cover logging in a future post). All of these operations will require authentication to AAD and authorization via Azure RBAC (I’ll cover authorization in a future post).

Simple right? Now let’s move to the complexity of the data plane.

Two API keys are created whenever a customer creates an Azure OpenAI Service instance. These API keys allow the customer full access to all data plane operations. These operations include managing a deployment of a model, managing training data that has been uploaded to the service instance and used to fine tune a model, managing fine tuned models, and listing available models. These operations are performed against the Azure OpenAI Service API which lives behind a unique label with an FQDN of openai.azure.com (such as myservice.openai.azure.com). Pretty much all the stuff you would be doing through the Azure OpenAI Studio. If you opt to use these keys you’ll need to remember control access to these keys via securing management plane authorization aka Azure RBAC.

Azure OpenAI Service API Keys

In the above image I am given the option to regenerate the keys in the case of compromise or to comply with my organization’s key rotation process. Two keys are provided to allow for continued access to the service while other key is being rotated.

Here I have simple bit of code using the OpenAI Python SDK. In the code I provide a prompt to the model and ask it to complete it for me and use one of the API keys to authenticate to it.

import logging
import sys
import os
import openai

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:

        # Setup OpenAI Variables
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_version = "2022-12-01"
        openai.api_key = os.getenv('OPENAI_API_KEY')

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time'
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to respond to prompt: ', exc_info=True)


if __name__ == "__main__":
    main()

The model gets creative and provides me with the response below.

If you look closely you’ll notice an warning about the security of my session. The reason I’m getting that error is shut off certificate verification in the OpenAI library in order to intercept the calls with Fiddler. Now let me tell you, shutting off certificate verification was a pain in the ass because the developers of the SDK are trying to protect users from the bad guys. Long story short, the Azure Python SDK doesn’t provide an option to turn off certificate checking like say the Azure Python SDK (which you can pass a kwarg of verify=False to turn it off in the request library used underneath). While the developers do provide a property called verify_ssl_certs, it doesn’t actually do anything. Since most Python SDKs use the requests library underneath the hood, I went through the library on my machine and found the api_requestor.py file. Within this file I modified the _make_session function which is creating a requests Sessions object. Here I commented out the developers code and added the verify=False property to the Session object being created.

Turning off certificate verification in OpenAI Python SDK

Now don’t go and do this in any environment that matters. If you’re getting a certificate verification failure in your environment you should be notifying your information security team. Certificate verification is an absolute must to ensure the identity of the upstream server and to mitigate the risk of man-in-the-middle attacks.

Once I was able to place Fiddler in the middle of the HTTPS session I was able to capture the conversation. In the screenshot below, you can see the SDK passing the api-key header. Take note of that header name because it will become relevant when we talk AAD authentication. If you’re using OpenAI’s service already, then this should look very familiar to you. Microsoft was nice enough to support the existing SDKs when using one of the API keys.

At this point you’re probably thinking, “That’s all well and good Matt, but I want to use AAD authentication for all the security benefits AAD provides over a static key.” Yeah yeah, I’m getting there. You can’t blame me for nerding out a bit with Fiddler now can you?

Alright, so let’s now talk AAD authentication to the data plane of the Azure OpenAI Service. Possible? Yes, but with some caveats. The public documentation illustrates an example of how to do this using curl. However, curl is great for a demonstration of a concept, but much more likely you’ll be using an SDK for your preferred programming language. Since Python is really the only programming language I know (PowerShell doesn’t count and I don’t want to show my age by acknowledging I know some Perl) let me demonstrate this process using our favorite AAD SDK, MSAL.

For this example I’m going to use a service principal, but if your code is running in Azure you should be using a managed identity. When creating the service principal I granted it the Cognitive Services User RBAC role on the resource group containing the Azure OpenAI Service instance as suggested in the documentation. This is required to authorize the service principal access to data plane operations. There are a few other RBAC roles for the service, but as I said earlier, I’ll cover authorization in a future post. Once the service principal was created and assigned the appropriate RBAC role, I modified my code to include a function which calls MSAL to retrieve an access token with the access scope of Cognitive Services, which the Azure OpenAI Service falls under. I then pass that token as the API key in my call to the Azure OpenAI Service API.

import logging
import sys
import os
import openai
from msal import ConfidentialClientApplication

def get_sp_access_token(client_id, client_credential, tenant_name, scopes):
    logging.info('Attempting to obtain an access token...')
    result = None
    print(tenant_name)
    app = ConfidentialClientApplication(
        client_id=client_id,
        client_credential=client_credential,
        authority=f"https://login.microsoftonline.com/{tenant_name}",
    )
    result = app.acquire_token_for_client(scopes=scopes)

    if "access_token" in result:
        logging.info('Access token successfully acquired')
        return result['access_token']
    else:
        logging.error('Unable to obtain access token')
        logging.error(f"Error was: {result['error']}")
        logging.error(f"Error description was: {result['error_description']}")
        logging.error(f"Error correlation_id was: {result['correlation_id']}")
        raise Exception('Failed to obtain access token')

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:
        # Obtain an access token
        token = get_sp_access_token(
            client_id = os.getenv('CLIENT_ID'),
            client_credential = os.getenv('CLIENT_SECRET'),
            tenant_name = os.getenv('TENANT_ID'),
            scopes = "https://cognitiveservices.azure.com/.default"
        )
    except:
        logging.error('Failed to obtain access token: ', exc_info=True)

    try:
        # Setup OpenAI Variables
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_version = "2022-12-01"
        openai.api_key = token

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time'
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to summarize file: ', exc_info=True)


if __name__ == "__main__":
    main()

Let’s try executing that and see what happens.

Uh-oh! What happened? If you recall from earlier the API key is passed in the api-key header. However, to use the access token provided by AAD we have to pass it in the authorization header as seen in the example in Microsoft public documentation.

curl ${endpoint%/}/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2022-12-01 \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $accessToken" \
-d '{ "prompt": "Once upon a time" }'

Thankfully there is a solution to this one without requiring you to modify the OpenAI SDK. If you take a look in the api_requestor.py file again in the library you will see it provides the ability to override the headers passed in the request.

With this in mind, I made a few small modifications. I removed the api_key property and added an Authorization header to the request to the Azure OpenAI Service API which includes the access token received back from AAD.

import logging
import sys
import os
import openai
from msal import ConfidentialClientApplication

def get_sp_access_token(client_id, client_credential, tenant_name, scopes):
    logging.info('Attempting to obtain an access token...')
    result = None
    print(tenant_name)
    app = ConfidentialClientApplication(
        client_id=client_id,
        client_credential=client_credential,
        authority=f"https://login.microsoftonline.com/{tenant_name}",
    )
    result = app.acquire_token_for_client(scopes=scopes)

    if "access_token" in result:
        logging.info('Access token successfully acquired')
        return result['access_token']
    else:
        logging.error('Unable to obtain access token')
        logging.error(f"Error was: {result['error']}")
        logging.error(f"Error description was: {result['error_description']}")
        logging.error(f"Error correlation_id was: {result['correlation_id']}")
        raise Exception('Failed to obtain access token')

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:
        # Obtain an access token
        token = get_sp_access_token(
            client_id = os.getenv('CLIENT_ID'),
            client_credential = os.getenv('CLIENT_SECRET'),
            tenant_name = os.getenv('TENANT_ID'),
            scopes = "https://cognitiveservices.azure.com/.default"
        )
    except:
        logging.error('Failed to obtain access token: ', exc_info=True)

    try:
        # Setup OpenAI Variables
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_version = "2022-12-01"

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time',
            headers={
                'Authorization': f'Bearer {token}'
            }
            

        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to summarize file: ', exc_info=True)


if __name__ == "__main__":
    main()

Running the code results in success!

4/3/2023 Update – Poking around today looking at another aspect of the service, I came across this documentation on an even simpler way to authenticate with Azure AD without having to use an override. In the code below, I specify an openai.api_type of azure_ad which allows me to pass the token direct via the openai_api_key property versus having to pass a custom header. Definitely a bit easier!

import logging
import sys
import os
import openai
from msal import ConfidentialClientApplication

def get_sp_access_token(client_id, client_credential, tenant_name, scopes):
    logging.info('Attempting to obtain an access token...')
    result = None
    print(tenant_name)
    app = ConfidentialClientApplication(
        client_id=client_id,
        client_credential=client_credential,
        authority=f"https://login.microsoftonline.com/{tenant_name}",
    )
    result = app.acquire_token_for_client(scopes=scopes)

    if "access_token" in result:
        logging.info('Access token successfully acquired')
        return result['access_token']
    else:
        logging.error('Unable to obtain access token')
        logging.error(f"Error was: {result['error']}")
        logging.error(f"Error description was: {result['error_description']}")
        logging.error(f"Error correlation_id was: {result['correlation_id']}")
        raise Exception('Failed to obtain access token')

def main():
    # Setup logging
    try:
        logging.basicConfig(
            level=logging.ERROR,
            format='%asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[logging.StreamHandler(sys.stdout)]
        )
    except:
        logging.error('Failed to setup logging: ', exc_info=True)

    try:
        # Obtain an access token
        token = get_sp_access_token(
            client_id = os.getenv('CLIENT_ID'),
            client_credential = os.getenv('CLIENT_SECRET'),
            tenant_name = os.getenv('TENANT_ID'),
            scopes = "https://cognitiveservices.azure.com/.default"
        )
        print(token)
    except:
        logging.error('Failed to obtain access token: ', exc_info=True)

    try:
        # Setup OpenAI Variables
        openai.api_type = "azure_ad"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_key = token
        openai.api_version = "2022-12-01"

        response = openai.Completion.create(
            engine=os.getenv('DEPLOYMENT_NAME'),
            prompt='Once upon a time '
        )

        print(response.choices[0].text)

    except:
        logging.error('Failed to summarize file: ', exc_info=True)


if __name__ == "__main__":
    main()

Let me act like I’m ChatGPT and provide you a summary of what we learned today.

  • The Azure OpenAI Service has both a management plane and data plane.
  • The Azure OpenAI Service data plane supports two methods of authentication which include static API keys and Azure AD.
  • The static API keys provide full permissions on data plane operations. These keys should be rotated in compliance with organizational key rotation policies.
  • The OpenAI SDK for Python (and I’m going to assume the others) sends an api-key header by default. This behavior can be overridden to send an Authorization header which includes an access token obtained from Azure AD.
  • It’s recommended you use Azure AD authentication where possible to leverage all the bells and whistles of Azure AD including the usage of managed identities, improved logging, and conditional access for service principal-based access.

Well folks, that concludes this post. I’ll be uploading the code sample above to my GitHub later this week. In the next batch of posts I’ll cover the authorization and logging aspects of the service.

I hope you got some value and good luck in your AI journey!