Azure Authorization – Azure RBAC Basics

This is part of my series on Azure Authorization.

  1. Azure Authorization – The Basics
  2. Azure Authorization – Azure RBAC Basics
  3. Azure Authorization – actions and notActions
  4. Azure Authorization – Resource Locks and Azure Policy denyActions
  5. Azure Authorization – Azure RBAC Delegation
  6. Azure Authorization – Azure ABAC (Attribute-based Access Control)

Welcome back folks. In this post I will be continuing my series on Azure authorization by covering the basics of Azure RBAC. I will review the components that make up Azure RBAC and touch upon some of the functionality I’ll be covering in future posts of this series. Grab a coffee, get comfy, and prepare to review some JSON!

In my first post in this series I touched on the basics of authorization in Azure. In that post I covered the differences between the management plane (actions on a resource) and the data plane (actions on data stored in the resource). I also called out the four authorization planes for Azure which include Entra ID Roles, Enterprise Billing Accounts, Azure Classic Subscription Admin Roles, and Azure RBAC (Role-Based Access Control). If you’re unfamiliar with these concepts, take a read through that post before you continue.

Authorization Planes for Azure

Azure RBAC is the primary authorization plane for Microsoft Azure. When making a request to the ARM (Azure Resource Manager) REST API it’s Azure RBAC that decides whether or not the action is allowed (with the exceptions I covered in my first post and some exceptions I’ll cover through this series). Azure RBAC is configured by using a combination of two resources which include the Azure RBAC Role Definition and Azure RBAC Role Assignment. At a high level, an Azure RBAC Role Definition is a collection of permissions and an assignment is the granting of those permissions to a security principal for an access scope. Definitions and assignments can only be created by a security principal that has been assigned an Azure RBAC Role with the permissions in the Microsoft.Authorization resource provider. The specific permissions include Microsoft.Authorization/roleDefinitions/* and Microsoft.Authorization/roleAssignments/*. Out of the box there are two built-in roles that have these permission which include User Access Administrator and Owner (there are a few others that are newer and I’ll discuss in a future post).

Let me dig deeper into both of these resources.

It all starts with an Azure Role Definition. These can be either built-in (pre-existing roles Microsoft provides out of the box) or custom (a role you design yourself). A role definition is made up of three different components which include the permissions (actions), assignable scopes, and conditions. The permissions include the actions that can be performed, the assignable scope is the access scopes the role definition can be assigned to, and the conditions further constrain the access for the purposes of a largely new and upcoming additional features for Azure RBAC.

Azure RBAC Role Definition – Structure

Permissions are divided into four categories which include actions, notActions, dataActions, and notDataActions. The actions and notActions are management plane permissions and dataActions and notDataActions are data plane permissions. As mentioned earlier, management plane is actions on the resource while data plane is actions on the data within the resource. Azure RBAC used to be management plane only, but has increasingly (thankfully) been expanded into the data plane. Actions and dataActions are the actions that are allowed and notActions and notDataActions are the actions which should be removed from the list of actions. That likely sounds very confusing and I’ll cover it in more depth in my next post. For now, understand that a permission in notAction or notDataAction is not an explicit deny.

The assignable scopes component is a list of places the role definition can be assigned (I’ll get to this later). The assignable scopes include management group (with some limitations for custom role definitions), subscription, resource group, or resource. Note that whatever scopes are includes, the scopes under it are also included. For example, an assignable scope of a management group would make the role definition usable at that management scope, subscriptions associated with that management group, resource groups within those subscriptions, and resources within those resource groups.

Lastly, we have the conditions. Conditions are a new feature of Azure RBAC Role Definitions and Assignments and are used to further filter the access based on other properties of the security principal, session, or resource. These conditions are used in newer features I’ll be covering throughout this series.

When trying to learn something, I’m a big fan of looking at the raw information from the API. It gives you all the properties you’d never know about when interacting via the GUI. Below is the representation of a built-in Azure RBAC Role Definition. You can dump this information using Azure PowerShell, Azure CLI, or the REST API. Using Azure CLI you would use the az role definition command.

Azure RBAC Role Definition – Data

There are a few things I want you to focus on. First, you’ll notice the id property. Each Azure RBAC Role Definition has a unique GUID which must be unique across the Entra ID tenant. The GUIDs for built-in roles should (never run into an instance where they were not) be common across all instances of Entra ID. Like all Azure objects, Azure RBAC Role Definitions also have be be stored somewhere. Custom Azure RBAC Role Definitions can be stored at the management group or subscription-level. I’d recommend you store your Azure RBAC Role Definitions at the management group level whenever possible so they can be re-used (there is a limit of 5,000 custom Azure RBAC Role Definitions per Entra ID tenant) and they exist beyond the lifetime of the subscription (think of use case where you re-use a role definition from subscription 1 in subscription 2 then blow up subscription 1, uh oh).

Next, you’ll see the assignableScopes property with a value of “/”. That “/” represents the root management group (see my post on the root management group if you’re unfamiliar with the concept). As of 3/2024, only built-in role definitions can have an assignable scope of “/”. When you create a custom role definition you will likely create it with an assignable scope of either a management group (good for common roles that will be used across subscriptions and to keep under the 5,000 role definition limit) or subscription (good for use cases where a business unit may have specific requirements).

Lastly, you’ll see that a condition has been applied to this Azure RBAC Role Definition. As of 3/2024 only built-in roles will include conditions in the definitions. I’ll cover what these conditions do in a future post.

Excellent, so you now grasp the Azure RBAC Role Definition. Let me next dive into Assignments.

Azure RBAC Role Assignments associate a role definition to a security principal and assign an access scope to those permissions. At a high-level, they are made up of four components which include the security principal (the entity assigned the role), role definition (the collection of permissions), scope (the access scope), and conditions (further filters on how this access can be exercised).

Azure RBAC Role Assignment – Structure

The security principal component is the entity that will be assigned the role for the specific access scope. This can be an Entra ID user, group, service principal, or managed identity.

The role definition is the Azure RBAC Role Definition (collection of permissions) the security principal will be assigned. This can include a built-in role or a custom role.

The scope is the access scope the security principal can exercise the permissions defined in the role definition. It’s most common to create Azure RBAC Role Assignments at the subscription and resource group scope. If you have a subscription strategy where every application gets its own subscription, the subscription scope may make sense for you. If your strategy is subscriptions at the business unit level you may create assignments at the resource group. Assignments at the management group tend to be limited to roles for the central IT (platform and security) team. Take note there are limits to the number of assignments at different scopes which are documented at this link. As of 3/2024 you cannot assign an Azure RBAC Role with dataActions or notDataActions permissions at the management group scope.

Let’s now take a look at the API representation of a typical role assignment. You can dump this information using Azure PowerShell, Azure CLI, or the REST API. When using Azure CLI you would do:

az role assignment list
Azure RBAC Role Assignment – Data

Here there are a few properties to note. Just like the Azure Role Definitions, the id property of an Azure RBAC Assignment must contain a GUID unique to the Entra ID tenant.

The principalId is the object id of the security principal in Entra ID and the principalType is the object type of the security principal which will be user, group, or service principal. Why no managed identity? The reason for that is managed identities are simply service principals with some orchestration on top. If you’re curious as to how managed identities are represented in Entra ID, check out my post on orphaned managed identities.

The scope is the access scope the permissions will apply to. In the example above, the permissions granted by this role assignment will have the scope of the rg-demo-da-50bfd resource group.

This role assignment also has a condition. The capabilities of conditions and where they are used will be covered in a future post.

The last property I want to touch on is the delegatedManagedIdentityResourceId. This is a property used when Azure Lighthouse is in play.

Alright folks, that should give you a basic understanding of the foundations of Azure RBAC. Your key takeaways for today are:

  • Assigning a security principal permissions consists of two resources, the role definition (the set of permissions) and the assignment (combination of the role, security principal, and access scope).
  • Custom Azure RBAC Role Definitions should be created at the management group level where possible to ensure their lifecycle persists beyond the subscription they are used within. Be aware of the 5,000 per Entra ID tenant limit.
  • Azure RBAC Role Assignments are most commonly created at the subscription or resource group level. Usage at management groups will likely be limited to granting permissions for Central IT or Security. Be aware of the limits around the number of assignments.

In my next post I’ll dig deeper into how notActions and notDataActions works, demonstrate how it is not an explicit deny, and compare and contract an Azure RBAC Role Definition to an AWS IAM Policy.

Have a great week!

Azure Authorization – The Basics

This is part of my series on Azure Authorization.

  1. This is part of my series on Azure Authorization.
  2. Azure Authorization – The Basics
  3. Azure Authorization – Azure RBAC Basics
  4. Azure Authorization – actions and notActions
  5. Azure Authorization – Resource Locks and Azure Policy denyActions
  6. Azure Authorization – Azure RBAC Delegation
  7. Azure Authorization – Azure ABAC (Attribute-based Access Control)

Hello again!

I’ve been wanting to put together a series on authorization in Azure for a while now. Over the past month I spent some time digging into the topic and figured I’d get it in written form while it is still fresh in my mind. I’ll be covering a lot of areas and features in these series, including some cool stuff that is in public preview. Before we get into the gooey details, let’s start with the basics.

When we talk identity I like to break it into three topics: identity, authentication, and authorization. Let me first define those terms using some wording from the NIST glossary. Identity is the attribute or attributes that describe the subject (in terms of Azure think of this as an Entra ID user or service principal). Authentication which is the process of verifying a user, process, or device. Authorization, which will be the topic of this series, is the rights or permissions granted to a subject. More simply, identity is the representation of the subject, authentication is the process of getting assurance that the subject is who they claim to be, and authorization is what the subject can do.

Identity in Azure is provided by the Entra ID directory (formerly Azure Active Directory, Microsoft marketing likes to rebrand stuff every few years). Like any good directory, it supports a variety of object types. The ones relevant to Azure are users, groups, service principals, and managed identities. Users and groups can exist authoritatively in the Entra ID tenant, they can be synchronized from an on-premises directory using something like Entra ID Connect Sync (marketing, please rebrand this), or they can represent federated users from other Entra ID tenants via the Entra ID B2B feature.

Azure Subscriptions (logical atomic unit for Azure whose parallel would be in AWS would be an AWS Account) are associated to a single Entra ID tenant. An Entra ID tenant can be associated to multiple Azure subscriptions. When configuring authorization in Azure you will be able to associate the permissions to security principals sourced from the associated Entra ID tenant.

Since multiple Azure Subscriptions can be associated to the same Entra ID tenant, this means all of those Azure Subscriptions share a common identity data store and authentication boundary. This differs from an AWS Account where each AWS Account has its own unique authentication boundary and directory of IAM Users and Roles. There are positive and negatives to this architectural decision by Microsoft that we can have fun banter with over a few tequilas, but we’ll save that for another day. The key thing I want you to remember is every Azure Subscription in the same Entra ID tenant shares that directory and authentication boundary but has a separate authorization boundary within each subscription. This means that getting authorization right is critical.

Boundaries in Azure

The above will always be true when talking about the management plane (sometimes referred to as the control plane), however there are exceptions when talking about the data plane. So what are the management plane and data planes? I like to define the management plane as the destination you interact with when you want to perform actions on a resource. Meanwhile the data plane is the destination you communicate with when you want to perform actions on data the resource is storing.

Management plane versus data plane

In the image above you’ll see an example of how the management plane and data plane differ when talking about Azure Storage. When you communicate with the management plane you interact with the management.azure.com which is the endpoint for the Azure Resource Manager REST API. The Azure Portal, Azure CLI, Azure PowerShell, ARM (Azure Resource Manager) templates, Bicep templates, and the Azure Terraform Provider are all different ways to interact with this API. Interaction with this API will use Entra ID authentication (which uses modern authentication protocols and standards such as Open ID Connect and SAML). Determining what actions you can perform on resources behind this management plane is then determined by the permissions defined in Azure RBAC (Role-based Access Control) roles you have been assigned (more on this in the next post).

As I mentioned earlier, the data plane can break the the rule of “one identity plane to rule them all”. Notice how interactions with the data plane use a separate endpoint, in this case blob.core.windows.net. This is the Azure Storage Blob REST API and is the API used to interact with the data plane of Azure Storage when using blob storage. As is common with Azure, many PaaS (platform-as-a-service) offerings support Entra ID authentication and Azure RBAC authorization for both the management plane and data plane. What should pop out to you is that there is also support for a service-specific authentication and authorization mechanism, in this case storage account keys and SAS (shared access signature) tokens. You’ll see this pattern often with Azure PaaS offerings and its important to understand that the service-specific authentication and authorization mechanism should only be used if the service doesn’t support Entra ID authentication or authorization, or your use case specifically requires some functionality of the service-specific mechanism that isn’t available in Entra ID. The reason for this is service-specific mechanisms rarely support granular authorization (SAS tokens being an exception) effectively making the person in possession of that key “god” of the data plane for the service. Additionally, there are security features which are specific to Entra ID (Entra ID Conditional Access, Privileged Identity Management, Identity Protection, etc) and, perhaps most critically, traceability and auditability is extremely difficult to impossible when these mechanisms are used.

Yet another interesting aspect of Azure authorization is there are a few ways for a security principal who is highly privileged in other places to navigate themselves into highly privileged roles across Azure resources. In the image below, you can see how a highly privileged user in Entra ID can leverage the Entra ID Global Admin role to obtain highly privileged permissions in Azure. I’ve covered how this works, how to mitigate it, and how to detect it in this post. The other way to do this is through holding a highly privileged role across the Enterprise Billing constructs. While a bit dated, this blog post does a good job explaining how it’s possible.

Azure Authorization Planes

The key things I want you to walk away with for this post are:

  • With a shared identity and authentication boundary, ensuring a solid authorization model is absolutely mission critical for security of your Azure estate.
  • Ensure you tightly control your Entra ID Global Admins and Enterprise Billing authorization because they provide a way to bypass even the best configured Azure RBAC.
  • Whenever possible use Entra ID identities and authentication so you can take advantage of Azure RBAC.
  • Avoid using service-specific authentication and authorization because it tends to be very course-grained and difficult to track.

Alright folks, you are prepped with the basics. In my next post I’ll begin diving into Azure RBAC.

Have a great weekend!

Azure Virtual Network Manager – Dynamic Network Group Membership

Happy New Year fellow geeks!

Over the past few weeks I’ve been diving into the relatively new Azure product Azure Virtual Network Manager (AVNM). AVNM was first introduced back in late 2021 with the connectivity feature and security admin rule feature. In the past year both features have begun to trickle into general availability in some regions. I was interested in the Security Admin Rules feature so I did my usual thing and began to read through all the documentation and experiment with the service. I’ll be covering Security Admin Rules in another post. In this short post I will be focusing on how you onboard virtual networks to the connectivity and security admin rule features.

When an AVNM instance is created, it is assigned a scope of what it can manage. This can subscriptions added individually or it can be all subscriptions under a specific management group. A given scope can only have one AVNM instance assigned to it.

Azure Virtual network Manager Sample Architecture

Today, under the assigned scope, AVNM can manage how virtual networks are connected to each other with the connectivity feature and what traffic is allowed or denied within the virtual network with the security admin rules feature superseding Network Security Groups. Within an AVNM instance you group virtual networks under the managed scope into a construct called a Network Group. Network Groups are then associated to either a connectivity or security admin rule configuration as seen below.

Azure Virtual Network Manager Resource relationships

Network groups can contain multiple virtual networks and virtual networks can be members of multiple Network Groups. Virtual networks can be added to a Network Group manually or dynamically through Azure Policy. The rest of this post will focus on dynamic membership and some of the interesting properties of the Azure Policy definitions.

Before I dive into the policy definition I want to call out a neat feature the Product Group built into the solution. When accessing an AVNM instance from the Azure Portal there is a handy GUI-based tool included that can be used to graphically build the conditions on which virtual networks will be members of the Network Group. In the background, this tool builds out the Azure Policy definition and creates the assignment at the scopes you specify. This is one of the only products I’ve come across within Azure that assists the customer in building out an Azure Policy for the service. Great job by the product group!

Azure Policy builder to onboard virtual networks into a Network Group in Azure Virtual Network Manager

With the settings pictured above, I’m creating an Azure Policy to onboard all virtual networks tagged (there are a number of parameters and operators combinations you can use besides tags) with the key of environment and value of production under the specified scope to the Network Group. The policy will look something like this:

{
"properties": {
"policyType": "Custom",
"mode": "Microsoft.Network.Data",
"policyRule": {
"if": {
"allOf": [
{
"field": "type",
"equals": "Microsoft.Network/virtualNetworks"
},
{
"allOf": [
{
"field": "tags['environment']",
"equals": "production"
}
]
}
]
},
"then": {
"effect": "addToNetworkGroup",
"details": {
"networkGroupId": "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rg-demo-avnm-core332fd/providers/Microsoft.Network/networkManagers/avnm-core332fd/networkGroups/ng-prod"
}
}
}
},
"id": "/providers/Microsoft.Management/managementGroups/jogcloud/providers/Microsoft.Authorization/policyDefinitions/test",
"type": "Microsoft.Authorization/policyDefinitions",
"name": "test"

}

I’ve bolded the two properties I want you to key in on. The first property is the mode property. If you’ve written a custom Azure Policy or examined built-in policies you will likely be used to that property being set to either all or indexed. Here you will see it is set to Microsoft.Network.Data. This is one of the new resource provider modes that has been introduced which extends Azure Policy’s functionality. The other interesting property is the effect property. Again, you will likely be used to this being audit, deny, deployIfNotExists, etc. Instead, it is populated with a value of addToNetworkGroup. Both of these properties are specific to AVNM’s feature for dynamic members into its Network Groups.

Being the geek I am, I decided to try writing my own custom Azure Policy definition which would parameterize the the tag key, value, and resource id of the Network Group. Interestingly, you’re blocked from parameterizing the Network Group id due to a regex filter that has been put in. This regex filter validates that the Network Group id looks like an id and will reject if you try to do it as a parameter. I plan on submitting some feedback requesting this regex filter be removed which would allow for this to be fully parameterized. As of now, it looks like you’ll need an Azure Policy definition for each Network Group where you’re using dynamic membership.

Error message when parameterizing Network Group resource id

Once you create your Azure Policy definition and create the assignment, at the next policy evaluation the matching virtual networks will be added into the Network Group as dynamic members. The feature works exactly as described and is incredibly handy in quickly and efficiently onboarding new and existing virtual networks to a specific Network Group to apply a connectivity or security admin rule configuration.

Well folks that’s it for this short blog post. I found the dynamic membership and new Azure Policy properties interesting enough to warrant their own post. I’ve added an example working parameterized Azure Policy definition to my custom Azure Policy GitHub repo if you’re interested in messing around with it yourself.

Expect more posts to come on Azure Virtual Network Manager. Have a great night!

The Challenge of Logging Azure OpenAI Stream Completions

This is part of my series on GenAI Services in Azure:

  1. Azure OpenAI Service – Infra and Security Stuff
  2. Azure OpenAI Service – Authentication
  3. Azure OpenAI Service – Authorization
  4. Azure OpenAI Service – Logging
  5. Azure OpenAI Service – Azure API Management and Entra ID
  6. Azure OpenAI Service – Granular Chargebacks
  7. Azure OpenAI Service – Load Balancing
  8. Azure OpenAI Service – Blocking API Key Access
  9. Azure OpenAI Service – Securing Azure OpenAI Studio
  10. Azure OpenAI Service – Challenge of Logging Streaming ChatCompletions
  11. Azure OpenAI Service – How To Get Insights By Collecting Logging Data
  12. Azure OpenAI Service – How To Handle Rate Limiting
  13. Azure OpenAI Service – Tracking Token Usage with APIM
  14. Azure AI Studio – Chat Playground and APIM
  15. Azure OpenAI Service – Streaming ChatCompletions and Token Consumption Tracking
  16. Azure OpenAI Service – Load Testing

Updates:

Hello again fellow geeks. Today I’m back with another Azure OpenAI Service (AOAI) post. I’ve talked in the past about the gaps in the native logging for the AOAI service and how the logs lack traceability and details on token usage to be used for chargebacks. I was lucky enough to work with Jake Wang and others on a reference architecture that could address these gaps using Azure API Manager (APIM). I also wrote some custom APIM policies to provide examples for how this information could be captured within APIM. I’ve observed customers coming up with creative solutions such as capturing the data within the application sitting in front of AOAI as a tactical means to get this data while more strategically using third-party API Gateway products such as Apigee, or even building custom highly functional and complex gateways. However, there was a use case that some of these solutions (such as the custom policies I wrote) didn’t account for, and that was streaming completions.

Like OpenAI’s API, the AOAI service API offers support for streaming chat completions. Streaming completions return the model’s completion as a series as events as the tokens are processed versus a non-streaming completion which returns the entire completion once the model is finished processing. The benefit of a streaming completion is a better user experience. There have been studies that show that any delay longer than 10 seconds won’t hold user attention. By streaming the completion as it’s generated the user is receiving that feedback that the website is responding.

Streaming Chat Completion

The OpenAI documentation points out a few challenges when using streaming completions. One of those challenges is the response from the API no longer includes token usage, which means you need to calculate token usage by some other means such as using OpenAI’s open source tokeniser tiktoken. It also makes it difficult to moderate content because only partial completions are received in each event. Outside of those challenges, there is also a challenge when using APIM. As my peer Shaun Callighan points out, Microsoft does not recommend logging the request/response body when dealing with a stream of server-events such as the API is returning with streaming chat completions because it can cause unexpected buffering (which it does with streaming chat completions). This means the application user will not get the behavior the application owner intended them to get. In my testing, nothing was returned until model finished the completion.

If using the Python SDK, you can make a chat completion streaming by adding the stream=true property to the ChatCompletion object as seen below.

        response = openai.ChatCompletion.create(
            engine=DEPLOYMENT_NAME,
            messages=[
                {
                   "role": "user",
                   "content": "Write me a bedtime story"
                }
            ],
            max_tokens=300,
            stream=True
        )

The body of the response includes a series of server-events such as the below.

...
data: {"id":"chatcmpl-8JNDagQPDWjNWOgbUm9u5lRxcmzIw","object":"chat.completion.chunk","created":1699628174,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":"Once"}}],"usage":null}
data: {"id":"chatcmpl-8JNDagQPDWjNWOgbUm9u5lRxcmzIw","object":"chat.completion.chunk","created":1699628174,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" upon"}}],"usage":null}
data: {"id":"chatcmpl-8JNDagQPDWjNWOgbUm9u5lRxcmzIw","object":"chat.completion.chunk","created":1699628174,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" a"}}],"usage":null}
...

So how do you deal with this if you are or were planning to use APIM for logging, load balancing, authorization, and throttling? You have a few options.

  1. You can move logging into the application and use APIM only for load balancing, authorization, and throttling.
  2. You can insert a proxy logging solution behind APIM to handle logging of both streaming and non-streaming completions and use APIM only for load balancing, authorization, and throttling.
  3. You can block streaming completions at APIM.

Option 1

Option 1 is workable at a small scale and is a good tactical solution if you need to get something out to production quickly. The challenge with this option is enforcing it at scale. If you have amazing governance within your organization and excellent SDLC maybe you can enforce this. In my experience, few organizations have the level of maturity needed for this. The other problem with this is ideally logging for the purposes of compliance should be implemented and enforced by another entity to ensure separation of duties.

Benefits

  1. Quick and easy to put in place.

Considerations

  1. Difficult to enforce at scale.
  2. Puts the developers in charge of enforcing logging on themselves. Could be an issue with separation of duties.

Option 2

Option 2 is an interesting solution that my peer Shaun Callighan came up. In Shaun’s architecture a proxy-type solution is placed between APIM and AOAI and that solution handles parsing the requests and responses, calculating token usage, and logging the information to an Event Hub. They have even been kind enough to provide a sample solution demonstrating how this could be done with an Azure Function.

Benefits

  1. Allows you to use continue using APIM for the benefits around load balancing, authorization, and throttling.
  2. Supports streaming chat completions.
  3. Provides the logging necessary for compliance and chargebacks for both streaming and non-streaming chat completions.
  4. Centralized enforcement of logging.

Considerations

  1. You will need to develop your own code to parse the responses/responses, calculate chargebacks, and deliver the logs to Event Hub. (You could use Shaun’s code as a starting point)
  2. You’ll need to ensure this proxy does not become a bottleneck. It will need to scale as requests to the AOAI instance scale along with APIM and whatever else you have in path of the user’s request.

Option 3

Option 3 is another valid option (and honestly a simple fix IMO) and may be where some customers end up in the near term. With this option you block the use of streaming completions at APIM with a custom policy snippet like below. If the developers are worried about the user experience, there is always the option to flash a “processing”-like message in the text window while the model processes the completion.

Benefits

  1. Allows you to continue using APIM for logging, load balancing, throttling, and authorization.
  2. No new code introduced.
  3. Centralized enforcement of logging.
  4. No additional bottlenecks.

Considerations

  1. Your developers may hate you for this.
  2. There may be a legitimate use case where stream chat completions are required.

Since Shaun has a proof-of-concept example for option 2, I figured I’d showcase a sample APIM policy snippet for option 3. In the APIM policy snippet below, I determine if the stream property is included in the request body and store the value in a variable (it will be true or false). I then check the variable to see if the value is true, and if so I return a 404 status code with the message that streaming chat completions are not allowed.

        <!-- Capture the value of the streaming property if it is included -->
        <choose>
            <when condition="@(context.Request.Body.As<JObject>(true)["stream"] != null && context.Request.Body.As<JObject>(true)["stream"].Type != JTokenType.Null)">
                <set-variable name="isStream" value="@{
                    var content = (context.Request.Body?.As<JObject>(true));
                    string streamValue = content["stream"].ToString();
                    return streamValue;
                }" />
            </when>
        </choose>
        <!-- Blocks streaming completions and returns 404 -->
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<string>("isStream","false").Equals("true", StringComparison.OrdinalIgnoreCase))">
                <return-response>
                    <set-status code="404" reason="BlockStreaming" />
                    <set-header name="Microsoft-Azure-Api-Management-Correlation-Id" exists-action="override">
                        <value>@{return Guid.NewGuid().ToString();}</value>
                    </set-header>
                    <set-body>Streaming chat completions are not allowed by this organization.</set-body>
                </return-response>
            </when>
        </choose>

If you ignore streaming chat completions and try to use a policy such as this one, the model will complete the completion but APIM will throw a 500 status code back at the developer because the structure of a streaming response doesn’t look like the structure of a non-streaming response and it can’t be parsed using that policy’s logic. This means you’ll be throwing money out of the window and potentially struggling with troubleshooting root cause. TLDR, pick an option above to deal with streaming and get it in place if you’re using APIM for logging today or plan to.

Last but not least, I want to link to a wonderful policy snippet by Shaun Callighan. This policy snippet dumps the trace logs from APIM into the headers returned in the response from APIM. This is incredibly helpful when troubleshooting a 500 status code returned by APIM.

Well folks, that wraps up this short blog post on this Friday afternoon. Have a great weekend and happy holidays!