Granular Chargebacks in Azure OpenAI Service

Yes folks, it’s time for yet another Azure OpenAI Service post. This time around I’m going to cover a pattern that can help with operationalizing the service by collecting and analyzing logging data for proper internal chargebacks to the many business units you likely have requesting the service. You’ll want to put on your nerd hat today because we’re going to need to dive a bit into the weeds for this one.

Let me first address why this post is even necessary. The capabilities provided by AOAI (the Azure OpenAI Service) have the feel of a core foundational technology, almost as necessary as basic networking and PKI (public key infrastructure). The service has uses in almost every portion of the business and, very likely, every business unit is asking you for AI at this point.

Beyond the business demand, the architecture of the Azure OpenAI Service lends itself well to being centralized. Each instance offers the same set of static models and the data sent to and returned from the models is ephemeral. Unless you are creating fine-tuned models (which should be a very small percentage of customers), there isn’t any data stored by the service. Yes, there is default storage and human review of prompts and completions for abuse, but customers can opt out of this process. Additionally, as of the date of this blog, customers do not have access to those stored prompts and completions anyway, so there is no risk of those 30 days of stored prompts and completions being compromised through a failed customer control. Don’t get me wrong, there are legitimate reasons to create business unit specific instances of the service for edge use cases such as the creation of fine-tuned models. There are also good arguments to be made for creating specific instances for compliance boundaries and for separating production from non-production. However, you should be looking at consolidating instances where possible and providing it as a centralized core service.

Now if you go down the route I suggested above, you’ll run into a few challenges. Two of the most significant challenges are throttling limits per instance and chargebacks. Addressing the throttling problem isn’t terribly difficult if you’re using the APIM (Azure API Management) pattern I mentioned in my last post. You can enforce specific limits on a per application basis when using Azure AD authentication at APIM on the frontend and you can use a very basic round robin-like load balancing APIM policy at the backend to scale across multiple Azure OpenAI Service instances. The chargeback problem is a bit more difficult to solve and that’s what I’ll be covering in the rest of the post.

The AOAI Service uses a consumption model for pricing which means the more you consume, the more you pay. If you opt to centralize the service, you’re going to need a way to know how many tokens each app is consuming. As I covered in my logging post, the native logging capabilities of the AOAI service are lacking as of the date of this blog post. The logs don’t include details as to who made a call (beyond an obfuscated IP address) or the number of tokens consumed by a specific call. Without this information you won’t be able to determine chargebacks. You should incorporate some of this logging directly into the application calling the AOAI service, but that logging will likely be application centric where the intention is to trace a specific call back to an individual user. For a centralized service, you’re likely more interested in handling chargebacks at the enterprise level and want to be able to associate specific token consumption back to a specific business unit’s application.

I took some time this week and thought about how this might be done. The architecture below is what I came up with:

Azure OpenAI Service Chargeback Architecture

APIM and APIM custom policies are the key components of this architecture that make chargebacks possible. They are used to accomplish two goals:

  1. Enforce Azure AD Authentication and Authorization to the AOAI endpoint.
  2. Provide detailed logging of the request and response sent to the service.

Enforcing Azure AD authentication and authorization gives me the calling application’s service principal or managed identity identifier which allows me to correlate the application back to a specific business unit. If you want the details on that piece you can check out my last post. I’ve also pushed the custom APIM policy snippet to GitHub if you’d like to try it yourself.

The second goal is again accomplished through a custom APIM policy. Since APIM sits in the middle of the conversation it gets access to both the request from the application to the AOAI service and the response back. Within the response from a Completion or ChatCompletion the API returns the number of prompt, completion, and total tokens consumed by a specific request as can be seen below.

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "This is a test message.",
        "role": "assistant"
      }
    }
  ],
  "created": 1684425128,
  "id": "chatcmpl-7HaDAS0JUZKcAt2ch2GC2tOJhrG2Q",
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 6,
    "prompt_tokens": 14,
    "total_tokens": 20
  }
}
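If you want to pull those numbers out in code, a minimal Python sketch might look like the one below. The function name extract_usage is my own for illustration, and the sample body is a trimmed copy of the response above.

```python
import json

# Trimmed copy of the ChatCompletion response shown above.
RESPONSE_BODY = """
{
  "id": "chatcmpl-7HaDAS0JUZKcAt2ch2GC2tOJhrG2Q",
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 6,
    "prompt_tokens": 14,
    "total_tokens": 20
  }
}
"""


def extract_usage(body: str) -> dict:
    """Return the fields needed for chargeback from a completion response body."""
    doc = json.loads(body)
    usage = doc.get("usage", {})
    return {
        "model": doc.get("model", ""),
        "operation": doc.get("object", ""),
        # Missing counters default to 0 so a partial response does not raise.
        "prompt_tokens": int(usage.get("prompt_tokens", 0)),
        "completion_tokens": int(usage.get("completion_tokens", 0)),
        "total_tokens": int(usage.get("total_tokens", 0)),
    }


usage_record = extract_usage(RESPONSE_BODY)
```

These are the same fields the APIM policy later in this post ships to the Event Hub.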

Awesome, the information we need is there but how do we log it? For that, you can use an APIM Logger. An APIM Logger is an integration with a specific instance of Azure Application Insights or Azure Event Hub. The integration allows you to specify the logger in an APIM policy and send specific data to that integrated service. For the purposes of this integration I chose the Azure Event Hub. The reason being I wanted to allow logging large events (the integration supports up to 200KB messages) in case I wanted to capture the prompt or completion and I wanted the flexibility to integrate with a downstream service for ETL (extract, transform, load).

Setting up the logger isn’t super intuitive if you want to use a managed identity for APIM to authenticate to the Azure Event Hub. To save you the time I spent scratching my head trying to figure out the properties, I created an ARM template you can use to create your own logger. Once the logger is created, you can begin calling it in the APIM policy. Below is an example of the APIM policy I used to parse the request and response and extract the information I was interested in.

        <log-to-eventhub logger-id="chargeback" partition-id="1">@{

                // Read the response body while preserving it (the 'true' argument)
                // so the caller still receives the completion.
                var responseBody = context.Response.Body?.As<JObject>(true);

                return new JObject(
                    // ISO 8601 timestamp so parsing does not depend on server culture
                    new JProperty("event-time", DateTime.UtcNow.ToString("o")),
                    // Application (client) id taken from the caller's Azure AD access token
                    new JProperty("appid", context.Request.Headers.GetValueOrDefault("Authorization",string.Empty).Split(' ').Last().AsJwt()?.Claims.GetValueOrDefault("appid", string.Empty) ?? string.Empty),
                    // Null-conditional access avoids an exception when a field is missing
                    new JProperty("operation", responseBody?["object"]?.ToString()),
                    new JProperty("model", responseBody?["model"]?.ToString()),
                    new JProperty("modeltime", context.Response.Headers.GetValueOrDefault("Openai-Processing-Ms",string.Empty)),
                    new JProperty("completion_tokens", responseBody?["usage"]?["completion_tokens"]?.ToString()),
                    new JProperty("prompt_tokens", responseBody?["usage"]?["prompt_tokens"]?.ToString()),
                    new JProperty("total_tokens", responseBody?["usage"]?["total_tokens"]?.ToString())
                ).ToString();
        }</log-to-eventhub>

In the policy above I’m extracting the application’s client id from the access token generated by Azure Active Directory for access to the Azure OpenAI Service. Recall that I have the other policy snippet I mentioned earlier in this post in place to force the application to authenticate and authorize using Azure AD. I then grab the pieces of information from the response that I would find useful in understanding the costs of the service and each app’s behavior within the AOAI service.
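To make the appid extraction a little less magical, here is a rough Python equivalent of what the policy expression does with the Authorization header. The helper names and the demo token are made up for illustration. Note that the sketch only decodes the token payload and does not validate the signature, which in this pattern is the job of the separate authentication policy.

```python
import base64
import json


def appid_from_authorization(header_value: str) -> str:
    """Return the 'appid' claim from a 'Bearer <jwt>' header, or '' if unavailable.

    This decodes the JWT payload only. It does NOT verify the signature.
    """
    token = header_value.split(" ")[-1]
    parts = token.split(".")
    if len(parts) != 3:
        return ""
    payload = parts[1]
    # JWT segments are base64url without padding, so restore the padding first.
    payload += "=" * (-len(payload) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:
        # Covers invalid base64, invalid UTF-8, and invalid JSON.
        return ""
    if not isinstance(claims, dict):
        return ""
    return str(claims.get("appid", ""))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical, unsigned demo token. A real token comes from Azure AD.
DEMO_APPID = "11111111-2222-3333-4444-555555555555"
demo_token = ".".join([_b64url({"alg": "none"}), _b64url({"appid": DEMO_APPID}), "sig"])
demo_result = appid_from_authorization("Bearer " + demo_token)
```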

Now that the logs are being streamed to the Event Hub, you need something to pick them up. You have a lot of options in this space. You could use Azure Data Factory, a custom function, a Logic App, a SIEM like Splunk, and many others. What you choose to do here really depends on where you want to put the data and what you want to do with it prior to putting it there. To keep it simple for this proof-of-concept, I chose the built-in Azure Stream Analytics integration with Event Hub.

The integration creates a Stream Analytics Job that connects to the Event Hub, does a small amount of transformation in setting the types for specific fields, and loads the data into a Power BI dataset.
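If you go the custom route instead, the transformation is small enough to sketch in a few lines of Python. The example below is not the Stream Analytics job itself. It is a stand-in that casts the token counts (which the policy emits as strings) to integers and totals them per application. The event payloads and app ids are invented for illustration.

```python
import json
from collections import defaultdict

# Invented sample events shaped like the JSON the APIM policy emits.
RAW_EVENTS = [
    '{"appid": "app-a", "model": "gpt-35-turbo", "prompt_tokens": "14", '
    '"completion_tokens": "6", "total_tokens": "20"}',
    '{"appid": "app-b", "model": "gpt-35-turbo", "prompt_tokens": "80", '
    '"completion_tokens": "20", "total_tokens": "100"}',
    '{"appid": "app-a", "model": "gpt-35-turbo", "prompt_tokens": "25", '
    '"completion_tokens": "5", "total_tokens": "30"}',
]

TOKEN_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def to_typed(event_json: str) -> dict:
    """Parse one event and cast the token counters from strings to integers."""
    event = json.loads(event_json)
    for field in TOKEN_FIELDS:
        event[field] = int(event[field])
    return event


def tokens_by_app(events: list) -> dict:
    """Sum total_tokens per application id."""
    totals = defaultdict(int)
    for event in events:
        totals[event["appid"]] += event["total_tokens"]
    return dict(totals)


typed_events = [to_typed(raw) for raw in RAW_EVENTS]
totals = tokens_by_app(typed_events)
```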

Azure Stream Analytics and Event Hub Integration

Once the integration was set up, the requests and responses I was making to the AOAI service began to populate in the Power BI dataset. I was then able to build some really basic (and very ugly) visuals to demonstrate the insights this pattern provides for chargebacks. Each graphic shows the costs accumulated by individual applications by their application id.

Power BI Report Showing Application by Application Costs

Pretty cool right? Simple, easy to implement, and decent information to work from.

Since this was a POC, I cut some corners on the reporting piece. For example, I hardcoded the model pricing into some custom columns. If I were to do this at the enterprise level, I’d be supplementing this information with data pulled from the Microsoft Graph and the Azure Retail Pricing REST API. From the Microsoft Graph I’d pull additional attributes about the service principal / managed identity such as a more human readable name. From the Azure Retail Pricing REST API I’d pull down the most recent prices on a per model basis. I’d also likely store this data inside something like Cosmos or Azure SQL to provide for more functionality. From a data model perspective, I’d envision an “enterprise-ready” data model version of the pattern looking like the below.
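As a rough idea of how the pieces would join together, here is a small Python sketch that turns token totals into a cost per business unit. Everything in the two lookup tables is hypothetical: the prices are placeholders rather than real Azure prices, and the app directory stands in for what you would pull from the Microsoft Graph. It also assumes a single price per model, which is a simplification if prompt and completion tokens are priced differently.

```python
from collections import defaultdict
from decimal import Decimal

# HYPOTHETICAL prices per 1,000 tokens. Real values would come from a pricing source.
PRICE_PER_1K_TOKENS = {
    "gpt-35-turbo": Decimal("0.002"),
    "gpt-4": Decimal("0.06"),
}

# HYPOTHETICAL directory mapping application ids to owners.
APP_DIRECTORY = {
    "app-a": {"name": "Claims Chatbot", "business_unit": "Insurance"},
    "app-b": {"name": "Contract Summarizer", "business_unit": "Legal"},
}


def cost_by_business_unit(usage_rows: list) -> dict:
    """Return the total cost per business unit for the given usage rows."""
    totals = defaultdict(Decimal)
    for row in usage_rows:
        price = PRICE_PER_1K_TOKENS[row["model"]]
        cost = Decimal(row["total_tokens"]) / Decimal(1000) * price
        # Apps missing from the directory are grouped so they are not lost.
        owner = APP_DIRECTORY.get(row["appid"], {"business_unit": "Unknown"})
        totals[owner["business_unit"]] += cost
    return dict(totals)


usage_rows = [
    {"appid": "app-a", "model": "gpt-35-turbo", "total_tokens": 50000},
    {"appid": "app-b", "model": "gpt-4", "total_tokens": 2000},
    {"appid": "app-x", "model": "gpt-35-turbo", "total_tokens": 1000},
]
costs = cost_by_business_unit(usage_rows)
```

Decimal is used instead of float so the totals don’t pick up rounding noise when they are summed across many requests.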

Possible data model

The key challenge I set out to address here was how to get the data necessary to do chargebacks and what I could do with that data once I got it. Mission accomplished!

Well folks, that covers it. I’d love to see someone looking for a side project with more data skills than me (likely any human being breathing air today) build out the more “full featured” solution using a similar data model to what I referenced above. I hope this pattern helps point your organization in the right direction and spurs some ideas as to how you could solve the ETL and analysis part within your implementation of this pattern.

I’m always interested in hearing about cool solutions. If you come up with something neat, please let me know in the comments or reach out on LinkedIn.

Have a great week!

Behavior of Azure Event Hub Network Security Controls

Welcome back fellow geeks.

In this post I’ll be covering the behavior of the network security controls available for Azure Event Hub and the “quirk” (trying to be nice here) in enforcement of those controls specifically for Event Hub. This surfaced with one of my customers recently, and while it is publicly documented, I figured it was worthy of a post to explain why an understanding of this “quirk” is important.

As I’ve mentioned in the past, my primary customer base consists of customers in regulated industries. Given the strict laws and regulations these organizations are subject to, security is always at the forefront in planning for any new workload. Unless you’ve been living under a rock, you’ve probably heard about the recent “issue” identified in Microsoft’s CosmosDB service, creatively named ChaosDB. I’ll leave the details to the researchers over at Wiz. Long story short, the “issue” reinforced the importance of practicing defense-in-depth and ensuring network controls are put in place where they are available to supplement identity controls.

You may be asking how this relates to Event Hub? Like CosmosDB and many other Azure PaaS (platform-as-a-service) services, Event Hub has more than one method to authenticate and authorize access to both the control plane and the data plane. The differences in these planes are explained in detail here, but the gist of it is the control plane is where interactions with the Azure Resource Manager (ARM) API occur and involves such operations as creating an Event Hub, enabling network controls like Private Endpoints, and the like. The data plane on the other hand consists of interactions with the Event Hub API and involves operations such as sending an event or receiving an event.

Azure AD authentication and Azure RBAC authorization use modern authentication protocols such as OpenID Connect and OAuth, which come with the contextual authorization controls provided by Azure AD Conditional Access and the granularity for authorization provided by Azure RBAC. This combination allows you to identify, authenticate, and authorize humans and non-humans accessing the Event Hub on both planes. Conditional access can be enforced to get more context about a user’s authentication (location, device, multi-factor) to make better decisions as to the risk of an authentication. Azure RBAC can then be used to achieve least privilege by granting the minimum set of permissions required. Azure AD and Azure RBAC are the recommended way to authenticate and authorize access to Event Hub due to the additional security features and modern approach to identity.

The data plane has another method to authenticate and authorize into Event Hubs which uses shared access keys. There are a few PaaS services in Azure that allow for authentication using shared access keys, such as Azure Storage, CosmosDB, and Azure Service Bus. These shared keys are generated at the creation time of the resource and are the equivalent of root-level credentials. These keys can be used to create SAS (shared access signatures) which can then be handed out to developers or applications and scoped to a more limited level of access and set with a specific start and expiration time. This makes using SAS a better option than using the access keys if for some reason you can’t use Azure AD. However, anyone who has done key management knows it’s an absolute nightmare that you should avoid unless you really want to make your life difficult, hence the recommendation to use Azure AD for both the control plane and data plane.
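To show how a SAS is derived from a key, here is a short Python sketch of the signing scheme as I understand it for Event Hubs: an HMAC-SHA256 over the URL-encoded resource URI and the expiry, signed with the shared access key. The namespace, hub, policy name, and key below are placeholders, and the now parameter exists only so the example is repeatable. Treat it as an illustration rather than a replacement for the official SDKs.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote, quote_plus


def generate_sas_token(resource_uri, policy_name, policy_key, ttl_seconds=3600, now=None):
    """Build a SAS token string for the given resource URI and access policy.

    Anyone holding policy_key can mint tokens like this one, which is why
    access to the keys needs to be tightly controlled.
    """
    issued_at = int(time.time()) if now is None else int(now)
    expiry = str(issued_at + ttl_seconds)
    encoded_uri = quote_plus(resource_uri)
    # The signed string is the encoded URI, a newline, then the expiry timestamp.
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={policy_name}"


# Placeholder values. None of these refer to a real namespace or key.
demo_sas = generate_sas_token(
    "https://example-ns.servicebus.windows.net/example-hub",
    "demo-policy",
    "not-a-real-key",
    ttl_seconds=3600,
    now=1_700_000_000,
)
```

Notice that nothing in the token ties it to an Azure AD identity, which is why Conditional Access never gets a say when keys or SAS are used.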

Whether you’re using Azure AD or SAS, the shared access keys remain a means to access the resource with root-like privileges. While access to these keys can be controlled at the control plane using Azure RBAC, the keys are still there and available for use. Since the usage of these keys when interacting with the data plane is outside Azure AD, it means conditional access controls aren’t an option. Your best bet in locking down the usage of these keys is to restrict access at the control plane as to who can retrieve the access keys and use the network controls available for the service. Event Hubs supports using Private Endpoints, Service Endpoints, and IP-based restrictions via what I will refer to as the service firewall.

If you’ve used any Azure PaaS service such as Key Vault or Azure Storage, you should be familiar with the service firewall. Each instance of a PaaS service in Azure has a public IP address that exposes that service to the Internet. By default, the service allows all traffic from the Internet and the method of controlling access to the service is through identity-based controls provided by the supported authentication and authorization mechanism.

Access to the service through the public IP can be restricted to a specific set of IPs, specific Virtual Networks, or to Private Endpoints. I want to quickly address a common point of confusion for customers; when locking down access to a specific set of Virtual Networks such as available in this interface, you are using Service Endpoints. This list should be empty because there are very few situations where you are required to use a Service Endpoint now because of the availability of Private Endpoints. Private Endpoints are the strategic direction for Microsoft and provide a number of benefits over Service Endpoints such as making the service routable from on-premises over an ExpressRoute or VPN connection and mitigating the risk of data exfiltration that comes with the usage of Service Endpoints. If you’ve been using Azure for any length of time, it’s worthwhile auditing for the usage of Service Endpoints and replacing them where possible.

Service Firewall options

Now this is where the “quirk” of Event Hubs comes in. As I mentioned earlier, most of the Azure PaaS services have a service firewall with a similar look and capabilities as above. If you were to set the service firewall to the setting above where the “Selected Networks” option is selected and no IP addresses, Virtual Networks, or Private Endpoints have been exempted, you would assume all traffic is blocked to the service right? If you were talking about a service such as Azure Key Vault, you’d be correct. However, you’d be incorrect with Event Hubs.

The “quirk” in the implementation of the service firewall for Event Hub is that it is still accessible to the public Internet when the Selected Networks option is set. You may be thinking, well what if I enabled a Private Endpoint? Surely it would be locked down then right? Wrong, the service is still fully accessible to the public Internet. While this “quirk” is documented in the public documentation for Event Hubs, it’s inconsistent with the behaviors I’ve observed in other Microsoft PaaS services with a similar service firewall configuration. The only way to restrict access from the public Internet is to add at least one entry to the IP list.

Note in public documentation

So what does this mean to you? Well this means if you have any Event Hubs deployed and you don’t have a public IP address listed in the IP rules, then your Event Hub is accessible from the Internet even if you’ve enabled a Private Endpoint. You may be thinking it’s not a huge deal since “accessible” means open for TCP connections and that authentication and authorization still need to occur and you have your lovely Azure AD conditional access controls in place. Remember how I covered the shared access key method of accessing the Event Hubs data plane? Yeah, anyone with access to those keys now has access to your Event Hub from any endpoint on the Internet since Azure AD controls don’t come into play when using keys.

Now that I’ve made you wish you wore your brown pants, there are a variety of mitigations you can put in place to reduce the risk of someone exploiting this. Most of these are taken directly from the security baseline Microsoft publishes for the service. These mitigations include (but are not limited to):

  • Use Azure RBAC to restrict who has access to the shared access keys.
  • Take an infrastructure-as-code approach when deploying new Azure Event Hubs to ensure new instances are configured for Azure AD authentication and authorization and that the service firewall is properly configured.
  • Use Azure Policy to enforce Event Hubs be created with correctly configured network controls which include the usage of Private Endpoints and at least one IP address in the IP Rules. You can use the built-in policies for Event Hub to enforce the Private Endpoint in combination with this community policy to ensure Event Hubs being created include at least one IP of your choosing. Make sure to populate the parameter with at least one IP address which could be a public IP you own, a non-publicly routable IP, or a loopback address.
  • Use Azure Policy to audit for existing Event Hubs that may be publicly available. You can use this policy which will look for Event Hubs with a default action of Allow or Event Hubs with an empty IP list.
  • Rotate the access keys on a regular basis and whenever someone who had access to the keys changes roles or leaves the organization. Note that rotating the access keys will invalidate any SAS, so ensure you plan this out ahead of time. Azure Storage is another service with access keys and this article provides some advice on how to handle rotating the access keys and the repercussions of doing it.
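The audit check described in the list above boils down to a very small rule, sketched here in Python. The dictionary shape (defaultAction and ipRules) is my simplified assumption of what a namespace’s network rule set looks like when exported, and the rule encodes the behavior described in this post as of the time of writing, so verify it against current documentation before relying on it.

```python
def is_publicly_reachable(rule_set: dict) -> bool:
    """Return True if the rule set leaves the Event Hub open to the Internet.

    Per the behavior described in this post, a namespace is exposed when the
    default action is Allow OR when the IP rule list is empty, regardless of
    whether a Private Endpoint exists.
    """
    default_action = str(rule_set.get("defaultAction", "Allow"))
    ip_rules = rule_set.get("ipRules") or []
    return default_action.lower() == "allow" or len(ip_rules) == 0


# Invented example rule sets for illustration.
wide_open = {"defaultAction": "Allow", "ipRules": []}
looks_locked_but_is_not = {"defaultAction": "Deny", "ipRules": []}
locked_down = {"defaultAction": "Deny", "ipRules": [{"ipMask": "127.0.0.1", "action": "Allow"}]}

findings = {
    "wide_open": is_publicly_reachable(wide_open),
    "looks_locked_but_is_not": is_publicly_reachable(looks_locked_but_is_not),
    "locked_down": is_publicly_reachable(locked_down),
}
```

The second case is the one that catches people out: the default action is Deny, yet the empty IP list still leaves the namespace reachable.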

If you’re an old school security person, not much of the above should be new to you. Sometimes it’s the classic controls that work best. 🙂 For those of you that may want to test this out for yourselves and don’t have the coding ability to leverage one of the SDKs, take a look at the Event Hub add-in for Visual Studio Code. It provides a very simplistic interface for testing sending and receiving messages to an Event Hub.

Well folks, I hope you’ve found this post helpful. The biggest piece of advice I can give you is to ensure you read documentation thoroughly whenever you put a new service in place. Never assume one service implements a capability in the same way (I know, it hurts the architect in me as well), so make sure you do your own security testing to validate any controls which fall into the customer responsibility column.

Have a great week!