Hello folks!
I’m back with another entry on the Azure OpenAI Service (AOAI). In my previous posts, I’ve focused on the native security features that Microsoft provides to its customers to secure their instance of the service. However, in this post, I’ll be taking a slightly different approach. I’ll be walking you through a pattern that can be used to supplement those native features using Azure API Management (APIM).
For those who are unfamiliar with APIM, it is Azure’s API Gateway PaaS (platform-as-a-service) offering. Like any good API Gateway, it provides an abstraction layer away from backend APIs, which allows you to add additional authentication/authorization controls, throttling, transform requests, and log information from the requests and responses. In this post, I’ll be covering how the authentication/authorization controls can be used to supplement what is provided natively in AOAI.
I’ve covered authentication in the AOAI in a previous post, refer to that post for the gory details. For the purposes of this post, you need to understand at the data plane it supports both Azure AD authentication/Azure RBAC authorization and authentication with two API keys created when the service is instantiated.

To my knowledge, there is no way to disable the usage of API keys. Moreover, as I’ve discussed in my logging post, it is extremely difficult to trace back to what is using the API keys because the source IP address is masked and the calls aren’t associated with specific API keys or Azure AD identities. This makes it critically important to control who has access to the API keys. In my post on authorization within the service, I cover this conversation in more detail, and yes, it can be done with Azure RBAC.

Controlling access should be your first priority. However, wouldn’t it be great to restrict access to the service to Azure AD authentication only? This is where APIM comes in. APIM is placed between the application calling the AOAI service and the AOAI service. This establishes a man-in-the-middle scenario where APIM can analyze and modify the request and responses between the application and AOAI service.

The image above is an example of this pattern. Here, the calling application is provisioned with either a service principal (running outside of Azure) or a managed identity (running within Azure or integrated with Azure Arc). Instead of pointing the application directly to the Azure OpenAI Service, it is pointed to a custom domain configured on the APIM instance, and the APIM instance is configured to front the Azure OpenAI Service API. My peer Jake Wang put together some wonderful instructions on how to set this piece up in this repository.
Once APIM is set up to pass traffic along to the AOAI service, a custom APIM policy can be introduced to start controlling access. Since the goal is to limit access to the AOAI service to applications using an Azure AD identity, the validate-jwt policy can be used. This policy captures and extracts the JSON Web Token (bearer token) and parses the content within it to verify that the token was issued by the issuer specified in the policy.
The policy would be structured as shown below. In this policy, any request made to the API must include a JWT issued by the Azure AD tenant (you can find your tenant ID here). Additionally, the policy filters to ensure that the token is intended for the Cognitive Services OAuth scope, which AOAI falls under. If the request doesn’t include the JWT issued by the tenant, the user receives a 403.
<!--
This sample policy enforces Azure AD authentication and authorization to the Azure OpenAI Service.
It limits the authorization tokens issued by the organization's tenant for Cognitive Services.
The authorization token is passed on to the Azure OpenAI Service ensuring authorization to the actions within
the service are limited to the permissions defined in Azure RBAC.
You must provide values for the AZURE_OAI_SERVICE_NAME and TENANT_ID parameters.
-->
<policies>
<inbound>
<base />
<set-backend-service base-url="https://{{AZURE_OAI_SERVICE_NAME}}.openai.azure.com/openai" />
<validate-jwt header-name="Authorization" failed-validation-httpcode="403" failed-validation-error-message="Forbidden">
<openid-config url="https://login.microsoftonline.com/{{TENANT_ID}}/v2.0/.well-known/openid-configuration" />
<issuers>
<issuer>https://sts.windows.net/{{TENANT_ID}}/</issuer>
</issuers>
<required-claims>
<claim name="aud">
<value>https://cognitiveservices.azure.com</value>
</claim>
</required-claims>
</validate-jwt>
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
If you followed the instructions in the repository I linked above, you can enforce this policy for the API you created as seen below.

Once the policy is in place, you can test it by attempting to authenticate to the APIM API endpoint and specifying an AOAI API key. In the image below, an attempt is made to call the endpoint with an API key.

Success! Even though the API key is valid, APIM is rejecting the request before it ever reaches the AOAI instance, preventing the API keys from being used.
This pattern also passes the bearer token on to the AOAI service, so the RBAC you configure on your AOAI instance will be enforced. In my post on authorization, I provide some guidance on which built-in RBAC roles make since and which permissions you’ll want to carefully distribute.
What’s even cooler is that now that the application is forced to authenticate using Azure AD, the application ID can be extracted. If there are multiple applications hitting the same AOAI instance, different throttling can be applied on a per-application basis instead of having them share one big pool of request/token allowance at the AOAI service level.
This can be achieved with a policy similar to the one shown below. This policy looks for specific app IDs in the bearer token and applies different throttling based on the application.
<!--
This sample policy enforces Azure AD authentication and authorization to the Azure OpenAI Service.
It limits the authorization tokens issued by the organization's tenant for Cognitive Services.
The authorization token is passed on to the Azure OpenAI Service ensuring authorization to the actions within
the service are limited to the permissions defined in Azure RBAC.
The sample policy also sets different throttling limits per application id. This is useful when an organization
has multiple applications consuming the same instance of the Azure OpenAI Service. This sample shows throttling
rules for two separate applications.
You must provide values for the AZURE_OAI_SERVICE_NAME, TENANT_ID, and CLIENT_ID_APP parameters. You can add multiple
lines for as many applications as you need to throttle.
-->
<policies>
<inbound>
<base />
<set-backend-service base-url="https://{{AZURE_OAI_SERVICE_NAME}}.openai.azure.com/openai" />
<validate-jwt header-name="Authorization" failed-validation-httpcode="403" failed-validation-error-message="Forbidden">
<openid-config url="https://login.microsoftonline.com/{{TENANT_ID}}/v2.0/.well-known/openid-configuration" />
<issuers>
<issuer>https://sts.windows.net/{{TENANT_ID}}/</issuer>
</issuers>
<required-claims>
<claim name="aud">
<value>https://cognitiveservices.azure.com</value>
</claim>
</required-claims>
</validate-jwt>
<choose>
<when condition="@(context.Request.Headers.GetValueOrDefault("Authorization","").Split(' ').Last().AsJwt().Claims.GetValueOrDefault("appid", string.Empty).Equals("{{CLIENT_ID_APP1}}"))">
<rate-limit-by-key calls="1" renewal-period="60" counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization","").Split(' ').Last().AsJwt().Claims.GetValueOrDefault("appid", string.Empty))" increment-condition="@(context.Response.StatusCode == 200)" />
</when>
</choose>
<choose>
<when condition="@(context.Request.Headers.GetValueOrDefault("Authorization","").Split(' ').Last().AsJwt().Claims.GetValueOrDefault("appid", string.Empty).Equals("{{CLIENT_ID_APP2}}"))">
<rate-limit-by-key calls="10" renewal-period="60" counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization","").Split(' ').Last().AsJwt().Claims.GetValueOrDefault("appid", string.Empty))" increment-condition="@(context.Response.StatusCode == 200)" />
</when>
</choose>
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
While the above is impressive, it only works if the application is restricted from direct access to the Azure OpenAI Service. To achieve this, I recommend creating a Private Endpoint for the AOAI service and wrapping a Network Security Group around the subnet (NSGs are now supported for private endpoints) to block access to the resources within the subnet to anything but the APIM instance. Keep in mind that the APIM instance needs to be able to access resources within the virtual network, which means that an APIM needs to be deployed in internal mode. The architecture could look similar to the image below.

One thing to note is that if access is blocked as described above, it will break the AOAI studio feature within the Azure Portal. This is because calls to the data plane of the AOAI service are now blocked. A workaround could be to use a jump host or shared server if you need to continue supporting that feature. However, that opens up the risk that someone could write some code while on that machine and use the API keys.
Let me sum up what we learned today:
- APIM policies can be used to enforce Azure AD authentication and can block the use of API keys.
- You must lock down the Azure OpenAI Service to just APIM to make this effective. Remember this will break access to the Studio within the Azure Portal.
- Since you’re forcing Azure AD authentication, you can use the application id to add custom throttling.
That’s all for this post. The policy samples used in this blog have been uploaded to this repository on GitHub. Feel free to experiment with them and build upon them. If you end up building upon them and doing anything interesting, do reach out and let me know. I’m always interested in geeking out! In my next post, I’ll cover how to use an APIM policy to create custom logging that can be delivered to an Event Hub and consumed by the upstream service of your choice. Have a great week!