Entra ID – Deep Dive – Protocol Primer – Part 2

This is part of my series on Microsoft Entra ID:

  1. Entra ID – Deep Dive – The Basics – Part 1
  2. Entra ID – Deep Dive – Protocol Primer – Part 2

Welcome back folks. Today I’ll be continuing my deep dive series into Entra. In my last post I went over the basics of Entra ID covering what it is at a high level and how it handles human and non-human identities. One of the features of Entra ID that I highlighted in that post was that it provides authentication services. It is capable of providing authentication of a human or non-human through older protocols like Kerberos (don’t get me started on this feature or else I’ll spend the whole post ranting) and LDAP (through Entra ID Domain Services, another service I hate), but also more modern protocols such as SAML and OIDC (OpenID Connect). Before I dive into Microsoft’s implementation of OIDC, I figured it was a good time to do a light protocol primer (primarily for my own benefit because I can only re-read the RFC so many times before it stops being fun. Yeah I find it fun to read a good RFC, so what?).

What is OAuth?

You are probably thinking, “Why the hell are we talking about OAuth?” We need to talk about OAuth (Open Authorization) because OIDC is built on top of the OAuth protocol. If you have a basic understanding of OAuth, then OIDC makes a lot more sense. I’m not going to try to make you an expert, because to make you an expert I’d need to expert which I am very very very far from. Instead, I’m going to give you the basics. If you want a better/smarter explanation, start with the RFC(s) and then take a read through Vittorio Bertocci’s (an absolute legend) many articles, ebook, and videos online.

There are a lot of misconceptions out there where folks will talk about OAuth authentication, which is not a real thing. OAuth itself exists as an authorization protocol to provide a framework (lots of SHOULDs/COULDs and not a ton of MUSTs in that RFC) for how applications can get limited access to a user’s data based around the user’s consent, aka delegation.

The protocol refers to this limited access as a scope. The assignment of a specific scope to an application gives you the ability to do delegation vs impersonation. In the latter, the application will typically act as you with your full permission set vs with delegation you grant consent for the application to access a subset of your data with a more restricted set of permissions. A good example would be delegating the application the read permission over your email vs the reading, writing new emails, and deleting child emails.

When it comes to the whole process of a user delegating a scope of access to an application, a number of different roles are involved. These roles include:

  • Client
  • Resource Owner
  • Authorization Server
  • Resource Server

The client is an application that needs to access some data. Within the protocol it’s important to divide applications into a few different buckets, because the protocol supports them in different ways as I’ll cover in a bit. The standard breaks them into three buckets: web applications, browser-based applications, and native applications. I’m going to keep it simple and break it into two which include web applications and non-web applications.

Clients can be either public clients (non-web apps) or confidential clients (web apps). Confidential clients have some type of credential they use to authenticate themselves to the authorization server where as public clients do not (because their code runs on the user’s machine so there is no way to secure the credential). Clients must register with the authorization server either through dynamic registration (which Entra ID does not support today) or through some other type of process. Registration, at a minimum, will include providing the authorization server with a redirectUri, which grant types it will use, and whether it’s a public or confidential client. The client is then issued a unique identifier called a client_id and optionally a client_secrete if a confidential client. We’ll see an example of how Entra does it later on this series. Examples of clients could be applications you develop, third-party applications you integrate with, or Microsoft-native applications like Microsoft Teams.

The resource owner is the user or organization that owns the data the client wants to access and is the entity that is capable of granting access to that data through a consent process. Consent is a major focus in OAuth since it relies on delegation of a specific scope of access to the data. Consent is the process of the resource owner approving that delegation. Resource owners in the Entra ID world are going to be business units whose employees are represented as user objects in the tenant.

The authorization server is the role that glues all other roles together. This is the server that will authenticate the user (OAuth doesn’t care how), get the user’s consent for the client to access the data, and issues an access token to the client. Entra ID fulfills this role in the Microsoft cloud world.

The resource server hosts the resource owner’s data. It will consume the access token obtained by the client from the authorization server and allow or deny access to the data. Resource servers in the Microsoft world could be your custom built application or the Microsoft Graph API.

The RFC has a basic diagram which does a good job explaining the flow at a high level.

High level OAuth flow

You’ll notice the the term authorization grant in the above image. An authorization grant represents the resource owner’s authorization (delegation) of a specific scope of access to their data and is used by the client to get an access token which the resource server consumes and approves/denies access. In the base specification for OAuth 2.10, there are three types of grants (there are a ton of extension grants, some of which we’ll cover in this series) which include the authorization code grant, the refresh token grant, and the client credentials grant.

Before I describe the grant types, it’s worth calling out that I’m going to be talking specifically about OAuth 2.1 (which is still a draft RFC right now). OAuth 2.1 seeks to address a lot of the security issues with OAuth 2.0. In OAuth 2.0 there were a bunch more grant types including resource owner credentials flow and the implicit flow there are somewhat of security nightmares. OAuth 2.1 removes those grant types and the official spec sticks to the three I described above while adding an additional requirement for PKCE Proof-Key for Code Exchange) for both public AND confidential clients. PKCE helps to address authorization code interception attacks. This Okta article does a great job describing the security benefit brings. I’ll demonstrate this with MSAL in a later post. Now back to the authorization grant types.

The authorization code grant type involves sending the resource owner to the authorization server to authenticate and consent to the client’s access of their data, returning an authorization code to the client, and the client exchanging that with the authorization server for an access token. This is going to be your go to grant type any delegation use case. An example of this would be an application accessing my data in a storage account a storage account that belongs to me.

Authorization Code Flow

Next up is the client credentials flow grant type. In this flow there is no user consent because the data the client is trying to access is under its control. Essentially, it uses its own identity context to access the data because it’s already been authorized to do so. An example here would be an application pulling Entra ID sign-in logs from the MS Graph API.

Client Credentials Flow

Lastly, we have the refresh token grant. This grant type is used by the client to exchange a refresh token for a fresh access token. Access tokens must be short lived (typically around an hour). Instead of having the resource owner go through the whole authentication and consent process again, the client can exchange its longer living refresh token (if it requested one) for a new access token of the same or lesser scope.

In addition to grant types above there are extension grant types. The one that will be relevant to this series is the jwt bearer type, or more formally the JSON Web Token (JWT) profile. In the Microsoft world, you’ll see this referred to as the on-behalf-of flow. This is the flow that Microsoft will use for any multi-hop OAuth. There is also another newer grant type to be aware of which is the token exchange flow. This has a similar use case as the jwt bearer flow for multi-hop OAuth but isn’t limited to JWTs and provides additional information in the access token which can be very helpful in identifying client (actor) vs the resource owner (subject) in the access token. Entra doesn’t support this flow to my understanding, so you’ll be using jwt-bearer instead for multi-hop flows as we’ll see in a future post.

In an authorization request will look something like the below:

GET /authorize?response_type=code&client_id=s6BhdRkqt3&state=xyz
&redirect_uri=https%3A%2F%2Fclient%2Eexample%2Ecom%2Fcb
&code_challenge=6fdkQaPm51l13DSukcAH3Mdx7_ntecHYd1vi3n0hMZY
&code_challenge_method=S256&scope=User.Read HTTP/1.1
Host: server.example.com

In the above example we see the client is requesting the authorization code grant type, is specifying its client id and its redirect_uri (which were established during client registration), a code challenge (for PKCE), and the scope of access it is requesting.

Access tokens come in a few flavors which you can read about in the RFC. The most common type of token is a bearer token. The bearer token is exactly what it sounds like, whoever bears the token holds the power! Bearer tokens are typically JWTs (JSON Web Tokens). While RFC doesn’t specifically require the access token to be cryptographically signed, the ones that Entra ID issues are. The public key used to verify the signature can be obtained for Entra ID from a public metadata endpoint we’ll see later.

Here is a sample access token issued by Entra:

{
"typ": "JWT",
"alg": "RS256",
"kid": "ABC123XYZ789KeyIdentifier"
}
{
"aud": "api://backend-app-client-id",
"iss": "https://login.microsoftonline.com/tenant-id/v2.0",
"iat": 1780963927,
"nbf": 1780963927,
"exp": 1780968246,
"aio": "AaQAW/8cAAAA...sessionData...",
"azp": "frontend-app-client-id",
"azpacr": "1",
"name": "John Doe",
"oid": "user-object-id-guid",
"preferred_username": "john.doe@example.com",
"rh": "1.AbcA...refreshTokenHash...",
"scp": "user_impersonation",
"sid": "session-id-guid",
"sub": "subject-claim-unique-identifier",
"tid": "tenant-id-guid",
"uti": "unique-token-identifier",
"ver": "2.0",
"xms_ftd": "xEyJj...federationMetadata..."
}

There are a few important endpoints the client needs to know about for the authorization server. This includes the authorization endpoint (where the resource owner is sent to authenticate and consent) and the token endpoint (where the client obtains an access token). These can be retrieved via a metadata endpoint. This is actually how Entra ID does it as we’ll see in a future post.

Ok, with that you should now have a high level understanding of OAuth and be aware of its role as an authorization protocol. Key in on that word, authorization. When I perform an OAuth flow I get an access token back to my app that I can use to access a resource owner’s data, but I would still need to authenticate the user to my application and get some basic profile information via another means. In comes OIDC.

What is OpenID Connect?

Like the prior section, my goal is give you a primer. If you want the gory details, take a read through the specification (another tolerable if not enjoyable read). Microsoft and Auth0 have solid one pagers if reading specifications isn’t your style.

The OIDC protocol is built on top of the OAuth (Open Authorization) protocol to provide an identity layer and authentication layer. It gives us the means get some assurance that the user is who they say they are and get some basic information about the user.

Within the OpenID protocol there are three roles that exist. These include:

  • End User
  • RP (Relying Party)
  • OP (OpenID Provider

Since the protocol is built on top of the OAuth protocol, these roles will map nicely to the OAuth roles as we’ll see.

We first have the end user. The end user is the human participant that will be access our application. They are very often also the resource owner for any data we may want to access about them as we’ll see later.

Next, we have the RP. The RP is the application that is requesting end user authentication and claims (or data/attributes) about the user. This will be an OAuth client application.

Finally we have the OP. The OP is the server capable of authenticating the user and providing claims about the user’s identity. This role is fulfilled by the OAuth authorization server in all (I don’t believe their are exceptions, but feel free to correct me) instances.

The OP will issue a security token referred to as an id token with claims about the authentication of the end user, including claims about the user.

This is another instance where the specification did a great job with a high level flow diagram.

High-level OIDC Flow

The structure of the authentication request is almost identical to the structure of an authorization request in OAuth.

GET /authorize?response_type=code&client_id=s6BhdRkqt3
&redirect_uri=https%3A%2F%2Fclient%2Eexample%2Ecom%2Fcb
&scope=User.Read+openid+profile
&state=random-state-value
&nonce=random-nonce-value
&code_challenge=IFrWuREBBR_QJ39q5Ts4
&code_challenge_method=S256

Notice above that we have an added state and nonce. The state helps to provide CSRF (cross-site request forgery) attacks, such as fooling the victim into accessing an attacker’s account to get them to upload data or perhaps purchase things for the attacker’s account. The nonce helps to mitigate id token replay attacks (app validates the nonce in the id token matches what it expects for the user’s session). Now the major things to pay attention to is the additional scopes. Here we see the openid and profile scopes. The openid scope tells the OP the client is looking for an id token. The profile scope is an optional scope which tells the OP to include additional claims in the id token such as the user’s name, preferred_username, and the like.

Once the RP (client / application) exchanges is authorization code (think authorization code grant) to the OP/authorization server, it returns back the an id token in addition to the access token. The application can then use the claims in the id token to identify the user, get information into how the user authenticated, get group information, and anything else you can stuff into the claims. It gives the application context about the user.

Below is an example id token’s payload. ID tokens follow the JWT standard and are cryptographically signed by a private key held by the OP. Clients will need to verify the signature using the OPs public key which is usually published in the OIDC discovery endpoint which looks something like this https://{issuer}/.well-known/openid-configuration. We’ll see an example when we break down Entra’s implementation.

{
"iss": "http://exmaple.com.com",
"sub": "123456",
"aud": "myclientid",
"exp": 1311281970,
"iat": 1311280970,
"name": "Homer Simpson",
"given_name": "Homer",
"family_name": "Simpson",
"gender": "male",
"birthdate": "2025-10-31",
"email": "homersimpson@example.com",
"picture": "http://example.com/homersimpson/me.jpg"
}

Your takeaways

At this point you should have a reasonably decent high level understanding of OAuth/OIDC. If you are already an “expert” you likely snorted milk through you nose reading my shitty explanation. What I mainly want you to take away from this is that OAuth is an authorization protocol with OIDC providing an authentication layer nicely on top. I like to think of OAuth as the cake with OIDC as the frosting. That top layer doesn’t work without the bottom layer (be quiet you straight frosting eaters) and the bottom layer is very bland and incomplete without the top layer.

In my next post I’ll walk through building out the required components in Entra ID for the frontend application. You’ll read and recognize properties that translate directly back to these two protocols. Other properties may not have the same name, but you’ll understand why they exist.

Alright, my brain is fried. Enjoy the weekend!

Entra ID – Deep Dive – The Basics – Part 1

Entra ID – Deep Dive – The Basics – Part 1

This is part of my series on Microsoft Entra ID:

  1. Entra ID – Deep Dive – The Basics – Part 1
  2. Entra ID – Deep Dive – Protocol Primer – Part 2

Yeah… You read that right. I’m finally biting the bullet and doing a series on Entra ID. There are some cobwebs there, but once upon a time I was a half-baked “identity guy”. Nah, I’m not returning to that place, but I have had to visit it more frequently over the past few months. With the growing use of agents, identity has one again become of those things that apparently everyone is a so called “expert in”. I aint no expert, but I have been digging into the chatter around the implications of agents into the identity world. Given my employer, that has been spending some time in the Entra ID Agent Identity space.

Digging into that demanded I go back to some of the basics of Entra ID such as the purpose of an application registration versus a service principal and the request flow when authenticating a user to an application using OIDC (OpenID Connect) or executing an on-behalf-of flow in OAuth. I had conceptual knowledge of the above, but it was too ivory tower. I needed to eat the dog food and do it. After spending a few weeks reading through the documentation, reviewing captured requests and responses, reviewing RFCs, and poorly coding some applications to exercise on the concepts I figured it was time to brain dump what I learned before I get distracted with some new shiny widget and forget 60% of it.

So yeah… that’s where this series is coming from. Hopefully some folks out there draw some value out of it beyond serving as a refresher for my neural pathways.

With that rambling out of the way, let’s get to it.

WTF is Entra ID?

Many years ago when Microsoft first introduced Azure Active Directory my peers and I scratched our heads with this question. Given the name, the initial assumption was it was the Windows Active Directory “killer” (stop trying to make that happen, it aint gonna happen). That seems to have been a pretty common assumption given what I’ve heard from customers over the years as I sat in the vendor space and is likely why the name was shifted a few years back to Entra ID. Either that or marketing needed to justify their existence with another product rename.

If you want the professional explanation as to what Entra ID is you can read the public documentation and bask in the marketing mumbo jumbo. Since you’re here, you’re going to get the less fancy and quick and dirty Matt Felton explanation. My take is that Entra ID does a lot of shit, but at its core it is:

  • An identity store for the identities of human and non-human security principals, their attributes, and their credentials.
  • An authentication service which can act as an OAuth Authorization Server, OIDC identity provider, SAML IdP / SP, and even Kerberos / LDAP provider (gross)

It provides these core services to Microsoft’s cloud services such as M365, Azure, Dynamics 365, and whatever other clouds Microsoft has that I’m missing. It can be extended to provide these services to other applications by integrating with it via one of the supported protocols like we’ll see in further posts in this series.

Entra ID is divided into separate tenants which represent unique identity boundaries. Services for a given customer (such as an Azure subscription) are associated to an Entra ID tenant which acts as the identity boundary for those resources. Tenants can trust other tenants to facilitate cross tenant accessing of resources through features like Entra ID External ID.

The identity store piece of the Entra ID service is where users, groups, many types of service principals, applications, and device identities are stored. Similar to an LDAP (hell may be LDAP under the hood, who the hell knows) each of these objects has a schema with specific properties that are consumed for the other services Entra provides (as we’ll see later).

That should be enough context to give you a very general idea of what Entra ID is. Like I mentioned above, it does a lot of shit (conditional access, privileged identity management, etc) that isn’t relevant to the point of this series and that I’m not going to discuss. If you can hammer it in your head those two core functions, it will be enough to get you through this series.

Humans vs Non-Humans

Within Entra ID we have two primary objects that represent humans and non-humans. For humans we have the user identity. For non-humans we have service principals. Service principals come in many flavors including the base service principal (consider legacy), applications, managed identities, Agent Identity Blueprints, and Agent Identities (likely others I’m missing), Agent Users. The way I like to think about the many types of service principals (probably incorrect but I don’t care) is that the base service principal is the parent object class and things like managed identities, agent identity blueprints, and agent identities are children of that object class which look a bit different. Agent Users are a bit weird and I haven’t delved into them much, however, I like to think of it as a child of the base user object class.

The other object type that’s important to understand is the application object (typically referred to as the app registration). The application object is interesting (and can be confusing). It exists to represent a globally unique representation of an application. For a given application, you only ever have a single application object which exists in the tenant it was registered. What’s helped me is to think of the application object as the OAuth client registration. That’s probably not completely correct, but it helps ground you in its purpose. The application object can have a credential (secret, certificate, federated) which allows it to authenticate to Entra ID (think OAuth confidential client). It also contains information critical to OAuth such as the client id (appid), the audience (identifierUris), the OAuth scopes it supports (oauth2PermissionScopes), and redirectUris (when using interactive OAuth flows).

Below you’ll see an example of an application object for an application that is acting as a backend API to the demo solution I put together for this series.

Application Demo backend app for Entra authentication already exists and its id is 23f7d0f0-85e6-453a-b0bb-723b65ac7958
{
"id": "23f7d0f0-85e6-453a-b0bb-723b65ac7958",
"deletedDateTime": null,
"appId": "22d2ff53-9442-404c-8da5-01c2e135532d",
"applicationTemplateId": null,
"disabledByMicrosoftStatus": null,
"createdByAppId": "04b07795-8ddb-461a-bbee-02f9e1bf7b46",
"createdDateTime": "2026-06-08T23:46:22Z",
"displayName": "Demo backend app for Entra authentication",
"description": "This app is used to demonstrate a backend API that uses the user's identity context to fetch a user's story'",
"groupMembershipClaims": null,
"identifierUris": [
"api://22d2ff53-9442-404c-8da5-01c2e135532d"
],
"isDeviceOnlyAuthSupported": null,
"isDisabled": null,
"isFallbackPublicClient": false,
"nativeAuthenticationApisEnabled": null,
"notes": null,
"publisherDomain": "XXXXXXXX.onmicrosoft.com",
"serviceManagementReference": null,
"signInAudience": "AzureADMultipleOrgs",
"tags": [],
"tokenEncryptionKeyId": null,
"uniqueName": null,
"samlMetadataUrl": null,
"defaultRedirectUri": null,
"certification": null,
"optionalClaims": null,
"servicePrincipalLockConfiguration": null,
"requestSignatureVerification": null,
"addIns": [],
"api": {
"acceptMappedClaims": null,
"knownClientApplications": [],
"requestedAccessTokenVersion": 2,
"oauth2PermissionScopes": [
{
"adminConsentDescription": "Allow the application to impersonate the signed-in user to access their user story.",
"adminConsentDisplayName": "Impersonate user",
"id": "00000000-0000-0000-0000-000000000001",
"isEnabled": true,
"type": "User",
"userConsentDescription": "Allow the application to impersonate you to access your user story.",
"userConsentDisplayName": "Impersonate user",
"value": "user_impersonation"
}
],
"preAuthorizedApplications": []
},
"appRoles": [],
"info": {
"logoUrl": null,
"marketingUrl": null,
"privacyStatementUrl": null,
"supportUrl": null,
"termsOfServiceUrl": null
},
"keyCredentials": [],
"parentalControlSettings": {
"countriesBlockedForMinors": [],
"legalAgeGroupRule": "Allow"
},
"passwordCredentials": [
{
"customKeyIdentifier": null,
"displayName": null,
"endDateTime": "2028-06-08T23:46:45.7547438Z",
"hint": "NnW",
"keyId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX",
"secretText": null,
"startDateTime": "2026-06-08T23:46:45.7547438Z"
}
],
"publicClient": {
"redirectUris": []
},
"requiredResourceAccess": [
{
"resourceAppId": "e406a681-f3d4-42a8-90b6-c2b029497af1",
"resourceAccess": [
{
"id": "03e0da56-190b-40ad-a80c-ea378c433f7f",
"type": "Scope"
}
]
}
],
"verifiedPublisher": {
"displayName": null,
"verifiedPublisherId": null,
"addedDateTime": null
},
"web": {
"homePageUrl": null,
"logoutUrl": null,
"redirectUris": [],
"implicitGrantSettings": {
"enableAccessTokenIssuance": false,
"enableIdTokenIssuance": false
},
"redirectUriSettings": []
},
"spa": {
"redirectUris": []
}
}

Now comes the importance of the service principal. An application object will typically have an associated service principal (not much use without it) which acts as the security principal for the application in the given Entra ID tenant. Applications have one application object (or app registration) but many service principals (if it’s multi-tenant). Think of the service principal as the “stub” identity representing the application in the Entra ID tenant. Permissions are granted to the service principal and the application uses the service principal to exercise those permissions to do things such as querying the Microsoft Graph API.

If you build a multi-tenant application like I’ve done here, other Entra ID tenants can create a service principal for the application in their tenant allowing the application to possibly authenticate those users and access resources in that Entra ID tenant.

The official documentation likes to refer to the application object as the template for the application and the service principal as the security principal. I think that’s a pretty damn good single sentence explanation.

Below is an example of the service principal for the frontend application I built for this series. You can see the type of service principal (servicePrincipalType) in this scenario is application because this service principal has an associated application object (app registration).

{
"id": "ece073e5-744e-4d70-b79d-4887e1dd008f",
"deletedDateTime": null,
"accountEnabled": true,
"alternativeNames": [],
"appDisplayName": "Demo frontend app for Entra authentication",
"appDescription": "This app is used to demonstrate a frontend application where a user authenticates using Entra ID authentication via OIDC",
"appId": "afbd7539-a21f-4d11-93a3-490017032fb7",
"applicationTemplateId": null,
"appOwnerOrganizationId": "6c80de31-d5e4-4029-93e4-5a2b3c0e1299",
"appRoleAssignmentRequired": false,
"createdByAppId": "04b07795-8ddb-461a-bbee-02f9e1bf7b46",
"createdDateTime": "2026-06-08T23:45:54Z",
"description": null,
"disabledByMicrosoftStatus": null,
"displayName": "Demo frontend app for Entra authentication",
"homepage": null,
"isDisabled": null,
"loginUrl": null,
"logoutUrl": null,
"notes": null,
"notificationEmailAddresses": [],
"preferredSingleSignOnMode": null,
"preferredTokenSigningKeyThumbprint": null,
"replyUrls": [
"http://localhost:8100/callback"
],
"servicePrincipalNames": [
"afbd7539-a21f-4d11-93a3-490017032fb7"
],
"servicePrincipalType": "Application",
"signInAudience": "AzureADMultipleOrgs",
"tags": [],
"tokenEncryptionKeyId": null,
"samlSingleSignOnSettings": null,
"addIns": [],
"appRoles": [],
"info": {
"logoUrl": null,
"marketingUrl": null,
"privacyStatementUrl": null,
"supportUrl": null,
"termsOfServiceUrl": null
},
"keyCredentials": [],
"oauth2PermissionScopes": [],
"passwordCredentials": [],
"resourceSpecificApplicationPermissions": [],
"verifiedPublisher": {
"displayName": null,
"verifiedPublisherId": null,
"addedDateTime": null
}
}

What are we going to build?

Now that you have the bare bones basics of Entra, you likely want to understand how an application would go about using it for authentication and authorization. While you might not be doing this now and it may not seem relevant, it will become very relevant to you if you begin building agents in Microsoft’s clouds through Microsoft Foundry, CoPilot Studio, or the 18 other random services Microsoft allows agents to be built. This will also be relevant to you if you’re going to consume Microsoft resources (such as Azure) from other clouds through an application or an agent. So yeah, you should understand what this looks like to do. The whole Entra ID Agent Identity feature builds on these foundational pieces.

To see these concepts in action I’m going to walk through a very simplistic solution with a frontend website and backend API in Python. The frontend website is built using the Flask Framework and the backend API is built with fastapi.

Demo application architecture

The frontend website will use Entra ID to authenticate the user and will be issued an OIDC id token to identify the user. The frontend will get some information from the user from the id token and access token it receives from Entra and will make additional calls to the Microsoft Graph API using the client credentials flow to grab other attributes of the user’s identity. I’ll also show how to include the user’s Entra ID group information in the id or access token and how to handle nested group membership.

The frontend will have a link on it to call the /story endpoint of the backend API. The story endpoint will pull a pre-built AI generated story about the user which has been uploaded toblob storage in an Azure Storage Account. The backend API will use the on-behalf-of flow to access the storage account as the user to pull the user’s specific story.

This will demonstrate some of the most common flows including authentication, OAuth client credentials flow, and OAuth on-behalf-of (or jwt bearer). I’ll show you how the id tokens and access tokens look in different scenarios pointing out the relevant claims and how they’re used upstream. I’ll even share some process flows so you understand what does what in a given flow.

My primary goal here is give you the basics so you can walk away with a more solid understanding of what’s happening under the hood and how Entra ID has decided to implement OIDC and OAuth. I can’t stress how helpful this will be for you once you start diving into the agent identity space (which I’ll be covering after this series).

Summing it up

Ok, so you know what you’re in for. This series is gonna be relatively deep in the weeds so bring your favorite caffeinated beverage for future posts. I’m doing everything direct with the REST APIs because I want to show you the gory details. No pretty SDKs for you. If you’ve had a “conceptual” idea of how this works without the implementation specifics (like I did before I went down this rabbit hole) this series should help to fill those gaps.

In the next post I’ll walk through setting up Entra for the frontend website, authenticating a user, and exploring the id token and access token. The post following that will walk through using the application’s identity context to get more information about the user from the Microsoft Graph API such as nested group membership, then I’ll finish up the series by walking through the on-behalf-of flow with the backend API to show you how to carry the user’s identity context through the application to the destination resource down the line.

See you next post!

Microsoft Foundry – BYO AI Gateway – Part 2

Microsoft Foundry – BYO AI Gateway – Part 2

This is part of my series on Microsoft Foundry:

  1. Microsoft Foundry’s Evolution
  2. Microsoft Foundry BYO AI Gateway (BYO Model) – Part 1
  3. Microsoft Foundry BYO AI Gateway (BYO Model) – Part 2
  4. Microsoft Foundry BYO AI Gateway (BYO Model) – Part 3
  5. Microsoft Foundry Publishing Foundry Agents to Microsoft Teams – Part 1
  6. Microsoft Foundry Publishing Foundry Agents to Microsoft Teams – Part 2

Hello again! Today I’m going to continue my series on Microsoft Foundry’s new support for the BYO AI Gateway. In my past few posts I’ve walked through the evolution of Foundry and covered at a high level what an AI Gateway is and the problem this feature solves. In this post we’re gonna get down and dirty with the technical details on setting this up within Microsoft Foundry. I’ll do a follow-up post to focus on the APIM (API Management) configuration. Grab your coffee and put on your thinking music (for me that is some Blink and Third Eye Blind. Yeah, I’m old.).

Let’s get to it!

Current State Architecture

My customer base is primarily in the regulated industry so most of my customers are still at the experimentation state with the Foundry Agent Service. Given these customers have strict security requirements they are largely using the agent service with the standard agent configuration. In this configuration the outbound traffic (subsets of it, but that is a much larger conversation) can be tunneled through the customer virtual network for centralized logging, mediation, and facilitating access to private resources (again, with limitations today) through what the product group calls VNet injection but I’d say is more closely described as VNet integration via a delegated subnet. Threads (conversations in v2 agents) and agent metadata are stored in a Cosmos DB, vector stores created by an agent from tools such as the File Search tool are stored in AI Search, and files uploaded to the Foundry resource by users are stored in a Storage Account. These resources are all provisioned by the customer into the customer subscription and fully managed by the customer (RBAC, encryption, HA settings, etc). Private Endpoints for each resource are created within the customer’s virtual network and made accessible from the agent delegated subnet. The whole environment looks similar to what you see below.

Foundry Agent Service – Standard Agent Configuration with VNet Injection

As I covered in my last post, as of the date of this post Foundry native agents can only consume models deployed to their own Foundry resource. This creates an issue for customers wanting the governance of the models, visibility into the use of the LLMs, and improvements security posture and operational optimizations an AI Gateway can provide when it sits between the agent and the model. For now, customers are working around doing this using what I refer to as external agents. External agents run outside of Microsoft Foundry on customer-managed compute like an on-premises Kubernetes cluster or an Azure Function deployed to the customer subscription. The downfall of this direction is these external agents live on compute customers have to manage and can’t access many of the tools available to Foundry-native agents. This is the problem the BYO AI Gateway feature is attempting to fix.

No BYO Gateway vs BYO Gateway

Foundry resource architecture

Here is where the new connection type introduced in Foundry comes to the rescue. Before I dive into the details of that, I think it’s helpful to level set a bit on the resource hierarchy within Foundry. At the top is the top-level Azure resource referred to as the Foundry service which under the hood is a Cognitive Services account. The relevant resources for this discussion are below the account resource and are projects, deployments, and connections. Projects serve a few purposes with two of them being logical boundaries around connections (at the management plane) and agents (at the data plane) provisioned under the projects. Deployments of models (such as GPT-5) are children of the account and are made available to all projects within the account. The account can also have connections objects which can be shared across projects.

Relevant resource hierarchy

For the purposes of this discussion, I’m going to focus on the connection objects. Connection objects can be created at the account level and project level as discussed above. In the standard agent configuration, you’ll create a number of different connections during setup including connections to Cosmos, AI Search, and Azure Storage. Additional common connections could be to an App Insights instance for tracing or a Grounding With Bing Search resource to use with the Grounding with Bing tool. Connection objects will contain some type of pointer, like a URI and a credential. That credential is usually API Key, some Entra ID-based authentication mechanism, or general OAuth.

Connections are created at the account level when the Foundry account itself needs to access them. This could be for the usage of Content Understanding, to a Key Vault for storing connection secrets (API keys) in a customer subscription, an an App Insights instance used for tracing. From what I’ve observed, you will create connections at the account level if they need to be shared across all projects OR they’re used by the Foundry resource in general vs some type of project construct. Connections used by projects can also be created at the project level. When you provision a standard agent for example, you’ll create connection objects to the Cosmos DB, Storage Account, and AI Search resources mentioned above. The new category of connections for this post will be created at the project level. I’d had mixed behavior with how effectively connection objects at the account can be used downstream by the projects.

APIM and Model Gateway Connections

The BYO AI Gateway feature uses two new types of connection categories: ApiManagement and ModelGateway. These objects are the glue that allow the Foundry native agents to route requests for models through an AI Gateway. When we’re connecting to an APIM instance, you should ideally use the ApiManagement category and when you’re connecting to a third-party category you’ll use the ModelGateway category.

As of the date of this blog post, these connection objects have the following schema (relevant properties to this discussion only):

name: The name of the connection (needs to be less than 60 characters in my testing)
properties: {
category: ApiManagement or ModelGateway
target: The URI you want the agent to connect to
authType: For ApiManagement this can be ApiKey or ProjectManagedIdentity
credentials: This will be populated with the value of the API key if using that authType
isSharedToAll: true or false if you want this shared across all projects
# ApiManagement category with static models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
# Models discussed in detail below
models: "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
}
# ApiManagement category with dynamic discovery
metadata: {
deploymentAPIVersion: ARM API version for CognitiveServices/accounts/deployments API calls
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
}
# ModelGateway category with static models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
# Models discussed in detail below
models: "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
}
# ModelGateway category with dynamic models
metadata: {
deploymentInPath: true or false
inferenceAPIVersion: API version used for inferencing (not used if using OpenAI v1 API)
deploymentAPIVersion: ARM API version for CognitiveServices/accounts/deployments API calls
modelDiscovery: "{\"deploymentProvider\":\"AzureOpenAI\",\"getModelEndpoint\":\"/deployments/{deploymentName}\",\"listModelsEndpoint\":\"/deployments\"}"
}

I’ll walk through each of these properties in as much detail as I’ve been able to glean from them with my testing.

The category property is self-explanatory. You either set to this to ApiManagement (if using APIM) or Model Gateway (if using a third-party AI Gateway like a Kong or LiteLLM).

The target property is the URI you want the agent to try to connect to. As an example, if I create an API on my APIM instance for the v1 OpenAPI named openai-v1 my target would look like “https://myapim.azure-api.net/openai-v1/v1”. As of the date of this blog post, you MUST use the azure-api-net FQDN for the APIM. If you try to do a custom domain you’ll get an error back telling you that it’s not supported. I have a request into the product group to lift this limitation. I’ll update this if that is done. For third-party model gateway, this property serves the same purpose but can be any valid domain.

The authType property is going to be either ApiKey or ProjectManagedIdentity for an APIM connection. ProjectManagedIdentity will authenticate to the upstream APIM using the agent’s project’s Entra ID managed identity. When using ProjectManagedIdentity you must also specify the audience property and set it to cognitive services.azure.com if connecting to a backend Foundry resource hosting models. For a model gateway connection this will either be ApiKey or OAuth. Details on the OAuth setup can be found in the samples GitHub (I haven’t mucked with it yet). If you’re using the authType of ApiKey you additional need to pass the credentials property which includes a property of key with the API key similar to what you see below.

authType: ApiKey
credentials = {
key = MYAPIKEY
}

I haven’t messed extensively with the isSharedToAll property as of yet. For my use case I set this to false so each project got its own connection object. You may be able to create this object at the account level and set the isSharedToAll property, but I haven’t tested that yet. If you have, def let me know if that works.

Ok, now on to the property that can bring the most pain. Here we have the metadata property. This property is going to the main guts that makes this whole thing work. A few considerations, if doing this with Terraform or REST (can’t speak to Bicep or ARM), each of the properties I’m going to cover are CASE SENSITIVE. If you do the wrong casing, your connection object will not work. When connecting to an APIM or model gateway you can have Foundry either enumerate the models available (called dynamic discovery) or you can provide the exact models you want to expose (called static models).

Let’s first cover static models. Here is an example of me creating a connection to an APIM instance with static models using the authType or ProjectManagedIdentity. One thing to note is in my backend object in my APIM I’m appending /v1 to the backend path vs doing it in this connection object.

{
"id": "/subscriptions/X/resourceGroups/X/providers/Microsoft.CognitiveServices/accounts/X/projects/sampleproject1/connections/conn1apimgwstaticopenai-v1",
"name": "conn1apimgwstaticopenai-v1",
"properties": {
"audience": "https://cognitiveservices.azure.com",
"authType": "ProjectManagedIdentity",
"category": "ApiManagement",
"isSharedToAll": false,
"metadata": {
"deploymentInPath": "false",
"inferenceAPIVersion": null,
"models": "[{\"name\":\"gpt-4o\",\"properties\":{\"model\":{\"format\":\"OpenAI\",\"name\":\"gpt-4o\",\"version\":\"2024-08-06\"}}}]"
},
"target": "https://X.azure-api.net/openai-v1",
}

Since I’m using the v1 Azure OpenAI API, I don’t need to specify an inferenceAPIVersion. If I was using the classic API I’d need to specify the version (such as 2025-04-01-preview). Notice also I have set deploymentInPath to false. When set to true the connection will add the /deployments/deployment_name to the path. For the v1 API this isn’t required. Finally you got the models property. With a static model setup I list out the models I’m exposing to the connection. If you’re using Terraform, you MUST wrap the models in the jsonecode function. If you don’t, it will not work. The static model option is pretty helpful if you want to strictly control exactly what models the project is getting access to.

Let’s now switch over to dynamic discovery. Dynamic discovery requires you define a few additional operations inside of your API. The details can be found in this GitHub repo, but the basics of is you define an operation for a GET on a specific model and a LIST to find all the models available. These operations are management plane operations at the ARM API to retrieve deployment information. Here is an example of a setup with dynamic discovery using an APIM connection.

{
"id": "/subscriptions/X/resourceGroups/X/providers/Microsoft.CognitiveServices/accounts/X/projects/sampleproject1/connections/conn1apimgwdynamicopenai-v1",
"location": null,
"name": "conn1apimgwdynamicopenai-v1",
"properties": {
"audience": "https://cognitiveservices.azure.com",
"authType": "ProjectManagedIdentity",
"category": "ApiManagement",
"group": "AzureAI",
"isSharedToAll": false,
"metadata": {
"deploymentAPIVersion": "2024-10-01",
"deploymentInPath": "false",
"inferenceAPIVersion": null
},
"target": "https://X.azure-api.net/openai-v1",
},
"type": "Microsoft.CognitiveServices/accounts/projects/connections"
}

When doing the dynamic discovery, you’ll see the deploymentAPIVersion property set to the API version for the GET and LIST deployment operations of the ARM REST API. I added these operations into the API after I imported the v1 OpenAI spec. You can see an example in Terraform I put together in my lab repo. Dynamic discovery is a great solution when you want to the developer to have access to any new deployments you may push to the Foundry resources.

I’m not going to run through the ModelGateway connection categories because they will largely emulate what you see above with some minor differences. The official Foundry samples GitHub repo has the gory details. I also have examples in Terraform available in my own repo (if you dare subject yourself to reading my code).

Ok, so now you understand the basics of setting up the connection and what you need to do on the APIM side. For more details on setting up APIM you can reference this official repo.

Summing It Up

Ok, so you now you understand the basic connection object, how to set it up, and how it works. I’m going to cut it here and continue in another post where I’ll dig into the dirty details of how it looks to use this because I don’t want to overload your brain (and mine) with a super long post.

Before I jet I will want to provide some critical resources:

  1. My AMAZING peer Piotr Karpala has put together a repository with examples of this pattern (and some 3rd-party integrations) with Bicep. The stuff in there is gold. He was also my late night buddy helping me work through the quirks of this integration late at night. Couldn’t have gotten it done without him (or at least would have broken many keyboards).
  2. The Product Group’s official samples and explanations of the setup are located here. I’d highly recommending referencing them because they will always have more up to date instructions than my blog.
  3. I’ve put together some Terraform samples for my own purposes which are you welcome to reference, loot for your own means, and laugh at my pathetic coding ability. Check out this one for the Foundry portion and this one for the APIM portion.

And here are your tips for this post:

  1. RTFM. Seriously, read the official documentation. Today, this integration is challenging to put in place. If you try to lone wolf it, let me know how many keyboards end up being thrown through your window.
  2. If you’re coding in Terraform or making REST calls to create these connections, remember CASE SENSITIVITY matters. If you do the wrong case sensitivity, the resource will still create but it won’t work. You’ll get very frustrated trying to troubleshoot it.
  3. If you’re coding in Terraform don’t forget to use the jsonencode function on the models property. If you skip that, the resource will create but shit will not work.
  4. This is only supported for prompt agents today.
  5. Don’t forget this is public preview. So test it, but expect things to change and don’t throw this into production.

In the next post I’ll walk through how you can test the integration, some of the quirks and considerations for identity and authentication, and some of the neat APIM policy you can craft given some of the new information that is sent in the request.

See you next post!

Microsoft Foundry – The Evolution (Revisited)

Microsoft Foundry – The Evolution (Revisited)

Hi folks! In the past I did a series on the Azure OpenAI Service and Microsoft Foundry Hubs (FKA AI Foundry Hubs FKA AI Studio). Instead of going through and updating all those posts and losing the historical content and context (I don’t know about you, but I love have the historical context of a service) I’m instead going to preserve it as is and spin up a new series on the latest iteration of Microsoft Foundry. I’ll likely keep much of the general framework of the older series because it seemed to work. One additional piece I’ll be included in this series is some of the quirks of the service I’ve run into to potentially save you pain from having to troubleshoot it. For this first post, I’m going to start this off explaining how the service has involved. As always, my persona focus here is my fellow folks in the central IT and infrastructure space.

The history

Way back in 2023 the hype behind generative AI really started go insane. Microsoft managed to negotiate rights to host OpenAI’s models in Azure and introduced the Azure OpenAI Service. The demand across customers was insane where every business unit (BU) wanted it yesterday. Microsoft initially offered the service within the Cognitive Services framework under the Cognitive Services resource provider. This mean it inherited many of the controls native to Cognitive Services which included Private Endpoints, a limited set of outbound controls, support for API key and Entra ID authentication, and support for Azure RBAC for authorization. Getting the deployed was pretty straightforward with the hold-ups to deployment being more concerns about LLM security in general. Deployment typically looked like the architecture below.

Azure OpenAI Service

As folks started to build their AI applications, they tapped into other services under the Cognitive Services umbrella like Content Safety, Speech-to-Text, and the like. These services fit in nicely as they also fell under the Cognitive Services umbrella and had a similar architecture as the above, requiring deployment of the resource and the typical private endpoint and authentication/authorization (authN/authZ) configuration.

I like to think of this as stage 1 of the Microsoft’s AI offerings.

Microsoft then wanted to offer more models, including models they have built such and Phi and third-party models such as Mistral. This drove them to create a new resource called an AI Service resource. This resource fell under the Cognitive Services resource provider, and again inherited similar architectures as above. Beyond hosting third-party models, it also included and endpoint to consume OpenAI models and some of the pool of Cognitive Services. This is where we begin to see the collapse of Microsoft’s AI Services under a single top-level resource.

What about building AI apps though? This is where Foundry Hubs (FKA AI Studio) were introduced. The intent of Foundry Hubs were to be the one stop shop for developers to create their AI Apps. Here developers could experiment with LLMs using the playgrounds, build AI apps with Prompt Flow, build agents, or deploy 3rd party LLMs for Hugging Face. Foundry Hubs were a light overlay on top of the Azure Machine Learning (AML) service utilizing a new feature of AML built specifically for Foundry called AML Hubs. Foundry Hubs inherited a number of capabilities of AML such as its managed compute (to host 3rd party models and run prompt flows) and its managed virtual network (to host the managed compute).

Microsoft Foundry Hubs

While this worked, anyone who has built a secure AML deployment knows that shit ain’t easy. Getting the service working requires extensive knowledge of how its identity and networking configuration. This was a pain point for many customers in my experience. Many struggled to get it up and running due to the complexity.

Example of complexity of Microsoft Foundry IAM model

I think of the combination of AI Services and Microsoft Foundry Hubs as stage 2 of Microsoft’s journey.

Ok, shit was complicated, I ain’t gonna lie. Given this complexity and feedback from the customers, Microsoft got ambitious and decided to further consolidate and simplify. This introduced the concept of a new top-level resource called Microsoft Foundry Accounts. In public documentation and conversation this may be referred to as Foundry Projects or Foundry Resources. Since this is my blog I’m going to use my term which is Microsoft Foundry Accounts. With Microsoft Foundry accounts, Microsoft collapsed the AI Services and Foundry Hubs into a single top level resource. Not only did they consolidate these two resources, they also shifted Foundry Hubs from the Azure Machine Learning resource provider into the Cognitive Services resource provider. This move consolidated the Cognitive Services resource provider as the “AI” resource provider in my brain. It resulted in a new architecture which often looks something like the below.

Microsoft Foundry Accounts common architecture

This is what I like to refer to as stage 3, which is the current stage we are in with Microsoft’s AI offerings. We will continue to see this stage evolve which more features build and integrated into the Microsoft Foundry Account. I wouldn’t be surprised at all to see other services collapse into it as just another endpoint to a the singular resource.

Why do you care?

You might be asking, “Matt, why the hell do I care about this?” The reason you should care is because there are many customers who jumped into these products at different stages. I run across a ton of customers still playing in Foundry Hubs with only a vague understanding that Foundry Hubs are an earlier stage and they should begin transitioning to stage 3. This evolution is also helpful to understand because it gives an idea of the direction Microsoft is taking its generative AI services, which is key to how you should be planning you future of these services within Azure.

I’ll dive into far more detail in future posts about stage 3. I’ll share some of my learnings (and my many pains), some reference architectures that I’ve seen work, how I’ve seen customers successfully secure and scale usage of Foundry Accounts.

For now, I leave you with this evolution diagram I like to share with customers. For me, it really helps land the stages and the evolution, what is old and what is new, and what services I need to think about focusing on and which I should think about migrating off of.

Foundry evolution

Well folks, that wraps it up. Your takeaways today are:

  1. Assess which stage your implementation of generative AI is right now in Azure.
  2. Begin plans to migrate to stage 3 if you haven’t already. Know that there will be gaps in functionality with Foundry Hubs and Foundry Accounts. A good example is no more prompt flow. There are others, but many will eventually land in Foundry project.

See you next post!