Deep Dive into Azure AD and AWS SSO Integration – Part 5

Deep Dive into Azure AD and AWS SSO Integration – Part 5

I’m back yet again with the fifth entry into my series on integrating Azure AD and AWS SSO.  It’s been a journey and the series has covered a lot of ground.  It started with outlining the challenge with the initial integration of Azure AD and AWS using the AWS app in the Azure Marketplace.  From there it took a deep dive into the components of the solution and how it compares to a standard integration using your SAML provider of choice.  It continued with the steps necessary to configure Azure AD and AWS SSO to support the federated trust to enable single sign-on.  The fourth post explored the benefits of SCIM and went step by step on how to configure SCIM between the two services.  For this final post I’m going to cover a few different scenarios to demonstrate what’s possible with this new integration.

Before I jump into the scenarios, there is one final task that needs to be completed now that the federated trust and SCIM have been setup.  That task is setting up the permission sets in AWS SSO.  Permission sets are simply IAM policies (either AWS-managed or custom policies you create).  For those of you from the Microsoft Azure world, an IAM policy is a collection of permissions which define what a security principal (such as a user or role) is authorized to do.  They are most similar to an Azure RBAC role definition but more flexible and granular due to advanced features such as condition keys.  Permission sets are projected into the AWS accounts they are assigned to as AWS IAM roles.  These are the IAM roles the security principal assumes.

As I mentioned above, AWS SSO supports both AWS-managed IAM policies and custom IAM policies for permission sets.  If you go into the AWS Accounts menu option of AWS SSO you’ll see the accounts associated with the AWS Organization and which permission sets have been associated to the AWS accounts thus resulting in AWS IAM Roles being created within the AWS account.  In the image below you can see that I’ve provisioned two permission sets to account1 and account2.

accountassignments.pngThe permission sets tab displays the permission sets I’ve created and whether or not they’ve been provisioned to any accounts.  In the screenshot below you’ll see I’ve added four AWS-managed policies for Billing, SecurityAudit, AdministratorAccess, and NetworkAdministrator.  Additionally, I created a new permission set named SystemsAdmin which uses a custom IAM policy which restricts the principal assuming the rule to EC2, CloudWatch, and ELB activities.

permissionsets.png

Back on the AWS organization tab, if you click on an account you can see the AWS SSO Users or Groups that have been assigned to a permission set.  In the image below, you can see that I’ve assigned both the B2B Security Admins group and the Security Admins group to the AdministratorAccess permission set and the System Operators group to the SystemsAdmin permission set.

assignments.png

With permission sets out of the way, let’s jump into the scenarios.

Scenario 1 – Windows AD User, AD FS, Azure AD, AWS SSOscenario1.PNG

In this scenario the user is Bart Simpson who is a member of the System Operators group on-premises and exists authoritatively in a Windows AD forest.  A federated trust has been established with Azure AD using an instance of AD FS running on-premises. Azure AD has been integrated with AWS SSO for both SSO (via SAML) and provisioning (via SCIM).

Once Bart was logged into a domain-joined machine, I popped open a browser and navigated to My Apps portal at https://myapps.microsoft.com.  This redirected me to the Azure AD login screen.  Here I entered Bart’s user name.

bartazuread.PNG

Azure AD performed its home realm discovery process, identified that the domain jogcloud.com is configured for federated authentication, and redirected me to AD FS.  Take note I purposely broke integrated windows authentication here to show you each step.  In a correctly configured browser, you wouldn’t see this screen.

bartadfs.PNG

After I successfully authenticated to AD FS, I was bounced back over to Azure AD where the assertion was delivered.  Azure AD then whipped up a SAML assertion for AWS SSO, returned it to the browser, and redirected the browser to the AWS SSO assertion consumer URL.  AWS SSO consumed the assertion and authenticated Bart into AWS SSO displaying the AWS IAM Role selection page with the relevant roles he has permission to access.

bartawssso.PNG

Scenario 2 – Windows AD User, AD FS with Certificate MFA, Azure AD with Conditional Access, AWS SSO

scenario2.PNG

Scenario 1 is pretty simple, so let’s get fancy and layer on some security.  Here I added an access control policy into AD FS requiring certificate-based authentication for members of the Security Admins group.  Additionally, I added a conditional access policy in Azure AD requiring MFA for any user that is a member of that same group.

Since Homer Simpson regularly runs a nuclear reactor, he’s also the Security Admin for JOGCLOUD.  He has been made member of the Windows AD Security Admin group.

As a first step I again popped open a browser and navigated to the My Apps portal.  After Homer’s username was plugged in, Azure AD redirected me to the AD FS server.  I again broke IWA to capture each step in the process.

signin2

After the password challenge was satisfied, I was prompted to provide the appropriate user certificate.

signin3.PNG

From there I was authenticated to Azure AD and served up the My Apps portal.

myapps.PNG

Wondering why I wasn’t prompted for Azure MFA?  No, I didn’t misconfigure it (at least this time).  A not well documented feature (at least in my opinion) of Azure AD is that you can pass a claim asserting a user has satisfied the MFA requirement thus making for a better user experience because the user isn’t required to authenticate multiple times.  Yes folks, this means you can layer your traditional certificate-based authentication on top of Azure AD and AWS. 

mfaonprem.png

After selecting the AWS SSO app, I was signed into AWS SSO and presented with the role selection screen.

awsssosignin1.PNG

I then selected a one of the roles and was signed into the relevant AWS account assuming the AdministratorAccess IAM Role.

awsssosignin2

Scenario 3 – Azure AD B2B User, AWS SSO

scenario3.PNG

What if you have a multi-tenant situation due to an acquisition or merger or perhaps you farm out operations to a managed service provider?  No worries there, B2B is also supported with this pattern.  In this scenario I’m using a user sourced from tenant that has been invited via Azure AD’s B2B.  The user has been added to the B2B Security Admins group which exists authoritatively in the inviting tenant (jogcloud.com) and was synchronized to AWS SSO via SCIM.

Opening a browser and navigating to the My Apps portal kicks off Azure AD authentication and drops the user into their source tenant.  Once there I can change my tenant by selecting the profile icon and selecting the jogcloud tenant.

myappsmultiple.png

I’m then presented with the apps that I’m authorized to use in the jogcloud tenant, which includes the AWS SSO app.

guestmyapp.PNG

Azure AD kicks off the federated authentication and I’m presented with the AWS role selection page where I can choose to assume the AdministratorAccess role in two of the AWS accounts.

guestawsso.png

Scenario 4 – AWS CLI

I know what you’re saying now, “But what about CLI?”  Well folks, for that you can leverage the AWS CLI v2.  It’s still in preview right now, but I did test it using the user from scenario 2 and it worked flawlessly.  The experience is pretty anti-climatic so I’m not going to dive into it.  The user experience is similar to using the Azure PowerShell cmdlets in that a web browser instance is opened and guides you through the authentication process.

That will sum up this series.

Few technologies get me excited enough to write five posts, but this integration is really amazing.  With AWS hooking into Azure AD as effectively as they have (especially love the CLI integration), it reduces operational overhead and improves security which is a combination you rarely see together.  Most importantly, it puts the customer first by optimizing the user experience.  If you weren’t convinced on Azure AD’s capabilities as an IDaaS, hopefully this series has helped educate you as to the value of the platform.

With that I’ll sign off.  A big thanks to the AWS product team that worked on this integration.  You did an amazing job that will greatly benefit our mutual customers.

To the rest of you, I wish you happy holidays!

 

 

 

Deep Dive into Azure AD and AWS SSO Integration – Part 4

Deep Dive into Azure AD and AWS SSO Integration – Part 4

Today we continue exploring the new integration between Microsoft’s Azure AD (Azure Active Directory) and AWS (Amazon Web Services) SSO (Single Sign-On).  Over the past three posts I’ve covered the high level concepts of both platforms, the challenges the integration seeks to solve, and how to enable the federated trust which facilitates the single sign-on experience.  If you haven’t read through those posts, I recommend you before you dive into this one.  In this post I’ll be covering the neatest feature of the new integration, which is the support for automated provisioning.

If you’ve ever worked in the identity realm before, you know the pains that come with managing the life cycle of an identity from initial provisioning, changes resulting to the identity such as department and position changes, to the often forgotten stage of de-provisioning.  On-premises these problems were used solved by cobbled together scripts or complex identity management solution such as SailPoint Identity IQ or Microsoft Identity Manager.  While these tools were challenging to implement and operate, they did their job in the world of Windows Active Directory, LDAP, SQL databases and the like.

Then came cloud, and all bets were off.  Identity data stores skyrocketed from less than a hundred to hundreds and sometimes thousands (B2C has exploded far beyond event that).  Each new cloud service introduced into the enterprise introduced yet another identity management challenge.  While some of these offerings have APIs that support identity management operations, most do not, and those that do are proprietary in nature.  Writing custom code to each of the APIs is a huge challenge that most enterprises can’t keep up.  The result is often manual management of an identity life cycle, through uploading exported CSV files or some poor soul pointing and clicking a thousand times in a vendor portal.

Wouldn’t it be great if there was some mythical standard out that would help to solve this problem, use a standard API through REST, and support the JSON format?  Turns out there is and that standard is SCIM (System for Cross-domain Identity Management).  You may be surprised to know the standard has been around for a while now (technically 2011).  I recall hearing about it at a Gartner conference many many hears ago.  Unfortunately, it’s taken a long time to catch on but support is steadily increasing.

Thankfully for us, Microsoft has baked support into Azure AD and AWS recognized the value and took advantage of the feature.  By doing this, the identity life cycle challenges of managing an Azure AD and AWS integration has been heavily re-mediated and our lives made easier.

Azure AD Provisioning - Example

Azure AD Provisioning – Example

Let’s take a look at how set it up, shall we?

The first place you’ll need to go is into the AWS account which is the master for the organization and into the AWS SSO Settings.  In Settings you’ll see the provisioning option which is initially set as manual.  Select to enable automatic provisioning.

AWS SSO Settings - Provisioning

AWS SSO Settings – Provisioning

Once complete, a SCIM endpoint will be created.  This is the endpoint in AWS (referred to as the SCIM service provider in the SCIM standard) that the SCIM service on Azure AD (referred to as the client in the SCIM standard) will interact with to search for, create, modify, and delete AWS users and groups.  To interact with this endpoint, Azure AD must authenticate to it, which it does with a bearer access token that is issued by AWS SSO.  Be aware that the access token has a one year life span, so ensure you set some type of reminder.  A quick search through the boto3 API doesn’t show a way to query for issued access tokens (yes you can issue more than one at at time) so you won’t be able to automate the process as of yet.

awssso-scimendpoint.png

After SCIM is enabled, AWS SSO Settings for provisioning now reports SCIM in use.

awssso-scimenabled.png

Next you’ll need to bounce over to Azure AD and go into the enterprise app you created (refer to my third post for this process).   There you’ll navigate to the Provisioning blade and select Automatic as the provisioning method.

azuread-scimprov.png

You’ll then need to configure the URL and access token you collected from AWS and test the connection.  This will cause Azure AD to test querying the endpoint for a random user and group to validate functionality.

azuread-scimtest.png

If your test is successful you can then save the settings.

azuread-scimtestsucccess.PNG

You’re not done yet.  Next you have to configure a mapping which map attributes in Azure AD to the resource and attributes in the SCIM schema.  Yes folks, SCIM does have a schema for attributes and resources (like users and groups).  You can extend it as needed, but in this integration it looks to be using the default user and group resources.

azuread-scimmappings

Let’s take a look at what the group mappings look like.

azuread-scimgroupmappings.PNG

The attribute names on the left are the names of the attributes in Azure AD and the attributes on the right are the names of the attributes Azure AD will write the values of the attributes to in AWS SSO.  Nothing too surprising here.

How about the user mappings?

azuread-scimusermappings1azuread-scimusermappings2

Lots more attributes in the user mappings by default.  Now I’m not sure how many of these attributes AWS SSO supports.  According to the SCIM standard, a client can attempt to write whatever it wants and any attributes the service provider doesn’t understand is simply discarded.  The best list of attributes I could find were located here, and it’s not near this number.  I can’t speak to what the minimum required attributes are to make AWS work, because their official instructions on this integration doesn’t say.  I know some of the product team sometimes reads the blog, so maybe we’ll luck out and someone will respond with that answer.

The one tweak you’ll need to make here is to delete the mailNickName mapping and replace it with a mapping of objectId to externalId.  After you make the change, click the save icon.

I don’t know why AWS requires this so I can only theorize.  Maybe they’re using this attribute as a primary key in the back end database or perhaps they’re using it to map the users to the groups?  I’m not sure how Azure AD is writing the members attribute over to AWS.  Maybe in the future I’ll throw together a basic app to visualize what the service provider end looks like.

newmapping.PNG

Now you need to decide what users and groups you want to sync to AWS SSO.  Towards the bottom of the provisioning blade, you’ll see the option to toggle the provisioning status.  The scope drop down box has an option to sync all users and groups or to sync only assigned users and groups.  Best practice here is basic security, only sync what you need to sync, so leave the option on sync only assigned users and group.

The assigned users and groups refers to users that have been assigned to the enterprise application in Azure AD.  This is configured on the Users and Groups blade for the enterprise app.  I tested a few different scenarios using an Azure AD dynamic group, standard group, and a group synchronized from Windows AD.  All worked successfully and synchronized the relevant users over.

Once you’re happy with your settings, toggle the provisioning status and save the changes.  It may take some time depending on how much you’re syncing.

syncsuccess.PNG

If the sync is successful, you’ll be able to hop back over to AWS SSO and you’ll see your users and groups.

awssyncedusersawssyncedgroup

Microsoft’s official documentation does a great job explaining the end to end cycle.  The short of it is there’s an initial cycle which grabs all users and groups from Azure AD, then filters the list down to the users and groups assigned to the application.  From there it queries the target system to match the user with the matching attribute and if it isn’t found creates it, and if found and needs updating, updates it.

Incremental cycles are down from that point forward every 40 minutes.  I couldn’t find any documentation on how to adjust the synchronization frequency.  Be aware of that 40 minute sync and consider the end to end synchronization if you’re sourcing from Windows Active Directory.  In that case making changes in Windows AD could take just over an hour (assuming you’re using the 30 minute sync interval in Azure AD Connect) to fully synchronize.

awsssotime.PNG

As I described in my third post, I have a lab environment setup where a Windows Active Directory domain is syncing to Azure AD.  I used that environment to play out a few scenarios.

In the first scenario I disabled Marge Simpson’s account.  After waiting some time for changes to synchronize across both platforms, I saw in AWS SSO that Marge Simpson was now disabled.

margedisabled.PNG

For another scenario, I removed Barney Gumble from the Network Operators Active Directory group.  After waiting time for the sync to complete, the Network Operators group is now empty reflecting Barney’s removal from the group.

networkoperators.PNG

Recall that I assigned four groups to the app in Azure AD, Network Operators, Security Admins, Security Auditors, and Systems Operators.  These are the four groups syncing to AWS SSO.  Barney Gumble was only a member of the Network Operators group, which means removing him put him out of scope for the app assignment.  In AWS SSO, he now reports as being disabled.

barneydisabled.PNG

For our final scenario, let’s look at what happens when I deleted Barney Gumble from Windows Active Directory.  After waiting the required replication time, Barney Gumble’s user account was still present in AWS SSO, but set as disabled.  While Barney wouldn’t be able to login to AWS SSO, there would still be cleanup that would need to happen on the AWS SSO directory to remove the stale identity records.

barneydisabled.PNG

The last thing I want to cover is the logging capabilities of the SCIM service in Azure AD.  There are two separate logs you can reference.  The first are the Provisioning Logs which are currently in preview.  These logs are going to be your go to to troubleshoot issues with the provisioning process.  They’re available with an Azure AD P1 or above license and are kept for 30 days.  Supposedly they’re kept for free for 7 days, but the documentation isn’t clear whether or not you have the ability to consume them.  I also couldn’t find any documentation on if it’s possible to pull the logs from an API for longer term retention or analysis in Log Analytics or a 3rd party logging solution.

If you’ve ever used Azure AD, you’ll be familiar with the second source of logs.  In the Azure AD Audit logs, you get additional information, which while useful, is more catered to tracking the process vs troubleshooting the process like the provisioning logs.

Before I wrap up, let’s cover a few key findings:

  • The access token used to access the SCIM endpoint in AWS SSO has a one year lifetime.  There doesn’t seem to be a way to query what tokens have been issued by AWS SSO at this time, so you’ll need to manage the life cycle in another manner until the capability is introduced.
  • Users that are removed from the scope of the sync, either by unassigning them from the app or deleting their user object, become disabled in AWS SSO.  The records will need to be cleaned up via another process.
  • If synchronizing changes from a Windows AD the end to end synchronization process can take over an hour (30 minutes from Windows AD to Azure AD and 40 minutes from Azure AD to AWS SSO).

That will wrap up this post.  In my opinion the SCIM service available in Azure AD is extremely under utilized.  SCIM is a great specification that needs more love.  While there is a growing adoption from large enterprise software vendors, there is a real opportunity for your organization to take advantage of the features it offers in the same way AWS has.  It can greatly ease the pain your customers and enterprise users experience having to manage the life cycle of an identity and makes for a nice belt and suspenders to modern identity capabilities in an application.

In the last post of my series I’ll demonstrate a few scenarios showing how simple the end to end experience is for users.  I’ll include some examples of how you can incorporate some of the advanced security features of Azure AD to help protect your multi-cloud experience.

See you next post!

 

Capturing Azure Management Group Activity Logs Using Azure Automation – Part 2

Welcome back fellow geeks!

This post will be the second post in a series covering how to use Azure Automation to capture Azure Management Group Activity Logs.  In the first post I walked through what management groups are and the problems that they solve.  The key takeaway of that post is that management groups have their own Activity Logs and (at this time) they’re only accessible from within the Portal and over the Azure REST API.  Given that management groups are where we’re applying our Azure Policy for governance and compliance and our access controls via Azure RBAC, the Activity Logs are pretty critical.  So what is a geek to do?

In this post I’ll cover a solution I put together to solve the problem.  It uses an Azure Automation PowerShell Runbook to iterate through the management groups within an Azure Active Directory tenant, write the logs to Azure Storage, and optionally deliver the logs to Azure Monitor or Azure EventHubs.  The architecture is pictured below.

Capture.PNG

If you’re not familiar with Azure Automation it’s a service that provides a number of key capabilities within Azure such as configuration management, update management, and process automation.  If you’re coming from AWS, I’d compare it to a service somewhat similar to AWS Systems Manager.  For the purposes of this series of posts I’m going to focus on the process automation capability of the service delivered through Runbooks.  I’m not going to go too in-depth into Azure Automation, but I’ll provide a brief overview of the service features and tweaks relevant to the solution.

Runbooks are modules of code that can be strung together to perform a series of tasks such as performing maintenance on a collection of VMs.  The modules can be authored using either PowerShell or Python.  At this time only Python 2 is supported which makes me a sad panda.  Given that Python 2 enters end of life in two months, I’d recommend doing anything Python related in Azure Functions.  I could devote an entire blog post complaining about the lack of Python 3 in the year 2019, but I’ll spare you.  You’re going to want to author your Runbooks in PowerShell until/if Python 3 is supported is supported in the future.

The Azure Automation account acts as a logical container for the Runbooks created within it.  An Azure Automation Account can be provided with a RunAs account, which is simply a service principal in Azure Active Directory.   The service principal is configured with a certificate credential which is used by the Automation Account to authenticate to Azure AD and access Azure resources within the tenant.  Any Runbooks you create within the Automation account can assume the identity to execute tasks across your Azure resources.

You can automatically provision the RunAs account when the Azure Automation Account is provisioned, just be aware that the service principal will be granted the Contributor role on the Azure Subscription.  This is probably going to be way more permissions than are needed so I’d recommend removing that role assignment, creating a custom RBAC role, and assigning it at the appropriate scope.

Automation Accounts have a number of assets which are relevant for Runbooks.  These include variables, connections, credentials, and certificates.  The links I provided will give you detailed information on these assets, so I’ll summarize the relevant content to the solution.  Variables can come in a variety of types including strings and integers and can also optionally be encrypted.  For this solution I use encrypted variables to store the Event Hub connection string, Log Analytics Workspace Id, and Log Analytics Workspace Key.  Connections contain information required to connect to an external service or application.  The only connection asset used with this solution is the AzureServicePrincipal which is used by the RunAs account.  You can retrieve the  connection to get information such as the Azure AD tenant Id and application id (client id in the OAuth world).  Lastly, we have the certificate asset, which as the name describes, can be used to securely store a certificate that is used for authentication.  This solution uses the AzureRunAsCertificate certificate which contains the certificate asset used to authenticate the Automation Account RunAs account.

Each Automation Account comes with a predefined set of PowerShell modules and .NET libraries.  You can add additional modules and libraries by importing them to the Automation Account.  For this solution I added a number of .NET libraries including the ADAL and some libraries required to communicate with Event Hubs.  While PowerShell does a wonderful job of handling things at the management plane of Azure, it is severely lacking in the data plane requiring you to fall back on incorporating .NET code into your PowerShell script.

The above (including the links) should give you the bare minimum you need to understand to use this solution.  Let’s deep dive into the code.  Since this is a fairly lengthy script I’m not going to paste every line of code.  Instead I’m going to call out key sections of code that were particularly relevant or interesting to write.

The first function in the script is called Get-AdalToken and uses the .NET ADAL library to retrieve a token from Azure AD.  When I code in Python I typically use the MSAL library since I find it to be a bit more slick, but found the .NET version too cumbersome and difficult to use in in PowerShell.  If you’ve ever used .NET libraries in your PowerShell scripts, you know where I’m coming from.

The token retrieved by the function is used for calls to the Azure Management REST API.  The reason I went with ADAL vs pulling the access token from a session created using Add-AzAccount method as demonstrated here is I wanted code I could reuse for other purposes outside of the Azure REST API.

Once the token is retrieved it is stored in a variable for later use in the script.

adal

Next up we have the Get-AllManagementGroups function.  This function calls the Azure REST API to get a full listing of management groups.  Oddly enough there is an AzureRM cmdlet included in the AzureRm.Resources module that comes preinstalled with every new Automation Account.  However, even after updating the modules within the account (this link tells you how to do this and I highly recommend doing it whenever you create a new automation account) the cmdlet only ever reported back the tenant root group.  This occurred even when following the instructions to spit back all Management Groups.  I chalked it up to there being an issue with the cmdlet or user error on my part.  Either way, it was simple enough to whip up a call to the REST API.

Following the Get-AllManagementGroups function we have the Get-ManagementGroupActivityLog function.  Let me tell you folks, this one was an absolute pain to write.  According to this Azure feedback thread these logs have been accessible over the API since back in March of this year, but the REST API reference documentation doesn’t look to have been updated to reflect this.  I’m going to save you all a ton of headaches and hours of experimentation and searching the web.  When you want to get Activity Logs over the REST API you are going to use the following endpoint:


https://management.azure.com/providers/Microsoft.Management/

managementGroups/mgmtGroupId/providers/microsoft.insights/
eventtypes/management/values

The mgmtGroupId variable would be the name of your management group.  If your management group is named production then the value in that URL would be production.  Additionally, you’ll want to pass query parameters of api-version set to 2017-03-01-preview and a $filter query parameter constructed in the same way you would to query a subscription Activity Log.

activitylogquery.PNG

The SendTo-Storage function sends the Activity Log for each Management Group as a separate blob to Azure Storage.  The format of the Activity Log is raw JSON.

The SentTo-Workspace function sends the log data to Azure Monitor (really a Log Analytics Workspace) via the HTTP Data Collector API.  The product team was wonderful enough to include sample PowerShell code that made writing that function a breeze.

I did run into some weirdness with this function which was caused by the maximum size of an output stream in Runbooks which is 1MB.  When I pulled the Activity Log for 90 days, the entirety of the log was well over 1MB so it would cause the Runbook to fail three times and suspend.  Debugging this was a pain because the Runbook doesn’t report the error in an obvious way.  I got around this by collecting the log entries into a group and sending them at 200KB intervals.    Additionally, I also added some error checking and retry handling if it got throttled.

The final function is named SendTo-EventHub and delivers the logs to an Event Hub.  I couldn’t find any PowerShell cmdlets that could be used to send data to Event Hub.  This forced me to fall back to the .NET libraries.  In the end I got it working and got them streaming, but I’m sure someone more skilled in .NET than me (which isn’t difficult to be) could optimize and improve that code.

The main chunk of the solution strings everything together.  By default the solution writes the logs to Azure blob storage.  You can optionally deliver the data to Azure Monitor and Azure Event Hubs.

Well folks that brings us to the end of this post and series.  While I’m sure the product team is quickly coming out with this out of box integration, I learned a ton about Azure Automation and Runbooks working on this effort.  Runbooks are a wonderful tool if you’re a classic infrastructure / security tech new to the whole coding thing.  It’s a very simple and straightforward user experience for that audience and a good stepping stone into the coding world vs jumping directly into Azure Functions.

I’ve posted the solution up onto my Github.  For those folks without Github, I’ve put a static copy of the solution up on this website at this link.  Take it, test it, play with it, build upon it, and experiment with it.

Capturing Azure Management Group Activity Logs Using Azure Automation – Part 1

Capturing Azure Management Group Activity Logs Using Azure Automation – Part 1

Hello again fellow geeks!

Over the past few months I’ve been working with a customer who is just beginning their journey into the cloud.  We’ve had a ton of great conversations around security, governance, and operationalizing Microsoft Azure.  We recently finalized the RACI and identified the controls required by both their internal security policy and their industry compliance requirements.  With those two items complete, we put together our Azure RBAC model and narrowed down the Azure Policies we needed to put in place to satisfy our compliance controls.

After a lot of discussion about the customer’s organization, its geographical locations, business unit makeup, and how its developers and central IT operate, we came up with a subscription model.  This customer had decided on an Azure subscription model where each workload would exist in its own subscription.  Further, each workload’s production and non-production environment would be segmented in different subscriptions.  Keeping each workload in a different subscription ensures no workload will compete for resources with other workloads and hit any subscription limits.  Additionally, it allowed the customer to very easily track the costs associated with each workload.

Now why did we use separate production and non-production subscriptions for each workload?  One reason is to address the same risk as above where a non-production workload could potentially consume all resources within a subscription impacting a production workload.  The other more critical reason is it makes it easier for us to apply different governance and access controls on production workloads vs non-production workloads.  The way we do this is through the usage of Azure Management Groups.

Management Groups were introduced into general availability back in late 2018 to help address the challenges organizations were having operating subscriptions at scale.  They provided a hierarchal method to apply governance and access controls across a collection of subscriptions.  For those of you familiar with AWS, Management Groups are somewhat similar to AWS Organizations and Organizational Units.  For my fellow Windows AD peeps, you can think of Management Groups somewhat like the Active Directory container and organizational unit hierarchy in an Active Directory domain where you apply different access control entries and group policy at high levels in the OU hierarchy that is then enforced and inherited down to the children.  Management Groups work in a similar manner in that the Azure RBAC definitions and assignments and Azure Policy you assign to the parent Management Groups are inherited down into the children.

Every Azure AD tenant starts with a top-level management group called the tenant root group.  Additional management groups created within the tenant are children of the group up to a maximum of 10,000 management groups and up to six levels of depth.  Any RBAC assignment or Azure Policy assigned to the tenant root group applies to all children management group in the tenant.  It’s important to understand that Management Groups are a resource within the Azure AD tenant and not a resource of an Azure subscription.  This will matter for reasons we’ll see later.

The tenant root management group can only be administered by a Global Admin by default and even this requires a configuration change in the tenant.  The method is describe here and what it does is places the global administrator performing the action in the User Access Administrator RBAC role at the root of scope.  Once that is complete, the name of the root management group could be changed, role assignments created, or policy assigned.

Screen Shot 2019-10-17 at 9.59.59 PM

Administering Tenant Root Group

Now there is one aspect of Management Groups that is a bit funky.  If you’re very observant you probably noticed the menu option below.

Screen Shot 2019-10-17 at 9.59.59 PM.png

That’s right folks, Management Groups have their own Activity Log.  Every action you perform at the management group scope such creating an Azure RBAC role assignment or assigning or un-assigning an Azure Policy is captured in this Activity Log.  Now as of today, the only way to access these logs is viewing them through the portal or through the Azure REST API.  Unlike the Activity Logs associated with a subscription, there isn’t native integration with Event Hubs or Azure Storage.  Don’t be fooled by the Export To Event Hub link seen in the screenshot below, this will simply send you to the standard menu where you would configure subscription Activity Logs to be exported.

Screen Shot 2019-10-17 at 10.34.19 PM

Now you could log into the GUI every day and export the logs to a CSV (yes that does work with Management Groups) but that simply isn’t scalable and also prevents you from proactively monitoring the logs.  So how do we deal with this gap while the product team works on incorporating the feature?  This will be the challenge we address in this series.

Over the next few posts I’ll walk through the solution I put together using Azure Automation Runbooks to capture these Activity Logs and send them to Azure Storage for retention and an Azure Log Analytics Workspace for analysis and monitoring using Azure Monitor.

Continue the series in my second post.