Welcome back fellow geeks!
This post will be the second post in a series covering how to use Azure Automation to capture Azure Management Group Activity Logs. In the first post I walked through what management groups are and the problems that they solve. The key takeaway of that post is that management groups have their own Activity Logs and (at this time) they’re only accessible from within the Portal and over the Azure REST API. Given that management groups are where we’re applying our Azure Policy for governance and compliance and our access controls via Azure RBAC, the Activity Logs are pretty critical. So what is a geek to do?
In this post I’ll cover a solution I put together to solve the problem. It uses an Azure Automation PowerShell Runbook to iterate through the management groups within an Azure Active Directory tenant, write the logs to Azure Storage, and optionally deliver the logs to Azure Monitor or Azure EventHubs. The architecture is pictured below.
If you’re not familiar with Azure Automation it’s a service that provides a number of key capabilities within Azure such as configuration management, update management, and process automation. If you’re coming from AWS, I’d compare it to a service somewhat similar to AWS Systems Manager. For the purposes of this series of posts I’m going to focus on the process automation capability of the service delivered through Runbooks. I’m not going to go too in-depth into Azure Automation, but I’ll provide a brief overview of the service features and tweaks relevant to the solution.
Runbooks are modules of code that can be strung together to perform a series of tasks such as performing maintenance on a collection of VMs. The modules can be authored using either PowerShell or Python. At this time only Python 2 is supported which makes me a sad panda. Given that Python 2 enters end of life in two months, I’d recommend doing anything Python related in Azure Functions. I could devote an entire blog post complaining about the lack of Python 3 in the year 2019, but I’ll spare you. You’re going to want to author your Runbooks in PowerShell unless and until Python 3 is supported.
The Azure Automation account acts as a logical container for the Runbooks created within it. An Azure Automation Account can be provided with a RunAs account, which is simply a service principal in Azure Active Directory. The service principal is configured with a certificate credential which is used by the Automation Account to authenticate to Azure AD and access Azure resources within the tenant. Any Runbooks you create within the Automation account can assume the identity to execute tasks across your Azure resources.
You can automatically provision the RunAs account when the Azure Automation Account is provisioned, just be aware that the service principal will be granted the Contributor role on the Azure Subscription. This is probably going to be way more permissions than are needed so I’d recommend removing that role assignment, creating a custom RBAC role, and assigning it at the appropriate scope.
Automation Accounts have a number of assets which are relevant for Runbooks. These include variables, connections, credentials, and certificates. The links I provided will give you detailed information on these assets, so I’ll summarize the content relevant to the solution. Variables can come in a variety of types including strings and integers and can also optionally be encrypted. For this solution I use encrypted variables to store the Event Hub connection string, Log Analytics Workspace Id, and Log Analytics Workspace Key. Connections contain information required to connect to an external service or application. The only connection asset used with this solution is the AzureServicePrincipal which is used by the RunAs account. You can retrieve the connection to get information such as the Azure AD tenant Id and application id (client id in the OAuth world). Lastly, we have the certificate asset, which as the name describes, can be used to securely store a certificate that is used for authentication. This solution uses the AzureRunAsCertificate asset, which holds the certificate used to authenticate the Automation Account RunAs account.
Each Automation Account comes with a predefined set of PowerShell modules and .NET libraries. You can add additional modules and libraries by importing them to the Automation Account. For this solution I added a number of .NET libraries including the ADAL and some libraries required to communicate with Event Hubs. While PowerShell does a wonderful job of handling things at the management plane of Azure, it is severely lacking in the data plane, which forces you to fall back on incorporating .NET code into your PowerShell script.
The above (including the links) should give you the bare minimum you need to understand to use this solution. Let’s deep dive into the code. Since this is a fairly lengthy script I’m not going to paste every line of code. Instead I’m going to call out key sections of code that were particularly relevant or interesting to write.
The first function in the script is called Get-AdalToken and uses the .NET ADAL library to retrieve a token from Azure AD. When I code in Python I typically use the MSAL library since I find it to be a bit more slick, but I found the .NET version too cumbersome and difficult to use in PowerShell. If you’ve ever used .NET libraries in your PowerShell scripts, you know where I’m coming from.
The token retrieved by the function is used for calls to the Azure Management REST API. The reason I went with ADAL instead of pulling the access token from a session created with the Add-AzAccount cmdlet, as demonstrated here, is that I wanted code I could reuse for other purposes outside of the Azure REST API.
Once the token is retrieved it is stored in a variable for later use in the script.
Next up we have the Get-AllManagementGroups function. This function calls the Azure REST API to get a full listing of management groups. Oddly enough there is an AzureRM cmdlet included in the AzureRm.Resources module that comes preinstalled with every new Automation Account. However, even after updating the modules within the account (this link tells you how to do this and I highly recommend doing it whenever you create a new automation account) the cmdlet only ever reported back the tenant root group. This occurred even when following the instructions to spit back all Management Groups. I chalked it up to there being an issue with the cmdlet or user error on my part. Either way, it was simple enough to whip up a call to the REST API.
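To make the idea concrete, here is a minimal sketch of the listing logic. It is written in Python rather than the PowerShell used in the actual Runbook, purely so it can be run anywhere. The api-version value, the fetch_json callable (a stand-in for an authenticated GET that returns parsed JSON), and the paging field names are assumptions for illustration, not details taken from the script.

```python
from typing import Callable, Dict, List

# Assumed API version; check the current Management Groups REST reference.
MGMT_GROUPS_URL = (
    "https://management.azure.com/providers/Microsoft.Management/managementGroups"
    "?api-version=2018-03-01-preview"
)


def get_all_management_groups(fetch_json: Callable[[str], Dict]) -> List[str]:
    """Return the name of every management group, following paging links.

    fetch_json is a hypothetical helper: it takes a URL, performs an
    authenticated GET with the bearer token, and returns the parsed JSON body.
    """
    names: List[str] = []
    url = MGMT_GROUPS_URL
    while url:
        page = fetch_json(url)
        names.extend(group["name"] for group in page.get("value", []))
        # The paging field name is an assumption; both spellings are checked.
        url = page.get("nextLink") or page.get("@nextLink")
    return names
```

Passing the HTTP call in as a parameter keeps the paging loop separate from authentication, which also makes it easy to exercise with canned responses.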
Following the Get-AllManagementGroups function we have the Get-ManagementGroupActivityLog function. Let me tell you folks, this one was an absolute pain to write. According to this Azure feedback thread these logs have been accessible over the API since back in March of this year, but the REST API reference documentation doesn’t look to have been updated to reflect this. I’m going to save you all a ton of headaches and hours of experimentation and searching the web. When you want to get Activity Logs over the REST API you are going to use the following endpoint:
The mgmtGroupId variable would be the name of your management group. If your management group is named production then the value in that URL would be production. Additionally, you’ll want to pass query parameters of api-version set to 2017-03-01-preview and a $filter query parameter constructed in the same way you would to query a subscription Activity Log.
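As a rough illustration of how such a request URL could be assembled, here is a short Python sketch (again, not the PowerShell from the Runbook). The resource path is an assumption modeled on the subscription-level Activity Log path with a management group scope swapped in, so verify it against the script in the repository before relying on it. The api-version and the eventTimestamp filter follow the description above.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def build_activity_log_url(mgmt_group_id: str, start: datetime, end: datetime) -> str:
    """Build an Activity Log query URL for one management group.

    start and end should be timezone-aware datetimes; they are converted to UTC.
    """
    # Assumed path: management group scope + microsoft.insights provider.
    base = (
        "https://management.azure.com/providers/Microsoft.Management/managementGroups/"
        + quote(mgmt_group_id, safe="")
        + "/providers/microsoft.insights/eventtypes/management/values"
    )
    stamp = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(stamp)
    end_utc = end.astimezone(timezone.utc).strftime(stamp)
    # Same filter shape as a subscription Activity Log query.
    log_filter = f"eventTimestamp ge '{start_utc}' and eventTimestamp le '{end_utc}'"
    query = urlencode(
        {"api-version": "2017-03-01-preview", "$filter": log_filter},
        quote_via=quote,
        safe="$",  # keep the literal "$" in "$filter"
    )
    return f"{base}?{query}"
```

For a management group named production, the group segment of the path is simply production, and the filter narrows the results to the requested time window.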
The SendTo-Storage function sends the Activity Log for each Management Group as a separate blob to Azure Storage. The format of the Activity Log is raw JSON.
The SendTo-Workspace function sends the log data to Azure Monitor (really a Log Analytics Workspace) via the HTTP Data Collector API. The product team was wonderful enough to include sample PowerShell code that made writing that function a breeze.
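The fiddly part of the HTTP Data Collector API is the shared key signature that goes in the Authorization header. Below is a minimal Python sketch of that signing step, following the scheme in the public API documentation; it is not lifted from the Runbook. The workspace id, the key, and the log type name used here are placeholders.

```python
import base64
import hashlib
import hmac


def build_signature(workspace_id: str, shared_key: str, date: str, content_length: int,
                    method: str = "POST", content_type: str = "application/json",
                    resource: str = "/api/logs") -> str:
    """Return the Authorization header value for a Data Collector API request.

    date must be the same RFC 1123 string sent in the x-ms-date header.
    """
    string_to_sign = (
        f"{method}\n{content_length}\n{content_type}\nx-ms-date:{date}\n{resource}"
    )
    decoded_key = base64.b64decode(shared_key)  # the workspace key is base64 text
    digest = hmac.new(decoded_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('utf-8')}"


def build_headers(workspace_id: str, shared_key: str, date: str, body: bytes,
                  log_type: str = "MgmtGroupActivityLog") -> dict:
    """Assemble request headers; the log_type default is a made-up example name."""
    return {
        "Content-Type": "application/json",
        "Authorization": build_signature(workspace_id, shared_key, date, len(body)),
        "Log-Type": log_type,
        "x-ms-date": date,
    }
```

The content length in the signature has to match the exact bytes posted, so the body should be serialized once and reused for both signing and sending.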
I did run into some weirdness with this function which was caused by the maximum size of an output stream in Runbooks, which is 1MB. When I pulled the Activity Log for 90 days, the entirety of the log was well over 1MB so it would cause the Runbook to fail three times and suspend. Debugging this was a pain because the Runbook doesn’t report the error in an obvious way. I got around this by collecting the log entries into groups and sending each group once it reached roughly 200KB. I also added some error checking and retry handling in case the requests got throttled.
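The batching and retry idea is easy to show in isolation. The Python sketch below is an illustration of the approach rather than the Runbook's PowerShell: it groups entries until the next one would push the batch past 200KB, and retries a send with exponential backoff. Treating only HTTP 429 as throttling, the attempt count, and the send callable (which is expected to return an HTTP status code) are all assumptions.

```python
import json
import time
from typing import Callable, Dict, Iterable, Iterator, List

MAX_BATCH_BYTES = 200 * 1024  # roughly the 200KB threshold described above


def batch_by_size(entries: Iterable[Dict],
                  max_bytes: int = MAX_BATCH_BYTES) -> Iterator[List[Dict]]:
    """Yield lists of entries whose serialized size stays within max_bytes.

    Size is the sum of each entry's JSON length, so it slightly underestimates
    the final array. A single entry larger than max_bytes is yielded alone.
    """
    batch: List[Dict] = []
    batch_bytes = 0
    for entry in entries:
        entry_bytes = len(json.dumps(entry).encode("utf-8"))
        if batch and batch_bytes + entry_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += entry_bytes
    if batch:
        yield batch


def send_with_retry(send: Callable[[List[Dict]], int], batch: List[Dict],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send(batch) until it returns 200, backing off when throttled."""
    for attempt in range(max_attempts):
        status = send(batch)
        if status == 200:
            return True
        if status != 429:  # assumption: only 429 is worth retrying
            return False
        if attempt < max_attempts - 1:
            sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    return False
```

Keeping each request well under the 1MB limit means one oversized log pull no longer takes down the whole job, and a throttled batch can be retried without resending the ones that already succeeded.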
The final function is named SendTo-EventHub and delivers the logs to an Event Hub. I couldn’t find any PowerShell cmdlets that could be used to send data to Event Hub. This forced me to fall back to the .NET libraries. In the end I got it working and got them streaming, but I’m sure someone more skilled in .NET than me (which isn’t difficult to be) could optimize and improve that code.
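For anyone who would rather avoid the .NET libraries, one alternative (not what this solution does) is to post events to the Event Hubs REST endpoint using a shared access signature token. The Python sketch below shows only the token generation and connection string parsing, following the documented SAS scheme. The namespace, hub, and key values in the usage note are placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Dict, Optional
from urllib.parse import quote, quote_plus


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs; values may themselves contain '='."""
    return dict(part.split("=", 1) for part in conn.strip(";").split(";") if part)


def build_sas_token(resource_uri: str, key_name: str, key: str,
                    ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Return a SharedAccessSignature token valid for ttl_seconds.

    now is injectable (seconds since the epoch) so the result is reproducible.
    """
    expiry = str(int((time.time() if now is None else now) + ttl_seconds))
    encoded_uri = quote_plus(resource_uri)
    to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), to_sign, hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest), safe="")
    return (
        f"SharedAccessSignature sr={encoded_uri}&sig={signature}"
        f"&se={expiry}&skn={key_name}"
    )
```

With a connection string stored in an encrypted variable, the SharedAccessKeyName and SharedAccessKey parts feed build_sas_token, and the resulting token goes in the Authorization header of a POST to the hub's messages endpoint.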
The main chunk of the solution strings everything together. By default the solution writes the logs to Azure blob storage. You can optionally deliver the data to Azure Monitor and Azure Event Hubs.
Well folks, that brings us to the end of this post and series. While I’m sure the product team will soon ship this integration out of the box, I learned a ton about Azure Automation and Runbooks working on this effort. Runbooks are a wonderful tool if you’re a classic infrastructure / security tech new to the whole coding thing. It’s a very simple and straightforward user experience for that audience and a good stepping stone into the coding world vs jumping directly into Azure Functions.
I’ve posted the solution up onto my Github. For those folks without Github, I’ve put a static copy of the solution up on this website at this link. Take it, test it, play with it, build upon it, and experiment with it.