Deep Dive into Azure AD and AWS SSO Integration – Part 5


I’m back yet again with the fifth entry in my series on integrating Azure AD and AWS SSO.  It’s been a journey and the series has covered a lot of ground.  It started by outlining the challenges with the initial integration of Azure AD and AWS using the AWS app in the Azure Marketplace.  From there it took a deep dive into the components of the solution and how it compares to a standard integration using your SAML provider of choice.  It continued with the steps necessary to configure Azure AD and AWS SSO to support the federated trust that enables single sign-on.  The fourth post explored the benefits of SCIM and went step by step through how to configure SCIM between the two services.  For this final post I’m going to cover a few different scenarios to demonstrate what’s possible with this new integration.

Before I jump into the scenarios, there is one final task that needs to be completed now that the federated trust and SCIM have been set up.  That task is setting up the permission sets in AWS SSO.  Permission sets are simply IAM policies (either AWS-managed or custom policies you create).  For those of you from the Microsoft Azure world, an IAM policy is a collection of permissions which define what a security principal (such as a user or role) is authorized to do.  They are most similar to an Azure RBAC role definition but more flexible and granular due to advanced features such as condition keys.  Permission sets are projected into the AWS accounts they are assigned to as AWS IAM roles.  These are the IAM roles the security principal assumes.

As I mentioned above, AWS SSO supports both AWS-managed IAM policies and custom IAM policies for permission sets.  If you go into the AWS Accounts menu option of AWS SSO you’ll see the accounts associated with the AWS Organization and which permission sets have been assigned to each account, which results in AWS IAM roles being created within those accounts.  In the image below you can see that I’ve provisioned two permission sets to account1 and account2.

accountassignments.png

The permission sets tab displays the permission sets I’ve created and whether or not they’ve been provisioned to any accounts.  In the screenshot below you’ll see I’ve added four AWS-managed policies for Billing, SecurityAudit, AdministratorAccess, and NetworkAdministrator.  Additionally, I created a new permission set named SystemsAdmin which uses a custom IAM policy restricting the principal assuming the role to EC2, CloudWatch, and ELB activities.

permissionsets.png
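For reference, here’s a rough sketch of what a custom IAM policy like the one behind the SystemsAdmin permission set could look like.  The service scoping to EC2, CloudWatch, and ELB matches what I described above, but treat the statement as illustrative rather than the exact policy I attached.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SystemsAdminActivities",
      "Effect": "Allow",
      "Action": [
        "ec2:*",
        "cloudwatch:*",
        "elasticloadbalancing:*"
      ],
      "Resource": "*"
    }
  ]
}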

Back on the AWS organization tab, if you click on an account you can see the AWS SSO Users or Groups that have been assigned to a permission set.  In the image below, you can see that I’ve assigned both the B2B Security Admins group and the Security Admins group to the AdministratorAccess permission set and the System Operators group to the SystemsAdmin permission set.

assignments.png

With permission sets out of the way, let’s jump into the scenarios.

Scenario 1 – Windows AD User, AD FS, Azure AD, AWS SSO

scenario1.PNG

In this scenario the user is Bart Simpson who is a member of the System Operators group on-premises and exists authoritatively in a Windows AD forest.  A federated trust has been established with Azure AD using an instance of AD FS running on-premises. Azure AD has been integrated with AWS SSO for both SSO (via SAML) and provisioning (via SCIM).

Once Bart was logged into a domain-joined machine, I popped open a browser and navigated to the My Apps portal at https://myapps.microsoft.com.  This redirected me to the Azure AD login screen.  Here I entered Bart’s username.

bartazuread.PNG

Azure AD performed its home realm discovery process, identified that the domain jogcloud.com is configured for federated authentication, and redirected me to AD FS.  Take note that I purposely broke Integrated Windows Authentication (IWA) here to show you each step.  In a correctly configured browser, you wouldn’t see this screen.

bartadfs.PNG

After I successfully authenticated to AD FS, I was bounced back over to Azure AD where the assertion was delivered.  Azure AD then whipped up a SAML assertion for AWS SSO, returned it to the browser, and redirected the browser to the AWS SSO assertion consumer URL.  AWS SSO consumed the assertion and authenticated Bart into AWS SSO displaying the AWS IAM Role selection page with the relevant roles he has permission to access.

bartawssso.PNG

Scenario 2 – Windows AD User, AD FS with Certificate MFA, Azure AD with Conditional Access, AWS SSO

scenario2.PNG

Scenario 1 is pretty simple, so let’s get fancy and layer on some security.  Here I added an access control policy into AD FS requiring certificate-based authentication for members of the Security Admins group.  Additionally, I added a conditional access policy in Azure AD requiring MFA for any user that is a member of that same group.

Since Homer Simpson regularly runs a nuclear reactor, he’s also the Security Admin for JOGCLOUD.  He has been made a member of the Windows AD Security Admins group.

As a first step I again popped open a browser and navigated to the My Apps portal.  After Homer’s username was plugged in, Azure AD redirected me to the AD FS server.  I again broke IWA to capture each step in the process.

signin2

After the password challenge was satisfied, I was prompted to provide the appropriate user certificate.

signin3.PNG

From there I was authenticated to Azure AD and served up the My Apps portal.

myapps.PNG

Wondering why I wasn’t prompted for Azure MFA?  No, I didn’t misconfigure it (at least this time).  A not-so-well-documented feature of Azure AD (at least in my opinion) is that a federated identity provider can pass a claim (the multipleauthn authentication method claim from AD FS) asserting the user has already satisfied the MFA requirement, which makes for a better user experience because the user isn’t required to authenticate multiple times.  Yes folks, this means you can layer your traditional certificate-based authentication on top of Azure AD and AWS.

mfaonprem.png

After selecting the AWS SSO app, I was signed into AWS SSO and presented with the role selection screen.

awsssosignin1.PNG

I then selected one of the roles and was signed into the relevant AWS account, assuming the AdministratorAccess IAM role.

awsssosignin2

Scenario 3 – Azure AD B2B User, AWS SSO

scenario3.PNG

What if you have a multi-tenant situation due to an acquisition or merger, or perhaps you farm out operations to a managed service provider?  No worries there, B2B is also supported with this pattern.  In this scenario I’m using a user sourced from another tenant that has been invited via Azure AD B2B.  The user has been added to the B2B Security Admins group which exists authoritatively in the inviting tenant (jogcloud.com) and was synchronized to AWS SSO via SCIM.

Opening a browser and navigating to the My Apps portal kicks off Azure AD authentication and drops the user into their source tenant.  Once there, I can switch tenants by selecting the profile icon and choosing the jogcloud tenant.

myappsmultiple.png

I’m then presented with the apps that I’m authorized to use in the jogcloud tenant, which includes the AWS SSO app.

guestmyapp.PNG

Azure AD kicks off the federated authentication and I’m presented with the AWS role selection page where I can choose to assume the AdministratorAccess role in two of the AWS accounts.

guestawsso.png

Scenario 4 – AWS CLI

I know what you’re saying now, “But what about CLI?”  Well folks, for that you can leverage the AWS CLI v2.  It’s still in preview right now, but I did test it using the user from scenario 2 and it worked flawlessly.  The experience is pretty anti-climatic so I’m not going to dive into it.  The user experience is similar to using the Azure PowerShell cmdlets in that a web browser instance is opened and guides you through the authentication process.
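If you want to try it yourself, the flow looks roughly like the sketch below.  The profile name is just a placeholder, and you’ll need the AWS CLI v2 along with a boto3/botocore release recent enough to understand SSO profiles, so treat this as a hedged sketch rather than the exact steps from my test.

# One-time setup with the AWS CLI v2 (run in a shell):
#   aws configure sso                        <- prompts for the AWS SSO start URL, region, account, and role
#   aws sso login --profile my-sso-profile   <- opens a browser to complete authentication
#
# Recent boto3 releases can then consume the cached SSO credentials via the named profile.
import boto3

session = boto3.Session(profile_name="my-sso-profile")   # placeholder profile name
print(session.client("sts").get_caller_identity())       # confirms which account and role you landed in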

That about sums up this series.

Few technologies get me excited enough to write five posts, but this integration is really amazing.  With AWS hooking into Azure AD as effectively as they have (I especially love the CLI integration), it reduces operational overhead and improves security, which is a combination you rarely see together.  Most importantly, it puts the customer first by optimizing the user experience.  If you weren’t convinced of Azure AD’s capabilities as an IDaaS, hopefully this series has helped educate you as to the value of the platform.

With that I’ll sign off.  A big thanks to the AWS product team that worked on this integration.  You did an amazing job that will greatly benefit our mutual customers.

To the rest of you, I wish you happy holidays!


Deep Dive into Azure AD and AWS SSO Integration – Part 4


Today we continue exploring the new integration between Microsoft’s Azure AD (Azure Active Directory) and AWS (Amazon Web Services) SSO (Single Sign-On).  Over the past three posts I’ve covered the high level concepts of both platforms, the challenges the integration seeks to solve, and how to enable the federated trust which facilitates the single sign-on experience.  If you haven’t read through those posts, I recommend you do so before you dive into this one.  In this post I’ll be covering the neatest feature of the new integration, which is the support for automated provisioning.

If you’ve ever worked in the identity realm before, you know the pains that come with managing the life cycle of an identity, from initial provisioning, through changes to the identity such as department and position changes, to the often forgotten stage of de-provisioning.  On-premises these problems were solved by cobbled-together scripts or complex identity management solutions such as SailPoint Identity IQ or Microsoft Identity Manager.  While these tools were challenging to implement and operate, they did their job in the world of Windows Active Directory, LDAP, SQL databases and the like.

Then came the cloud, and all bets were off.  Identity data stores skyrocketed from less than a hundred to hundreds and sometimes thousands (B2C has exploded far beyond even that).  Each new cloud service introduced into the enterprise brought yet another identity management challenge.  While some of these offerings have APIs that support identity management operations, most do not, and those that do are proprietary in nature.  Writing custom code against each of those APIs is a huge challenge that most enterprises can’t keep up with.  The result is often manual management of the identity life cycle, whether through uploading exported CSV files or some poor soul pointing and clicking a thousand times in a vendor portal.

Wouldn’t it be great if there was some mythical standard out there that would help solve this problem, use a standard REST API, and support the JSON format?  Turns out there is, and that standard is SCIM (System for Cross-domain Identity Management).  You may be surprised to know the standard has been around for a while now (technically 2011).  I recall hearing about it at a Gartner conference many, many years ago.  Unfortunately, it’s taken a long time to catch on, but support is steadily increasing.

Thankfully for us, Microsoft has baked SCIM support into Azure AD, and AWS recognized the value and took advantage of the feature.  By doing this, the identity life cycle challenges of managing an Azure AD and AWS integration have been heavily mitigated and our lives made easier.


Azure AD Provisioning – Example

Let’s take a look at how to set it up, shall we?

The first place you’ll need to go is into the AWS account which is the master for the organization and into the AWS SSO Settings.  In Settings you’ll see the provisioning option, which is initially set to manual.  Select the option to enable automatic provisioning.


AWS SSO Settings – Provisioning

Once complete, a SCIM endpoint will be created.  This is the endpoint in AWS (referred to as the SCIM service provider in the SCIM standard) that the SCIM service in Azure AD (referred to as the client in the SCIM standard) will interact with to search for, create, modify, and delete AWS users and groups.  To interact with this endpoint, Azure AD must authenticate to it, which it does with a bearer access token that is issued by AWS SSO.  Be aware that the access token has a one-year lifespan, so ensure you set some type of reminder.  A quick search through the boto3 API doesn’t show a way to query for issued access tokens (yes, you can issue more than one at a time), so you won’t be able to automate the process as of yet.

awssso-scimendpoint.png
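If you want to sanity check the endpoint and token outside of Azure AD, the endpoint speaks standard SCIM 2.0 over HTTPS, so a simple GET against the Users resource works.  The endpoint URL and token below are placeholders, so treat this as a hedged sketch rather than output from my environment.

import requests

# Values copied from the AWS SSO provisioning settings (placeholders here)
scim_endpoint = "https://scim.us-east-1.amazonaws.com/EXAMPLE/scim/v2/"
access_token = "REPLACE_WITH_TOKEN"

# List the users the SCIM service provider currently knows about
response = requests.get(
    scim_endpoint + "Users",
    headers={"Authorization": "Bearer " + access_token}
)
print(response.status_code)
print(response.json())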

After SCIM is enabled, AWS SSO Settings for provisioning now reports SCIM in use.

awssso-scimenabled.png

Next you’ll need to bounce over to Azure AD and go into the enterprise app you created (refer to my third post for this process).   There you’ll navigate to the Provisioning blade and select Automatic as the provisioning method.

azuread-scimprov.png

You’ll then need to configure the URL and access token you collected from AWS and test the connection.  This will cause Azure AD to test querying the endpoint for a random user and group to validate functionality.

azuread-scimtest.png

If your test is successful you can then save the settings.

azuread-scimtestsucccess.PNG

You’re not done yet.  Next you have to configure the mappings which map attributes in Azure AD to the resources and attributes in the SCIM schema.  Yes folks, SCIM does have a schema for attributes and resources (like users and groups).  You can extend it as needed, but this integration looks to be using the default user and group resources.

azuread-scimmappings

Let’s take a look at what the group mappings look like.

azuread-scimgroupmappings.PNG

The attribute names on the left are the attributes in Azure AD, and the names on the right are the SCIM attributes in AWS SSO that Azure AD will write those values to.  Nothing too surprising here.

How about the user mappings?

azuread-scimusermappings1

azuread-scimusermappings2

Lots more attributes in the user mappings by default.  Now I’m not sure how many of these attributes AWS SSO supports.  According to the SCIM standard, a client can attempt to write whatever it wants and any attributes the service provider doesn’t understand are simply discarded.  The best list of attributes I could find is located here, and it’s nowhere near this number.  I can’t speak to what the minimum required attributes are to make AWS work, because the official instructions for this integration don’t say.  I know some of the product team sometimes reads the blog, so maybe we’ll luck out and someone will respond with that answer.

The one tweak you’ll need to make here is to delete the mailNickName mapping and replace it with a mapping of objectId to externalId.  After you make the change, click the save icon.

I don’t know why AWS requires this so I can only theorize.  Maybe they’re using this attribute as a primary key in the back end database or perhaps they’re using it to map the users to the groups?  I’m not sure how Azure AD is writing the members attribute over to AWS.  Maybe in the future I’ll throw together a basic app to visualize what the service provider end looks like.

newmapping.PNG
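While I haven’t captured the actual calls Azure AD makes, a SCIM 2.0 user resource generally looks like the minimal example below (per RFC 7643).  The values are made up, but you can see where externalId (mapped from objectId above) and the standard user attributes land when Azure AD provisions a user.

{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "externalId": "6f1c2a9e-0000-0000-0000-000000000000",
  "userName": "bart.simpson@jogcloud.com",
  "name": {
    "givenName": "Bart",
    "familyName": "Simpson"
  },
  "displayName": "Bart Simpson",
  "emails": [
    {"value": "bart.simpson@jogcloud.com", "type": "work", "primary": true}
  ],
  "active": true
}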

Now you need to decide what users and groups you want to sync to AWS SSO.  Towards the bottom of the provisioning blade, you’ll see the option to toggle the provisioning status.  The scope drop-down box has an option to sync all users and groups or to sync only assigned users and groups.  Best practice here is basic security: only sync what you need to sync, so leave the option on sync only assigned users and groups.

The assigned users and groups refers to the users and groups that have been assigned to the enterprise application in Azure AD.  This is configured on the Users and Groups blade for the enterprise app.  I tested a few different scenarios using an Azure AD dynamic group, a standard group, and a group synchronized from Windows AD.  All worked successfully and synchronized the relevant users over.

Once you’re happy with your settings, toggle the provisioning status and save the changes.  It may take some time depending on how much you’re syncing.

syncsuccess.PNG

If the sync is successful, you’ll be able to hop back over to AWS SSO and you’ll see your users and groups.

awssyncedusers

awssyncedgroup

Microsoft’s official documentation does a great job explaining the end to end cycle.  The short of it is there’s an initial cycle which grabs all users and groups from Azure AD, then filters the list down to the users and groups assigned to the application.  From there it queries the target system for each user using the matching attribute, creates the user if it isn’t found, and updates it if it’s found and needs updating.

Incremental cycles run every 40 minutes from that point forward.  I couldn’t find any documentation on how to adjust the synchronization frequency.  Be aware of that 40 minute sync and consider the end to end synchronization time if you’re sourcing from Windows Active Directory.  In that case, changes made in Windows AD could take just over an hour (assuming you’re using the 30 minute sync interval in Azure AD Connect) to fully synchronize.

awsssotime.PNG

As I described in my third post, I have a lab environment setup where a Windows Active Directory domain is syncing to Azure AD.  I used that environment to play out a few scenarios.

In the first scenario I disabled Marge Simpson’s account.  After waiting some time for changes to synchronize across both platforms, I saw in AWS SSO that Marge Simpson was now disabled.

margedisabled.PNG

For another scenario, I removed Barney Gumble from the Network Operators Active Directory group.  After waiting for the sync to complete, the Network Operators group is now empty, reflecting Barney’s removal from the group.

networkoperators.PNG

Recall that I assigned four groups to the app in Azure AD: Network Operators, Security Admins, Security Auditors, and Systems Operators.  These are the four groups syncing to AWS SSO.  Barney Gumble was only a member of the Network Operators group, which means removing him put him out of scope for the app assignment.  In AWS SSO, he now reports as being disabled.

barneydisabled.PNG

For our final scenario, let’s look at what happened when I deleted Barney Gumble from Windows Active Directory.  After waiting the required replication time, Barney Gumble’s user account was still present in AWS SSO, but set as disabled.  While Barney wouldn’t be able to login to AWS SSO, there would still be cleanup that would need to happen in the AWS SSO directory to remove the stale identity records.

barneydisabled.PNG

The last thing I want to cover is the logging capabilities of the SCIM service in Azure AD.  There are two separate logs you can reference.  The first is the Provisioning Logs, which are currently in preview.  These logs are going to be your go-to for troubleshooting issues with the provisioning process.  They’re available with an Azure AD P1 or above license and are kept for 30 days.  Supposedly they’re kept for free for 7 days, but the documentation isn’t clear on whether or not you have the ability to consume them.  I also couldn’t find any documentation on whether it’s possible to pull the logs from an API for longer-term retention or analysis in Log Analytics or a third-party logging solution.

If you’ve ever used Azure AD, you’ll be familiar with the second source of logs.  In the Azure AD Audit logs, you get additional information which, while useful, is geared more toward tracking the process than troubleshooting it the way the Provisioning Logs are.

Before I wrap up, let’s cover a few key findings:

  • The access token used to access the SCIM endpoint in AWS SSO has a one year lifetime.  There doesn’t seem to be a way to query what tokens have been issued by AWS SSO at this time, so you’ll need to manage the life cycle in another manner until the capability is introduced.
  • Users that are removed from the scope of the sync, either by unassigning them from the app or deleting their user object, become disabled in AWS SSO.  The records will need to be cleaned up via another process.
  • If synchronizing changes from Windows AD, the end-to-end synchronization process can take over an hour (up to 30 minutes from Windows AD to Azure AD and up to 40 minutes from Azure AD to AWS SSO).

That will wrap up this post.  In my opinion the SCIM service available in Azure AD is extremely underutilized.  SCIM is a great specification that needs more love.  While adoption is growing among large enterprise software vendors, there is a real opportunity for your organization to take advantage of the features it offers in the same way AWS has.  It can greatly ease the pain your customers and enterprise users experience managing the life cycle of an identity, and it makes a nice complement to the modern identity capabilities of an application.

In the last post of my series I’ll demonstrate a few scenarios showing how simple the end to end experience is for users.  I’ll include some examples of how you can incorporate some of the advanced security features of Azure AD to help protect your multi-cloud experience.

See you next post!


Capturing Azure Management Group Activity Logs Using Azure Automation – Part 2

Welcome back fellow geeks!

This post will be the second post in a series covering how to use Azure Automation to capture Azure Management Group Activity Logs.  In the first post I walked through what management groups are and the problems that they solve.  The key takeaway of that post is that management groups have their own Activity Logs and (at this time) they’re only accessible from within the Portal and over the Azure REST API.  Given that management groups are where we’re applying our Azure Policy for governance and compliance and our access controls via Azure RBAC, the Activity Logs are pretty critical.  So what is a geek to do?

In this post I’ll cover a solution I put together to solve the problem.  It uses an Azure Automation PowerShell Runbook to iterate through the management groups within an Azure Active Directory tenant, write the logs to Azure Storage, and optionally deliver the logs to Azure Monitor or Azure EventHubs.  The architecture is pictured below.

Capture.PNG

If you’re not familiar with Azure Automation, it’s a service that provides a number of key capabilities within Azure such as configuration management, update management, and process automation.  If you’re coming from AWS, I’d compare it to AWS Systems Manager.  For the purposes of this series of posts I’m going to focus on the process automation capability of the service delivered through Runbooks.  I’m not going to go too in-depth into Azure Automation, but I’ll provide a brief overview of the service features and tweaks relevant to the solution.

Runbooks are modules of code that can be strung together to perform a series of tasks such as performing maintenance on a collection of VMs.  The modules can be authored using either PowerShell or Python.  At this time only Python 2 is supported, which makes me a sad panda.  Given that Python 2 enters end of life in two months, I’d recommend doing anything Python-related in Azure Functions.  I could devote an entire blog post complaining about the lack of Python 3 in the year 2019, but I’ll spare you.  You’re going to want to author your Runbooks in PowerShell until/if Python 3 is supported in the future.

The Azure Automation account acts as a logical container for the Runbooks created within it.  An Azure Automation Account can be provided with a RunAs account, which is simply a service principal in Azure Active Directory.   The service principal is configured with a certificate credential which is used by the Automation Account to authenticate to Azure AD and access Azure resources within the tenant.  Any Runbooks you create within the Automation account can assume the identity to execute tasks across your Azure resources.

You can automatically provision the RunAs account when the Azure Automation Account is provisioned, just be aware that the service principal will be granted the Contributor role on the Azure Subscription.  This is probably going to be way more permissions than are needed so I’d recommend removing that role assignment, creating a custom RBAC role, and assigning it at the appropriate scope.

Automation Accounts have a number of assets which are relevant for Runbooks.  These include variables, connections, credentials, and certificates.  The links I provided will give you detailed information on these assets, so I’ll summarize the content relevant to the solution.  Variables come in a variety of types including strings and integers and can also optionally be encrypted.  For this solution I use encrypted variables to store the Event Hub connection string, Log Analytics Workspace Id, and Log Analytics Workspace Key.  Connections contain information required to connect to an external service or application.  The only connection asset used with this solution is the AzureServicePrincipal, which is used by the RunAs account.  You can retrieve the connection to get information such as the Azure AD tenant Id and application id (client id in the OAuth world).  Lastly, we have the certificate asset which, as the name describes, can be used to securely store a certificate that is used for authentication.  This solution uses the AzureRunAsCertificate asset, which contains the certificate used to authenticate the Automation Account RunAs account.

Each Automation Account comes with a predefined set of PowerShell modules and .NET libraries.  You can add additional modules and libraries by importing them to the Automation Account.  For this solution I added a number of .NET libraries, including ADAL and some libraries required to communicate with Event Hubs.  While PowerShell does a wonderful job of handling things at the management plane of Azure, it is severely lacking in the data plane, requiring you to fall back on incorporating .NET code into your PowerShell script.

The above (including the links) should give you the bare minimum you need to understand to use this solution.  Let’s deep dive into the code.  Since this is a fairly lengthy script I’m not going to paste every line of code.  Instead I’m going to call out key sections of code that were particularly relevant or interesting to write.

The first function in the script is called Get-AdalToken and uses the .NET ADAL library to retrieve a token from Azure AD.  When I code in Python I typically use the MSAL library since I find it to be a bit more slick, but I found the .NET version too cumbersome and difficult to use in PowerShell.  If you’ve ever used .NET libraries in your PowerShell scripts, you know where I’m coming from.

The token retrieved by the function is used for calls to the Azure Management REST API.  The reason I went with ADAL vs pulling the access token from a session created using the Add-AzAccount method as demonstrated here is that I wanted code I could reuse for other purposes outside of the Azure REST API.

Once the token is retrieved it is stored in a variable for later use in the script.

adal

Next up we have the Get-AllManagementGroups function.  This function calls the Azure REST API to get a full listing of management groups.  Oddly enough there is an AzureRM cmdlet included in the AzureRm.Resources module that comes preinstalled with every new Automation Account.  However, even after updating the modules within the account (this link tells you how to do this and I highly recommend doing it whenever you create a new automation account) the cmdlet only ever reported back the tenant root group.  This occurred even when following the instructions to spit back all Management Groups.  I chalked it up to there being an issue with the cmdlet or user error on my part.  Either way, it was simple enough to whip up a call to the REST API.

Following the Get-AllManagementGroups function we have the Get-ManagementGroupActivityLog function.  Let me tell you folks, this one was an absolute pain to write.  According to this Azure feedback thread these logs have been accessible over the API since back in March of this year, but the REST API reference documentation doesn’t look to have been updated to reflect this.  I’m going to save you all a ton of headaches and hours of experimentation and searching the web.  When you want to get Activity Logs over the REST API you are going to use the following endpoint:


https://management.azure.com/providers/Microsoft.Management/managementGroups/mgmtGroupId/providers/microsoft.insights/eventtypes/management/values

The mgmtGroupId variable would be the name of your management group.  If your management group is named production then the value in that URL would be production.  Additionally, you’ll want to pass query parameters of api-version set to 2017-03-01-preview and a $filter query parameter constructed in the same way you would to query a subscription Activity Log.

activitylogquery.PNG
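If you’d rather poke at the endpoint outside of the Runbook, a rough Python sketch of the call looks like the following.  The management group name, token, and filter window are placeholders, and the Runbook itself does this in PowerShell using the ADAL library described above.

import requests

mgmt_group_id = "production"            # placeholder management group name
token = "REPLACE_WITH_BEARER_TOKEN"     # an access token scoped to https://management.azure.com/

url = ("https://management.azure.com/providers/Microsoft.Management/"
       "managementGroups/" + mgmt_group_id + "/providers/microsoft.insights/"
       "eventtypes/management/values")

params = {
    "api-version": "2017-03-01-preview",
    # Same filter syntax used when querying a subscription Activity Log
    "$filter": "eventTimestamp ge '2019-11-01T00:00:00Z' and eventTimestamp le '2019-11-30T00:00:00Z'"
}

response = requests.get(url, headers={"Authorization": "Bearer " + token}, params=params)
response.raise_for_status()
for event in response.json().get("value", []):
    print(event.get("operationName", {}).get("value"))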

The SendTo-Storage function sends the Activity Log for each Management Group as a separate blob to Azure Storage.  The format of the Activity Log is raw JSON.

The SentTo-Workspace function sends the log data to Azure Monitor (really a Log Analytics Workspace) via the HTTP Data Collector API.  The product team was wonderful enough to include sample PowerShell code that made writing that function a breeze.

I did run into some weirdness with this function, which was caused by the maximum size of an output stream in Runbooks (1MB).  When I pulled the Activity Log for 90 days, the entirety of the log was well over 1MB, so it would cause the Runbook to fail three times and suspend.  Debugging this was a pain because the Runbook doesn’t report the error in an obvious way.  I got around this by collecting the log entries into batches and sending them in roughly 200KB chunks.  Additionally, I added some error checking and retry handling in case the calls got throttled.
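The Runbook does this batching in PowerShell, but the general idea is simple enough that a short Python sketch illustrates it.  The 200KB threshold mirrors what I settled on; the post_to_workspace call is a stand-in for the HTTP Data Collector API delivery.

import json

def chunk_records(records, max_bytes=200 * 1024):
    # Yield lists of records whose serialized JSON stays under max_bytes
    batch = []
    for record in records:
        candidate = batch + [record]
        if len(json.dumps(candidate).encode("utf-8")) > max_bytes and batch:
            yield batch
            batch = [record]
        else:
            batch = candidate
    if batch:
        yield batch

# Hypothetical usage; post_to_workspace would wrap the HTTP Data Collector API call
# for batch in chunk_records(activity_log_events):
#     post_to_workspace(json.dumps(batch))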

The final function is named SendTo-EventHub and delivers the logs to an Event Hub.  I couldn’t find any PowerShell cmdlets that could be used to send data to Event Hub.  This forced me to fall back to the .NET libraries.  In the end I got it working and got them streaming, but I’m sure someone more skilled in .NET than me (which isn’t difficult to be) could optimize and improve that code.

The main chunk of the solution strings everything together.  By default the solution writes the logs to Azure blob storage.  You can optionally deliver the data to Azure Monitor and Azure Event Hubs.

Well folks that brings us to the end of this post and series.  While I’m sure the product team is quickly working to deliver this integration out of the box, I learned a ton about Azure Automation and Runbooks working on this effort.  Runbooks are a wonderful tool if you’re a classic infrastructure / security tech new to the whole coding thing.  It’s a very simple and straightforward user experience for that audience and a good stepping stone into the coding world vs jumping directly into Azure Functions.

I’ve posted the solution up onto my Github.  For those folks without Github, I’ve put a static copy of the solution up on this website at this link.  Take it, test it, play with it, build upon it, and experiment with it.

Capturing Azure Management Group Activity Logs Using Azure Automation – Part 1


Hello again fellow geeks!

Over the past few months I’ve been working with a customer who is just beginning their journey into the cloud.  We’ve had a ton of great conversations around security, governance, and operationalizing Microsoft Azure.  We recently finalized the RACI and identified the controls required by both their internal security policy and their industry compliance requirements.  With those two items complete, we put together our Azure RBAC model and narrowed down the Azure Policies we needed to put in place to satisfy our compliance controls.

After a lot of discussion about the customer’s organization, its geographical locations, business unit makeup, and how its developers and central IT operate, we came up with a subscription model where each workload would exist in its own subscription.  Further, each workload’s production and non-production environments would be segmented into different subscriptions.  Keeping each workload in a different subscription ensures no workload will compete for resources with other workloads and hit any subscription limits.  Additionally, it allowed the customer to very easily track the costs associated with each workload.

Now why did we use separate production and non-production subscriptions for each workload?  One reason is to address the same risk as above where a non-production workload could potentially consume all resources within a subscription impacting a production workload.  The other more critical reason is it makes it easier for us to apply different governance and access controls on production workloads vs non-production workloads.  The way we do this is through the usage of Azure Management Groups.

Management Groups were introduced into general availability back in late 2018 to help address the challenges organizations were having operating subscriptions at scale.  They provide a hierarchical method to apply governance and access controls across a collection of subscriptions.  For those of you familiar with AWS, Management Groups are somewhat similar to AWS Organizations and Organizational Units.  For my fellow Windows AD peeps, you can think of Management Groups somewhat like the Active Directory container and organizational unit hierarchy in an Active Directory domain, where you apply different access control entries and group policy at high levels in the OU hierarchy which are then enforced and inherited down to the children.  Management Groups work in a similar manner in that the Azure RBAC definitions and assignments and Azure Policy you assign to the parent Management Groups are inherited down into the children.

Every Azure AD tenant starts with a top-level management group called the tenant root group.  Additional management groups created within the tenant are children of the tenant root group, up to a maximum of 10,000 management groups and up to six levels of depth.  Any RBAC assignment or Azure Policy assigned to the tenant root group applies to all child management groups in the tenant.  It’s important to understand that Management Groups are a resource within the Azure AD tenant and not a resource of an Azure subscription.  This will matter for reasons we’ll see later.

The tenant root management group can only be administered by a Global Admin by default, and even this requires a configuration change in the tenant.  The method is described here; what it does is place the global administrator performing the action in the User Access Administrator RBAC role at the root scope.  Once that is complete, the name of the root management group can be changed, role assignments created, or policy assigned.

Screen Shot 2019-10-17 at 9.59.59 PM

Administering Tenant Root Group

Now there is one aspect of Management Groups that is a bit funky.  If you’re very observant you probably noticed the menu option below.

Screen Shot 2019-10-17 at 9.59.59 PM.png

That’s right folks, Management Groups have their own Activity Log.  Every action you perform at the management group scope, such as creating an Azure RBAC role assignment or assigning or un-assigning an Azure Policy, is captured in this Activity Log.  Now as of today, the only way to access these logs is viewing them through the portal or through the Azure REST API.  Unlike the Activity Logs associated with a subscription, there isn’t native integration with Event Hubs or Azure Storage.  Don’t be fooled by the Export To Event Hub link seen in the screenshot below; this will simply send you to the standard menu where you would configure subscription Activity Logs to be exported.

Screen Shot 2019-10-17 at 10.34.19 PM

Now you could log into the GUI every day and export the logs to a CSV (yes that does work with Management Groups) but that simply isn’t scalable and also prevents you from proactively monitoring the logs.  So how do we deal with this gap while the product team works on incorporating the feature?  This will be the challenge we address in this series.

Over the next few posts I’ll walk through the solution I put together using Azure Automation Runbooks to capture these Activity Logs and send them to Azure Storage for retention and an Azure Log Analytics Workspace for analysis and monitoring using Azure Monitor.

Continue the series in my second post.

Deep Dive into Azure Managed Identities – Part 2

Welcome back fellow geeks for the second installment in my series on Azure Managed Identities.  In the first post I covered the business problem and the risks Managed Identities address, and in this post I’ll be covering how managed identities are represented in Azure.

Let’s start by walking through the components that make managed identities possible.

The foundational component of any identity is the data store in which the identity lives.  In the case of managed identities, like much of the rest of the identity data for the Microsoft cloud, the data store is Azure Active Directory.  For those of you coming from the traditional on-premises environment and who have had experience with your traditional directories such as Active Directory or one of the many flavors of LDAP, Azure Active Directory (Azure AD) is an Identity-as-a-Service offering which includes a directory component we can think of as a next generation directory.  This means it’s designed to be highly scalable, available, and resilient and is provided to you in an “as a service” model where a simple management layer sits in front of all the complexities of the compute, network, and storage infrastructure that makes up the directory.  There are a whole bunch of other cool features such as modern authentication, contextual authorization, adaptive authentication, and behavioral analytics that come along with the solution so check out the official documentation to learn about those capabilities.  If you want to nerd out on the design of that infrastructure you can check out this whitepaper and this article.

It’s worthwhile to take a moment to cover Azure AD’s relationship to Azure.  Every resource in Azure is associated with an Azure subscription.  An Azure subscription acts as a legal and payment agreement (think type of Azure subscription, pay-as-you-go, Visual Studio, CSP, etc), boundary of scale (think limits to resources you can create in a subscription), and administrative boundary.  Each Azure subscription is associated with a single instance of Azure AD.  Azure AD acts as the security boundary for an organization’s space in Azure and serves as the identity backend for the Azure subscription.  You’ll often hear it referred to as “your tenant” (if you’re not familiar with the general cloud concept of tenancy check out this CSA article).

Azure AD stores lots of different object types including users, groups, and devices.  The object type we are interested in for the purposes of managed identity is the service principal.  Service principals act as the security principals for non-humans (such as applications or Azure resources like a VM) in Azure AD.  These service principals are then granted permissions to access resources in Azure by being assigned permissions to Azure resources such as an instance of Azure Key Vault or an Azure Storage account.  Service principals are used for a number of purposes beyond just Managed Identities, such as identities for custom-developed applications or third-party applications.

Given that service principals can be used for different purposes, it only makes sense that the service principal object type includes an attribute called serviceprincipaltype.  For example, a third-party or custom developed application that is registered with Azure AD uses the service principal type of Application while a managed identity has the value set to ManagedIdentity.  Let’s take a look at an example of the serviceprincipaltypes in a tenant.

In my Geek In The Weeds tenant I’ve created a few application identities by registering the applications and I’ve created a few managed identities.  Everything else within the tenant is default out of the box.  To list the service principals in the directory I used the AzureAD PowerShell module.  The cmdlet that can be used to list out the service principals is Get-AzureADServicePrincipal.  By default the cmdlet will only return 100 results, so you need to set the All parameter to true.  Every application, whether it’s Exchange Online or Power BI, needs an identity in your tenant to interact with the tenant and the resources you create that are associated with it.  Here are the serviceprincipaltypes in my Geek In The Weeds tenant.

serviceprincipaltype.PNG

Now we know the security principal used by a Managed Identity is stored in Azure AD and is represented by a service principal object.  We also know that service principal objects have different types depending on how they’re being used, and the type that represents a managed identity is ManagedIdentity.  If we want to know what managed identities exist in our directory, we can use this information to pull a list using Get-AzureADServicePrincipal.

We’re not done yet!  Managed Identities also come in multiple flavors, either system-assigned or user-assigned.  System-assigned managed identities are the cooler of the two in that they share the lifecycle of the resource they’re used by.  For example, a system-assigned managed identity can be created when an Azure VM is created, and that identity will be deleted once the VM is deleted.  This presents a great option for mitigating the challenge of identity lifecycle management.  With Microsoft handling the lifecycle of these identities, each resource could potentially have its own identity.  This makes it easier to troubleshoot issues with the identity, avoid potential outages caused by modifying the identity, adhere to least privilege by giving the identity only the permissions the resource requires, and cut back on support requests from developers to info sec for the creation of identities.

Sometimes it may be desirable to share a managed identity amongst multiple Azure resources such as an application running on multiple Azure VMs.  This use case calls for the other type of managed identity, user-assigned.  These identities do not share the lifecycle of the resources using them.

Let’s take a look at the differences between a service principal object for a user-assigned vs a system-assigned managed identity.  Here I ran another Get-AzureADServicePrincipal and limited the results to serviceprincipaltype of ManagedIdentity.

ObjectId                           : a3e9d372-242e-424b-b97a-135116995d4b
ObjectType                         : ServicePrincipal
AccountEnabled                     : True
AlternativeNames                   : {isExplicit=False, /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis}
AppId                              : b7fa9389-XXXX
AppRoleAssignmentRequired          : False
DisplayName                        : systemmis
KeyCredentials                     : {class KeyCredential {
                                       CustomKeyIdentifier: System.Byte[]
                                       EndDate: 11/11/2019 12:39:00 AM
                                       KeyId: f8e439a8-071b-45e0-9f8e-ac10b058a5fb
                                       StartDate: 8/13/2019 12:39:00 AM
                                       Type: AsymmetricX509Cert
                                       Usage: Verify
                                       Value:
                                     }
                                     }
ServicePrincipalNames              : {b7fa9389-XXXX, https://identity.azure.net/XXXX}
ServicePrincipalType               : ManagedIdentity
------------------------------------------------
ObjectId                           : ac960ac7-ca03-4ac0-a7b8-d458635b293b
ObjectType                         : ServicePrincipal
AccountEnabled                     : True
AlternativeNames                   : {isExplicit=True,
                                     /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234}
AppId                              : fff84e09-XXXX
AppRoleAssignmentRequired          : False
AppRoles                           : {}
DisplayName                        : testing1234
KeyCredentials                     : {class KeyCredential {
                                       CustomKeyIdentifier: System.Byte[]
                                       EndDate: 11/7/2019 1:49:00 AM
                                       KeyId: b3c1808d-6778-4004-b23f-4d339ed0a91f
                                       StartDate: 8/9/2019 1:49:00 AM
                                       Type: AsymmetricX509Cert
                                       Usage: Verify
                                       Value:
                                     }
                                     }
ServicePrincipalNames              : {fff84e09-XXXX, https://identity.azure.net/XXXX}
ServicePrincipalType               : ManagedIdentity


In the above results we can see that the main difference between the user-assigned (testing1234) and system-assigned (systemmis) identities is within the AlternativeNames property.  The system-assigned identity has isExplicit set to False and another value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis.  Notice the last portion specifies this identity is being used by a virtual machine named systemmis.  The user-assigned identity has isExplicit set to True and another value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234.  Here we can see the identity is an “explicit” managed identity and is not directly linked to an Azure resource.

This difference gives us the ability to quickly report on the number of system-assigned and user-assigned managed identities in a tenant by using the following command.

Get-AzureADServicePrincipal -All $True | Where-Object AlternativeNames -like "isExplicit=True*"

True would give us user-assigned and False would give us system-assigned.  Neat right?
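If you’d rather pull the same report outside of PowerShell, the service principal objects are also exposed through Microsoft Graph.  The sketch below assumes you’ve already acquired a Graph access token (for example with MSAL) and filters client-side rather than relying on server-side filter support.

import requests

token = "REPLACE_WITH_GRAPH_ACCESS_TOKEN"   # placeholder; acquire via MSAL or another OAuth flow
url = "https://graph.microsoft.com/v1.0/servicePrincipals"
headers = {"Authorization": "Bearer " + token}

managed_identities = []
while url:
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    data = resp.json()
    # Keep only the service principals representing managed identities
    managed_identities += [sp for sp in data.get("value", [])
                           if sp.get("servicePrincipalType") == "ManagedIdentity"]
    url = data.get("@odata.nextLink")       # follow paging until exhausted

user_assigned = [sp for sp in managed_identities
                 if any("isExplicit=True" in n for n in sp.get("alternativeNames", []))]
print(len(user_assigned), "user-assigned /", len(managed_identities) - len(user_assigned), "system-assigned")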

Let’s summarize what we’ve learned:

  • An object in Azure Active Directory is created for each managed identity and represents its security principal
  • The type of object created is a service principal
  • There are multiple service principal types and the one used by a Managed Identity is called ManagedIdentity
  • There are two types of managed identities, user-assigned and system-assigned
  • System-assigned managed identities share the lifecycle of the resource they are associated with while user-assigned managed identities are created separately from the resource, do not share the resource lifecycle, and can be used across multiple resources
  • The object representing a user-assigned managed identity has a value of isExplicit=True in the AlternativeNames property, while a system-assigned managed identity has a value of isExplicit=False.

That’s it for this post folks.  In the next post I’ll walk through the process of creating a managed identity for an Azure VM and will demonstrate with a bit of Python code how we can use the managed identity to access a secret stored in Azure Key Vault.

See you next post!

Deep Dive into Azure Managed Identities – Part 1

“I love the overhead of password management” said no one ever.

Password management is hard.  It’s even harder when you’re managing the credentials for non-humans, such as those used by an application.  Back in the olden days when a developer needed a way to access an enterprise database or file share, they’d put in a request with help desk or information security to have an account (often referred to as a service account) provisioned in Windows Active Directory, an LDAP, or a SQL database.  The request would go through a business approval and some support person would create the account, set the password, and email the information to the developer.  This process came with a number of risks:

  • Risk of compromise of the account
  • Risk of abuse of the account
  • Risk of a significant outage

These risks arise due to the following gaps in the process:

  • Multiple parties knowing the password (the party who provisions the account and the developer)
  • The password for the account being communicated to the developer unencrypted such as plain text in an email
  • The password not being changed after it is initially set due to the inability or difficulty of changing the password
  • The password not being regularly rotated due to concerns over application outages
  • The password being shared with other developers and the account then being used across multiple applications without the dependency being documented

Organizations tried to mitigate the risk of compromise by requiring long and complex passwords, delivering the password in an encrypted format such as an encrypted Microsoft Office document, instituting policy requiring the password to be changed (exceptions to this one are frequent due to outage concerns), implementing password vaulting and management such as CyberArk Enterprise Password Vault or Hashicorp Vault, and instituting behavioral monitoring solutions to check for abuse.  Password rotation and monitoring are some of the more effective mitigations but can also be extremely challenging and costly to institute at scale even with a vaulting and management solution.  Even then, there are always exceptions, such as legacy applications which are not compatible (sadly these are often some of the more critical systems).

When the public cloud came around, the credential management challenge for application accounts exploded due to the most favored traits of a public cloud: on-demand self-service and rapid elasticity and scalability.  The challenge that was once a few hundred application identities has quickly grown into thousands of applications, especially with containers and serverless functions such as AWS Lambda and Azure Functions.  Beyond the volume of applications, the public cloud also changes the traditional security boundary due to its broad network access trait.  Instead of the cozy feeling multiple firewalls gave you, you now have developers using cloud services such as storage or databases which are administered via a cloud management plane exposed directly to the Internet.  It doesn’t stop there folks; you also have developers heavily using SaaS-based version control solutions to store code which may have credentials hardcoded into it, potentially exposing those credentials publicly.

Thankfully the public cloud providers have heard the cries of us security folk and have been working hard to help address the problem.  One method in use is the creation of security principals which are designed around the use of temporary credentials.  This way there are no long-standing credentials to share, compromise, or abuse.  Amazon makes robust use of this concept in AWS with IAM Roles.  Instead of hardcoding a set of IAM User credentials in a Lambda or an application running on an EC2 instance, a role can be created with the necessary permissions required for the application and be assumed by either the Lambda service or the EC2 instance.

For this series of posts I’m going to be focusing on one of Microsoft Azure’s solutions to this problem, which is called Managed Identities.  For those of you more familiar with AWS, Managed Identities conceptually work the same way as IAM Roles.  A security principal is created, permissions are granted, and the identity is assumed by a resource such as an Azure Web App or an Azure VM.  There are some features that differ from IAM Roles that add to the appeal of Managed Identities, such as associating the lifecycle of the Managed Identity to the resource, such that when the resource is created the managed identity is created, and when the resource is destroyed the identity is destroyed.

In this series of posts I’ll be demonstrating how Managed Identities are created, how they are used, and how they differ (sometimes for the better and sometimes not) from AWS IAM Roles.  Hope you enjoy the series, and expect the next entry in the series early next week.

See you soon fellow geek!

Visualizing AWS Logging Data in Azure Monitor – Part 2


Welcome back folks!

In this post I’ll be continuing my series on how Azure Monitor can be used to visualize log data generated by other cloud services.  In my last post I covered the challenges that multicloud brings and what Azure can do to help with it.  I also gave an overview of Azure Monitor and covered the design of the demo I put together and will be walking through in this post.  Please take a read through that post if you haven’t already.  If you want to follow along, I’ve put the solution up on Github.

Let’s quickly review the design of the solution.

Capture

This solution uses some simple Python code to pull information about the usage of AWS IAM User access IDs and secret keys from an AWS account.  The code runs in a Lambda, and the Azure Log Analytics Workspace ID and key are stored in Lambda environment variables encrypted with an AWS KMS key.  The data is pulled from the AWS API using the Boto3 SDK and is transformed into JSON format.  It’s then delivered to the HTTP Data Collector API, which places it into the Log Analytics Workspace.  From there, it becomes available to Azure Monitor to query and visualize.

Setting up an Azure environment for this integration is very simple.  You’ll need an active Azure subscription.  If you don’t have one, you can set up a free Azure account to play around with.  Once you’re set with the Azure subscription, you’ll need to create an Azure Log Analytics Workspace.  Instructions for that can be found in this Microsoft article.  After the workspace has been set up, you’ll need to get the workspace id and key as referenced in the Obtain workspace ID and key section of this Microsoft article.  You’ll use this workspace ID and key to authenticate to the HTTP Data Collector API.

If you have a sandbox AWS account and would like to follow along, I’ve included a CloudFormation template that will setup the AWS environment.  You’ll need to have an AWS account with sufficient permissions to run the template and provision the resources.  Prior to running the template, you will need to zip up the lambda_function.py and put it on an AWS S3 bucket you have permissions on.  When you run the template you’ll be prompted to provide the S3 bucket name, the name of the ZIP file, the Log Analytics Workspace ID and key, and the name you want the API to assign to the log in the workspace.

The Python code backing the solution is pretty simple.  It uses all standard Python modules except for the boto3 module used to interact with AWS.

import json
import logging
import re
import csv
import boto3
import os
import hmac
import base64
import hashlib
import datetime

from io import StringIO
from datetime import datetime
from botocore.vendored import requests

The first function in the code parses the ARN (Amazon Resource Name) to extract the AWS account number.  This information is later included in the log data written to Azure.

# Parse the IAM User ARN to extract the AWS account number
def parse_arn(arn_string):
    acct_num = re.findall(r'(?<=:)[0-9]{12}',arn_string)
    return acct_num[0]

The second function uses the strftime method to transform the timestamp returned from the AWS API to a format that the Azure Monitor API will detect as a timestamp and make that particular field for each record in the Log Analytics Workspace a datetime type.

# Convert timestamp to one more compatible with Azure Monitor
def transform_datetime(awsdatetime):
    transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
    return transf_time

The next function queries the AWS API for a listing of the AWS IAM Users set up in the account and creates a dictionary object representing data about each user. Each object is then added to a list, one entry per user.

# Query for a list of AWS IAM Users
def query_iam_users():
    
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users
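
For reference, the list returned by query_iam_users contains one dictionary per IAM User and looks roughly like the following (the values shown are illustrative):

# Illustrative shape of the data returned by query_iam_users()
users = [
    {'loggedDate': '2019-06-30', 'username': 'example-user', 'account_number': '123456789012'}
]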

The query_access_keys function queries the AWS API for a listing of the access keys that have been provisioned to an AWS IAM User, as well as the status of those keys and some metrics around their usage.  The resulting data is added to a dictionary object, and the object is added to a list.  Each item in the list represents a record for an AWS access key.

# Query for a list of access keys and information on access keys for an AWS IAM User
def query_access_keys(user):
    keys = []
    client = boto3.client(
        'iam'
    )
    paginator = client.get_paginator('list_access_keys')
    response_iterator = paginator.paginate(
        UserName = user['username']
    )

    # Get information on access key usage
    for page in response_iterator:
        for key in page['AccessKeyMetadata']:
            response = client.get_access_key_last_used(
                AccessKeyId = key['AccessKeyId']
            )
            # Sanitize the key before sending it along for export
            sanitizedacctkey = key['AccessKeyId'][:4] + '...' + key['AccessKeyId'][-4:]
            # Create a new dictionary object with the access key information
            if 'LastUsedDate' in response.get('AccessKeyLastUsed'):

                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),
                'LastUsedDate':(transform_datetime(response['AccessKeyLastUsed']['LastUsedDate'])),
                'Region':response['AccessKeyLastUsed']['Region'],'Status':key['Status'],
                'ServiceName':response['AccessKeyLastUsed']['ServiceName']}
                keys.append(key_rec)
            else:
                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),'Status':key['Status']}
                keys.append(key_rec)
    return keys

The next two functions contain the code that creates and submits the request to the Azure Monitor API.  The product team was awesome enough to provide some sample code in the public documentation for this part.  The code is intended for Python 2 but only required a few small changes to make it compatible with Python 3.

Let’s first talk about the build_signature function.  At this time the API authenticates requests by signing them with the Log Analytics Workspace id and key.  In short, this means you’ll have two shared keys per workspace, so treat the workspace as your authorization boundary and prioritize proper key management (i.e., use a different workspace for each workload, track key usage, and rotate keys as your internal policies require).

Breaking down the code below, the string to be signed includes the HTTP method, the length of the request content, a custom x-ms-date header, and the REST resource endpoint.  The string is converted to a bytes object, an HMAC is computed over it using SHA256 with the base64-decoded workspace key, and the digest is then base64 encoded.  The result becomes the Authorization header value returned by the function.

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")  
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    authorization = "SharedKey {}:{}".format(customer_id,encoded_hash)
    return authorization
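
To make the signing flow a bit more concrete, here’s a rough example of calling build_signature with fabricated workspace credentials.  It relies on the module’s existing imports; the workspace id and key below are made up, and the key only needs to be valid base64.

# Fabricated workspace id and key purely for illustration
workspace_id = '00000000-0000-0000-0000-000000000000'
workspace_key = base64.b64encode(b'not-a-real-key').decode()

sample_body = json.dumps([{'user': 'example-user'}])
sample_date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')

# Produces a header value of the form 'SharedKey <workspace id>:<base64 HMAC>'
auth_header = build_signature(workspace_id, workspace_key, sample_date,
                              len(sample_body), 'POST', 'application/json', '/api/logs')
print(auth_header)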

Not much needs to be said about the post_data function beyond the fact that it uses the Python requests module to post the log content to the API.  Take note of the limits on the amount of data that can be included in the body of a single request.  The key takeaway is that if you plan on pushing a lot of data to the API, you’ll need to chunk your data to fit within those limits (a simple chunking sketch follows the function below).

def post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri,data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print("Accepted")
    else:
        print("Response code: {}".format(response.status_code))

Last but not least, we have the lambda_handler function which brings everything together. It first gets a listing of users, loops through each user to gather information about their access keys and usage, creates a log record for each key, converts the list of records to a JSON string, and writes it to the API. If the content is successfully delivered, the Lambda’s log will note that it was accepted.

def lambda_handler(event, context):

    # Enable logging to console
    logging.basicConfig(level=logging.INFO,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    try:

        # Initialize empty records array
        #
        key_records = []
        
        # Retrieve list of IAM Users
        logging.info("Retrieving a list of IAM Users...")
        users = query_iam_users()

        # Retrieve list of access keys for each IAM User and add to record
        logging.info("Retrieving a listing of access keys for each IAM User...")
        for user in users:
            key_records.extend(query_access_keys(user))
        # Prepare data for sending to Azure Monitor HTTP Data Collector API
        body = json.dumps(key_records)
        post_data(os.environ['WorkspaceId'], os.environ['WorkspaceKey'], body, os.environ['LogName'])

    except Exception as e:
        logging.error("Execution error",exc_info=True)
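
If you want to exercise the function outside of Lambda, a quick local harness like the one below works, assuming you’ve exported the WorkspaceId, WorkspaceKey, and LogName environment variables and have AWS credentials available locally:

if __name__ == '__main__':
    # Local test run; requires the same environment variables the Lambda expects
    lambda_handler({}, None)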

Once the data is delivered, it will take a few minutes for it to be processed and appear in the Log Analytics Workspace. In my tests it only took around 2-5 minutes, but I wasn’t writing much data to the API.  After the data is processed you’ll see a new entry under the listing of Custom Logs in the Log Analytics Workspace.  The entry will be the log name you picked with _CL appended to the end.  Expanding the entry will display the columns that were created based upon the log entries.  Note that the columns derived from the data you passed will end with an underscore and a character denoting the data type.

mylog

Now that the data is in the workspace, I can start querying it and creating some visualizations.  Azure Monitor uses the Kusto Query Language (KQL).  If you’ve ever created queries in Splunk, the language will feel familiar.

The log I created in AWS and pushed to the API has the following schema.  Note the addition of the underscore followed by a character denoting the column data type.

  • logged_Date (string) – The date the Lambda ran
  • user_s (string) – The AWS IAM User the key belongs to
  • account_number_s (string) – The AWS Account number the IAM Users belong to
  • AccessKeyId (string) – The id of the access key associated with the user which has been sanitized to show just the first 4 and last 4 characters
  • CreateDate_t (timestamp) – The date and time when the access key was created
  • LastUsedDate_t (timestamp) – The date and time the key was last used
  • Region_s (string) – The region where the access key was last used
  • Status_s (string) – Whether the key is enabled or disabled
  • ServiceName_s (string) – The AWS service where the access key was last used

In addition to what I’ve pushed, Azure Monitor adds a TimeGenerated field to each record, which is the time the log entry was sent to Azure Monitor.  You can override this behavior and provide a field for Azure Monitor to use instead if you like (see here).  There are also some other miscellaneous fields inherited from the schema the API uses, such as TenantId and SourceSystem, which in this case is populated with RestAPI.
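
For example, the HTTP Data Collector API accepts an optional time-generated-field header naming a field in your payload that Azure Monitor should use for TimeGenerated.  If I wanted the last used date to drive TimeGenerated, the headers dictionary inside post_data could be extended like this (a sketch of the modified dictionary, not a drop-in change):

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date,
        # Optional: tell the API which payload field to use for TimeGenerated
        'time-generated-field': 'LastUsedDate'
    }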

Since my personal AWS environment is quite small and my AWS IAM User usage is very limited, my data sets aren’t huge.  To address this I created a number of IAM Users with access keys purely for the purposes of this blog.  I’m getting that out of the way now so my AWS friends don’t hate on me. 🙂

One of the core best practices of key management is to ensure you rotate your keys regularly.  The first data point I wanted to extract was which keys in my AWS account were over 90 days old.  To do that I put together the following query:

AWS_Access_Key_Report_CL
| extend key_age = datetime_diff('day',now(),CreateDate_t)
| project Age=key_age,AccessKey=AccessKeyId_s, User=user_s
| where Age > 90
| sort by Age

Let’s walk through the query.  The first line tells the query engine to run the query against the AWS_Access_Key_Report_CL log.  The next line creates a new field containing the age of the key by calculating the number of days between the key’s creation date and today’s date.  The line after that instructs the engine to project only the key age field I just created along with the AccessKeyId_s and user_s fields.  The results are then culled down to only the records where the key age is greater than 90 days, and finally the results are sorted by the age of the key.

query1

Looks like it’s time to rotate that access key in use by Azure AD. 🙂

I can then pin this query to a new shared dashboard for other users to consume.  Cool and easy right?  How about we create something visual?

Looking at the trends in access key creation can provide some valuable insight into what is the norm and what is not.  Let’s take a look at the metrics for key creation (for the keys that still exist in either an enabled or disabled state).  For that I’m going to use the following query:

AWS_Access_Key_Report_CL
| make-series AccessKeys=count() default=0 on CreateDate_t from datetime(2019-01-01) to datetime(2020-01-01) step 1d

In this query I’m using the make-series operator to count the number of access keys created each day and assigning a default value of 0 if there are no keys created on that date.  The result of the query isn’t very useful when looking at it in tabular form.

query2.PNG

By selecting Line from the drop-down box, I can transform the data into a line graph which shows me spikes in key creation.  If this were real data, investigation into the spike of key creations on 6/30 might be warranted.

quer2_2.PNG

I put together a few other visuals and tables and created a custom dashboard like the one below.  Creating the dashboard took about an hour or so, with much of the time invested in figuring out the query language.

dashboard

What you’ve seen here is a demonstration of the power and simplicity of Azure Monitor.  By adding a simple-to-use API, Microsoft has greatly increased the flexibility of the tool, allowing it to become a single pane of glass for monitoring across clouds.  It’s also worth noting that Microsoft’s business intelligence tool, Power BI, has direct integration with Azure Log Analytics, which allows you to pull that log data into Power BI to perform more in-depth analysis and create even richer visualizations.

Well folks, I hope you’ve found this series of value.  I really enjoyed creating it and already have a few additional use cases in mind.  Make sure to follow me on Github as I’ll be posting all of the code and solutions I put together there for your general consumption.

Have a great day!