Azure Key Vault, Certificates, and Python, oh my!

Azure Key Vault, Certificates, and Python, oh my!

Welcome back folks! I recently had a few customers ask me about using certificates with Azure Key Vault and switching from using a client secret to a client certificate for their Azure AD (Active Directory) service principals. The questions put me on a path of diving deeper around the topics which results in some great learning and opportunity to create some Python code samples.

Azure Key Vault is Microsoft’s solution for secure secret, key, and credential management. If you’re coming from the AWS (Amazon Web Services) realm, you can think of it as AWS KMS (Key Management Services) with a little bit of AWS Secrets Manager and AWS Certificate Manager thrown in there. The use cases for secrets and keys are fairly well known and straightforward, so I’m going instead focus time on the certificates use case.

In a world where passwordless is the newest buzzword, there is an increasing usage of secrets (or passwords) in the non-human world. These secrets are often used to programmatically interact with APIs. In the Microsoft world you have your service principals and client secrets, in the AWS world you have your IAM Users with secret access keys, and many more third-parties out there require similar patterns that require the use of an access key. Vendors like Microsoft and AWS have worked to mitigate this growing problem in the scope of their APIs by introducing features such as Azure Managed Identities and AWS IAM Roles which use short lived dynamic secrets. However, both of these solutions work only if your workload is running within the relevant public cloud and the service it’s running within supports the feature. What about third-party APIs, multi-cloud workloads, or on-premises workloads? In those instances you’re many of times forced to fall back to the secret keys.

There is a better option to secret keys, and that is client certificates. While a secret falls into the “something you know” category, client certificates fall into the “something you have” category. They provide an higher assurance of identity (assuming you exercise good key management practices) and can have more flexibility in their secure storage and usage. Azure Service Principals support certificate-based authentication in addition to client secrets and Azure Key Vault supports the secure storage certificates. Used in combination, it can yield some pretty cool patterns.

Before I get into those patterns, I want to cover some of the basics in how Azure Key Vault stores certificates. There are some nuances to how it’s designed that is incredibly useful to understand. I’m not going to provide a deep dive on the inner workings of Key Vault, the public documentation does a decent enough job of that, but I am going to cover some of the basics which will help get you up and running.

Certificates can be both imported into and generated within Azure Key Vault. These certificates generated can be self-signed, generated from a selection of public CAs (certificate authorities) it is integrated with, or can be used to generate a CSR (certificate signing request) you can full-fill with your own CA. These processes are well detailed in the documentation, so I won’t be touching further on them.

Once you’ve imported or generated a certificate and private key into Key Vault, we get into the interesting stuff. The components of the certificate and private key are exposed in different ways through different interfaces as seen below.

Key Vault and certificates

Metadata about the certificate and the certificate itself are accessible via the certificates interface. This information includes the certificate itself provided in DER (distinguished encoded rules) format, properties of the certificate such as the expiration date, and metadata about the private key. You’ll use this interface to get a copy of the certificate (minus private key) or pull specific properties of the certificate such as the thumbprint.

Operations using the private key such as sign, verify, encrypt, and decrypt, are made available through the key interface. Say you want to sign a JWT (JSON Web Token) to authenticate to an API, you would use this interface.

Lastly, the private key is available through the secret interface. This is where you could retrieve the private key in PEM (privacy enhanced mail) or PKCS#12 (public key cryptography standards) format if you’ve set the private key to be exportable. Maybe you’re using a library like MSAL (Microsoft Authentication Library) which requires the private key as an input when obtaining an OAuth access token using a confidential client.

Now that you understand those basics, let’s look at some patterns that you could leverage.

In the first pattern consider that you have a CI/CD (continuous integration / continuous delivery) running on-premises that you wish to use to provision resources in Azure. You have a strict requirement from your security team that the infrastructure remain on-premises. In this scenario you could provision a service principal that is configured for certificate authentication and use the MSAL libraries to authenticate to Azure AD to obtain the access tokens needed to access the ARM API (Azure Resource Manager). Here is Python sample code demonstrating this pattern.

On-premises certificate authentication with MSAL

In the next pattern let’s consider you have a workload running in the Azure AD tenant you dedicate to internal enterprise workloads. You have a separate Azure AD tenant used for customer workloads. Within an Azure subscription associated with the customer tenant, there is an instance of Azure Event Hub you need to access from a workload running in the enterprise tenant. For this scenario you could use a pattern where the workload running in the enterprise tenant uses an Azure Managed Identity to retrieve a client certificate and private key from Key Vault to use with the MSAL library to obtain an access token for a service principal in the customer tenant which it will use to access the Event Hub.

Here is some sample Python code that could be used to demonstrate this pattern.

For the last pattern, let’s consider you have the same use case as above, but you are using the Premium SKU of Azure Key Vault because you have a regulatory requirement that the private key never leaves the HSM (hardware security module) and all cryptographic operations are performed on the HSM. This takes MSAL out of the picture because MSAL requires the private key be provided as a variable when using a client certificate for authentication of the OAuth client. In this scenario you can use the key interface of Key Vault to sign the JWT used to obtain the access token from Azure AD. This same pattern could be leveraged for other third-party APIs that support certificate-based authentication.

Here is a Python code sample of this pattern.

Well folks I’m going to keep it short and sweet. Hopefully this brief blog post has helped to show you the value of Key Vault and provide some options to you for moving away from secret-based credentials for your non-human access to APIs. Additionally, I really hope you get some value out of the Python code samples. I know there is a fairly significant gap in Python sample code for these types of operations, so hopefully this begins filling it.

Thanks!

Python Sample Web App and API for Azure AD B2C

Python Sample Web App and API for Azure AD B2C

Hello again folks.

I’ve recently had a number of inquiries on Microsoft’s AAD (Azure Active Directory) B2C (Business-To-Consumer) offering. For those infrastructure folks who have had to manage customer identities in the past, you know the pain of managing these identities with legacy solutions such as LDAP (Lighweight Directory Access Protocol) servers or even a collection of Windows AD (Active Directory) forests. Developers have suffered along with us carrying the burden of securely implementing the technologies into their code.

AAD B2C exists to make the process easier by providing a modern IDaaS (identity-as-a-service) offering complete with a modern directory accessible over a Restful API, support for modern authentication and authorization protocols such as SAML, Open ID Connect, and OAuth, advanced features such as step-up authentication, and a ton of other bells and whistles. Along with these features, Microsoft also provides a great library in the form of the Microsoft Authentication Library (MSAL).

It had been just about 4 years since I last experimented with AAD B2C, so I was do for a refresher. Like many people, I learn best from reading and doing. For the doing step, I needed an application I could experiment with. My first stop was the samples Microsoft provides. The Python pickings are very slim. There is a basic web application Ray Lou put together which does a great job demonstrating basic authentication (and I used as a base for my web app). However, I wanted to test additional features like step-up authentication and securing a custom-built API with AAD B2C.

So began my journey to create the web app and web API I’ll be walking through setting up with this post. Over the past few weeks I spent time diving into the Flask web framework and putting my subpar Python skills to work. After many late nights and long weekends spent reading documentation and troubleshooting with Fiddler, I finished the solution which costs of a web app and web API.

Solution Design

The solution is quite simple . It is intended to simulate a scenario where a financial services institution is providing a customer access to their insurance policy information. The insurance policy information is pulled from a legacy system that has been wrapped in a modern web-based API. The customer can view their account information and change the beneficiary on the policy.

AAD B2C provides the authentication to the application via Open ID Connect. Delegated access to the user’s policy information in the web API is handled through OAuth via an access token obtained by the web app from AAD B2C. Finally, when changing the beneficiary, the user is prompted for MFA which demonstrates the step-up authentication capabilities of AAD B2C.

The solution uses four User Flows. It has a profile editing user flow which allows the user to change information stored in the AAD B2C directory about the user, a password reset flow to allow the user to change the password for their local AAD B2C identity and two sign-up / sign-in flows. One sign-in / sign-up flow is enabled for MFA (multi-factor authentication) and one is not. The non-MFA enabled flow is the kicked off at login to the web app while the MFA enabled flow is used when the user attempts to change the beneficiary.

With the basics on the solution explained, let’s jump in to how to set it up. Keep in mind I’ll be referring to public documentation where it makes sense to avoid reinventing the wheel. For prerequisites you’ll need to have Python 3 installed and running on your machine, a code editor (personally I use Visual Studio Code), and an Azure AD tenant and Azure subscription.

The first think you’ll want to do is setup a AAD B2C directory as per these instructions. Once that is complete, you’ll now want to register the web app and web API. To do this you’ll want to switch to your B2C directory and select the App registrations link. On the next screen select the New registration link.

In the Register an application screen you’ll need to fill in some information. You can name the application whatever you’d like. Leave the Who can use this application or access this API set to the option in the screenshot since this will be a B2C app. In the redirect URI put in http://localhost:5000/getAToken. This is the URI AAD B2C will redirect the user to after the user successfully authenticates and obtains an ID token and access token. Leave the Grant admin consent to openid and offline_access permissions option checked. Once complete hit the Register button. This process creates an identity for the application in the B2C directory and authorizes it to obtain ID tokens and access tokens from B2C.

Register an application

Next we need to gather some information from the newly registered application. If you navigate back to App registration you’ll see your new app. Click on it to open up the application specific settings. On the next screen we get the Application ID (Client ID in OAuth terms). Record this because you’ll need it later.

App registration

So you have an client ID which identifies us to B2C, but you need to a credential to authenticate with. For that you’ll go to Certificates & secrets link. Click on the New client secret button to generate a new credential and save this for later.

You will need to register one additional redirect URI. This redirect URI is used when the user authenticates with MFA during the step-up process. Go back to the Overview and click on the Redirect URIs menu item on the top section.

Register additional URI

Once the new page loads, add a redirect URI which is found under the web section. The URI you will need to add is http://localhost:5000/getATokenMFA. Make sure to save your changes by hitting the Save button.

Now that your web app is registered, you need to register the web API. Once again, click the Register an application link under the Application view. You’ll want to select the same options except you do not need to provide a redirect URI. Click the Register button to complete the registration.

Once the app is registered, go into the application settings and click on the Expose an API link. Here you need to list two scopes which are included in the token and authorize access to the app to perform specific functions such as reading account information and writing account information in the API. Click the Add a scope button and you’ll be prompted to set an Application ID URI which you need to set to api. Once you’ve set it, hit the Save and continue button.

Application ID URI

The screen will refresh you’ll be able to add your first scope. I have defined two scopes within the web API, one called Accounts.Read which grants access to read account information and one for Accounts.Write which grants access to edit account information. Create the scope for the Accounts.Read and repeat the process for Accounts.Write.

By default B2C grants application registered with it the offline_access and openid permissions for Microsoft Graph. Since our API won’t be authenticating the user and will simply be verifying the access token passed by the web app, we could remove those permissions if we want. You can do this through the API permissions link which is located on the application settings page.

The last step you have in the B2C portion of Azure is to grant your web app the permission to request an access token for the Accounts.Read and Accounts.Write scopes used by the web API. To do this you need to go back into the application settings for the web app and go to the API permissions link. Click the Add a permission button. In the Request API permissions window, select My APIs option and select the web API you registered. From there select the two permissions and click the Add permissions button.

To finish up with the permissions piece you’ll grant admin consent to permissions. At the API permissions window, click the Grant admin consent for <YOUR TENANT NAME> button.

Admin consent to permissions

At this point we’ve registered the web app and web API with Azure B2C. We now need to enable some user flows. Azure B2C has an insanely powerful policy framework that powers the behavior of B2C behind the scenes that allow you to do pretty much whatever you can think of. User flows are predefined policies that provide common tasks without you having to go an learn a new policy framework.

For this solution you will need to setup four user flows. You’ll need to create a sign-up and sign-in flow with MFA disabled, another with MFA-enabled, a password reset flow, and profile editing flow. Name the flows as listed in the image below.

User flow list

The sign-up and sign-in flow can be configured to collect and return specific claim values in the user’s ID token. You configure this by clicking on the user flow you want to modify and navigating to the user attributes and application claims sections as seen in the image below.

Sign in / Sign up claims/attributes

For this solution ensure you are collecting and returning in claims at least the Display Name, Email Address, Given Name, and Surname. Configure this in both the MFA and non-MFA enabled sign-up and sign-in policies.

Now it’s time to get the apps up and running. For this section I’ll be assuming you’re using Visual Studio Code.

Open up an instance of Visual Studio Code and clone the repository https://github.com/mattfeltonma/python-b2c-sample. For lazy people like myself, I use the CTRL+SHIFT+P to open up the command windows in Visual Studio, search for the Git:Clone command and type in the directory.

Once the repo has cloned you’ll want to open two Terminal instances. You can do this with CTRL+SHIFT+` hotkey. In your first terminal navigate python-b2c-web-app directory. Create a new directory named flask_session. This directory will serve as the store for server side sessions for the flask-sessions module.

In the same directory create a new directory named env and create a Python virtual environment. Repeat the process in the other terminal under the python-simple-web-api directory.

You do this using the command:

python -m venv env

Once the virtual environments will need to activate the virtual environments. You will need to do this for both the web app and web API. Do this using the following command while you are under the relevant b2c-web-app or simple-web-api directories:

env\Scripts\activate

Next, load the required libraries. Do this using the command below. Repeat it again for both the web app and web api.

pip install -r requirements.txt

The environments are now ready to go. Next up you need to set some user variables. Within the terminal for the b2c-web-app create user variables for the following:

  • CLIENT_ID set to client secret of the web app. This is available in the B2C portal
  • CLIENT_SECRET set to the secret of the web app. If you didn’t record this when you generated it you will need to create a new credential.
  • B2C_DIR set to the name of your B2C tenant. This should be a single label name such as myb2c
  • FLASK_APP set too app.py. This tells Flask what file to execute.

Within the terminal for the simple-web-api create user variables for the following:

  • TENANT_NAME set to the name of your B2C tenant. This should be a single label name such as myb2c
  • TENANT_ID set to the tenant ID of your B2C tenant.
  • B2C_POLICY set to the name of your non-MFA B2C policy. For example, B2C_1_signupsignin1.
  • CLIENT_ID set to the client ID of the web API. This is available in the B2C portal
  • FLASK_APP set to app.py. This tells Flask what file to execute.

In Windows you can set these variables by using the following command:

set FLASK_APP=app.py

The last step you need to take is to modify the accounts.json file in the python-simple-web-api directory. You will need to add sample records for each user identity you want to test. The email value of the account object must match the email address being presented in the access token.

Now you need to startup the web server. To do this you’ll use the flask command. In the terminal you setup for the python-b2c-web-app, run the following command:

flask run -h localhost -p 5000

Then in the terminal for the python-simple-web-api, run the following command:

flask run -h localhost -p 5001

You’re now ready to test the app! Open up a web browser and go to http://localhost:5000. You’ll be redirected to the login page seen below.

Clicking the Sign-In button will open up the B2C sign-in page. Here you can sign-in with an existing B2C account or create a new one.

After successfully logging in you’ll be presented with a simple home page. The Test API link will bring you to the public endpoint of the web API validating that the API is reachable and running. The Edit Profile link will redirect you to the B2C Edit Profile experience. Clicking the My Claims link will display the claims in your ID token as seen below.

Clicking the My Account calls the web API and queries the accounts.json file for a record for the user. The contents of the record are then displayed as seen below.

Clicking on the Change Beneficiary button will kick off the second MFA-enabled sign-in and sign-up user flow prompting the user for MFA. After successful MFA, the user is redirected to a page where they make the change to the record. Clicking the submit button modifies the accounts.json file with the new value and the user is redirected to the My Account page.

Well folks that’s it! Experiment, enjoy, and improve it. Over the next few weeks I’m going planning on putting some newly practiced Docker skills to the test by containerizing the applications.

You can get the solution here.

Thanks everyone!

Debugging Azure SDK for Python Using Fiddler

Debugging Azure SDK for Python Using Fiddler

Hi there folks.  Recently I was experimenting with the Azure Python SDK when I was writing a solution to pull information about Azure resources within a subscription.  A function within the solution was used to pull a list of virtual machines in a given Azure subscription.  While writing the function, I recalled that I hadn’t yet had experience handling paged results the Azure REST API which is the underlining API being used by the SDK.

I hopped over to the public documentation to see how the API handles paging.  Come to find out the Azure REST API handles paging in a similar way as the Microsoft Graph API by returning a nextLink property which contains a reference used to retrieve the next page of results.  The Azure REST API will typically return paged results for operations such as list when the items being returned exceed 1,000 items (note this can vary depending on the method called).

So great, I knew how paging was used.  The next question was how the SDK would handle paged results.  Would it be my responsibility or would it by handled by the SDK itself?

If you have experience with AWS’s Boto3 SDK for Python (absolutely stellar SDK by the way) and you’ve worked in large environments, you are probably familiar with the paginator subclass.  Paginators exist for most of the AWS service classes such as IAM and S3.  Here is an example of a code snipped from a solution I wrote to report on aws access keys.

def query_iam_users():

todaydate = (datetime.now()).strftime("%Y-%m-%d")
users = []
client = boto3.client(
'iam'
)

paginator = client.get_paginator('list_users')
response_iterator = paginator.paginate()
for page in response_iterator:
for user in page['Users']:
user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
users.append(user_rec)
return users

Paginators make handling paged results a breeze and allow for extensive flexibility in controlling how paging is handled by the underlining AWS API.

Circling back to the Azure SDK for Python, my next step was to hop over to the SDK public documentation.  Navigating the documentation for the Azure SDK (at least for the Python SDK, I can’ t speak for the other languages) is a bit challenging.  There are a ton of excellent code samples, but if you want to get down and dirty and create something new you’re going to have dig around a bit to find what you need.  To pull a listing of virtual machines, I would be using the list_all method in VirtualMachinesOperations class.  Unfortunately I couldn’t find any reference in the documentation to how paging is handled with the method or class.

So where to now?  Well next step was the public Github repo for the SDK.  After poking around the repo I located the documentation on the VirtualMachineOperations class.  Searching the class definition, I was able to locate the code for the list_all() method.  Right at the top of the definition was this comment:

Use the nextLink property in the response to get the next page of virtual
machines.

Sounds like handling paging is on you right?  Not so fast.  Digging further into the method I came across the function below.  It looks like the method is handling paging itself releasing the consumer of the SDK of the overhead of writing additional code.

        def internal_paging(next_link=None):
            request = prepare_request(next_link)

            response = self._client.send(request, stream=False, **operation_config)

            if response.status_code not in [200]:
                exp = CloudError(response)
                exp.request_id = response.headers.get('x-ms-request-id')
                raise exp

            return response

I wanted to validate the behavior but unfortunately I couldn’t find any documentation on how to control the page size within the Azure REST API.  I wasn’t about to create 1,001 virtual machines so instead I decided to use another class and method in the SDK.  So what type of service would be a service that would return a hell of a lot of items?  Logging of course!  This meant using the list method of the ActivityLogsOperations class which is a subclass of the module for Azure Monitor and is used to pull log entries from the Azure Activity Log.  Before I experimented with the class, I hopped back over to Github and pulled up the source code for the class.  Low and behold we an internal_paging function within the list method that looks very similar to the one for the list_all vms.

        def internal_paging(next_link=None):
            request = prepare_request(next_link)

            response = self._client.send(request, stream=False, **operation_config)

            if response.status_code not in [200]:
                raise models.ErrorResponseException(self._deserialize, response)

            return response

Awesome, so I have a method that will likely create paged results, but how do I validate it is creating paged results and the SDK is handling them?  For that I broke out one of my favorite tools Telerik’s Fiddler.

There are plenty of guides on Fiddler out there so I’m going to skip the basics of how to install it and get it running.  Since the calls from the SDK are over HTTPS I needed to configure Fiddler to intercept secure web traffic.  Once Fiddler was up and running I popped open Visual Studio Code, setup a new workspace, configured a Python virtual environment, and threw together the lines of code below to get the Activity Logs.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.monitor import MonitorManagementClient

TENANT_ID = 'mytenant.com'
CLIENT = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
KEY = 'XXXXXX'
SUBSCRIPTION = 'XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID
)
client = MonitorManagementClient(
    credentials = credentials,
    subscription_id = SUBSCRIPTION
)

log = client.activity_logs.list(
    filter="eventTimestamp ge '2019-08-01T00:00:00.0000000Z' and eventTimestamp le '2019-08-24T00:00:00.0000000Z'"
)

for entry in log:
    print(entry)

Let me walk through the code quickly.  To make the call I used an Azure AD Service Principal I had setup that was granted Reader permissions over the Azure subscription I was querying.  After obtaining an access token for the service principal, I setup a MonitorManagementClient that was associated with the Azure subscription and dumped the contents of the Activity Log for the past 20ish days.  Finally I incremented through the results to print out each log entry.

When I ran the code in Visual Studio Code an exception was thrown stating there was an certificate verification error.

requests.exceptions.SSLError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /mytenant.com/oauth2/token (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)')))

The exception is being thrown by the Python requests module which is being used underneath the covers by the SDK.  The module performs certificate validation by default.  The reason certificate verification is failing is Fiddler uses a self-signed certificate when configured to intercept secure traffic when its being used as a proxy.  This allows it to decrypt secure web traffic sent by the client.

Python doesn’t use the Computer or User Windows certificate store so even after you trust the self-signed certificate created by Fiddler, certificate validation still fails.  Like most cross platform solutions it uses its own certificate store which has to be managed separately as described in this Stack Overflow article.  You should use the method described in the article for any production level code where you may be running into this error, such as when going through a corporate web proxy.

For the purposes of testing you can also pass the parameter verify with the value of False as seen below.  I can’t stress this enough, be smart and do not bypass certificate validation outside of a lab environment scenario.

requests.get('https://somewebsite.org', verify=False)

So this is all well and good when you’re using the requests module directly, but what if you’re using the Azure SDK?  To do it within the SDK we have to pass extra parameters called kwargs which the SDK refers to as an Operation config.  The additional parameters passed will be passed downstream to the methods such as the methods used by the requests module.

Here I modified the earlier code to tell the requests methods to ignore certificate validation for the calls to obtain the access token and call the list method.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.monitor import MonitorManagementClient

TENANT_ID = 'mytenant.com'
CLIENT = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
KEY = 'XXXXXX'
SUBSCRIPTION = 'XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID,
    verify = False
)
client = MonitorManagementClient(
    credentials = credentials,
    subscription_id = SUBSCRIPTION,
    verify = False
)

log = client.activity_logs.list(
    filter="eventTimestamp ge '2019-08-01T00:00:00.0000000Z' and eventTimestamp le '2019-08-24T00:00:00.0000000Z'",
    verify = False
)

for entry in log:
    print(entry)

After the modifications the code ran successfully and I was able to verify that the SDK was handling paging for me.

fiddler.png

Let’s sum up what we learned:

  • When using an Azure SDK leverage the Azure REST API reference to better understand the calls the SDK is making
  • Use Fiddler to analyze and debug issues with the Azure SDK
  • Never turn off certificate verification in a production environment and instead validate the certificate verification error is legitimate and if so add the certificate to the trusted store
  • In lab environments, certificate verification can be disabled by passing an additional parameter of verify=False with the SDK method

Hope that helps folks.  See you next time!

Visualizing AWS Logging Data in Azure Monitor – Part 2

Visualizing AWS Logging Data in Azure Monitor – Part 2

Welcome back folks!

In this post I’ll be continuing my series on how Azure Monitor can be used to visualize log data generated by other cloud services.  In my last post I covered the challenges that multicloud brings and what Azure can do to help with it.  I also gave an overview of Azure Monitor and covered the design of the demo I put together and will be walking through in this post.  Please take a read through that post if you haven’t already.  If you want to follow along, I’ve put the solution up on Github.

Let’s quickly review the design of the solution.

Capture

This solution uses some simple Python code to pull information about the usage of AWS IAM User access id and secret keys from an AWS account.  The code runs via a Lambda and stores the Azure Log Analytics Workspace id and key in environment variables of the Lambda that are encrypted with an AWS KMS key.  The data is pulled from the AWS API using the Boto3 SDK and is transformed to JSON format.  It’s then delivered to the HTTP Data Collector API which places it into the Log Analytics Workspace.  From there, it becomes available to Azure Monitor to query and visualize.

Setting up an Azure environment for this integration is very simple.  You’ll need an active Azure subscription.  If you don’t have one, you can setup a free Azure account to play around.  Once you’re set with the Azure subscription, you’ll need to create an Azure Log Analytics Workspace.  Instructions for that can be found in this Microsoft article.  After the workspace has been setup, you’ll need to get the workspace id and key as referenced in the Obtain workspace ID and key section of this Microsoft article.  You’ll use this workspace ID and key to authenticate to the HTTP Data Collector API.

If you have a sandbox AWS account and would like to follow along, I’ve included a CloudFormation template that will setup the AWS environment.  You’ll need to have an AWS account with sufficient permissions to run the template and provision the resources.  Prior to running the template, you will need to zip up the lambda_function.py and put it on an AWS S3 bucket you have permissions on.  When you run the template you’ll be prompted to provide the S3 bucket name, the name of the ZIP file, the Log Analytics Workspace ID and key, and the name you want the API to assign to the log in the workspace.

The Python code backing the solution is pretty simple.  It uses all standard Python modules except for the boto3 module used to interact with AWS.

import json
import logging
import re
import csv
import boto3
import os
import hmac
import base64
import hashlib
import datetime

from io import StringIO
from datetime import datetime
from botocore.vendored import requests

The first function in the code parses the ARN (Amazon Resource Name) to extract the AWS account number.  This information is later included in the log data written to Azure.

# Parse the IAM User ARN to extract the AWS account number
def parse_arn(arn_string):
    acct_num = re.findall(r'(?<=:)[0-9]{12}',arn_string)
    return acct_num[0]

The second function uses the strftime method to transform the timestamp returned from the AWS API to a format that the Azure Monitor API will detect as a timestamp and make that particular field for each record in the Log Analytics Workspace a datetime type.

# Convert timestamp to one more compatible with Azure Monitor
def transform_datetime(awsdatetime):
transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
return transf_time

The next function queries the AWS API for a listing of AWS IAM Users setup in the account and creates dictionary object representing data about that user. That object is added to a list which holds each object representing each user.

# Query for a list of AWS IAM Users
def query_iam_users():
    
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users

The query_access_keys function queries the AWS API for a listing of the access keys that have been provisioned the AWS IAM User as well as the status of those keys and some metrics around the usage.  The resulting data is then added to a dictionary object and the object added to a list.  Each item in the list represents a record for an AWS access id.

# Query for a list of access keys and information on access keys for an AWS IAM User
def query_access_keys(user):
    keys = []
    client = boto3.client(
        'iam'
    )
    paginator = client.get_paginator('list_access_keys')
    response_iterator = paginator.paginate(
        UserName = user['username']
    )

    # Get information on access key usage
    for page in response_iterator:
        for key in page['AccessKeyMetadata']:
            response = client.get_access_key_last_used(
                AccessKeyId = key['AccessKeyId']
            )
            # Santize key before sending it along for export

            sanitizedacctkey = key['AccessKeyId'][:4] + '...' + key['AccessKeyId'][-4:]
            # Create new dictonionary object with access key information
            if 'LastUsedDate' in response.get('AccessKeyLastUsed'):

                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),
                'LastUsedDate':(transform_datetime(response['AccessKeyLastUsed']['LastUsedDate'])),
                'Region':response['AccessKeyLastUsed']['Region'],'Status':key['Status'],
                'ServiceName':response['AccessKeyLastUsed']['ServiceName']}
                keys.append(key_rec)
            else:
                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),'Status':key['Status']}
                keys.append(key_rec)
    return keys

The next two functions contain the code that creates and submits the request to the Azure Monitor API.  The product team was awesome enough to provide some sample code in the in the public documentation for this part.  The code is intended for Python 2 but only required a few small changes to make it compatible with Python 3.

Let’s first talk about the build_signature function.  At this time the API uses HTTP request signing using the Log Analytics Workspace id and key to authenticate to the API.  In short this means you’ll have two sets of shared keys per workspace, so consider the workspace your authorization boundary and prioritize proper key management (aka use a different workspace for each workload, track key usage, and rotate keys as your internal policies require).

Breaking down the code below, we the string that will act as the header includes the HTTP method, length of request content, a custom header of x-ms-date, and the REST resource endpoint.  The string is then converted to a bytes object, and an HMAC is created using SHA256 which is then base-64 encoded.  The result is the authorization header which is returned by the function.

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")  
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    authorization = "SharedKey {}:{}".format(customer_id,encoded_hash)
    return authorization

Not much needs to be said about the post_data function beyond that it uses the Python requests module to post the log content to the API.  Take note of the limits around the data that can be included in the body of the request.  Key takeaways here is if you plan pushing a lot of data to the API you’ll need to chunk your data to fit within the limits.

def post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri,data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print("Accepted")
    else:
        print("Response code: {}".format(response.status_code))

Last but not least we have the lambda_handler function which brings everything together. It first gets a listing of users, loops through each user to information about the access id and secret keys usage, creates a log record containing information about each key, converts the data from a dict to a JSON string, and writes it to the API. If the content is successfully delivered, the log for the Lambda will note that it was accepted.

def lambda_handler(event, context):

    # Enable logging to console
    logging.basicConfig(level=logging.INFO,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    try:

        # Initialize empty records array
        #
        key_records = []
        
        # Retrieve list of IAM Users
        logging.info("Retrieving a list of IAM Users...")
        users = query_iam_users()

        # Retrieve list of access keys for each IAM User and add to record
        logging.info("Retrieving a listing of access keys for each IAM User...")
        for user in users:
            key_records.extend(query_access_keys(user))
        # Prepare data for sending to Azure Monitor HTTP Data Collector API
        body = json.dumps(key_records)
        post_data(os.environ['WorkspaceId'], os.environ['WorkspaceKey'], body, os.environ['LogName'])

    except Exception as e:
        logging.error("Execution error",exc_info=True)

Once the data is delivered, it will take a few minutes for it to be processed and appear in the Log Analytics Workspace. In my tests it only took around 2-5 minutes, but I wasn’t writing much data to the API.  After the data processes you’ll see a new entry under the listing of Custom Logs in the Log Analytics Workspace.  The entry will be the log name you picked and with a _CL at the end.  Expanding the entry will display the columns that were created based upon the log entry.  Note that the columns consumed from the data you passed will end with an underscore and a character denoting the data type.

mylog

Now that the data is in the workspace, I can start querying it and creating some visualizations.  Azure Monitor uses the Kusto Query Language (KQL).  If you’ve ever created queries in Splunk, the language will feel familiar.

The log I created in AWS and pushed to the API has the following schema.  Note the addition of the underscore followed by a character denoting the column data type.

  • logged_Date (string) – The date the Lambda ran
  • user_s (string) – The AWS IAM User the key belongs to
  • account_number_s (string) – The AWS Account number the IAM Users belong to
  • AccessKeyId (string) – The id of the access key associated with the user which has been sanitized to show just the first 4 and last 4 characters
  • CreateDate_t (timestamp) – The date and time when the access key was created
  • LastUsedDate_t (timestamp) – The date and time the key was last used
  • Region_s (string) – The region where the access key was last used
  • Status_s (string) – Whether the key is enabled or disabled
  • ServiceName_s (string) – The AWS service where the access key was last used

In addition to what I’ve pushed, Azure Monitor adds a TimeGenerated field to each record which is the time the log entry was sent to Azure Monitor.  You can override this behavior and provide a field for Azure Monitor to use for this if you like (see here).  There are some other miscellaneous fields are inherited from whatever schema the API is drawing from.  These are fields such as TenantId and SourceSystem, which in this case is populated with RestAPI.

Since my personal AWS environment is quite small and the AWS IAM Users usage are very limited, my data sets aren’t huge.  To address this I created a number of IAM Users with access keys for the purpose blog.  I’m getting that out of the way so my AWS friends don’t hate on me. 🙂

One of core best practices in key management with shared keys is to ensure you rotate them.  The first data point I wanted to extract was which keys that existed in my AWS account were over 90 days old.  To do that I put together the following query:

AWS_Access_Key_Report_CL
| extend key_age = datetime_diff('day',now(),CreateDate_t)
| project Age=key_age,AccessKey=AccessKeyId_s, User=user_s
| where Age > 90
| sort by Age

Let’s walk through the query.  The first line tells the query engine to run this query against the AWS_Access_Key_Report_CL.  The next line creates a new field that contains the age of the key by determining the amount of time that has passed between the creation date of the key and today’s date.  The line after that instructs the engine to pull back only the key_age field I just created and the AccessKeyId_s, user_s , and status_s fields.  The results are then further culled down to pull only records where the key age is greater than 90 days and finally the results are sorted by the age of the key.

query1

Looks like it’s time to rotate that access key in use by Azure AD. 🙂

I can then pin this query to a new shared dashboard for other users to consume.  Cool and easy right?  How about we create something visual?

Looking at the trends in access key creation can provide some valuable insights into what is the norm and what is not.  Let’s take a look a the metrics for key creation (of the keys still exist in an enabled/disabled state).  For that I’m going to use the following query:

AWS_Access_Key_Report_CL
| make-series AccessKeys=count() default=0 on CreateDate_t from datetime(2019-01-01) to datetime(2020-01-01) step 1d

In this query I’m using the make-series operator to count the number of access keys created each day and assigning a default value of 0 if there are no keys created on that date.  The result of the query isn’t very useful when looking at it in tabular form.

query2.PNG

By selecting the Line drop down box, I can transform the date into a line grab which shows me spikes of creation in log creation.  If this was real data, investigation into the spike of key creations on 6/30 may be warranted.

quer2_2.PNG

I put together a few other visuals and tables and created a custom dashboard like the below.  Creating the dashboard took about an hour so, with much of the time invested in figuring out the query language.

dashboard

What you’ve seen here is a demonstration of the power and simplicity of Azure Monitor.  By adding a simple to use API, Microsoft has exponentially increased the agility of the tool by allowing it to become a single pane of glass for monitoring across clouds.   It’s also worth noting that Microsoft’s BI (business intelligence) tool Power BI has direct integration with Azure Log Analytics.  This allows you to pull that log data into PowerBI and perform more in-depth analysis and to create even richer visualizations.

Well folks, I hope you’ve found this series of value.  I really enjoyed creating it and already have a few additional use cases in mind.  Make sure to follow me on Github as I’ll be posting all of the code and solutions I put together there for your general consumption.

Have a great day!