Deep Dive into Azure Managed Identities – Part 2

Posted on August 12, 2019 by mattfeltonma

Welcome back fellow geeks for the second installment in my series on Azure Managed Identities. In the first post I covered the business problem and the risks Managed Identities address, and in this post I’ll be covering how managed identities are represented in Azure.

Let’s start by walking through the components that make managed identities possible.

The foundational component of any identity is the data store in which the identity lives. In the case of managed identities, like much of the rest of the identity data for the Microsoft cloud, the data store is Azure Active Directory. For those of you coming from a traditional on-premises environment who have experience with traditional directories such as Active Directory or one of the many flavors of LDAP, Azure Active Directory (Azure AD) is an Identity-as-a-Service offering which includes a directory component we can think of as a next generation directory. This means it’s designed to be highly scalable, available, and resilient and is provided to you in an “as a service” model where a simple management layer sits in front of all the complexities of the compute, network, and storage infrastructure that makes up the directory. There are a whole bunch of other cool features such as modern authentication, contextual authorization, adaptive authentication, and behavioral analytics that come along with the solution, so check out the official documentation to learn about those capabilities. If you want to nerd out on the design of that infrastructure you can check out this whitepaper and this article.

It’s worthwhile to take a moment to cover Azure AD’s relationship to Azure. Every resource in Azure is associated with an Azure subscription. An Azure subscription acts as a legal and payment agreement (think type of Azure subscription, pay-as-you-go, Visual Studio, CSP, etc), boundary of scale (think limits to resources you can create in a subscription), and administrative boundary. Each Azure subscription is associated with a single instance of Azure AD. Azure AD acts as the security boundary for an organization’s space in Azure and serves as the identity backend for the Azure subscription. You’ll often hear it referred to as “your tenant” (if you’re not familiar with the general cloud concept of tenancy check out this CSA article).

Azure AD stores lots of different object types including users, groups, and devices. The object type we are interested in for the purposes of managed identities is the service principal. Service principals act as the security principals for non-humans (such as applications or Azure resources like a VM) in Azure AD. These service principals are then granted access by being assigned permissions on Azure resources such as an instance of Azure Key Vault or an Azure Storage account. Service principals are used for a number of purposes beyond just Managed Identities, such as identities for custom developed applications or third-party applications.

Given that the service principals can be used for different purposes, it only makes sense that the service principal object type includes an attribute called the serviceprincipaltype. For example, a third-party or custom developed application that is registered with Azure AD uses the service principal type of Application while a managed identity has the value set to ManagedIdentity. Let’s take a look at an example of the serviceprincipaltypes in a tenant.

In my Geek In The Weeds tenant I’ve created a few application identities by registering the applications and I’ve created a few managed identities. Everything else within the tenant is default out of the box. To list the service principals in the directory I used the AzureAD PowerShell module. The cmdlet that lists the service principals is Get-AzureADServicePrincipal. By default the cmdlet will only return the first 100 results, so you need to set the All parameter to true. Every application, whether it’s Exchange Online or Power BI, needs an identity in your tenant to interact with it and with the resources you create that are associated with the tenant. Here are the serviceprincipaltypes in my Geek In The Weeds tenant.

Now we know the security principal used by a Managed Identity is stored in Azure AD and is represented by a service principal object. We also know that service principal objects have different types depending on how they’re being used and that the type that represents a managed identity is ManagedIdentity. If we want to know what managed identities exist in our directory, we can use this information to pull a list using the Get-AzureADServicePrincipal cmdlet.
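
To make the filtering idea concrete, here is a minimal Python sketch of the same grouping logic. It assumes the service principals have already been exported into a list of dictionaries (for example, PowerShell output converted to JSON). The sample records and the count_by_type and managed_identities helpers are made up for illustration and are not part of any Microsoft library.

```python
from collections import Counter

# Hypothetical sample data: each dict mimics a couple of service principal properties
service_principals = [
    {"DisplayName": "systemmis", "ServicePrincipalType": "ManagedIdentity"},
    {"DisplayName": "testing1234", "ServicePrincipalType": "ManagedIdentity"},
    {"DisplayName": "my-custom-app", "ServicePrincipalType": "Application"},
]

def count_by_type(principals):
    """Count service principals per ServicePrincipalType value."""
    return Counter(sp["ServicePrincipalType"] for sp in principals)

def managed_identities(principals):
    """Return only the service principals that represent managed identities."""
    return [sp for sp in principals if sp["ServicePrincipalType"] == "ManagedIdentity"]

type_counts = count_by_type(service_principals)
mi_names = [sp["DisplayName"] for sp in managed_identities(service_principals)]
print(type_counts)
print(mi_names)
```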

We’re not done yet! Managed identities also come in two flavors, system-assigned and user-assigned. System-assigned managed identities are the cooler of the two in that they share the lifecycle of the resource they’re used by. For example, a system-assigned managed identity can be created when an Azure VM is created, and that identity will be deleted once the Azure VM is deleted. This presents a great option for mitigating the challenge of identity lifecycle management. With Microsoft handling the lifecycle of these identities, each resource could potentially have its own identity, making it easier to troubleshoot issues with the identity, avoid potential outages caused by modifying the identity, adhere to least privilege by giving the identity only the permissions the resource requires, and cut back on support requests by developers to info sec for the creation of identities.

Sometimes it may be desirable to share a managed identity amongst multiple Azure resources such as an application running on multiple Azure VMs. This use case calls for the other type of managed identity, user-assigned. These identities do not share the lifecycle of the resources using them.

Let’s take a look at the differences between a service principal object for a user-assigned vs a system-assigned managed identity. Here I ran another Get-AzureADServicePrincipal and limited the results to serviceprincipaltype of ManagedIdentity.

ObjectId                           : a3e9d372-242e-424b-b97a-135116995d4b
ObjectType                         : ServicePrincipal
AccountEnabled                     : True
AlternativeNames                   : {isExplicit=False, /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis}
AppId                              : b7fa9389-XXXX
AppRoleAssignmentRequired          : False
DisplayName                        : systemmis
KeyCredentials                     : {class KeyCredential {
                                       CustomKeyIdentifier: System.Byte[]
                                       EndDate: 11/11/2019 12:39:00 AM
                                       KeyId: f8e439a8-071b-45e0-9f8e-ac10b058a5fb
                                       StartDate: 8/13/2019 12:39:00 AM
                                       Type: AsymmetricX509Cert
                                       Usage: Verify
                                       Value:
                                     }
                                     }
ServicePrincipalNames              : {b7fa9389-XXXX, https://identity.azure.net/XXXX}
ServicePrincipalType               : ManagedIdentity
------------------------------------------------
ObjectId                           : ac960ac7-ca03-4ac0-a7b8-d458635b293b
ObjectType                         : ServicePrincipal
AccountEnabled                     : True
AlternativeNames                   : {isExplicit=True,
                                     /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234}
AppId                              : fff84e09-XXXX
AppRoleAssignmentRequired          : False
AppRoles                           : {}
DisplayName                        : testing1234
KeyCredentials                     : {class KeyCredential {
                                       CustomKeyIdentifier: System.Byte[]
                                       EndDate: 11/7/2019 1:49:00 AM
                                       KeyId: b3c1808d-6778-4004-b23f-4d339ed0a91f
                                       StartDate: 8/9/2019 1:49:00 AM
                                       Type: AsymmetricX509Cert
                                       Usage: Verify
                                       Value:
                                     }
                                     }
ServicePrincipalNames              : {fff84e09-XXXX, https://identity.azure.net/XXXX}
ServicePrincipalType               : ManagedIdentity

In the above results we can see that the main difference between the user-assigned (testing1234) and system-assigned (systemmis) identities is within the AlternativeNames property. The system-assigned identity has isExplicit set to False and a second value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis. Notice that the Microsoft.Compute/virtualMachines/systemmis portion specifies this identity is being used by a virtual machine named systemmis. The user-assigned identity has isExplicit set to True and a second value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234. Here we can see the identity is an “explicit” managed identity and is not directly linked to an Azure resource.

This difference gives us the ability to quickly report on the number of system-assigned and user-assigned managed identities in a tenant by using the following command.

Get-AzureADServicePrincipal -All $True | Where-Object AlternativeNames -like "isExplicit=True*"

True would give us user-assigned and False would give us system-assigned. Neat right?
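
The same classification can be sketched in Python. This is only an illustration under the assumption that the AlternativeNames values are available as a list of strings, as in the output above. The classify_managed_identity function is a hypothetical helper, and the sample values are copied from the post with the subscription ID left blank.

```python
def classify_managed_identity(alternative_names):
    """Return 'user-assigned', 'system-assigned', or 'unknown' from AlternativeNames values."""
    for value in alternative_names:
        if value.lower() == "isexplicit=true":
            return "user-assigned"
        if value.lower() == "isexplicit=false":
            return "system-assigned"
    return "unknown"

# Sample values modeled on the output shown earlier
system_names = [
    "isExplicit=False",
    "/subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis",
]
user_names = [
    "isExplicit=True",
    "/subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234",
]

system_kind = classify_managed_identity(system_names)
user_kind = classify_managed_identity(user_names)
print(system_kind, user_kind)
```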

Let’s summarize what we’ve learned:

An object in Azure Active Directory is created for each managed identity and represents its security principal
The type of object created is a service principal
There are multiple service principal types and the one used by a Managed Identity is called ManagedIdentity
There are two types of managed identities, user-assigned and system-assigned
System-assigned managed identities share the lifecycle of the resource they are associated with while user-assigned managed identities are created separately from the resource, do not share the resource lifecycle, and can be used across multiple resources
The object representing a user-assigned managed identity has a unique value of isExplicit=True for the AlternativeNames property while a system-assigned managed identity has that value of isExplicit=False.

That’s it for this post folks. In the next post I’ll walk through the process of creating a managed identity for an Azure VM and will demonstrate with a bit of Python code how we can use the managed identity to access a secret stored in Azure Key Vault.

See you next post!

Deep Dive into Azure Managed Identities – Part 1

Posted on August 7, 2019 by mattfeltonma

“I love the overhead of password management” said no one ever.

Password management is hard. It’s even harder when you’re managing the credentials for non-humans, such as those used by an application. Back in the olden days when the developer needed a way to access an enterprise database or file share, they’d put in a request with help desk or information security to have an account (often referred to as a service account) provisioned in Windows Active Directory, an LDAP, or a SQL database. The request would go through a business approval and some support person would create the account, set the password, and email the information to the developer. This process came with a number of risks:

Risk of compromise of the account
Risk of abuse of the account
Risk of a significant outage

These risks arise due to the following gaps in the process:

Multiple parties knowing the password (the party who provisions the account and the developer)
The password for the account being communicated to the developer unencrypted such as plain text in an email
The password not being changed after it is initially set due to the inability or difficulty of changing the password
The password not being regularly rotated due to concerns over application outages
The password being shared with other developers and the account then being used across multiple applications without the dependency being documented

Organizations tried to mitigate the risk of compromise by performing such actions as requiring a long and complex password, delivering the password in an encrypted format such as an encrypted Microsoft Office document, instituting policy requiring the password to be changed (exceptions with this one are frequent due to outage concerns), implementing password vaulting and management such as CyberArk Enterprise Password Vault or Hashicorp Vault, and instituting behavioral monitoring solutions to check for abuse. Password rotation and monitoring are some of the more effective mitigations but can also be extremely challenging and costly to institute at scale even with a vaulting and management solution. Even then, there are always exceptions for systems with legacy applications which are not compatible (sadly these are often some of the more critical systems).

When the public cloud came around the credential management challenge for application accounts exploded due to the most favored traits of a public cloud which include on-demand self-service and rapid elasticity and scalability. The challenge that was a few hundred application identities has grown quickly into thousands of applications and especially containers and serverless functions such as AWS Lambda and Azure Functions. Beyond the volume of applications, the public cloud also changes the traditional security boundary due to its broad network access trait. Instead of the cozy feeling multiple firewalls gave you, you now have developers using cloud services such as storage or databases which are directly administered via the cloud management plane which is exposed directly to the Internet. It doesn’t stop here folks, you also have developers heavily using SaaS-based version control solutions to store the code which may have credentials hardcoded into it potentially publicly exposing those credentials.

Thankfully the public cloud providers have heard the cries of us security folk and have been working hard to help address the problem. One method in use is the creation of security principals which are designed around the use of temporary credentials. This way there are no long standing credentials to share, compromise, or abuse. Amazon has robust use of this concept in AWS using IAM Roles. Instead of hardcoding a set of IAM User credentials in a Lambda or an application running on an EC2 instance, a role can be created with the necessary permissions required for the application and be assumed by either the Lambda service or EC2 instance.

For this series of posts I’m going to be focusing on one of Microsoft Azure’s solutions to this problem, which is called Managed Identities. For you folks that are more familiar with AWS, Managed Identities conceptually work the same way as IAM Roles. A security principal is created, permissions are granted, and the identity is assumed by a resource such as an Azure Web App or an Azure VM. There are some features that differ from IAM Roles that add to the appeal of Managed Identities, such as tying the lifecycle of the Managed Identity to the resource such that when the resource is created, the managed identity is created, and when the resource is destroyed, the identity is destroyed.

In the next entry I will do a deeper dive into what a managed identity looks like behind the scenes.

See you soon fellow geek!

Visualizing AWS Logging Data in Azure Monitor – Part 2

Posted on July 8, 2019 by mattfeltonma

Welcome back folks!

In this post I’ll be continuing my series on how Azure Monitor can be used to visualize log data generated by other cloud services. In my last post I covered the challenges that multicloud brings and what Azure can do to help with it. I also gave an overview of Azure Monitor and covered the design of the demo I put together and will be walking through in this post. Please take a read through that post if you haven’t already. If you want to follow along, I’ve put the solution up on Github.

Let’s quickly review the design of the solution.

This solution uses some simple Python code to pull information about the usage of AWS IAM User access id and secret keys from an AWS account. The code runs via a Lambda and stores the Azure Log Analytics Workspace id and key in environment variables of the Lambda that are encrypted with an AWS KMS key. The data is pulled from the AWS API using the Boto3 SDK and is transformed to JSON format. It’s then delivered to the HTTP Data Collector API which places it into the Log Analytics Workspace. From there, it becomes available to Azure Monitor to query and visualize.

Setting up an Azure environment for this integration is very simple. You’ll need an active Azure subscription. If you don’t have one, you can set up a free Azure account to play around. Once you’re set with the Azure subscription, you’ll need to create an Azure Log Analytics Workspace. Instructions for that can be found in this Microsoft article. After the workspace has been set up, you’ll need to get the workspace id and key as referenced in the Obtain workspace ID and key section of this Microsoft article. You’ll use this workspace ID and key to authenticate to the HTTP Data Collector API.

If you have a sandbox AWS account and would like to follow along, I’ve included a CloudFormation template that will set up the AWS environment. You’ll need to have an AWS account with sufficient permissions to run the template and provision the resources. Prior to running the template, you will need to zip up the lambda_function.py and put it in an AWS S3 bucket you have permissions on. When you run the template you’ll be prompted to provide the S3 bucket name, the name of the ZIP file, the Log Analytics Workspace ID and key, and the name you want the API to assign to the log in the workspace.

The Python code backing the solution is pretty simple. It uses all standard Python modules except for the boto3 module used to interact with AWS.

import json
import logging
import re
import csv
import boto3
import os
import hmac
import base64
import hashlib
import datetime

from io import StringIO
from datetime import datetime
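# Note: botocore.vendored.requests is deprecated in newer botocore releases.
# If this import fails, package the requests library with the Lambda and use "import requests".
from botocore.vendored import requests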

The first function in the code parses the ARN (Amazon Resource Name) to extract the AWS account number. This information is later included in the log data written to Azure.

# Parse the IAM User ARN to extract the AWS account number
def parse_arn(arn_string):
    acct_num = re.findall(r'(?<=:)[0-9]{12}',arn_string)
    return acct_num[0]
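
As a quick illustration of what the function returns, here is a small sketch that runs it against a made-up ARN. The parse_arn_by_split function is a hypothetical alternative (not part of the original solution) that relies on the account ID being the fifth colon-separated field of an ARN.

```python
import re

def parse_arn(arn_string):
    # Same regular expression as above: 12 digits preceded by a colon
    acct_num = re.findall(r'(?<=:)[0-9]{12}', arn_string)
    return acct_num[0]

def parse_arn_by_split(arn_string):
    # Alternative sketch: the account ID is the fifth colon-separated field
    return arn_string.split(':')[4]

sample_arn = 'arn:aws:iam::123456789012:user/example-user'  # made-up ARN
regex_result = parse_arn(sample_arn)
split_result = parse_arn_by_split(sample_arn)
print(regex_result, split_result)
```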

The second function uses the strftime method to transform the timestamp returned from the AWS API to a format that the Azure Monitor API will detect as a timestamp and make that particular field for each record in the Log Analytics Workspace a datetime type.

# Convert timestamp to one more compatible with Azure Monitor
def transform_datetime(awsdatetime):
    transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
    return transf_time
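
Here is a short sketch of the conversion in action. The timestamp is a made-up value standing in for something like the CreateDate returned by the AWS API. Note that this format string drops the UTC offset, so the sketch assumes the incoming value is already in UTC.

```python
from datetime import datetime, timezone

def transform_datetime(awsdatetime):
    # Format as an ISO 8601 style string without fractional seconds or offset
    transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
    return transf_time

# Made-up timezone-aware timestamp
sample = datetime(2019, 7, 8, 14, 30, 5, tzinfo=timezone.utc)
formatted = transform_datetime(sample)
print(formatted)  # 2019-07-08T14:30:05
```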

The next function queries the AWS API for a listing of AWS IAM Users set up in the account and creates a dictionary object representing data about each user. That object is added to a list which holds one object per user.

# Query for a list of AWS IAM Users
def query_iam_users():
    
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users

The query_access_keys function queries the AWS API for a listing of the access keys that have been provisioned for the AWS IAM User as well as the status of those keys and some metrics around the usage. The resulting data is then added to a dictionary object and the object added to a list. Each item in the list represents a record for an AWS access id.

# Query for a list of access keys and information on access keys for an AWS IAM User
def query_access_keys(user):
    keys = []
    client = boto3.client(
        'iam'
    )
    paginator = client.get_paginator('list_access_keys')
    response_iterator = paginator.paginate(
        UserName = user['username']
    )

    # Get information on access key usage
    for page in response_iterator:
        for key in page['AccessKeyMetadata']:
            response = client.get_access_key_last_used(
                AccessKeyId = key['AccessKeyId']
            )
            # Sanitize key before sending it along for export
            sanitizedacctkey = key['AccessKeyId'][:4] + '...' + key['AccessKeyId'][-4:]
            # Create new dictionary object with access key information
            if 'LastUsedDate' in response.get('AccessKeyLastUsed'):

                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),
                'LastUsedDate':(transform_datetime(response['AccessKeyLastUsed']['LastUsedDate'])),
                'Region':response['AccessKeyLastUsed']['Region'],'Status':key['Status'],
                'ServiceName':response['AccessKeyLastUsed']['ServiceName']}
                keys.append(key_rec)
            else:
                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),'Status':key['Status']}
                keys.append(key_rec)
    return keys
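
The sanitizing step and the two record shapes can be tried out without calling AWS. The sketch below is an illustrative refactoring, not the code from the solution: sanitize_access_key_id and build_key_record are hypothetical helpers, and the user, key, and usage values are made up (the key ID is the placeholder used in AWS documentation).

```python
from datetime import datetime

def transform_datetime(awsdatetime):
    return awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")

def sanitize_access_key_id(access_key_id):
    # Keep only the first and last four characters
    return access_key_id[:4] + '...' + access_key_id[-4:]

def build_key_record(user, key, last_used):
    # Fields that are always present
    record = {
        'loggedDate': user['loggedDate'],
        'user': user['username'],
        'account_number': user['account_number'],
        'AccessKeyId': sanitize_access_key_id(key['AccessKeyId']),
        'CreateDate': transform_datetime(key['CreateDate']),
        'Status': key['Status'],
    }
    # Usage fields exist only if the key has been used at least once
    if 'LastUsedDate' in last_used:
        record['LastUsedDate'] = transform_datetime(last_used['LastUsedDate'])
        record['Region'] = last_used['Region']
        record['ServiceName'] = last_used['ServiceName']
    return record

user = {'loggedDate': '2019-07-08', 'username': 'example-user', 'account_number': '123456789012'}
key = {'AccessKeyId': 'AKIAIOSFODNN7EXAMPLE', 'CreateDate': datetime(2019, 1, 15, 9, 0, 0), 'Status': 'Active'}
never_used = build_key_record(user, key, {})
used = build_key_record(user, key, {
    'LastUsedDate': datetime(2019, 6, 30, 18, 45, 0),
    'Region': 'us-east-1',
    'ServiceName': 'iam',
})
print(never_used)
print(used)
```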

The next two functions contain the code that creates and submits the request to the Azure Monitor API. The product team was awesome enough to provide some sample code in the public documentation for this part. The code is intended for Python 2 but only required a few small changes to make it compatible with Python 3.

Let’s first talk about the build_signature function. At this time the API uses HTTP request signing using the Log Analytics Workspace id and key to authenticate to the API. In short this means you’ll have two sets of shared keys per workspace, so consider the workspace your authorization boundary and prioritize proper key management (aka use a different workspace for each workload, track key usage, and rotate keys as your internal policies require).

Breaking down the code below, we see that the string to be signed includes the HTTP method, the length of the request content, the content type, a custom header of x-ms-date, and the REST resource endpoint. The string is then converted to a bytes object, and an HMAC is created using SHA256 with the decoded workspace key, which is then base-64 encoded. The result is combined with the workspace id to form the authorization header which is returned by the function.

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")  
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    authorization = "SharedKey {}:{}".format(customer_id,encoded_hash)
    return authorization
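
To see the shape of the header the function produces, here is a small sketch that calls it with made-up values. The workspace id and key below are fake placeholders (a real key comes from the workspace settings), and the date string is arbitrary.

```python
import base64
import hashlib
import hmac

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    return "SharedKey {}:{}".format(customer_id, encoded_hash)

# Fake placeholder values for illustration only
fake_workspace_id = '00000000-0000-0000-0000-000000000000'
fake_shared_key = base64.b64encode(b'not-a-real-workspace-key').decode()
body = '[{"user": "example-user"}]'

authorization = build_signature(
    fake_workspace_id, fake_shared_key, 'Mon, 08 Jul 2019 12:00:00 GMT',
    len(body), 'POST', 'application/json', '/api/logs')

# Split the header back into its parts to inspect it
scheme, credentials = authorization.split(' ', 1)
workspace_part, signature_part = credentials.split(':', 1)
print(scheme, workspace_part, signature_part)
```

The signature portion is the base-64 encoding of a 32-byte SHA256 HMAC, so it changes whenever the date, body length, or key changes.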

Not much needs to be said about the post_data function beyond that it uses the Python requests module to post the log content to the API. Take note of the limits around the data that can be included in the body of the request. The key takeaway here is that if you plan on pushing a lot of data to the API you’ll need to chunk your data to fit within the limits.

def post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri,data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print("Accepted")
    else:
        print("Response code: {}".format(response.status_code))
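
One possible way to chunk the records before calling post_data is sketched below. The chunk_records function is a hypothetical helper that is not part of the solution, and max_bytes is a parameter you would set from the limits in the API documentation. The tiny value used here only exists to make the batching visible.

```python
import json

def chunk_records(records, max_bytes):
    """Yield JSON array strings, each at most max_bytes when UTF-8 encoded."""
    batch, batch_bytes = [], 2  # 2 bytes for the surrounding "[" and "]"
    for record in records:
        item = json.dumps(record, separators=(',', ':'))
        item_bytes = len(item.encode('utf-8'))
        if item_bytes + 2 > max_bytes:
            raise ValueError('A single record exceeds the size limit')
        # Adding to a non-empty batch costs one extra byte for the comma
        extra = item_bytes + (1 if batch else 0)
        if batch_bytes + extra > max_bytes:
            yield '[' + ','.join(batch) + ']'
            batch, batch_bytes = [], 2
            extra = item_bytes
        batch.append(item)
        batch_bytes += extra
    if batch:
        yield '[' + ','.join(batch) + ']'

# Made-up records and an artificially small limit for demonstration
records = [{'n': i} for i in range(10)]
chunks = list(chunk_records(records, 20))
for chunk in chunks:
    print(len(chunk.encode('utf-8')), chunk)
```

Each yielded string could then be passed as the body argument of post_data in its own request.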

Last but not least we have the lambda_handler function which brings everything together. It first gets a listing of users, loops through each user to retrieve information about the usage of the access id and secret keys, creates a log record containing information about each key, converts the data from a list of dicts to a JSON string, and writes it to the API. If the content is successfully delivered, the log for the Lambda will note that it was accepted.

def lambda_handler(event, context):

    # Enable logging to console
    logging.basicConfig(level=logging.INFO,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    try:

        # Initialize empty records array
        #
        key_records = []
        
        # Retrieve list of IAM Users
        logging.info("Retrieving a list of IAM Users...")
        users = query_iam_users()

        # Retrieve list of access keys for each IAM User and add to record
        logging.info("Retrieving a listing of access keys for each IAM User...")
        for user in users:
            key_records.extend(query_access_keys(user))
        # Prepare data for sending to Azure Monitor HTTP Data Collector API
        body = json.dumps(key_records)
        post_data(os.environ['WorkspaceId'], os.environ['WorkspaceKey'], body, os.environ['LogName'])

    except Exception as e:
        logging.error("Execution error",exc_info=True)

Once the data is delivered, it will take a few minutes for it to be processed and appear in the Log Analytics Workspace. In my tests it only took around 2-5 minutes, but I wasn’t writing much data to the API. After the data processes you’ll see a new entry under the listing of Custom Logs in the Log Analytics Workspace. The entry will be the log name you picked and with a _CL at the end. Expanding the entry will display the columns that were created based upon the log entry. Note that the columns consumed from the data you passed will end with an underscore and a character denoting the data type.

Now that the data is in the workspace, I can start querying it and creating some visualizations. Azure Monitor uses the Kusto Query Language (KQL). If you’ve ever created queries in Splunk, the language will feel familiar.

The log I created in AWS and pushed to the API has the following schema. Note the addition of the underscore followed by a character denoting the column data type.

loggedDate_s (string) – The date the Lambda ran
user_s (string) – The AWS IAM User the key belongs to
account_number_s (string) – The AWS Account number the IAM Users belong to
AccessKeyId_s (string) – The id of the access key associated with the user which has been sanitized to show just the first 4 and last 4 characters
CreateDate_t (timestamp) – The date and time when the access key was created
LastUsedDate_t (timestamp) – The date and time the key was last used
Region_s (string) – The region where the access key was last used
Status_s (string) – Whether the key is enabled or disabled
ServiceName_s (string) – The AWS service where the access key was last used

In addition to what I’ve pushed, Azure Monitor adds a TimeGenerated field to each record which is the time the log entry was sent to Azure Monitor. You can override this behavior and provide a field for Azure Monitor to use for this if you like (see here). There are some other miscellaneous fields that are inherited from whatever schema the API is drawing from. These are fields such as TenantId and SourceSystem, which in this case is populated with RestAPI.

Since my personal AWS environment is quite small and the usage of its AWS IAM Users is very limited, my data sets aren’t huge. To address this I created a number of IAM Users with access keys for the purposes of this blog. I’m getting that out of the way so my AWS friends don’t hate on me. 🙂

One of the core best practices in key management with shared keys is to ensure you rotate them. The first data point I wanted to extract was which keys that existed in my AWS account were over 90 days old. To do that I put together the following query:

AWS_Access_Key_Report_CL
| extend key_age = datetime_diff('day',now(),CreateDate_t)
| project Age=key_age,AccessKey=AccessKeyId_s, User=user_s
| where Age > 90
| sort by Age

Let’s walk through the query. The first line tells the query engine to run this query against the AWS_Access_Key_Report_CL log. The next line creates a new field that contains the age of the key by determining the number of days that have passed between the creation date of the key and today’s date. The line after that instructs the engine to pull back only the key_age field I just created and the AccessKeyId_s and user_s fields, renaming them to Age, AccessKey, and User. The results are then further culled down to only the records where the key age is greater than 90 days, and finally the results are sorted by the age of the key.

Looks like it’s time to rotate that access key in use by Azure AD. 🙂

I can then pin this query to a new shared dashboard for other users to consume. Cool and easy right? How about we create something visual?

Looking at the trends in access key creation can provide some valuable insights into what is the norm and what is not. Let’s take a look at the metrics for key creation (of the keys that still exist in an enabled/disabled state). For that I’m going to use the following query:

AWS_Access_Key_Report_CL
| make-series AccessKeys=count() default=0 on CreateDate_t from datetime(2019-01-01) to datetime(2020-01-01) step 1d

In this query I’m using the make-series operator to count the number of access keys created each day and assigning a default value of 0 if there are no keys created on that date. The result of the query isn’t very useful when looking at it in tabular form.
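
The idea behind make-series, one count per day with gaps filled by a default of 0, can be sketched in Python like this. The daily_series function and the sample dates are made up for illustration, and the end date is treated as exclusive in this sketch.

```python
from collections import Counter
from datetime import date, timedelta

def daily_series(created_dates, start, end):
    """Return (day, count) pairs for every day from start up to but not including end."""
    counts = Counter(created_dates)
    series = []
    day = start
    while day < end:
        # Days with no key creations get the default value 0
        series.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return series

# Made-up creation dates: three keys on 6/30 and one on 7/2
created = [date(2019, 6, 30)] * 3 + [date(2019, 7, 2)]
series = daily_series(created, date(2019, 6, 29), date(2019, 7, 3))
for day, count in series:
    print(day.isoformat(), count)
```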

By selecting the Line drop down box, I can transform the data into a line graph which shows me spikes in key creation. If this were real data, investigation into the spike of key creations on 6/30 may be warranted.

I put together a few other visuals and tables and created a custom dashboard like the below. Creating the dashboard took about an hour or so, with much of the time invested in figuring out the query language.

What you’ve seen here is a demonstration of the power and simplicity of Azure Monitor. By adding a simple to use API, Microsoft has greatly increased the agility of the tool by allowing it to become a single pane of glass for monitoring across clouds. It’s also worth noting that Microsoft’s BI (business intelligence) tool Power BI has direct integration with Azure Log Analytics. This allows you to pull that log data into Power BI to perform more in-depth analysis and create even richer visualizations.

Well folks, I hope you’ve found this series of value. I really enjoyed creating it and already have a few additional use cases in mind. Make sure to follow me on Github as I’ll be posting all of the code and solutions I put together there for your general consumption.

Have a great day!

Visualizing AWS Logging Data in Azure Monitor – Part 1

Posted on July 5, 2019 by mattfeltonma

Hi folks!

2019 is more than halfway over and it feels like it has happened in a flash. It’s been an awesome year with tons of change and even more learning. I started the year neck deep in AWS and began transitioning into Azure back in April when I joined Microsoft. Having the opportunity to explore both clouds and learn the capabilities of each offering has been an amazing experience that I’m incredibly thankful for. As I’ve tried to do for the past 8 years, I’m going to share some of those learnings with you. Today we’re going to explore one of the capabilities that differentiates Azure from its competition.

One of the key takeaways I’ve had from my experiences with AWS and Microsoft is that enterprises have become multicloud. Workloads are quickly being spread out among public and private clouds. While the business benefits greatly from a multicloud approach where workloads can go to the environment whose cost, risks, and timetables best suit them, it presents a major challenge to the technical orchestration behind the scenes. With different APIs (application programming interfaces), varying levels of compliance, great and not so great capabilities around monitoring and alerting, and a major industry gap in multicloud skill sets, it can become quite a headache to successfully execute this approach.

One area where Microsoft Azure differentiates itself is its ability to ease the challenge of monitoring and alerting in a multicloud environment. Azure Monitor is one of the key products behind this capability. With this post I’m going to demonstrate Azure Monitor’s capabilities in this realm by walking you through a pattern of delivering, visualizing, and analyzing log data collected from AWS. The pattern I’ll be demonstrating is reusable for most any cloud (and potentially on-premises) offering. Now sit back, put your geek hat on, and let’s dive in.

First I want to briefly talk about what Azure Monitor is. Azure Monitor is a solution which brings together a collection of tools that can be used to collect and analyze the large abundance of telemetry available today. This telemetry could be metrics about a virtual machine’s performance or audit logs for Azure Active Directory. The product team has put together the excellent diagram below which explains the architecture of the solution.

Credit – https://docs.microsoft.com/en-us/azure/azure-monitor/overview

As you can see from the inputs on the left, Azure Monitor is capable of collecting and analyzing data from a variety of sources. You’ll find plenty of documentation the product team has made publicly available on the five gray items, so I’m going to instead focus on custom sources.

For those of you who have been playing in the AWS pool, you can think of Azure Monitor as something similar (but much more robust) to CloudWatch Metrics and CloudWatch Logs. I know, I know, you’re thinking I’ve drunk the Microsoft Kool-Aid.

While I do love to reminisce about cold glasses of Kool-Aid on hot summers in the 1980s, I’ll opt to instead demonstrate it in action and let you decide for yourself. To do this I’ll be leveraging the new API Microsoft introduced. The Azure Monitor HTTP Data Collector API was introduced a few months back and provides the capability of delivering log data to Azure where it can be analyzed by Azure Monitor.

With Azure Monitor, logs are stored in an Azure resource called a Log Analytics Workspace. For you AWS folk, you can think of a Log Analytics Workspace as something similar to CloudWatch Log Groups, where the data is stored within a logical boundary that shares retention and authorization settings. Logs are sent to the API in JSON format and are placed in the Log Analytics Workspace you specify. A high level diagram of the flow can be seen below.
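As a rough illustration of what "sent to the API in JSON format" looks like, here is a minimal Python sketch that assembles the URL, headers, and body for a Data Collector API request without sending it. The endpoint shape and the Log-Type and x-ms-date headers follow my reading of the public Data Collector API documentation; the function name, the placeholder workspace ID, and the record fields (UserName, AccessKeyId, CreateDate) are assumptions made up for this example. The Authorization header the API requires is deliberately left out of this sketch.

```python
import json
from email.utils import formatdate


def build_log_request(workspace_id, log_type, records):
    """Assemble (url, headers, body) for a Data Collector API POST (not sent here)."""
    # The body is a JSON array of flat records.
    body = json.dumps(records).encode("utf-8")
    url = "https://{}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01".format(
        workspace_id
    )
    headers = {
        "Content-Type": "application/json",
        # Azure appends _CL to this name to form the custom log table name.
        "Log-Type": log_type,
        # RFC 1123 date, e.g. "Fri, 05 Jul 2019 12:00:00 GMT".
        "x-ms-date": formatdate(usegmt=True),
    }
    return url, headers, body


# Hypothetical record; the field names are illustrative only.
records = [
    {"UserName": "example-user", "AccessKeyId": "AKIAEXAMPLE", "CreateDate": "2019-06-30T12:00:00Z"}
]
url, headers, body = build_log_request(
    "00000000-0000-0000-0000-000000000000", "AWS_Access_Key_Report", records
)
print(url)
print(headers["Log-Type"], len(body))
```

Once ingested, a log type such as AWS_Access_Key_Report shows up as the table AWS_Access_Key_Report_CL, and fields get a type suffix (a date field named CreateDate becomes CreateDate_t), which is why the queries in this series use those names.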

Credit – https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-collector-api

So now that you have a high level understanding of what Azure Monitor is, what it can do, and how the new API works, let’s talk about the demonstration.

If you’ve used AWS you’re very familiar with the capabilities of CloudWatch Metrics Dashboards and the basic query language available to analyze CloudWatch Logs. To perform more complex queries and to create deeper visualizations, third-party solutions such as Elasticsearch and Kibana are often used. While these solutions work, they can be complex to implement and can create more operational overhead.

When a peer informed me about the new API a few weeks back, I was excited to try it out. I had just started to use Azure Monitor to put together some dashboards for my personal Office 365 and Azure subscriptions and was loving the power and simplicity of the analytics component of the solution. The new API opened up some neat opportunities to pipe logging data from AWS into Azure to create a single dashboard I could reference for both clouds. This became my use case and demonstration of the pattern of delivering logs from a third party to Azure Monitor with some simple Python code.

The logs I chose to deliver to the API were logs containing information surrounding the usage of AWS access ids and keys. I had previously put together some code to pull this data and write it to an S3 bucket.

Let’s take a look at the design of the solution. I had a few goals I wanted to make sure to hit if possible. My first goal was to keep the code simple. That meant limiting the usage of third-party modules and avoiding overcomplicating the implementation.

My second goal was to limit the usage of static credentials. If I ran the code in Azure, I’d need to set up an AWS IAM user and provision an access key ID and secret key. While I’m aware of the workaround to use SAML authentication, I’m not a fan because, in my personal opinion, it uses SAML in a way that amounts to hammering a square peg into a round hole. Sure you can do it, but you really shouldn’t unless you’re out of options. Additionally, the solution requires some fairly sensitive permissions in AWS such as iam:ListAccessKeys, so the risk of the credentials being compromised could be significant. Given the risks and constraints of authentication methods to the AWS API, I opted to run my code as a Lambda and follow AWS best practices by assigning the Lambda an IAM role.

On the Azure side, the Azure Monitor API for log delivery requires authentication using the Workspace ID and Workspace key. Ideally these would be encrypted and stored in AWS Secrets Manager or as a secure parameter in Parameter Store, but I decided to go the easy route and store them as environment variables for the Lambda and to encrypt them with AWS KMS. This cut back on the code and made the CloudFormation templates easier to put together.
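To show how the Workspace ID and Workspace key come into play, here is a minimal Python sketch of the SharedKey Authorization header as I understand it from the public Data Collector API documentation: an HMAC-SHA256 over a newline-joined string, keyed with the base64-decoded workspace key. The function name and the sample values are hypothetical, and reading or decrypting the Lambda environment variables is intentionally left out.

```python
import base64
import hashlib
import hmac


def build_signature(workspace_id, workspace_key, rfc1123_date, content_length,
                    method="POST", content_type="application/json",
                    resource="/api/logs"):
    """Build the SharedKey Authorization header value for a log delivery request."""
    # String to sign: method, body length, content type, x-ms-date header, resource.
    string_to_hash = "{}\n{}\n{}\nx-ms-date:{}\n{}".format(
        method, content_length, content_type, rfc1123_date, resource
    )
    # The workspace key is base64 text; decode it to get the raw HMAC key.
    decoded_key = base64.b64decode(workspace_key)
    digest = hmac.new(
        decoded_key, string_to_hash.encode("utf-8"), hashlib.sha256
    ).digest()
    encoded_hash = base64.b64encode(digest).decode("utf-8")
    return "SharedKey {}:{}".format(workspace_id, encoded_hash)


# Placeholder values only; a real key comes from the workspace settings.
fake_key = base64.b64encode(b"not-a-real-workspace-key").decode("utf-8")
signature = build_signature(
    "00000000-0000-0000-0000-000000000000",
    fake_key,
    "Fri, 05 Jul 2019 12:00:00 GMT",
    content_length=128,
)
print(signature.split(":")[0])
```

The date passed in has to be the same value sent in the x-ms-date header, and the content length has to match the body actually posted, otherwise the service computes a different hash and rejects the request.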

With the decisions made the resulting design is pictured above.

I’m going to end the post here and save the dive into implementation and code for the next post. In the meantime, take a read through the Azure Monitor documentation and familiarize yourself with the basics. I’ve also put the whole solution up on GitHub if you’d like to follow along for the next post.

See you next post!

Journey Of The Geek

The chronicles of a Bostonian tech geek navigating through life and technology
