Debugging Azure SDK for Python Using Fiddler

Hi there folks.  Recently I was experimenting with the Azure Python SDK while writing a solution to pull information about Azure resources within a subscription.  One function within the solution pulled a list of virtual machines in a given Azure subscription.  While writing the function, I realized I hadn’t yet had experience handling paged results from the Azure REST API, which is the underlying API used by the SDK.

I hopped over to the public documentation to see how the API handles paging.  Come to find out, the Azure REST API handles paging in a similar way to the Microsoft Graph API: it returns a nextLink property which contains a reference used to retrieve the next page of results.  The Azure REST API will typically return paged results for operations such as list when the number of items returned exceeds 1,000 (note this can vary depending on the method called).
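
To make the paging behavior concrete, here is a minimal sketch of following nextLink manually with the requests module.  The resource path and api-version value are illustrative assumptions; check the REST API reference for the operation you're calling.

import requests

def list_vms_manually(subscription_id, access_token):
    # Illustrative ARM list operation (the api-version is an assumption)
    url = ('https://management.azure.com/subscriptions/' + subscription_id +
           '/providers/Microsoft.Compute/virtualMachines?api-version=2019-03-01')
    headers = {'Authorization': 'Bearer ' + access_token}
    vms = []
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        body = response.json()
        vms.extend(body.get('value', []))
        # nextLink only appears when another page of results exists
        url = body.get('nextLink')
    return vms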

So great, I knew how paging was used.  The next question was how the SDK would handle paged results.  Would it be my responsibility or would it be handled by the SDK itself?

If you have experience with AWS’s Boto3 SDK for Python (absolutely stellar SDK by the way) and you’ve worked in large environments, you are probably familiar with paginators.  Paginators exist for most of the AWS service classes such as IAM and S3.  Here is an example of a code snippet from a solution I wrote to report on AWS access keys.

import boto3
from datetime import datetime

def query_iam_users():
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            # parse_arn extracts the AWS account number from the user's ARN
            # (defined elsewhere in the solution)
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users

Paginators make handling paged results a breeze and allow for extensive flexibility in controlling how paging is handled by the underlying AWS API.
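
As a quick illustration of that flexibility, the paginate method accepts an optional PaginationConfig dictionary that controls things like page size and the total number of items returned.  A minimal sketch:

import boto3

client = boto3.client('iam')
paginator = client.get_paginator('list_users')

# Ask the underlying API for pages of 50 users and cap the total at 200
response_iterator = paginator.paginate(
    PaginationConfig={'PageSize': 50, 'MaxItems': 200}
)
for page in response_iterator:
    for user in page['Users']:
        print(user['UserName'])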

Circling back to the Azure SDK for Python, my next step was to hop over to the SDK’s public documentation.  Navigating the documentation for the Azure SDK (at least for the Python SDK, I can’t speak for the other languages) is a bit challenging.  There are a ton of excellent code samples, but if you want to get down and dirty and create something new you’re going to have to dig around a bit to find what you need.  To pull a listing of virtual machines, I would be using the list_all method in the VirtualMachinesOperations class.  Unfortunately I couldn’t find any reference in the documentation to how paging is handled with the method or class.

So where to now?  Well, the next step was the public GitHub repo for the SDK.  After poking around the repo I located the source for the VirtualMachinesOperations class.  Searching the class definition, I was able to locate the code for the list_all() method.  Right at the top of the definition was this comment:

Use the nextLink property in the response to get the next page of virtual
machines.

Sounds like handling paging is on you, right?  Not so fast.  Digging further into the method I came across the function below.  It looks like the method handles paging itself, relieving the consumer of the SDK of the overhead of writing additional code.

        def internal_paging(next_link=None):
            request = prepare_request(next_link)

            response = self._client.send(request, stream=False, **operation_config)

            if response.status_code not in [200]:
                exp = CloudError(response)
                exp.request_id = response.headers.get('x-ms-request-id')
                raise exp

            return response

I wanted to validate the behavior but unfortunately I couldn’t find any documentation on how to control the page size within the Azure REST API.  I wasn’t about to create 1,001 virtual machines, so instead I decided to use another class and method in the SDK.  What type of service would return a hell of a lot of items?  Logging of course!  This meant using the list method of the ActivityLogsOperations class, which is part of the module for Azure Monitor and is used to pull log entries from the Azure Activity Log.  Before I experimented with the class, I hopped back over to GitHub and pulled up the source code for it.  Lo and behold, there’s an internal_paging function within the list method that looks very similar to the one for the list_all method for virtual machines.

        def internal_paging(next_link=None):
            request = prepare_request(next_link)

            response = self._client.send(request, stream=False, **operation_config)

            if response.status_code not in [200]:
                raise models.ErrorResponseException(self._deserialize, response)

            return response

Awesome, so I have a method that will likely return paged results, but how do I validate that the API is returning paged results and that the SDK is handling them?  For that I broke out one of my favorite tools, Telerik’s Fiddler.

There are plenty of guides on Fiddler out there so I’m going to skip the basics of how to install it and get it running.  Since the calls from the SDK are over HTTPS I needed to configure Fiddler to intercept secure web traffic.  Once Fiddler was up and running I popped open Visual Studio Code, set up a new workspace, configured a Python virtual environment, and threw together the lines of code below to get the Activity Logs.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.monitor import MonitorManagementClient

TENANT_ID = 'mytenant.com'
CLIENT = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
KEY = 'XXXXXX'
SUBSCRIPTION = 'XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID
)
client = MonitorManagementClient(
    credentials = credentials,
    subscription_id = SUBSCRIPTION
)

log = client.activity_logs.list(
    filter="eventTimestamp ge '2019-08-01T00:00:00.0000000Z' and eventTimestamp le '2019-08-24T00:00:00.0000000Z'"
)

for entry in log:
    print(entry)

Let me walk through the code quickly.  To make the call I used an Azure AD service principal I had set up that was granted Reader permissions over the Azure subscription I was querying.  After obtaining an access token for the service principal, I set up a MonitorManagementClient associated with the Azure subscription and dumped the contents of the Activity Log for the past 20ish days.  Finally I iterated through the results to print out each log entry.

When I ran the code in Visual Studio Code an exception was thrown stating there was a certificate verification error.

requests.exceptions.SSLError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /mytenant.com/oauth2/token (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)')))

The exception is being thrown by the Python requests module, which is used under the covers by the SDK.  The module performs certificate validation by default.  The reason certificate verification fails is that Fiddler uses a self-signed certificate when configured to intercept secure traffic as a proxy.  This is what allows it to decrypt secure web traffic sent by the client.

Python doesn’t use the Computer or User Windows certificate store, so even after you trust the self-signed certificate created by Fiddler, certificate validation still fails.  Like most cross-platform solutions it uses its own certificate store, which has to be managed separately as described in this Stack Overflow article.  You should use the method described in the article for any production-level code where you may be running into this error, such as when going through a corporate web proxy.
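
One way to do that with the requests module is to point the REQUESTS_CA_BUNDLE environment variable at a CA bundle that includes the proxy’s root certificate.  A minimal sketch, assuming you’ve already exported Fiddler’s root certificate to a PEM file (the path below is hypothetical):

import os

# Hypothetical path to a CA bundle containing Fiddler's root certificate
os.environ['REQUESTS_CA_BUNDLE'] = r'C:\certs\fiddler-root.pem'

import requests
# Subsequent calls made through requests (including those made by the SDK)
# validate server certificates against this bundle instead of failing
requests.get('https://login.microsoftonline.com')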

For the purposes of testing you can also pass the parameter verify with the value of False as seen below.  I can’t stress this enough: be smart and do not bypass certificate validation outside of a lab environment.

requests.get('https://somewebsite.org', verify=False)

So this is all well and good when you’re using the requests module directly, but what if you’re using the Azure SDK?  To do it within the SDK we have to pass extra keyword arguments (kwargs), which the SDK refers to as an operation config.  These additional parameters are passed downstream to methods such as those used by the requests module.

Here I modified the earlier code to tell the requests methods to ignore certificate validation for the calls to obtain the access token and call the list method.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.monitor import MonitorManagementClient

TENANT_ID = 'mytenant.com'
CLIENT = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
KEY = 'XXXXXX'
SUBSCRIPTION = 'XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID,
    verify = False
)
client = MonitorManagementClient(
    credentials = credentials,
    subscription_id = SUBSCRIPTION,
    verify = False
)

log = client.activity_logs.list(
    filter="eventTimestamp ge '2019-08-01T00:00:00.0000000Z' and eventTimestamp le '2019-08-24T00:00:00.0000000Z'",
    verify = False
)

for entry in log:
    print(entry)

After the modifications the code ran successfully and I was able to verify that the SDK was handling paging for me.

[Image: fiddler.png]

Let’s sum up what we learned:

  • When using an Azure SDK leverage the Azure REST API reference to better understand the calls the SDK is making
  • Use Fiddler to analyze and debug issues with the Azure SDK
  • Never turn off certificate verification in a production environment.  Instead, validate that the certificate verification error is legitimate, and if so, add the certificate to the trusted store
  • In lab environments, certificate verification can be disabled by passing an additional parameter of verify=False with the SDK method

Hope that helps folks.  See you next time!

Deep Dive into Azure Managed Identities – Part 2

Welcome back fellow geeks for the second installment in my series on Azure Managed Identities.  In the first post I covered the business problem and the risks Managed Identities address, and in this post I’ll be covering how managed identities are represented in Azure.

Let’s start by walking through the components that make managed identities possible.

The foundational component of any identity is the data store in which the identity lives.  In the case of managed identities, like much of the rest of the identity data for the Microsoft cloud, that data store is Azure Active Directory.  For those of you coming from a traditional on-premises environment with experience in directories such as Active Directory or one of the many flavors of LDAP, Azure Active Directory (Azure AD) is an Identity-as-a-Service offering which includes a directory component we can think of as a next-generation directory.  This means it’s designed to be highly scalable, available, and resilient, and is provided in an “as a service” model where a simple management layer sits in front of all the complexities of the compute, network, and storage infrastructure that makes up the directory.  There are a whole bunch of other cool features such as modern authentication, contextual authorization, adaptive authentication, and behavioral analytics that come along with the solution, so check out the official documentation to learn about those capabilities.  If you want to nerd out on the design of that infrastructure you can check out this whitepaper and this article.

It’s worthwhile to take a moment to cover Azure AD’s relationship to Azure.  Every resource in Azure is associated with an Azure subscription.  An Azure subscription acts as a legal and payment agreement (think type of Azure subscription, pay-as-you-go, Visual Studio, CSP, etc), boundary of scale (think limits to resources you can create in a subscription), and administrative boundary.  Each Azure subscription is associated with a single instance of Azure AD.  Azure AD acts as the security boundary for an organization’s space in Azure and serves as the identity backend for the Azure subscription.  You’ll often hear it referred to as “your tenant” (if you’re not familiar with the general cloud concept of tenancy check out this CSA article).

Azure AD stores lots of different object types including users, groups, and devices.  The object type we are interested in for the purposes of managed identities is the service principal.  Service principals act as the security principals for non-humans (such as applications or Azure resources like a VM) in Azure AD.  These service principals are then granted access by being assigned permissions to Azure resources such as an instance of Azure Key Vault or an Azure Storage account.  Service principals are used for a number of purposes beyond just Managed Identities, such as identities for custom-developed applications or third-party applications.

Given that service principals can be used for different purposes, it only makes sense that the service principal object type includes an attribute called the serviceprincipaltype.  For example, a third-party or custom-developed application that is registered with Azure AD uses the service principal type of Application, while a managed identity has the value set to ManagedIdentity.  Let’s take a look at an example of the serviceprincipaltypes in a tenant.

In my Geek In The Weeds tenant I’ve created a few application identities by registering the applications, and I’ve created a few managed identities.  Everything else within the tenant is default out of the box.  To list the service principals in the directory I used the AzureAD PowerShell module.  The cmdlet that can be used to list out the service principals is Get-AzureADServicePrincipal.  By default the cmdlet will only return the first 100 results, so you need to set the All parameter to true.  Every application, whether it’s Exchange Online or Power BI, needs an identity in your tenant to interact with it and the resources you create that are associated with the tenant.  Here are the serviceprincipaltypes in my Geek In The Weeds tenant.

[Image: serviceprincipaltype.PNG]

Now we know the security principal used by a Managed Identity is stored in Azure AD and is represented by a service principal object.  We also know that service principal objects have different types depending on how they’re used, and the type that represents a managed identity is ManagedIdentity.  If we want to know what managed identities exist in our directory, we can use this information to pull a list with Get-AzureADServicePrincipal.

We’re not done yet!  Managed Identities also come in multiple flavors: system-assigned and user-assigned.  System-assigned managed identities are the cooler of the two in that they share the lifecycle of the resource they’re used by.  For example, a system-assigned managed identity is created when an Azure VM is created, and the identity is deleted once the VM is deleted.  This presents a great option for mitigating the challenge of identity lifecycle management.  With Microsoft handling the lifecycle of these identities, each resource can have its own identity, making it easier to troubleshoot issues with the identity, avoid potential outages caused by modifying the identity, adhere to least privilege by giving the identity only the permissions the resource requires, and cut back on support requests from developers to info sec for the creation of identities.

Sometimes it may be desirable to share a managed identity amongst multiple Azure resources such as an application running on multiple Azure VMs.  This use case calls for the other type of managed identity, user-assigned.  These identities do not share the lifecycle of the resources using them.

Let’s take a look at the differences between the service principal objects for a user-assigned vs a system-assigned managed identity.  Here I ran another Get-AzureADServicePrincipal and limited the results to a serviceprincipaltype of ManagedIdentity.

ObjectId                           : a3e9d372-242e-424b-b97a-135116995d4b
ObjectType                         : ServicePrincipal
AccountEnabled                     : True
AlternativeNames                   : {isExplicit=False, /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis}
AppId                              : b7fa9389-XXXX
AppRoleAssignmentRequired          : False
DisplayName                        : systemmis
KeyCredentials                     : {class KeyCredential {
                                       CustomKeyIdentifier: System.Byte[]
                                       EndDate: 11/11/2019 12:39:00 AM
                                       KeyId: f8e439a8-071b-45e0-9f8e-ac10b058a5fb
                                       StartDate: 8/13/2019 12:39:00 AM
                                       Type: AsymmetricX509Cert
                                       Usage: Verify
                                       Value:
                                     }
                                     }
ServicePrincipalNames              : {b7fa9389-XXXX, https://identity.azure.net/XXXX}
ServicePrincipalType               : ManagedIdentity
------------------------------------------------
ObjectId                           : ac960ac7-ca03-4ac0-a7b8-d458635b293b
ObjectType                         : ServicePrincipal
AccountEnabled                     : True
AlternativeNames                   : {isExplicit=True,
                                     /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234}
AppId                              : fff84e09-XXXX
AppRoleAssignmentRequired          : False
AppRoles                           : {}
DisplayName                        : testing1234
KeyCredentials                     : {class KeyCredential {
                                       CustomKeyIdentifier: System.Byte[]
                                       EndDate: 11/7/2019 1:49:00 AM
                                       KeyId: b3c1808d-6778-4004-b23f-4d339ed0a91f
                                       StartDate: 8/9/2019 1:49:00 AM
                                       Type: AsymmetricX509Cert
                                       Usage: Verify
                                       Value:
                                     }
                                     }
ServicePrincipalNames              : {fff84e09-XXXX, https://identity.azure.net/XXXX}
ServicePrincipalType               : ManagedIdentity


In the above results we can see that the main difference between the user-assigned (testing1234) and system-assigned (systemmis) identities is within the AlternativeNames property.  The system-assigned identity has isExplicit set to False and has another value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.Compute/virtualMachines/systemmis.  Notice the end of that path specifies the identity is being used by a virtual machine named systemmis.  The user-assigned identity has isExplicit set to True and another value of /subscriptions//resourcegroups/managedidentity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/testing1234.  Here we can see the identity is an “explicit” managed identity and is not directly linked to an Azure resource.

This difference gives us the ability to quickly report on the number of system-assigned and user-assigned managed identities in a tenant by using the following command.

Get-AzureADServicePrincipal -All $True | Where-Object AlternativeNames -like "isExplicit=True*"

Filtering on isExplicit=True gives us the user-assigned identities, and False gives us the system-assigned ones.  Neat right?

Let’s summarize what we’ve learned:

  • An object in Azure Active Directory is created for each managed identity and represents its security principal
  • The type of object created is a service principal
  • There are multiple service principal types and the one used by a Managed Identity is called ManagedIdentity
  • There are two types of managed identities, user-assigned and system-assigned
  • System-assigned managed identities share the lifecycle of the resource they are associated with while user-assigned managed identities are created separately from the resource, do not share the resource lifecycle, and can be used across multiple resources
  • The object representing a user-assigned managed identity has a value of isExplicit=True for the AlternativeNames property, while a system-assigned managed identity has isExplicit=False

That’s it for this post folks.  In the next post I’ll walk through the process of creating a managed identity for an Azure VM and will demonstrate with a bit of Python code how we can use the managed identity to access a secret stored in Azure Key Vault.

See you next post!

Deep Dive into Azure Managed Identities – Part 1

“I love the overhead of password management” said no one ever.

Password management is hard.  It’s even harder when you’re managing the credentials for non-humans, such as those used by an application.  Back in the olden days when a developer needed a way to access an enterprise database or file share, they’d put in a request with the help desk or information security to have an account (often referred to as a service account) provisioned in Windows Active Directory, an LDAP directory, or a SQL database.  The request would go through a business approval and some support person would create the account, set the password, and email the information to the developer.  This process came with a number of risks:

  • Risk of compromise of the account
  • Risk of abuse of the account
  • Risk of a significant outage

These risks arise due to the following gaps in the process:

  • Multiple parties knowing the password (the party who provisions the account and the developer)
  • The password for the account being communicated to the developer unencrypted such as plain text in an email
  • The password not being changed after it is initially set due to the inability or difficulty of changing the password
  • The password not being regularly rotated due to concerns over application outages
  • The password being shared with other developers and the account then being used across multiple applications without the dependency being documented

Organizations tried to mitigate the risk of compromise by performing such actions as requiring a long and complex password, delivering the password in an encrypted format such as an encrypted Microsoft Office document, instituting policy requiring the password to be changed (exceptions to this one are frequent due to outage concerns), implementing password vaulting and management solutions such as CyberArk Enterprise Password Vault or HashiCorp Vault, and instituting behavioral monitoring solutions to check for abuse.  Password rotation and monitoring are some of the more effective mitigations but can also be extremely challenging and costly to institute at scale, even with a vaulting and management solution.  Even then, there are always exceptions for legacy applications which are not compatible (sadly these are often some of the more critical systems).

When the public cloud came around, the credential management challenge for application accounts exploded due to the most favored traits of a public cloud: on-demand self-service and rapid elasticity and scalability.  What was a challenge of a few hundred application identities has quickly grown into thousands of applications, containers, and serverless functions such as AWS Lambda and Azure Functions.  Beyond the volume of applications, the public cloud also changes the traditional security boundary due to its broad network access trait.  Instead of the cozy feeling multiple firewalls gave you, you now have developers using cloud services such as storage or databases which are administered via a cloud management plane exposed directly to the Internet.  It doesn’t stop there folks: you also have developers heavily using SaaS-based version control solutions to store code which may have credentials hardcoded into it, potentially publicly exposing those credentials.

Thankfully the public cloud providers have heard the cries of us security folk and have been working hard to help address the problem.  One method in use is the creation of security principals which are designed around the use of temporary credentials.  This way there are no long-standing credentials to share, compromise, or abuse.  Amazon has robust use of this concept in AWS with IAM Roles.  Instead of hardcoding a set of IAM User credentials in a Lambda or an application running on an EC2 instance, a role can be created with the necessary permissions required for the application and be assumed by either the Lambda service or EC2 instance.
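
For readers who haven’t seen this in practice, here is a minimal boto3 sketch of assuming a role to obtain temporary credentials.  The role ARN is a hypothetical placeholder, and note that when code runs on EC2 or Lambda with a role attached, the SDK picks up the temporary credentials automatically and none of this is needed:

import boto3

sts = boto3.client('sts')

# Exchange the caller's identity for short-lived credentials for the role
response = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/my-app-role',
    RoleSessionName='demo-session'
)
creds = response['Credentials']  # expire automatically (see Expiration key)

# Use the temporary credentials with another service client
s3 = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']
)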

For this series of posts I’m going to be focusing on one of Microsoft Azure’s solutions to this problem, called Managed Identities.  For you folks that are more familiar with AWS, Managed Identities conceptually work the same way as IAM Roles.  A security principal is created, permissions are granted, and the identity is assumed by a resource such as an Azure Web App or an Azure VM.  There are some features that differ from IAM Roles that add to the appeal of Managed Identities, such as tying the lifecycle of the Managed Identity to the resource, so that when the resource is created the managed identity is created, and when the resource is destroyed the identity is destroyed.

In this series of posts I’ll be demonstrating how Managed Identities are created, how they are used, and how they differ (sometimes for the better and sometimes not) from AWS IAM Roles.  Hope you enjoy the series, and expect the next entry early next week.

See you soon fellow geek!

Visualizing AWS Logging Data in Azure Monitor – Part 2

Welcome back folks!

In this post I’ll be continuing my series on how Azure Monitor can be used to visualize log data generated by other cloud services.  In my last post I covered the challenges that multicloud brings and what Azure can do to help with it.  I also gave an overview of Azure Monitor and covered the design of the demo I put together and will be walking through in this post.  Please take a read through that post if you haven’t already.  If you want to follow along, I’ve put the solution up on Github.

Let’s quickly review the design of the solution.

[Image: Capture – solution design]

This solution uses some simple Python code to pull information about the usage of AWS IAM User access key IDs and secret keys from an AWS account.  The code runs in a Lambda, with the Azure Log Analytics Workspace ID and key stored in environment variables of the Lambda that are encrypted with an AWS KMS key.  The data is pulled from the AWS API using the Boto3 SDK and is transformed into JSON format.  It’s then delivered to the HTTP Data Collector API, which places it into the Log Analytics Workspace.  From there, it becomes available to Azure Monitor to query and visualize.

Setting up an Azure environment for this integration is very simple.  You’ll need an active Azure subscription.  If you don’t have one, you can set up a free Azure account to play around with.  Once you’re set with the Azure subscription, you’ll need to create an Azure Log Analytics Workspace.  Instructions for that can be found in this Microsoft article.  After the workspace has been set up, you’ll need to get the workspace ID and key as referenced in the Obtain workspace ID and key section of this Microsoft article.  You’ll use this workspace ID and key to authenticate to the HTTP Data Collector API.

If you have a sandbox AWS account and would like to follow along, I’ve included a CloudFormation template that will setup the AWS environment.  You’ll need to have an AWS account with sufficient permissions to run the template and provision the resources.  Prior to running the template, you will need to zip up the lambda_function.py and put it on an AWS S3 bucket you have permissions on.  When you run the template you’ll be prompted to provide the S3 bucket name, the name of the ZIP file, the Log Analytics Workspace ID and key, and the name you want the API to assign to the log in the workspace.

The Python code backing the solution is pretty simple.  It uses all standard Python modules except for the boto3 module used to interact with AWS.

import json
import logging
import re
import csv
import boto3
import os
import hmac
import base64
import hashlib
import datetime

from io import StringIO
from datetime import datetime
from botocore.vendored import requests

The first function in the code parses the ARN (Amazon Resource Name) to extract the AWS account number.  This information is later included in the log data written to Azure.

# Parse the IAM User ARN to extract the AWS account number
def parse_arn(arn_string):
    acct_num = re.findall(r'(?<=:)[0-9]{12}',arn_string)
    return acct_num[0]

The second function uses the strftime method to transform the timestamp returned from the AWS API to a format that the Azure Monitor API will detect as a timestamp and make that particular field for each record in the Log Analytics Workspace a datetime type.

# Convert timestamp to one more compatible with Azure Monitor
def transform_datetime(awsdatetime):
    transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
    return transf_time

The next function queries the AWS API for a listing of AWS IAM Users set up in the account and creates a dictionary object representing data about each user.  Each object is added to a list which holds the records for all of the users.

# Query for a list of AWS IAM Users
def query_iam_users():
    
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users

The query_access_keys function queries the AWS API for a listing of the access keys that have been provisioned for an AWS IAM User as well as the status of those keys and some metrics around their usage.  The resulting data is then added to a dictionary object and the object added to a list.  Each item in the list represents a record for an AWS access key ID.

# Query for a list of access keys and information on access keys for an AWS IAM User
def query_access_keys(user):
    keys = []
    client = boto3.client(
        'iam'
    )
    paginator = client.get_paginator('list_access_keys')
    response_iterator = paginator.paginate(
        UserName = user['username']
    )

    # Get information on access key usage
    for page in response_iterator:
        for key in page['AccessKeyMetadata']:
            response = client.get_access_key_last_used(
                AccessKeyId = key['AccessKeyId']
            )
            # Sanitize the key before sending it along for export

            sanitizedacctkey = key['AccessKeyId'][:4] + '...' + key['AccessKeyId'][-4:]
            # Create a new dictionary object with access key information
            if 'LastUsedDate' in response.get('AccessKeyLastUsed'):

                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),
                'LastUsedDate':(transform_datetime(response['AccessKeyLastUsed']['LastUsedDate'])),
                'Region':response['AccessKeyLastUsed']['Region'],'Status':key['Status'],
                'ServiceName':response['AccessKeyLastUsed']['ServiceName']}
                keys.append(key_rec)
            else:
                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),'Status':key['Status']}
                keys.append(key_rec)
    return keys

The next two functions contain the code that creates and submits the request to the Azure Monitor API.  The product team was awesome enough to provide some sample code in the public documentation for this part.  The code is intended for Python 2 but only required a few small changes to make it compatible with Python 3.

Let’s first talk about the build_signature function.  At this time the API uses HTTP request signing with the Log Analytics Workspace ID and key to authenticate to the API.  In short this means you’ll have two shared keys per workspace, so consider the workspace your authorization boundary and prioritize proper key management (aka use a different workspace for each workload, track key usage, and rotate keys as your internal policies require).

Breaking down the code below, the string to be signed includes the HTTP method, the length of the request content, the content type, a custom x-ms-date header, and the REST resource endpoint.  The string is then converted to a bytes object, and an HMAC is created using SHA-256, which is then base64-encoded.  The result is the authorization header which is returned by the function.

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")  
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    authorization = "SharedKey {}:{}".format(customer_id,encoded_hash)
    return authorization

Not much needs to be said about the post_data function beyond that it uses the Python requests module to post the log content to the API.  Take note of the limits around the data that can be included in the body of the request.  The key takeaway is that if you plan on pushing a lot of data to the API you’ll need to chunk your data to fit within the limits, as sketched after the function below.

def post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri,data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print("Accepted")
    else:
        print("Response code: {}".format(response.status_code))
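
Here’s a minimal sketch of one way to do that batching, assuming a hypothetical per-request record budget (check the documented body size limit for the API and tune accordingly):

def post_in_batches(customer_id, shared_key, records, log_type, batch_size=500):
    # Post the records in fixed-size batches so each serialized body stays
    # comfortably under the API's request body limit (batch_size is a guess;
    # size it based on your record size and the documented limit)
    for i in range(0, len(records), batch_size):
        body = json.dumps(records[i:i + batch_size])
        post_data(customer_id, shared_key, body, log_type)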

Last but not least we have the lambda_handler function which brings everything together.  It first gets a listing of users, loops through each user to gather information about the access key IDs and their usage, creates a log record containing information about each key, converts the data from a dict to a JSON string, and writes it to the API.  If the content is successfully delivered, the log for the Lambda will note that it was accepted.

def lambda_handler(event, context):

    # Enable logging to console
    logging.basicConfig(level=logging.INFO,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    try:

        # Initialize empty records array
        #
        key_records = []
        
        # Retrieve list of IAM Users
        logging.info("Retrieving a list of IAM Users...")
        users = query_iam_users()

        # Retrieve list of access keys for each IAM User and add to record
        logging.info("Retrieving a listing of access keys for each IAM User...")
        for user in users:
            key_records.extend(query_access_keys(user))
        # Prepare data for sending to Azure Monitor HTTP Data Collector API
        body = json.dumps(key_records)
        post_data(os.environ['WorkspaceId'], os.environ['WorkspaceKey'], body, os.environ['LogName'])

    except Exception as e:
        logging.error("Execution error",exc_info=True)

Once the data is delivered, it will take a few minutes for it to be processed and appear in the Log Analytics Workspace.  In my tests it only took around 2-5 minutes, but I wasn’t writing much data to the API.  After the data processes you’ll see a new entry under the listing of Custom Logs in the Log Analytics Workspace.  The entry will be the log name you picked with a _CL at the end.  Expanding the entry will display the columns that were created based upon the log entries.  Note that the columns created from the data you passed will end with an underscore and a character denoting the data type.

[Image: mylog]

Now that the data is in the workspace, I can start querying it and creating some visualizations.  Azure Monitor uses the Kusto Query Language (KQL).  If you’ve ever created queries in Splunk, the language will feel familiar.

The log I created in AWS and pushed to the API has the following schema.  Note the addition of the underscore followed by a character denoting the column data type.

  • loggedDate_s (string) – The date the Lambda ran
  • user_s (string) – The AWS IAM User the key belongs to
  • account_number_s (string) – The AWS Account number the IAM Users belong to
  • AccessKeyId_s (string) – The id of the access key associated with the user, sanitized to show just the first 4 and last 4 characters
  • CreateDate_t (timestamp) – The date and time when the access key was created
  • LastUsedDate_t (timestamp) – The date and time the key was last used
  • Region_s (string) – The region where the access key was last used
  • Status_s (string) – Whether the key is enabled or disabled
  • ServiceName_s (string) – The AWS service where the access key was last used

In addition to what I’ve pushed, Azure Monitor adds a TimeGenerated field to each record, which is the time the log entry was sent to Azure Monitor.  You can override this behavior and provide a field for Azure Monitor to use instead (see here).  There are some other miscellaneous fields inherited from whatever schema the API is drawing from.  These are fields such as TenantId and SourceSystem, which in this case is populated with RestAPI.
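
If memory serves, that override is done by adding a time-generated-field header to the request that names the field holding the timestamp.  As a hedged sketch, the headers dictionary built in the post_data function above would become:

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date,
        # Assumption: use the value of a field from the submitted records
        # (here CreateDate) to populate TimeGenerated instead of ingestion time
        'time-generated-field': 'CreateDate'
    }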

Since my personal AWS environment is quite small and the AWS IAM Users’ usage is very limited, my data sets aren’t huge.  To address this I created a number of IAM Users with access keys for the purposes of this blog.  I’m getting that out of the way so my AWS friends don’t hate on me. 🙂

One of the core best practices in key management with shared keys is to ensure you rotate them.  The first data point I wanted to extract was which keys in my AWS account were over 90 days old.  To do that I put together the following query:

AWS_Access_Key_Report_CL
| extend key_age = datetime_diff('day',now(),CreateDate_t)
| project Age=key_age,AccessKey=AccessKeyId_s, User=user_s
| where Age > 90
| sort by Age

Let’s walk through the query.  The first line tells the query engine to run this query against the AWS_Access_Key_Report_CL log.  The next line creates a new field that contains the age of the key by determining the amount of time that has passed between the creation date of the key and today’s date.  The line after that instructs the engine to pull back only the key_age field I just created along with the AccessKeyId_s and user_s fields.  The results are then further culled down to pull only records where the key age is greater than 90 days, and finally the results are sorted by the age of the key.

[Image: query1]

Looks like it’s time to rotate that access key in use by Azure AD. 🙂

I can then pin this query to a new shared dashboard for other users to consume.  Cool and easy right?  How about we create something visual?

Looking at the trends in access key creation can provide some valuable insights into what is the norm and what is not.  Let’s take a look at the metrics for key creation (for the keys that still exist in an enabled or disabled state).  For that I’m going to use the following query:

AWS_Access_Key_Report_CL
| make-series AccessKeys=count() default=0 on CreateDate_t from datetime(2019-01-01) to datetime(2020-01-01) step 1d

In this query I’m using the make-series operator to count the number of access keys created each day, assigning a default value of 0 for dates on which no keys were created.  The result of the query isn’t very useful when looking at it in tabular form.

[Image: query2.PNG]

By selecting the Line option in the drop-down box, I can transform the data into a line graph which shows me spikes in key creation.  If this was real data, investigation into the spike of key creations on 6/30 may be warranted.

[Image: quer2_2.PNG]

I put together a few other visuals and tables and created a custom dashboard like the one below.  Creating the dashboard took about an hour or so, with much of the time invested in figuring out the query language.

[Image: dashboard]

What you’ve seen here is a demonstration of the power and simplicity of Azure Monitor.  By adding a simple-to-use API, Microsoft has exponentially increased the agility of the tool by allowing it to become a single pane of glass for monitoring across clouds.  It’s also worth noting that Microsoft’s BI (business intelligence) tool Power BI has direct integration with Azure Log Analytics.  This allows you to pull log data into Power BI to perform more in-depth analysis and create even richer visualizations.

Well folks, I hope you’ve found this series of value.  I really enjoyed creating it and already have a few additional use cases in mind.  Make sure to follow me on Github as I’ll be posting all of the code and solutions I put together there for your general consumption.

Have a great day!

Exploring Azure AD Privileged Identity Management (PIM) – Part 4 – Access Review and Azure RBAC

Access Reviews

Welcome to my final post on Azure Active Directory Privileged Identity Management (AAD PIM).  Over this series of posts I’ve provided an overview of the service, guidance on how to set the service up, and a deep dive into the user and approver experience.  We’ll wrap up the series by looking at the Access Review feature, take an intermission to look at a new feature, and finish by reviewing the Azure RBAC integration.

We have a lot to cover, so let’s jump into it.

As a quick refresher, I’ll be using my Journey Of the Geek tenant.  Within the tenant I have some Office 365 E5 and EMS E5 licenses provisioned.  Our admin user will be initiating the access review and Homer Simpson will be acting as a reviewer.

I first log into the Azure Portal as the admin user and open up the AAD PIM shortcut from my dashboard.  Once the application opens, I’m going to navigate to the Azure AD directory roles option.

[Image: 4aadpim1]

After selecting the option my main menu is refreshed to show the management options for the various AAD PIM features.  As a quick refresher, let’s look at the settings I’ve configured for Access Reviews in my tenant.  We navigate to those Settings by clicking the Settings option as seen below and selecting Access Reviews.

[Image: 4aadpim2]

As you can see from the settings in the screenshot below, my tenant is set to send mail notifications to reviewers when a review is started and to admins when it finishes.  It’s also configured for reminders to be sent to reviewers who haven’t yet completed their review.  I’ve configured reviewers to provide a reason as to why continued access to a privileged role needs to be maintained.  This is a great little option to capture the business requirements behind the access.  Finally, my access reviews are configured to run for a total of 30 days.

[Image: 4aadpim3]

Let’s navigate back to the Access Review blade under the management menu.

[Image: 4aadpim4]

On the Access Reviews blade we see a listing of the access reviews in progress.  You can see I set up an access review for users that are members of the Global Admins role.  On the top we have the menu options to start a new access review, filter which access reviews are displayed, change the way they are grouped, and go back to the Access Review settings I showed earlier.

[Image: 4aadpim5]

Let’s spin up a new access review for users who are permanent or eligible members of the User Administrators Azure AD role.  We click the Add link and a new blade opens where we can configure a number of options.  We have the basic options of naming the access review, providing a description, and setting a start and end date.

I’ve selected the User Administrator role as the role being reviewed during this access review.  Notice the Scope option with the Everyone radio button.  Perhaps that’s a placeholder for functionality that will be introduced in the future to limit the users within a role that the access review will cover.  I’ve selected Homer Simpson to be the reviewer for the role.  The advanced settings have inherited the global settings for access reviews in my tenant that I covered previously.  Once the information is filled in, I hit the Start button to kick off the access review.

[Image: 4aadpim6]

It takes a few minutes for the access review to be created and then it’s displayed in the listing of access reviews with a status of active.

[Image: 4aadpim7]

If we navigate over to Homer Simpson’s Outlook inbox, we see he has received an email informing him that an access review has been kicked off, that he has been designated as a reviewer, and that he must approve or reject other members’ continued eligibility for the role.

[Image: 4aadpim8]

If we delay acting on the access review for a day we receive another reminder email per our settings.  The email can be seen below.

[Image: 4aadpim9]

If the approvers do not respond to the access review, the review completes but records that none of the users have been reviewed.

[Image: 4aadpim10.png]

Let’s spin up another review and complete this one.

[Image: 4aadpim11]

Homer Simpson again receives the notice that an access review has been kicked off.  Clicking the Start Review button in the email opens the Azure Portal and the AAD PIM blade.  Here Homer gets an overview of the access review including the user who created the review, the length of the review, the description of the review, and the users who are members (permanent or eligible) of the role.

The filter option allows us to filter on the listing of users based upon whether they still need to be reviewed or have been approved or denied.

[Image: 4aadpim12]

We first check off Bart Simpson and see that we are required to input a reason for Bart’s approval or denial.  I input a reason and choose the deny button.  Bart disappears from the menu.  If I use the filter option to show all three categories of users, Bart now reappears under the denied category.

[Image: 4aadpim13]

I check off both Homer and Marge and provide a reason for both users and hit the approve button. All users have been reviewed by Homer Simpson. After refreshing the page the review now shows 0 users remaining to be reviewed.

[Image: 4aadpim14]

Switching over to the browser for the admin user we see that the access review is still open.

[Image: 4aadpim15]

If we open the access review we can see that all users have been reviewed even though the review is still active.  We have the option to reset the access review to force the approvers to perform the review activities again, or we can stop it.  We’re going to end the access review early now that all the reviews have been completed.

[Image: 4aadpim16]

Re-opening the access review, we now have the option to apply the results.  After clicking the Apply button the changes are applied and we’re notified via the Portal notification system.

[Image: 4aadpim17]

Navigating to the roles blade under the Manage section now shows only Homer and Marge as being eligible for the User Administrator role verifying that the changes made during the access review have taken effect.

The access review feature is a wonderful addition by Microsoft.  Back in the olden days of Windows Active Directory, managing the entire lifecycle of an identity and its entitlements often involved complex third-party identity management solutions in combination with a request management system.  By including this feature out of the gates, Microsoft is showing real maturity in its identity offerings.

A Brief Intermission

Before I get into what AAD PIM can do for Azure RBAC, I want to touch on a new feature that went into public preview while I was working on this post.  Notice in the Manage section the Roles blade now has a (Preview) notation after it.

[Image: 4aadpim18]

Navigating into the blade shows an entirely new interface with far more useful information.  We now have a complete list of the roles AAD PIM can manage including descriptions.  If we select a role we go a level deeper and can add users to the role as we would expect.

[Image: 4aadpim19]

We also have two new menu options: Description and Definition.  The Description blade gives us a link to the Microsoft documentation on the role as well as every permission the role has (AWESOME!).  The Definition blade gives us a JSON view of the role information.  Perhaps we’ll be able to create custom AAD / O365 roles in the future and use these JSON views as ARM templates?  Time will tell.

[Image: 4aadpim20]

The introduction of this new feature is a great demonstration of how quickly things change in the cloud.

AAD PIM and Azure RBAC

Most organizations consuming Microsoft cloud services don’t just consume Office 365.  These organizations want to reap the benefits of the infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) services provided by Microsoft’s Azure offering.  Managing authorization in Azure is handled through Azure Role-Based Access Control (RBAC).  In short, Azure RBAC provides a method of authorizing a security principal (user, group, or service principal) to perform an action on a resource (VM, storage account, Azure SQL, etc.) based upon membership in a role.  Out of the box Microsoft provides built-in roles such as Owner, Reader, and Contributor.  You can also create custom roles to fit your business needs.
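
To make that concrete, here’s a minimal Python sketch that enumerates the role assignments in a subscription with the azure-mgmt-authorization package.  The credential setup mirrors the earlier Python examples (CLIENT, KEY, TENANT_ID, and SUBSCRIPTION are placeholders as before); treat the exact shape of the returned objects as version-dependent.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.authorization import AuthorizationManagementClient

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID
)
client = AuthorizationManagementClient(credentials, SUBSCRIPTION)

# Each assignment binds a security principal to a role definition at a scope
for assignment in client.role_assignments.list():
    print(assignment)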

Similar to Office 365 prior to AAD PIM, preventing standing access of security principals in Azure RBAC roles was left to custom scripts and third-party solutions.  Last year it was announced that AAD PIM capabilities were being extended to Azure RBAC.  The integration of AAD PIM and Azure RBAC became generally available in the commercial offering of Azure AD in May of 2018.

For this demonstration I’m going to switch over to my Geek In The Weeds tenant.  Recall that the tenant is a synchronized and federated tenant using Azure AD Connect and Active Directory Federation Services.  I’ve already activated AAD PIM for the tenant so I’ll be jumping right into its integration with Azure RBAC.

After logging into the portal as a user who has permanent membership in the Privileged Role Administrator role I’m faced with the standard admin view of AAD PIM.  In the Manage menu I’m going to select the Azure resources option.

[Image: 4aadpim21]

If this is your first time using AAD PIM with Azure RBAC you’ll need to go through the discovery stage.  This will discover Azure resources that you have write permissions to and thus have the ability to manage privileged access for.  After discovery is complete you’ll see a screen similar to the one below.  You can see that my user is a member of the Owner role for the Visual Studio Enterprise Azure subscription and that there are 77 roles defined for the subscription with three security principals holding one or more roles.

[Image: 4aadpim22]

Selecting the subscription resource gives us a dashboard displaying key metrics about PIM activity within the subscription.

[Image: 4aadpim23.png]

One of the metrics that caught my eye was the single user in the User Access Administrator role.  Selecting that area of the dashboard opens a new blade which lists out the members of the role.  We can see the service principal for PIM has been added to the User Access Administrator role to grant the service permissions to administer the roles within the resource (in this case a subscription).

[Image: 4aadpim24]

Notice also that the PIM menu for managing Azure AD/Office 365 differs from the menu for managing Azure RBAC.  We see that the new Roles options I outlined above haven’t been migrated to the Azure RBAC integration yet.  Additionally, the request approval workflow is still in public preview in Azure RBAC.  In the Azure RBAC menu we also get a Resource audit log which details PIM activity within the resource.

[Image: 4aadpim25]

Notice also that the general Settings option isn’t present in the Azure RBAC menu.  Instead we have a Role settings option.  Selecting this option opens a new blade that lists out the roles associated with the resource.  Selecting any of the roles opens a new blade where we have the option of configuring a large selection of settings for the role, both for assignment (making the user eligible or a permanent member) and for activation.  If you recall the configurable options for the Azure AD / Office 365 roles, these are far more granular.  The additional flexibility makes sense because these roles will be managing IaaS and PaaS resources, which are much more catered to programmatic access by non-humans.  Non-human access tends to be much more predictable than human access, so enforcing controls such as temporary eligibility for a role makes a lot of sense.

[Image: 4aadpim26]

Let’s take a look at the experience of adding a user to one of the RBAC roles.  The process is very similar to AAD PIM with Azure AD / Office 365 in that we select the Roles option from the Manage section.  For this demonstration I’m going to add a user to the Virtual Machine Contributor role.

Clicking the Add Member option allows me to assign Ash Williams as an eligible member of the role.  Notice the additional option called Set membership settings.  Here I can set a timespan during which Ash is eligible for the role.  This option isn’t available in AAD PIM for Azure AD / Office 365 that I could see.

[Image: 4aadpim27]

After hitting the Add button Ash is successfully added as a direct member of the role.  Notice that I can also add groups as members of the role.  This is another capability unique to the Azure RBAC integration.

[Image: 4aadpim28]

Let’s go through the user experience of activating a role.  For the sake of simplicity I’m only going to cover the differences in the user experience.  You can reference my third post if you’re curious about the full user experience.

At this point I’ve logged into a virtual machine as Ash Williams and have authenticated to the Azure Portal.  I’ve entered the Azure resources blade.  Here we see the user being informed that no Azure resources are protected by PIM.  In this instance hitting the Discover resources option will not update this menu because Ash Williams isn’t a member of any role that would grant him write permissions on an Azure resource.  Instead I’m going to click the Activate role button.

[Image: 4aadpim29]

After clicking the Activate role button I’m shown the roles Ash Williams is eligible to activate.  Notice Ash has the ability to activate the role due to both his direct membership and his membership in the GIW AIP Users group.  I’d recommend leveraging groups for this access where possible so you don’t end up in a situation where you grant a security principal longer access to the role than you wanted due to a direct role assignment.

[Image: 4aadpim30]

The activation experience and approval experience are the same from this point forward, so I’m going to stop here.

Summing It Up

I really enjoyed this blog series.  I hadn’t done a deep dive into AAD PIM since it was in public preview and much has changed since then.  I really like how Microsoft is finally exposing capabilities which have historically been more Azure AD / Office 365 centric to Microsoft Azure.  It’s an excellent marketing tool for companies who may already be using Office 365 but are using another cloud provider for IaaS and PaaS.  The product team has also done a great job integrating much-needed features such as approval workflows, access reviews, and metrics.

I’m not going to have the time to do a post about the AAD PIM PowerShell module, but I recommend you check it out if you have some bandwidth.  There are some great opportunities there to integrate PIM functionality with third-party workflow management tools to automate the entire user experience behind a GUI your users are already familiar with.

That wraps up my series on Azure AD Privileged Identity Management.  I hope you enjoyed it as much as I did.

See you next post!

Deep Dive into Azure AD Domain Services – Part 3

Deep Dive into Azure AD Domain Services  – Part 3

Well folks, it’s time to wrap up this series on Azure Active Directory Domain Services (AAD DS).  In my first post I covered the basic configuration of the managed domain and in my second post I took a look at how well Microsoft did in applying security best practices and complying with NIST standards.  In this post I’m going to briefly cover the LDAPS over the Internet capability, summarize some key findings, and list out some improvements I’d like to see made to the service.

One of the odd features Microsoft provides with the AAD DS service is the ability to expose the managed domain over LDAPS to the Internet.  I really am lost as to the use case that drove the feature.  LDAP is very much a legacy on-premises protocol that has no place being exposed to the risks of the public Internet.  It’s the last thing the industry should be encouraging.  Just because you can, doesn’t mean you should.  Now let me step off the soapbox and take a look at the feature.

As I covered in my last post, LDAPS is not natively enabled in the managed domain.  The feature must be configured and enabled through the Azure Portal.  The configuration consists of uploading the private key and certificate the service will use in the form of a PKCS12 file (*.PFX).  The certificate has a few requirements that are outlined in Microsoft’s instructions.  After the certificate is validated, it takes about 10-15 minutes for the service to become available.  Beyond enabling the service within the VNet, you additionally have the option to expose the LDAPS endpoint to the Internet.
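
If you need to package the key and certificate into that PFX yourself, the Python cryptography package can handle it.  A minimal sketch, assuming you already have a PEM-encoded key and certificate on disk; the file names and export password are hypothetical.

from cryptography.hazmat.primitives.serialization import (
    BestAvailableEncryption, load_pem_private_key)
from cryptography.hazmat.primitives.serialization.pkcs12 import (
    serialize_key_and_certificates)
from cryptography.x509 import load_pem_x509_certificate

# Hypothetical input files; the cert must meet the AAD DS requirements.
key = load_pem_private_key(open("ldaps.key", "rb").read(), password=None)
cert = load_pem_x509_certificate(open("ldaps.crt", "rb").read())

pfx = serialize_key_and_certificates(
    name=b"aadds-ldaps",
    key=key,
    cert=cert,
    cas=None,  # include the issuing chain here if you have one
    encryption_algorithm=BestAvailableEncryption(b"S0mePfxP@ssword"),
)
open("ldaps.pfx", "wb").write(pfx)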

3aads1.png

Microsoft provides instructions on how to restrict access to the endpoint to trusted IPs via a network security group (NSG) because, yeah, exposing an LDAP endpoint to the Internet is just a tad risky.  To lock it down you simply associate an NSG with the subnet AAD DS is serving.  Once that is done, enable the service via the option in the image above and wait about 10 minutes.  After the service is up, register an external DNS record for the service that points to the IP address noted under the properties section of the AAD DS blade and you’re good to go.
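
Fittingly for this blog, you can also manage that NSG rule with the Azure SDK for Python.  A minimal sketch with hypothetical resource names and a hypothetical trusted IP; depending on your SDK version the operation may instead be named begin_create_or_update and take a SecurityRule model rather than a dict.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.network import NetworkManagementClient

# Hypothetical identifiers; substitute your own.
credentials = ServicePrincipalCredentials(
    client_id="<app id>", secret="<secret>", tenant="<tenant id>")
network_client = NetworkManagementClient(credentials, "<subscription id>")

# Allow LDAPS (TCP/636) inbound only from a single trusted public IP.
rule = {
    "protocol": "Tcp",
    "source_address_prefix": "203.0.113.10/32",  # hypothetical trusted IP
    "source_port_range": "*",
    "destination_address_prefix": "*",
    "destination_port_range": "636",
    "access": "Allow",
    "direction": "Inbound",
    "priority": 100,
}
poller = network_client.security_rules.create_or_update(
    "rg-aadds", "nsg-aadds-subnet", "Allow-LDAPS-Trusted", rule)
print(poller.result().provisioning_state)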

For my testing purposes, I locked the external LDAPS endpoint down to the public IP address my Azure VM was SNATed to.  After that I created an entry in the hosts file of the VM mapping the external DNS name I gave the service (whyldap.geekintheweeds.com) to the public IP address of the LDAPS endpoint in order to bypass the split-brain DNS challenge.  Initiating a connection from LDP.EXE was a success.
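
LDP.EXE isn’t the only way to run that test.  Here’s the equivalent check using the ldap3 Python package; the UPN and password are hypothetical, and certificate validation is disabled only because the lab bypasses DNS with a hosts file entry.

import ssl

from ldap3 import ALL, SIMPLE, Connection, Server, Tls

# Lab-only: skip cert validation since we're relying on a hosts file entry.
tls = Tls(validate=ssl.CERT_NONE)
server = Server("whyldap.geekintheweeds.com", port=636, use_ssl=True,
                tls=tls, get_info=ALL)
conn = Connection(server, user="ash.williams@geekintheweeds.com",  # hypothetical
                  password="<password>", authentication=SIMPLE)

if conn.bind():
    print("LDAPS bind succeeded")
    print(server.info.naming_contexts)
else:
    print("Bind failed:", conn.result)
conn.unbind()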

3aads2.png

Now that we know the service is running, let’s check out what the protocol support and cipher suite looks like.

3aads3.png

Again we see the use of deprecated cipher suites. Here the risk is that much greater since a small mistake with an NSG could expose this endpoint directly to the Internet.  If you’re going to use this feature, please just don’t.  If you’re really determined to, don’t screw up your NSGs.
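
If you want to spot-check the protocol support yourself, the Python standard library is enough.  The sketch below attempts a handshake at each legacy protocol version and prints the negotiated cipher; newer Python builds may compile some of these constants out, hence the getattr guards.

import socket
import ssl

HOST, PORT = "whyldap.geekintheweeds.com", 636

candidates = {
    "TLSv1": getattr(ssl, "PROTOCOL_TLSv1", None),
    "TLSv1.1": getattr(ssl, "PROTOCOL_TLSv1_1", None),
    "TLSv1.2": getattr(ssl, "PROTOCOL_TLSv1_2", None),
}

for name, proto in candidates.items():
    if proto is None:
        continue  # constant not available in this Python build
    ctx = ssl.SSLContext(proto)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # lab test; we only care about the handshake
    try:
        with socket.create_connection((HOST, PORT), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=HOST) as tls_sock:
                print(f"{name}: accepted, negotiated {tls_sock.cipher()}")
    except (ssl.SSLError, OSError) as exc:
        print(f"{name}: rejected ({exc})")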

This series was probably one of the more enjoyable series I’ve done since I knew very little about the AAD DS offering. There were a few key takeaways that are worth sharing:

  • The more objects in the directory, the more expensive the service.
  • Users and groups can be created directly in the managed domain after a new organizational unit is created.
  • Password and lockout policies are insanely loose, to the point where I can create an account with a three-character password (it just needs to meet complexity requirements) and accounts never lock out (see the sketch after this list).  The policy cannot be changed.
  • RC4 encryption ciphers are enabled and cannot be disabled.
  • NTLMv1 is enabled and cannot be disabled.
  • The service does not support smart card-enforced users.  Yes, that includes both the users synchronized from Azure AD as well as any users you create directly in the managed domain.  If I had to guess, it’s probably because you’re not a Domain Admin and hence can’t add to the NTAuth certificate store.
  • LDAPS is not enabled by default.
  • Schema extensions are not supported.
  • Account-Based Kerberos Delegation is not supported.
  • If you are syncing identities to Azure AD, you’ll also need to synchronize your passwords.
  • The managed domain very much runs with “out of the box” defaults.
  • Microsoft creates a “god” account which is a permanent member of every privileged group in the forest.
  • Recovery of deleted objects created directly in the managed domain is not possible.  The rights have not been delegated to the AAD DC Administrators group.
  • The service does not allow for Active Directory trusts.
  • The sIDHistory attribute of users and groups sourced from Azure AD is populated with the primary group from the on-premises domain.
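
As promised in the password policy bullet above, here’s a quick sketch using the ldap3 package to read the effective policy attributes straight off the domain head.  The domain name and credentials are hypothetical; run it from a domain-joined VM in the VNet.

from ldap3 import BASE, NTLM, Connection, Server

# Hypothetical domain and credentials.
server = Server("geekintheweeds.com")
conn = Connection(server, user="GIW\\ash.williams", password="<password>",
                  authentication=NTLM, auto_bind=True)

conn.search("DC=geekintheweeds,DC=com", "(objectClass=*)",
            search_scope=BASE,
            attributes=["minPwdLength", "pwdHistoryLength",
                        "lockoutThreshold", "lockoutDuration",
                        "pwdProperties"])
# A lockoutThreshold of 0 means accounts never lock out.
print(conn.entries[0])
conn.unbind()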

My verdict is that AAD DS is not a very useful service in its current state.  Beyond small organizations, organizations with little to no reliance on legacy infrastructure, organizations without strong security requirements, and dev/QA scenarios, I don’t see much use for it right now.  It comes off as a service in its infancy that has a lot of room to grow and mature.  Microsoft has gone a bit too far in the standardization/simplicity direction and needs to shift a bit in the opposite direction by allowing for more customization, especially in regards to security.

I’d really like to see Microsoft introduce the capabilities below.  All of them should be exposed via the resource blade in the Azure Portal if at all possible.  It would provide a singular administration point (which seems to be the strategy given the move of Azure AD and Intune to the Azure Portal) and would allow Microsoft to control how the options are enabled in the managed domain.  This means no more administrators blowing up their Active Directory forest because they accidentally shut off all the supported cipher suites for Kerberos.

  • Expose Domain Controller Event Logs to Azure Portal/Graph API and add support for AAD DS Power BI Dashboards
  • Support for Active Directory trusts
  • Out of the box provide a Red Forest model (get rid of that “god” account)
  • Option to disable risky cipher suites for both Kerberos and LDAPS
  • Option to harden the password and lockout policy
  • Option to disable NTLMv1
  • Option to turn on LDAP Debug Logging
  • Option to direct Domain Controller event logs to a SIEM
  • Option to restore deleted users and groups that were created directly in the managed domain.  If you allow creation, you need to allow restoration.
  • Removal of the Internet-accessible LDAPS endpoint feature, or at least incorporation of the NSG lockdown directly into the AAD DS blade.

While the service has a lot of room for improvement, the direction of a managed Windows AD offering is spot on.  In the year 2018, there is no reason Windows AD shouldn’t be offered as a managed service.  The approach Microsoft has taken by sourcing the identities and credentials from Azure AD is especially creative.  It’s a solid step in the direction of creating a singular centralized identity service that provides both legacy and modern protocols.  I’ll be watching this service closely as Microsoft builds upon it over the next few months.

Thanks and see you next post!

Deep Dive into Azure AD Domain Services – Part 1

Deep Dive into Azure AD Domain Services  – Part 1

Hi everyone.  In this series of posts I’ll be doing a deep dive into Microsoft’s Azure AD Domain Services (AAD DS).  AAD DS is Microsoft’s managed Windows Active Directory service offered in Microsoft Azure Infrastructure-as-a-Service, intended to compete with similar offerings such as Amazon Web Services’ (AWS) Microsoft Active Directory.  Microsoft’s solution differs from other offerings in that it sources its user and group information from Azure Active Directory rather than an on-premises Windows Active Directory or LDAP directory.

Like its competitors, Microsoft realizes there are still a lot of organizations out there that are very much attached to legacy on-premises protocols such as NTLM, Kerberos, and LDAP.  Not every organization (unfortunately) is ready or able to evolve its applications to consume SAML, OpenID Connect, OAuth, and REST-based APIs (yes COTS vendors, I’m talking to you and your continued reliance on LDAP authentication in the year 2018).  If the service has to be there, it makes sense to consume a managed service so staff can focus less on maintaining legacy technology like Windows Active Directory and more on modern Identity-as-a-Service (IDaaS), Software-as-a-Service (SaaS), and Platform-as-a-Service (PaaS) solutions.

Sounds great right?  Sure, but how does it work?  Microsoft’s documentation does a reasonable job giving the high level details of the service so I encourage you to read through it at some point.  I won’t be covering information included in that documentation unless I notice a discrepancy or an area that could use more detail.  Instead, I’m going to focus on the areas which I feel are important to understand if you’re going to attempt to consume the service in the same way you would a traditional on-premises Windows Active Directory.

With that introduction, let’s dig in.

The first thing I did was install the Remote Server Administration Tools (RSAT) for Active Directory Domain Services and the Group Policy Management tools.  I used these tools to explore some of the configuration choices Microsoft made in the managed service.  I also installed Microsoft Network Monitor 3.4 to review packet captures captured using netsh.

After the tools were installed I started a persistent network capture using netsh from an elevated command prompt.  This is an incredibly useful feature of Windows when you need to debug issues that occur prior to or during user or system logon.  I’ve used it for years to troubleshoot a number of Windows Active Directory issues including slow logons and failed logons.  The only downside is that you’re forced into using Microsoft Network Monitor or Microsoft Message Analyzer to review the packet captures it creates.  While Microsoft Message Analyzer is a sleek tool, the resources required to run it effectively are typically a non-starter for a lab or traditional work laptop, so I tend to use Network Monitor.
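
For those who haven’t used it, the capture workflow looks roughly like the sketch below, wrapped in Python to match the rest of this blog’s tooling.  The trace file path is hypothetical; run it from an elevated prompt.

import subprocess

TRACE_FILE = r"C:\temp\aadds-join.etl"  # hypothetical path

def start_capture():
    # persistent=yes keeps the capture running across the reboot that
    # follows the domain join; report=disabled skips the HTML report.
    subprocess.run(
        ["netsh", "trace", "start", "capture=yes", "persistent=yes",
         f"tracefile={TRACE_FILE}", "maxsize=1024", "report=disabled"],
        check=True)

def stop_capture():
    # Flushes the capture to the .etl file for Network Monitor to open.
    subprocess.run(["netsh", "trace", "stop"], check=True)

if __name__ == "__main__":
    start_capture()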

entry1pic1

After the packet capture was started I went through the standard process of joining the machine to the domain and rebooting the computer.  After the reboot, I logged in with an account in the AAD DC Administrators Azure Active Directory group, started an elevated command prompt as the VM’s local administrator, and stopped the packet capture.  This provided me with a capture of the domain join, initial computer authentication, and initial user authentication.

entry1pic2

While I know you’re as eager to dig into the packet capture as I am, I’ll cover that in a future post.  Instead I decided to break out the RSAT tools and poke around at configuration choices an administrator would normally make when building out a Windows Active Directory domain.

Let’s first open the tool everyone who touches Windows Active Directory is familiar with, Active Directory Users and Computers (ADUC).  The data layout (with the Advanced Features option on) for organizational units (OUs) and containers looks very similar to what we’re used to seeing, with the exception of the AADDC Computers, AADDC Users, and AADDSDomainAdmin OUs, and the AADDSDomainConfig container.  I’ll get into these containers in a minute.

entry1pic3.png

If we right-click the domain node and go to properties we see that the domain and forest are running at the Windows Server 2012 R2 functional level with no trusts defined.  Examining the operating system tab of the two domain controllers in the Domain Controllers OU shows that both boxes run Windows Server 2012 R2.  Interesting that Microsoft chose not to use Windows Server 2016.
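
You don’t need ADUC to confirm the functional levels; they’re exposed on the rootDSE and readable with a few lines of ldap3.  The domain name below is hypothetical.

from ldap3 import ALL, Connection, Server

server = Server("geekintheweeds.com", get_info=ALL)  # hypothetical domain
conn = Connection(server, auto_bind=True)  # anonymous rootDSE read

# Functional levels come back as numeric strings:
# 6 = Windows Server 2012 R2, 7 = Windows Server 2016.
for attr in ("domainFunctionality", "forestFunctionality",
             "domainControllerFunctionality"):
    print(attr, server.info.other.get(attr))
conn.unbind()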

entry1pic4.png

Navigating to the Security tab and clicking the Advanced button shows that the AAD DC Administrators group has only been granted the Create Organizational Unit objects permission, while the AAD DC Service Accounts group has been granted Replicating Directory Changes.  As you can see from these permissions, the base of the directory tree is very locked down.

entry1pic5.png

Let me circle back to the OUs and containers I mentioned above.  The AADDC Computers and AADDC Users OUs are the default OUs Microsoft creates for you.  Newly joined machines are added to the AADDC Computers OU and users synchronized from the Azure AD tenant are placed in the AADDC Users OU.  As we saw from the permissions above, we could use an account in the AAD DC Administrators group to create additional OUs under the domain node to delegate control to another set of more restricted admins, to control GPOs if security filtering doesn’t meet our requirements, or to create additional service accounts or groups for the workloads we deploy in the environment.  The permissions within the default OUs are very limited.  In the AADDC Computers OU, GPOs can be applied and computer objects can be added and removed.  In the AADDC Users OU only GPOs can be applied, which makes sense considering the user and group objects stored there are sourced from your authoritative Azure AD tenant.

The AADDSDomainAdmin OU contains a single security group named AADDS Service Administrators Group (pre-Windows 2000 name of AADDSDomAdmGroup).  The group contains a single member named dcaasadmin, which is the renamed built-in Active Directory administrator account.  The group is nested into a number of highly privileged built-in Active Directory groups including Administrators, Domain Admins, Domain Users, Enterprise Admins, and Schema Admins.  I’m very uncomfortable with Microsoft’s choice to make a “god” group, and even a “god” user out of the built-in administrator.  This directly conflicts with security best practices for Active Directory, which would see no account being a permanent member of these highly privileged groups, or at the least divvying up the privileges among separate security principals.  I would have liked to see Microsoft leverage a Red Forest design here.  Hopefully we’ll see some improvements as the service matures.  I’m unsure as to the purpose of this OU and group at this time.

entry1pic6

The AADDSDomainConfig container contains a single container object named SchemaUpdate.  I reviewed the attributes of both containers hoping to glean some idea of their purpose, and the only thing of note was that the revision attribute was set to 2.  Maybe Microsoft is tracking the schema of their standard managed domain image via this attribute?  In a future post in this series I’ll do a comparison of this managed domain’s schema with a fresh Windows Server 2012 R2 schema.
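
If you want to run that comparison yourself, a paged LDAP search over the schema naming context produces a diff-able listing.  A rough sketch, again with hypothetical names; run it against both the managed domain and a stock 2012 R2 forest and diff the output files.

from ldap3 import NTLM, SUBTREE, Connection, Server

server = Server("geekintheweeds.com")  # hypothetical domain
conn = Connection(server, user="GIW\\ash.williams", password="<password>",
                  authentication=NTLM, auto_bind=True)

schema_nc = "CN=Schema,CN=Configuration,DC=geekintheweeds,DC=com"
entries = conn.extend.standard.paged_search(
    schema_nc, "(|(objectClass=classSchema)(objectClass=attributeSchema))",
    search_scope=SUBTREE, attributes=["lDAPDisplayName"], paged_size=500)

names = []
for entry in entries:
    if entry.get("type") != "searchResEntry":
        continue  # skip referrals
    value = entry["attributes"].get("lDAPDisplayName")
    names.append(value[0] if isinstance(value, list) else value)

with open("schema-managed-domain.txt", "w") as f:
    f.write("\n".join(sorted(names)))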

entry1pic7.png

Opening Active Directory Sites and Services shows that Microsoft has chosen to leave the domain with a single site.  This design choice makes sense given that a limitation of AAD DS is that it can only serve a single region.  If that limitation is ever lifted, Microsoft will need to revisit this choice and perhaps include a site for each region.   Expanding the Default-First-Site-Name site and the Servers node shows the two domain controllers Microsoft is using to provide the Windows Active Directory service to the VNet.

entry1pic8

So the layout is simple, but what about the group policy objects (GPOs)?  Opening the Group Policy Management Console displays five GPOs which are included in every managed domain.

entry1pic9.png

The AADDC Users GPO is empty of settings, while the AADDC Computers GPO has a single preference defined that adds the AAD DC Administrators group to the built-in Administrators group on any member servers added to the OU.  The Default Domain Controllers Policy (DDCP) GPO is your standard out-of-the-box DDCP with nothing special set.  The Default Domain Policy (DDP) GPO, on the other hand, has a number of settings applied.  The password policy is interesting.  I get that you have the option to source all the user accounts within your AAD DS domain from Azure AD, but Microsoft is still giving you the ability to create user accounts in the managed domain as I covered above, which makes me uncomfortable given the default password policy.  Microsoft hasn’t delegated the ability to create Fine-Grained Password Policies (FGPPs) either, which means you’re stuck with this very lax password policy.  Given the lack of technical enforcement, I’d recommend avoiding creating user accounts directly in the managed domain for any purpose until Microsoft delegates the ability to create FGPPs.  The remaining settings in the policy are standard out-of-the-box DDP.

entry1pic10.png

The GPO named Event Log GPO is linked to the Domain Controllers OU and executes a startup script named EventLogRetentionPolicy.PS1.  Being the nosy geek I am, I dug through SYSVOL to find the script.  The script is very simple: it sets each event log to overwrite events over 31 days old.  It then verifies the results and prints them to the console.  Event logs are an interesting beast in AAD DS.  An account in the AAD DC Administrators group doesn’t have the right to connect to the event logs on the DCs remotely, and I haven’t come across any option to view those logs.  I don’t see any mention of them in the Microsoft documentation, so my assumption is you don’t get access to them at this time.  I have to imagine this is a showstopper for some organizations considering the critical importance of domain controller logs.  If anyone knows how to access these logs, please let me know.  I’d like to see Microsoft incorporate an option to send the logs to a syslog agent via a configuration option in the Azure AD Domain Services blade in the Azure Portal.

I’m going to stop here today.  In my next post I’ll do some poking around by running a port scan against the managed domain controllers to see what network flows are open, enabling LDAPS to see what the SSL/TLS landscape looks like, and examining the authentication protocols and algorithms supported (NTLMv1, NTLMv2, Kerberos DES, etc.).  Thanks for reading!