DNS in Microsoft Azure – Part 1

DNS in Microsoft Azure – Part 1

Hi everyone,

In this series of posts I’m going to talk about a technology, that while old, still provides a critical foundational service.  Yes folks, we’re going to cover Domain Naming System (DNS).  Specifically, we’re going to look at the options for private DNS in Microsoft Azure and what the positives and negatives are of each pattern.  I’m going to go into this assuming you have a basic knowledge of DNS and understand the namespaces, various record types, forward and reverse lookup zones, recursive and iterative queries, DNS forwarding and conditional forwarding, and other core DNS concepts.  If any of those are unfamiliar to you, take some time to review the basics then come back to this post.

Before we jump into the DNS options in Azure, I first want to cover the 168.63.129.16 address.  If you’ve ever done anything even basic in Azure, you’ve probably run into this address or used it without knowing it.  This public IP address is owned by Microsoft and is presented as a virtual IP address serving as a communication channel to the host node  for a number of platform resources.  It provides functionality such as virtual machine (VM) agent communication of the VM’s ready state, health state, enables the VM to obtain an IP address via DHCP, and you guessed it, enables the VM to leverage Azure DNS services.  The address is static and is the same for any VNet you create in every Azure region.  Fun fact, some geolocation services will report this IP as being based out of Hong Kong and I’m sure you can imagine how that works when something like a WAF is in place with regional IP restrictions.  Fun times. 🙂

Traffic is routed to and from this virtual IP address through the subnet gateway.  If you run a route print on a Windows machine, you can see this route defined in the routing table of the VM.

route

Output of route print on Azure VM

The IP address is also defined in the VirtualNetwork service tag meaning the default rules within a network security group (NSG) allow this traffic to and from the VM.  Given the criticality of the functions the IP plays, Microsoft recommends you allow inbound and outbound communication with it (it’s a requirement for using any of the Azure DNS services we’ll discuss in these posts).

Now that you understand what the 168.63.129.16 virtual IP address is, let’s first cover the very basics of DNS in Azure. You can configure Azure’s DHCP service to push a custom set of DNS servers to Azure VMs or optionally leave the default which is for VMs to use Azure’s DNS services (through the 168.63.129.16 virtual IP address).  This can be configured at the VNet level and then inherited by all virtual network interfaces (VNIs) associated with the VNet, or optionally configured directly on the VNI associated with the VM.

Configure DNS on VNet

Configure DNS on VNet

This brings us to the first option for DNS resolution in Azure, Azure-provided name resolution.  Each time you spin up a virtual network Azure assigns it a unique private DNS namespace using the format <randomly generated>.internal.cloudapp.net.  This namespace is pushed to the machine via DHCP Option 15 thus each VM has an fully qualified domain name of <vm_host_name>.<randomly generated>.internal.cloudapp.net and each VM in the VNet can resolve IP addresses of one another.

Let’s look at an example with a single VNet.  I’ve created a single VNet named vnet1.  I’ve assigned the CIDR block of 10.101.0.0/16 and created a single subnet assigned the 10.101.0.0/24 block.  Two Windows Server 2016 VMs have been created named azuredns and azuredns1 with the IP addresses 10.101.0.4 and 10.101.0.5.  Azure has assigned the a namespace of r0b5mqxog0hu5nbrf150v3iuuh.bx.internal.cloudapp.net to the VNet.  Notes the DHCP Server and DNS Server settings in the ipconfig output of the azuredns vm shown below.

ipconfig

IPConfig output of Azure VM

If we ping azuredns1 from azuredns we can see the in below Wireshark capture that prior to executing the ping, azuredns performs a DNS query to the 168.63.129.16 VIP and gets back a query response with the IP address of azuredns1.

wireshark

Wireshark packet capture of DNS query

The resolution process is very simple as seen in the diagram below.

simple_reso

DNS Resolution within single VNet

Well that’s all well and good for very basic DNS resolution, but who the heck has a single VNet in anything but a test environment?  So can we expand Azure-provided DNS to multiple VNets?  The answer is yes, but it’s ugly.  Recall that each VNet has its own private DNS namespace.  The only way to resolve names contained within that namespace is for a VM in that VNet to send the query to the 168.63.129.16 address.  Yes folks, this means you would need to drop a DNS server in each VNet in order to resolve the Azure-provided DNS host names assigned to VMs within that VNet by another VMs in another VNet as illustrated in the diagram below.

multi_vnet_reso

Multiple VNet resolution

You can see as the number of VNets increases the scalability of this solution quickly breaks down.  Take note that if you wanted to resolve these host names from on-premises you could use a similar conditional forwarder pattern.

Let’s sum up the positives and negatives of Azure-provided DNS.

  • Positives
    • No need to provision your own DNS servers and worry about high availability or scalability
    • DNS service provided by Azure automatically scales
    • VMs within a VNet can resolve each other’s IP addresses out of the box
  • Negatives
    • Solution doesn’t scale with multiple VNets
    • You’re stuck with the namespace assigned to the VNet
    • WINS and NetBIOS are not supported
    • Only A records that are automatically registered by the service are supported (no manual registration of records)
    • No reverse DNS support
    • No query logging

As you can see from the above the negatives far outweigh the positives.  Personally, I see Azure-provided DNS only being useful for bare bones test environments with a single VNet.  If anyone has any other scenarios where it comes in handy, I’d love to hear them.

In my next post I’ll cover Azure’s new offering in the DNS space, Azure Private DNS Zones.  I’ll walk through how it works and how we can combine it with BYO DNS to create some pretty neat patterns.

See you then!

Debugging Azure SDK for Python Using Fiddler

Debugging Azure SDK for Python Using Fiddler

Hi there folks.  Recently I was experimenting with the Azure Python SDK when I was writing a solution to pull information about Azure resources within a subscription.  A function within the solution was used to pull a list of virtual machines in a given Azure subscription.  While writing the function, I recalled that I hadn’t yet had experience handling paged results the Azure REST API which is the underlining API being used by the SDK.

I hopped over to the public documentation to see how the API handles paging.  Come to find out the Azure REST API handles paging in a similar way as the Microsoft Graph API by returning a nextLink property which contains a reference used to retrieve the next page of results.  The Azure REST API will typically return paged results for operations such as list when the items being returned exceed 1,000 items (note this can vary depending on the method called).

So great, I knew how paging was used.  The next question was how the SDK would handle paged results.  Would it be my responsibility or would it by handled by the SDK itself?

If you have experience with AWS’s Boto3 SDK for Python (absolutely stellar SDK by the way) and you’ve worked in large environments, you are probably familiar with the paginator subclass.  Paginators exist for most of the AWS service classes such as IAM and S3.  Here is an example of a code snipped from a solution I wrote to report on aws access keys.

def query_iam_users():

todaydate = (datetime.now()).strftime("%Y-%m-%d")
users = []
client = boto3.client(
'iam'
)

paginator = client.get_paginator('list_users')
response_iterator = paginator.paginate()
for page in response_iterator:
for user in page['Users']:
user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
users.append(user_rec)
return users

Paginators make handling paged results a breeze and allow for extensive flexibility in controlling how paging is handled by the underlining AWS API.

Circling back to the Azure SDK for Python, my next step was to hop over to the SDK public documentation.  Navigating the documentation for the Azure SDK (at least for the Python SDK, I can’ t speak for the other languages) is a bit challenging.  There are a ton of excellent code samples, but if you want to get down and dirty and create something new you’re going to have dig around a bit to find what you need.  To pull a listing of virtual machines, I would be using the list_all method in VirtualMachinesOperations class.  Unfortunately I couldn’t find any reference in the documentation to how paging is handled with the method or class.

So where to now?  Well next step was the public Github repo for the SDK.  After poking around the repo I located the documentation on the VirtualMachineOperations class.  Searching the class definition, I was able to locate the code for the list_all() method.  Right at the top of the definition was this comment:

Use the nextLink property in the response to get the next page of virtual
machines.

Sounds like handling paging is on you right?  Not so fast.  Digging further into the method I came across the function below.  It looks like the method is handling paging itself releasing the consumer of the SDK of the overhead of writing additional code.

        def internal_paging(next_link=None):
            request = prepare_request(next_link)

            response = self._client.send(request, stream=False, **operation_config)

            if response.status_code not in [200]:
                exp = CloudError(response)
                exp.request_id = response.headers.get('x-ms-request-id')
                raise exp

            return response

I wanted to validate the behavior but unfortunately I couldn’t find any documentation on how to control the page size within the Azure REST API.  I wasn’t about to create 1,001 virtual machines so instead I decided to use another class and method in the SDK.  So what type of service would be a service that would return a hell of a lot of items?  Logging of course!  This meant using the list method of the ActivityLogsOperations class which is a subclass of the module for Azure Monitor and is used to pull log entries from the Azure Activity Log.  Before I experimented with the class, I hopped back over to Github and pulled up the source code for the class.  Low and behold we an internal_paging function within the list method that looks very similar to the one for the list_all vms.

        def internal_paging(next_link=None):
            request = prepare_request(next_link)

            response = self._client.send(request, stream=False, **operation_config)

            if response.status_code not in [200]:
                raise models.ErrorResponseException(self._deserialize, response)

            return response

Awesome, so I have a method that will likely create paged results, but how do I validate it is creating paged results and the SDK is handling them?  For that I broke out one of my favorite tools Telerik’s Fiddler.

There are plenty of guides on Fiddler out there so I’m going to skip the basics of how to install it and get it running.  Since the calls from the SDK are over HTTPS I needed to configure Fiddler to intercept secure web traffic.  Once Fiddler was up and running I popped open Visual Studio Code, setup a new workspace, configured a Python virtual environment, and threw together the lines of code below to get the Activity Logs.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.monitor import MonitorManagementClient

TENANT_ID = 'mytenant.com'
CLIENT = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
KEY = 'XXXXXX'
SUBSCRIPTION = 'XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID
)
client = MonitorManagementClient(
    credentials = credentials,
    subscription_id = SUBSCRIPTION
)

log = client.activity_logs.list(
    filter="eventTimestamp ge '2019-08-01T00:00:00.0000000Z' and eventTimestamp le '2019-08-24T00:00:00.0000000Z'"
)

for entry in log:
    print(entry)

Let me walk through the code quickly.  To make the call I used an Azure AD Service Principal I had setup that was granted Reader permissions over the Azure subscription I was querying.  After obtaining an access token for the service principal, I setup a MonitorManagementClient that was associated with the Azure subscription and dumped the contents of the Activity Log for the past 20ish days.  Finally I incremented through the results to print out each log entry.

When I ran the code in Visual Studio Code an exception was thrown stating there was an certificate verification error.

requests.exceptions.SSLError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /mytenant.com/oauth2/token (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)')))

The exception is being thrown by the Python requests module which is being used underneath the covers by the SDK.  The module performs certificate validation by default.  The reason certificate verification is failing is Fiddler uses a self-signed certificate when configured to intercept secure traffic when its being used as a proxy.  This allows it to decrypt secure web traffic sent by the client.

Python doesn’t use the Computer or User Windows certificate store so even after you trust the self-signed certificate created by Fiddler, certificate validation still fails.  Like most cross platform solutions it uses its own certificate store which has to be managed separately as described in this Stack Overflow article.  You should use the method described in the article for any production level code where you may be running into this error, such as when going through a corporate web proxy.

For the purposes of testing you can also pass the parameter verify with the value of False as seen below.  I can’t stress this enough, be smart and do not bypass certificate validation outside of a lab environment scenario.

requests.get('https://somewebsite.org', verify=False)

So this is all well and good when you’re using the requests module directly, but what if you’re using the Azure SDK?  To do it within the SDK we have to pass extra parameters called kwargs which the SDK refers to as an Operation config.  The additional parameters passed will be passed downstream to the methods such as the methods used by the requests module.

Here I modified the earlier code to tell the requests methods to ignore certificate validation for the calls to obtain the access token and call the list method.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.monitor import MonitorManagementClient

TENANT_ID = 'mytenant.com'
CLIENT = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
KEY = 'XXXXXX'
SUBSCRIPTION = 'XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY,
    tenant = TENANT_ID,
    verify = False
)
client = MonitorManagementClient(
    credentials = credentials,
    subscription_id = SUBSCRIPTION,
    verify = False
)

log = client.activity_logs.list(
    filter="eventTimestamp ge '2019-08-01T00:00:00.0000000Z' and eventTimestamp le '2019-08-24T00:00:00.0000000Z'",
    verify = False
)

for entry in log:
    print(entry)

After the modifications the code ran successfully and I was able to verify that the SDK was handling paging for me.

fiddler.png

Let’s sum up what we learned:

  • When using an Azure SDK leverage the Azure REST API reference to better understand the calls the SDK is making
  • Use Fiddler to analyze and debug issues with the Azure SDK
  • Never turn off certificate verification in a production environment and instead validate the certificate verification error is legitimate and if so add the certificate to the trusted store
  • In lab environments, certificate verification can be disabled by passing an additional parameter of verify=False with the SDK method

Hope that helps folks.  See you next time!

Deep Dive into Azure Managed Identities – Part 1

“I love the overhead of password management” said no one ever.

Password management is hard.  It’s even harder when you’re managing the credentials for non-humans, such as those used by an application.  Back in the olden days when the developer needed a way to access an enterprise database or file share, they’d put in a request with help desk or information security to have an account (often referred to as a service account) provisioned in Windows Active Directory, an LDAP, or a SQL database.  The request would go through a business approval and some support person would created the account, set the password, and email the information to the developer.  This process came with a number of risks:

  • Risk of compromise of the account
  • Risk of abuse of the account
  • Risk of a significant outage

These risks arise due to the following gaps in the process:

  • Multiple parties knowing the password (the party who provisions the account and the developer)
  • The password for the account being communicated to the developer unencrypted such as plain text in an email
  • The password not being changed after it is initially set due to the inability or difficult to change the password
  • The password not being regularly rotated due to concerns over application outages
  • The password being shared with other developers and the account then being used across multiple applications without the dependency being documented

Organizations tried to mitigate the risk of compromise by performing such actions as requiring a long and complex password, delivering the password in an encrypted format such as an encrypted Microsoft Office document, instituting policy requiring the password to be changed (exceptions with this one are frequent due to outage concerns), implementing password vaulting and management such as CyberArk Enterprise Password Vault or Hashicorp Vault, and instituting behavioral monitoring solutions to check for abuse.  Password rotation and monitoring are some of the more effective mitigations but can also be extremely challenging and costly to institute at a scale even with a vaulting and management solution.  Even then, there are always the exceptions to the systems with legacy applications which are not compatible (sadly these are often some of the more critical systems).

When the public cloud came around the credential management challenge for application accounts exploded due to the most favored traits of a public cloud which include on-demand self-service and rapid elasticity and scalability.  The challenge that was a few hundred application identities has grown quickly into thousands of applications and especially containers and serverless functions such as AWS Lambda and Azure Functions.  Beyond the volume of applications, the public cloud also changes the traditional security boundary due to its broad network access trait.  Instead of the cozy feeling multiple firewalls gave you, you now have developers using cloud services such as storage or databases which are directly administered via the cloud management plane which is exposed directly to the Internet.  It doesn’t stop here folks, you also have developers heavily using SaaS-based version control solutions to store the code which may have credentials hardcoded into it potentially publicly exposing those credentials.

Thankfully the public cloud providers have heard the cries of us security folk and have been working hard to help address the problem.  One method in use is the creation of security principals which are designed around the use of temporary credentials.  This way there are no long standing credentials to share, compromise, or abuse.  Amazon has robust use of this concept in AWS using IAM Roles.  Instead of hardcoding a set of IAM User credentials in a Lambda or an application running on an EC2 instance, a role can be created with the necessary permissions required for the application and be assumed by either the Lambda service or EC2 instance.

For this series of posts I’m going to be focusing in one of Microsoft Azure’s solutions to this problem which are called Managed Identities.  For you folk that are more familiar with AWS, Managed Identities conceptually work the same was as IAM Roles.  A security principal is created, permissions are granted, and the identity is assumed by a resource such as an Azure Web App or an Azure VM.  There are some features that differ from IAM Roles that add to the appeal of Managed Identities such as associating the identity lifecycle of the Managed Identity to the resource such that when the resource is created, the managed identity is created, and when the resource is destroyed, the identity is destroy.

In this series of posts I’ll be demonstrating how Managed Identities are created, how they are used, and how they differ (sometimes for the better and sometimes not) from AWS IAM Roles.  Hope you enjoy the series and except the next entry in the series early next week.

See you soon fellow geek!

Visualizing AWS Logging Data in Azure Monitor – Part 2

Visualizing AWS Logging Data in Azure Monitor – Part 2

Welcome back folks!

In this post I’ll be continuing my series on how Azure Monitor can be used to visualize log data generated by other cloud services.  In my last post I covered the challenges that multicloud brings and what Azure can do to help with it.  I also gave an overview of Azure Monitor and covered the design of the demo I put together and will be walking through in this post.  Please take a read through that post if you haven’t already.  If you want to follow along, I’ve put the solution up on Github.

Let’s quickly review the design of the solution.

Capture

This solution uses some simple Python code to pull information about the usage of AWS IAM User access id and secret keys from an AWS account.  The code runs via a Lambda and stores the Azure Log Analytics Workspace id and key in environment variables of the Lambda that are encrypted with an AWS KMS key.  The data is pulled from the AWS API using the Boto3 SDK and is transformed to JSON format.  It’s then delivered to the HTTP Data Collector API which places it into the Log Analytics Workspace.  From there, it becomes available to Azure Monitor to query and visualize.

Setting up an Azure environment for this integration is very simple.  You’ll need an active Azure subscription.  If you don’t have one, you can setup a free Azure account to play around.  Once you’re set with the Azure subscription, you’ll need to create an Azure Log Analytics Workspace.  Instructions for that can be found in this Microsoft article.  After the workspace has been setup, you’ll need to get the workspace id and key as referenced in the Obtain workspace ID and key section of this Microsoft article.  You’ll use this workspace ID and key to authenticate to the HTTP Data Collector API.

If you have a sandbox AWS account and would like to follow along, I’ve included a CloudFormation template that will setup the AWS environment.  You’ll need to have an AWS account with sufficient permissions to run the template and provision the resources.  Prior to running the template, you will need to zip up the lambda_function.py and put it on an AWS S3 bucket you have permissions on.  When you run the template you’ll be prompted to provide the S3 bucket name, the name of the ZIP file, the Log Analytics Workspace ID and key, and the name you want the API to assign to the log in the workspace.

The Python code backing the solution is pretty simple.  It uses all standard Python modules except for the boto3 module used to interact with AWS.

import json
import logging
import re
import csv
import boto3
import os
import hmac
import base64
import hashlib
import datetime

from io import StringIO
from datetime import datetime
from botocore.vendored import requests

The first function in the code parses the ARN (Amazon Resource Name) to extract the AWS account number.  This information is later included in the log data written to Azure.

# Parse the IAM User ARN to extract the AWS account number
def parse_arn(arn_string):
    acct_num = re.findall(r'(?<=:)[0-9]{12}',arn_string)
    return acct_num[0]

The second function uses the strftime method to transform the timestamp returned from the AWS API to a format that the Azure Monitor API will detect as a timestamp and make that particular field for each record in the Log Analytics Workspace a datetime type.

# Convert timestamp to one more compatible with Azure Monitor
def transform_datetime(awsdatetime):
transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
return transf_time

The next function queries the AWS API for a listing of AWS IAM Users setup in the account and creates dictionary object representing data about that user. That object is added to a list which holds each object representing each user.

# Query for a list of AWS IAM Users
def query_iam_users():
    
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users

The query_access_keys function queries the AWS API for a listing of the access keys that have been provisioned the AWS IAM User as well as the status of those keys and some metrics around the usage.  The resulting data is then added to a dictionary object and the object added to a list.  Each item in the list represents a record for an AWS access id.

# Query for a list of access keys and information on access keys for an AWS IAM User
def query_access_keys(user):
    keys = []
    client = boto3.client(
        'iam'
    )
    paginator = client.get_paginator('list_access_keys')
    response_iterator = paginator.paginate(
        UserName = user['username']
    )

    # Get information on access key usage
    for page in response_iterator:
        for key in page['AccessKeyMetadata']:
            response = client.get_access_key_last_used(
                AccessKeyId = key['AccessKeyId']
            )
            # Santize key before sending it along for export

            sanitizedacctkey = key['AccessKeyId'][:4] + '...' + key['AccessKeyId'][-4:]
            # Create new dictonionary object with access key information
            if 'LastUsedDate' in response.get('AccessKeyLastUsed'):

                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),
                'LastUsedDate':(transform_datetime(response['AccessKeyLastUsed']['LastUsedDate'])),
                'Region':response['AccessKeyLastUsed']['Region'],'Status':key['Status'],
                'ServiceName':response['AccessKeyLastUsed']['ServiceName']}
                keys.append(key_rec)
            else:
                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),'Status':key['Status']}
                keys.append(key_rec)
    return keys

The next two functions contain the code that creates and submits the request to the Azure Monitor API.  The product team was awesome enough to provide some sample code in the in the public documentation for this part.  The code is intended for Python 2 but only required a few small changes to make it compatible with Python 3.

Let’s first talk about the build_signature function.  At this time the API uses HTTP request signing using the Log Analytics Workspace id and key to authenticate to the API.  In short this means you’ll have two sets of shared keys per workspace, so consider the workspace your authorization boundary and prioritize proper key management (aka use a different workspace for each workload, track key usage, and rotate keys as your internal policies require).

Breaking down the code below, we the string that will act as the header includes the HTTP method, length of request content, a custom header of x-ms-date, and the REST resource endpoint.  The string is then converted to a bytes object, and an HMAC is created using SHA256 which is then base-64 encoded.  The result is the authorization header which is returned by the function.

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")  
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    authorization = "SharedKey {}:{}".format(customer_id,encoded_hash)
    return authorization

Not much needs to be said about the post_data function beyond that it uses the Python requests module to post the log content to the API.  Take note of the limits around the data that can be included in the body of the request.  Key takeaways here is if you plan pushing a lot of data to the API you’ll need to chunk your data to fit within the limits.

def post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri,data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print("Accepted")
    else:
        print("Response code: {}".format(response.status_code))

Last but not least we have the lambda_handler function which brings everything together. It first gets a listing of users, loops through each user to information about the access id and secret keys usage, creates a log record containing information about each key, converts the data from a dict to a JSON string, and writes it to the API. If the content is successfully delivered, the log for the Lambda will note that it was accepted.

def lambda_handler(event, context):

    # Enable logging to console
    logging.basicConfig(level=logging.INFO,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    try:

        # Initialize empty records array
        #
        key_records = []
        
        # Retrieve list of IAM Users
        logging.info("Retrieving a list of IAM Users...")
        users = query_iam_users()

        # Retrieve list of access keys for each IAM User and add to record
        logging.info("Retrieving a listing of access keys for each IAM User...")
        for user in users:
            key_records.extend(query_access_keys(user))
        # Prepare data for sending to Azure Monitor HTTP Data Collector API
        body = json.dumps(key_records)
        post_data(os.environ['WorkspaceId'], os.environ['WorkspaceKey'], body, os.environ['LogName'])

    except Exception as e:
        logging.error("Execution error",exc_info=True)

Once the data is delivered, it will take a few minutes for it to be processed and appear in the Log Analytics Workspace. In my tests it only took around 2-5 minutes, but I wasn’t writing much data to the API.  After the data processes you’ll see a new entry under the listing of Custom Logs in the Log Analytics Workspace.  The entry will be the log name you picked and with a _CL at the end.  Expanding the entry will display the columns that were created based upon the log entry.  Note that the columns consumed from the data you passed will end with an underscore and a character denoting the data type.

mylog

Now that the data is in the workspace, I can start querying it and creating some visualizations.  Azure Monitor uses the Kusto Query Language (KQL).  If you’ve ever created queries in Splunk, the language will feel familiar.

The log I created in AWS and pushed to the API has the following schema.  Note the addition of the underscore followed by a character denoting the column data type.

  • logged_Date (string) – The date the Lambda ran
  • user_s (string) – The AWS IAM User the key belongs to
  • account_number_s (string) – The AWS Account number the IAM Users belong to
  • AccessKeyId (string) – The id of the access key associated with the user which has been sanitized to show just the first 4 and last 4 characters
  • CreateDate_t (timestamp) – The date and time when the access key was created
  • LastUsedDate_t (timestamp) – The date and time the key was last used
  • Region_s (string) – The region where the access key was last used
  • Status_s (string) – Whether the key is enabled or disabled
  • ServiceName_s (string) – The AWS service where the access key was last used

In addition to what I’ve pushed, Azure Monitor adds a TimeGenerated field to each record which is the time the log entry was sent to Azure Monitor.  You can override this behavior and provide a field for Azure Monitor to use for this if you like (see here).  There are some other miscellaneous fields are inherited from whatever schema the API is drawing from.  These are fields such as TenantId and SourceSystem, which in this case is populated with RestAPI.

Since my personal AWS environment is quite small and the AWS IAM Users usage are very limited, my data sets aren’t huge.  To address this I created a number of IAM Users with access keys for the purpose blog.  I’m getting that out of the way so my AWS friends don’t hate on me. 🙂

One of core best practices in key management with shared keys is to ensure you rotate them.  The first data point I wanted to extract was which keys that existed in my AWS account were over 90 days old.  To do that I put together the following query:

AWS_Access_Key_Report_CL
| extend key_age = datetime_diff('day',now(),CreateDate_t)
| project Age=key_age,AccessKey=AccessKeyId_s, User=user_s
| where Age > 90
| sort by Age

Let’s walk through the query.  The first line tells the query engine to run this query against the AWS_Access_Key_Report_CL.  The next line creates a new field that contains the age of the key by determining the amount of time that has passed between the creation date of the key and today’s date.  The line after that instructs the engine to pull back only the key_age field I just created and the AccessKeyId_s, user_s , and status_s fields.  The results are then further culled down to pull only records where the key age is greater than 90 days and finally the results are sorted by the age of the key.

query1

Looks like it’s time to rotate that access key in use by Azure AD. 🙂

I can then pin this query to a new shared dashboard for other users to consume.  Cool and easy right?  How about we create something visual?

Looking at the trends in access key creation can provide some valuable insights into what is the norm and what is not.  Let’s take a look a the metrics for key creation (of the keys still exist in an enabled/disabled state).  For that I’m going to use the following query:

AWS_Access_Key_Report_CL
| make-series AccessKeys=count() default=0 on CreateDate_t from datetime(2019-01-01) to datetime(2020-01-01) step 1d

In this query I’m using the make-series operator to count the number of access keys created each day and assigning a default value of 0 if there are no keys created on that date.  The result of the query isn’t very useful when looking at it in tabular form.

query2.PNG

By selecting the Line drop down box, I can transform the date into a line grab which shows me spikes of creation in log creation.  If this was real data, investigation into the spike of key creations on 6/30 may be warranted.

quer2_2.PNG

I put together a few other visuals and tables and created a custom dashboard like the below.  Creating the dashboard took about an hour so, with much of the time invested in figuring out the query language.

dashboard

What you’ve seen here is a demonstration of the power and simplicity of Azure Monitor.  By adding a simple to use API, Microsoft has exponentially increased the agility of the tool by allowing it to become a single pane of glass for monitoring across clouds.   It’s also worth noting that Microsoft’s BI (business intelligence) tool Power BI has direct integration with Azure Log Analytics.  This allows you to pull that log data into PowerBI and perform more in-depth analysis and to create even richer visualizations.

Well folks, I hope you’ve found this series of value.  I really enjoyed creating it and already have a few additional use cases in mind.  Make sure to follow me on Github as I’ll be posting all of the code and solutions I put together there for your general consumption.

Have a great day!

 

 

A Comparison – AWS Managed Microsoft AD and Azure Active Directory Domain Services

A Comparison – AWS Managed Microsoft AD and Azure Active Directory Domain Services

Over the past year I’ve done deep dives into both Amazon’s AWS Managed Microsoft Active Directory and Microsoft’s Azure Active Directory Domain Services.  These services represent each vendor’s offering of a managed Windows Active Directory (AD) service.  I extensively covered the benefits of a service over the course of the posts, so today I’m going to cover the key features of each service.  I’m also going to include two tables.  One table will outline the differences in general features while the other outlines the differences in security-related features.

Let’s hit on the key points first.

  • Amazon provides a legacy (Windows AD is legacy folks) managed service while Microsoft provides a modernized service (Azure AD) which has been been integrated with a legacy service.
  • Microsoft synchronizes users, passwords hashes, and groups from the Azure AD to a managed instance of Windows Active Directory.  The reliance on this synchronization means the customer has to be comfortable synchronizing both directory data and password hashes to Azure AD.  Amazon does not require any data be synchronized.
  • Amazon provides the capability to leverage the identities in the managed instance of Windows AD or in a forest that has a trust with the managed instance to be leveraged in managing AWS resources.  In this instance Amazon is taking a legacy service and enabling it for management of the modern cloud management plane.
  • The pricing model for the services differs where Amazon bills on a per domain controller basis while Microsoft bills on the number of objects in the directory.
  • Amazon’s service is eligible to be used in solutions that require PCI DSS Level 1 or HIPAA.
  • Both services use a delegated model where the customer has full control over an OU rather the directory itself.  Highly privileged roles such as Schema Admin, Enterprise Admins, and Domain Admins are maintained by the cloud provider.
  • Both services provide LDAP for legacy applications customers may be trying to lift and shift.  Microsoft limits LDAP to read operations while Amazon supports both read and write operations.
  • Both services support LDAPS.  At this time Amazon requires an instance of Active Directory Certificate Services be deployed to act as a Certificate Authority and provide certificates to the managed domain controllers.
  • Both services do not allow the customer to modify the Default Domain Policy or Default Domain Controller Policies.  This means the customer cannot modify the password or lockout policy applied to the domain.  Amazon provides a method of enforcing custom password and lockout policies through Fine Grained Password Policies.  Additionally, the customer does not have the ability to harden the OS of the domain controllers for either service so it is important to review the vendor documentation.
  • Amazon’s service supports Active Directory forest trusts and external trusts.  Microsoft’s service doesn’t support trusts at this time.

Here is a table showing the comparison of general features:

Features AWS Managed Microsoft AD Azure Active Directory Domain Services
Cost Basis Number of Domain Controllers Number of Directory Objects
Schema Extensions Yes, with limitations No
Trusts Yes, with limitations No
Domain Controller Log Access Security and DNS Server Event Logs No
DNS Management Yes Yes
Snapshots Yes No
Limit of Managed Forests 10 per account 1 per Azure AD tenant
Supports being used on-premises Yes with Direct Connect or VPN No, within VNet only
Scaled By Customer Yes No
Max number of Domain Controller 20 per directory Unknown how service is scaled

Here is a table of security capabilities:

Features AWS Managed Microsoft AD Azure Active Directory Domain Services
Requires Directory Synchronization No Yes, including password
Fine-Grained Password Policies Yes, limited to seven No
Smart Card Authentication Not native, requires RADIUS No
LDAPS Yes, with special requirements Yes, but LDAP operations are limited to read
LDAPS Protocols SSLv3, TLS 1.0, TLS 1.2 TLS 1.0, TLS 1.2
LDAPS Cipher Suites RC4, 3DES, AES128, AES256 RC4, 3DES, AES128, AES256
Kerberos Delegation Account-Based and Resource-Based Resource-Based
Kerberos Encryption RC4, AES128, AES256 RC4, AES128, AES256
NTLM Support NTLMv1, NTLMv2 NTLMv1, NTLMv2

Well folks that sums it up.  As you can see from both of the series as well as this summary post both vendors have taken very different approaches in providing the service.  It will be interesting to see how these offerings evolve over the next few years.  As much as we’d love to see Windows Active Directory go away, it will still be here for years to come.

Until next time my fellow geeks!

AWS Managed Microsoft AD Deep Dive Part 7 – Trusts and Domain Controller Event Logs

AWS Managed Microsoft AD Deep Dive  Part 7 – Trusts and Domain Controller Event Logs

Welcome back fellow geek.  Today I’m continuing my deep dive series into AWS Managed Microsoft AD.  This will represent the seventh post in the series and I’ve covered some great content over the series including:

  1. An overview of the service
  2. How to setup the service
  3. The directory structure, pre-configured security principals, group policies and the delegated security model
  4. How to configure LDAPS and the requirements that pop up due to Amazon’s delegation model
  5. Security of the service including supported secure transport protocols, ciphers, and authentication protocols
  6. How do schema extensions work and what are the limitations

Today I’m going cover three additional capabilities of AWS Managed Microsoft AD which includes the creation of trusts, access to the Domain Controller event logs, and scalability.

I’ll first cover the capabilities around Active Directory trusts.  Providing this capability opens up the possibility a number of scenarios that aren’t possible in managed Windows Active Directory (Windows AD) services that don’t support trusts such as Microsoft’s Azure Active Directory Domain Services.  Some of the scenarios that pop up in my head are resource forest, trusts with trusted partners to maintain collaboration for legacy applications (applications dependent on legacy protocols such as Kerberos/NTLM/LDAP), trusts between development, QA, and production forests, and the usage of features features such as selective authentication to mitigate the risk to on-premises infrastructure.

For many organizations, modernization of an entire application catalog isn’t feasible but those organizations still want to take advantage of the cost and security benefits of cloud services.  This is where AWS Managed Microsoft AD can really shine.  It’s capability to support Active Directory forests trusts opens up the opportunity for those organizations to extend their identity boundary to the cloud while supporting legacy infrastructure.  Existing on-premises core infrastructure services such as PKI and SIEM can continue to be used and even extended to monitor the infrastructure using the managed Windows AD.

As you can see this is an extremely powerful capability and makes the service a good for almost every Windows AD scenario.  So that’s all well and good, but if you wanted marketing material you’d be reading the official documentation right?  You came here for the deep dive, so let’s get into it.

The first thing that popped into my mind was the question as to how Amazon would be providing this capability in a managed service model.  Creating a forest trust typically requires membership in privileged groups such as Enterprise Admins and Domain Admins, which obviously isn’t possible in a manged service.  I’m sure it’s possible to delegate the creation of Active Directory trusts and DNS conditional forwarders with modifications of directory permissions and possibly user rights, but there’s a better way.  What is this better way you may be asking yourself?  Perhaps serving it up via the Directory Services console in the same way schema modifications are served up?

Let’s walk through the process of setting up an Active Directory forest trust with a customer-managed traditional implementation of Windows Active Directory and an instance of AWS Managed Microsoft AD.  For this I’ll be leveraging my home Hyper-V lab.  I’m actually in the process of rebuilding it so there isn’t much there right now.  The home lab consists of two virtual machines, one named JOG-DC running Windows Server 2016 and functions as a domain controller (AD DS) and certificate authority (AD CS) for the journeyofthegeek.com Active Directory forest.  The other virtual machine is named named JOG-CLIENT, runs Windows 10, and is joined to the journeyofthegeek.com domain.  I’ve connected my VPC with my home lab using AWS’s Managed VPN to setup a site-to-site IPSec VPN connection with my local pfSense box.

7awsadds1.png

Prior to setting up the trusts there are a few preparatory steps that need to be completed.  The steps will be familiar to those of you who have established forests trusts across firewalled network segments.  At a high level, you’ll want to perform the following tasks:

  1. Ensure the appropriate ports are opened between the two forests.
  2. Ensure DNS resolution between the two forests is established

For the first step I played it lazy since this is is a temporary configuration (please don’t do this in production).   I allowed all traffic from the VPC address range to my lab environment by modifying the firewall rules on my pfSense box.  On the AWS side I needed to adjust the traffic rules for the security group SERVER01 is in as well as the security group for the managed domain controllers.

7awsadds2.png

To establish DNS resolution between the two forests I’ll be using conditional forwarders setup within each forest.  Setting the conditional forwarders up in the journeyofthegeek.com forest means I have to locate the IP addresses of the managed domain controllers in AWS.  There are a few ways you could do it, but I went to the AWS Directory Services Console and selected the geekintheweeds.com directory.

7awsadds3

On the Directory details section of the console the DNS addresses list the IP addresses the domain controllers are using.

7awsadds4.png

After creating the conditional forwarder in the DNS Management MMC in the journeyofthegeek.com forest, DNS resolution of a domain controller from geekintheweeds.com was successful.

7awsadds5.png

I next created the trust in the journeyofthegeek.com domain ensuring to select the option to create the trust in this domain only and recording the trust password using the Active Directory Domains and Trusts.  We can’t create the trusts in both domains since we don’t have an account with the appropriate privileges in the AWS managed domain.

Next up I bounced back over to the Directory Services console and selected the geekintheweeds.com directory.  From there I selected the Network & security tab to open the menu needed to create the trust.

7awsadds6.png

From here I clicked the Add trust relationship button which brings up the Add a trust relationship menu.  Here I filled in the name of the domain I want to establish the trust with, the trust password I setup in the journeyofthegeek.com domain, select a two-way trust, and add an IP that will be used within configuration of the conditional forwarder setup by the managed service.

7awsadds7.png

After clicking the Add button the status of the trust is updated to Creating.

7awsadds8.png

The process takes a few minutes after which the status reports as verified.

7awsadds9.png

Opening up the Active Directory Users and Computers (ADUC) MMC in the journeyofthegeek.com domain and selecting the geekintheweeds.com domain successfully displays the directory structure.  Trying the opposite in the geekintheweeds.com domain works correctly as well.  So our two-way trust has been created successfully.  We would now have the ability to setup any of the scenarios I talked about earlier in the post including a resource forest or leveraging the managed domain as a primary Windows AD service for on-premises infrastructure.

The second capability I want to briefly touch on is the ability to view the Security Event Log and DNS Server logs on the managed domain controllers.  Unlike Microsoft’s managed Windows AD service, Amazon provides ongoing access to the Security Event Log and DNS Server Log.  The logs can be viewed using the Event Log MMC from a domain-joined machine or programmatically with PowerShell.  The group policy assigned to the Domain Controllers OU enforces a maximum event log size of 256MB but Amazon also archives a year’s worth of logs which can be requested in the event of an incident.  The lack of this capability was a big sore spot for me when I looked at Azure Active Directory Domain Services.  It’s great to see Amazon has identified this critical use case.

Last but definitely not least, let’s quickly cover the scalability of the service.  Follow Microsoft best practices and you can take full advantage of scaling horizontally with the click of a single button.  Be aware that the service only scales horizontally and not vertically.  If you have applications that don’t follow best practices and point to specific domain controllers or perform extremely inefficient LDAP queries (yes I’m talking to you developers who perform searches using front and rear-facing wildcards and use LDAP_MATCHING_RULE_IN_CHAIN filters) horizontal scaling isn’t going to help you.

Well folks that rounds out this entry into the series.  As we saw in the post Amazon has added key capabilities that Microsoft’s managed service is missing right now.  This makes AWS Managed Microsoft AD the more versatile of the two services and more than likely a better fit in almost any scenario where there is a reliance on Windows AD.

In my final posts of the series I’ll provide a comparison chart showing the differing capabilities of both AWS and Microsoft’s services.

See you next post!

 

 

 

AWS Managed Microsoft AD Deep Dive Part 6 – Schema Modifications

AWS Managed Microsoft AD Deep Dive  Part 6 – Schema Modifications

Yes folks, we’re at the six post for the series on AWS Managed Microsoft AD (AWS Managed AD.  I’ve covered a lot of material over the series including an overview, how to setup the service, the directory structure, pre-configured security principals, group policies, and the delegated security model, how to configure LDAPS in the service and the implications of Amazon’s design, and just a few days ago looked at the configuration of the security of the service in regards to protocols and cipher suites.  As per usual, I’d highly suggest you take a read through the prior posts in the series before starting on this one.

Today I’m going to look the capabilities within the AWS Managed AD to handle Active Directory schema modifications.  If you’ve read my series on Microsoft’s Azure Active Directory Domain Services (AAD DS) you know that the service doesn’t support the schema modifications.  This makes Amazon’s service the better offering in an environment where schema modifications to the standard Windows AD schema are a requirement.  However, like many capabilities in a managed Windows Active Directory (Windows AD) service, limitations are introduced when compared to a customer-run Windows Active Directory infrastructure.

If you’ve administered an Active Directory environment in a complex enterprise (managing users, groups, and group policies doesn’t count) you’re familiar with the butterflies that accompany the mention of a schema change.  Modifying the schema of Active Directory is similar to modifying the DNA of a living being.  Sure, you might have wonderful intentions but you may just end up causing the zombie apocalypse.  Modifications typically mean lots of application testing of the schema changes in a lower environment and a well documented and disaster recovery plan (you really don’t want to try to recover from a failed schema change or have to back one out).

Given the above, you can see the logic of why a service provider providing a managed Windows AD service wouldn’t want to allow schema changes.  However, there very legitimate business justifications for expanding the schema (outside your standard AD/Exchange/Skype upgrades) such as applications that need to store additional data about a security principal or having a business process that would be better facilitated with some additional metadata attached to an employee’s AD user account.  This is the market share Amazon is looking to capture.

So how does Amazon provide for this capability in a managed Windows AD forest?  Amazon accomplishes it through a very intelligent method of performing such a critical activity.  It’s accomplished by submitting an LDIF through the AWS Directory Service console.  That’s right folks, you (and probably more so Amazon) doesn’t have to worry about you as the customer having to hold membership in a highly privileged group such as Schema Admins or absolutely butchering a schema change by modifying something you didn’t intend to modify.

Amazon describes three steps to modifying the schema:

  1. Create the LDIF file
  2. Import the LDIF file
  3. Verify the schema extension was successful

Let’s review each of the steps.

In the first step we have to create a LDAP Data Interchange Format (LDIF) file.  Think of the LDIF file as a set of instructions to the directory which in this could would be an add or modify to an object class or attribute.  I’ll be using a sample LDIF file I grabbed from an Oracle knowledge base article.  This schema file will add the attributes of unixUserName, unixGroupName, and unixNameIinfo to the default Active Directory schema.

To complete step one I dumped the contents below into an LDIF file and saved it as schemamod.ldif.

dn: CN=unixUserName, CN=Schema, CN=Configuration, DC=example, DC=com
changetype: add
attributeID: 1.3.6.1.4.1.42.2.27.5.1.60
attributeSyntax: 2.5.5.3
isSingleValued: TRUE
searchFlags: 1
lDAPDisplayName: unixUserName
adminDescription: This attribute contains the object's UNIX username
objectClass: attributeSchema
oMSyntax: 27

dn: CN=unixGroupName, CN=Schema, CN=Configuration, DC=example, DC=com
changetype: add
attributeID: 1.3.6.1.4.1.42.2.27.5.1.61
attributeSyntax: 2.5.5.3
isSingleValued: TRUE
searchFlags: 1
lDAPDisplayName: unixGroupName
adminDescription: This attribute contains the object's UNIX groupname
objectClass: attributeSchema
oMSyntax: 27

dn:
changetype: modify
add: schemaUpdateNow
schemaUpdateNow: 1
-

dn: CN=unixNameInfo, CN=Schema, CN=Configuration, DC=example, DC=com
changetype: add
governsID: 1.3.6.1.4.1.42.2.27.5.2.15
lDAPDisplayName: unixNameInfo
adminDescription: Auxiliary class to store UNIX name info in AD
mayContain: unixUserName
mayContain: unixGroupName
objectClass: classSchema
objectClassCategory: 3
subClassOf: top

For the step two I logged into the AWS Management Console and navigated to the Directory Service Console.  Here we can see my instance AWS Managed AD with the domain name of geekintheweeds.com.

6awsadds1.png

I then clicked hyperlink on my Directory ID which takes me into the console for the geekintheweeds.com instance.  Scrolling down shows a menu where a number of operations can be performed.  For the purposes of this blog post, we’re going to focus on the Maintenance menu item.  Here we the ability to leverage AWS Simple Notification Service (AWS SNS) to create notifications for directory changes such as health changes where a managed Domain Controller goes down.  The second section is a pretty neat feature where we can snapshot the Windows AD environment to create a point-in-time copy of the directory we can restore.  We’ll see this in action in a few minutes.  Lastly, we have the schema extensions section.

6awsadds2.png

Here I clicked the Upload and update schema button and entered selected the LDIF file and added a short description.  I then clicked the Update Schema button.

6awsadds3.png

If you know me you know I love to try to break stuff.  If you look closely at the LDIF contents I pasted above you’ll notice I didn’t update the file with my domain name.  Here the error in the LDIF has been detected and the schema modification was cancelled.

6awsadds4.png

I went through made the necessary modifications to the file and tried again.  The LDIF processes through and the console updates to show the schema change has been initialized.

6awsadds5.png

Hitting refresh on the browser window updates the status to show Creating Snapshot.  Yes folks Amazon has baked into the schema update process a snapshot of the directory provide a fallback mechanism in the event of your zombie apocalypse.  The snapshot creation process will take a while.

6awsadds6.png

While the snapshot process, let’s discuss what Amazon is doing behind the scenes to process the LDIF file.  We first saw that it performs some light validation on the LDIF file, it then takes a snapshot of the directory, then applies to the changes to a single domain controller by selecting one as the schema master, removing it from directory replication, and applying the LDIF file using the our favorite old school tool LDIFDE.EXE.  Lastly, the domain controller is added back into replication to replicate the changes to the other domain controller and complete the changes.  If you’ve been administering Windows AD you’ll know this has appeared recommended best practices for schema updates over the years.

Once the process is complete the console updates to show completion of the schema installation and the creation of the snapshot.

6awsadds7.png