Experimenting with Azure Arc

Posted on June 12, 2021 by mattfeltonma

Hello fellow geeks! Life got busy this year and I haven’t had much time to post. I’ve been been working on some fun side projects including an enterprise-like environment for Azure, some security control mappings, and some reusable artifacts to simplify customer journeys into Azure. Outside of those efforts, I’ve spent some time addressing technical skill gaps, one of which was Azure Arc which is the topic of this post today.

Azure Arc is Microsoft’s attempt to extend the Azure management plane and Azure capabilities to resources running on-premises and other clouds. As the date of this blog these resources include Windows and Linux machines and Kubernetes clusters running on-premises or in another cloud like AWS (Amazon Web Services). Integrating these resources with Azure Arc projects them into the Azure management plane making them resources of Azure.

Once the resources are projected into Azure many capabilities of the platform can be used to assist with managing the resources. Examples include installing VM (virtual machine) extensions Microsoft Monitoring Agent extension to deliver logs and metrics to Azure Monitor, tracking changes and the software inventory of a machine using Azure Automation, or even auditing a machine’s compliance to a specific set of controls using Azure Policy. On the Kubernetes side you can monitor your on-premises clusters using Azure Container Insights or even deploy an App Service on your on-premises cluster. These capabilities continue to grow so check the official documentation for up to date information.

One of the interesting capabilities of Azure Arc that captured my eye was the ability to use a system-assigned management identity. I’ve written in the past about managed identities so I’ll stick to covering the very basics of them today. A managed identity is simply some Microsoft-managed automation on top of an Azure AD service principal, which is best explained as a security principal used for non-humans. One of the primary benefits managed identities provider over traditional service principals is automatic credential rotation. If you come from the AWS world, managed identities are similar to AWS IAM roles.

Those of you that manage service principals of scale are surely familiar with the pain of having to work with application owners to rotate the credentials of service principals in use within corporate applications. Beyond the operational overhead, the security risk exists for an attacker to obtain the service principal credentials and use them outside of Azure. Since service principals are not yet subject to Azure AD Conditional access, rotation of credentials becomes a critical control to have in place. Both this operational overhead and security risk make the automated rotation of credentials capability of managed identities a must have.

Managed identities are normally only available for use by resources running in Azure like Azure VMs, App Service instances, and the like. In the past, if you were coming from outside of Azure such as on-premises or another cloud like AWS, you’d be forced to use a service principal and struggle with the challenges I’ve outlined above. Azure Arc introduces the ability to leverage managed identities from outside of Azure.

This made me very curious as to how this was being accomplished for a machine running outside of Azure. The official documentation goes into some detail. The documentation explains a few key items:

A system-assigned managed identity is provisioned for the Arc-enabled server when the server is onboarded to Arc
The Azure Instance Metadata Service (IDMS) is configured with the managed identity’s service principal client id and certificate
Code running on the machine can request an access token from the IDMS
The IDMS challenges the application code to prove it is privileged enough to obtain the access token by requiring it provide a secret to attest that it is highly privileged

This information left me with a few questions:

What exactly is the challenge it’s hitting the application with?
Since the service principal is using a certificate-based credential, the private key has to be stored on the machine. If so where and how easy would it be for an attacker to steal it?
How easy would it be to steal the identity of the Azure Arc machine in order to impersonate it on another machine?

To answer these questions I decided to deploy Azure Arc to a number of Windows and Linux machines I have running on my home instance of Hyper-V. That would give me god access to each machine and the ability to deploy whatever toolset I wanted to take a peek behind the curtains. Using the service-principal onboarding process I deployed Azure Arc to a number of Windows and Linux machines running in my home lab. Since I’m stronger in Windows, I decided I’d try to extract the service principal credentials from a Windows machine and see if I could use them on Azure Arc-enabled Linux machine.

The technical overview documentation covers how Windows and Linux machines connect into Azure Arc and communicate. There are three processes that run to support Arc connectivity which include the Azure Hybrid Instance Metadata Service (HIMDS), Guest Configuration Arc Service, and Guest Configuration Extension Service. The service I was most interested in was the HIDMS service which is in charge of authenticating and obtaining access tokens from Azure.

To better understand the service, I first referenced the service in the Services MMC (Microsoft Management Console). This gave me the path to the executable. Poking around the source directory showed some dsc logs and the location of the processes involved with Azure Arc but nothing too interesting.

My next stop was the configuration information. As documented in the technical overview documentation, the configuration data is stored in a json file in the %ProgramData%\AzureConnectedMachineAgent\Config directory. The key pieces of information we can extract from this file are the tenant id, subscription id, certificate thumbprint the service principal is using and most importantly the client id. That’s a start!

{"subscriptionId":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","resourceGroup":"rg-onpremlab","resourceName":"SERVERWSQL01","tenantId":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","location":"eastus2","vmId":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","vmUuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX","certificateThumbprint":"be90bae2484ff6d495XXXXXXXXXXXXX","clientId":"XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX","cloud":"AzureCloud","privateLinkScope":"","namespace":"","correlationId":"XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX"}

I then wondered what else might be stored through this directory hierarchy. Popping up a directory, I came saw the Certs subdirectory. Could it really be this easy? Would this directory contain the private-key certificate? Navigating into the directory does indeed show a certificate file. However, attempting to open the certificate up using the Crypto Shell Extensions resulted in an error stating it’s not a valid certificate. No big deal, maybe the extension is wrong right? I then took the file and ran it through a few Python scripts I have to enumerate certificate data, but unfortunately no certificate format I tried worked. After a quick discussion with my good friend Armen Kaleshian, we both came to the conclusion that the certificate is more than likely wrapped with some type of symmetric encryption.

At this point I decided to take a step back and observe the HIMDS process to ensure this file is being used by the process, and if so, whether I could figure where the key that was used to wrap the certificate was stored. For that I used Process Monitor (the old tools are still the best!). After restarting the HIMDS service and sifting through the capture information, I found the mentions of the process reading the certificate file after reading from the agent config. In the past I’ve seen applications store the symmetric key used to wrap information like this in the registry, but I could not find any obvious calls to the registry indicating this. After a further conversation with Armen, we theorized the symmetric key might be stored in the himds.exe executable itself which would mean every Azure Arc instance would be using the same key to wrap the certificate. This would mean all one would have to do is move the certificate and configuration file to a new machine and one would be able to use the credentials to impersonate the machine’s managed identity (more on that later).

At this point I was fairly certain I had answered question number 2, but I wanted to explore question number 3 a bit more. For this I leveraged the PowerShell code sample provided in the documentation. Breaking down the code, I observed a web request is made to the IDMS endpoint. From the response from the IDMS endpoint, a returned header is extracted which contains the directory name a file is stored in. The contents of this file are extracted and then included as an authorization header using basic authentication. The resulting response from the IDMS contains the access token used to communicate with the relevant Microsoft cloud API.

$apiVersion = "2020-06-01"
$resource = "https://management.azure.com/"
$endpoint = "{0}?resource={1}&api-version={2}" -f $env:IDENTITY_ENDPOINT,$resource,$apiVersion
$secretFile = ""
try
{
    Invoke-WebRequest -Method GET -Uri $endpoint -Headers @{Metadata='True'} -UseBasicParsing
}
catch
{
    $wwwAuthHeader = $_.Exception.Response.Headers["WWW-Authenticate"]
    if ($wwwAuthHeader -match "Basic realm=.+")
    {
        $secretFile = ($wwwAuthHeader -split "Basic realm=")[1]
    }
}
Write-Host "Secret file path: " $secretFile`n
$secret = cat -Raw $secretFile
$response = Invoke-WebRequest -Method GET -Uri $endpoint -Headers @{Metadata='True'; Authorization="Basic $secret"} -UseBasicParsing
if ($response)
{
    $token = (ConvertFrom-Json -InputObject $response.Content).access_token
    Write-Host "Access token: " $token
}

By default the security principal doesn’t have any permissions in the ARM (Azure Resource Manager) API, so I granted it reader rights on the resource group and wrote some simple Python code to query for a list of resources in the in a resource group. I was able to successfully obtain the access token and got back a list of resources.

# Import standard libraries
import os
import sys
import logging
import requests
import json

# Create a logging mechanism
def enable_logging():
    stdout_handler = logging.StreamHandler(sys.stdout)
    handlers = [stdout_handler]
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers = handlers
    )

def get_access_token(resource, api_version="2020-06-01"):
    logging.debug('Attempting to obtain access token...')
    try:
        response = requests.get(
            url = os.getenv('IDENTITY_ENDPOINT'),
            params = {
                'api-version': api_version,
                'resource': resource
            },
            headers = {
                'Metadata': 'True'
            }
        )
        
        secret_file = response.headers['Www-Authenticate'].split('Basic realm=')[1]
        with open(secret_file, 'r') as file:
            key = file.read()
        
        response = requests.get(
            url = os.getenv('IDENTITY_ENDPOINT'),
            params = {
                'api-version': api_version,
                'resource': resource
            },
            headers = {
                'Metadata': 'True',
                'Authorization': f"Basic {key}"
            }
        )
        logging.debug("Access token obtain successfully")
        return json.loads(response.text)['access_token']

    except Exception:
        logging.error('Unable to obtain access token. Error was: ', exc_info=True)

def main():
    try:

        # Setup variables
        API_VERSION = '2021-04-01'

        # Enable logging
        enable_logging()

        # Obtain a credential from the system-assigned managed identity
        token = get_access_token(resource='https://management.azure.com/')

        # Setup query parameters
        params = {
            'api-version': API_VERSION
        }

        # Setup header
        header = {
            'Content-Type': 'application/json',
            'Authorization': f"Bearer {token}"
        }

        # Make call to ARM API
        response = requests.get(url='https://management.azure.com/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rg-onpremlab/resources',headers=header, params=params)
        print(response.text)

    except Exception:
        logging.error('Execution error: ', exc_info=True)

if __name__ == "__main__":
    main()

This indicated to me that the additional challenge introduced with the IDMS running on Arc requires the process to obtain a secret generated by the IDMS that is placed in the %PROGRAMDATA\AzureConnectedMachineAgent\Tokens directory. This directory is locked down for access to the security principal the himds service runs as, SYSTEM, the Administrators group, and most importantly the Hybrid agent extension applications group. After the secret is obtain from the file in this directory, it is used as a secret to perform basic authentication with the IDMS service to obtain the access token. As the official documentation states, this is the group you would need to add the security principal running any processes you wanted to use the local IDMS. Question 1 had now been answered.

That left me with question 3. Based on what I learned so far, an attacker who compromised the machine or a security principal with sufficient permissions on the machine to access the %PROGRAMDATA%\AzureConnectedMachineAgent\ subdirectories could obtain the configuration of the agent and the encrypted certificate. Compromise could also come in the form of compromise of a process that was running as a security principal which was a member of the Hybrid agent extension applications group. Even with the data collected from the agentconfig.json file and the certificate file, the attacker would still be short of the symmetric key used to unwrap the certificate to make it available for use.

Here I took a gamble and tested the theory Armen and I had that the symmetric key was stored in the himds executable. I copied the agentconfig.json file and certificate file and move them to another machine. To eliminate the possibility of the configuration and credentials file being specific to the Windows operating system, I spun up an Ubuntu VM in my Hyper-V cluster. I then installed and configured the agent with a completely separate Azure AD tenant.

Once that was complete, I moved the agentconfig.json and myCert file from the existing Windows machine to the new Ubuntu VM and replaced the existing files. Re-running the same Python script (substituting out the variables for the actual URIs listed in the technical overview) I received back the same listing of resources from the resource group. This proved that if you’re able to get access to both the agentconfig.json and myCert files on an Azure Arc-enabled machine, you can move them to any other machine (regardless of operating system) and re-use them to impersonate source machine and exercise any permissions its been granted access.

Let’s sum up the findings:

The agentconfig.json file contains sensitive information including the tenant id, subscription id, and client id of the service principal associated with the Azure Arc-enabled machine.
The credential for the service principal is stored as a certificate file in the myCert file in the %PROGRAMDATA%\AzureConnectedMachineAgent\certs directory. This file is probably wrapped with some type of symmetric encryption.
The challenge the documentation speaks to involves the creation of a password by the HIMDS process which is then stored in the %PROGRAMDATA%\AzureConnectedMachineAgent\tokens directory. This directory is locked down to privileged users. The password contained in the file created by the HIMDS process is then passed back in a request to the IDMS service running on the machine using the basic HTTP authentication where it is validated and an access token is returned. This is used a method to prove the process requesting the token is privileged.
The agentconfig.json and myCert file can be moved from one Arc-enabled machine to another to be used to impersonate the source machine. This provided further evidence that the symmetric key used to encrypt the certificate file (myCert) is hardcoded into the executable.

It should come as no surprise to you follow old folks that the resulting findings stress the importance traditional security controls. A few recommendations I’d make are as follows:

Least Privilege
- If you’re going to use the managed identity available to an Arc-enabled server for applications running on that machine, ensure you’re granting that managed identity only the permissions it requires and nothing beyond it.
- Limit the human and non-human actors who have administrator privileges on Azure Arc-enabled machines.
- Tightly control which security principals are members of the Hybrid agent extension applications group.
Logging and monitor
- Log managed identity sign-ins and monitor for suspicious activity. Until Conditional Access is extended to service principals, it can’t be leveraged to further restrict where these security principals can be used from.
- Log and monitor the security events on the Azure Arc-enabled servers
Patching and Updating
- Ensure your Azure Arc-enabled machines are patched and updated. You can leverage the Azure Automation update management capabilities if you don’t have an existing system to do this already.
Identity and Access Management
- Create a process to review the use case and security risks of applications that want to leverage this functionality.
- Perform access reviews to validate any additional permissions granted to these managed identities are still warranted, and if not, remove them.

Nothing in the above is new or fancy but all things you should be doing today. The biggest take away you need here is by extending the Azure management plane you get great new features but you also introduce a potential new risk where a compromised Azure Arc-enabled machine outside of Azure could be used to impact the security of your Azure implementation. If you’re using service principals today (or hell, if you’re simply having users access Azure from their desktops or laptops) you already have this risk, but it’s always worthwhile understanding the different vectors available to attackers.

At the end of the day I love this feature. It is leagues better than how most organizations are handling service principal usage outside of Azure today where the credentials are often hardcoded into code into code, dumped into a file in an unsecured directory, or never rotated for years. There are tradeoffs, but as I said earlier, none of the mitigations are things you shouldn’t already be doing today. Outside of this capability, there are a lot of benefits to Azure Arc and I’ll be very interested to see how Microsoft grows this offering and extends it beyond.

Thanks!

What If… Volume 3

Posted on January 6, 2021 by mattfeltonma

Welcome back fellow geeks! I hope you managed to have an enjoyable holiday and took a break from the grind. I took a good week off, and minus some reading around AKS (Azure Kubernetes Service), I completely shut off the work and tech side of my brain. It was a great change!

Today I am back with a new entry into my What If series. In this entry I’ll be covering an interesting quirk of ASR (Azure Site Recovery) that I ran into helping a customer test out the service. For those of you unfamiliar with ASR, it’s a managed service in Microsoft Azure that provides business continuity and disaster recovery for VMs (virtual machines) both within Azure and on-premises. It can also be used to when migrating VMs from on-premises to Azure, between regions, or between availability zones.

With the quick introduction to the service out of the way, let’s get to it.

What if I wanted to test Azure Site Recovery with both Windows and Linux virtual machines?

Over the holiday break I received an email from a customer who doing some validation testing for a planned migration for ADE (Azure Disk Encryption) to SSE (Server-Side Encryption) for Managed Disks using a CMK (Customer Managed Key). As part of the validation process, the customer wanted to understand how ASR would work after when using SSE with CMK instead of ADE. The customer was making the shift from ADE to SSE for Managed Disks with CMK for a few reasons including:

Performance benefits by shifting the encryption engine out of the operating system
No limitations on specific images for the virtual machine
No VM extensions required
Can be combined with host-based encryption for end to end encryption

The customer followed the instructions on how to set up ASR for SSE with CMK-enabled disks referenced here. Replication was successful but they noticed the data disk they had attached to the VM in the source region was not automatically attached to the VM in the destination region and required manual attachment. While an inconvenience for a single test, this could create a huge headache at scale when you’re talking about hundreds of VMs.

After receiving the email I immediately spun up a very simple environment in Azure in the East US 2 region consisting of a single virtual network, Windows VM with an attached data disk, Azure Key Vault instance with a single key, and a disk encryption set. In the Central US region I created a second virtual network, Azure Key Vault instance with a single key, and a disk encryption set. My plan was to fail the VM over from East US 2 to Central US using ASR.

As I went through the enablement process I ensured both the OS disk and the data disk were selected for replication as seen in the image below.

Reviewing the configuration shows that two managed disks are set to replicate.

After confirmation I left it alone and came back an hour later and checked the destination resource group. Replicas of both managed disks are present in the destination resource group. Good so far.

I then did a test failover and pulled up the VM and observed the same thing my customer did. The data disk was not attached even though it was replicated. I was able to manually attach it without an issue, but again, how does this work at scale? Even more interesting was the status of the VM in the Site Recovery section of the Recovery Services Vault. It did not show the data disk as being replicated.

I ran through the same process a few more times and ran into the same result each time. To make sure it wasn’t an issue with the information the portal was displaying, I wrote some quick Python code to hit the replicationprotecteditem endpoint within the ARM REST API. The results from the API also included only the OS disk in the replication status.

Was this a bug? Did both the customer and I mess something up in setting this up? Turns out it was neither. This is actually expected behavior when replicating a Windows VM when a data disk is attached that is uninitialized in the OS (operating system). For you young folks out there that have never initialized a disk in Microsoft Windows or those of you don’t spend much time in Windows, initialization consists of creating the partition table on the drive which must occur prior formatting the partition with a file system. So why is this required? I’m not really sure and can only theorize. A friend and I talked this over and we theorize it may be a requirement to ensure the disk has some unique identifier in the operating system which may not be able to be generated without the disk first being partitioned.

Note that this issue only occurs with Windows VMs with an uninitialized data disk. It does not occur if the disk has been initialized in Windows and does not occur at all with a Linux VM whether the disk has been partitioned or not. In those cases the data disk will be attached after the VM is failed over.

So there you go folks. If you decide to test out ASR for a proof-of-concept or just a learning experience, remember to initialize your disks on your Windows VMs!

See you next post!

Deep Dive into Azure AD Domain Services – Part 1

Posted on February 19, 2018 by mattfeltonma

Hi everyone. In this series of posts I’ll be doing a deep dive into Microsoft’s Azure AD Domain Services (AAD DS). AAD DS is Microsoft’s managed Windows Active Directory service offered in Microsoft Azure Infrastructure-as-a-Service intended to compete with similar offerings such as Amazon Web Services’s (AWS) Microsoft Active Directory. Microsoft’s solution differs from other offerings in that it sources its user and group information from Azure Active Directory versus an on-premises Windows Active Directory or LDAP.

Like its competitors Microsoft realizes there are still a lot of organizations out there who are still very much attached to legacy on-premises protocols such as NTLM, Kerberos, and LDAP. Not every organization (unfortunately) is ready or able to evolve its applications to consume SAML, Open ID Connect, OAuth, and Rest-Based APIs (yes COTS vendors I’m talking to you and your continued reliance on LDAP authentication in the year 2018). If the service has to be there, it makes sense to consume a managed service so staff can focus less on maintaining legacy technology like Windows Active Directory and focus more on a modern Identity-as-a-Service (IDaaS), Software-as-a-Service (SaaS), and Platform-as-a-Service (Paas) solutions.

Sounds great right? Sure, but how does it work? Microsoft’s documentation does a reasonable job giving the high level details of the service so I encourage you to read through it at some point. I won’t be covering information included in that documentation unless I notice a discrepancy or an area that could use more detail. Instead, I’m going to focus on the areas which I feel are important to understand if you’re going to attempt to consume the service in the same way you would a traditional on-premises Windows Active Directory.

With that introduction, let’s dig in.

The first thing I did was to install the Remote Server Administration Tools (RSAT) for Active Directory Domain Services and Group Policy Management tools. I used these tools to explore some of the configuration choices Microsoft made in the managed service. I also installed Microsoft Network Monitor 3.4 to review packet captures captured using the netsh.

After the tools were installed I started a persistent network capture using netsh using an elevated command prompt. This is an incredibly useful feature of Windows when you need to debug issues that occur prior or during user or system logon. I’ve used this for years to troubleshoot a number of Windows Active Directory issues including slow logons and failed logons. The only downfall of this is you’re forced into using Microsoft Network Monitor or Microsoft Message Analyzer to review the packet captures it creates. While Microsoft Message Analyzer is a sleek tool, the resources required to run it effectively are typically a non-starter for a lab or traditional work laptop so I tend to use Network Monitor.

After the packet capture was started I went through the standard process of joining the machine to the domain and rebooting the computer. After reboot, I logged in an account in the AAD DC Administrators Azure Active Directory group, started an elevated command prompt as the VM’s local administrator and stopped the packet capture. This provided me with a capture of the domain join, initial computer authentication, and initial user authentication.

While I know you’re as eager to dig into the packet capture as I am, I’ll cover that in a future post. Instead I decided to break out the RSAT tools and poke around at configuration choices an administrator would normally make when building out a Windows Active Directory domain.

Let’s first open the tool everyone who touches Windows Active Directory is familiar with, Active Directory Users and Computers (ADUC). The data layout (with Advanced Features option on) for organizational units (OUs) and containers looks very similar to what we’re used to seeing with the exception of the AADC Computers, AADC Users, AADDSDomainAdmin OUs, and AADDSDomainConfig container. I’ll get into these containers in a minute.

If we right-click the domain node and go to properties we see that the domain and forest are running in Windows Server 2012 R2 domain and forest functional level with no trusts defined. Examining the operating system tab of the two domain controllers in the Domain Controllers OU shows that both boxes run Windows Server 2012 R2. Interesting that Microsoft chose not to use Windows Server 2016.

Navigating to the Security tab and clicking the Advanced button shows that the AAD DC Administrator group has only been granted the Create Organizational Unit objects permission while the AAD DC Service Accounts group has been granted Replicating Directory Changes. As you can see from these permissions the base of the directory tree is very locked down.

Let me circle back to the OUs and Containers I talked about above. The AADDC Computers and AADDC Users OUs are the default OUs Microsoft creates for you. Newly joined machines are added to the AADDC Computers OU and users synchronized from the Azure AD tenant are placed in the AADDC Users OU. As we saw from the permissions above, we could use an account in the AAD DC Administrators group to create additional OUs under the domain node to delegate control to another set of more restricted admins, for the purposes of controlling GPOs if security filtering doesn’t meet our requirements, or for creating additional service accounts or groups for the workloads we deploy in the environment. The permissions within the default OUs are very limited. In the AADDC Computer OU GPOs can be applied and computer objects can be added and removed. In the AADC Users OU only GPOs can be applied which makes sense considering the user and group objects stored there are sourced from your authoritative Azure AD tenant.

The AADDSDomainAdmin OU contains a single security group named AADDS Service Administrators Group (pre-Windows 2000 name of AADDSDomAdmGroup). The group contains a single member names dcaasadmin which is the renamed built-in Active Directory administrator account. The group is nested into a number of highly privileged built-in Active Directory groups including Administrators, Domain Admins, Domain Users, Enterprise Admins, and Schema Admins. I’m very uncomfortable with Microsoft’s choice to make a “god” group and even a “god” user of the built-in administrator. This directly conflicts with security best practices for Active Directory which would see no account being a permanent member of these highly privileged groups or at the least divvying up the privileges among separate security principals. I would have liked to see Microsoft leverage a Red Forest Red Forest design here. Hopefully we’ll see some improvements as the service matures. I’m unsure as to the purpose of the OU and this group at this time.

The AADDSDomainConfig container contains a single container object named SchemaUpdate. I reviewed the attributes of both containers hoping to glean some idea of the purpose of the containers and the only thing I saw of notice was the revision attribute was set to 2. Maybe Microsoft is tracking the schema of their standard managed domain image via this attribute? In a future post in this series I’ll do a comparison of this managed domain’s schema with a fresh Windows Server 2012 R2 schema.

Opening Active Directory Sites and Services shows that Microsoft has chosen to leave the domain with a single site. This design choice makes sense given that a limitation of AAD DS is that it can only serve a single region. If that limitation is ever lifted, Microsoft will need to revisit this choice and perhaps include a site for each region. Expanding the Default-First-Site-Name site and the Servers node shows the two domain controllers Microsoft is using to provide the Windows Active Directory service to the VNet.

So the layout is simple, what about the group policy objects (GPO)? Opening up the Group Policy Management Console displays five GPOs which are included in every managed domain.

The AADDC Users GPO is empty of settings while the AADC Computers GPO has a single Preference defined that adds the AAD DC Administrators group to the built-in Administrators group on any member servers added to the OU. The Default Domain Controllers Policy (DDCP) GPO is your standard out of the box DDCP with nothing special set. The Default Domain Policy (DDP) GPO on the other hand has a number of settings applied. The password policy is interesting… I get that you have the option to source all the user accounts within your AAD DS domain from Azure AD, but Microsoft is still giving you the ability to create user accounts in the managed domain as I covered above which makes me uncomfortable with the default password policy. Microsoft hasn’t delegated the ability to create Fine Grained Password Policies (FGPPs) either, which means you’re stuck with this very lax password policy. Given the lack of technical enforcement, I’d recommend avoiding creating user accounts directly in the managed domain for any purpose until Microsoft delegates the ability to create FGPPs. The remaining settings in the policy are standard out of the box DDP.

The GPO named Event Log GPO is linked to the Domain Controllers OU and executes a startup script named EventLogRetentionPolicy.PS1. Being the nosy geek I am, I dug through SYSVOL to find the script. The script is very simple in that it sets each event log to overwrite events over 31 days old. It then verifies the results and prints the results to the console. Event logs are an interesting beast in AAD DS. An account in the AAD DC Administrators group doesn’t have the right to connect to the Event Logs on the DCs remotely and I haven’t come across any options to view those logs. I don’t see any mention of them in the Microsoft documentation, so my assumption is you don’t get access to them at this time. I have to imagine this is a show stopper for some organizations considering the critical importance of Domain Controller logs. If anyone knows how to access these logs, please let me know. I’d like to see Microsoft incorporate an option to send the logs to a syslog agent via a configuration option in the Azure AD Domain Services blade in the Azure Portal.

I’m going to stop here today. In my next post I’ll do some poking around by running a port scan against the managed domain controllers to see what network flows are open, enable LDAPS to see what the SSL/TLS landscape looks like, and examine authentication protocols and algorithms supported (NTLMv1,v2, Kerberos DES, etc). Thanks for reading!

Deep dive into AD FS and MS WAP – Overview

Posted on August 5, 2017 by mattfeltonma

Hi everyone,

If you’ve followed my blog at all, you will notice I spend a fair amount of my time writing about the products and technologies powering the integration of on-premises and cloud solutions. The industry refers to that integration using a variety of buzzwords from hybrid cloud to software defined data center/storage/networking/etc. I prefer a more simple definition of legacy solutions versus modern solutions.

So what do I mean by a modern solution? I’m speaking of solutions with the following most if not all of these characteristics:

Customer maintains only the layers of the technology that directly present business value
Short time to market for new features and features are introduced in a “toggle on and toggle off” manner
Supports modern authentication, authorization, and identity management standards and specifications such as Open ID Connect, OAuth, SAML, and SCIM
On-demand scaling
Provides a robust web-based API
Customer data can exist on-premises or off-premises

Since I love the identity realm, I’m going to focus on the bullet regarding modern authentication, authorization, and identity management. For this series of posts I’m going to look at how Microsoft’s Active Directory Federation Service (AD FS) and Microsoft’s Web Application Proxy (WAP) can be used to help facilitate the use of modern authentication and authorization.

So where does AD FS and the WAP come in? AD FS provides us with a security token service producing the logical security tokens used in SAML, OAuth, and Open ID Connect. Why do we care about the MS WAP? The WAP acts a reverse proxy giving us the ability to securely expose AD FS to untrusted networks (like the Internet) so that devices outside our traditional firewalled security boundary can leverage our modern authentication and authorization solution.

Some real life business cases that can be solved with this solution are:

Single sign-on (SSO) experience to a SaaS application such as SharePoint online from both an Active Directory domain-joined endpoint or a non-domain joined endpoint such as a mobile phone.
Limit the number of passwords a user needs to remember to access both internal and cloud applications.
Provide authentication or authorization for modernized internal applications for endpoints outside the traditional firewalled security boundary.
Authentication and authorization of devices prior to accessing an internal or cloud application.

As we can see from the above, there are some great benefits around SSO, limiting user credentials to improve security and user experience, and taking our authorization to the next step by doing contextual-based authorization (device information, user location, etc) versus relying upon just Active Directory group.

Microsoft does a relatively decent job describing how to design and implement your AD FS and WAP rollout, so I’m not going to cover much of that in this series. Instead I’m going to focus on the “behind the scenes” conversations that occur with endpoints, WAP, AD FS, AD DS, and Azure AD. Before I begin delving into the weeds of the product, I’m going to spend this post giving an overview of what my lab looks like.

I recently put together a more permanent lab consisting of a mixture of on-premise VMs running on HyperV and Azure resources. I manage to stay well within my $150.00 MSDN balance by keeping a majority of the VMs deallocated. The layout of the lab is diagramed below.

HomeLab

On-premises I am running a small collection of Windows Server 2016 machines within HyperV running on top of Windows Server 2016. I’m using a standard setup of an AD DS, AD CS, AADC, AD FS, and IIS/MS SQL server. Running in Azure I have a single VNet with three subnets each separated by a network security group. My core infrastructure of an AD DS, IIS/MS SQL, and AD FS server exist in my Intranet subnet with my DMZ subnet containing a single WAP.

The Active Directory configuration consists of a single Active Directory forest with an FQDN of journeyofthegeek.local. The domain has been configured with an explicit UPN of journeyofthegeek.com which is assigned as the UPN suffix for all users synchronized to Azure Active Directory. The domain is running in Windows Server 2016 domain and forest functional level. The on-premises domain controller holds all FSMO roles and acts as the DC for the Active Directory site representing the on-premises physical location. The domain controller in Azure acts as the sole DC for the Active Directory site representing Azure. Both DCs host the split-brain DNS zone for journeyofthegeek.com.

The on-premises domain controller also runs Active Directory Certificate Services. The CA is an enterprise CA that is used to distribute certificates to security principals in the environment. I’ve removed the CDP from the certificate templates issued by the CA to eliminate complications with the CRL revocation checking.

The AD FS servers are members of an AD FS farm named sts.journeyofthegeek.com and use a MS SQL Server 2016 backend for storage of configuration information. The SQL Server on-premises hosts the SQL instance that the AD FS users are using to store configuration information.

Azure Active Directory Connect is co-located on the AD FS server and uses the same SQL server as the AD FS uses. It has been integrated with a lab Azure Active Directory tenant I use which has a few licenses of Office 365 Business Essentials. The objectGUID attribute is used as the immutable ID and the Azure Active Directory tenant has the DNS namespaces of journeyofthegeek.onmicrosoft.com and journeyofthegeek.com associated with it.

The IIS server running in Azure runs a simple .NET application (https://blogs.technet.microsoft.com/tangent_thoughts/2015/02/20/install-and-configure-a-simple-net-4-5-sample-federated-application-samapp/) that is used for claims-based authentication. I’ll be using that application for demonstrations with the Web Application Proxy and have used it in the past to demonstrate functionality of the Azure Application Proxy.

For the demonstrations throughout these series I’ll be using the following tools:

Fiddler (http://www.telerik.com/fiddler)
PSTools (https://docs.microsoft.com/en-us/sysinternals/downloads/pstools)
ProcMon (https://docs.microsoft.com/en-us/sysinternals/downloads/procmon)
Process Explorer (https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer)
API Monitor (http://www.rohitab.com/apimonitor)
WireShark (https://www.wireshark.org/)
Microsoft Network Monitor (https://www.microsoft.com/en-us/download/details.aspx?id=4865)

In my next post I’ll do a deep dive into what happens behind the scenes during the registration of the Web Application Proxy with an AD FS farm. See you then!

Journey Of The Geek

The chronicles of a Bostonian tech geek navigating through life and technology

Tag Archives: windows