What If… Volume 3

Welcome back fellow geeks! I hope you managed to have an enjoyable holiday and took a break from the grind. I took a good week off, and minus some reading around AKS (Azure Kubernetes Service), I completely shut off the work and tech side of my brain. It was a great change!

Today I am back with a new entry into my What If series. In this entry I’ll be covering an interesting quirk of ASR (Azure Site Recovery) that I ran into helping a customer test out the service. For those of you unfamiliar with ASR, it’s a managed service in Microsoft Azure that provides business continuity and disaster recovery for VMs (virtual machines) both within Azure and on-premises. It can also be used to when migrating VMs from on-premises to Azure, between regions, or between availability zones.

With the quick introduction to the service out of the way, let’s get to it.

What if I wanted to test Azure Site Recovery with both Windows and Linux virtual machines?

Over the holiday break I received an email from a customer who doing some validation testing for a planned migration for ADE (Azure Disk Encryption) to SSE (Server-Side Encryption) for Managed Disks using a CMK (Customer Managed Key). As part of the validation process, the customer wanted to understand how ASR would work after when using SSE with CMK instead of ADE. The customer was making the shift from ADE to SSE for Managed Disks with CMK for a few reasons including:

  • Performance benefits by shifting the encryption engine out of the operating system
  • No limitations on specific images for the virtual machine
  • No VM extensions required
  • Can be combined with host-based encryption for end to end encryption

The customer followed the instructions on how to set up ASR for SSE with CMK-enabled disks referenced here. Replication was successful but they noticed the data disk they had attached to the VM in the source region was not automatically attached to the VM in the destination region and required manual attachment. While an inconvenience for a single test, this could create a huge headache at scale when you’re talking about hundreds of VMs.

After receiving the email I immediately spun up a very simple environment in Azure in the East US 2 region consisting of a single virtual network, Windows VM with an attached data disk, Azure Key Vault instance with a single key, and a disk encryption set. In the Central US region I created a second virtual network, Azure Key Vault instance with a single key, and a disk encryption set. My plan was to fail the VM over from East US 2 to Central US using ASR.

Lab environment

As I went through the enablement process I ensured both the OS disk and the data disk were selected for replication as seen in the image below.

ASR VM Disk Selection

Reviewing the configuration shows that two managed disks are set to replicate.

ASR Config

After confirmation I left it alone and came back an hour later and checked the destination resource group. Replicas of both managed disks are present in the destination resource group. Good so far.

Replica Disks

I then did a test failover and pulled up the VM and observed the same thing my customer did. The data disk was not attached even though it was replicated. I was able to manually attach it without an issue, but again, how does this work at scale? Even more interesting was the status of the VM in the Site Recovery section of the Recovery Services Vault. It did not show the data disk as being replicated.

Status of VM in Recovery Services Vault

I ran through the same process a few more times and ran into the same result each time. To make sure it wasn’t an issue with the information the portal was displaying, I wrote some quick Python code to hit the replicationprotecteditem endpoint within the ARM REST API. The results from the API also included only the OS disk in the replication status.

Was this a bug? Did both the customer and I mess something up in setting this up? Turns out it was neither. This is actually expected behavior when replicating a Windows VM when a data disk is attached that is uninitialized in the OS (operating system). For you young folks out there that have never initialized a disk in Microsoft Windows or those of you don’t spend much time in Windows, initialization consists of creating the partition table on the drive which must occur prior formatting the partition with a file system. So why is this required? I’m not really sure and can only theorize. A friend and I talked this over and we theorize it may be a requirement to ensure the disk has some unique identifier in the operating system which may not be able to be generated without the disk first being partitioned.

Note that this issue only occurs with Windows VMs with an uninitialized data disk. It does not occur if the disk has been initialized in Windows and does not occur at all with a Linux VM whether the disk has been partitioned or not. In those cases the data disk will be attached after the VM is failed over.

So there you go folks. If you decide to test out ASR for a proof-of-concept or just a learning experience, remember to initialize your disks on your Windows VMs!

See you next post!

Deep Dive into Azure AD Domain Services – Part 1

Deep Dive into Azure AD Domain Services  – Part 1

Hi everyone.  In this series of posts I’ll be doing a deep dive into Microsoft’s Azure AD Domain Services (AAD DS).  AAD DS is Microsoft’s managed Windows Active Directory service offered in Microsoft Azure Infrastructure-as-a-Service intended to compete with similar offerings such as Amazon Web Services’s (AWS) Microsoft Active Directory.  Microsoft’s solution differs from other offerings in that it sources its user and group information from Azure Active Directory versus an on-premises Windows Active Directory or LDAP.

Like its competitors Microsoft realizes there are still a lot of organizations out there who are still very much attached to legacy on-premises protocols such as NTLM, Kerberos, and LDAP.  Not every organization (unfortunately) is ready or able to evolve its applications to consume SAML, Open ID Connect, OAuth, and Rest-Based APIs (yes COTS vendors I’m talking to you and your continued reliance on LDAP authentication in the year 2018).  If the service has to be there, it makes sense to consume a managed service so staff can focus less on maintaining legacy technology like Windows Active Directory and focus more on a modern Identity-as-a-Service (IDaaS), Software-as-a-Service (SaaS), and Platform-as-a-Service (Paas) solutions.

Sounds great right?  Sure, but how does it work?  Microsoft’s documentation does a reasonable job giving the high level details of the service so I encourage you to read through it at some point.  I won’t be covering information included in that documentation unless I notice a discrepancy or an area that could use more detail.  Instead, I’m going to focus on the areas which I feel are important to understand if you’re going to attempt to consume the service in the same way you would a traditional on-premises Windows Active Directory.

With that introduction, let’s dig in.

The first thing I did was to install the Remote Server Administration Tools (RSAT) for Active Directory Domain Services and Group Policy Management tools.  I used these tools to explore some of the configuration choices Microsoft made in the managed service.  I also installed Microsoft Network Monitor 3.4 to review packet captures  captured using the netsh.

After the tools were installed I started a persistent network capture using netsh using an elevated command prompt.  This is an incredibly useful feature of Windows when you need to debug issues that occur prior or during user or system logon.  I’ve used this for years to troubleshoot a number of Windows Active Directory issues including slow logons and failed logons.  The only downfall of this is you’re forced into using Microsoft Network Monitor or Microsoft Message Analyzer to review the packet captures it creates.  While Microsoft Message Analyzer is a sleek tool, the resources required to run it effectively are typically a non-starter for a lab or traditional work laptop so I tend to use Network Monitor.

entry1pic1

After the packet capture was started I went through the standard process of joining the machine to the domain and rebooting the computer.  After reboot, I logged in an account in the AAD DC Administrators Azure Active Directory group, started an elevated command prompt as the VM’s local administrator and stopped the packet capture.  This provided me with a capture of the domain join, initial computer authentication, and initial user authentication.

entry1pic2

While I know you’re as eager to dig into the packet capture as I am, I’ll cover that in a future post.  Instead I decided to break out the RSAT tools and poke around at configuration choices an administrator would normally make when building out a Windows Active Directory domain.

Let’s first open the tool everyone who touches Windows Active Directory is familiar with, Active Directory Users and Computers (ADUC).  The data layout (with Advanced Features option on) for organizational units (OUs) and containers looks very similar to what we’re used to seeing with the exception of the AADC Computers, AADC Users, AADDSDomainAdmin OUs, and AADDSDomainConfig container.  I’ll get into these containers in a minute.

entry1pic3.png

If we right-click the domain node and go to properties we see that the domain and forest are running in Windows Server 2012 R2 domain and forest functional level with no trusts defined.  Examining the operating system tab of the two domain controllers in the Domain Controllers OU shows that both boxes run Windows Server 2012 R2.  Interesting that Microsoft chose not to use Windows Server 2016.

entry1pic4.png

Navigating to the Security tab and clicking the Advanced button shows that the AAD DC Administrator group has only been granted the Create Organizational Unit objects permission while the AAD DC Service Accounts group has been granted Replicating Directory Changes.  As you can see from these permissions the base of the directory tree is very locked down.

entry1pic5.png

Let me circle back to the OUs and Containers I talked about above.  The AADDC Computers and AADDC Users OUs are the default OUs Microsoft creates for you.  Newly joined machines are added to the AADDC Computers OU and users synchronized from the Azure AD tenant are placed in the AADDC Users OU.  As we saw from the permissions above, we could use an account in the AAD DC Administrators group to create additional OUs under the domain node to delegate control to another set of more restricted admins, for the purposes of controlling GPOs if security filtering doesn’t meet our requirements, or for creating additional service accounts or groups for the workloads we deploy in the environment.  The permissions within the default OUs are very limited.  In the AADDC Computer OU GPOs can be applied and computer objects can be added and removed.  In the AADC Users OU only GPOs can be applied which makes sense considering the user and group objects stored there are sourced from your authoritative Azure AD tenant.

The AADDSDomainAdmin OU contains a single security group named AADDS Service Administrators Group (pre-Windows 2000 name of AADDSDomAdmGroup).  The group contains a single member names dcaasadmin which is the renamed built-in Active Directory administrator account.  The group is nested into a number of highly privileged built-in Active Directory groups including Administrators, Domain Admins, Domain Users, Enterprise Admins, and Schema Admins.  I’m very uncomfortable with Microsoft’s choice to make a “god” group and even a “god” user of the built-in administrator.  This directly conflicts with security best practices for Active Directory which would see no account being a permanent member of these highly privileged groups or at the least divvying up the privileges among separate security principals.  I would have liked to see Microsoft leverage a Red Forest Red Forest  design here.  Hopefully we’ll see some improvements as the service matures.  I’m unsure as to the purpose of the OU and this group at this time.

entry1pic6

The AADDSDomainConfig container contains a single container object named SchemaUpdate.  I reviewed the attributes of both containers hoping to glean some idea of the purpose of the containers and the only thing I saw of notice was the revision attribute was set to 2.  Maybe Microsoft is tracking the schema of their standard managed domain image via this attribute?  In a future post in this series I’ll do a comparison of this managed domain’s schema with a fresh Windows Server 2012 R2 schema.

entry1pic7.png

Opening Active Directory Sites and Services shows that Microsoft has chosen to leave the domain with a single site.  This design choice makes sense given that a limitation of AAD DS is that it can only serve a single region.  If that limitation is ever lifted, Microsoft will need to revisit this choice and perhaps include a site for each region.   Expanding the Default-First-Site-Name site and the Servers node shows the two domain controllers Microsoft is using to provide the Windows Active Directory service to the VNet.

entry1pic8

So the layout is simple, what about the group policy objects (GPO)?  Opening up the Group Policy Management Console displays five GPOs which are included in every managed domain.

entry1pic9.png

The AADDC Users GPO is empty of settings while the AADC Computers GPO has a single Preference defined that adds the AAD DC Administrators group to the built-in Administrators group on any member servers added to the OU.  The Default Domain Controllers Policy (DDCP) GPO is your standard out of the box DDCP with nothing special set.  The Default Domain Policy (DDP) GPO on the other hand has a number of settings applied.  The password policy is interesting… I get that you have the option to source all the user accounts within your AAD DS domain from Azure AD, but Microsoft is still giving you the ability to create user accounts in the managed domain as I covered above which makes me uncomfortable with the default password policy.  Microsoft hasn’t delegated the ability to create Fine Grained Password Policies (FGPPs) either, which means you’re stuck with this very lax password policy.  Given the lack of technical enforcement, I’d recommend avoiding creating user accounts directly in the managed domain for any purpose until Microsoft delegates the ability to create FGPPs.  The remaining settings in the policy are standard out of the box DDP.

entry1pic10.png

The GPO named Event Log GPO is linked to the Domain Controllers OU and executes a startup script named EventLogRetentionPolicy.PS1.  Being the nosy geek I am, I dug through SYSVOL to find the script.  The script is very simple in that it sets each event log to overwrite events over 31 days old.  It then verifies the results and prints the results to the console.  Event logs are an interesting beast in AAD DS.  An account in the AAD DC Administrators group doesn’t have the right to connect to the Event Logs on the DCs remotely and I haven’t come across any options to view those logs.  I don’t see any mention of them in the Microsoft documentation, so my assumption is you don’t get access to them at this time.  I have to imagine this is a show stopper for some organizations considering the critical importance of Domain Controller logs.  If anyone knows how to access these logs, please let me know.  I’d like to see Microsoft incorporate an option to send the logs to a syslog agent via a configuration option in the Azure AD Domain Services blade in the Azure Portal.

I’m going to stop here today.  In my next post I’ll do some poking around by running a port scan against the managed domain controllers to see what network flows are open, enable LDAPS to see what the SSL/TLS landscape looks like, and examine authentication protocols and algorithms supported (NTLMv1,v2, Kerberos DES, etc).  Thanks for reading!

Deep dive into AD FS and MS WAP – Overview

Hi everyone,

If you’ve followed my blog at all, you will notice I spend a fair amount of my time writing about the products and technologies powering the integration of on-premises and cloud solutions.  The industry refers to that integration using a variety of buzzwords from hybrid cloud to software defined data center/storage/networking/etc.  I prefer a more simple definition of legacy solutions versus modern solutions.

So what do I mean by a modern solution?  I’m speaking of solutions with the following most if not all of these characteristics:

  • Customer maintains only the layers of the technology that directly present business value
  • Short time to market for new features and features are introduced in a “toggle on and toggle off” manner
  • Supports modern authentication, authorization, and identity management standards and specifications such as Open ID Connect, OAuth, SAML, and SCIM
  • On-demand scaling
  • Provides a robust web-based API
  • Customer data can exist on-premises or off-premises

Since I love the identity realm, I’m going to focus on the bullet regarding modern authentication, authorization, and identity management.  For this series of posts I’m going to look at how Microsoft’s Active Directory Federation Service (AD FS)  and Microsoft’s Web Application Proxy (WAP) can be used to help facilitate the use of modern authentication and authorization.

So where does AD FS and the WAP come in?  AD FS provides us with a security token service producing the logical security tokens used in SAML, OAuth, and Open ID Connect.  Why do we care about the MS WAP?  The WAP acts a reverse proxy giving us the ability to securely expose AD FS to untrusted networks (like the Internet) so that devices outside our traditional firewalled security boundary can leverage our modern authentication and authorization solution.

Some real life business cases that can be solved with this solution are:

  1. Single sign-on (SSO) experience to a SaaS application such as SharePoint online from both an Active Directory domain-joined endpoint or a non-domain joined endpoint such as a mobile phone.
  2. Limit the number of passwords a user needs to remember to access both internal and cloud applications.
  3. Provide authentication or authorization for modernized internal applications for endpoints outside the traditional firewalled security boundary.
  4. Authentication and authorization of devices prior to accessing an internal or cloud application.

As we can see from the above, there are some great benefits around SSO, limiting user credentials to improve security and user experience, and taking our authorization to the next step by doing contextual-based authorization (device information, user location, etc) versus relying upon just Active Directory group.

Microsoft does a relatively decent job describing how to design and implement your AD FS and WAP rollout, so I’m not going to cover much of that in this series.  Instead I’m going to focus on the “behind the scenes” conversations that occur with endpoints, WAP, AD FS, AD DS, and Azure AD. Before I begin delving into the weeds of the product, I’m going to spend this post giving an overview of what my lab looks like.

I recently put together a more permanent lab consisting of a mixture of on-premise VMs running on HyperV and Azure resources.  I manage to stay well within my $150.00 MSDN balance by keeping a majority of the VMs deallocated.   The layout of the lab is diagramed below.

HomeLab

 

On-premises I am running a small collection of Windows Server 2016 machines within HyperV running on top of Windows Server 2016.  I’m using a standard setup of an AD DS, AD CS, AADC, AD FS, and IIS/MS SQL server.  Running in Azure I have a single VNet with three subnets each separated by a network security group.  My core infrastructure of an AD DS, IIS/MS SQL, and AD FS server exist in my Intranet subnet with my DMZ subnet containing a single WAP.

The Active Directory configuration consists of a single Active Directory forest with an FQDN of journeyofthegeek.local.  The domain has been configured with an explicit UPN of journeyofthegeek.com which is assigned as the UPN suffix for all users synchronized to Azure Active Directory.  The domain is running in Windows Server 2016 domain and forest functional level.  The on-premises domain controller holds all FSMO roles and acts as the DC for the Active Directory site representing the on-premises physical location.  The domain controller in Azure acts as the sole DC for the Active Directory site representing Azure.  Both DCs host the split-brain DNS zone for journeyofthegeek.com.

The on-premises domain controller also runs Active Directory Certificate Services.  The CA is an enterprise CA that is used to distribute certificates to security principals in the environment.  I’ve removed the CDP from the certificate templates issued by the CA to eliminate complications with the CRL revocation checking.

The AD FS servers are members of an AD FS farm named sts.journeyofthegeek.com and use a MS SQL Server 2016 backend for storage of configuration information.  The SQL Server on-premises hosts the SQL instance that the AD FS users are using to store configuration information.

Azure Active Directory Connect is co-located on the AD FS server and uses the same SQL server as the AD FS uses.  It has been integrated with a lab Azure Active Directory tenant I use which has a few licenses of Office 365 Business Essentials.  The objectGUID attribute is used as the immutable ID and the Azure Active Directory tenant has the DNS namespaces of journeyofthegeek.onmicrosoft.com and journeyofthegeek.com associated with it.

The IIS server running in Azure runs a simple .NET application (https://blogs.technet.microsoft.com/tangent_thoughts/2015/02/20/install-and-configure-a-simple-net-4-5-sample-federated-application-samapp/) that is used for claims-based authentication.  I’ll be using that application for demonstrations with the Web Application Proxy and have used it in the past to demonstrate functionality of the Azure Application Proxy.

For the demonstrations throughout these series I’ll be using the following tools:

In my next post I’ll do a deep dive into what happens behind the scenes during the registration of the Web Application Proxy with an AD FS farm.  See you then!

 

Helpful hints for resolving AD FS problems – Part 1

Hi everyone.

Over the past week I’ve been building a lab for an upcoming deep dive into Microsoft’s Web Application Proxy.  During the course of building the lab I ran into a few interesting issues with AD FS and the Web Application Proxy that I wanted to cover.  Some were similar to issues I’ve run into in production environments and some were new to me.

These issues are interesting in that there aren’t any obvious indicators of the problem in any of the typical logs.  Two out of three required some trial and error to determine root cause, while the third drove me quite insane for a good two weeks before getting an answer from an “official” source.  Over the course of this series of blogs I’ll cover each issue in detail with the hopes that it will help others troubleshoot these issues in the future.

Issue 1: AD FS Certificate authentication fails

I’m going to start with the problem that took me the longest to resolve and eventually required getting the answer directly from an official source.

For those of you that are unfamiliar, AD FS provides the capability to offer multi-factor authentication methods both native and third-party.  Out of the box, it supports certificate-based authentication as an option for a multi-factor or “step-up” authentication mechanism.

A few months back I wanted to take advantage of the certificate authentication feature to provide a two-factor authentication solution for applications integrated with AD FS.  Like a good engineer I did my Googling, read the Microsoft articles and various blogs out there to understand how the feature worked and what the requirements were.  I built a lab in Azure, setup an AD FS server, and ensured port 49443 was open in addition to the the typical ports required by AD FS.  I created my instance of AD CS, issued a user certificate containing the user’s UPN in the subject alternate name field, and setup a sample SAML app and configured it to require Certificate authentication.

How easy it all sounds right?  I navigated to the sample application and got the screen below…

Screen Shot 2017-06-04 at 9.29.35 PM

and I waited….  and waited…. and waited…  Ummm, what went wrong?  Well surely the AD FS log will tell me what happened.

Screen Shot 2017-06-04 at 9.34.03 PM.png

Well isn’t that odd.  No errors or warnings in the AD FS Admin log.  A quick check of the Application and System logs showed no errors either.  Maybe the AD FS Debug log would show me something?  I flipped on the log and attempted another authentication.

Screen Shot 2017-06-04 at 9.38.07 PM

Nothing as well?  Maybe the server can’t query the revocation lists designated in the certificates CDP?  Nope, not that either the server can successfully contact the CDP endpoints.  At this point I began to get quite frustrated and attempted packet captures, Fiddler captures, and anything and everything I could think of.  Nothing I tried revealed the answer.

I finally gave in (which I can tell you is incredibly challenging for me) and reached out to an “official” source.  We chatted back and forth and went through much of the same steps as outlined above to ensure I didn’t miss anything.  However, we ran into another dead end.  He then reached out to some other engineers he knew and eventually we got a hit.  We were told to check to see if there were any intermediary certificates stored within the trusted root certificate authorities store.  Sounds like an odd circumstance, but sure why not.

Upon opening up the certificates MMC, opening the machine store, and exploring the trusted root certificate authorities store low and behold I see an intermediary certificate within the store.  I deleted the certificate, restarted the AD FS server and attempted another login to the sample claim application and hit the screen below.

Screen Shot 2017-06-04 at 9.50.16 PM

Boom, I’m finally receiving the certificate prompt.  Clicking the OK button brings about the successful login below.

Screen Shot 2017-06-04 at 9.51.23 PM

So what was the issue?  Apparently AD FS certificate authentication fails without generating an error in any logical location (maybe nowhere at all?) if there is an intermediary certificate in the trusted root certificate authority machine store.  I’ve verified this is an issue in both AD FS 2012 R2 and AD FS 2016.  Now why this occurs is unknown to me.  It could be the underlining HTTPS.SYS driver that pukes and doesn’t report any errors to the event logs.  I didn’t get a straight answer as to why this occurs, just that it will due to some type of integrity check on the machine certificate store.  Odd right?

That completes the rundown of the first of three problems I’ll be outlining in this series of blogs.  Hopefully this helps save someone else some time and aggravation.

See you next post!