Deep Dive into Azure AD and AWS SSO Integration – Part 4

Today we continue exploring the new integration between Microsoft’s Azure AD (Azure Active Directory) and AWS (Amazon Web Services) SSO (Single Sign-On).  Over the past three posts I’ve covered the high level concepts of both platforms, the challenges the integration seeks to solve, and how to enable the federated trust which facilitates the single sign-on experience.  If you haven’t read through those posts, I recommend you do so before you dive into this one.  In this post I’ll be covering the neatest feature of the new integration, which is the support for automated provisioning.

If you’ve ever worked in the identity realm before, you know the pains that come with managing the life cycle of an identity: from initial provisioning, through changes to the identity such as department and position changes, to the often forgotten stage of de-provisioning.  On-premises, these problems were solved with cobbled-together scripts or complex identity management solutions such as SailPoint Identity IQ or Microsoft Identity Manager.  While these tools were challenging to implement and operate, they did their job in the world of Windows Active Directory, LDAP, SQL databases and the like.

Then came cloud, and all bets were off.  Identity data stores skyrocketed from less than a hundred to hundreds and sometimes thousands (B2C has exploded far beyond even that).  Each new cloud service introduced into the enterprise brought yet another identity management challenge.  While some of these offerings have APIs that support identity management operations, most do not, and those that do are proprietary in nature.  Writing custom code against each of those APIs is a huge challenge that most enterprises can’t keep up with.  The result is often manual management of the identity life cycle, through uploading exported CSV files or some poor soul pointing and clicking a thousand times in a vendor portal.

Wouldn’t it be great if there was some mythical standard out there that would help solve this problem, use a standard REST API, and support the JSON format?  Turns out there is, and that standard is SCIM (System for Cross-domain Identity Management).  You may be surprised to know the standard has been around for a while now (technically since 2011).  I recall hearing about it at a Gartner conference many years ago.  Unfortunately, it’s taken a long time to catch on, but support is steadily increasing.

Thankfully for us, Microsoft has baked SCIM support into Azure AD, and AWS recognized the value and took advantage of the feature.  By doing this, the identity life cycle challenges of managing an Azure AD and AWS integration have been heavily remediated and our lives made easier.

Azure AD Provisioning – Example

Let’s take a look at how to set it up, shall we?

The first place you’ll need to go is the AWS account which is the master for the organization, and into the AWS SSO Settings.  In Settings you’ll see the provisioning option, which is initially set to manual.  Select the option to enable automatic provisioning.

AWS SSO Settings – Provisioning

Once complete, a SCIM endpoint will be created.  This is the endpoint in AWS (referred to as the SCIM service provider in the SCIM standard) that the SCIM service in Azure AD (referred to as the client in the SCIM standard) will interact with to search for, create, modify, and delete AWS users and groups.  To interact with this endpoint, Azure AD must authenticate to it, which it does with a bearer access token that is issued by AWS SSO.  Be aware that the access token has a one year life span, so ensure you set some type of reminder.  A quick search through the boto3 API doesn’t show a way to query for issued access tokens (yes, you can issue more than one at a time), so you won’t be able to automate the process as of yet.
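If you want to poke at the endpoint yourself, below is a minimal sketch using Python and the requests library.  The endpoint URL format, token, and user principal name are placeholders (grab the real values from the AWS SSO console); the filter syntax is standard SCIM.

```python
import requests

# Placeholders - use the SCIM endpoint and access token AWS SSO generated for you
SCIM_ENDPOINT = "https://scim.<region>.amazonaws.com/<tenant-id>/scim/v2"  # hypothetical format
ACCESS_TOKEN = "<bearer-token-from-aws-sso>"

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# SCIM defines a standard filter syntax; this looks up a single user by userName
params = {"filter": 'userName eq "homer.simpson@jogcloud.com"'}

resp = requests.get(f"{SCIM_ENDPOINT}/Users", headers=headers, params=params)
resp.raise_for_status()

for user in resp.json().get("Resources", []):
    print(user["id"], user["userName"], user.get("active"))
```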

AWS SSO SCIM endpoint

After SCIM is enabled, the AWS SSO provisioning setting reports that SCIM is in use.

AWS SSO SCIM provisioning enabled

Next you’ll need to bounce over to Azure AD and go into the enterprise app you created (refer to my third post for this process).   There you’ll navigate to the Provisioning blade and select Automatic as the provisioning method.

Azure AD automatic provisioning option

You’ll then need to configure the URL and access token you collected from AWS and test the connection.  This will cause Azure AD to test querying the endpoint for a random user and group to validate functionality.

Azure AD SCIM connection test

If your test is successful you can then save the settings.

Azure AD SCIM connection test success

You’re not done yet.  Next you have to configure the mappings which map attributes in Azure AD to the resources and attributes in the SCIM schema.  Yes folks, SCIM does have a schema for attributes and resources (like users and groups).  You can extend it as needed, but this integration looks to be using the default user and group resources.

Azure AD SCIM attribute mappings

Let’s take a look at what the group mappings look like.

Azure AD SCIM group mappings

The attribute names on the left are the attributes in Azure AD, and the names on the right are the attributes in AWS SSO that Azure AD will write those values to.  Nothing too surprising here.

How about the user mappings?

Azure AD SCIM user mappings

Lots more attributes in the user mappings by default.  Now I’m not sure how many of these attributes AWS SSO supports.  According to the SCIM standard, a client can attempt to write whatever it wants and any attributes the service provider doesn’t understand are simply discarded.  The best list of attributes I could find was located here, and it’s nowhere near this number.  I can’t speak to what the minimum required attributes are to make AWS work, because the official instructions on this integration don’t say.  I know some of the product team sometimes reads the blog, so maybe we’ll luck out and someone will respond with that answer.

The one tweak you’ll need to make here is to delete the mailNickName mapping and replace it with a mapping of objectId to externalId.  After you make the change, click the save icon.

I don’t know why AWS requires this so I can only theorize.  Maybe they’re using this attribute as a primary key in the back end database or perhaps they’re using it to map the users to the groups?  I’m not sure how Azure AD is writing the members attribute over to AWS.  Maybe in the future I’ll throw together a basic app to visualize what the service provider end looks like.
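To make that mapping a bit more concrete, here’s a hand-built illustration of what a SCIM user provisioning payload could look like with the Azure AD objectId flowing into externalId.  This is based on the SCIM core user schema rather than a payload captured from the integration, so treat the attribute values as made up.

```python
# Illustrative SCIM user body (would be POSTed to <scim-endpoint>/Users); values are made up
new_user = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "externalId": "b0b1c2d3-0000-1111-2222-333344445555",  # the Azure AD objectId via the externalId mapping
    "userName": "marge.simpson@jogcloud.com",
    "name": {"givenName": "Marge", "familyName": "Simpson"},
    "displayName": "Marge Simpson",
    "active": True,
    "emails": [{"value": "marge.simpson@jogcloud.com", "type": "work", "primary": True}],
}
```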

New externalId mapping

Now you need to decide what users and groups you want to sync to AWS SSO.  Towards the bottom of the provisioning blade, you’ll see the option to toggle the provisioning status.  The scope drop-down box has an option to sync all users and groups or to sync only assigned users and groups.  Best practice here is basic security, only sync what you need to sync, so leave the option on sync only assigned users and groups.

The assigned users and groups refers to the users and groups that have been assigned to the enterprise application in Azure AD.  This is configured on the Users and Groups blade for the enterprise app.  I tested a few different scenarios using an Azure AD dynamic group, a standard group, and a group synchronized from Windows AD.  All worked successfully and synchronized the relevant users over.

Once you’re happy with your settings, toggle the provisioning status and save the changes.  It may take some time depending on how much you’re syncing.

Provisioning sync success

If the sync is successful, you’ll be able to hop back over to AWS SSO and you’ll see your users and groups.

Synchronized users and groups in AWS SSO

Microsoft’s official documentation does a great job explaining the end to end cycle.  The short of it is there’s an initial cycle which grabs all users and groups from Azure AD, then filters the list down to the users and groups assigned to the application.  From there it queries the target system for each user using the matching attribute; if the user isn’t found it’s created, and if it’s found and needs updating, it’s updated.

Incremental cycles are done from that point forward every 40 minutes.  I couldn’t find any documentation on how to adjust the synchronization frequency.  Be aware of that 40 minute sync and consider the end to end synchronization time if you’re sourcing from Windows Active Directory.  In that case, changes made in Windows AD could take just over an hour (assuming you’re using the 30 minute sync interval in Azure AD Connect) to fully synchronize.

AWS SSO synchronization timing

As I described in my third post, I have a lab environment setup where a Windows Active Directory domain is syncing to Azure AD.  I used that environment to play out a few scenarios.

In the first scenario I disabled Marge Simpson’s account.  After waiting some time for changes to synchronize across both platforms, I saw in AWS SSO that Marge Simpson was now disabled.

Marge Simpson disabled in AWS SSO

For another scenario, I removed Barney Gumble from the Network Operators Active Directory group.  After waiting for the sync to complete, the Network Operators group is now empty, reflecting Barney’s removal from the group.

Network Operators group in AWS SSO

Recall that I assigned four groups to the app in Azure AD: Network Operators, Security Admins, Security Auditors, and Systems Operators.  These are the four groups syncing to AWS SSO.  Barney Gumble was only a member of the Network Operators group, which means removing him put him out of scope for the app assignment.  In AWS SSO, he now reports as being disabled.

Barney Gumble disabled in AWS SSO

For our final scenario, let’s look at what happened when I deleted Barney Gumble from Windows Active Directory.  After waiting the required replication time, Barney Gumble’s user account was still present in AWS SSO, but set as disabled.  While Barney wouldn’t be able to log into AWS SSO, there would still be cleanup needed in the AWS SSO directory to remove the stale identity record.

Barney Gumble disabled in AWS SSO after deletion

The last thing I want to cover is the logging capabilities of the SCIM service in Azure AD.  There are two separate logs you can reference.  The first are the Provisioning Logs, which are currently in preview.  These logs are going to be your go-to for troubleshooting issues with the provisioning process.  They’re available with an Azure AD P1 or above license and are kept for 30 days.  Supposedly they’re kept for free for 7 days, but the documentation isn’t clear on whether you have the ability to consume them.  I also couldn’t find any documentation on whether it’s possible to pull the logs from an API for longer term retention or analysis in Log Analytics or a 3rd party logging solution.

If you’ve ever used Azure AD, you’ll be familiar with the second source of logs.  In the Azure AD Audit logs, you get additional information which, while useful, is catered more to tracking the process than troubleshooting it the way the provisioning logs are.
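If you want to pull those audit entries programmatically, Microsoft Graph exposes them through the directoryAudits endpoint.  Below is a rough sketch; the token acquisition is left out, and the loggedByService filter value is an assumption you’d want to verify against your own tenant’s audit data.

```python
import requests

GRAPH_TOKEN = "<access-token-with-AuditLog.Read.All>"  # placeholder - acquire via your preferred OAuth flow

url = "https://graph.microsoft.com/v1.0/auditLogs/directoryAudits"
params = {
    # Assumed service name for the provisioning service - confirm against your audit data
    "$filter": "loggedByService eq 'Account Provisioning'",
    "$top": "50",
}
headers = {"Authorization": f"Bearer {GRAPH_TOKEN}"}

while url:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    body = resp.json()
    for entry in body.get("value", []):
        print(entry["activityDateTime"], entry["activityDisplayName"], entry["result"])
    url = body.get("@odata.nextLink")  # Graph pages results via nextLink
    params = None  # the nextLink already carries the query parameters
```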

Before I wrap up, let’s cover a few key findings:

  • The access token used to access the SCIM endpoint in AWS SSO has a one year lifetime.  There doesn’t seem to be a way to query what tokens have been issued by AWS SSO at this time, so you’ll need to manage the life cycle in another manner until the capability is introduced.
  • Users that are removed from the scope of the sync, either by unassigning them from the app or deleting their user object, become disabled in AWS SSO.  The records will need to be cleaned up via another process.
  • If synchronizing changes from a Windows AD the end to end synchronization process can take over an hour (30 minutes from Windows AD to Azure AD and 40 minutes from Azure AD to AWS SSO).

That will wrap up this post.  In my opinion the SCIM service available in Azure AD is extremely underutilized.  SCIM is a great specification that needs more love.  While adoption among large enterprise software vendors is growing, there is a real opportunity for your organization to take advantage of the features it offers in the same way AWS has.  It can greatly ease the pain your customers and enterprise users experience managing the life cycle of an identity, and it makes for a nice belt-and-suspenders addition to the modern identity capabilities of an application.

In the last post of my series I’ll demonstrate a few scenarios showing how simple the end to end experience is for users.  I’ll include some examples of how you can incorporate some of the advanced security features of Azure AD to help protect your multi-cloud experience.

See you next post!

 

Deep Dive into Azure AD and AWS SSO Integration – Part 3

Back for more are you?

Over the past few posts I’ve been covering the new integration between Azure AD and AWS SSO.  The first post covered high level concepts of both platforms and some of the problems with the initial integration which used the AWS app in the Azure Marketplace.  In the second post I provided a deep dive into the traditional integration with AWS using a non-Azure AD security token service like AD FS (Active Directory Federation Services), what the challenges were, how the new integration between Azure AD and AWS SSO addresses those challenges, and the components that make up both the traditional and the new solution.  If you haven’t read the prior posts, I highly recommend you at least read through the second post.

New Azure AD and AWS SSO Integration

In this post I’m going to get my hands dirty and step through the implementation steps to establish the SAML trust between the two platforms.  I’ve set up a fairly simple lab environment in Azure.  The lab environment consists of a single VNet (virtual network) with four virtual machines serving the following functions:

  • dc1 – Windows Active Directory domain controller for jogcloud.com domain
  • adcs – Active Directory Certificate Services
  • aadc1 – Azure Active Directory Connect (AADC)
  • adfs1 – Active Directory Federation Services

AADC has been configured to synchronize to the jogcloud.com Azure Active Directory tenant.  I’ve configured federated authentication in Azure AD with the AD FS server acting as an identity provider and Windows Active Directory as the credential services provider.

Lab Environment

On the AWS side I have three AWS accounts set up and associated with an AWS Organization.  AWS SSO has not yet been set up in the master account.

Let’s set it up, shall we?

The first thing you’ll need to do is log into the AWS Organization master account with an account with appropriate permissions to enable AWS SSO for the organization.  If you’ve never enabled AWS SSO before, you’ll be greeted by the following screen.


Click the Enable AWS SSO button and let the magic happen in the background.  That magic is provisioning of a service-linked role for AWS SSO in each AWS account in the organization.  This role has a set of permissions which include the permission to write to the AWS IAM instance in the child account.  This is used to push the permission sets configured in AWS SSO to IAM roles in the accounts.
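If you want to peek at the roles AWS SSO provisions into a member account, a quick boto3 sketch like the one below does the trick.  The /aws-reserved/sso.amazonaws.com/ path prefix is an assumption based on what SSO-created roles typically use, so confirm it against a role in your own account.

```python
import boto3

iam = boto3.client("iam")

# Roles pushed by AWS SSO live under a reserved path (assumed prefix - verify in your account)
paginator = iam.get_paginator("list_roles")
for page in paginator.paginate(PathPrefix="/aws-reserved/sso.amazonaws.com/"):
    for role in page["Roles"]:
        print(role["RoleName"], role["Arn"])
```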

AWS SSO Service-Linked IAM Role

After about a minute (this could differ depending on how many AWS accounts you have associated with your organization), AWS SSO is enabled and you’re redirected to the page below.

AWS SSO Successfully Enabled

Now that AWS SSO has been configured, it’s time to hop over to the Azure Portal.  You’ll need to log into the portal as a user with sufficient permissions to register new enterprise applications.  Once logged in, go into the Azure Active Directory blade and select the Enterprise Applications option.

Register new Enterprise Application

Once the new blade opens select the New Application option.

Register new application

Choose the Non-gallery application option since we don’t want to use the AWS app in the Azure Marketplace due to the issues I covered in the first post.

Choose Non-gallery application

Name the application whatever you want, I went with AWS SSO to keep it simple.  The registration process will take a minute or two.

Registering application

Once the process is complete, you’ll want to open the new application, go to the Single sign-on menu item, and select the SAML option.  This is the menu where you will configure the federated trust between your Azure AD tenant and AWS SSO on the Azure AD end.

SAML Configuration Menu

At this point you need to collect the federation metadata containing all the information necessary to register Azure AD with AWS SSO.  To make it easy, Azure AD provides you with a link to directly download the metadata.

Download federation metadata

Now that the new application is registered in Azure AD and you’ve gotten a copy of the federation metadata, you need to hop back over to AWS SSO.  Here you’ll need to go to Settings.  In the settings menu you can adjust the identity source, authentication, and provisioning methods for AWS SSO.  By default AWS SSO is set to use its own local directory as an identity source and itself for the other two options.

AWS SSO Settings

Next up, you select the Change option next to the identity source.  As seen in the screenshot below, AWS SSO can use its own local directory, an instance of Managed AD or BYOAD using the AD Connector, or an external identity provider (the new option).  Selecting the External Identity Provider option opens up the option to configure a SAML trust with AWS SSO.

Like any good authentication expert, you know that you need to configure the federated trust on both the identity provider and the service provider.  To do this we need to get the federation metadata from AWS SSO, which AWS has been lovely enough to provide via a simple download link.  Use it to grab a copy of the metadata we’ll later import into Azure AD.

Now you’ll need to upload the federation metadata you downloaded from Azure AD in the Identity provider metadata section.  This establishes the trust in AWS SSO for assertions created from Azure AD.  Click the Next: Review button and complete the process.

AWS SSO Identity Sources

Configure SAML trust

You’ll be asked to confirm changing the identity source.  There are a few key points I want to call out on the confirmation page.

  • AWS SSO will preserve your existing users and assignments -> If you have created existing AWS SSO users in the local directory and permission sets to go along with them, they will remain even after you switch the identity source, but those users will no longer be able to log in.
  • All existing MFA configurations will be deleted when customer switches from AWS SSO to IdP.  MFA policy controls will be managed on IdP -> Yes folks, you’ll now need to handle MFA.  Thankfully you’re using Azure AD, so you have plenty of options there.
  • All items about provisioning – You have the option to manually provision identities into AWS SSO or use the SCIM endpoint to automatically provision accounts.  I won’t be covering it, but I tested manual provisioning and the single sign-on aspect worked flawlessly.  Know it’s an option if you opt to use another IdP that isn’t as fully featured as Azure AD.

Confirmation prompt

Because I had to, I popped open the federation metadata to see what AWS is requiring in the way of claims in the SAML assertion.  In the screenshot below we see it is requesting a single NameID format of nameid-format:emailaddress.  The value of this claim will be used to map the user to the relevant identity in AWS SSO.
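If you want to pull that value out of the metadata yourself, a few lines of Python’s standard library will do it.  The file name is a placeholder for wherever you saved the download, and the parsing assumes the metadata root is a standard EntityDescriptor containing an SPSSODescriptor.

```python
import xml.etree.ElementTree as ET

NS = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}

# Path to the metadata file downloaded from AWS SSO (placeholder name)
tree = ET.parse("aws-sso-saml-metadata.xml")

# The SPSSODescriptor describes AWS SSO acting as the SAML service provider
sp = tree.getroot().find("md:SPSSODescriptor", NS)
for fmt in sp.findall("md:NameIDFormat", NS):
    print(fmt.text)
# Expected output based on the screenshot:
# urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress
```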

AWS SSO Metadata

Back to the Azure Portal once again, where you’ll want to hop back to the Single sign-on blade of the application you registered.  Here you’ll click the Upload metadata file button and upload the AWS metadata.

Uploading AWS federation metadata

After the upload is successful you’ll receive a confirmation screen.  You can simply hit the Save button here and move on.

Confirming SAML

At this stage you’ve now registered your Azure AD tenant as an identity provider to AWS SSO.  If you were using a non-Azure AD security token service, you could now manually provision your users into AWS SSO, create the necessary groups and permission sets, and administer away.

I’ll wrap up there and cover the SCIM provisioning in the next post.  To sum it up, in this post we configured AWS SSO in the AWS Organization and established the SAML federated trust between the Azure AD tenant and AWS SSO.

See you next post!

Deep Dive into Azure AD and AWS SSO Integration – Part 2

Welcome back folks.

Today I’ll be continuing my series on the new integration between Azure AD and AWS SSO.  In my last post I covered the challenges with the prior integration between the two platforms, core AWS concepts needed to understand the new integration, and how the new integration addresses the challenges of the prior integration.

In this post I’m going to give some more context to the challenges covered in the first post and then provide an overview of what the old and new patterns look like.  This will help clarify the value proposition of the integration for those of you who may still not be convinced.

The two challenges I want to focus on are:

  1. The AWS app was designed to synchronize identity data between AWS and Azure AD for a single AWS account
  2. The SAML trust between Azure AD and an AWS account had to be established separately for each AWS account.

Challenge 1 was unique to the Azure Marketplace AWS app because they were attempting to solve the identity lifecycle management problem.  Your security token service (STS) needs to pass a SAML assertion which includes the AWS IAM roles you are asserting for the user.  Those roles need to be mapped to the user somewhere for your STS to tap into them.  This is a problem you’re going to feel no matter what STS you use, so I give the team that put the AWS app together credit for trying.

The folks over at AWS came up with an elegant solution requiring some transformation in the claims passed in the SAML token and another solution to store the roles in commonly unused attributes in Active Directory.  However, both solutions suffered the same problem in that you’re forced to work around that mapping, which becomes considerably difficult as you begin to scale to hundreds of AWS accounts.

Challenge 2 plagues all STSs because the SAML trust needs to be created for each and every AWS account.  Again, something that begins to get challenging as you scale.

AWS Past Integration

In the image above, we see an example of how some enterprises addressed these problems.  There is some STS in use acting as an identity provider (IdP) (could be Azure AD, Okta, Ping, AD FS, whatever) that has a SAML trust with each AWS account.  The user to AWS IAM role mappings are included in an attribute of the user’s Active Directory user account.  When the user attempts to access AWS, the STS queries Active Directory for the information.  There is a custom process (manual or automated) that queries each AWS account for a list of AWS IAM roles that are associated with the IdP in that account.  These roles are then populated in the attribute for each relevant user account.  Lastly, CloudFormation is used to push the IAM roles to each AWS account, either through a manual process or a CI/CD pipeline.
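To make that custom process a bit more concrete, the per-account role discovery often looked something like the boto3 sketch below: enumerate the roles in an account and keep the ones whose trust policy references your SAML identity provider.  The provider name is a placeholder.

```python
import json
import boto3

SAML_PROVIDER_NAME = "MyCorpIdP"  # placeholder - the IAM SAML provider name used in each account

iam = boto3.client("iam")
account_id = boto3.client("sts").get_caller_identity()["Account"]
provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{SAML_PROVIDER_NAME}"

federated_roles = []
for page in iam.get_paginator("list_roles").paginate():
    for role in page["Roles"]:
        # Keep roles whose trust policy references the SAML provider
        if provider_arn in json.dumps(role["AssumeRolePolicyDocument"]):
            federated_roles.append(role["Arn"])

# These ARNs are what ended up stuffed into an Active Directory attribute per user
print("\n".join(federated_roles))
```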

Yeah this works, but who wants all that overhead?  Let’s look at the new method.

Azure AD and AWS SSO Integration

In the new integration where we use Azure AD and AWS SSO together, we now only need to establish a single SAML trust with AWS SSO.  Since AWS SSO is integrated with AWS Organizations it can be used as a centralized identity source for all AWS accounts within the organization.  Additionally, we can now leverage Azure AD to manage the synchronization of identity data (users and groups) from Azure AD to AWS SSO.  We then map our users or groups to permission sets (collections of IAM policies) in AWS SSO which are then provisioned as IAM roles in the relevant AWS accounts.  If we want to add a user to a role in AWS IAM, we can add that user to the relevant group in Azure AD and wait for the synchronization process to occur.  Once it’s complete, that user will have access to that IAM role in the relevant accounts.  A lot less work, right?

Let’s sum up what changes here:

  • We can use existing processes already in place to move users in and out of groups either on-premises in Windows AD (that is syncing to Azure AD with Azure AD Connect) or directly in Azure AD (if we’re not syncing from Windows AD).
  • Group to role mappings are now controlled in AWS SSO
  • Permission sets (or IAM policies for the IAM roles) are now centralized in AWS SSO
  • We no longer have to provision the IAM roles individually into each AWS account, we can centrally control it in AWS SSO

Cool right?

In my next few posts I’ll begin walking through the integration and demonstrating the solution.

Thanks!

DNS in Microsoft Azure – Part 3

Today I’ll be continuing my series on DNS in Microsoft Azure.  In my first post I covered fundamental concepts of DNS resolution in Azure such as the 168.63.129.16 virtual IP and Azure-provided DNS.  In the second post I went over the Azure Private DNS service, its benefits, limitations, and the patterns available when you use Azure Private DNS alone.  In this post I’ll be exploring how, when combined with bring your own DNS (BYODNS), Azure Private DNS begins to really shine and introduces opportunities for some very cool self-service/delegation models.

If an enterprise has any degree of technical footprint, it will have a DNS infrastructure providing DNS resolution to intranet and Internet resources.  These existing services are often very mature and deeply embedded into the technology stack.  This means ditching your existing DNS service for a cloud-based DNS service isn’t going to happen out of the gates (if at all).  This leaves you with the question of extending your existing DNS infrastructure into Azure as is, or hooking it into cloud native DNS services such as Azure Private DNS.  I’m not going to give you the typical sales pitch stating how easy it is to do the latter, because it can be challenging depending on how complex your DNS infrastructure is and what your internal policies and operational models are.  Instead I’m going to show you how you can make these two services coexist and complement each other.

As I covered in my first post, you can configure VMs to use either the Azure DNS servers or your own DNS servers.  This configuration is available at both the VNet level and the VM network interface level.  Avoid setting the DNS server settings directly on the VM’s network interface if possible because it will introduce more management overhead.  There are always exceptions to the rule, but make sure you establish what those exceptions are and have a way of tracking them.

So you’ve decided you’re going to BYODNS.  Common reasons for doing this are:

  1. Hybrid workloads that require access to on-premises services
  2. Advanced capabilities of existing DNS services
  3. Requirements for Windows Active Directory for centralized identity, authentication, and optionally configuration management services
  4. Maintaining a singular management plane for all DNS services across an organization

Since the requirement around Windows Active Directory services is the most common reason in my experience, I’m going to cover that use case.  Keep in mind that you could easily sub in your favorite DNS infrastructure service for the DNS patterns I demonstrate in this post.  Yes, this means you could toss in a BIND server or an InfoBlox NVA.

With that settled, let’s cover the basics.

In the BYODNS scenario, you’ll want to configure your own DNS servers as seen in the screenshot below (note that you should include at least two DNS servers for redundancy):

Custom DNS servers configured on the VNet
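If you’d rather script that setting than click through the portal, a rough sketch with the azure-mgmt-network SDK looks like the below.  The subscription, resource group, and VNet names are placeholders, and the method and import names can vary slightly between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import DhcpOptions

SUBSCRIPTION_ID = "<subscription-id>"      # placeholder
RESOURCE_GROUP = "<resource-group-name>"   # placeholder
VNET_NAME = "<vnet-name>"                  # placeholder

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Pull the existing VNet, set the custom DNS servers, and push it back
vnet = client.virtual_networks.get(RESOURCE_GROUP, VNET_NAME)
vnet.dhcp_options = DhcpOptions(dns_servers=["10.100.4.10", "10.100.4.11"])  # at least two for redundancy

# Older SDK versions expose create_or_update instead of begin_create_or_update
client.virtual_networks.begin_create_or_update(RESOURCE_GROUP, VNET_NAME, vnet).result()
```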

When configured to use a specific set of DNS servers, a few things happen at the VM.  The screenshot below shows the result of an ipconfig /all on a domain-joined Windows Server 2016 VM.  First you’ll notice that the DNS server being pushed to the VM is the 10.100.4.10 address, which is the DNS server setting I’m pushing at the VNet.  The other thing to take note of is that the Connection-specific DNS suffix being pushed by the Azure DHCP service is no longer the Azure-provided one (xxx.xxx.internal.cloudapp.net).  It’s now reddog.microsoft.com, which is a non-functioning placeholder.  This is pushed to avoid interfering with DNS resolution through BYODNS, such as the domain-joined scenario I’m demonstrating.

ipconfig /all output on a domain-joined VM

The lab environment I’m using for this post looks like the below.

Lab environment

It has three VNets in a hub and spoke architecture where the shared VNet is peered to both the app1 and app2 VNet.  The shared VNet contains a single VM named dc1 acting as a domain controller for a Windows Active Directory forest named journeyofthegeek.com.  Each spoke VNet is configured to push the IP of dc1 (10.100.4.10) to the VMs within the VNet as the DNS server.  The VMs in each spoke are domain-joined.  I’ve also created multiple Azure Private Zones as seen in the table in the diagram.  The shared VNet has been linked to all the zones for resolution.  Each spoke VNet is linked to a zone for registration and resolution.

The DNS Server service running on dc1 has been configured to forward all traffic outside of its domain to Google’s public DNS servers.  It also has multiple conditional forwarders configured to send traffic for any of the Azure Private DNS zones to the 168.63.129.16 virtual IP.  I’ve created a single A record named www in the appzone.com zone and assigned it the IP of the app1 server (10.102.0.10) in the app1 VNet.

If you take a look below at each of the Azure Private DNS zones assigned to the spokes, you can see that the VMs in each spoke have automatically registered A records for themselves with their associated zones.  Take note that this happened even though each VM is configured to use dc1 as its DNS server.  This is the magic of the cloud platform, where the platform itself took care of registering the records.

app1zone.com Private DNS Zone

app2zone.com Private DNS Zone

When a VM needs to perform DNS resolution, it sends its DNS query to dc1.  dc1 then sends a DNS query to the Azure DNS service via the 168.63.129.16 virtual IP for resolution of the Azure Private DNS zones (red line) it has been linked to.  Queries for records in other domains are sent out to the Internet (blue line).  The traffic flow is illustrated in the diagram below:

DNS resolution flow with BYODNS
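A quick way to sanity check that flow from a VM in one of the spokes is to compare a query against dc1 with one sent straight to the Azure virtual IP.  The sketch below assumes dnspython 2.x is installed and uses record names from the lab above.

```python
import dns.resolver  # dnspython 2.x

def query(nameserver, name):
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    return [rr.to_text() for rr in resolver.resolve(name, "A")]

# Resolution via dc1 - it answers from its own zones, forwards Azure Private DNS zones
# to 168.63.129.16, and forwards everything else out to the Internet
print(query("10.100.4.10", "www.appzone.com"))

# Resolution directly against the Azure virtual IP - only works from inside a VNet,
# and only for zones that VNet is linked to (record name is an assumption)
print(query("168.63.129.16", "app1.app1zone.com"))
```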

There are a few benefits this pattern introduces.  One is that it addresses a couple of the gaps in Azure Private DNS, namely the lack of conditional forwarding and query logging.

With no support for conditional forwarding, any VMs you set to use the Azure DNS servers through the 168.63.129.16 virtual IP will only be able to resolve namespaces Azure DNS is aware of.  Since Azure DNS has no awareness of DNS zones running on the domain controller, we’d be out of luck if we needed to use any domain services.  This problem extends to any DNS zone you’re running on DNS equipment that isn’t resolvable from the Internet.  Yep, this means no hybrid workloads over your private connection back to your on-prem or colo datacenter.  The conditional forwarder capability of the BYODNS service allows us to resolve the problem and additionally gets the queries to Azure DNS when it’s called for.

The other limitation is DNS query logging.  As I’ve mentioned before, DNS query logs are excellent inputs to any organization’s behavior analytics to help detect threats in the environment.  That log data is that much more important when you move into the cloud, because it helps mitigate the risks of the additional freedoms you’ll be giving application owners and developers to spin up their own resources.  By introducing a BYODNS service, we capture that log data.

I fully expect both of these features to eventually make their way into the service.  Until that time, the BYODNS pattern demonstrated above can help address the gaps.

You may be asking yourself, “If I have to BYODNS, what does Azure Private DNS get me?” Excellent question.  The answer is it can provide self-service and agility, reduce overhead, and mitigate risk.  How does it do these things?  Let me count the ways:

  1. In most organizations, DNS is managed by a central IT group.  This means application owners and developers have to submit requests and wait for those requests to be completed.  Wouldn’t it be great to let them perform the updates themselves on a zone they own?
  2. Azure Private DNS is available over a modern REST API.  Yes yes, I know you are a scripting ninja and have a hundred PowerShell and Bash scripts available at your fingertips, but show me a developer in 2019 who wants to write anything in those languages when a REST option is available.
  3. Managing multiple DNS zones and associated records on BYODNS equipment can require significant overhead in both staff and hardware.  This sometimes drives organizations to support fewer zones, which increases the risk of changes to a zone affecting applications.  By incorporating Azure Private DNS into the mix, you can reduce the overhead of BYODNS (think of how much more when logging and conditional forwarders are introduced) by letting each business unit own a zone (e.g. marketing.journeyofthegeek.com, hr.journeyofthegeek.com, etc.).
  4. Show me someone who has been in operations and hasn’t had a major outage caused by what should have been a simple DNS change.  No?  I didn’t think so.  By giving each BU its own Azure Private DNS zone, you limit the blast radius so a bad change to BU1’s zone doesn’t affect BU2.  Since each zone is a different resource in Azure, you can additionally wrap an authorization boundary around that resource, limiting employees to only the zones they need to administer.

Once you have the above pattern in place, you can easily expand upon it to provide DNS resolution from on-premises VMs to Azure and vice versa.  Set up the appropriate connection between Azure and your on-premises environment (S2S VPN, ExpressRoute), put in the appropriate conditional forwarders on both ends, and you’re good to go!  Again, expect this to get easier as the service matures if conditional forwarders and a PrivateLink endpoint for the service are introduced.

Well folks, that will wrap up the series.  The key thing I want you to take away is that Azure Private DNS isn’t yet in a state where it can replace a mature DNS implementation (I fully expect that to change over time).  Instead, you will want to use it to supplement your existing DNS implementation to reduce overhead, increase the agility of application owners and developers, and yes, even mitigate a bit of risk in the process.

For those of you who will be stuffing themselves with turkey, stuffing, and mashed potatoes this week, have a wonderful Thanksgiving!

 

DNS in Microsoft Azure – Part 2

Welcome back fellow geek to part two of my series on DNS in Azure.  In the first post I covered some core concepts behind the DNS offerings in Azure.  One of the core concepts was the 168.63.129.16 virtual IP address, which acts as the communication point when Azure services within a VNet need to talk to the Azure DNS resolver.  If you’re unfamiliar with it, circle back and read that portion of the post.  I also covered the basic DNS offering, Azure-provided DNS.  For this post I’m going to cover the newly minted Azure Private DNS service.

As I covered in my last post, Azure-provided DNS is a decent service if you’re doing some very basic proof-of-concept testing, but not much use beyond that.  The limited capabilities around record types, the scale challenges for BYODNS when requiring resolution across multiple VNets, and the lack of reverse DNS support have typically required an enterprise BYODNS solution.  This meant organizations were stuck purchasing expensive NVAs or rolling VMs running BIND or Windows DNS Server.  Beyond the familiar overhead of having to manage all aspects of the VM, it also brings along legacy request and change management processes.  In most enterprises application owners have to submit requests to central IT to have DNS entries created or modified.  This is counter to the goal of empowering application owners to be more agile.

Thankfully, Microsoft heard the cries of application owners and central IT and introduced Azure Private DNS into public preview back in early 2018.  After a few iterations and improvements, the service officially went general availability just last month.  The service addresses many of the gaps Azure-provided DNS has such as:

  • Support for custom DNS namespaces
  • Support for all common DNS record types such as A, MX, CNAME, PTR
  • Support for reverse DNS
  • Automatic lifecycle management of VM DNS records
  • Resolution across multiple VNets

Before we jump into the weeds, we’ll first want to cover the basic concepts of the service.  Azure Private DNS zones are an Azure resource under the namespace of /providers/Microsoft.Network/privateDnsZones/.  Each DNS zone you want to create is represented as a separate resource.  Zones created in one subscription can be consumed in another subscription as long as they’re within the same Azure AD tenant.  VNets can resolve and register DNS records with the zones you create after you “link” the VNet to the zone.  Each zone can be linked to multiple VNets for registration and resolution.  On the other hand, VNets can be linked to multiple zones for resolution but only one zone for registration.  Once a zone is linked to a VNet, VMs within the VNet resolve and/or register DNS records for those zones through the 168.63.129.16 virtual IP.
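If you’d like to see the zone and link relationship in code, here’s a rough sketch using the azure-mgmt-privatedns SDK to create a zone and link a VNet to it for both registration and resolution.  The names and IDs are placeholders, and method names can differ slightly between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.privatedns import PrivateDnsManagementClient
from azure.mgmt.privatedns.models import PrivateZone, VirtualNetworkLink, SubResource

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
ZONE_NAME = "app1zone.com"
VNET_ID = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
           "/providers/Microsoft.Network/virtualNetworks/vnet1")  # placeholder

client = PrivateDnsManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Private DNS zones are global resources, so the location is set to 'global'
client.private_zones.begin_create_or_update(
    RESOURCE_GROUP, ZONE_NAME, PrivateZone(location="global")
).result()

# Link the VNet; registration_enabled=True lets VMs in the VNet auto-register their A records
client.virtual_network_links.begin_create_or_update(
    RESOURCE_GROUP, ZONE_NAME, "vnet1-link",
    VirtualNetworkLink(
        location="global",
        virtual_network=SubResource(id=VNET_ID),
        registration_enabled=True,
    ),
).result()
```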

I’ll quickly cover the reverse lookup zone capability that comes along with using the service.  When a VNet is linked to a zone for registration, a reverse lookup zone is created for the VNet.  VMs created in subnets within that VNet will register a PTR record for their FQDN in the private zone as well as a PTR record for their FQDN in the internal.cloudapp.net zone.  Take note that records in the reverse lookup zone will only be resolvable by VMs within that VNet when the query is sent through the 168.63.129.16 virtual IP.

In the image below VNet1 is linked to an Azure Private DNS zone for both resolution and registration.  VNet2 is linked to a different Azure Private DNS zone for both resolution and registration.  Both VNets are configured to use the Azure DNS servers.  In this scenario, Server1 will be able to perform a reverse lookup for the IP address of Server2 because it is within the same VNet.  However, Server3 will not be able to perform a reverse lookup for Server2 because it is in a different VNet.

Reverse DNS Lookups with Azure Private DNS

In addition to PTR records, the VMs also register A records for the private zone and the Azure-provided DNS zone.  There are a few things to note about the A records automatically created in the private zone:

  • Each record has a property called isAutoRegistered which has a boolean value of true for any records created through the auto-registration process.
  • Auto-registered records have an extremely short TTL of 10 seconds.  If you have plans of performing DNS scavenging, take note of this and that these records are automatically deleted when the VM is deleted.

Portal View of Private DNS Zone

The Azure-provided DNS zone dynamically created for the VNet is still created even when linking an Azure Private DNS zone to a VNet.  Additionally, if you try to resolve the IP address using a single label hostname, you’ll get back the A record for the Azure-provided DNS zone.  This is by design and allows you to control the DNS suffix automatically appended by your VMs.  It also means you need to use the FQDN in any application configuration to ensure the record is resolved correctly.

DNS query for a single label hostname

Let’s now look at resolution between two VNets.  In this scenario we again have VNet1 and VNet2.  VNet1 is linked for both registration and resolution to the Azure Private DNS Zone of app1zone.com.  VNet2 is linked for just resolution to the app1zone.com.  VMs in VNet2 are able to resolve queries for the fully-qualified domain name of VMs in VNet1 as illustrated in the diagram below.

DNS Resolution between two VNets

Beyond the auto-registration of records, you can also manually create a variety of record types as I mentioned above. There isn’t anything special or different in the way Azure is handling these records.  The only thing worth noting is the records have a standard 1 hour TTL.
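Manually creating one of those records through the azure-mgmt-privatedns SDK looks something like the sketch below; the subscription and resource group are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.privatedns import PrivateDnsManagementClient
from azure.mgmt.privatedns.models import ARecord, RecordSet

client = PrivateDnsManagementClient(DefaultAzureCredential(), "<subscription-id>")  # placeholder

# Creates www.app1zone.com with the same 1 hour TTL the portal uses by default
client.record_sets.create_or_update(
    "<resource-group>",   # placeholder
    "app1zone.com",       # private zone name
    "A",                  # record type
    "www",                # relative record set name
    RecordSet(ttl=3600, a_records=[ARecord(ipv4_address="10.102.0.10")]),
)
```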

There are two significant limitations in the service right now.  One of those limitations is no support for query logging.  Given how important DNS query logging data can be as data points to identifying threats in the environment, your organization may require this.  If so, you’ll need to insert some BYODNS into the mix (I’ll cover that pattern next post).  The other bigger and more critical limitation is the lack of support for conditional forwarding.  As of today, you can’t create conditional forwarders for the service which will prevent you from forwarding queries from the 168.63.129.16 virtual IP to other DNS services you may have running for resolution of other resources such as on-premises resources.  Again, the workaround here is BYODNS.  Expect both of these limitations to be addressed in time as the service matures.

Azure Private DNS alone is a great service if your organization is completely in the cloud and has basic DNS resolution needs.  Some patterns you could leverage here:

  • Separate private DNS zone for each application –  In this scenario you could grant your application owners full control of the zone letting them manage the records as they see fit.  This would improve the application team’s agility while reducing operational burden on central IT.
  • Separate private DNS zones for each environment (Dev/QA/Prod) – In this scenario you could avoid having to do any BYODNS if there are no dependencies on on-premises infrastructure.  You also get full lifecycle management of VM records cutting back on operational overhead.

Summing up the service:

  • Positives
    • Managed service where you don’t have to worry about managing the underlying infrastructure
    • Scalability and availability are baked into the service
    • Use of custom DNS namespaces
    • VMs spread across multiple VNets can resolve each other’s addresses
    • Reverse DNS is supported within a VNet
    • The lifecycle of VM DNS records is automatically managed by the platform
    • Applications could be assigned their own DNS zones and application owners delegated full control over that zone
  • Negatives
    • No support for conditional forwarders at this time
    • No support for DNS query logging at this time
    • No support for WINS or NETBIOS (although I call this a positive 🙂 )

In my next post I’ll cover how the service works with BYODNS and will discuss some neat patterns that are available when you take advantage of the service.

DNS in Microsoft Azure – Part 1

Hi everyone,

In this series of posts I’m going to talk about a technology that, while old, still provides a critical foundational service.  Yes folks, we’re going to cover the Domain Name System (DNS).  Specifically, we’re going to look at the options for private DNS in Microsoft Azure and what the positives and negatives are of each pattern.  I’m going to go into this assuming you have a basic knowledge of DNS and understand namespaces, the various record types, forward and reverse lookup zones, recursive and iterative queries, DNS forwarding and conditional forwarding, and other core DNS concepts.  If any of those are unfamiliar to you, take some time to review the basics and then come back to this post.

Before we jump into the DNS options in Azure, I first want to cover the 168.63.129.16 address.  If you’ve ever done anything even basic in Azure, you’ve probably run into this address or used it without knowing it.  This public IP address is owned by Microsoft and is presented as a virtual IP address serving as a communication channel to the host node for a number of platform resources.  It provides functionality such as virtual machine (VM) agent communication of the VM’s ready and health states, enables the VM to obtain an IP address via DHCP, and, you guessed it, enables the VM to leverage Azure DNS services.  The address is static and is the same for any VNet you create in every Azure region.  Fun fact, some geolocation services will report this IP as being based out of Hong Kong and I’m sure you can imagine how that works when something like a WAF is in place with regional IP restrictions.  Fun times. 🙂

Traffic is routed to and from this virtual IP address through the subnet gateway.  If you run a route print on a Windows machine, you can see this route defined in the routing table of the VM.

Output of route print on Azure VM

The IP address is also defined in the VirtualNetwork service tag meaning the default rules within a network security group (NSG) allow this traffic to and from the VM.  Given the criticality of the functions the IP plays, Microsoft recommends you allow inbound and outbound communication with it (it’s a requirement for using any of the Azure DNS services we’ll discuss in these posts).

Now that you understand what the 168.63.129.16 virtual IP address is, let’s first cover the very basics of DNS in Azure. You can configure Azure’s DHCP service to push a custom set of DNS servers to Azure VMs or optionally leave the default which is for VMs to use Azure’s DNS services (through the 168.63.129.16 virtual IP address).  This can be configured at the VNet level and then inherited by all virtual network interfaces (VNIs) associated with the VNet, or optionally configured directly on the VNI associated with the VM.

Configure DNS on VNet

This brings us to the first option for DNS resolution in Azure, Azure-provided name resolution.  Each time you spin up a virtual network, Azure assigns it a unique private DNS namespace using the format <randomly generated>.internal.cloudapp.net.  This namespace is pushed to the machine via DHCP Option 15, thus each VM has a fully qualified domain name of <vm_host_name>.<randomly generated>.internal.cloudapp.net and each VM in the VNet can resolve the IP addresses of the others.

Let’s look at an example with a single VNet.  I’ve created a single VNet named vnet1.  I’ve assigned it the CIDR block of 10.101.0.0/16 and created a single subnet assigned the 10.101.0.0/24 block.  Two Windows Server 2016 VMs have been created named azuredns and azuredns1 with the IP addresses 10.101.0.4 and 10.101.0.5.  Azure has assigned a namespace of r0b5mqxog0hu5nbrf150v3iuuh.bx.internal.cloudapp.net to the VNet.  Note the DHCP Server and DNS Server settings in the ipconfig output of the azuredns VM shown below.

IPConfig output of Azure VM

If we ping azuredns1 from azuredns, we can see in the Wireshark capture below that, prior to executing the ping, azuredns performs a DNS query to the 168.63.129.16 VIP and gets back a query response with the IP address of azuredns1.

Wireshark packet capture of DNS query
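You can reproduce the same lookup without a packet capture.  From the azuredns VM, a couple of standard library calls show the single label name resolving through the Azure-provided zone (hostnames come from the example above; the expected values noted in the comments are assumptions based on that example).

```python
import socket

# The OS resolver appends the connection-specific DNS suffix pushed by Azure DHCP,
# so the single label name resolves against the VNet's internal.cloudapp.net zone.
print(socket.gethostbyname("azuredns1"))  # expect 10.101.0.5
print(socket.getfqdn("azuredns1"))        # expect azuredns1.r0b5mqxog0hu5nbrf150v3iuuh.bx.internal.cloudapp.net
```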

The resolution process is very simple as seen in the diagram below.

DNS Resolution within a single VNet

Well that’s all well and good for very basic DNS resolution, but who the heck has a single VNet in anything but a test environment?  So can we expand Azure-provided DNS to multiple VNets?  The answer is yes, but it’s ugly.  Recall that each VNet has its own private DNS namespace.  The only way to resolve names contained within that namespace is for a VM in that VNet to send the query to the 168.63.129.16 address.  Yes folks, this means you would need to drop a DNS server into each VNet in order for VMs in other VNets to resolve the Azure-provided DNS host names assigned to the VMs within that VNet, as illustrated in the diagram below.

Multiple VNet resolution

You can see as the number of VNets increases the scalability of this solution quickly breaks down.  Take note that if you wanted to resolve these host names from on-premises you could use a similar conditional forwarder pattern.

Let’s sum up the positives and negatives of Azure-provided DNS.

  • Positives
    • No need to provision your own DNS servers and worry about high availability or scalability
    • DNS service provided by Azure automatically scales
    • VMs within a VNet can resolve each other’s IP addresses out of the box
  • Negatives
    • Solution doesn’t scale with multiple VNets
    • You’re stuck with the namespace assigned to the VNet
    • WINS and NetBIOS are not supported
    • Only A records that are automatically registered by the service are supported (no manual registration of records)
    • No reverse DNS support
    • No query logging

As you can see from the above the negatives far outweigh the positives.  Personally, I see Azure-provided DNS only being useful for bare bones test environments with a single VNet.  If anyone has any other scenarios where it comes in handy, I’d love to hear them.

In my next post I’ll cover Azure’s new offering in the DNS space, Azure Private DNS Zones.  I’ll walk through how it works and how we can combine it with BYO DNS to create some pretty neat patterns.

See you then!

Capturing Azure Management Group Activity Logs Using Azure Automation – Part 2

Welcome back fellow geeks!

This post will be the second post in a series covering how to use Azure Automation to capture Azure Management Group Activity Logs.  In the first post I walked through what management groups are and the problems that they solve.  The key takeaway of that post is that management groups have their own Activity Logs and (at this time) they’re only accessible from within the Portal and over the Azure REST API.  Given that management groups are where we’re applying our Azure Policy for governance and compliance and our access controls via Azure RBAC, the Activity Logs are pretty critical.  So what is a geek to do?

In this post I’ll cover a solution I put together to solve the problem.  It uses an Azure Automation PowerShell Runbook to iterate through the management groups within an Azure Active Directory tenant, write the logs to Azure Storage, and optionally deliver the logs to Azure Monitor or Azure EventHubs.  The architecture is pictured below.

Solution architecture

If you’re not familiar with Azure Automation it’s a service that provides a number of key capabilities within Azure such as configuration management, update management, and process automation.  If you’re coming from AWS, I’d compare it to a service somewhat similar to AWS Systems Manager.  For the purposes of this series of posts I’m going to focus on the process automation capability of the service delivered through Runbooks.  I’m not going to go too in-depth into Azure Automation, but I’ll provide a brief overview of the service features and tweaks relevant to the solution.

Runbooks are modules of code that can be strung together to perform a series of tasks such as performing maintenance on a collection of VMs.  The modules can be authored using either PowerShell or Python.  At this time only Python 2 is supported, which makes me a sad panda.  Given that Python 2 enters end of life in two months, I’d recommend doing anything Python related in Azure Functions.  I could devote an entire blog post complaining about the lack of Python 3 in the year 2019, but I’ll spare you.  You’re going to want to author your Runbooks in PowerShell until/if Python 3 is supported in the future.

The Azure Automation account acts as a logical container for the Runbooks created within it.  An Azure Automation Account can be provided with a RunAs account, which is simply a service principal in Azure Active Directory.   The service principal is configured with a certificate credential which is used by the Automation Account to authenticate to Azure AD and access Azure resources within the tenant.  Any Runbooks you create within the Automation account can assume the identity to execute tasks across your Azure resources.

You can automatically provision the RunAs account when the Azure Automation Account is provisioned, just be aware that the service principal will be granted the Contributor role on the Azure Subscription.  This is probably going to be way more permissions than are needed so I’d recommend removing that role assignment, creating a custom RBAC role, and assigning it at the appropriate scope.

Automation Accounts have a number of assets which are relevant for Runbooks.  These include variables, connections, credentials, and certificates.  The links I provided will give you detailed information on these assets, so I’ll summarize the relevant content to the solution.  Variables can come in a variety of types including strings and integers and can also optionally be encrypted.  For this solution I use encrypted variables to store the Event Hub connection string, Log Analytics Workspace Id, and Log Analytics Workspace Key.  Connections contain information required to connect to an external service or application.  The only connection asset used with this solution is the AzureServicePrincipal which is used by the RunAs account.  You can retrieve the  connection to get information such as the Azure AD tenant Id and application id (client id in the OAuth world).  Lastly, we have the certificate asset, which as the name describes, can be used to securely store a certificate that is used for authentication.  This solution uses the AzureRunAsCertificate certificate which contains the certificate asset used to authenticate the Automation Account RunAs account.

Each Automation Account comes with a predefined set of PowerShell modules and .NET libraries.  You can add additional modules and libraries by importing them to the Automation Account.  For this solution I added a number of .NET libraries including the ADAL and some libraries required to communicate with Event Hubs.  While PowerShell does a wonderful job of handling things at the management plane of Azure, it is severely lacking in the data plane requiring you to fall back on incorporating .NET code into your PowerShell script.

The above (including the links) should give you the bare minimum you need to understand to use this solution.  Let’s deep dive into the code.  Since this is a fairly lengthy script I’m not going to paste every line of code.  Instead I’m going to call out key sections of code that were particularly relevant or interesting to write.

The first function in the script is called Get-AdalToken and uses the .NET ADAL library to retrieve a token from Azure AD.  When I code in Python I typically use the MSAL library since I find it to be a bit more slick, but I found the .NET version too cumbersome and difficult to use in PowerShell.  If you’ve ever used .NET libraries in your PowerShell scripts, you know where I’m coming from.

The token retrieved by the function is used for calls to the Azure Management REST API.  The reason I went with ADAL vs pulling the access token from a session created using Add-AzAccount method as demonstrated here is I wanted code I could reuse for other purposes outside of the Azure REST API.

Once the token is retrieved it is stored in a variable for later use in the script.

Get-AdalToken function

Next up we have the Get-AllManagementGroups function.  This function calls the Azure REST API to get a full listing of management groups.  Oddly enough there is an AzureRM cmdlet included in the AzureRm.Resources module that comes preinstalled with every new Automation Account.  However, even after updating the modules within the account (this link tells you how to do this and I highly recommend doing it whenever you create a new automation account) the cmdlet only ever reported back the tenant root group.  This occurred even when following the instructions to spit back all Management Groups.  I chalked it up to there being an issue with the cmdlet or user error on my part.  Either way, it was simple enough to whip up a call to the REST API.

Following the Get-AllManagementGroups function we have the Get-ManagementGroupActivityLog function.  Let me tell you folks, this one was an absolute pain to write.  According to this Azure feedback thread these logs have been accessible over the API since back in March of this year, but the REST API reference documentation doesn’t look to have been updated to reflect this.  I’m going to save you all a ton of headaches and hours of experimentation and searching the web.  When you want to get Activity Logs over the REST API you are going to use the following endpoint:


https://management.azure.com/providers/Microsoft.Management/managementGroups/mgmtGroupId/providers/microsoft.insights/eventtypes/management/values

The mgmtGroupId variable would be the name of your management group.  If your management group is named production then the value in that URL would be production.  Additionally, you’ll want to pass query parameters of api-version set to 2017-03-01-preview and a $filter query parameter constructed in the same way you would to query a subscription Activity Log.
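Putting it all together, the call ends up looking roughly like the sketch below.  The management group name and token are placeholders, and the $filter value is just an example of the same syntax you’d use against a subscription Activity Log.

```python
import requests

MGMT_GROUP_ID = "production"  # placeholder - your management group name
TOKEN = "<token-for-management.azure.com-audience>"  # placeholder

url = (
    "https://management.azure.com/providers/Microsoft.Management/"
    f"managementGroups/{MGMT_GROUP_ID}/providers/microsoft.insights/"
    "eventtypes/management/values"
)
params = {
    "api-version": "2017-03-01-preview",
    # Same $filter syntax as a subscription Activity Log query (example date range)
    "$filter": "eventTimestamp ge '2019-09-01T00:00:00Z' and eventTimestamp le '2019-11-30T00:00:00Z'",
}

resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}, params=params)
resp.raise_for_status()
for event in resp.json().get("value", []):
    print(event.get("eventTimestamp"), event.get("operationName", {}).get("value"))
```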

Activity Log query parameters

The SendTo-Storage function sends the Activity Log for each Management Group as a separate blob to Azure Storage.  The format of the Activity Log is raw JSON.

The SentTo-Workspace function sends the log data to Azure Monitor (really a Log Analytics Workspace) via the HTTP Data Collector API.  The product team was wonderful enough to include sample PowerShell code that made writing that function a breeze.
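For anyone who wants to do the same thing outside of a Runbook, here’s roughly what that Data Collector API call looks like in Python.  It mirrors the signature scheme from Microsoft’s sample code; the workspace ID, key, and custom log name are placeholders.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone

import requests

WORKSPACE_ID = "<log-analytics-workspace-id>"   # placeholder
WORKSPACE_KEY = "<log-analytics-primary-key>"   # placeholder (base64 string)
LOG_TYPE = "MgmtGroupActivityLog"               # placeholder custom log name; appears as MgmtGroupActivityLog_CL

def post_to_workspace(records):
    body = json.dumps(records).encode("utf-8")
    rfc1123_date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")

    # Build the SharedKey signature: HMAC-SHA256 over the canonical string, keyed with the decoded workspace key
    string_to_sign = f"POST\n{len(body)}\napplication/json\nx-ms-date:{rfc1123_date}\n/api/logs"
    signature = base64.b64encode(
        hmac.new(base64.b64decode(WORKSPACE_KEY), string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    ).decode("utf-8")

    resp = requests.post(
        f"https://{WORKSPACE_ID}.ods.opinsights.azure.com/api/logs",
        params={"api-version": "2016-04-01"},
        headers={
            "Content-Type": "application/json",
            "Authorization": f"SharedKey {WORKSPACE_ID}:{signature}",
            "Log-Type": LOG_TYPE,
            "x-ms-date": rfc1123_date,
        },
        data=body,
    )
    resp.raise_for_status()
```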

I did run into some weirdness with this function which was caused by the maximum size of an output stream in Runbooks, which is 1MB.  When I pulled the Activity Log for 90 days, the entirety of the log was well over 1MB, so it would cause the Runbook to fail three times and suspend.  Debugging this was a pain because the Runbook doesn’t report the error in an obvious way.  I got around this by collecting the log entries into a group and sending them in roughly 200KB chunks.  I also added some error checking and retry handling in case it got throttled.

The final function is named SendTo-EventHub and delivers the logs to an Event Hub.  I couldn’t find any PowerShell cmdlets that could be used to send data to Event Hub.  This forced me to fall back to the .NET libraries.  In the end I got it working and got them streaming, but I’m sure someone more skilled in .NET than me (which isn’t difficult to be) could optimize and improve that code.

The main chunk of the solution strings everything together.  By default the solution writes the logs to Azure blob storage.  You can optionally deliver the data to Azure Monitor and Azure Event Hubs.

Well folks that brings us to the end of this post and series.  While I’m sure the product team is quickly coming out with this out-of-the-box integration, I learned a ton about Azure Automation and Runbooks working on this effort.  Runbooks are a wonderful tool if you’re a classic infrastructure / security tech new to the whole coding thing.  It’s a very simple and straightforward user experience for that audience and a good stepping stone into the coding world vs jumping directly into Azure Functions.

I’ve posted the solution up onto my Github.  For those folks without Github, I’ve put a static copy of the solution up on this website at this link.  Take it, test it, play with it, build upon it, and experiment with it.