Writing a Custom Azure Policy

Writing a Custom Azure Policy

Hi folks! It’s been a while since my last post. This was due to spending a significant amount of cycles building out a new deployable lab environment for Azure that is a simplified version of the Enterprise-Scale Landing Zone. You can check it out the fruits of that labor here.

Recently I had two customers adopt the encryption at host feature of Azure VMs (virtual machines). The quick and dirty on this feature is it exists to mitigate the threat of an attacker accessing the data on the VHDs (virtual hard disks) belonging to your VM in the event that an attacker has compromised a physical host in an Azure datacenter which is running your VM. The VHDs backing your VM are temporarily cached on a physical host when the VM is running. Encryption at host encrypts the cached VHDs for the time they exist cached on that physical host. For more detail check out my peer and friend Sebastian Hooker’s blog post on the topic.

Both customers asked me if there was a simple and easy way to report on which VMs were enabled the feature and which were not. This sounded like the perfect use case for Azure Policy, so down that road I went and hence this blog post. I dove neck deep into Azure Policy about a year and a half ago when it was a newer addition to Azure and had written up a tips and tricks post. I was curious as to how relevant the content was given the state of the service today. Upon review of the content, I found many of the tips still relevant even though the technical means to use them had changed (for the better). Instead of walking through each tip in a generic way, I thought it would be more valuable to walk through the process I took to develop an Azure Policy for this use case.

Step 1 – Understand the properties of the Azure Resource

Before I could begin writing the policy I needed to understand what the data structure returned by ARM (Azure Resource Manager) for a VM enabled with encryption at host looks like. This meant I needed to create a VM and enable it for encryption at host using the process documented here. Once VM was created and configured, I then broke out Azure CLI and requested the properties of the virtual machine using the command below.

az vm show --name vmworkload1 --resource-group RG-WORKLOAD-36FCD6EC

This gave me back the properties of the VM which I then quickly searched through to find the block below indicating encryption at host was enabled for the VM.

VM properties with encryption at host enabled

When crafting an Azure Policy, always review the resource properties when the feature or property you want to check is both enabled and disabled.

Running the same command as above for a VM without encryption at host enabled yielded the following result.

VM properties with encryption at host not enabled

Now if I had assumed the encryptionAtHost property had been set to false for VMs without the feature enabled and had written a searching for VMs with the encryptionAtHost property equal to False, my policy would not have yielded the correct result. I’d want to use conditional exists instead of equals. This is why it’s critical to always review a sample resource both with and without the feature enabled.

Step 2 – Determine if Azure Policy can evaluate this property

Wonderful, I know my property, I understand the structure, and I know what it looks like when the feature is enabled or disabled. Now I need to determine if Azure Policy can evaluate it. Azure Policy accesses the properties of resources using something something called an alias. A given alias maps back to the path of a specific property of a resource for a given API version. More than likely there is some type of logic behind an alias which makes an API call to get the value of a specific property for a given resource.

Now for the bad news, there isn’t always an alias for the property you want to evaluate. The good news is the number of aliases has dramatically increased since I last looked at them over a year ago. To validate whether the property you wish to validate has an alias you can use the methods outlined in this link. For my purposes I used the AZ CLI command and grepped for encryptionAtHost.

az provider show --namespace Microsoft.Compute --expand "resourceTypes/aliases" --query "resourceTypes[].aliases[].name" | grep encryptionAtHost

The return value displayed three separate aliases. One alias for the VM resource type and two aliases for the VMSS (virtual machine scale set) type.

Aliases for encryption at host feature

Seeing these aliases validated that Azure Policy has the ability to query the property for both VM and VMSS resource types.

Always validate the property you want Azure Policy to validate has an alias present before spending cycles writing the policy.

Next up I needed to write the policy.

Step 3 – Writing the Azure Policy

The Azure Policy language isn’t the most initiative unfortunately. Writing complex Azure Policy is very similar to writing good AWS IAM policies in that it takes lots of practice and testing. I won’t spend cycles describing the Azure Policy structure as it’s very well documented here. However, I do have one link I use as a refresher whenever I return to writing policies. It does a great job describing the differences between the conditions.

You can write your policy using any editor that you wish, but Visual Studio Code has a nice add-in for Azure Policy. The add in can be used to examine the Azure Policies associated with a given Azure Subscription or optionally to view the aliases available for a given property (I didn’t find this to be as reliable as using querying via CLI).

Now that I understood the data structure of the properties and validated that the property had an alias, writing the policy wasn’t a big challenge.

{
    "mode": "All",
    "parameters": {},
    "displayName": "Audit - Azure Virtual Machines should have encryption at host enabled",
    "description": "Ensure end-to-end encryption of a Virtual Machines Managed Disks with encryption at host: https://docs.microsoft.com/en-us/azure/virtual-machines/disk-encryption#encryption-at-host---end-to-end-encryption-for-your-vm-data",
    "policyRule": {
        "if": {
            "anyOf": [
                {
                    "allOf": [
                        {
                            "field": "type",
                            "equals": "Microsoft.Compute/virtualMachines"
                        },
                        {
                            "field": "Microsoft.Compute/virtualMachines/securityProfile.encryptionAtHost",
                            "exists": false
                        }
                    ]
                },
                {
                    "allOf": [
                        {
                            "field": "type",
                            "equals": "Microsoft.Compute/virtualMachineScaleSets"
                        },
                        {
                            "field": "Microsoft.Compute/virtualMachineScaleSets/virtualMachineProfile.securityProfile.encryptionAtHost",
                            "exists": false
                        }
                    ]
                },
                {
                    "allOf": [
                        {
                            "field": "type",
                            "equals": "Microsoft.Compute/virtualMachineScaleSets/virtualMachines"
                        },
                        {
                            "field": "Microsoft.Compute/virtualMachineScaleSets/virtualmachines/securityProfile.encryptionAtHost",
                            "exists": false
                        }
                    ]
                }
            ]
        },
        "then": {
            "effect": "audit"
        }
    }
}

Let me walk through the relevant sections of the policy where I set the value to something for a reason. For the mode section I chose to use the “Full” mode since Microsoft recommends setting this to full unless you’re evaluating tags or locations. In the parameters section I left it blank since there was no relevant parameters that I wanted to add to the policy. You’ll often use parameters when you want to give the person assigning the policy the ability to determine the effect. Other use cases could be allowing the user to specify a list of tags you want applied or a list of allowed VM SKUs.

Finally we have the policyRule section which is the guts of the policy. In the policy I’ve written I have three rules which each include two conditions. The policy will fire the effect of Audit when any of the three rules evaluates as true. For a rule to evaluate as true both conditions must be true. For example, in the rule below I’m checking the resource type to validate it’s a virtual machine and then I’m using the encryption at host alias to check whether or not the property exists on the resource. If the resource is a virtual machine and the encryptionAtHost property doesn’t exist, the rule evaluates as true and the Azure Policy engine marks the resource as non-compliant.

Step 4 – Create the policy definition and assign the policy

Now that I’ve written my policy in my favorite editor, I’m ready to create the definition. Similar to Azure RBAC roles (role based access control) Azure Policies uses the concept of a definition and an assignment. Definitions can be created via the Portal, CLI, PowerShell, ARM templates, or ARM REST API like any Azure resource. For the purposes of simplicity, I’ll be creating the definition and assigning the policy through the Azure Portal.

Before I create the definition I need to plan where I create the definitions. Definitions can be created at the Management Group or Subscription scope. Creating a definition at the Management Group allows the policy to be assigned to the Management Group, child subscriptions of the Management Group, child resource groups of the subscriptions, or child resources of a resource group. I recommend creating the definitions as part of an initiative and create the definitions for the initiatives at a Management Group scope to minimize operational overhead. For this demonstration I’ll be creating a standalone definition at the subscription scope.

To do this you’ll want to search for Azure Policy in the Azure Portal. Once you’re in the Azure Policy blade you’ll want to Definitions under the Authoring section seen below. Click on the + Policy definition link.

On the next screen you’ll define some metadata about the policy and provide the policy logic. Make sure to use a useful display name and description when you’re doing this in a real environment.

Next we need to add our policy logic which you can copy and paste from your editor. The engine will validate that the policy is properly formatted. Once complete click the Save button as seen below.

Now that you’ve created the policy definition you can create the assignment. I’m not a fan of recreating documentation, so use the steps at this link to create the assignment.

Step 5 – Test the policy and review the results

Policy evaluations are performed under the following circumstances. When I first started using policy a year ago, triggering a manual evaluation required an API call. Thankfully Microsoft incorporated this functionality into CLI and PowerShell. In CLI you can use the command below to trigger a manual policy evaluation.

az policy state trigger-scan

Take note that after adding a new policy assignment, I’ve found that running a scan immediately after adding the assignment results in the new assignment not being evaluated. Since I’m impatient I didn’t test how long I needed to wait and instead ran another scan after the first one. Each scan can take up 10 minutes or longer depending on the number of resources being evaluated so prepare for some pain when doing your testing.

Once evaluation is complete I can go to the Azure Policy blade and the Overview section to see a summary of compliance of the resources.

Clicking into the policy shows me the details of the compliant and non-compliant resources. As expected the VMs, VMSS, and VMSS instances not enabled for encryption at host are reporting as non-compliant. The ones set for host-based encryption report to as compliant.

That’s all there is to it!

So to summarize the tips from the post:

  1. Before you create a new policy check to see if it already exists to save yourself some time. You can check the built-in policies in the Portal, the Microsoft samples, or the community repositories. Alternatively, take a crack it yourself then check your logic versus the logic someone else came up with to practice your policy skills.
  2. Always review the resource properties of a resource instance with that you would want to report on and one you wouldn’t want to report on. This will help make sure you make good use of equals vs exists.
  3. Validate that the properties you want to check are exposed to the Azure Policy engine. To do this check the aliases using CLI, PowerShell, or the Visual Studio add-in.
  4. Create your policy definitions at the management group scope to minimize operational overhead. Beware of tenant and subscription limits for Azure Policy.
  5. Perform significant testing on your policy to validate the policy is reporting on the resources you expect it to.
  6. When you’re new to Azure it’s fine to use the built-in policies directly. However, as your organization matures move away from using them directly and instead create a custom policy with the built-in policy logic. This will help to ensure the logic of your policies only change when you change it.

I hope this post helps you in your Azure Policy journey. The policy in this post and some others I’ve written in the past can be found in this repo. Don’t get discouraged if you struggle at first. It takes time and practice to whip up policies and even then some are a real challenge.

See you next post!

What If… Volume 2

Hi there folks!

I’ve been busy lately buried in learning and practicing Kubernetes in preparation for the Certified Kubernetes Administrator exam. Tonight I’m taking a break to bring you another entry into the “What If” series I started a few months back.

Let’s get right to it.

What if I need to access a Private Endpoint in a subscription associated with a different Azure AD tenant and I have an existing Azure Private DNS Zone already?

I’ve been helping a good friend who recently joined Microsoft to support his customer as he gets up to speed on the Azure platform. This customer consists of two very large organizations which have a high degree of independence. Each of these organizations have their own Azure AD tenant and their own Azure footprint. One organization is further along in their cloud journey than the other.

Organization A (new to Azure) needed to consume some data that existed in an Azure SQL database in an Azure subscription associated with Organization B’s tenant. Both organizations have strict security and compliance requirements so they are heavy users of Azure PrivateLink Endpoints. A site-to-site VPN (virtual private network) connection was established between the two organizations to facilitate network communication between the Azure environments.

Customer Environment

The customer environment looked similar to the above where a machine on-premises in Organization A needed to access the Azure SQL database in Organization B. If you look closely, you probably see the problem already. From a DNS perspective, we have two Azure Private DNS Zones for privatelink.database.windows.net. This means we have two authorities for the same zone.

My peer and I went back on forth with a few different solutions. One solution seemed obvious in that organization A would manually create an A record in their Azure Private DNS zone pointing to IP of the PrivateLink Endpoint in Organization B. Since the organizations had connectivity between the two environments, this would technically work. The challenge with this pattern is it would introduce a potential bottleneck depending on the size of the VPN pipe. It could also lead to egress costs for Organization A depending on how the VPN connection was implemented.

The other option we came up with was to create a Private Endpoint in Organization A’s Azure subscription which would be associated with the Azure SQL instance running in Organization B’s Azure subscription. This would avoid any egress costs, we wouldn’t be introducing a potential bottleneck, and we’d avoid the additional operational head of having to manually manage the A record in Organization A’s Azure Private DNS Zone. Neither of us had done this before and while it seemed to be possible based on Microsoft’s documentation, the how was a bit lacking when talking PaaS services.

To test this I used two separate personal tenants I keep to test scenarios that aren’t feasible to test with internal resources. My goal was to build an architecture like the below.

Target architecture

So was it possible? Why yes it was, and an added bonus I’m going to tell you how to do it.

When you create a Private Endpoint through the Azure Portal, there is a Connection Method radio button seen below. If you’re creating the Private Endpoint for a resource within the existing tenant you can choose the Connect to an Azure resource in my directory option and you get a handy guided selection tool. If you want to connect to a resource outside your tenant, you instead have to select the Connect to an Azure resource by resource ID or alias. In this field you would end the full resource ID of the resource you’re creating the Private Endpoint for, which in this case is the Azure SQL server resource id. You’ll be prompted to enter the sub-resource which for Azure SQL is SqlServer. Proceed to create the Private Endpoint.

Private Endpoint Creation

After the Private Endpoint has been created you’ll observe it has a Connection status of Pending. This is part of the approval workflow where someone with control over the resource in the destination tenant needs to approve of the connection to the Azure SQL server.

Private Endpoint in pending status

If you jump over to the other resource in the target tenant and select the Private endpoint connections menu option you’ll see there is a pending connection that needs approval along with a message from the requestor.

Private Endpoint request

Select the endpoint to approve and click the approve button. At that point the Private Endpoint in the requestor tenant and you’ll see it has been approved and is ready for use.

This was a fun little problem to work through. I was always under the assumption this would work, the documentation said it would work, but I’m a trust but verify type of person so I wanted to see and experience it for myself.

I hope you enjoyed the post and learned something new. Now back to practicing Kubernetes labs!

Interesting behaviors with Private Endpoints

Interesting behaviors with Private Endpoints

Hi folks!

Working for and with organizations in highly regulated industries like federal and state governments and commercial banks often necessitates diving REALLY deep into products and technologies. This means peeling back the layers of the onion most people do not. The reason this pops up is because these organizations tend to have extremely complex environments due the length of time the organization has existed and the strict laws and regulations they must abide by. This is probably the reason why I’ve always gravitated towards these industries.

I recently ran into an interesting use case where that willingness to dive deep was needed.

A customer I was working with was wrapping up its Azure landing zone deployment and was beginning to deploy its initial workloads. A number of these workloads used Microsoft Azure PaaS (platform-as-a-service) services such as Azure Storage and Azure Key Vault. The customer had made the wise choice to consume the services through Azure Private Endpoints. I’m not going to go into detail on the basics of Azure Private Endpoints. There is plenty of official Microsoft documentation that can cover the basics and give you the marketing pitch. You can check out my pasts posts on the topic such as my series on Azure Private DNS and Azure Private Endpoints.

This particular customer chose to use them to consume the services over a private connection from both within Azure and on-premises as well as to mitigate the risk of data exfiltration that exists when egressing the traffic to Internet public endpoints or using Azure Service Endpoints. One of the additional requirements the customer had as to mediate the traffic to Azure Private Endpoints using a security appliance. The security appliance was acting as a firewall to control traffic to the Private Endpoints as well to perform deep packet inspection sometime in the future. This is the requirement that drove me down into the weeds of Private Endpoints and lead to a lot of interesting observations about the behaviors of network traffic flowing to and back from Private Endpoints. Those are the observations I’ll be sharing today.

For this lab, I’ll be using a slightly modified version of my simple hub and spoke lab. I’ve modified and added the following items:

  • Virtual machine in hub runs Microsoft Windows DNS and is configured to forward all DNS traffic to Azure DNS (168.63.129.16)
  • Virtual machine in spoke is configured to use virtual machine in hub as a DNS server
  • Removed the route table from the spoke data subnet
  • Azure Private DNS Zone hosting the privatelink.blob.core.windows.net namespace
  • Azure Storage Account named mftesting hosting some sample objects in blob storage
  • Private Endpoint for the mftesting storage account blob storage placed in the spoke data subnet
Lab environment

The first interesting observation I made was that there was a /32 route for the Private Endpoint. While this is documented, I had never noticed it. In fact most of my peers I ran this by were never aware of it either, largely because the only way you would see it is if you enumerated effective routes for a VM and looked closely for it. Below I’ve included a screenshot of the effective routes on the VM in the spoke Virtual Network where the Private Endpoint was provisioned.

Effective routes on spoke VM

Notice the next hop type of InterfaceEndpoint. I was unable to find the next hop type of InterfaceEndpoint documented in public documentation, but it is indeed related to Private Endpoints. The magic behind that next hop type isn’t something that Microsoft documents publicly.

Now this route is interesting for a few reasons. It doesn’t just propagate to all of the route tables of subnets within the Virtual Network, it also propagates to all of the route tables in directly peered Virtual Networks. In the hub and spoke architecture that is recommended for Microsoft Azure, this means that every Private Endpoint you create in a spoke Virtual Network is propagated to as a system route to route tables of each subnet in the hub Virtual Network. Below you can see a screen of the VM running in the hub Virtual Network.

Effective routes on hub VM

This can make things complicated if you have a requirement such as the customer I was working with where the customer wants to control network traffic to the Private Endpoint. The only way to do that completely is to create a /32 UDRs (user defined routes) in every route table in both the hub and spoke. With a limit of 400 UDRs per route table, you can quickly see how this may break down at scale.

There is another interesting thing about this route. Recall from effective routes for the spoke VM, that there is a /32 system route for the Private Endpoint. Since this is the most specific route, all traffic should be routed directly to the Private Endpoint right? Let’s check that out. Here I ran a port scan against the Private Endpoint using nmap using the ICMP, UDP, and TCP protocols. I then opened the Log Analytics Workspace and ran a query across the Azure Firewall logs for any traffic to the Private Endpoint from the VM and lo and behold, there is the ICMP and UDP traffic nmap generated.

Captured UDP and ICMP traffic

Yes folks that /32 route is protocol aware and will only apply to TCP traffic. UDP and ICMP traffic will not be affected. Software defined networking is grand isn’t it? 🙂

You may be asking why the hell I decided to test this particular piece. The reason I followed this breadcrumb was my customer had setup a UDR to route traffic from the VM to an NVA in the hub and attempted to send an ICMP Ping to the Private Endpoint. In reviewing their firewall logs they saw only the ICMP traffic. This finding was what drove me to test all three protocols and make the observation that the route only affects TCP traffic.

Microsoft’s public documentation mentions that Private Endpoints only support TCP at this time, but the documentation does not specify that this system route does not apply to UDP and ICMP traffic. This can result in confusion such as it did for this customer.

So how did we resolve this for my customer? Well in a very odd coincidence, a wonderful person over at Microsoft recently published some patterns on how to approach this problem. You can (and should) read the documentation for the full details, but I’ll cover some of the highlights.

There are four patterns that are offered up. Scenario 3 is not applicable for any enterprise customer given that those customers will be using a hub and spoke pattern. Scenario 1 may work but in my opinion is going to architect you into a corner over the long term so I would avoid it if it were me. That leaves us with Scenario 2 and Scenario 4.

Scenario 2 is one I want to touch on first. Now if you have a significant background in networking, this scenario will leave you scratching your head a bit.

Microsoft Documentation Scenario 2

Notice how a UDR is applied to the subnet with the VM which will route traffic to Azure Firewall however, there is no corresponding UDR applied to the Private Endpoint. Now this makes sense since the Private Endpoint would ignore the UDR anyway since they don’t support UDRs at this time. Now you old networking geeks probably see the problem here. If the packet from the VM has to travel from A (the VM) to B (stateful firewall) to C (the Private Endpoint) the stateful firewall will make a note of that connection in its cache and be expecting packets coming back from the Private Endpoint representing the return traffic. The problem here is the Private Endpoint doesn’t know that it needs to take the C (Private Endpoint) to B (stateful firewall) to A (VM) because it isn’t aware of that route and you’d have an asymmetric routing situation.

If you’re like me, you’d assume you’d need to SNAT in this scenario. Oddly enough, due the magic of software defined routing, you do not. This struck me as very odd because in scenario 3 where everything is in the same Virtual Network you do need to SNAT. I’m not sure why this is, but sometimes accepting magic is part of living in a software defined world.

Finally, we come to scenario 4. This is a common scenario for most customers because who doesn’t want to access Azure PaaS services over an ExpressRoute connection vs an Internet connection? For this scenario, you again need to SNAT. So honestly, I’d just SNAT for both scenario 2 and 4 to make maintain consistency. I have successfully tested scenario 2 with SNAT so it does indeed work as you expect it would.

Well folks I hope you found this information helpful. While much of it is mentioned in public documentation, it lacks the depth that those of us working in complex environments need and those of us who like to geek out a bit want.

See you next post!

Azure Files and AD DS – Part 2

Azure Files and AD DS – Part 2

Hi there and welcome to the second post in my series about Azure Files integration with AD DS. In the first post I gave an overview of the service, the value proposition, its current limitations, and described the lab I’ll be using for this post. For this post I’ll be walking through the setup, examining some packet captures and Fiddler captures, and touching on a few of the gotchas I ran into.

Before I jump into the technical gooey goodness, I’m going to cover some prerequisites.

Prerequisites

One obvious factoid is you’ll need a Windows AD domain up and running and the machine you connect to the share from will need to be joined to that domain. One disclaimer to keep in mind is you have a multiple Windows AD forest scenario, such as an account and resource forest, you’ll need to be aware of which domain you’re integrating the Azure File share with. If you integrate it with a resource forest but have your user accounts in the account forest, you’ll need to use name suffix routing. I’m not going to go into the details of name suffix routing, but if you’re curious you can read through this article. The short of it is the service principal name associated with the computer or service account used to represent the Azure Storage account the file share is created on uses the domain of files.core.windows.net. When performing the Kerberos authentication, the domain controller in the account forest wouldn’t know how to to direct the request to the resource forest because that domain will not be associated with your resource forest. For this lab I created a Windows AD domain with the namespace jogcloud.local

You will also need to ensure that you are synchronizing the users and groups from your Windows AD domain you want to be able to access the Azure File share to Azure AD. The tenant you synchronize to must be the same tenant the Azure subscription containing the Azure Storage account is associated with. Don’t worry about why right now, I’ll cover that later. For this lab I’ll be using my jogcloud.com tenant.

Logical Layout

To store those wonderful files you’ll need an Azure Storage account. The storage account should be created in the same region (or closest region if on-premises) to the clients that will access the file share. This will ensure optimal performance and avoid cross region costs if your clients are in Azure. You can use either a storage account with the standard GPv2 (General Purpose v2) SKU or Premium FileStorage SKU if you need better performance and scale. For this lab I’ll be using the GPv2 SKU.

Lastly, networking requirements. Like all Azure PaaS offerings, Azure Storage is by default available over the public Internet. Since no sane human being wants to send SMB traffic over the public Internet, you have the option of using a private endpoint. For this lab I’ll be going the private endpoint route.

So prerequisites are now set, let’s jump into the setup.

Integrating Azure Storage account with Windows AD

The first step in the process to get this integration working is to get the Azure Storage account you’ll be using setup with an identity in Windows AD. A kind human being over at Microsoft wrote a wonderful Azure PowerShell module that makes what I’m about to do a hundred times easier and is the recommended way to go about this. I’m not going to use it for this demonstration because I want to walk through each of the steps in the process to better your understanding of the magic within the module.

Before you run any commands you’ll need to ensure you have the PowerShell modules below installed. You can validate this by running Get-Module -ListAvailable to display the PowerShell modules installed on the machine.

Now we need to create the security principal that is going to represent the Azure Storage account in Windows AD. You have the option of either using a traditional service account (user account) or a computer account. As of August 28th 2020, there are some limitations you’ll run into if you use a service account over a computer account. My recommendation is use a computer account for now. I’ll cover the limitation later on in this entry. For the purposes of this blog post, I’ll be using a service account.

One important thing to note here is you need to treat this just like you would a traditional service account. By this I mean you will want to create the account with a non-expiring password and put in appropriate controls to perform a controlled rotation of the credential to avoid service disruption.

Here I’ve created a service account with the name azurestorage and have set it with a non-expiring password.

Service account for Azure Storage account

Next up I’m going create a SPN (service principal name) for the service account. The SPN is going to identify the Azure Storage account to Windows AD and instruct the user’s system which service it needs to obtain a Kerberos ticket for. The SPN is going to use the CIFS service class and include the FQDN of the Azure Files endpoint on your storage account. It will look like cifs/stjogfileshare.file.core.windows.net. You can register the SPN using the setspn -S <SPN> <ACCOUNT_NAME>. The -S switch will validate there the SPN is not already registered to another security principal in the domain.

Setting the SPN

So you have a service account and an SPN. Now you need to create a credential in Azure Storage and associate that credential with the service account. To create that credential you’ll need to hop over to PowerShell and connect to Azure. Once connected you’ll use the New-AzStorageAccountKey and Get-AzStorageAccountKey cmdlets to create and retrieve the storage account key used for the integration. It’s important to note that this key (named kerb1 or kerb2) is only used to setup this integration and can’t be used for any control or data plane operations against the storage account.

Configure the service account with this key as its password.

Creating the account key

The last piece in this step of the process is to enable the AD DS feature support for the storage account. To do this you’ll use the Set-AzStorageAccount cmdlet using the syntax below.

Set-AzStorageAccount syntax

All of the inputs between Name and ActiveDirectoryAzureStorageSid can be obtained by using the Get-ADDomain cmdlet as seen below.

Get-ADDomain Output

The ActiveDirectoryAzureStorageSid parameter can be obtain by using the Get-ADUser cmdlet as seen below.

Get-ADUser Output

Once you have the inputs, you’ll plug them in Set-AzStorageAccount cmdlet. If successful you’ll get a return similar to below.

Set-AzStorageAccount Output

If you’d like you can confirm the feature is enabled you can do that with the steps documented here. The AzFilesHybrid module I mentioned earlier also has some great debugging tools as outlined here.

Now that the integration is complete, I need to create a file share and configure authorization at the management plane. There are two separate layers of authorization occurring, one for access to the file share itself and the other for access to the files and folders within the file share. Access to the share itself is controlled by Azure RBAC and thus controlled by the Azure management plane. There are three roles built in roles provided that should service most use cases and these are:

  1. Storage File Data SMB Share Read which allows read access to the file share over SMB
  2. Storage File Data SMB Share Contributor which allows read, write, and delete access over the file share over SMB
  3. Storage File Data SMB Share Elevated Contributor which allows read, write, delete, and modify of Windows ACLs of the file share over SMB

You are free to design your own custom roles, but those three built in roles are pretty much spot on as to what you’d see in your typical Windows File share-level permissions.

The second layer of authorization is controlled by the Windows ACLs (access control lists) associated with the share, files, and folders. These are your classic Windows ACLs you know and love and will be enforced by the Windows OS.

Just like on a traditional Windows file share, the most restrictive of controls will apply. This means if you’ve only been granted the Storage File Data SMB Share Read role, it won’t matter if you have full permissions in the Windows ACLs, you will only be able to read and will not be able to write.

Let me demonstrate this.

Here I have assigned the Bob Gray user the Storage File Data SMB Share Reader role on the stjogfileshare storage account. Bob Gray is Domain administrator on the jogcloud.local Windows AD domain and is a local administrator on the member server.

Role Assignment on Storage Account

As seen below I’m able to successfully map the shared folder, but I’m unable to create folder on it because the management plane is restricting my access.

Unable to created a folder when holding Read RBAC role

Running the klist command on the machine shows I successfully obtained a Kerberos ticket for file share.

klist Output

The packet capture I ran when I mapped the share shows in the SMB conversation that the client and server are using the negotiate protocol (which includes the Kerberos protocol).

Session Negotiation

After the Kerberos ticket is obtained from the domain controller the client sets up the session with the storage account.

Session Setup

From this point forward, the encryption capabilities of SMB 3 are used to encrypt the session between the client and Azure storage account.

Remember the limitation around using a service account as the security principal representing Azure Files I mentioned earlier? Well that limitation is around the encryption algorithm used to encrypt the Kerberos tickets passed to Azure Files. Out of the gates, the Azure Files with AD DS feature only supported the RC4-HMAC encryption algorithm. This is a deprecated algorithm according to the IETF (Internet Engineering Task Force) and should not be used. Many organizations in regulated environments have disabled this encryption algorithm in Active Directory due to its deprecation.

As of August 28th, 2020, the integration now supports AES256 encryption for Kerberos. However, this is only supported if you’re using a Computer account. This means that if you are already using the service with a service account and you want to move to AES256, you’ll need to migrate to using a Computer account. I expect support for a service account will be added sometime in the future, so check the official documentation for updates.

If you try to use a service account and attempt to enforce the use of AES256, your connection will fail. If you do a packet capture you’ll see the Azure Storage account throw a KRB Error: KRB5RB_AP_ERR_MODIFIED indicating the Azure Storage account was unable to validate the ticket that was passed because it doesn’t support the encryption algorithm used to secure it.

I went through and created a group in Windows AD named engineering and added Bob Gray to it. I then removed the Storage File Data SMB Share Read role assignment for Bob Gray and created a role assignment for the engineering group for the Storage File Data SMB Share Contributor role. I’m now able to create files and folders on the share as seen below.

Creating folder on share

Bringing up the permissions on the folders you’ll observe that that are a few default permissions which come out of the box. You can modify these default permission if you’d like (for example by removing Authenticated Users Read/Modify which is overly permissive). You do this by mounting the share as a super user using the standard storage keys. The process is outlined here.

Default permissions

That is pretty much all there is to it to the technical configuration.

So when you use this service what are some of the best practices that I would recommend?

  1. Use a computer account if you require an encryption algorithm better than RC-HMAC. At this time, computer accounts are the only type of security principal which supports AES256 encryption.
  2. Ensure you rotate the password at whatever interval aligns with your organizational security policy and any laws and regulations you may be subject to.
  3. When you create your Azure RBAC role assignments, use synchronized groups vs synchronized users. You do this for the same reason you would on-premises, granting access per user is not scalable.
  4. Dedicate the storage account you use for the file share to only file shares. Storage accounts have fixed limits that are shared across blobs, queues, tables, and files. You don’t want to get into a situation where you have to share those limits.
  5. Do your research to determine if Premium FileStorage makes more sense than GPv2. It’s more costly but provides better performance and scale.
  6. Try to deploy one file share per storage account if possible to ensure you get the maximum IOPS available for that file share. You can certainly put multiple file shares in the same storage account, but they will share the total IOPS available for the storage account.
  7. Ensure you are replicating files to another storage account. Unlike blobs, you can’t read from the second region if you’re using a RA-GRS storage account. If you’re using the Premium Files SKU, the storage account will only support LRS and ZRS which makes this replication to a storage account in another region so important. You could use AzCopy, PowerShell, or Azure Data Factory.

That’s it folks! Hope this post helped you understand feature that much better.

Thanks and see you next post!