Welcome back fellow geeks! I hope you managed to have an enjoyable holiday and took a break from the grind. I took a good week off, and minus some reading around AKS (Azure Kubernetes Service), I completely shut off the work and tech side of my brain. It was a great change!
Today I am back with a new entry into my What If series. In this entry I’ll be covering an interesting quirk of ASR (Azure Site Recovery) that I ran into while helping a customer test out the service. For those of you unfamiliar with ASR, it’s a managed service in Microsoft Azure that provides business continuity and disaster recovery for VMs (virtual machines) both within Azure and on-premises. It can also be used to migrate VMs from on-premises to Azure, between regions, or between availability zones.
With the quick introduction to the service out of the way, let’s get to it.
What if I wanted to test Azure Site Recovery with both Windows and Linux virtual machines?
Performance benefits by shifting the encryption engine out of the operating system
No limitations on specific images for the virtual machine
No VM extensions required
Can be combined with host-based encryption for end-to-end encryption
The customer followed the instructions on how to set up ASR for SSE with CMK-enabled disks referenced here. Replication was successful but they noticed the data disk they had attached to the VM in the source region was not automatically attached to the VM in the destination region and required manual attachment. While an inconvenience for a single test, this could create a huge headache at scale when you’re talking about hundreds of VMs.
After receiving the email I immediately spun up a very simple environment in Azure in the East US 2 region consisting of a single virtual network, Windows VM with an attached data disk, Azure Key Vault instance with a single key, and a disk encryption set. In the Central US region I created a second virtual network, Azure Key Vault instance with a single key, and a disk encryption set. My plan was to fail the VM over from East US 2 to Central US using ASR.
As I went through the enablement process I ensured both the OS disk and the data disk were selected for replication as seen in the image below.
Reviewing the configuration shows that two managed disks are set to replicate.
After confirmation I left it alone and came back an hour later and checked the destination resource group. Replicas of both managed disks are present in the destination resource group. Good so far.
I then did a test failover and pulled up the VM and observed the same thing my customer did. The data disk was not attached even though it was replicated. I was able to manually attach it without an issue, but again, how does this work at scale? Even more interesting was the status of the VM in the Site Recovery section of the Recovery Services Vault. It did not show the data disk as being replicated.
I ran through the same process a few more times and ran into the same result each time. To make sure it wasn’t an issue with the information the portal was displaying, I wrote some quick Python code to hit the replicationprotecteditem endpoint within the ARM REST API. The results from the API also included only the OS disk in the replication status.
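If you want to poke at the same data yourself without writing Python, a rough equivalent using the Az.RecoveryServices ASR cmdlets is below. The vault and resource group names are placeholders from my lab, so swap in your own.

```powershell
# Rough PowerShell equivalent of the Python I used against the REST API.
# Vault and resource group names below are placeholders, not real resources.
Connect-AzAccount

$vault = Get-AzRecoveryServicesVault -ResourceGroupName 'rg-asr-demo' -Name 'rsv-asr-demo'
Set-AzRecoveryServicesAsrVaultContext -Vault $vault

# Walk every fabric and protection container in the vault and dump the
# replication protected items, including the disks ASR reports as protected.
foreach ($fabric in Get-AzRecoveryServicesAsrFabric) {
    foreach ($container in Get-AzRecoveryServicesAsrProtectionContainer -Fabric $fabric) {
        Get-AzRecoveryServicesAsrReplicationProtectedItem -ProtectionContainer $container |
            Format-List *
    }
}
```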
Was this a bug? Did both the customer and I mess something up in setting this up? Turns out it was neither. This is actually expected behavior when replicating a Windows VM with an attached data disk that is uninitialized in the OS (operating system). For you young folks out there that have never initialized a disk in Microsoft Windows, or those of you who don’t spend much time in Windows, initialization consists of creating the partition table on the drive, which must occur prior to formatting the partition with a file system. So why is this required? I’m not really sure and can only theorize. A friend and I talked this over and we theorize it may be a requirement to ensure the disk has some unique identifier in the operating system which may not be able to be generated without the disk first being partitioned.
Note that this issue only occurs with Windows VMs with an uninitialized data disk. It does not occur if the disk has been initialized in Windows and does not occur at all with a Linux VM whether the disk has been partitioned or not. In those cases the data disk will be attached after the VM is failed over.
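If you want to make sure a data disk is initialized before you enable replication, a quick way to do it from an elevated PowerShell session on the source VM is something like the sketch below. The NTFS label is just an example.

```powershell
# Find any raw (uninitialized) disks, create a GPT partition table, a single
# partition spanning the disk, and format it with NTFS. The label is an example.
Get-Disk |
    Where-Object PartitionStyle -eq 'RAW' |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel 'data' -Confirm:$false
```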
So there you go folks. If you decide to test out ASR for a proof-of-concept or just a learning experience, remember to initialize your disks on your Windows VMs!
I’ve been busy lately buried in learning and practicing Kubernetes in preparation for the Certified Kubernetes Administrator exam. Tonight I’m taking a break to bring you another entry into the “What If” series I started a few months back.
Let’s get right to it.
What if I need to access a Private Endpoint in a subscription associated with a different Azure AD tenant and I have an existing Azure Private DNS Zone already?
I’ve been helping a good friend who recently joined Microsoft to support his customer as he gets up to speed on the Azure platform. This customer consists of two very large organizations which have a high degree of independence. Each of these organizations have their own Azure AD tenant and their own Azure footprint. One organization is further along in their cloud journey than the other.
Organization A (new to Azure) needed to consume some data that existed in an Azure SQL database in an Azure subscription associated with Organization B’s tenant. Both organizations have strict security and compliance requirements so they are heavy users of Azure PrivateLink Endpoints. A site-to-site VPN (virtual private network) connection was established between the two organizations to facilitate network communication between the Azure environments.
The customer environment looked similar to the above where a machine on-premises in Organization A needed to access the Azure SQL database in Organization B. If you look closely, you probably see the problem already. From a DNS perspective, we have two Azure Private DNS Zones for privatelink.database.windows.net. This means we have two authorities for the same zone.
My peer and I went back and forth with a few different solutions. One solution seemed obvious in that Organization A would manually create an A record in their Azure Private DNS zone pointing to the IP of the Private Link Endpoint in Organization B. Since the organizations had connectivity between the two environments, this would technically work. The challenge with this pattern is it would introduce a potential bottleneck depending on the size of the VPN pipe. It could also lead to egress costs for Organization A depending on how the VPN connection was implemented.
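For reference, if you did go that route, the record itself is nothing fancy. Here’s a rough sketch using the Az.PrivateDns module; the record name, resource group, and IP address are placeholders.

```powershell
# Manually pin the Azure SQL logical server name to the IP of Organization B's
# Private Endpoint. The server name, resource group, and IP are placeholders.
$record = New-AzPrivateDnsRecordConfig -IPv4Address '10.20.1.4'

New-AzPrivateDnsRecordSet -ResourceGroupName 'rg-dns' `
    -ZoneName 'privatelink.database.windows.net' `
    -Name 'orgb-sqlserver' `
    -RecordType A `
    -Ttl 3600 `
    -PrivateDnsRecords $record
```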
The other option we came up with was to create a Private Endpoint in Organization A’s Azure subscription which would be associated with the Azure SQL instance running in Organization B’s Azure subscription. This would avoid any egress costs, we wouldn’t be introducing a potential bottleneck, and we’d avoid the additional operational overhead of having to manually manage the A record in Organization A’s Azure Private DNS Zone. Neither of us had done this before, and while it seemed to be possible based on Microsoft’s documentation, the how was a bit lacking when it came to PaaS services.
To test this I used two separate personal tenants I keep to test scenarios that aren’t feasible to test with internal resources. My goal was to build an architecture like the below.
So was it possible? Why yes it was, and as an added bonus I’m going to tell you how to do it.
When you create a Private Endpoint through the Azure Portal, there is a Connection Method radio button seen below. If you’re creating the Private Endpoint for a resource within the existing tenant you can choose the Connect to an Azure resource in my directory option and you get a handy guided selection tool. If you want to connect to a resource outside your tenant, you instead have to select the Connect to an Azure resource by resource ID or alias option. In this field you would enter the full resource ID of the resource you’re creating the Private Endpoint for, which in this case is the Azure SQL server resource ID. You’ll be prompted to enter the sub-resource, which for Azure SQL is SqlServer. Proceed to create the Private Endpoint.
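If you’d rather script this than click through the portal, the manual connection flow looks roughly like the sketch below using the Az.Network module. The resource IDs, names, and region are placeholders, and you should double check the -ByManualRequest switch against the current module documentation.

```powershell
# Create a Private Endpoint in Organization A's subscription that targets the
# Azure SQL server in Organization B's subscription by resource ID.
# All names and IDs below are placeholders; verify -ByManualRequest against
# the current Az.Network documentation.
$remoteSqlServerId = '/subscriptions/<orgB-sub-id>/resourceGroups/<rg>/providers/Microsoft.Sql/servers/<server-name>'

$vnet   = Get-AzVirtualNetwork -ResourceGroupName 'rg-orga-network' -Name 'vnet-orga'
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'snet-data'

$connection = New-AzPrivateLinkServiceConnection -Name 'pe-conn-orgb-sql' `
    -PrivateLinkServiceId $remoteSqlServerId `
    -GroupId 'sqlServer'

New-AzPrivateEndpoint -ResourceGroupName 'rg-orga-network' `
    -Name 'pe-orgb-sql' `
    -Location 'eastus2' `
    -Subnet $subnet `
    -PrivateLinkServiceConnection $connection `
    -ByManualRequest
```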
After the Private Endpoint has been created you’ll observe it has a Connection status of Pending. This is part of the approval workflow where someone with control over the resource in the destination tenant needs to approve the connection to the Azure SQL server.
If you jump over to the other resource in the target tenant and select the Private endpoint connections menu option you’ll see there is a pending connection that needs approval along with a message from the requestor.
Select the endpoint to approve and click the Approve button. If you then jump back to the Private Endpoint in the requestor tenant you’ll see it has been approved and is ready for use.
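If the person on Organization B’s side prefers PowerShell over the portal, something along these lines should accomplish the same approval. The resource ID is a placeholder and the cmdlets come from the Az.Network module.

```powershell
# List pending Private Endpoint connections on the Azure SQL server and
# approve the one that came in from Organization A. The resource ID is a
# placeholder.
$sqlServerId = '/subscriptions/<orgB-sub-id>/resourceGroups/<rg>/providers/Microsoft.Sql/servers/<server-name>'

Get-AzPrivateEndpointConnection -PrivateLinkResourceId $sqlServerId |
    Where-Object { $_.PrivateLinkServiceConnectionState.Status -eq 'Pending' } |
    ForEach-Object {
        Approve-AzPrivateEndpointConnection -ResourceId $_.Id -Description 'Approved for Organization A'
    }
```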
This was a fun little problem to work through. I was always under the assumption this would work, the documentation said it would work, but I’m a trust but verify type of person so I wanted to see and experience it for myself.
I hope you enjoyed the post and learned something new. Now back to practicing Kubernetes labs!
Working for and with organizations in highly regulated industries like federal and state governments and commercial banks often necessitates diving REALLY deep into products and technologies. This means peeling back layers of the onion that most people do not. The reason this pops up is that these organizations tend to have extremely complex environments due to how long they have existed and the strict laws and regulations they must abide by. This is probably the reason why I’ve always gravitated towards these industries.
I recently ran into an interesting use case where that willingness to dive deep was needed.
A customer I was working with was wrapping up its Azure landing zone deployment and was beginning to deploy its initial workloads. A number of these workloads used Microsoft Azure PaaS (platform-as-a-service) services such as Azure Storage and Azure Key Vault. The customer had made the wise choice to consume the services through Azure Private Endpoints. I’m not going to go into detail on the basics of Azure Private Endpoints. There is plenty of official Microsoft documentation that can cover the basics and give you the marketing pitch. You can check out my past posts on the topic such as my series on Azure Private DNS and Azure Private Endpoints.
This particular customer chose to use them to consume the services over a private connection from both within Azure and on-premises, as well as to mitigate the risk of data exfiltration that exists when egressing the traffic to Internet public endpoints or using Azure Service Endpoints. One of the additional requirements the customer had was to mediate the traffic to Azure Private Endpoints using a security appliance. The security appliance was acting as a firewall to control traffic to the Private Endpoints as well as to perform deep packet inspection sometime in the future. This is the requirement that drove me down into the weeds of Private Endpoints and led to a lot of interesting observations about the behaviors of network traffic flowing to and back from Private Endpoints. Those are the observations I’ll be sharing today.
Virtual machine in hub runs Microsoft Windows DNS and is configured to forward all DNS traffic to Azure DNS (168.63.129.16)
Virtual machine in spoke is configured to use virtual machine in hub as a DNS server
Removed the route table from the spoke data subnet
Azure Private DNS Zone hosting the privatelink.blob.core.windows.net namespace
Azure Storage Account named mftesting hosting some sample objects in blob storage
Private Endpoint for the mftesting storage account blob storage placed in the spoke data subnet
The first interesting observation I made was that there was a /32 route for the Private Endpoint. While this is documented, I had never noticed it. In fact most of my peers I ran this by were never aware of it either, largely because the only way you would see it is if you enumerated effective routes for a VM and looked closely for it. Below I’ve included a screenshot of the effective routes on the VM in the spoke Virtual Network where the Private Endpoint was provisioned.
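If you’d rather pull the same information without the portal, you can dump the effective routes off the VM’s NIC and filter for the Private Endpoint route. The NIC and resource group names below are placeholders from my lab.

```powershell
# Enumerate the effective routes on the spoke VM's NIC and look for the /32
# route created by the Private Endpoint. Names below are placeholders.
Get-AzEffectiveRouteTable -ResourceGroupName 'rg-spoke' -NetworkInterfaceName 'vm-spoke-nic' |
    Where-Object { $_.NextHopType -eq 'InterfaceEndpoint' } |
    Format-Table AddressPrefix, NextHopType, NextHopIpAddress
```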
Notice the next hop type of InterfaceEndpoint. I was unable to find this next hop type anywhere in Microsoft’s public documentation, but it is indeed related to Private Endpoints; the magic behind it isn’t something Microsoft documents publicly.
Now this route is interesting for a few reasons. It doesn’t just propagate to the route tables of subnets within the Virtual Network, it also propagates to the route tables in directly peered Virtual Networks. In the hub and spoke architecture that is recommended for Microsoft Azure, this means that every Private Endpoint you create in a spoke Virtual Network is propagated as a system route to the route table of each subnet in the hub Virtual Network. Below you can see a screenshot of the effective routes for the VM running in the hub Virtual Network.
This can make things complicated if you have a requirement like my customer’s, where you want to control network traffic to the Private Endpoint. The only way to do that completely is to create /32 UDRs (user defined routes) in every route table in both the hub and spoke. With a limit of 400 UDRs per route table, you can quickly see how this may break down at scale.
There is another interesting thing about this route. Recall from the effective routes for the spoke VM that there is a /32 system route for the Private Endpoint. Since this is the most specific route, all traffic should be routed directly to the Private Endpoint, right? Let’s check that out. Here I ran a port scan against the Private Endpoint with nmap using the ICMP, UDP, and TCP protocols. I then opened the Log Analytics Workspace and ran a query across the Azure Firewall logs for any traffic to the Private Endpoint from the VM, and lo and behold, there was the ICMP and UDP traffic nmap generated.
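For reference, my query was just a search of the Azure Firewall network rule logs for the Private Endpoint’s IP. A rough sketch using the Az.OperationalInsights module is below; the workspace ID and IP are placeholders and this assumes the firewall logs land in the legacy AzureDiagnostics table.

```powershell
# Query the Log Analytics workspace for Azure Firewall network rule entries
# that mention the Private Endpoint's IP. The workspace ID and IP below are
# placeholders; this assumes the legacy AzureDiagnostics table is in use.
$query = @"
AzureDiagnostics
| where Category == "AzureFirewallNetworkRule"
| where msg_s contains "10.1.0.4"
| project TimeGenerated, msg_s
"@

Invoke-AzOperationalInsightsQuery -WorkspaceId '<workspace-guid>' -Query $query |
    Select-Object -ExpandProperty Results
```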
Yes folks that /32 route is protocol aware and will only apply to TCP traffic. UDP and ICMP traffic will not be affected. Software defined networking is grand isn’t it? 🙂
You may be asking why the hell I decided to test this particular piece. The reason I followed this breadcrumb was my customer had set up a UDR to route traffic from the VM to an NVA in the hub and attempted to send an ICMP ping to the Private Endpoint. In reviewing their firewall logs they saw only the ICMP traffic. This finding was what drove me to test all three protocols and make the observation that the route only affects TCP traffic.
Microsoft’s public documentation mentions that Private Endpoints only support TCP at this time, but the documentation does not specify that this system route does not apply to UDP and ICMP traffic. This can result in confusion such as it did for this customer.
So how did we resolve this for my customer? Well in a very odd coincidence, a wonderful person over at Microsoft recently published some patterns on how to approach this problem. You can (and should) read the documentation for the full details, but I’ll cover some of the highlights.
There are four patterns that are offered up. Scenario 3 is not applicable for any enterprise customer given that those customers will be using a hub and spoke pattern. Scenario 1 may work but in my opinion is going to architect you into a corner over the long term so I would avoid it if it were me. That leaves us with Scenario 2 and Scenario 4.
Scenario 2 is one I want to touch on first. Now if you have a significant background in networking, this scenario will leave you scratching your head a bit.
Notice how a UDR is applied to the subnet with the VM to route traffic to the Azure Firewall, but there is no corresponding UDR applied to the Private Endpoint. This makes sense since the Private Endpoint would ignore the UDR anyway; Private Endpoints don’t support UDRs at this time. Now you old networking geeks probably see the problem here. If the packet from the VM has to travel from A (the VM) to B (the stateful firewall) to C (the Private Endpoint), the stateful firewall will make a note of that connection in its cache and will be expecting the return traffic from the Private Endpoint to come back through it. The problem here is the Private Endpoint doesn’t know it needs to take the C (Private Endpoint) to B (stateful firewall) to A (VM) path back because it isn’t aware of that route, so you’d expect an asymmetric routing situation.
If you’re like me, you’d assume you’d need to SNAT in this scenario. Oddly enough, due to the magic of software defined routing, you do not. This struck me as very odd because in scenario 3, where everything is in the same Virtual Network, you do need to SNAT. I’m not sure why this is, but sometimes accepting magic is part of living in a software defined world.
Finally, we come to scenario 4. This is a common scenario for most customers because who doesn’t want to access Azure PaaS services over an ExpressRoute connection vs an Internet connection? For this scenario, you again need to SNAT. So honestly, I’d just SNAT for both scenario 2 and 4 to maintain consistency. I have successfully tested scenario 2 with SNAT so it does indeed work as you would expect.
Well folks I hope you found this information helpful. While much of it is mentioned in public documentation, it lacks the depth that those of us working in complex environments need and those of us who like to geek out a bit want.
Hi there and welcome to the second post in my series about Azure Files integration with AD DS. In the first post I gave an overview of the service, the value proposition, its current limitations, and described the lab I’ll be using for this post. For this post I’ll be walking through the setup, examining some packet captures and Fiddler captures, and touching on a few of the gotchas I ran into.
Before I jump into the technical gooey goodness, I’m going to cover some prerequisites.
One obvious factoid is you’ll need a Windows AD domain up and running, and the machine you connect to the share from will need to be joined to that domain. One disclaimer to keep in mind: if you have a multiple Windows AD forest scenario, such as an account and resource forest, you’ll need to be aware of which domain you’re integrating the Azure File share with. If you integrate it with a resource forest but have your user accounts in the account forest, you’ll need to use name suffix routing. I’m not going to go into the details of name suffix routing, but if you’re curious you can read through this article. The short of it is the service principal name associated with the computer or service account used to represent the Azure Storage account the file share is created on uses the domain of files.core.windows.net. When performing the Kerberos authentication, the domain controller in the account forest wouldn’t know how to direct the request to the resource forest because that domain suffix would not be associated with the resource forest. For this lab I created a Windows AD domain with the namespace jogcloud.local.
You will also need to ensure that the users and groups from your Windows AD domain that you want to be able to access the Azure File share are being synchronized to Azure AD. The tenant you synchronize to must be the same tenant the Azure subscription containing the Azure Storage account is associated with. Don’t worry about why right now, I’ll cover that later. For this lab I’ll be using my jogcloud.com tenant.
To store those wonderful files you’ll need an Azure Storage account. The storage account should be created in the same region (or closest region if on-premises) to the clients that will access the file share. This will ensure optimal performance and avoid cross region costs if your clients are in Azure. You can use either a storage account with the standard GPv2 (General Purpose v2) SKU or Premium FileStorage SKU if you need better performance and scale. For this lab I’ll be using the GPv2 SKU.
Lastly, networking requirements. Like all Azure PaaS offerings, Azure Storage is by default available over the public Internet. Since no sane human being wants to send SMB traffic over the public Internet, you have the option of using a private endpoint. For this lab I’ll be going the private endpoint route.
So prerequisites are now set, let’s jump into the setup.
Integrating Azure Storage account with Windows AD
The first step in the process to get this integration working is to get the Azure Storage account you’ll be using setup with an identity in Windows AD. A kind human being over at Microsoft wrote a wonderful Azure PowerShell module that makes what I’m about to do a hundred times easier and is the recommended way to go about this. I’m not going to use it for this demonstration because I want to walk through each of the steps in the process to better your understanding of the magic within the module.
Before you run any commands you’ll need to ensure you have the PowerShell modules below installed. You can validate this by running Get-Module -ListAvailable to display the PowerShell modules installed on the machine.
Now we need to create the security principal that is going to represent the Azure Storage account in Windows AD. You have the option of either using a traditional service account (user account) or a computer account. As of August 28th 2020, there are some limitations you’ll run into if you use a service account over a computer account. My recommendation is to use a computer account for now. I’ll cover the limitation later on in this entry. For the purposes of this blog post, I’ll be using a service account.
One important thing to note here is you need to treat this just like you would a traditional service account. By this I mean you will want to create the account with a non-expiring password and put in appropriate controls to perform a controlled rotation of the credential to avoid service disruption.
Here I’ve created a service account with the name azurestorage and have set it with a non-expiring password.
Next up I’m going to create an SPN (service principal name) for the service account. The SPN is going to identify the Azure Storage account to Windows AD and instruct the user’s system which service it needs to obtain a Kerberos ticket for. The SPN is going to use the CIFS service class and include the FQDN of the Azure Files endpoint on your storage account. It will look like cifs/stjogfileshare.file.core.windows.net. You can register the SPN using the setspn -S <SPN> <ACCOUNT_NAME> command. The -S switch will validate that the SPN is not already registered to another security principal in the domain.
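With the names from my lab, that command ends up looking like this:

```powershell
# Register the CIFS SPN for the Azure Files endpoint against the azurestorage
# service account. The -S switch checks for duplicate SPNs before registering.
setspn -S cifs/stjogfileshare.file.core.windows.net azurestorage
```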
So you have a service account and an SPN. Now you need to create a credential in Azure Storage and associate that credential with the service account. To create that credential you’ll need to hop over to PowerShell and connect to Azure. Once connected you’ll use the New-AzStorageAccountKey and Get-AzStorageAccountKey cmdlets to create and retrieve the storage account key used for the integration. It’s important to note that this key (named kerb1 or kerb2) is only used to set up this integration and can’t be used for any control or data plane operations against the storage account.
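A sketch of what that looks like is below; the resource group name is a placeholder from my lab.

```powershell
# Create the Kerberos key on the storage account and retrieve it. This key is
# only used for the AD DS integration, not for data or control plane access.
# The resource group name is a placeholder.
Connect-AzAccount

New-AzStorageAccountKey -ResourceGroupName 'rg-storage' -Name 'stjogfileshare' -KeyName 'kerb1'

$kerbKey = (Get-AzStorageAccountKey -ResourceGroupName 'rg-storage' -Name 'stjogfileshare' -ListKerbKey |
    Where-Object KeyName -eq 'kerb1').Value
```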
Configure the service account with this key as its password.
The last piece in this step of the process is to enable the AD DS feature support for the storage account. To do this you’ll use the Set-AzStorageAccount cmdlet with the syntax below.
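Every value in this sketch is a placeholder based on my jogcloud.local lab, so swap in your own domain details.

```powershell
# Enable AD DS authentication for Azure Files on the storage account. All
# values below are placeholders for my jogcloud.local lab.
Set-AzStorageAccount -ResourceGroupName 'rg-storage' `
    -Name 'stjogfileshare' `
    -EnableActiveDirectoryDomainServicesForFile $true `
    -ActiveDirectoryDomainName 'jogcloud.local' `
    -ActiveDirectoryNetBiosDomainName 'jogcloud' `
    -ActiveDirectoryForestName 'jogcloud.local' `
    -ActiveDirectoryDomainGuid '<domain-guid>' `
    -ActiveDirectoryDomainSid '<domain-sid>' `
    -ActiveDirectoryAzureStorageSid '<service-account-sid>'
```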
All of the inputs between Name and ActiveDirectoryAzureStorageSid can be obtained by using the Get-ADDomain cmdlet as seen below.
The ActiveDirectoryAzureStorageSid parameter can be obtained by using the Get-ADUser cmdlet as seen below.
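A sketch of pulling those inputs together is below; the azurestorage account name comes from my lab.

```powershell
# Pull the domain details needed for Set-AzStorageAccount from Windows AD.
Get-ADDomain | Select-Object DNSRoot, NetBIOSName, Forest, ObjectGUID, DomainSID

# Grab the SID of the service account that represents the storage account.
(Get-ADUser -Identity 'azurestorage').SID.Value
```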
Once you have the inputs, you’ll plug them into the Set-AzStorageAccount cmdlet. If successful you’ll get a return similar to below.
If you’d like to confirm the feature is enabled, you can do that with the steps documented here. The AzFilesHybrid module I mentioned earlier also has some great debugging tools as outlined here.
Now that the integration is complete, I need to create a file share and configure authorization at the management plane. There are two separate layers of authorization occurring, one for access to the file share itself and the other for access to the files and folders within the file share. Access to the share itself is controlled by Azure RBAC and thus controlled by the Azure management plane. There are three built-in roles provided that should service most use cases and these are:
Storage File Data SMB Share Reader which allows read access to the file share over SMB
Storage File Data SMB Share Contributor which allows read, write, and delete access to the file share over SMB
Storage File Data SMB Share Elevated Contributor which allows read, write, and delete access, as well as modification of the Windows ACLs, on the file share over SMB
You are free to design your own custom roles, but those three built in roles are pretty much spot on as to what you’d see in your typical Windows File share-level permissions.
The second layer of authorization is controlled by the Windows ACLs (access control lists) associated with the share, files, and folders. These are your classic Windows ACLs you know and love and will be enforced by the Windows OS.
Just like on a traditional Windows file share, the most restrictive of controls will apply. This means if you’ve only been granted the Storage File Data SMB Share Reader role, it won’t matter if you have full permissions in the Windows ACLs, you will only be able to read and will not be able to write.
Let me demonstrate this.
Here I have assigned the Bob Gray user the Storage File Data SMB Share Reader role on the stjogfileshare storage account. Bob Gray is a Domain Administrator on the jogcloud.local Windows AD domain and a local administrator on the member server.
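If you’d rather create the role assignment with PowerShell than the portal, a sketch is below. The subscription ID, resource group, and UPN are placeholders from my lab.

```powershell
# Assign Bob Gray the share-level read role on the storage account. The
# subscription ID, resource group, and UPN below are placeholders.
$scope = '/subscriptions/<sub-id>/resourceGroups/rg-storage/providers/Microsoft.Storage/storageAccounts/stjogfileshare'

New-AzRoleAssignment -SignInName 'bob.gray@jogcloud.com' `
    -RoleDefinitionName 'Storage File Data SMB Share Reader' `
    -Scope $scope
```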
As seen below I’m able to successfully map the shared folder, but I’m unable to create a folder on it because the management plane is restricting my access.
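For reference, mapping the share is just a standard drive mapping with no credentials supplied, since Kerberos handles the authentication. The share name below is a placeholder.

```powershell
# Map the Azure file share using the logged-on user's Kerberos ticket.
# The share name 'fileshare' is a placeholder.
net use Z: \\stjogfileshare.file.core.windows.net\fileshare
```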
Running the klist command on the machine shows I successfully obtained a Kerberos ticket for the file share.
The packet capture I ran when I mapped the share shows in the SMB conversation that the client and server are using the negotiate protocol (which includes the Kerberos protocol).
After the Kerberos ticket is obtained from the domain controller the client sets up the session with the storage account.
From this point forward, the encryption capabilities of SMB 3 are used to encrypt the session between the client and Azure storage account.
As of August 28th, 2020, the integration now supports AES256 encryption for Kerberos. However, this is only supported if you’re using a Computer account. This means that if you are already using the service with a service account and you want to move to AES256, you’ll need to migrate to using a Computer account. I expect support for a service account will be added sometime in the future, so check the official documentation for updates.
If you try to use a service account and attempt to enforce the use of AES256, your connection will fail. If you do a packet capture you’ll see the Azure Storage account throw a KRB5KRB_AP_ERR_MODIFIED error indicating the Azure Storage account was unable to validate the ticket that was passed because it doesn’t support the encryption algorithm used to secure it.
I went through and created a group in Windows AD named engineering and added Bob Gray to it. I then removed the Storage File Data SMB Share Reader role assignment for Bob Gray and created a role assignment for the engineering group for the Storage File Data SMB Share Contributor role. I’m now able to create files and folders on the share as seen below.
Bringing up the permissions on the folders you’ll observe that there are a few default permissions which come out of the box. You can modify these default permissions if you’d like (for example by removing Authenticated Users Read/Modify, which is overly permissive). You do this by mounting the share as a super user using the standard storage keys. The process is outlined here.
That is pretty much all there is to it to the technical configuration.
So when you use this service what are some of the best practices that I would recommend?
Use a computer account if you require an encryption algorithm stronger than RC4-HMAC. At this time, computer accounts are the only type of security principal which supports AES256 encryption.
Ensure you rotate the password at whatever interval aligns with your organizational security policy and any laws and regulations you may be subject to.
When you create your Azure RBAC role assignments, use synchronized groups vs synchronized users. You do this for the same reason you would on-premises, granting access per user is not scalable.
Dedicate the storage account you use for the file share to only file shares. Storage accounts have fixed limits that are shared across blobs, queues, tables, and files. You don’t want to get into a situation where you have to share those limits.
Do your research to determine if Premium FileStorage makes more sense than GPv2. It’s more costly but provides better performance and scale.
Try to deploy one file share per storage account if possible to ensure you get the maximum IOPS available for that file share. You can certainly put multiple file shares in the same storage account, but they will share the total IOPS available for the storage account.
Ensure you are replicating files to another storage account. Unlike blobs, you can’t read from the second region if you’re using a RA-GRS storage account. If you’re using the Premium Files SKU, the storage account will only support LRS and ZRS which makes this replication to a storage account in another region so important. You could use AzCopy, PowerShell, or Azure Data Factory.
That’s it folks! Hope this post helped you understand the feature that much better.