DNS in Microsoft Azure – DNS Security Policies

Posted on August 3, 2025 by mattfeltonma

This is part of my series on DNS in Microsoft Azure.

Hi there folks! After a busy July packed with a vacation and an insane amount of work , I’m back with a new post. Today I’m going to cover a new feature that has been years coming. Yes folks, DNS query logging is now native to the platform with the introduction of DNS Security Policies into GA (generally available) last month. No longer will you have to solution around this long painful gap. In this post I’ll walk through what this new resource is, what it can do (beyond DNS query logging), cover the use cases I’ve tested with it, show you some samples of the logs, and finally cover some potential designs to incorporate it. Let’s dive in!

A long time coming

If you’ve ever spent time troubleshooting a connection error or trying to detect, block, and analyze malware you are likely familiar with the value of DNS query logs. The former makes it a must for day-to-day operations and the latter a critical piece of data for information security. Historically, it’s been a pain to gather this in Microsoft Azure. The wire server (magic IP, 168 address, whatever your favorite nickname) that is made available within a virtual network to use Azure’s built-in DNS resolution service has lacked the capability to capture DNS queries. This mean queries from compute within your virtual network that were resolving to Azure Private DNS zones or a public DNS zone via Azure-provided DNS weren’t captured. Even the introduction of the Azure Private Resolver didn’t address this gap. This lead to customers with requirements to capture DNS query logs having to get fancy.

The most common pattern customers used to address this gap was to introduce a third-party DNS service like an Infoblox, Bluecat, BIND server, or even Windows DNS Server that all compute running within Azure would use for resolution. While customers were able to use this pattern to get the logs, it meant more virtual machines, more costs, more overhead, and it was typically too expensive to implement for workloads that may require complete isolation and didn’t fit into a typical hub and spoke pattern.

**Example design for BYODNS for query logging**

When the Azure Private Resolver service got introduced along with DNS Forwarding Rule Sets, customers using Azure Firewall had the option of ditching the third-party DNS service and using Azure Firewall’s DNS proxy service which included DNS query logging (kind odd it went there first, right?). This was another common pattern I saw pop up in that Azure Firewall customer base.

**Example design using Azure Firewall for DNS query logging and Azure Private DNS Resolver**

Beyond whatever other creative ways customers were addressing this gap, it was a gap and it was costing customers extra money. In comes DNS Security Policies to save the day.

DNS Security Policies Components

DNS Security Policies provide 2 core functions today:

DNS query filtering
DNS query logging

Before I dive into those features in depth, I’m a fan of looking at the resource as a whole from the API layer to get an idea of the components, their purpose, and their relationships.

**DNS Security Policies and related resources**

DNS Security Policies fall under the Microsoft.Network resource provider and are regional resources. The simplest way to understand a resource provider is to think of a namespace in traditional programming. Within a namespace there are resource types (think classes) with specific resource operations. Within the Microsoft.Network resource provider, the three direct children resources that are key.

You’ll notice the Microsoft Learn documentation uses different terminology from what the API uses for some of the resources. To keep things simple, I’ll be using the Microsoft Learn documentation. Here is a quick cheat sheet:

DNS Resolver Policies -> DNS Security Policy
DNS Security Rules -> DNS Traffic Rules
DNS Resolver Domain Lists -> Domain Lists

Each DNS Security Policy has two children resources: DNS Traffic Rules and Virtual Network Links. DNS traffic rules are the guts of your logic for the DNS Security Policy. Each policy can have up to 10 rules (as of August 2025). Each rule consists of a priority (100 – 65000), action (block, allow, alert), and related domain list (I’ll cover these in a few). You can create multiple rules and order them in priority similar to the screenshot below.

Based on the above logic, when the DNS Security Policy triggers a rule based on the domain matching the associated domain list. If the domain being requested is in the list associated with the priority 100 rule, the query is blocked. If not, it’s then processed by the alert rule (which seems to do nothing in my experience as I’ll cover later). Finally, it will hit the last rule which will allow it through but log it.

As I covered above, each rule is associate with one or more domain list. Domain lists are sibling resources to DNS Security Policies. By being a sibling vs a child, they can be re-used across multiple DNS Security Policies (and whatever other use Microsoft comes up with). This allows you to define your domain lists centrally and re-use them across multiple rule sets if, for example, you wanted to maintain your domains lists consistently across environments (test/qa/prod/etc). Domain lists are pretty simple resources consisting of a domain name or wildcard (denoted by a period). It’s important to understand how the domains will be processed. For example (I’m going to steal this direct from the docs), if you allow contoso.com at rule 100 but block bad.contoso.com at rule 110 the query to bad.contoso.com will be allowed because it falls under contoso.com which was allowed by a higher priority rule.

The virtual network link resource is the other child of the DNS Security Policy. This functions similar to the virtual network links with Private DNS Zones as it associates the DNS Security Policy to a virtual network where it will process queries sent through the wire server (Azure-provided DNS). Each virtual network can be linked to one DNS Security Policy but each DNS Security Policy can be linked multiple virtual networks allowing you to use them for those virtual networks connected in a hub and spoke like architecture with centralized DNS as well as those virtual networks that may require complete network isolation.

**Example of DNS Security Policy virtual network links**

DNS Security Policies support diagnostic logging. This allows you to send each query captured by the policy to storage, event hub, or a log analytics workspace. If using a log analytics workspace, the logs are written to a table named DNSQueryLogs. Log entries will look like the below. You’ll get the key pieces of information such as source IP address of the query and the action taken on it. Here you’ll see the query was denied which is indicated by the ResolverPolicyRuleAction. The values here will be “Deny” for blocks, “None” for alerts, and “Allow” for anything allowed.

When the query is denied, instead of getting back an NXDomain, the machine making the query receives back a CNAME of blockpolicy.azuredns.invalid indicating the query has been blocked by DNS Security Policy. This is much better behavior than a NXDomain because now we know what the culprit for the failed DNS query is.

**Example of DNS query being denied by DNS Security Policy**

To visualize how the allow and deny works, I threw together two quick and dirty visual representations.

**Example of how Allow and Block DNS Traffic Rules work**

Scenarios you may be wondering about

Like many of you, I’m curious to see what does work and doesn’t work. I went through and tested a variety of scenarios. Here are a few below and my results when using these policies:

Machine using an external DNS server and is not using wire server (magic IP, 168 address, etc)
- Query is not logged by DNS Security Policies
Machine using its wire server in its virtual network
- Query is captured
Machine using Private DNS Resolver in the same virtual network
- Query is captured
Machine using a DNS Proxy which sits in front of the Private DNS Resolver
- Query is captured
Machine queries an A record or PTR record
- Query is captured
Machine queries AAAA record
- Query is captured
Machine queries using TCP-based query instead of UDP-based query
- Query is captured
PaaS Services tested successfully
- Azure Bastion
- Azure Firewall

How might you use this?

So now you better understand how the service works and what it does. I’ll now tell you how I’d use it. I’m sure folks smarter than me will come out with more effective ways, but here is how I’m envisioning it now.

Based on the testing I’ve done (and testing done by one of my wonderful peers Chris Jasset) the DNS Security Policies seem to take effect at the wire server. This means you’ll want to link the policies to the virtual networks where DNS packets are directed to the wire server. In a centralized DNS design such as below, this would be linked to the virtual network containing the Azure Private DNS Resolver or 3rd-party DNS solution. You would need one DNS Security Policy per region give they are regional in nature.

**Sample design for centralized DNS resolution**

If you’re using a distributed DNS model, or have isolated virtual networks, your design would look something more like below. Here the DNS Security Policies are linked to each virtual network to ensure the packet is captured at the wire server of the virtual network where the query originates.

**Sample design for distributed DNS Security Policy**

As for domain lists, I think most organizations will likely have three separate domain lists. One for block, one for alert (again I don’t find this super useful as of now), and one for allow. These domain lists could be established in a production subscription and shared across lower environments to ensure consistency of blocked domains across environments.

Summing it up

There are a few big takeaways for you this post:

It’s time to revisit how you’re capturing DNS query logs. If your only reason for implementing a third-party DNS service was DNS query logging, you may want to revisit that to see to see if this new solution is more cost effective.
Just like Azure Private DNS, don’t forget to link your policy to the right virtual network. Whatever virtual network you’re sending DNS queries to the wire server is where these should be linked.
DNS query logs are very chatty. You may want to look at ways of optimizing what you capture (if you’re sending it to a third-party logging solution) of how much you retain (if you’re keeping it in a Log Analytics Workspace). This is especially true if you use a wildcard in the allow to capture everything. PaaS especially is very chatty. If you aren’t careful about this, you’ll owe Microsoft a big fat check by the end of that first month.

Lastly, I threw together some samples of the creation of these resources in Terraform if you’re curious. You can find the code here.

Well folks, hopefully you learned something new today. Thanks as always for taking the time to read the content!

Azure Authorization – Azure RBAC Basics

Posted on March 13, 2024 by mattfeltonma

This is part of my series on Azure Authorization.

Welcome back folks. In this post I will be continuing my series on Azure authorization by covering the basics of Azure RBAC. I will review the components that make up Azure RBAC and touch upon some of the functionality I’ll be covering in future posts of this series. Grab a coffee, get comfy, and prepare to review some JSON!

In my first post in this series I touched on the basics of authorization in Azure. In that post I covered the differences between the management plane (actions on a resource) and the data plane (actions on data stored in the resource). I also called out the four authorization planes for Azure which include Entra ID Roles, Enterprise Billing Accounts, Azure Classic Subscription Admin Roles, and Azure RBAC (Role-Based Access Control). If you’re unfamiliar with these concepts, take a read through that post before you continue.

Azure RBAC is the primary authorization plane for Microsoft Azure. When making a request to the ARM (Azure Resource Manager) REST API it’s Azure RBAC that decides whether or not the action is allowed (with the exceptions I covered in my first post and some exceptions I’ll cover through this series). Azure RBAC is configured by using a combination of two resources which include the Azure RBAC Role Definition and Azure RBAC Role Assignment. At a high level, an Azure RBAC Role Definition is a collection of permissions and an assignment is the granting of those permissions to a security principal for an access scope. Definitions and assignments can only be created by a security principal that has been assigned an Azure RBAC Role with the permissions in the Microsoft.Authorization resource provider. The specific permissions include Microsoft.Authorization/roleDefinitions/* and Microsoft.Authorization/roleAssignments/*. Out of the box there are two built-in roles that have these permission which include User Access Administrator and Owner (there are a few others that are newer and I’ll discuss in a future post).

Let me dig deeper into both of these resources.

It all starts with an Azure Role Definition. These can be either built-in (pre-existing roles Microsoft provides out of the box) or custom (a role you design yourself). A role definition is made up of three different components which include the permissions (actions), assignable scopes, and conditions. The permissions include the actions that can be performed, the assignable scope is the access scopes the role definition can be assigned to, and the conditions further constrain the access for the purposes of a largely new and upcoming additional features for Azure RBAC.

**Azure RBAC Role Definition** **– Structure**

Permissions are divided into four categories which include actions, notActions, dataActions, and notDataActions. The actions and notActions are management plane permissions and dataActions and notDataActions are data plane permissions. As mentioned earlier, management plane is actions on the resource while data plane is actions on the data within the resource. Azure RBAC used to be management plane only, but has increasingly (thankfully) been expanded into the data plane. Actions and dataActions are the actions that are allowed and notActions and notDataActions are the actions which should be removed from the list of actions. That likely sounds very confusing and I’ll cover it in more depth in my next post. For now, understand that a permission in notAction or notDataAction is not an explicit deny.

The assignable scopes component is a list of places the role definition can be assigned (I’ll get to this later). The assignable scopes include management group (with some limitations for custom role definitions), subscription, resource group, or resource. Note that whatever scopes are includes, the scopes under it are also included. For example, an assignable scope of a management group would make the role definition usable at that management scope, subscriptions associated with that management group, resource groups within those subscriptions, and resources within those resource groups.

Lastly, we have the conditions. Conditions are a new feature of Azure RBAC Role Definitions and Assignments and are used to further filter the access based on other properties of the security principal, session, or resource. These conditions are used in newer features I’ll be covering throughout this series.

When trying to learn something, I’m a big fan of looking at the raw information from the API. It gives you all the properties you’d never know about when interacting via the GUI. Below is the representation of a built-in Azure RBAC Role Definition. You can dump this information using Azure PowerShell, Azure CLI, or the REST API. Using Azure CLI you would use the az role definition command.

There are a few things I want you to focus on. First, you’ll notice the id property. Each Azure RBAC Role Definition has a unique GUID which must be unique across the Entra ID tenant. The GUIDs for built-in roles should (never run into an instance where they were not) be common across all instances of Entra ID. Like all Azure objects, Azure RBAC Role Definitions also have be be stored somewhere. Custom Azure RBAC Role Definitions can be stored at the management group or subscription-level. I’d recommend you store your Azure RBAC Role Definitions at the management group level whenever possible so they can be re-used (there is a limit of 5,000 custom Azure RBAC Role Definitions per Entra ID tenant) and they exist beyond the lifetime of the subscription (think of use case where you re-use a role definition from subscription 1 in subscription 2 then blow up subscription 1, uh oh).

Next, you’ll see the assignableScopes property with a value of “/”. That “/” represents the root management group (see my post on the root management group if you’re unfamiliar with the concept). As of 3/2024, only built-in role definitions can have an assignable scope of “/”. When you create a custom role definition you will likely create it with an assignable scope of either a management group (good for common roles that will be used across subscriptions and to keep under the 5,000 role definition limit) or subscription (good for use cases where a business unit may have specific requirements).

Lastly, you’ll see that a condition has been applied to this Azure RBAC Role Definition. As of 3/2024 only built-in roles will include conditions in the definitions. I’ll cover what these conditions do in a future post.

Excellent, so you now grasp the Azure RBAC Role Definition. Let me next dive into Assignments.

Azure RBAC Role Assignments associate a role definition to a security principal and assign an access scope to those permissions. At a high-level, they are made up of four components which include the security principal (the entity assigned the role), role definition (the collection of permissions), scope (the access scope), and conditions (further filters on how this access can be exercised).

**Azure RBAC Role Assignment – Structure**

The security principal component is the entity that will be assigned the role for the specific access scope. This can be an Entra ID user, group, service principal, or managed identity.

The role definition is the Azure RBAC Role Definition (collection of permissions) the security principal will be assigned. This can include a built-in role or a custom role.

The scope is the access scope the security principal can exercise the permissions defined in the role definition. It’s most common to create Azure RBAC Role Assignments at the subscription and resource group scope. If you have a subscription strategy where every application gets its own subscription, the subscription scope may make sense for you. If your strategy is subscriptions at the business unit level you may create assignments at the resource group. Assignments at the management group tend to be limited to roles for the central IT (platform and security) team. Take note there are limits to the number of assignments at different scopes which are documented at this link. As of 3/2024 you cannot assign an Azure RBAC Role with dataActions or notDataActions permissions at the management group scope.

Let’s now take a look at the API representation of a typical role assignment. You can dump this information using Azure PowerShell, Azure CLI, or the REST API. When using Azure CLI you would do:

az role assignment list

Here there are a few properties to note. Just like the Azure Role Definitions, the id property of an Azure RBAC Assignment must contain a GUID unique to the Entra ID tenant.

The principalId is the object id of the security principal in Entra ID and the principalType is the object type of the security principal which will be user, group, or service principal. Why no managed identity? The reason for that is managed identities are simply service principals with some orchestration on top. If you’re curious as to how managed identities are represented in Entra ID, check out my post on orphaned managed identities.

The scope is the access scope the permissions will apply to. In the example above, the permissions granted by this role assignment will have the scope of the rg-demo-da-50bfd resource group.

This role assignment also has a condition. The capabilities of conditions and where they are used will be covered in a future post.

The last property I want to touch on is the delegatedManagedIdentityResourceId. This is a property used when Azure Lighthouse is in play.

Alright folks, that should give you a basic understanding of the foundations of Azure RBAC. Your key takeaways for today are:

Assigning a security principal permissions consists of two resources, the role definition (the set of permissions) and the assignment (combination of the role, security principal, and access scope).
Custom Azure RBAC Role Definitions should be created at the management group level where possible to ensure their lifecycle persists beyond the subscription they are used within. Be aware of the 5,000 per Entra ID tenant limit.
Azure RBAC Role Assignments are most commonly created at the subscription or resource group level. Usage at management groups will likely be limited to granting permissions for Central IT or Security. Be aware of the limits around the number of assignments.

In my next post I’ll dig deeper into how notActions and notDataActions works, demonstrate how it is not an explicit deny, and compare and contract an Azure RBAC Role Definition to an AWS IAM Policy.

Have a great week!

Journey Of The Geek

The chronicles of a Bostonian tech geek navigating through life and technology

Tag Archives: cybersecurity