Network Security Perimeters – NSP Components

Network Security Perimeters – NSP Components

This is part of my series on Network Security Perimeters:

  1. Network Security Perimeters – The Problem They Solve
  2. Network Security Perimeters – NSP Components
  3. Network Security Perimeters – NSPs in Action – Key Vault Example
  4. Network Security Perimeters – NSPs in Action – AI Workload Example

Welcome back fellow geeks!

Today I will be continuing my series on NSPs (Network Security Perimeters). In the last post I outlined the problems NSPs were built to solve. I covered how users of Azure have historically controlled inbound and outbound traffic for PaaS (platform-as-a-service) in Azure and the gaps and challenges of those designs. In this post I’m going to dig into the components that make up an NSP, what their function is, how they’re related, and how they work together.

A Quick Review

Before I dive into the gory details of NSP primitives, I want to do a quick refresh on terminology I’ll be using in this post. As I covered in the last post, I divide PaaS services in Azure into what I call compute-based PaaS and service-based PaaS. Service-based PaaS is PaaS where you upload data but don’t control the code executed by the PaaS whereas with compute-based PaaS you control the code executed within the service. NSPs shine in the service-based PaaS realm and that will be my focus for this series.

Compute-based PaaS vs Service-based PaaS

Securing service-based PaaS with the traditional tooling of IP whitelisting, service whitelisting, resource whitelisting, resource-specific outbound controls (and they are very rare) presented the problems below:

  1. Issues at scale (IP whitelisting).
  2. Certain features were not available in all PaaS (resource-based whitelisting or product-specific outbound controls).
  3. The configuration for inbound network control features lived as properties of the resource and could be configured differently by different teams resulting in inconsistent configurations.
  4. Logging widely differed across products.

All the challenges above demanded a better more standardized solution, and that’s where NSPs come in.

Network Security Perimeter Components

I’m a huge fan of breaking down any technology into the base components that make it up. It makes it way easier to understand how the hell these things work together holistically to solve a problem. Let’s do that.

Network Security Perimeter Hierarchy

At the highest level is the Network Security Perimeter resource. This is what I refer to as a top-level resource, meaning a resource that exists directly under an Azure Resource Provider and is visible under a resource group. The Network Security Perimeter resource exists in the Microsoft.Network resource provider, is regional in nature (vs global), and serve as the outer container for the logic of the NSP.

The resource is very simple and the only properties of note are name, location, and tags.

Network Security Perimeter

Profiles

Underneath the Network Security Perimeter resource is the Profile resource. The easiest way to think about a profile is that it’s a collection of inbound and outbound network access rules that you plan to tie to a resource or resources. Each Network Security Perimeter resource can have 1 or more profiles. It’s recommended you have less than 200 profiles, however, I have trouble thinking of a use case for that many unless you’re in an older and more legacy subscription model where you’re packing everything into as few subscriptions as possible (not where you should be these days).

From mucking around with profiles, I can see the use for putting resources in the same NSP into different profiles. For example, I may wrap an NSP around a Storage Account and the Key Vault which holds the CMK (customer-managed key) used to encrypt the Storage Account. I’d likely want one profile for the storage account with a large set of inbound rules and another profile for the Key Vault with a profile with no inbound or outbound rules restricting the Key Vault to be accessed by the Storage Account. My CI/CD pipeline could reach the Key Vault via a Private Endpoint as NSPs DO NOT affect traffic ingressing through the resource’s Private Endpoint. Resources within the same NSP can communicate with each other as long as they’re associated with a managed identity (according to docs, still want to test and validate this myself).

Network Security Perimeter Profiles

Access Rules

Next up you have the Access Rule resource which exists underneath the Profile resource. This is where the the rubber hits the road. Access rules should be familiar to you old folks like myself as they are very similar to firewall rules (at least today). You have a direction (inbound or outbound) and some type of type of property to filter by. For example, as of the date of this post, inbound traffic can support filtering on subscriptions (very similar to the resource whitelisting in Azure Storage) and IP-based rules (similar to IP whitelisting of traditional service firewall). Outbound traffic can be filtered by FQDN (fully qualified domain name). You can have up to 200 rules per profile (this is a hard limit).

If you’re nosy like I am, you may have glanced at the API reference. Inside the API reference, there are additional items that may someday come to fruition such as email addresses and phone numbers (really curious as to the use cases for these) and service tags (DAMN handy but not yet usable as of the date of this post).

Network Security Perimeter Profile Access Rules

Resource Association

The Resource Association resource exists underneath the Network Security Perimeter. Resource associations connect a supported Azure resource to Network Security Perimeter’s profile and dictate an access mode. There are two documented access modes today which include learning (since renamed transition mode in documentation), and enforced. Transition mode (or learning mode in the APIs) is the default mode and is the mode you’ll start with to understand the inbound and outbound traffic of the resources associated with the NSP. Only after you understand those patterns should you switch to enforced mode. In enforced mode the access rules applied the relevant profile will take effect and all inbound and outbound traffic outside the NSP will be blocked unless explicitly allowed.

Audit mode is a third mode that is available via the API but isn’t mentioned in main public documentation. Your use case for audit mode as far as I can see is if you (say you’re information security) want to audit inbound and outbound traffic out of PaaS resources that are associated to a Network Security Perimeter you do not control.

To answer a few questions that I know immediately popped in my head:

  1. Yes, you can associate resources to a network security perimeter that is in different a subscription.
  2. No, you cannot associate a resource to multiple profiles in enforced mode in the same Network Security Group. Associations to other profiles must be configured in access mode.
  3. No, you cannot associate a resource to multiple profiles in enforced mode in different Network Security Groups. Associations to other profiles must be configured in access mode.
Network Security Perimeter Resource Associations

Transition and Enforcement Mode

I want to dig a bit more into transition and enforcement mode and how they affect the resource-level service firewall controls.

In transition mode (which is the default mode) the NSP will evaluate the access rules of the profile the resource is associated with and will log whether the traffic would be allowed or denied (if diagnostic logging is enabled which you most definitely should enable and I’ll cover quickly below but more in depth in a future post) by the NSP and will then fallback to obeying the rules defined in the resource’s service firewall. This is a stellar way for you to get a baseline on how much shit will break with your new rules (and oh yes will shit break for some folks out there).

Once you understand the traffic patterns and design rules around what you want to allow in via the public side of the PaaS, you can flip the resource association to enforced mode. Once you flip that switch, the associated resources will have public access blocked both inbound and outbound. If you have IP whitelisting defined, those rules are ignored. If you checked off the “Allow Microsoft trusted services”, those rules are ignored. As I mentioned above, access through a Private Endpoint is never affected by NSPs as NSPs are concerned with traffic ingressing or egressing from the public sides of the PaaS resource.

It’s worth noting that a new value of the publicNetworkAccess property has been introduced. The publicNetworkAccess property is a property standard to PaaS (I haven’t seen one without it yet). Typically, the two values are either Enabled or Disabled. The property will be typically be set to Enabled if you’re allowing all public access or are restricting it to specific IP addresses, trusted services, specific resources, or service endpoints. It will be typically Disabled when you are blocking all public access and restricting it to Private Endpoints. Note that I use the word typically because this is Azure and there seem to always be exceptions. The new value introduced is SecuredByPerimeter. If you set the publicNetworkAccess property to this the resource is completely locked down to public traffic except for what is allowed in the NSP (even if no NSP is applied). If you plan on transitioning to NSPs fully for controlling public access (once all the resources you need have been onboarded) this is the setting you’ll want to go with.

Logging

The best feature of NSPs (in this dude’s opinion) is the logging feature. Like other Azure resources, NSPs support diagnostic settings. These are configured on a per Network Security Perimeter resource and include a plethora of logs categories. These diagnostic settings allow you to turn on detailed logging of traffic that comes in and out of an NSP. This provides you a standard log for all inbound and outbound traffic vs relying on the resource-level logs which can vary greatly per service (again, looking at you AI Search!).

Imagine being able to confirm that a user’s machine is accessing resource over the Microsoft public backbone vs a Private Endpoint and being able to see exactly what IP they’re using to do it. I can’t tell you how helpful this would have been over the past few years where forward web proxies were in use and were incorrectly configured for DNS affecting access to Private Endpoints.

This is such a damn cool feature that it is deserving of its own deep dive post. I’ll be adding this over the next few weeks.

Summing it up

For the purposes of this post, I really want you to take away an understanding of the NSP primitives, what they do, and how they’re related. Do some noodling on how you think you’d use these primitives and what use cases you might apply them to.

Here are some key points to walk away with:

  1. Network Security Perimeters rules apply only to public traffic. They do not affect traffic coming in through a Private Endpoint.
  2. Resources can be associated to Network Security Perimeters across subscriptions.
  3. Leverage transition mode (learning mode in the API) to understand what traffic is coming in and going out of your resources over the public endpoints and how your access rules may effect them.
  4. For resources that support NSPs (and are generally available) think about deploying those resources (it’s only a few as of the date of this post) with the publicAccessProperty set to SecuredByPerimeter at creation to ensure all public network access is controlled by a Network Security Perimeter. This will force you to use NSPs to control that traffic. Auditing or enforcing with Azure Policy would be nice control on top.
  5. Do some experimentation with the logging provided by NSPs. It will truly blow your mind how useful the logs are to troubleshooting networking issues and identifying network security threats.

Before I close this out, you’re probably wondering why I didn’t cover the Links resource. Well today, this resource doesn’t do anything and isn’t mentioned in the documentation. If you’re a close observer, you’ll notice the API properties of this resource hint to a possible upcoming feature (I ain’t spoiling anything here since the REST documentation is out there) that looks to provide a feature for NSP to NSP communication. We’ll have to wait and see!

In my next post I’ll walk through three common use cases for NSPs. I’ll cover how the problem was previously solved with service firewall controls and how it can be solved with NSPs. I’ll also walk through the configuration in the Portal and through Terraform. I’ll finish off this series (unless I think of anything else) with a deep dive into NSP logging.

See you next post!

Network Security Perimeters – The Problem They Solve

Network Security Perimeters – The Problem They Solve

This is part of my series on Network Security Perimeters:

  1. Network Security Perimeters – The Problem They Solve
  2. Network Security Perimeters – NSP Components
  3. Network Security Perimeters – NSPs in Action – Key Vault Example
  4. Network Security Perimeters – NSPs in Action – AI Workload Example

Hello folks!

Last month a much anticipated new feature became available. No, it wasn’t AI-related (hence why little fanfare) but instead one of the biggest network security improvements in Azure. In early August, Microsoft announced that Network Security Perimeters were finally generally available. If you’ve ever had the pain of making PaaS (platform-as-a-service) to PaaS work in Azure, or had an organization that hadn’t yet fully adopted PrivateLink Private Endpoints, or even had a use case where public access was required and there wasn’t a way around it, then Network Security Perimeters (NSPs) will be one of your favorite new features.

This is gonna be lengthy, so I’m going to divide this content into 2 -3 separate posts. In this first post I’m going to cover a bit of history and the problem NSPs were created to solve.

Types of PaaS

Like any cloud platform, Azure has many PaaS-based services. I like to mentally divide these services into compute-based PaaS (where you upload code and control the actions the code performs) and service-based PaaS (where you upload data but don’t control the code that executes the actions). Examples of compute-based PaaS would be App Services, Functions, AKS, and the like. Service-based PaaS would be things like Storage, Key Vault, and AI Search. I’m sure there are other more effective ways to divide PaaS, but this is the way I’m gonna do it and you’ll have to like it for the purposes of this post.

Compute-based PaaS has historically been easier to control both inbound and outbound given because the feature set to do that was built directly into the product. These control mechanisms consisted of Private Endpoints or VNet injection to control inbound traffic and VNet integration and VNet injection to control outbound traffic.

Controlling service-based PaaS was a much different story.

How Matt’s brain divides PaaS

Prior to the introduction of PrivateLink Private Endpoint support for service-based PaaS customers controlled inbound traffic destined for the public IP of the service instance using what I refer to as the service firewall. The service firewall has a few capabilities to control inbound traffic to the public IP address including:

  • IP-based whitelisting
  • Service-based whitelisting (via Trusted Microsoft Services)
  • Subnet whitelisting (via Service Endpoints)
  • Resource-based whitelisting

Not every service-based PaaS supports all of these capabilities. For example, resource-based whitelisting via the service firewall is only available in Storage today.

Service Firewall – IP Whitelisting

IP-based whitelisting is exactly what you think it is. You plug in an individual public IP or public IP CIDR (consider this a rule) and those sources are allowed through the service firewall and can create TCP connections. This feature was commonly used to allow specific customer IP addresses, often linked to forward web proxies, access to the service-based PaaS prior to the organization implementing Private Endpoints. The major consideration of this feature is there is a finite number of rules (400 last I checked) you can have. While this worked well for the forward web proxy use case, it would often be insufficient for PaaS to PaaS communication (such as Storage to Key Vault to retrieve a CMK for encryption or attempting to whitelist the entire Power BI service prior to its own vnet integration support).

Service Firewall – IP-based Rules

Service Firewall – Service Whitelisting

Next up you had service-based whitelisting through toggling the “Trusted Microsoft Services” option. This toggle would allow all of the public IP addresses belonging a specific set of Microsoft services which differs on a per service-based PaaS basis. This means that toggling that switch for Storage allows a different set of services versus toggling that for Key Vault. The differing of what constitutes a trusted service made this option painful because the documentation on what’s trusted for each service has never been documented well. Additionally, this allows ALL public IPs from that service used by any Microsoft customer not just the public IPs used by your instances of these services. This particular risk was always a pain point for most organizational security teams. Unfortunately, in the past, this was required to be enabled to facilitate any service-based PaaS to service-based PaaS communication. Even worse is the listing of trusted services didn’t include the service you actually needed.

Service Firewall – Trusted Microsoft Services exception

Service Firewall – Virtual Network Service Endpoints

Next up you have Virtual Network Service Endpoints (or subnet whitelisting as I think of them). Service Endpoints were the predecessor to Private Endpoints. They allowed you to limit access to the public IP address of an Azure PaaS instance based on the virtual network (really the subnet in the virtual network) id. They do this by injecting specific routes into the virtual network’s subnet where they are deployed. When the traffic egresses the subnet, it is tagged with an identifier which is used in the ACL (access control list) of the PaaS instance. These routes can be seen when viewing the effective routes on a NIC (network interface card) of a virtual machine running in the subnet.

Service Firewall – Service Endpoints Routes

You would then whitelist the virtual network’s subnet on the PaaS instance allowing that traffic to flow the PaaS instance’s public IP address.

Service Firewall – Service Endpoint Rules

The major security issue with Service Endpoints is they create a wide open data exfiltration point. Any traffic leaving a compute within that subnet will traverse directly to the PaaS instance and will ignore any UDRs (since the UDRs are less specific) bypassing any customer NVA (network virtual appliance like a firewall). Unlike Private Endpoints, Service Endpoints aren’t limited to a specific instance of your PaaS but allow network access to all instances of that PaaS. This means an attacker could take advantage of a Service Endpoint to exfiltrate data to their malicious PaaS instance with zero network visibility to the customer. Gross. There were attempts to address this gap with Service Endpoint Policies which allow you to limit the egress to specific PaaS instances, but these never saw wider adoption than storage.

Operationally, these things are a complete shit show. First off, they are non-routeable outside the subnet so they do you no good from on-premises. The other pain point with them is customers will often implement them without understanding how they actually work causing confusion on why Azure traffic is routing the way it is or trying to figure out why traffic isn’t getting to where it needs to go.

These days service endpoints have very limited use cases and you should avoid them where possible in favor of Private Endpoints. The exception being cost optimization. There is an inbound and outbound network charge for Private Endpoints which can get considerable when talking about Azure Backup or Azure Site Recovery at scale. Service Endpoints do not have that same inbound/outbound network cost and can help to reduce costs in those circumstances.

While Service Endpoints don’t really have much to do with NSPs, I did want to cover them because of the amount of confusion around how they work (and how much I hate them) and because they have traditionally been an inbound network control.

Service Firewall – Resource Whitelisting

Last but not least, we have resource whitelisting. Resource whitelisting is a service-firewall capability unique to Azure Storage. It allows you to permit inbound network connectivity to Azure Storage to a specific instance of an Azure PaaS service, all instances of a PaaS service in a subscription, or all within the customer tenant. The resource then uses it managed identity and RBAC assignments to control what it can do with the resource after it creates its network connection. This was a good example of early attempt to authorize network access based on a service identity. It was commonly used when an instance of Azure Machine Learning (AML), AI Foundry Hubs, AI Search, and the like needed to connect to storage. It provided a more restricted method of network control vs the traditional Allow Trusted Microsoft Services option.

Service Firewall – Resource Whitelisting

Ok… and what about outbound?

Another major issue you should notice is the above are all inbound. What the hell do you do about outbound controls for these service-based PaaS? Well, the answer has historically been jack shit for almost every PaaS service. The focus was all about controlling the inbound network traffic and authorization of the resource to hope no one does anything shady with the resource by having it make outbound calls to other services to exfiltrate data or do something else malicious. Not ideal right?

Some service-based PaaS services came up with creative solutions around this. Resources that fall under the Cognitive Services umbrella, such as the Azure OpenAI Service got data exfiltration controls. AI Search introduced the concept of Shared Private Access which didn’t really control the outbound access, but did allow you to further secure the downstream resources AI Search accessed. Each Product Group was doing their best within the confines of their bubble.

Network Security Perimeters to the rescue

You should now see the challenges that faced inbound and outbound network controls in service-based PaaS.

  • Sure you had lots of options for inbound controls, but they had issues at scale (IP whitelisting) or certain features were not available in all PaaS (resource-based whitelisting).
  • The configuration for inbound network control features lived as properties of the resource and could be configured differently by different teams resulting in access challenges.
  • Outbound, you had very few controls, and if there were controls, they differed on a per-service basis.
  • Some services would give great details as to the inbound traffic allowed or denied (such as Azure Storage) while other services didn’t give you any of that network information (I’m looking at you AI Search).

In my next post I’ll walk through how NSPs help to solve each of these issues and why you should be planning a migration over to them as soon as possible. I’ll walk through a few common examples of how NSPs can ease the burden using common service-based PaaS to service-based PaaS. I’ll also cover how they can be extremely useful in troubleshooting network connectivity caused by routing issues or even DNS issues.

See you next post!