Journey To The CKA

Hello fellow geeks! It’s been a busy past few months between work, exams, and a case of COVID. Thankfully I’m healthy once again and figured I would finally get around to writing up a post about my journey to the Certified Kubernetes Administrator.

Like many of you, I started my career racking and stacking physical servers and appliances before virtualization took off. Virtualization quickly became commonplace and cloud soon followed it. Living and working through these evolutions taught me the secret of surviving and thriving in this industry. Folks, that secret is you have to love learning because the learning will never stop. Another major evolution and another learning opportunity is presenting itself in the world of infrastructure. The virtualization layer is moving up and containerization is becoming the new norm.

Back in July of this year I made a commitment to focus a large portion of my learning time on containerization. I understood the very high level concept of containers, but not much more, so this was really a ground zero learning plan. I know there are others in the same situation, so I wanted to share the approach I took and the training path I found that worked.

Whenever I learn a new technology, I always start with the history of the technology. What business problem is it solving that wasn’t already being solved? How and why did it come to be? The Essential Container Concepts course by Ell Marquez filled that gap. Ell does an amazing job walking through the history of containerization and how it came to be. Core concepts are covered in depth and explained in a way that is easy to understand for someone with a background in infrastructure.

Once I felt like I understood the basic concepts and how containerization came to be, I decided to learn about a container runtime. While there are a number of container runtimes out there, I picked Docker due to how prevalent it is within the industry. Here I decided to go all in and do the Docker – Deep Dive course by Travis Thomsen. This course is 13 hours of learning goodness with lots of labs. Travis does an amazing job starting with the basics and building to the more complex topics.

After I had a decent understanding of the container runtime, it was time to dig into the management and orchestration component with the beast that is Kubernetes. Here I started with the Kubernetes Essentials course by William Boyd. This is a relatively quick 4 hour course that lives up to its name and touches on essential concepts within Kubernetes. I followed that up with another course by William Boyd, Kubernetes the Hard Way. This is a guided run through Kelsey Hightower’s Kubernetes The Hard Way module. This is a great way to see the guts behind Kubernetes and also a wonderful means for those of you with an infrastructure background to grasp what is happening behind the scenes.

Next up was the CKA prep course by Chad Cromwell. This was “ok”. The content was decent but the instructor’s way of speaking wasn’t my cup of tea. If anything, it’s a worthwhile course for the labs and the additional hands on practice.

I rounded out the structured courses with the CKA prep course by Mumshad Mannambeth. This course was absolutely amazing. The content was excellent, everything was explained in detail, and Mumshad manages to keep it engaging and entertaining. The KodeKloud labs that come with the course are insanely helpful for preparing for the hands on nature of the exam.

Outside of structured courses I did a ton of reading of the official Kubernetes documentation. Typically technical documentation is a struggle to get through due to insufficient information or poor writing, but the documentation for Kubernetes is stellar. It’s organized well and very detailed.

I don’t think I would have been able to pass the CKA without all of the resources above. The CKA is a completely hands on exam, so you have to know both the concepts and how to hammer away on the keyboard to execute those concepts to solve problems. Given this, you need to practice a lot. I heavily used the hands on exercises in Mumshad’s course (and Chad’s to a much smaller degree). Additionally, I did the tasks in the official Kubernetes documentation over and over again until I was comfortable.

Even with all the preparation, it still took multiple passes on the exam to clear it. This was my first ever exam failure in my 15 years of taking technical exams. It was by far the most challenging exam I have ever taken and I’m thankful for my wonderful peers who kept me motivated to charge through even after failing. Go into this exam knowing that if you come from a background similar to mine, you will likely fail your first attempt and that’s ok. You get a free retake and an opportunity to better yourself.

I highly recommend you infrastructure folks start this journey sooner rather than later. Whether or not Kubernetes retains its control over the space remains up in the air, but the concept of pushing up that virtualization layer is here to stay. You will get value from this learning path and you’ll keep yourself relevant in the industry.

As I do for all my exams I’ve published my study guide on GitHub.

Well folks, hopefully this summary helps you in your own learning journey. Have a great holiday and a happy New Year!

What If… Volume 2

Hi there folks!

I’ve been busy lately buried in learning and practicing Kubernetes in preparation for the Certified Kubernetes Administrator exam. Tonight I’m taking a break to bring you another entry into the “What If” series I started a few months back.

Let’s get right to it.

What if I need to access a Private Endpoint in a subscription associated with a different Azure AD tenant and I have an existing Azure Private DNS Zone already?

I’ve been helping a good friend who recently joined Microsoft to support his customer as he gets up to speed on the Azure platform. This customer consists of two very large organizations which have a high degree of independence. Each of these organizations has its own Azure AD tenant and its own Azure footprint. One organization is further along in its cloud journey than the other.

Organization A (new to Azure) needed to consume some data that existed in an Azure SQL database in an Azure subscription associated with Organization B’s tenant. Both organizations have strict security and compliance requirements so they are heavy users of Azure PrivateLink Endpoints. A site-to-site VPN (virtual private network) connection was established between the two organizations to facilitate network communication between the Azure environments.

Customer Environment

The customer environment looked similar to the above where a machine on-premises in Organization A needed to access the Azure SQL database in Organization B. If you look closely, you probably see the problem already. From a DNS perspective, we have two Azure Private DNS Zones for privatelink.database.windows.net. This means we have two authorities for the same zone.

My peer and I went back and forth with a few different solutions. One solution seemed obvious in that Organization A would manually create an A record in their Azure Private DNS zone pointing to the IP of the PrivateLink Endpoint in Organization B. Since the organizations had connectivity between the two environments, this would technically work. The challenge with this pattern is it would introduce a potential bottleneck depending on the size of the VPN pipe. It could also lead to egress costs for Organization A depending on how the VPN connection was implemented.

The other option we came up with was to create a Private Endpoint in Organization A’s Azure subscription which would be associated with the Azure SQL instance running in Organization B’s Azure subscription. This would avoid any egress costs, we wouldn’t be introducing a potential bottleneck, and we’d avoid the additional operational overhead of having to manually manage the A record in Organization A’s Azure Private DNS Zone. Neither of us had done this before and while it seemed to be possible based on Microsoft’s documentation, the how was a bit lacking when talking PaaS services.

To test this I used two separate personal tenants I keep to test scenarios that aren’t feasible to test with internal resources. My goal was to build an architecture like the below.

Target architecture

So was it possible? Why yes it was, and as an added bonus I’m going to tell you how to do it.

When you create a Private Endpoint through the Azure Portal, there is a Connection Method radio button seen below. If you’re creating the Private Endpoint for a resource within the existing tenant you can choose the Connect to an Azure resource in my directory option and you get a handy guided selection tool. If you want to connect to a resource outside your tenant, you instead have to select the Connect to an Azure resource by resource ID or alias option. In this field you would enter the full resource ID of the resource you’re creating the Private Endpoint for, which in this case is the Azure SQL server resource ID. You’ll be prompted to enter the sub-resource, which for Azure SQL is SqlServer. Proceed to create the Private Endpoint.
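For those who like to see the moving parts, here is a minimal Python sketch of what goes into that "by resource ID" request. It only builds strings and a plain dictionary; it does not call Azure. The subscription ID, resource group, and server name are made-up placeholders, and the dictionary field names are my own for illustration rather than any Azure API schema.

```python
def sql_server_resource_id(subscription_id: str, resource_group: str, server_name: str) -> str:
    """Build the full resource ID of an Azure SQL logical server."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Sql/servers/{server_name}"
    )


def manual_connection_request(resource_id: str, sub_resource: str, message: str) -> dict:
    """Model the request a cross-tenant Private Endpoint sends to the resource owner.

    The field names here are illustrative only. A cross-tenant request starts
    out as Pending until someone in the target tenant approves it.
    """
    return {
        "resource_id": resource_id,
        "sub_resource": sub_resource,
        "request_message": message,
        "status": "Pending",
    }


# Hypothetical values standing in for Organization B's Azure SQL server.
target_id = sql_server_resource_id(
    "00000000-0000-0000-0000-000000000000", "rg-orgb-data", "orgb-sql01"
)
request = manual_connection_request(
    target_id, "SqlServer", "Organization A needs read access to this server"
)
print(request["resource_id"])
print(request["status"])
```

The useful takeaway is that the only things the requestor needs from the other tenant are the resource ID and the sub-resource name. Everything else happens in the approval workflow.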

Private Endpoint Creation

After the Private Endpoint has been created you’ll observe it has a Connection status of Pending. This is part of the approval workflow where someone with control over the resource in the destination tenant needs to approve the connection to the Azure SQL server.

Private Endpoint in pending status

If you jump over to the other resource in the target tenant and select the Private endpoint connections menu option you’ll see there is a pending connection that needs approval along with a message from the requestor.

Private Endpoint request

Select the endpoint to approve and click the approve button. At that point, go back to the Private Endpoint in the requestor tenant and you’ll see it has been approved and is ready for use.

This was a fun little problem to work through. I was always under the assumption this would work, the documentation said it would work, but I’m a trust but verify type of person so I wanted to see and experience it for myself.

I hope you enjoyed the post and learned something new. Now back to practicing Kubernetes labs!

Interesting behaviors with Private Endpoints

Update September 2022 – The route summarization feature officially went generally available! This feature allows you to summarize a single address block and route it to your NVA for inspection instead of having to do /32s for each private endpoint. Note that SNAT is still a requirement to ensure symmetric traffic flow.

Hi folks!

Working for and with organizations in highly regulated industries like federal and state governments and commercial banks often necessitates diving REALLY deep into products and technologies. This means peeling back the layers of the onion most people do not. The reason this pops up is because these organizations tend to have extremely complex environments due to the length of time the organization has existed and the strict laws and regulations they must abide by. This is probably the reason why I’ve always gravitated towards these industries.

I recently ran into an interesting use case where that willingness to dive deep was needed.

A customer I was working with was wrapping up its Azure landing zone deployment and was beginning to deploy its initial workloads. A number of these workloads used Microsoft Azure PaaS (platform-as-a-service) services such as Azure Storage and Azure Key Vault. The customer had made the wise choice to consume the services through Azure Private Endpoints. I’m not going to go into detail on the basics of Azure Private Endpoints. There is plenty of official Microsoft documentation that can cover the basics and give you the marketing pitch. You can check out my past posts on the topic such as my series on Azure Private DNS and Azure Private Endpoints.

This particular customer chose to use them to consume the services over a private connection from both within Azure and on-premises as well as to mitigate the risk of data exfiltration that exists when egressing the traffic to Internet public endpoints or using Azure Service Endpoints. One of the additional requirements the customer had was to mediate the traffic to Azure Private Endpoints using a security appliance. The security appliance was acting as a firewall to control traffic to the Private Endpoints as well as to perform deep packet inspection sometime in the future. This is the requirement that drove me down into the weeds of Private Endpoints and led to a lot of interesting observations about the behaviors of network traffic flowing to and back from Private Endpoints. Those are the observations I’ll be sharing today.

For this lab, I’ll be using a slightly modified version of my simple hub and spoke lab. I’ve modified and added the following items:

  • Virtual machine in hub runs Microsoft Windows DNS and is configured to forward all DNS traffic to Azure DNS (168.63.129.16)
  • Virtual machine in spoke is configured to use virtual machine in hub as a DNS server
  • Removed the route table from the spoke data subnet
  • Azure Private DNS Zone hosting the privatelink.blob.core.windows.net namespace
  • Azure Storage Account named mftesting hosting some sample objects in blob storage
  • Private Endpoint for the mftesting storage account blob storage placed in the spoke data subnet
Lab environment

The first interesting observation I made was that there was a /32 route for the Private Endpoint. While this is documented, I had never noticed it. In fact most of my peers I ran this by were never aware of it either, largely because the only way you would see it is if you enumerated effective routes for a VM and looked closely for it. Below I’ve included a screenshot of the effective routes on the VM in the spoke Virtual Network where the Private Endpoint was provisioned.

Effective routes on spoke VM

Notice the next hop type of InterfaceEndpoint. I was unable to find the next hop type of InterfaceEndpoint documented in public documentation, but it is indeed related to Private Endpoints. The magic behind that next hop type isn’t something that Microsoft documents publicly.

Now this route is interesting for a few reasons. It doesn’t just propagate to all of the route tables of subnets within the Virtual Network, it also propagates to all of the route tables in directly peered Virtual Networks. In the hub and spoke architecture that is recommended for Microsoft Azure, this means that every Private Endpoint you create in a spoke Virtual Network is propagated as a system route to the route tables of each subnet in the hub Virtual Network. Below you can see a screenshot of the effective routes on the VM running in the hub Virtual Network.

Effective routes on hub VM

This can make things complicated if you have a requirement like my customer’s to control network traffic to the Private Endpoint. The only way to do that completely is to create /32 UDRs (user defined routes) in every route table in both the hub and spoke. With a limit of 400 UDRs per route table, you can quickly see how this may break down at scale.
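To put some numbers on that, here is a rough Python sketch that generates one /32 route per Private Endpoint and checks the result against the 400 UDR limit mentioned above. The endpoint addresses and the NVA address are invented for the example, and the dictionary keys are just my own labels, not an Azure schema.

```python
import ipaddress

UDR_LIMIT_PER_ROUTE_TABLE = 400  # the per-route-table limit quoted in this post


def private_endpoint_udrs(endpoint_ips, nva_ip):
    """Return one /32 route per Private Endpoint, pointing at the NVA."""
    routes = []
    for ip in endpoint_ips:
        prefix = ipaddress.ip_network(f"{ip}/32")
        routes.append(
            {
                "address_prefix": str(prefix),
                "next_hop_type": "VirtualAppliance",
                "next_hop_ip": nva_ip,
            }
        )
    return routes


def fits_in_route_table(routes, existing_routes=0):
    """True if the generated routes plus any existing UDRs stay within the limit."""
    return len(routes) + existing_routes <= UDR_LIMIT_PER_ROUTE_TABLE


# Hypothetical lab values: three endpoints in a spoke, NVA at 10.0.0.4 in the hub.
few = [str(ipaddress.ip_address("10.1.1.4") + i) for i in range(3)]
small = private_endpoint_udrs(few, "10.0.0.4")
print(small[0]["address_prefix"], fits_in_route_table(small))

# The same exercise with 401 endpoints no longer fits in a single route table.
many = [str(ipaddress.ip_address("10.1.0.4") + i) for i in range(401)]
large = private_endpoint_udrs(many, "10.0.0.4")
print(len(large), fits_in_route_table(large))
```

Keep in mind the same set of routes has to land in every route table that needs to steer traffic through the appliance, so the limit applies to each of them.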

There is another interesting thing about this route. Recall from the effective routes for the spoke VM that there is a /32 system route for the Private Endpoint. Since this is the most specific route, all traffic should be routed directly to the Private Endpoint right? Let’s check that out. Here I ran a port scan against the Private Endpoint with nmap using the ICMP, UDP, and TCP protocols. I then opened the Log Analytics Workspace and ran a query across the Azure Firewall logs for any traffic to the Private Endpoint from the VM and lo and behold, there is the ICMP and UDP traffic nmap generated.

Captured UDP and ICMP traffic

Yes folks that /32 route is protocol aware and will only apply to TCP traffic. UDP and ICMP traffic will not be affected. Software defined networking is grand isn’t it? 🙂
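If it helps to picture it, here is a toy Python model of the behavior I observed. It is plain longest-prefix matching with one twist: a route can be limited to certain protocols. This is only my mental model of what the lab showed, not how Azure implements routing, and the route entries (including the default route to the firewall) are assumptions for the example.

```python
import ipaddress


def next_hop(dest_ip, protocol, routes):
    """Pick the most specific route that matches both the destination and the protocol.

    A route with protocols=None applies to every protocol.
    """
    dest = ipaddress.ip_address(dest_ip)
    candidates = [
        r
        for r in routes
        if dest in ipaddress.ip_network(r["prefix"])
        and (r["protocols"] is None or protocol in r["protocols"])
    ]
    if not candidates:
        raise LookupError(f"no route to {dest_ip} for {protocol}")
    best = max(candidates, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)
    return best["next_hop"]


# Assumed route table: the Private Endpoint /32 only applies to TCP,
# everything else follows a default route to the firewall.
ROUTES = [
    {"prefix": "10.1.1.4/32", "next_hop": "InterfaceEndpoint", "protocols": {"tcp"}},
    {"prefix": "0.0.0.0/0", "next_hop": "AzureFirewall", "protocols": None},
]

for proto in ("tcp", "udp", "icmp"):
    print(proto, next_hop("10.1.1.4", proto, ROUTES))
```

TCP lands on the Private Endpoint route while UDP and ICMP fall through to the less specific route, which matches what showed up in the firewall logs.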

You may be asking why the hell I decided to test this particular piece. The reason I followed this breadcrumb was my customer had set up a UDR to route traffic from the VM to an NVA in the hub and attempted to send an ICMP Ping to the Private Endpoint. In reviewing their firewall logs they saw only the ICMP traffic. This finding was what drove me to test all three protocols and make the observation that the route only affects TCP traffic.

Microsoft’s public documentation mentions that Private Endpoints only support TCP at this time, but the documentation does not specify that this system route does not apply to UDP and ICMP traffic. This can result in confusion such as it did for this customer.

So how did we resolve this for my customer? Well in a very odd coincidence, a wonderful person over at Microsoft recently published some patterns on how to approach this problem. You can (and should) read the documentation for the full details, but I’ll cover some of the highlights.

There are four patterns that are offered up. Scenario 3 is not applicable for any enterprise customer given that those customers will be using a hub and spoke pattern. Scenario 1 may work but in my opinion is going to architect you into a corner over the long term so I would avoid it if it were me. That leaves us with Scenario 2 and Scenario 4.

Scenario 2 is one I want to touch on first. Now if you have a significant background in networking, this scenario will leave you scratching your head a bit.

Microsoft Documentation Scenario 2

Notice how a UDR is applied to the subnet with the VM which will route traffic to Azure Firewall; however, there is no corresponding UDR applied to the Private Endpoint. Now this makes sense since the Private Endpoint would ignore the UDR anyway since they don’t support UDRs at this time. Now you old networking geeks probably see the problem here. If the packet from the VM has to travel from A (the VM) to B (stateful firewall) to C (the Private Endpoint) the stateful firewall will make a note of that connection in its cache and be expecting packets coming back from the Private Endpoint representing the return traffic. The problem here is the Private Endpoint doesn’t know that it needs to take the C (Private Endpoint) to B (stateful firewall) to A (VM) path because it isn’t aware of that route and you’d have an asymmetric routing situation.

If you’re like me, you’d assume you’d need to SNAT in this scenario. Oddly enough, due to the magic of software defined routing, you do not. This struck me as very odd because in scenario 3 where everything is in the same Virtual Network you do need to SNAT. I’m not sure why this is, but sometimes accepting magic is part of living in a software defined world.

Finally, we come to scenario 4. This is a common scenario for most customers because who doesn’t want to access Azure PaaS services over an ExpressRoute connection vs an Internet connection? For this scenario, you again need to SNAT. So honestly, I’d just SNAT for both scenario 2 and 4 to maintain consistency. I have successfully tested scenario 2 with SNAT so it does indeed work as you expect it would.
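For anyone who wants the classic networking reasoning behind SNAT spelled out, here is a tiny Python sketch. It models plain old-school routing only: the endpoint replies to whatever source address it saw. It does not capture the platform behavior that lets scenario 2 work without SNAT, and the IP addresses are invented for the example.

```python
def reply_destination(client_ip: str, firewall_ip: str, snat: bool) -> str:
    """Return the address the endpoint sends its reply to.

    With SNAT the firewall rewrites the source to its own address before
    forwarding, so the endpoint only ever sees the firewall.
    """
    source_seen_by_endpoint = firewall_ip if snat else client_ip
    return source_seen_by_endpoint


def is_symmetric(client_ip: str, firewall_ip: str, snat: bool) -> bool:
    """The flow is symmetric when the reply goes back through the firewall."""
    return reply_destination(client_ip, firewall_ip, snat) == firewall_ip


# Hypothetical addresses: VM at 10.1.0.4, firewall at 10.0.0.4.
print("without SNAT:", is_symmetric("10.1.0.4", "10.0.0.4", snat=False))
print("with SNAT:", is_symmetric("10.1.0.4", "10.0.0.4", snat=True))
```

With SNAT the return traffic has nowhere to go but back through the firewall, which is why it is the consistent choice across scenarios.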

Well folks I hope you found this information helpful. While much of it is mentioned in public documentation, it lacks the depth that those of us working in complex environments need and those of us who like to geek out a bit want.

See you next post!

What If… Volume 1

Welcome back fellow geeks!

During brainstorming sessions with peers or customer conversations, “what if” type scenarios pop up. These are typically scenarios that aren’t documented at all or buried deep in the depths of the Internet. I thought it would be fun to create an ongoing series where I share some of the “what if” scenarios I run into and what I found when I labbed them out.

Lately I’ve been having a lot of internal and customer conversations around DNS and resolution with Azure Private Endpoints. Out of those conversations came some interesting “what if” scenarios. Those scenarios will be the subject of this post.

What if I create a second Private Endpoint for a single Azure resource and register it to the same Azure Private DNS zone?

This question popped up in some ongoing discussions around disaster recovery when Private Endpoints are in use. Specifically, the discussion was around Azure Storage accounts configured for GRS (geo-redundant storage). In this scenario a customer is accessing a storage account from on-premises via a Private Endpoint and is blocking all access to the storage account over the public endpoint. The customer has configured the storage account to register its record in the Azure Private DNS zone.

Lab layout

In the event an entire region fails, the private endpoint (seen above with IP of 10.0.1.4) in the primary region will become unavailable causing traffic to drop. If the customer creates a second private endpoint (seen above with the IP of 10.1.1.4) either ahead of time or after the failure what would happen when the private endpoint tried to register a duplicate record in Azure Private DNS?

Would the registration of the A record fail? Would it add an IP to the record set? Would it overwrite it?

The answer is it will overwrite the record. This means that the A record seen in the above diagram for st1.privatelink.blob.core.windows.net would be overwritten to point to the new Private Endpoint address of 10.1.1.4.
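Here is that outcome as a few lines of Python. The zone is modeled as a plain dictionary, which is obviously a simplification of Azure Private DNS, and the function name is my own. The point is only to show the overwrite outcome next to the alternatives I asked about above.

```python
def register_private_endpoint(zone: dict, record_name: str, ip: str) -> dict:
    """Register an A record the way the lab behaved: the new IP replaces the old one.

    It does not fail on a duplicate name and it does not append a second IP.
    """
    zone[record_name] = [ip]
    return zone


# The zone for privatelink.blob.core.windows.net, keyed by relative record name.
zone = {}
register_private_endpoint(zone, "st1", "10.0.1.4")  # endpoint in the primary region
register_private_endpoint(zone, "st1", "10.1.1.4")  # second endpoint, same resource
print(zone)
```

After the second registration, clients resolving the record only ever get the new endpoint's address.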

What if I don’t want to depend solely on Azure Private DNS?

This conversation has popped up a lot lately with customers and in comments in my Azure DNS and Private Link blog series. As I discuss in that series, in its current state Private Endpoints depend heavily on Azure Private DNS. Azure Private DNS is relatively new so it’s a very basic DNS service with no fancy geo-load balancing capabilities or probes. This can create administrative overhead in the case of disaster recovery scenarios. Customers are eager to leverage their own DNS products such as InfoBlox or F5 GTMs due to the advanced capabilities of those products.

The challenge is customers are stuck using the privatelink namespaces (such as privatelink.database.windows.net) Microsoft has defined for the services. Now that wouldn’t be a problem if the customer could access the service directly by the privatelink FQDN (fully-qualified domain name), but that won’t work for encryption in transit to Microsoft PaaS (platform-as-a-service) offerings because the privatelink FQDN isn’t supported on the certificates provisioned to the services. This results in the dreaded certificate name mismatch scenario where the client (your machine) can’t verify the identity of the server because the server’s certificate doesn’t contain the identity you’re trying to access (mystuff.privatelink.database.windows.net). This requires you access the server using the public FQDN (mystuff.database.windows.net) which goes through the resolution path I discuss in my private link DNS series.
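To make the name mismatch concrete, here is a simplified Python sketch of hostname matching against a certificate name. I am assuming the service certificate carries a wildcard name of *.database.windows.net, which is an assumption for illustration and not something I pulled from a real certificate. The matching rule is the usual one where a wildcard covers exactly one leftmost label.

```python
def hostname_matches(hostname: str, pattern: str) -> bool:
    """Simplified certificate name check.

    A leading '*' label matches exactly one label, so the hostname and the
    pattern must have the same number of labels.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


CERT_NAME = "*.database.windows.net"  # assumed for this example

print(hostname_matches("mystuff.database.windows.net", CERT_NAME))
print(hostname_matches("mystuff.privatelink.database.windows.net", CERT_NAME))
```

The privatelink FQDN has an extra label, so under this rule the wildcard cannot cover it and the client refuses the connection.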

Unfortunately, there aren’t a ton of great options. I walk through some of the options in my series and Dan Mauser has done a wonderful job walking through some others in his postings. In his post, he discusses two solutions:

  1. Creating a conditional forwarder for the FQDN of the Azure resource
  2. Creating a forward lookup zone for the FQDN of the Azure resource.

The most common on-premises integration pattern looks something like the below. In this pattern on-premises DNS servers are configured with a conditional forwarder to send all DNS queries for Azure PaaS services (such as database.windows.net) to a DNS resolver in Microsoft Azure. That resolver is configured with a standard forwarder to send all of its queries to the 168.63.129.16 virtual IP.

Scenario 5
Common resolution pattern for Private Endpoints

One of the downfalls of this pattern is that you’re forwarding queries for all of the public namespaces of the Azure PaaS services you’re using up to Azure. If your ExpressRoute or S2S VPN (Site-to-Site VPN) drops, those queries will either time out and fail to resolve or time out and resolve to the public IP addresses.

In scenario 1, customers try to avoid that problem by creating conditional forwarders for each Azure resource’s private endpoint. For example, if you have a database named mydb.database.windows.net, you would create a conditional forwarder on-premises with the name of mydb.database.windows.net and point it to your upstream Azure DNS server which would go through the standard resolution path to resolve to the A record in Azure Private DNS.

In scenario 2, customers try to avoid the same problem by creating a forward zone for each Azure resource’s private endpoint. The concept is the same as a conditional forwarder where the forward zone is named the same as your resource (mydb.database.windows.net) but with the difference that your DNS server will resolve the record authoritatively.
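To show why the per-resource approach limits the blast radius, here is a small Python sketch of how a DNS server picks a conditional forwarder: the longest matching suffix wins. This is a generic model rather than the logic of any specific DNS product, and the resolver IP of 10.0.0.4 and the default label are made up for the example.

```python
def select_forwarder(query_name: str, forwarders: dict, default: str = "normal-resolution") -> str:
    """Return the forwarder target whose zone is the longest suffix of the query."""
    name = query_name.lower().rstrip(".")
    best_zone = None
    for zone in forwarders:
        z = zone.lower().rstrip(".")
        if name == z or name.endswith("." + z):
            # Every match is a suffix of the same name, so the longer string
            # is always the more specific zone.
            if best_zone is None or len(z) > len(best_zone.rstrip(".")):
                best_zone = zone
    return forwarders[best_zone] if best_zone is not None else default


# Per-resource pattern: only mydb is sent up to the (hypothetical) Azure resolver.
per_resource = {"mydb.database.windows.net": "10.0.0.4"}
print(select_forwarder("mydb.database.windows.net", per_resource))
print(select_forwarder("otherdb.database.windows.net", per_resource))

# Standard pattern: the whole namespace is sent up to Azure.
whole_namespace = {"database.windows.net": "10.0.0.4"}
print(select_forwarder("otherdb.database.windows.net", whole_namespace))
```

With the per-resource forwarder, a dropped connection to Azure only affects names you explicitly forwarded. The price is one server-level object for every resource.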

So in short, both of these options work but in no way are they scalable to manage from a lifecycle perspective because of the scale and ephemeral nature of cloud. Creation or deletion of a forward zone or conditional forwarder is a server-level change, making it more likely someone will break something versus modification of an A record, thus increasing the risk of this pattern. Finally, if you are using an Active Directory-integrated DNS zone, you get the added shi*t show of bloating your Active Directory DIT with the creations and deletions of all these records at scale.

My recommendation is to stick with the standard pattern I outlined above if you can. The Azure Private DNS service will evolve over time and more than likely new capabilities will be added to help address these gaps.