A Deep Dive Into Azure Route Server

Hello again fellow geeks.

I recently had a customer reach out to me with an interest in learning more about ARS (Azure Route Server). The customer hoped it might ease some of the burden of managing routing across a large Azure implementation. I had yet to mess around with ARS since it only recently went GA (generally available), so I figured this would be a great opportunity to build out a lab and give it a whirl. Let’s get to it!

ARS is one of Microsoft’s newer networking offerings in Azure: a managed routing service that is hooked into Azure’s SDN (software defined network). The way I like to think of the service is as a couple of VMs (virtual machines) that are managed by Microsoft, run a BGP (Border Gateway Protocol) service, and have the ability to program routes directly into VNets (virtual networks). It introduces some pretty cool new networking patterns, such as an SD-WAN and ExpressRoute pattern and a dual-homed network, but the feature that most interested my customer was its ability to BGP peer with a customer’s NVA (network virtual appliance), such as a Palo Alto or Cisco appliance, and inject the dynamic routes it learns into the VNet. This is the feature I’ll cover in this blog post.

I did some thinking about how I wanted to lab this out and what I wanted to use as an NVA. Since the behavior I wanted to test was primarily BGP, I figured I’d keep it simple and run a Linux VM which would host a lightweight BGP service. I found a wonderful post from Adam Stuart (seriously a great read and really cool Azure networking pattern in his post) which mentioned Exabgp. After doing a bit of research on it, it looked relatively easy to set up and use, so the choice was made. (Thanks Adam!) In addition to the NVA, I decided to build out a hub-and-spoke architecture and make use of my home pfSense appliance for S2S VPN (site-to-site virtual private network) connectivity and to replicate an on-premises environment. The result was the lab pictured below (image 1).

Image 1 – Azure Route Server Lab

One thing you may notice in the above image is ARS has a public IP address associated with it. Remember when I mentioned this is a couple of Microsoft-managed VMs? Well, this public IP facilitates Microsoft management of the VMs, similar to other managed services such as Azure SQL MI (Managed Instance). Before you ask, no, you can’t associate an NSG (Network Security Group) with the subnet ARS is provisioned into, and the subnet has to be named RouteServerSubnet.

I set up an S2S VPN connection and BGP peering between my pfSense appliance and Azure VPN Gateway and advertised the set of routes documented in the lab diagram (image 1). I then provisioned four Ubuntu VMs: two in the hub, one in the first spoke, and one in the second spoke.

Exabgp was a bit painful to set up because the documentation out there is sparse and I had to piece it together from multiple sources. Given that, I’m going to spend a few minutes walking through the setup.

I first set up my ARS using the instructions in this link. Ensure you use a valid ASN (autonomous system number) for your NVA.

Once ARS is set up, you can begin setting up Exabgp on the VM using the commands below.

sudo apt update
sudo apt install exabgp

You’ll then need to modify the service file located in /lib/systemd/system/exabgp.service and comment out the User and Group lines so the [Service] section matches the snippet below.

...
[Service]
#User=exabgp
#Group=exabgp
Environment=exabgp_daemon_daemonize=false
PermissionsStartOnly=true
ExecStartPre=-mkfifo /run/exabgp.in
...

Now you’ll need to create a configuration file for exabgp and save it to /etc/exabgp/exabgp.conf. Below is the configuration file I used for the lab.

neighbor 10.0.2.4 {
        router-id 10.0.0.4;
        local-address 10.0.0.4;
        local-as 65010;
        peer-as 65515;

        static {
                route 192.168.10.0/24 next-hop 10.0.0.4;
        #       route 192.168.1.0/24 next-hop 10.0.0.4 as-path [65010 65010] community [65010:2];
                route 192.168.1.0/24 next-hop 10.0.0.4 community [65010:2];
                route 10.1.0.0/16 next-hop 10.0.0.4;
                route 10.10.0.0/16 next-hop 10.0.0.4;
                route 0.0.0.0/0 next-hop 10.0.0.4;
        }
}
neighbor 10.0.2.5 {
        router-id 10.0.0.4;
        local-address 10.0.0.4;
        local-as 65010;
        peer-as 65515;

        static {
                route 192.168.10.0/24 next-hop 10.0.0.4;
        #       route 192.168.1.0/24 next-hop 10.0.0.4 as-path [65010 65010] community [65010:2];
                route 192.168.1.0/24 next-hop 10.0.0.4 community [65010:2];
                route 10.1.0.0/16 next-hop 10.0.0.4;
                route 10.10.0.0/16 next-hop 10.0.0.4;
                route 0.0.0.0/0 next-hop 10.0.0.4;
        }
}

Each instance of ARS is configured to be highly available and is deployed across availability zones if the region supports it. It comes with two BGP peering IPs you’ll need to peer with and advertise your routes to. If you only peer with one, or advertise different routes to each, you’ll get funky behaviors. In the configuration file above, each neighbor block represents one of these peers. You’ll notice I’m advertising a number of routes which will demonstrate different behaviors I’ll walk through in this post.
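Since both ARS peers need to receive the same routes, one way to keep the two neighbor blocks from drifting apart is to generate the file from a single route list. The snippet below is just a rough sketch of that idea in Python. The helper names (render_neighbor, render_config) are my own invention and not part of Exabgp; the addresses, ASNs, and routes are simply the lab values from the config above.

```python
import ipaddress

# Lab values copied from the config above. The helper functions are hypothetical.
LOCAL_ADDRESS = "10.0.0.4"
LOCAL_AS = 65010
PEER_AS = 65515
ARS_PEERS = ["10.0.2.4", "10.0.2.5"]

# (prefix, extra attributes) pairs shared by every neighbor block.
ROUTES = [
    ("192.168.10.0/24", ""),
    ("192.168.1.0/24", "community [65010:2]"),
    ("10.1.0.0/16", ""),
    ("10.10.0.0/16", ""),
    ("0.0.0.0/0", ""),
]


def render_neighbor(peer_ip, routes):
    """Render one exabgp neighbor block for a single ARS peering IP."""
    lines = [
        f"neighbor {peer_ip} {{",
        f"        router-id {LOCAL_ADDRESS};",
        f"        local-address {LOCAL_ADDRESS};",
        f"        local-as {LOCAL_AS};",
        f"        peer-as {PEER_AS};",
        "",
        "        static {",
    ]
    for prefix, attrs in routes:
        # Raises ValueError if the prefix has host bits set (e.g. 192.168.0.1/24).
        ipaddress.ip_network(prefix)
        extra = f" {attrs}" if attrs else ""
        lines.append(
            f"                route {prefix} next-hop {LOCAL_ADDRESS}{extra};"
        )
    lines += ["        }", "}"]
    return "\n".join(lines)


def render_config(peers, routes):
    """Render identical neighbor blocks for every ARS peer."""
    return "\n".join(render_neighbor(peer, routes) for peer in peers) + "\n"


if __name__ == "__main__":
    print(render_config(ARS_PEERS, ROUTES), end="")
```

You could redirect the output to /etc/exabgp/exabgp.conf and restart the service. The prefix check is a cheap guard against fat-fingering a network address before it ever reaches ARS.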

Once you’ve done the above you can start the service and validate that it successfully started.

sudo systemctl start exabgp
sudo systemctl status exabgp

Give it a minute and then you can run the following command to output the routes the exabgp service has learned from ARS.

sudo exabgpcli show adj-rib in extensive

In the output below (image 2) you can see ARS advertising the address space of the hub and spoke 1 VNets.

Image 2 – Exabgp output

So why isn’t spoke 2 being advertised? Well, ARS requires the peerings between the VNets to be configured with the UseRemoteGateways and AllowGatewayTransit properties (referred to in the Portal as Use the remote virtual network’s gateway or Route Server and Use this virtual network’s gateway or Route Server) in order for ARS to propagate routes between the VNets. As you’ll note from the lab image, I enabled these settings for spoke 1 but not spoke 2.
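To make that rule concrete, here is a tiny Python sketch of how I think about it. The dictionary layout, the spoke names, and the function name are all hypothetical; only the two peering settings and the spoke 1 / spoke 2 outcome come from the lab.

```python
# Hypothetical model of the lab's hub-to-spoke peerings.
# allow_gateway_transit is set on the hub side of the peering,
# use_remote_gateways on the spoke side.
PEERINGS = {
    "spoke1": {"allow_gateway_transit": True, "use_remote_gateways": True},
    "spoke2": {"allow_gateway_transit": False, "use_remote_gateways": False},
}


def spokes_exchanging_routes_with_ars(peerings):
    """Return the spokes whose peering has both settings enabled."""
    return sorted(
        name
        for name, settings in peerings.items()
        if settings["allow_gateway_transit"] and settings["use_remote_gateways"]
    )


if __name__ == "__main__":
    print(spokes_exchanging_routes_with_ars(PEERINGS))
```

With the lab settings only spoke1 comes back, which matches what the Exabgp output shows.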

This requirement does present a challenge: when it’s enabled, ARS propagates routes to the peered VNets, but so does your ExpressRoute or VPN Gateway. My customer base typically requires all traffic leaving the spoke to flow through a security appliance. If you do not enable these peering settings, the spoke only knows about itself and the hub VNet. To direct the traffic through an appliance in the hub you then only need two UDRs (one for 0.0.0.0/0 and one for the hub VNet address space). This makes it easy to stamp out spokes from a networking perspective and optionally audit and enforce with Azure Policy.

Looking at the effective routes of the VM in spoke 1 (image 3), the routes highlighted in red are being propagated into the spoke by my VPN Gateway, which is receiving them from my pfSense appliance. To ensure traffic between on-premises and Azure flows through a security appliance, you’d need to define all of the routes you’re propagating over your ExpressRoute/VPN in your NVA. This way ARS propagates them to the peered VNets and overrides the ones coming in from the gateway.

Image 3 – Spoke 1 Effective Routes

Recall from my lab image (image 1) that I’m sending 192.168.1.0/24, 192.168.2.0/24, 192.168.3.0/24, and 192.168.4.0/24 over BGP to Azure. In the image above (image 3) you’ll notice three of these four routes are coming from my VPN Gateway (10.0.3.5), but 192.168.1.0/24 is coming from the Exabgp VM (10.0.0.4), which advertises the same prefix. The documentation says that when two routes for the same address space but different AS PATH lengths are received from different NVAs, ARS will only program the route with the shorter AS PATH. Apparently this isn’t just a trait of ARS but a trait of the Azure SDN.

To validate this is the case, I edited my Exabgp config file and gave the route a longer AS PATH as seen below.

Exabgp config with modified AS PATH

After about a minute, the route table on my spoke 1 VM displays the route for 192.168.1.0/24 coming from the VPN Gateway, because the route coming from my Exabgp VM now has a longer AS PATH. This indicates that when two routes for the same address space are advertised, the route with the shortest AS PATH gets programmed to the VNet while the route with the longer AS PATH is seemingly discarded. I would have expected the losing 192.168.1.0/24 route to still be listed with a state of Invalid, but instead it’s dropped entirely.

Image 4 – VPN Gateway with shorter AS PATH
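If it helps to picture the behavior, below is a rough Python sketch of the selection I observed. It is a toy model only: the BgpRoute class and function name are made up, the AS PATH values are illustrative rather than copied from the lab output, and it does not attempt to model what happens when two paths are the same length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpRoute:
    prefix: str
    next_hop: str
    as_path: tuple  # sequence of ASNs, nearest AS first


def select_programmed_routes(routes):
    """Keep only the shortest-AS-PATH route per prefix (ties keep the first seen)."""
    best = {}
    for route in routes:
        current = best.get(route.prefix)
        if current is None or len(route.as_path) < len(current.as_path):
            best[route.prefix] = route
    return best


# Illustrative paths: the gateway route is assumed to carry two ASNs.
from_gateway = BgpRoute("192.168.1.0/24", "10.0.3.5", (65510, 65501))
from_nva = BgpRoute("192.168.1.0/24", "10.0.0.4", (65010,))
from_nva_longer = BgpRoute("192.168.1.0/24", "10.0.0.4", (65010, 65010, 65010))

if __name__ == "__main__":
    before = select_programmed_routes([from_gateway, from_nva])
    after = select_programmed_routes([from_gateway, from_nva_longer])
    print(before["192.168.1.0/24"].next_hop)  # NVA wins with the shorter path
    print(after["192.168.1.0/24"].next_hop)   # gateway wins once the NVA path is longer
```

The losing route simply never makes it into the result, which mirrors what I saw in the effective routes: no Invalid entry, just gone.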

The next route I want to look at is 10.1.0.0/16, which is the address space of spoke 1. I configured the Exabgp VM to advertise this route to ARS. Looking at the effective routes for the VM in the hub (VM-DEMO), the only route for this address space is the system route for the peering. This is because system routes for the VNet, VNet peering, or service endpoints are the preferred routes, even if BGP routes are more specific. This means you can’t use ARS to push routes that would force traffic from a spoke to flow through an NVA in the hub, because of the VNet peering system route. For that use case it looks like you will need to continue to use UDRs (user defined routes).
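A quick way to spot routes that will be ignored for this reason is to check whether a prefix you plan to advertise falls inside a VNet or peered VNet address space. The sketch below does that with Python’s ipaddress module. The function name is hypothetical, and the address spaces are the hub and spoke 1 ranges that ARS advertised in my lab; treat it as a pre-flight sanity check, not a model of Azure’s full route selection.

```python
import ipaddress

# Hub and spoke 1 address spaces as advertised by ARS in the lab.
VNET_SPACES = ["10.0.0.0/16", "10.1.0.0/16"]

# Prefixes the Exabgp VM advertises in the lab config.
ADVERTISED = [
    "192.168.10.0/24",
    "192.168.1.0/24",
    "10.1.0.0/16",
    "10.10.0.0/16",
    "0.0.0.0/0",
]


def shadowed_by_system_routes(advertised, vnet_spaces):
    """Return advertised prefixes that sit inside a VNet/peering address space."""
    spaces = [ipaddress.ip_network(space) for space in vnet_spaces]
    shadowed = []
    for prefix in advertised:
        network = ipaddress.ip_network(prefix)
        # Equal to or more specific than a VNet range: the system route wins.
        if any(network.subnet_of(space) for space in spaces):
            shadowed.append(prefix)
    return shadowed


if __name__ == "__main__":
    print(shadowed_by_system_routes(ADVERTISED, VNET_SPACES))
```

Anything this returns is a candidate for a UDR instead of a BGP advertisement.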

I also have the default route of 0.0.0.0/0, which I am advertising from the Exabgp VM. You can see in the above images that it is propagating to both the hub and spoke. Take note that if you advertise this route, you’re going to want to ensure you place a route table and UDR on the subnet your NVAs are in. Otherwise you’ll end up with a routing loop because the default route would also be received by the NVA’s subnet.

Let’s pull the VPN Gateway into the mix. ARS does support BGP peering with an ExpressRoute or VPN Gateway. This allows you to propagate the routes ARS is learning from the NVA back on-premises. You enable this functionality by enabling the Branch-to-branch feature of ARS. In my lab environment, I commented out the default route in my Exabgp config file because I didn’t want to send it back on-premises and I don’t know how to filter out routes in FRR (the BGP service running on pfSense). I then ran az network vnet-gateway list-learned-routes and confirmed my VNG is now receiving routes from ARS as seen in the image below (image 5).

Image 5 – VPN Gateway learned routes

On the pfSense side I was now able to see the routes from my NVA coming over the VPN Gateway and back to my on-premises appliance. These are the routes with an AS PATH of 65510 65515 65010 in the image below (image 6).

Image 6 – pfSense learned routes

The above shows that the routes coming from the NVA are being propagated back on-premises. What about the routes coming from on-premises? Are they being propagated all the way to the NVA? Let’s check it out!

To verify this I printed the routes ARS is advertising to the NVA using az network routeserver peering list-advertised-routes. The routes with an AS PATH of 65515-65510-65501 are the routes coming from the pfSense appliance, showing that ARS is receiving the routes and advertising them to the Exabgp VM.

RouteServiceRole_IN_0:
- asPath: '65515'
  localAddress: 10.0.2.4
  network: 10.0.0.0/16
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: '65515'
  localAddress: 10.0.2.4
  network: 10.1.0.0/16
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: 65515-65510-65501
  localAddress: 10.0.2.4
  network: 192.168.4.0/24
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: 65515-65510-65501
  localAddress: 10.0.2.4
  network: 192.168.2.0/24
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
- asPath: 65515-65510-65501
  localAddress: 10.0.2.4
  network: 192.168.3.0/24
  nextHop: 10.0.2.4
  origin: Igp
  weight: 0
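If you’d rather not eyeball that output, you can filter it. The Python sketch below assumes you ran the same command with JSON output and that the JSON has the same shape as the YAML above; the sample is a trimmed copy of my lab output and the function name is made up for illustration.

```python
import json

# Trimmed sample shaped like the lab output above (assumed identical in JSON form).
SAMPLE = json.loads("""
{
  "RouteServiceRole_IN_0": [
    {"asPath": "65515", "network": "10.0.0.0/16", "nextHop": "10.0.2.4"},
    {"asPath": "65515", "network": "10.1.0.0/16", "nextHop": "10.0.2.4"},
    {"asPath": "65515-65510-65501", "network": "192.168.4.0/24", "nextHop": "10.0.2.4"},
    {"asPath": "65515-65510-65501", "network": "192.168.2.0/24", "nextHop": "10.0.2.4"},
    {"asPath": "65515-65510-65501", "network": "192.168.3.0/24", "nextHop": "10.0.2.4"}
  ]
}
""")


def networks_via_asn(advertised, asn):
    """Return the networks whose AS PATH includes the given ASN."""
    hits = set()
    for routes in advertised.values():
        for route in routes:
            # asPath is a dash-separated string such as "65515-65510-65501".
            if str(asn) in str(route["asPath"]).split("-"):
                hits.add(route["network"])
    return sorted(hits)


if __name__ == "__main__":
    # 65510 appears only in the paths leading back on-premises in this lab.
    print(networks_via_asn(SAMPLE, 65510))
```

Splitting the AS PATH on the dash rather than doing a substring match avoids false hits when one ASN happens to be a prefix of another.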

Lastly, I wanted to see the routes and details on the Exabgp VM. For that I ran the command below on the VM.

sudo exabgpcli show adj-rib in extensive

Image 7 – Exabgp learned routes

In the image above (image 7) the on-premises routes for the 192.168.x.0/24 address spaces have been learned and have the AS PATH leading back on-premises. Note the community value of 65517:65517 attached to the routes. That is not coming from my pfSense appliance but rather from the VPN Gateway. The community tag identifies these routes as coming from a VPN Gateway and is used by Microsoft for filtering.

Well folks that about wraps it up. The key takeaways of my time with ARS were the following:

  • ARS requires the UseRemoteGateways and AllowGatewayTransit properties be configured on VNet peering. This makes managing traffic flow between a spoke and on-premises a bit more complicated. Instead of defining a 0.0.0.0/0 route and calling it a day, you would need to define all the routes in your NVA and propagate them using ARS.

    This isn’t necessarily a bad thing; you’re just shifting management of routing outside the Azure control plane and into the NVA data plane. That may be your preference.
  • When multiple routes for the same address space and different length AS PATHs are propagated to a VNet, the Azure SDN will only program the route with the shortest AS PATH. The other route with the longer AS PATH is discarded.
  • You can’t use an NVA and ARS to propagate the hub and spoke VNet address spaces back on-premises because system routes for the VNet, VNet peering, and Service Endpoints supersede routes propagated via BGP. Instead, you could use a larger summarized route encompassing the entire address space for the VNets in your region and propagate that back on-premises using the NVA and ARS.
  • You can use an NVA and ARS to propagate routes from Azure back to on-premises. A use case might be that you want all Internet-bound traffic to egress out of Azure because you get better performance than you’re getting from your ISP on-premises. I’ve never seen it, but nevertheless. 🙂

Hopefully some of the above helps you become more familiar with Azure Route Server and what the benefits and considerations are.

See you next post!
