Thursday, May 30, 2013

BGP in Juniper: Network redundancy & traffic engineering

In network and system architecture in general, it is a fact that things go down. Systems and processes fail, and therefore we must build in redundancy. A production network connected to a single link is a catastrophe waiting to happen. If that link fails, the production network loses access to the Internet, and the Internet loses access to it.

This is why it is always recommended to have another ISP link connected to our network. A practical solution is to connect that link to another router and run a redundancy protocol like VRRP between the two routers, but more on that practice in a future post. Here we will get not only link redundancy but also a network load balancer of sorts, using three ISP links connected to the same router. We will use policies to ensure that traffic for a particular subnet arrives via the link of our choice. This leads us to the question: why is BGP really required?

Need for BGP to do multihoming:


To receive the full Internet routing table, we need higher-end models like the Juniper MX80, which are capable of managing 400,000-plus routes. Multihoming can double or triple this. Such routers can be very expensive, so such decisions have to be taken very carefully.

One question you may ask is why we need to know the route to every network when we can simply configure a static route pointing to our next hop. We can also have a floating static route to the other ISP, so that if the primary link goes down the other route takes over. Moreover, we can use filter-based forwarding to manipulate outbound traffic.

First of all, static routes bring their own set of problems. The first option, a backup static route, leads to one pipe being over-utilized while the other does not send out any traffic. Remember that for an enterprise serving content, outbound traffic is typically greater than inbound traffic.

Having filter-based forwarding so that one subnet takes one link while another takes the other link has scalability issues. As you add links or subnets, you have to carefully select which link each one sends its traffic out by. If you need to shift some traffic, you really don't have many choices. Such a strategy balances traffic only at the granularity of your subnets.

Having a full BGP table gives you full flexibility to manipulate traffic how you want. As you progress through the series all such aspects will be covered. Before delving into traffic manipulation it is important to learn a key concept known as AS paths.

AS Paths:

A distance vector protocol selects routes based on hop count, whereas a path vector protocol such as BGP selects them based on the AS path: the list of autonomous systems a route has passed through. For example, suppose our organisation is the originator of a subnet. It announces that subnet to its ISP with its own ASN, say 100, in the AS path. The ISP then prepends its own ASN (say 150) and announces the route to its upstream or peer.

The upstream peer now has a route to the destination via AS 150, with AS path 150 100. Moreover, it has direct reachability to that autonomous system. The route is therefore deemed valid and placed in the routing table; if the same subnet is also announced by other peers, the path selection algorithm decides which route becomes active.
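The propagation described above can be sketched in a few lines of Python. This is only an illustration of how the AS path grows at each hop, not how a router implements it; the function name announce and its prepend parameter are made up for this example.

```python
def announce(as_path, local_asn, prepend=0):
    """Return the AS path as seen by the neighbor that receives the route.

    The announcing AS puts its own ASN at the front of the path.
    `prepend` extra copies model AS path prepending (hypothetical parameter).
    """
    return [local_asn] * (1 + prepend) + list(as_path)


# Our organisation (AS 100) originates the subnet, so the path starts empty.
path_at_isp = announce([], 100)                 # what the ISP (AS 150) receives
path_at_upstream = announce(path_at_isp, 150)   # what the ISP's upstream receives

print(path_at_isp)        # [100]
print(path_at_upstream)   # [150, 100]
```

Calling announce([], 100, prepend=2) would give [100, 100, 100], which is the same route made to look two AS hops longer.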


Shaping inbound traffic:


BGP uses a path selection algorithm to determine which route to a particular destination is the best. This link describes it. When a router receives routes for the same destination from different ISPs, it runs this algorithm. If the routes tie on one rule, the next rule is examined. Usually the tie is broken at the 4th step: the route with the shortest AS path is considered the better route.

In the illustration below it is clear that RTR1 is two AS hops away from RTR5. In an ideal case, when the router runs the BGP path selection algorithm it will find a tie even at rule 4 and proceed to the next rule. Unfortunately, the further we go down the list, the more the outcome is out of our hands. The last step is to select the route from the peer with the lowest peer IP address. Clearly we cannot control such factors. What we can control are some of the BGP attributes. What if we could announce our routes with our own AS number prepended? Let us look at an example for better understanding.

Suppose you have three subnets and want the majority of traffic for each subnet to arrive via a different one of the three links, with automatic link failover. This is where the attributes come into the picture. By modifying them we can tell the whole world that a particular path is better than the others.
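The rule-by-rule comparison can be sketched in Python as a sort key. This is a heavily simplified model that assumes only four of the rules (route preference, local preference, AS path length, peer IP) and skips everything in between; the Route type, the ASNs and the peer addresses are hypothetical.

```python
from ipaddress import ip_address
from typing import NamedTuple, Tuple


class Route(NamedTuple):
    preference: int            # route preference, lower wins (170 for BGP in Junos)
    local_pref: int            # BGP local preference, higher wins
    as_path: Tuple[int, ...]   # ASNs the route has crossed, shorter wins
    peer_ip: str               # address of the advertising peer, lower wins


def selection_key(route):
    # Tuples compare element by element, so a later field is only
    # consulted when every earlier field ties, just like the rule list.
    return (
        route.preference,
        -route.local_pref,
        len(route.as_path),
        ip_address(route.peer_ip),
    )


def best_route(routes):
    return min(routes, key=selection_key)


# Hypothetical candidates for one prefix, learned from two ISPs.
via_isp_a = Route(170, 100, (64500, 64510, 64510), "198.51.100.1")
via_isp_b = Route(170, 100, (64501, 64510), "203.0.113.1")

print(best_route([via_isp_a, via_isp_b]).peer_ip)  # 203.0.113.1 (shorter AS path)
```

If both AS paths had the same length, the comparison would fall through to the peer address and the route from 198.51.100.1 would win, which is exactly the kind of outcome we cannot influence.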

Note: The arrows represent the direction of route advertisements, not the data traffic. Path of data traffic is explained below.

In this diagram RTR5 receives a total of 9 routes, 3 per subnet, all announced by our router RTR1. To install a route for 200.200.200.0/23 it will run the path selection algorithm. Let us go through the steps.


  1. Verify that the next hop can be resolved. True for all routes (the next hop being in AS 300, 350 or 400).
  2. Since all are BGP paths, all have the default preference, and therefore we have a tie at this step as well.
  3. By default ISPs do not modify the local preference (more about it in the next article) of their links, and therefore there is a tie at this step too.
  4. This is where the game changes. This is the first rule that we, AS 100, have control over. The table below describes how we have announced our subnets. When these announcements reach RTR5, it checks which announcement has the shortest AS path.
[Table image: AS path prepending in action]


For 200.200.200.0/23 the path via AS 300 will be installed in the routing table; for 150.150.150.0/23, the path via AS 350; and for 100.100.100.0/23, the path via AS 400.

It is obvious that this leads to load sharing, as traffic for different subnets travels over different links. A question you may ask is why send prepended announcements in the first place. We could have simply announced each subnet via only one of the three links. The answer is again network redundancy, or more specifically link redundancy. If one of the links or an intermediate router goes down, RTR5 still has alternate paths to reach RTR1. It will quickly run the best path algorithm again and install the new route in the table.

Isn't it a brilliant technique? We have killed two birds with one stone. We have not only achieved automatic failover but also shaped our traffic to come in over different links. So if your subnet 100.100.100.0/23 receives more traffic, you can buy more bandwidth from the ISP with AS 400. It is more scalable and convenient.

The final part is obviously the configuration and policy needed to achieve this on a Juniper router. For more clarity on the basics of applying import and export policies click here and here.

Configuration:

Scenario discussion: RTR1 is the customer router which needs to announce the subnets mentioned above. Remember that only active routes can be announced via a dynamic routing protocol. Therefore we need these subnets in our routing table as active routes. To achieve that I have created a discard static route for each of these subnets.

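Just run set routing-options static route <prefix> discard in configuration mode, once per subnet (for example, set routing-options static route 200.200.200.0/23 discard).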

Discard static routes are also present in the routing table
Remember that the default policy for BGP in Junos is to export all active BGP routes. So if I announce all my subnets from rtr1, then all the other routers will relay that information to rtr5. You only need to set up basic BGP sessions between all the connected routers. If you do not know how to, click here.

Configuration on rtr1:
set protocols bgp group test2 type external
set protocols bgp group test2 import import_bgp
set protocols bgp group test2 neighbor 10.10.10.2 export export_bgp_rtr2
set protocols bgp group test2 neighbor 10.10.10.2 peer-as 300
set protocols bgp group test2 neighbor 30.30.30.2 export export_bgp_rtr3
set protocols bgp group test2 neighbor 30.30.30.2 peer-as 400
set protocols bgp group test2 neighbor 20.20.20.2 export export_bgp_rtr4
set protocols bgp group test2 neighbor 20.20.20.2 peer-as 350

set policy-options policy-statement export_bgp_rtr2 term 1 from protocol static
set policy-options policy-statement export_bgp_rtr2 term 1 from route-filter 200.200.200.0/23 exact
set policy-options policy-statement export_bgp_rtr2 term 1 then accept
set policy-options policy-statement export_bgp_rtr2 term 2 from protocol static
set policy-options policy-statement export_bgp_rtr2 term 2 from route-filter 150.150.150.0/23 exact
set policy-options policy-statement export_bgp_rtr2 term 2 then as-path-prepend 100
set policy-options policy-statement export_bgp_rtr2 term 2 then accept
set policy-options policy-statement export_bgp_rtr2 term 3 from protocol static
set policy-options policy-statement export_bgp_rtr2 term 3 from route-filter 100.100.100.0/23 exact
set policy-options policy-statement export_bgp_rtr2 term 3 then as-path-prepend "100 100"
set policy-options policy-statement export_bgp_rtr2 term 3 then accept
set policy-options policy-statement export_bgp_rtr3 term 1 from protocol static
set policy-options policy-statement export_bgp_rtr3 term 1 from route-filter 200.200.200.0/23 exact
set policy-options policy-statement export_bgp_rtr3 term 1 then as-path-prepend 100
set policy-options policy-statement export_bgp_rtr3 term 1 then accept
set policy-options policy-statement export_bgp_rtr3 term 2 from protocol static
set policy-options policy-statement export_bgp_rtr3 term 2 from route-filter 150.150.150.0/23 exact
set policy-options policy-statement export_bgp_rtr3 term 2 then as-path-prepend "100 100"
set policy-options policy-statement export_bgp_rtr3 term 2 then accept
set policy-options policy-statement export_bgp_rtr3 term 3 from protocol static
set policy-options policy-statement export_bgp_rtr3 term 3 from route-filter 100.100.100.0/23 exact
set policy-options policy-statement export_bgp_rtr3 term 3 then accept
set policy-options policy-statement export_bgp_rtr4 term 1 from protocol static
set policy-options policy-statement export_bgp_rtr4 term 1 from route-filter 200.200.200.0/23 exact
set policy-options policy-statement export_bgp_rtr4 term 1 then as-path-prepend "100 100"
set policy-options policy-statement export_bgp_rtr4 term 1 then accept
set policy-options policy-statement export_bgp_rtr4 term 2 from protocol static
set policy-options policy-statement export_bgp_rtr4 term 2 from route-filter 150.150.150.0/23 exact
set policy-options policy-statement export_bgp_rtr4 term 2 then accept
set policy-options policy-statement export_bgp_rtr4 term 3 from protocol static
set policy-options policy-statement export_bgp_rtr4 term 3 from route-filter 100.100.100.0/23 exact
set policy-options policy-statement export_bgp_rtr4 term 3 then as-path-prepend 100
set policy-options policy-statement export_bgp_rtr4 term 3 then accept
set policy-options policy-statement import_bgp term RFC_1918 from route-filter 192.168.0.0/16 exact
set policy-options policy-statement import_bgp term RFC_1918 then reject
set policy-options policy-statement import_bgp term deny_own_pool from route-filter 200.200.200.0/23 orlonger
set policy-options policy-statement import_bgp term deny_own_pool then reject
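The three export policies follow a simple rotation: each policy sends one subnet untouched, one with a single prepend and one with two prepends. As a rough sketch of that pattern, the Python below generates export policy lines of the same shape. The rotation formula is my reading of the configuration rather than anything Junos requires, and the helper names are made up for this example.

```python
LOCAL_AS = 100
PREFIXES = ["200.200.200.0/23", "150.150.150.0/23", "100.100.100.0/23"]
POLICIES = ["export_bgp_rtr2", "export_bgp_rtr3", "export_bgp_rtr4"]


def prepend_count(policy_index, prefix_index):
    # Rotation read off the configuration: every policy leaves a different
    # prefix unprepended, so every prefix has a different preferred link.
    return (policy_index + prefix_index) % len(PREFIXES)


def export_policy_lines():
    lines = []
    for p, policy in enumerate(POLICIES):
        for t, prefix in enumerate(PREFIXES):
            base = f"set policy-options policy-statement {policy} term {t + 1}"
            lines.append(f"{base} from protocol static")
            lines.append(f"{base} from route-filter {prefix} exact")
            count = prepend_count(p, t)
            if count:
                asns = " ".join([str(LOCAL_AS)] * count)
                # Quote the value when it contains more than one ASN
                value = f'"{asns}"' if count > 1 else asns
                lines.append(f"{base} then as-path-prepend {value}")
            lines.append(f"{base} then accept")
    return lines


print("\n".join(export_policy_lines()))
```

Generating the lines from a small table like this makes it easier to add a fourth link or subnet without mistyping a term, but the output should still be reviewed before it is committed on a router.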

Routing table on RTR5 
root@rtr5> show route protocol bgp

inet.0: 13 destinations, 18 routes (13 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.100.100.0/23   *[BGP/170] 00:06:18, localpref 100
                      AS path: 400 100 I
                    > to 60.60.60.1 via em3.0
                    [BGP/170] 00:06:18, localpref 100
                      AS path: 350 100 100 I
                    > to 50.50.50.1 via em1.0
                    [BGP/170] 00:06:18, localpref 100
                      AS path: 300 100 100 100 I
                    > to 40.40.40.1 via em0.0
150.150.150.0/23   *[BGP/170] 00:06:18, localpref 100
                      AS path: 350 100 I
                    > to 50.50.50.1 via em1.0
                    [BGP/170] 00:06:18, localpref 100
                      AS path: 300 100 100 I
                    > to 40.40.40.1 via em0.0
192.168.0.0/16     *[BGP/170] 00:11:04, localpref 100
                      AS path: 400 I
                    > to 60.60.60.1 via em3.0
200.200.200.0/23   *[BGP/170] 00:09:16, localpref 100
                      AS path: 300 100 I
                    > to 40.40.40.1 via em0.0
                    [BGP/170] 00:11:04, localpref 100
                      AS path: 400 100 100 I
                    > to 60.60.60.1 via em3.0
                    [BGP/170] 00:06:22, localpref 100
                      AS path: 350 100 100 100 I
                    > to 50.50.50.1 via em1.0



Clearly rtr5 prefers a different AS for each of the three subnets. Moreover, RTR5 learns up to two alternate paths for each subnet (the route marked * is the best route). So even if any of the intermediate links goes down, by virtue of a dynamic routing protocol, BGP, our subnets will still be reachable.
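To make the failover behaviour concrete, here is a small Python sketch that takes the AS paths from the rtr5 output above and picks the shortest one per subnet, first with all links up and then with AS 300 unreachable. It assumes every other attribute ties (which matches the output, where all routes show preference 170 and localpref 100); the function and variable names are made up for this example.

```python
# AS paths as they appear in the rtr5 output above
ROUTES = {
    "100.100.100.0/23": [(400, 100), (350, 100, 100), (300, 100, 100, 100)],
    "150.150.150.0/23": [(350, 100), (300, 100, 100)],
    "200.200.200.0/23": [(300, 100), (400, 100, 100), (350, 100, 100, 100)],
}


def best_paths(routes, failed_as=frozenset()):
    """Pick the shortest remaining AS path per prefix.

    Paths whose first hop is in `failed_as` are treated as withdrawn.
    """
    best = {}
    for prefix, paths in routes.items():
        usable = [path for path in paths if path[0] not in failed_as]
        if usable:
            best[prefix] = min(usable, key=len)
    return best


normal = best_paths(ROUTES)
after_failure = best_paths(ROUTES, failed_as={300})

print(normal["200.200.200.0/23"])          # (300, 100)
print(after_failure["200.200.200.0/23"])   # (400, 100, 100)
```

With AS 300 gone, only 200.200.200.0/23 moves to its next-shortest path; the other two subnets keep their preferred links.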

This post gave the primary technique used by network administrators to engineer inbound traffic. What about outbound traffic? More on that in the next post!


4 comments:

  1. We will not only get link redundancy but also a network load balancer of sorts, using three ISP links connected to the same router. We will use policies to ensure that traffic for a particular subnet comes via link of our choice. This leads us to the question, why BGP is really required?

    1. Those policies to ensure that traffic comes via a particular link are actually applied to the BGP protocol. In BGP you can lengthen the AS path by padding it with your own AS number multiple times. If you do not have BGP running on your router, then there is NO way you can influence inbound traffic, since it is up to the ISP to decide which link to send your traffic towards.

      Hope that answers your question. Thanks for reading my blog! :)

  2. Great series of posts on BGP. Question on the static route with a next hop of discard: if we do indeed have those three subnets reachable via the customer router (RTR1; I'm assuming there is some IGP or static route to those destinations), wouldn't they already be active in the routing table? I guess I'm trying to understand the relationship between IGPs or internal reachability and BGP, i.e. how routes get injected into BGP itself in the first place.

    1. Thanks namitha for the good words. Yes, if the route is already active in the routing table then the discard route is not needed. I did a simulation in GNS and therefore configured it as a static route.

      It can also be a starting step if you are unsure of making changes in a production network. :)
