logo CCIE Blog

Helping you become a Cisco Certified Internetwork Expert


rss Entries (RSS) | rss Comments (RSS)
Welcome to Internetwork Expert's CCIE Blog

Welcome to Internetwork Expert’s CCIE Blog! This site is dedicated to helping you in your pursuit of becoming a Cisco Certified Internetwork Expert in Routing & Switching, Voice, Security, Service Provider, and Storage. Through this blog you can submit questions to our expert instructors, Brian Dennis - Quad-CCIE #2210, Brian McGahan – Triple CCIE #8593, and Petr Lapukhov - Quad-CCIE #16379. Check back daily as this blog will be updated frequently.

Click here to submit a question.

May 6th, 2008

Understanding the IP Multicast Helper-Map Command

Hi Brian,

I have a problem with the multicast helper topic, the case when a broadcast network is separated by a multicast network, and then again it continues. Can you discuss this topic?

Thanks,

Nizami

Hi Nizami,

The multicast helper-map command is similar in theory to how the unicast “ip helper-map” works. With the IP helper map feature, IP broadcast packets, such as UDP based DHCP requests, have their destination addresses translated to a unicast address, such as the DHCP server. With the IP multicast helper map feature, IP broadcast packets have their destination addresses translated to a multicast address.

The common design application of this feature is in financial trading networks where a legacy stock ticker application sends packets out as broadcast UDP. The router on the attached segment can then convert the broadcast destination to multicast, send the packet into the multicast transit network, and then on the last hop router attached to the receiver translate the multicast packet back to a broadcast. This allows the network to scale above a flat layer 2 design where all application senders and receivers are in the same IP subnet, to a hierarchical layer 3 routed multicast network, without the application itself being modified.

Configuration-wise the feature is implemented on two devices, the first hop router attached to the broadcast sender, and the last hop router attached to the broadcast receiver. The first hop router listens for broadcast packets to be received on the incoming interface attached to the sender. Based on an access-list match (usually the UDP port of the application), the router translates the destination address to a user defined multicast address, and forwards the packet out interfaces running PIM according to the multicast routing table. This design therefore assumes that the underlying PIM topology is built end-to-end. Once the last hop router receives the traffic on the incoming interface facing the multicast network, the traffic is again categorized by an access-list, and additionally by the multicast group used on the first hop. Based on the directed broadcast address defined on the last hop router the traffic is then dropped off on the LAN segment facing the receiver.

In our particular design the network looks like this:

SW1 — R4 -– R3 — R2 — R1 — SW2

SW1 is the broadcast sender (i.e. the source application), SW2 is the receiver (i.e. the destination application), R4 is the first hop router, and R1 is the last hop router. IGP and PIM adjacencies exist between R4 – R3, R3 – R2, and R2 – R1.

R4’s configuration, the first hop router, looks as follows:

R4#
interface FastEthernet0/0
 description TO SENDER APPLICATION – SW1
 ip address 173.20.47.4 255.255.255.0
 ip multicast helper-map broadcast 224.1.2.3 100
!
ip forward-protocol udp 31337
access-list 100 permit udp any any eq 31337

This configuration means that if R4 receives a UDP broadcast going to port 31337 inbound on Fa0/0 it will be translated to the multicast address 224.1.2.3. Note that the use of the “ip forward-protocol” command is necessary in order to process switch UDP traffic going to the port in question. Without process switching the helper-map feature can not correctly categorize and translate the traffic.

R1’s configuration, the last hop router, looks as follows:

R1#
interface Serial0/0.102 point-to-point
 description TO R2
 ip address 173.20.12.1 255.255.255.0
 ip pim dense-mode
 ip multicast helper-map 224.1.2.3 173.20.18.255 100
 frame-relay interface-dlci 102
!
interface FastEthernet0/0
 description TO RECEIVER – SW2
 ip address 173.20.18.1 255.255.255.0
 ip directed-broadcast
!
ip forward-protocol udp 31337
access-list 100 permit udp any any eq 31337

This configuration means that if R1 receives a UDP multicast going to the group address 224.1.2.3 at port 31337 inbound on S0/0.102 it will be translated to the directed broadcast address 173.20.18.255. Since the link 173.20.18.0/24 is directly connected and has the directed broadcast address of 173.20.18.255 by default, the configuration implies that traffic matching the helper map on S0/0.102 will be sent as a broadcast out Fa0/0.

Note the use of the “ip forward-protocol” command as before in order to process switch the UDP traffic. Additionally the “ip directed-broadcast” command is enabled on the last hop outgoing interface since in current IOS versions this is disabled by default for security purposes.

To verify the functionality of this feature we can use the IP SLA feature in the IOS to generate broadcast UDP traffic on the sender. This configuration on SW1 is as follows:

rtr 1
 type udpEcho dest-ipaddr 255.255.255.255 dest-port 31337 source-ipaddr 173.20.47.7 source-port 12345 control disable
 timeout 0
 frequency 5
rtr schedule 1 life forever start-time now

This config means that SW1 will generate a UDP packet sourced from the address 173.20.47.7 at port 12345 going to the address 255.255.255.255 at port 31337 every 5 seconds, and will not wait for a response back. The following debug on R4, the first hop router, verifies that the packet is received and is translated into multicast.

Rack20R4#debug ip packet detail
IP packet debugging is on (detailed)
IP: s=173.20.47.7 (FastEthernet0/0), d=255.255.255.255, len 44, rcvd 2
    UDP src=12345, dst=31337
Rack20R4#undebug all
All possible debugging has been turned off

Rack20R4#debug ip mpacket
IP multicast packets debugging is on
IP(0): s=173.20.47.7 (FastEthernet0/0) d=224.1.2.3 (Serial0/0) id=0, ttl=254, prot=17, len=44(44), mforward
Rack20R4#undebug all
All possible debugging has been turned off

From the unicast “debug ip packet detail” we can see the packet is received in Fa0/0 from SW2 with the proper destination and port information. Next the multicast “debug ip mpacket” shows us that the packet has been translated to multicast address 224.1.2.3 and is forwarded out Serial0/0 towards R3.

As R4, R3, R2, and R1 receive the multicast packet the multicast routing table is populated as follows.

Rack20R4#show ip mroute 224.1.2.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 224.1.2.3), 01:24:42/stopped, RP 0.0.0.0, flags: D
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial0/0, Forward/Dense, 01:24:42/00:00:00

(173.20.47.7, 224.1.2.3), 00:01:27/00:02:58, flags: T
  Incoming interface: FastEthernet0/0, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial0/0, Forward/Dense, 00:01:27/00:00:00

Rack20R3#show ip mroute 224.1.2.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 224.1.2.3), 01:25:36/stopped, RP 0.0.0.0, flags: D
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial1/1.312, Forward/Dense, 01:25:36/00:00:00
    Serial1/0, Forward/Dense, 01:25:36/00:00:00

(173.20.47.7, 224.1.2.3), 00:02:22/00:02:54, flags: T
  Incoming interface: Serial1/0, RPF nbr 173.20.0.4
  Outgoing interface list:
    Serial1/1.312, Forward/Dense, 00:02:23/00:00:00

Rack20R2#show ip mroute 224.1.2.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 224.1.2.3), 01:25:27/stopped, RP 0.0.0.0, flags: D
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial0/0.213, Forward/Dense, 01:25:27/00:00:00
    Serial0/0.201, Forward/Dense, 01:25:27/00:00:00

(173.20.47.7, 224.1.2.3), 00:02:12/00:02:54, flags: T
  Incoming interface: Serial0/0.213, RPF nbr 173.20.23.3
  Outgoing interface list:
    Serial0/0.201, Forward/Dense, 00:02:13/00:00:00

Rack20R1#show ip mroute 224.1.2.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 224.1.2.3), 01:25:42/stopped, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial0/0.102, Forward/Dense, 01:25:42/00:00:00

(173.20.47.7, 224.1.2.3), 00:02:27/00:02:57, flags: PLTX
  Incoming interface: Serial0/0.102, RPF nbr 173.20.12.2
  Outgoing interface list: Null

Once the packet is received on R1, the last hop router, the “debug ip mpacket” shows the packet coming in as multicast, while the “debug ip packet detail” shows that the packet being converted back into a broadcast. This is also verified by the “debug ip packet” output on SW2, the receiver of the packet.

Rack20R1#debug ip mpacket
IP multicast packets debugging is on
IP(0): s=173.20.47.7 (Serial0/0.102) d=224.1.2.3 id=0, ttl=251, prot=17, len=48(44), mroute olist null
Rack20R1#undebug all
All possible debugging has been turned off

Rack20R1#debug ip packet detail
IP packet debugging is on (detailed)
IP: tableid=0, s=173.20.47.7 (Serial0/0.102), d=173.20.18.255 (FastEthernet0/0), routed via RIB
Rack20R1#undebug all
All possible debugging has been turned off

Rack20SW2#debug ip packet
IP packet debugging is on
IP: s=173.20.47.7 (Vlan18), d=255.255.255.255, len 44, rcvd 2
IP: s=173.20.47.7 (Vlan18), d=255.255.255.255, len 44, stop process pak for forus packet
Rack20SW2#undebug all
All possible debugging has been turned off

This feature can also be used in the opposite manner, where a multicast packet is received, converted to broadcast, and then converted back to multicast. In either case the configuration depends on the design and functionality of the source and destination application.

May 5th, 2008

Understanding BGP Outbound Route Filtering (BGP ORF)

Hi Brian,

I’m having a problem with Workbook Volume 1 Version 4.1. ORF (Outbound Route Filtering) isn’t working for me. Any help would be appreciated.

Thank you,

JoeT

Hi Joe,

First off let’s talk a little bit about what BGP ORF (Outbound Route Filtering) is designed to do for us, and then we’ll take a look at some implementation examples.

From a customer’s point of view there are typically a limited amount of choices for what routes you can receive from your Service Provider via BGP. Usually the Service provider will give the customer the option of sending them a full table view (currently about 260,000 prefixes), just a default route, or some specific subset of the table such as a default route and the Service Provider’s locally originated prefixes. In other words a BGP Service Provider generally will not implement a complex outbound filtering policy for the customer. Instead, if the customer wants to receive just a subset view of the BGP table, the Customer Edge (CE) router has to filter prefixes inbound as they are received from the upstream Provider Edge (PE) router.

From the SP’s point of view this is the optimal design for administration. They don’t need to worry about change requests constantly coming from the customer about what routes they want to see and what routes they don’t want to see. Likewise from the customer’s point of view this is the optimal administrative design, as they do not need to send change control requests to the provider, and can arbitrarily change their filtering design on the fly. However from a device resource point of view this is not optimal from both the PE and CE routers’ perspective. The SP’s PE router must still send the full BGP table to the customer, even if the CE router filters out 99% of it. Likewise the CE router must still process all of the BGP UPDATE messages, even if the majority of them are ultimately filtered out.

Let’s take this a look at the result of this in the context of the following design:

AS100 — AS200
(PE) -– (CE)

AS 200 has an upstream peering to its Service Provider, AS 100. The BGP table of AS 200 appears as follows:

AS200_CE#show ip bgp
BGP table version is 12, local router ID is 10.0.0.200
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          10.0.0.100               0             0 100 i
*> 28.119.16.0/24   10.0.0.100                             0 100 54 i
*> 28.119.17.0/24   10.0.0.100                             0 100 54 i
*> 112.0.0.0        10.0.0.100                             0 100 54 50 60 i
*> 113.0.0.0        10.0.0.100                             0 100 54 50 60 i
*> 114.0.0.0        10.0.0.100                             0 100 54 i
*> 115.0.0.0        10.0.0.100                             0 100 54 i
*> 116.0.0.0        10.0.0.100                             0 100 54 i
*> 117.0.0.0        10.0.0.100                             0 100 54 i
*> 118.0.0.0        10.0.0.100                             0 100 54 i
*> 119.0.0.0        10.0.0.100                             0 100 54 i

Let’s suppose that from AS 200’s perspective the only routes that they want to receive from AS 100 are the default route plus the networks 28.119.16.0/24 and 28.119.17.0/24. Traditional filtering would dictate that on the CE router a prefix-list would be configured and applied as follows:

router bgp 200
 neighbor 10.0.0.100 remote-as 100
 neighbor 10.0.0.100 prefix-list AS_100_INBOUND in
!
ip prefix-list AS_100_INBOUND seq 5 permit 0.0.0.0/0
ip prefix-list AS_100_INBOUND seq 10 permit 28.119.16.0/24
ip prefix-list AS_100_INBOUND seq 15 permit 28.119.17.0/24

The result of this configuration in AS 200’s BGP table is as follows:

AS200_CE#show ip bgp
BGP table version is 4, local router ID is 10.0.0.200
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          10.0.0.100               0             0 100 i
*> 28.119.16.0/24   10.0.0.100                             0 100 54 i
*> 28.119.17.0/24   10.0.0.100                             0 100 54 i

Although the filtering goal is achieved, efficiency is not. From the below debug output we can see exactly how AS 200’s CE router processes the updates from the upstream PE and makes a decision on what to install:

AS200_CE#debug ip bgp updates
BGP updates debugging is on for address family: IPv4 Unicast
AS200_CE#clear ip bgp 100
%BGP-5-ADJCHANGE: neighbor 10.0.0.100 Down User reset
%BGP-5-ADJCHANGE: neighbor 10.0.0.100 Up
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, metric 0, path 100
BGP(0): 10.0.0.100 rcvd 0.0.0.0/0
BGP(0): Revise route installing 1 of 1 routes for 0.0.0.0/0 -> 10.0.0.100(main) to main IP table
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54
BGP(0): 10.0.0.100 rcvd 115.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd 114.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54
BGP(0): 10.0.0.100 rcvd 119.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd 118.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd 117.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd 116.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54
BGP(0): 10.0.0.100 rcvd 28.119.17.0/24
BGP(0): 10.0.0.100 rcvd 28.119.16.0/24
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54 50 60
BGP(0): 10.0.0.100 rcvd 113.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): 10.0.0.100 rcvd 112.0.0.0/8 -- DENIED due to: distribute/prefix-list;
BGP(0): Revise route installing 1 of 1 routes for 28.119.16.0/24 -> 10.0.0.100(main) to main IP table
BGP(0): Revise route installing 1 of 1 routes for 28.119.17.0/24 -> 10.0.0.100(main) to main IP table
AS200_CE#

Note that the AS200_CE router generates the log message “DENIED due to: distribute/prefix-list;” for every prefix that is filtered. This means that if this were the public BGP table of 260,000+ routes this router would have to process every update message just to discard it. This is where BGP Outbound Route Filtering (ORF) comes in.

Outbound Route Filtering Capability for BGP-4 is currently an IETF draft (http://www.ietf.org/internet-drafts/draft-ietf-idr-route-filter-16.txt) that describes an optimization on how prefix filtering can occur between a Customer Edge (CE) router and a Provider Edge (PE) router that are exchanging IPv4 unicast BGP prefixes. In the design we saw above the upstream PE router sent the full BGP table downstream to the CE router, and filtering was applied inbound on the downstream CE. With BGP ORF the downstream CE router dynamically tells the upstream PE router what routes to filter *outbound*. This means that the downstream CE router will only receive update messages about the prefixes that it wants.

Implementation wise the first step of this feature is for the BGP neighbors to negotiate that they both support the BGP ORF capability. Configuration-wise this looks as follows:

AS100_PE#
router bgp 100
 neighbor 10.0.0.200 remote-as 200
 !
 address-family ipv4
 neighbor 10.0.0.200 capability orf prefix-list receive
 neighbor 204.12.25.254 activate
 exit-address-family

AS200_CE#
router bgp 200
 neighbor 10.0.0.100 remote-as 100
 !
 address-family ipv4
 neighbor 10.0.0.100 capability orf prefix-list send
 neighbor 10.0.0.100 prefix-list AS_100_INBOUND in
 exit-address-family
!

The result of this configuration on AS 200’s CE is the same, however the behind the scenes mechanism by which it is accomplished is different. First, AS100_PE and AS200_CE negotiate the BGP ORF capability during initial BGP peering establishment. The success of this negotiation can be seen as follows.

AS100_PE#show ip bgp neighbors 10.0.0.200 | begin AF-dependant capabilities:
  AF-dependant capabilities:
    Outbound Route Filter (ORF) type (128) Prefix-list:
      Send-mode: received
      Receive-mode: advertised
  Outbound Route Filter (ORF): received (3 entries)
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               2          0
    Prefixes Total:                 2          0
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    ORF prefix-list:                      8        n/a
    Total:                                8          0
  Number of NLRIs in the update sent: max 4, min 2

*OUTPUT OMITTED*

AS200_CE#show ip bgp neighbors 10.0.0.100 | begin AF-dependant capabilities:
  AF-dependant capabilities:
    Outbound Route Filter (ORF) type (128) Prefix-list:
      Send-mode: advertised
      Receive-mode: received
  Outbound Route Filter (ORF): sent;
  Incoming update prefix filter list is AS_100_INBOUND
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               0          3 (Consumes 156 bytes)
    Prefixes Total:                 0          4
    Implicit Withdraw:              0          1
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          3
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Suppressed duplicate:                 0          1
    Bestpath from this peer:              3        n/a
    Total:                                3          1
  Number of NLRIs in the update sent: max 0, min 0

*OUTPUT OMITTED*

Next, AS 200’s CE router tells AS 100’s PE router which prefixes it wants to receive. The logic of this configuration is that AS 200 is “sending” a prefix-list of what routes it wants, while AS 100 is “receiving” the prefix-list of what the downstream neighbor wants. The reception of the prefix-list by the upstream PE can be verified as follows.

AS100_PE#show ip bgp neighbors 10.0.0.200 received prefix-filter
Address family: IPv4 Unicast
ip prefix-list 10.0.0.200: 3 entries
   seq 5 permit 0.0.0.0/0
   seq 10 permit 28.119.16.0/24
   seq 15 permit 28.119.17.0/24

AS100_PE#show ip prefix-list

Note that AS 100’s PE router received the list from AS 200’s CE, but the prefix-list does not show up locally in the running config. AS 100’s PE router then turns around and uses the prefix-list as an outbound filter towards the downstream CE. This can be verified two ways, by viewing the UPDATE messages on the downstream CE, and by looking at what the upstream PE is sending.

AS100_PE#show ip bgp neighbors 10.0.0.200 advertised-routes
BGP table version is 11, local router ID is 10.0.0.100
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Originating default network 0.0.0.0

   Network          Next Hop            Metric LocPrf Weight Path
*> 28.119.16.0/24   204.12.25.254            0             0 54 i
*> 28.119.17.0/24   204.12.25.254            0             0 54 i

Total number of prefixes 2
AS100_PE#

AS200_CE#debug ip bgp updates
BGP updates debugging is on for address family: IPv4 Unicast
AS200_CE#clear ip bgp 100
AS200_CE#
BGP(0): no valid path for 0.0.0.0/0
BGP(0): no valid path for 28.119.16.0/24
BGP(0): no valid path for 28.119.17.0/24
%BGP-5-ADJCHANGE: neighbor 10.0.0.100 Down User reset
BGP(0): nettable_walker 0.0.0.0/0 no best path
BGP(0): nettable_walker 28.119.16.0/24 no best path
BGP(0): nettable_walker 28.119.17.0/24 no best path
%BGP-5-ADJCHANGE: neighbor 10.0.0.100 Up
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, metric 0, path 100
BGP(0): 10.0.0.100 rcvd 0.0.0.0/0
BGP(0): Revise route installing 1 of 1 routes for 0.0.0.0/0 -> 10.0.0.100(main) to main IP table
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54
BGP(0): 10.0.0.100 rcvd 28.119.17.0/24
BGP(0): 10.0.0.100 rcvd 28.119.16.0/24
BGP(0): Revise route installing 1 of 1 routes for 28.119.16.0/24 -> 10.0.0.100(main) to main IP table
BGP(0): Revise route installing 1 of 1 routes for 28.119.17.0/24 -> 10.0.0.100(main) to main IP table
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, metric 0, path 100
BGP(0): 10.0.0.100 rcvd 0.0.0.0/0...duplicate ignored
AS200_CE#

Note that the above output is different from the previous debug of AS 200’s CE, because now it does not receive the extra update messages. AS 200 instead now receives only the routes that it has requested of the upstream PE.

If edits of the filter are required the downstream CE can change the prefix-list, and then through the BGP Route Refresh capability, advertise the new prefix-list upstream to the PE to be used as a new downstream filter. Configuration wise this is accomplished as follows.

AS200_CE#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
AS200_CE(config)#ip prefix-list AS_100_INBOUND permit 114.0.0.0/8
AS200_CE(config)#end
AS200_CE#
%SYS-5-CONFIG_I: Configured from console by console
AS200_CE#clear ip bgp 100 in prefix-filter
AS200_CE#
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54
BGP(0): 10.0.0.100 rcvd 114.0.0.0/8
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, path 100 54
BGP(0): 10.0.0.100 rcvd 28.119.17.0/24...duplicate ignored
BGP(0): 10.0.0.100 rcvd 28.119.16.0/24...duplicate ignored
BGP(0): Revise route installing 1 of 1 routes for 114.0.0.0/8 -> 10.0.0.100(main) to main IP table
BGP(0): 10.0.0.100 rcvd UPDATE w/ attr: nexthop 10.0.0.100, origin i, metric 0, path 100
BGP(0): 10.0.0.100 rcvd 0.0.0.0/0...duplicate ignored
AS200_CE#

From the “debug ip bgp updates” output we can now see that the upstream PE added the update 114.0.0.0/8, in addition to the previous three prefixes that were installed. Upstream verification on the PE also indicates this.

AS100_PE#show ip bgp neighbors 10.0.0.200 received prefix-filter
Address family: IPv4 Unicast
ip prefix-list 10.0.0.200: 4 entries
   seq 5 permit 0.0.0.0/0
   seq 10 permit 28.119.16.0/24
   seq 15 permit 28.119.17.0/24
   seq 20 permit 114.0.0.0/8
AS100_PE#show ip bgp neighbors 10.0.0.200 advertised-routes
BGP table version is 11, local router ID is 10.0.0.100
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Originating default network 0.0.0.0

   Network          Next Hop            Metric LocPrf Weight Path
*> 28.119.16.0/24   204.12.25.254            0             0 54 i
*> 28.119.17.0/24   204.12.25.254            0             0 54 i
*> 114.0.0.0        204.12.25.254                          0 54 i

Total number of prefixes 3
AS100_PE#

For more information on this feature:

Outbound Route Filtering Capability for BGP-4
http://www.ietf.org/internet-drafts/draft-ietf-idr-route-filter-16.txt

BGP Prefix-Based Outbound Route Filtering
http://www.cisco.com/en/US/docs/ios/12_2t/12_2t11/feature/guide/ft11borf.html

February 23rd, 2008

Catalyst QoS: The 3550 Explained

QoS features available on Catalyst switch platforms have specific limitations, dictated by the hardware design of modern L3 switches, which is heavily optimized to handle packets at very high rates. Catalyst switch QoS is implemented using TCAM (Ternary Content Addressable Tables) - fast hardware lookup tables - to store all QoS configurations and settings. We start out Catalyst QoS overview with the old, long time available in the CCIE lab, the Catalyst 3550 model.

Read the rest of this entry »

January 28th, 2008

Poor Man’s VPLS

Let’s say you get a bunch of inexpensive (but a bit outdated) routers (36XX or 72Xx) and some really nice (maybe not so cheap) Cisco switches (e.g. 3550/3560) and you would like to provide a VPLS-like service to your customers. Since VPLS is a service available only on more powerful Cisco platforms, we have to figure a way to simulate Multipoint Ethernet L2 VPN over a packet switching network (PSN) using only “convenient” point-to-point L2 VPN services.

Let model a situation where we have a number of routers connected over (PSN), with an ethernet switch connected to router at every location:

VPLS with L2TPV3

What we can do, is connect ethernet ports using pseudowires to form a virtual ring topology over PSN. That is, refeferring to our picture, xconnect routers’ ethernet ports counter-clockwise, say xconnect E0/0 of R3 with E0/1 of R4, then E0/0 of R4 with E0/1 of R5 and finally E0/0 of R5 with E0/1 of R3. Effectively, we will form an ethernet ring, partially connected over convenient switches, and partially using L2VPN pseudowires. Router configurations look pretty much similar, for example at R3 we would have something like this


R3:
pseudowire-class PW_CLASS
 encapsulation l2tpv3
 ip local interface Loopback0
!
interface Loopback0
 ip address 150.1.3.3 255.255.255.255

!
! Xconnecting E0/0 of R3 with E0/1 of R4
!
interface Ethernet0/0
 no ip address
 xconnect 150.1.4.4 34 encapsulation l2tpv3 pw-class PW_CLASS

!
! Xconnecting E0/1 of R3 with E0/0 of R5
!
interface Ethernet0/1
 no ip address
 xconnect 150.1.5.5 35 pw-class PW_CLASS

!
! Frame-Relay is used to connect to other routers (PSN network)
!
interface Serial1/0
 no ip address
 encapsulation frame-relay
!
interface Serial1/0.34 point-to-point
 ip address 150.1.34.3 255.255.255.0
 frame-relay interface-dlci 304
!
interface Serial1/0.35 point-to-point
 ip address 150.1.35.3 255.255.255.0
 frame-relay interface-dlci 305 

!
! OSPF is used as a sample IGP
!
router ospf 1
 router-id 150.1.3.3
 log-adjacency-changes
 network 0.0.0.0 255.255.255.255 area 0

Speaking honestly, it’s not “classic” VPLS in true sense:

Firstly, STP should be running over ring topology, in order to block redundant ports. One can use star topology and disable STP, but this will introduce a single point of failure into the network. Classic VPLS does not run STP over packet core, only a full-mesh of pseudowires.

Secondly, there is no MAC-address learning for pseudowires, since they are point-to-point in essense. MAC addresses are learned by switches, and this impose a usual scalability restriction (though cisco switches may allow you to scale to a few thousands of MAC addresses in their tables).

However, this is funny and simple example of how you can use a simple concept to come up with a more complicated solution.

January 25th, 2008

BGP Time-Based Policy Routing

Sometimes people need to conditionally advertise routes into BGP table based on time of day. Say, we may want to adversite IGP prefix 150.1.1.0/24 with community 1:100 during daytime and with community 1:200 at the other time. Back in days, the procedure was easy - you had to create time based ACL, and use it in route-map to set communities:


time-range DAY
 periodic daily 9:00 to 18:00

access-list 101 permit ip any any time-range DAY

route-map SET_COMMUNITY 10
 match ip address 101
 set community 1:100
!
route-map SET_COMMUNITY 20
 set community 1:200

This construct worked fine back in days with 12.2T and 12.3 IOSes up to 12.3(17)T. However, since 12.3(17)T, BGP scanner behavior has changed significally. Up to the new version, redistribution into BGP table was based on BGP scanner periodically polling the IGP routes every scan-interval (one minute by default). With the new IOS code, redistribution is purely event driven: a new route is added/deleted from BGP table based on event, signaled by IGP (e.g. IGP route withdrawn, next-hop change etc). This change in BGP scanner behavior was not clearly documented, unlike the related BGP support for next-hop address tracking feature. Ovbsiously, a change in time-range is not treated as an IGP event, hence the filter does not work anymore.

Still, there is a number of workarounds. Here is one of them: we use time-based ACL to filter or permit ICMP packets, and advertise routers based on that virtual “reachability” info.

First, we create time-range and time-based access-list:


time-range DAY
 periodic daily 9:00 to 18:00
!
access-list 101 permit ip any any time-range DAY

Next we create a special loopback interface, which is used send ICMP echo packets to “ourself” and attach the ACL to the interface to filter incoming (looped back) packets:


interface Loopback0
 ip address 150.1.1.1 255.255.255.255
 ip access-group 101 in

We create a new IP SLA monitor, to send ICMP echo packets over loopback interface. If the time-based ACL permit pings, the monitor state will be “reachable”


ip sla monitor 1
 type echo protocol ipIcmpEcho 150.1.1.1
 timeout 100
 frequency 1

Next we track our “pinger” state. The first tracker is “on” when the loopback is “open” by packet filter, the second one is active when the time-based ACL filters packets:


track 1 rtr 1 reachability
!
! Inverse logic
!
track 2 list boolean and
 object 1 not

The we create two static routes, bound to the mentioned trackets. That is, the static route with tag 100 is only active when loopback is “open”, i.e. time-based ACL permits packets. The other static route is active only when time-range is inactive (the second tracker tells that the destination is “reachable”):


ip route 150.1.1.0 255.255.255.0 Loopback0 150.1.1.254 tag 100 track 1
ip route 150.1.1.0 255.255.255.0 Loopback0 150.1.1.253 tag 200 track 2

Now we redistribute static routes into BGP, based on tag values, and also set communities based on the tags:


router bgp 1
 redistribute static route-map STATIC_TO_BGP
!
route-map STATIC_TO_BGP permit 10
 match tag 100
 set community 1:100
!
route-map STATIC_TO_BGP permit 20
 match tag 200
 set community 1:200

This is also a funny example of how you can tie up together multiple IOS features at the same time.

January 20th, 2008

Example Configurations for PPP over Ethernet (PPPoE)

Below are a couple example configurations for PPPoE. Note that you can run into MTU issues when trying to use OSPF over PPPoE. This can easily be resolved by using the “ip ospf mtu-ignore” command as the dialer interface’s MTU is 1492 while the virtual-template’s (virtual-access) MTU is 1500.

*** Client ***
interface Ethernet0/0
 pppoe enable
 pppoe-client dial-pool-number 1
!
interface Dialer1
 ip address 142.1.35.5 255.255.255.0
 encapsulation ppp
 dialer-pool 1
 dialer persistent

*** Server ***

vpdn enable
!
vpdn-group CISCO
 accept-dialin
 protocol pppoe
 virtual-template 1
!
interface Ethernet0/0
 pppoe enable
!
interface Virtual-Template1
 ip address 142.1.35.3 255.255.255.0

The next example is using DHCP to assign the client their IP address:

*** Client ***

interface Ethernet0/1
 pppoe enable
 pppoe-client dial-pool-number 1
!
interface Dialer1
 ip address dhcp
 encapsulation ppp
 dialer pool 1
 dialer persistent

*** Server ***

ip dhcp excluded-address 191.1.45.1 190.12.45.3
!
ip dhcp pool MYPOOL
 network 191.1.45.0 255.255.255.0
!
vpdn enable
!
vpdn-group CISCO
 accept-dialin
 protocol pppoe
 virtual-template 1
!
interface Ethernet0/0
 pppoe enable
!
interface Virtual-Template1
 ip address 191.1.45.5 255.255.255.0
 peer default ip address dhcp-pool MYPOOL

January 11th, 2008

BGP Order of Preference

For inbound updates the order of preference is:
1. route-map
2. filter-list
3. prefix-list, distribute-list

For outbound updates the order of preference is:
1. prefix-list, distribute-list
2. filter-list
3. route-map

January 11th, 2008

QoS Order of Operations

Inbound
1. QoS Policy Propagation through Border Gateway Protocol (BGP) (QPPB)
2. Input common classification
3. Input ACLs
4. Input marking (class-based marking or Committed Access Rate (CAR))
5. Input policing (through a class-based policer or CAR)
6. IP Security (IPSec)
7. Cisco Express Forwarding (CEF) or Fast Switching

Outbound
1. CEF or Fast Switching
2. Output common classification
3. Output ACLs
4. Output marking
5. Output policing (through a class-based policer or CAR)
6. Queueing (Class-Based Weighted Fair Queueing (CBWFQ) and Low Latency Queueing (LLQ)), and Weighted Random Early Detection (WRED)

January 11th, 2008

IOS Egress and Ingress Order of Operations

Egress Features
1. WCCP Redirect
2. NAT Inside-to-Outside
3. Network Based Application Recognition (NBAR)
4. BGP Policy Accounting
5. Output QoS Classification
6. Output ACL check
7. Output Flexible Packet Matching (FPM)
8. DoS Tracker
9. Output Stateful Packet Inspection (IOS FW)
10. TCP Intercept
11. Output QoS Marking
12. Output Policing (CAR)
13. Output MAC/Precedence Accounting
14. IPsec Encryption
15. Egress NetFlow
16. Egress Flexible NetFlow
17. Egress RITE
18. Output Queuing (CBWFQ, LLQ, WRED)

Ingress Features
1. IP Traffic Export (RITE)
2. QoS Policy Propagation through BGP (QPPB)
3. Ingress Flexible NetFlow
4. Network Based Application Recognition (NBAR)
5. Input QoS Classification
6. Ingress NetFlow
7. IOS IPS Inspection
8. Input Stateful Packet Inspection (IOS FW)
9. Input ACL
10. Input Flexible Packet Matching (FPM)
11. IPsec Decryption (if encrypted)
12. Unicast RPF check
13. Input QoS Marking
14. Input Policing (CAR)
15. Input MAC/Precedence Accounting
16. NAT Outside-to-Inside
17. Policy Routing

January 6th, 2008

Understanding BGP Regular Expressions

Hi Brian,

Can you explain the easiest way to construct a regular expression in BGP?

Thanks,

Rowan

Hi Rowan,

Regular expressions are strings of special characters that can be used to search and find character patterns. Within the scope of BGP in Cisco IOS regular expressions can be used in show commands and AS-Path access-lists to match BGP prefixes based on the information contained in their AS-Path.

In order to understand how to build regular expressions we first need to know what the character definitions are for the regex function of IOS. The below table illustrates the regex characters and their usage. This information is contained in the Cisco IOS documentation under the Appendix of Cisco IOS Terminal Services Configuration Guide, Release 12.2.

+------------------------------------------------------+
| CHAR | USAGE                                         |
+------------------------------------------------------|
|  ^   | Start of string                               |
|------|-----------------------------------------------|
|  $   | End of string                                 |
|------|-----------------------------------------------|
|  []  | Range of characters                           |
|------|-----------------------------------------------|
|  -   | Used to specify range ( i.e. [0-9] )          |
|------|-----------------------------------------------|
|  ( ) | Logical grouping                              |
|------|-----------------------------------------------|
|  .   | Any single character                          |
|------|-----------------------------------------------|
|  *   | Zero or more instances                        |
|------|-----------------------------------------------|
|  +   | One or more instance                          |
|------|-----------------------------------------------|
|  ?   | Zero or one instance                          |
|------|-----------------------------------------------|
|  _   | Comma, open or close brace, open or close     |
|      | parentheses, start or end of string, or space |
+------------------------------------------------------+

Some commonly used regular expressions include:

+-------------+---------------------------+
| Expression  | Meaning                   |
|-------------+---------------------------|
| .*          | Anything                  |
|-------------+---------------------------|
| ^$          | Locally originated routes |
|-------------+---------------------------|
| ^100_       | Learned from AS 100       |
|-------------+---------------------------|
| _100$       | Originated in AS 100      |
|-------------+---------------------------|
| _100_       | Any instance of AS 100    |
|-------------+---------------------------|
| ^[0-9]+$    | Directly connected ASes   |
+-------------+---------------------------+

Let’s break some of the above expressions down step-by-step. The first one “.*” says to match any single character (“.”), and then find zero or more instances of that single character (“*”). This means zero or more instances or any character, which effectively means anything.

The next string “^$” says to match the beginning of the string (“^”), and then immediately match the end of the string (“$”). This means that the string is null. Within the scope of BGP the only time that the AS-Path is null is when you are looking at a route within your own AS that you or one of your iBGP peers has originated. Hence this matches locally originated routes.

The next string “^100_” says to match the beginning of the string (“^”), the literal characters 100, and then a comma, an open or close brace, an open or close, a parentheses, the start or end of the string, or a space (“_”). This means that the string must start with the number 100 followed by any non-alphanumeric character. In the scope of BGP this means that routes which are learned from the AS 100 will be matched, as 100 will be the first AS in the path when AS 100 is sending us routes.

The next string “_100$” is the exact opposite of the previous one. This string says to start with any non-alphanumeric character (“_”), followed by the literal characters 100, followed by the end of the string (“$”). This means that AS 100 is the last AS in the path, or in other words that the prefix in question was originated by AS 100.

The next string “_100_” is the combination of the two previous strings with some extra matches. This string means that the literal characters 100 are set between any two non-alphanumeric characters. The first of these could be the start of the string, which would match routes learned from AS 100, while the second of these could be the end of the string, which would match routes originated in AS 100. Another case could be that the underscores represent spaces, in which the string would match any other AS path information as long as “ 100 ” is included somewhere. This would match any routes which transit AS 100, and therefore “_ASN_” is generally meant to match routes that transit a particular AS as defined by the number “ASN”.

The final string “^[0-9]+$” is a little more complicated match. Immediately we can see that the string starts (“^”), and we can see later that it ends (“$”). In the middle we see a range of numbers 0-9 in brackets, followed by the plus sign. The numbers in brackets mean that any number from zero to nine can be matched, or in other words, any number. Next we have the plus sign which means one or more instances. This string “[0-9]+” therefore means one or more instance of any number, or in other words any number including numbers with multiple characters (i.e. 1, 12, 123, 1234, 12345678, etc.). When we combine these all together this string means routes originated in any directly connected single AS, or in other words, the routes directly originated by the peers of your AS.

Now let’s look at a more complicated match, and using the above character patterns we will see how we can construct the expression step by step. Suppose we have the following topology below, where we are looking at the network from the perspective of AS 100.

+--------+ +--------+ +--------+ +--------+
| AS 200 |-| AS 201 |-| AS 202 |-| AS 203 |\
+--------+ +--------+ +--------+ +--------+ \
                                             \
           +--------+ +--------+ +--------+\  \
           | AS 300 |-| AS 301 |-| AS 302 | \  \
           +--------+ +--------+ +--------+  \  -+--------+
                                              >--| AS 100 |
                      +--------+ +--------+  /  -+--------+
                      | AS 400 |-| AS 401 | /  /
                      +--------+ +--------+/  /
                                             /
                                 +--------+ /
                                 | AS 500 |/
                                 +--------+

AS 100 peers with ASes 203, 302, 401, and 500, who each have peers as diagramed above. AS 100 wants to match routes originated from its directly connected customers (ASes 203, 302, 401, and 500) in addition to routes originated from their directly connected customers (ASes 202, 301, and 400). The easiest way to create this regular expression would be to think about what we are first trying to match, and then write out all possibilities of these matches. In our case these possibilities are:

203
203 202
302
302 301
401
401 400
500

Now we could simply create an expression with multiple lines (7 lines to be exact) that would match all of the possible AS paths, but suppose that AS 100 wants to keep this match as flexible as possible so that it will apply to any other ASes in the future. Now let’s try to generalize the above AS-Path information into a regex.

First off we know that each of the matches is going to start and going to end. This means that the first character we will have is “^” and the last character is “$”. Next we know that between the “^” and “$” there will be either one AS or two ASes. We don’t necessarily know what numbers these ASes will be, so for the time being let’s use the placeholder “X”. Based on this our new possible matches are:

^X$
^X X$

Next let’s reason out what X can represent. Since X is only one single AS, there will be no spaces, commas, parentheses, or any other special type characters. In other words, X must be a number. However, since we don’t know what the exact path is, we must take into account that X may be a number with more than one character (i.e. 10, 123, or 10101). This essentially equates to one or more instance of any number zero through nine. In regular expression syntax our two matches would therefore now read:

^[0-9]+$
^[0-9]+ [0-9]+$

This expressions reads that we either have a number consisting of one or more characters zero through nine, or a number consisting of one or more characters zero through nine followed by a space and then another number consisting of one or more characters zero through nine. This brings our expression down to two lines as opposed to our original seven, but let’s see how we can combine the above two as well. To combine them, first let us compare what is different between them.

^[0-9]+$
^[0-9]+ [0-9]+$

From looking at the expressions it is evident that the sequence “ [0-9]+” is the difference. In the first case “ [0-9]+” does not exist in the expression. In the second case “ [0-9]+” does exist in the expression. In other words, “ [0-9]+” is either true or false. True or false (0 or 1) is represented by the character “?” in regex syntax. Therefore we can reduce our expression to:

^[0-9]+ [0-9]+?$

At this point we run into a problem with the order of operations of the regex. As denoted above the question mark will apply only to the plus sign, and not to the range [0-9]. Instead, we want the question mark to apply to the string “ [0-9]+” as a whole. Therefore this string needs to be grouped together using parentheses. Parentheses are used in regular expressions as simply a logical grouping. Therefore our final expression reduces to:

^[0-9]+( [0-9]+)?$

Note that to match a question mark in IOS, the escape sequence CTRL-V or ESC-Q must be entered first, otherwise the IOS parser will interpret the question mark as an attempt to invoke the context sensitive help.

-->