SDN in Action: Deploy VXLAN with MP-BGP EVPN
薛国锋
The initial VXLAN standard relies on data-plane flood-and-learn behavior for remote VTEP peer discovery and end-host learning, which limits its scalability. To overcome the limitations of flood-and-learn, MP-BGP EVPN is used as the control plane for VXLAN, providing protocol-based VTEP peer discovery and distribution of end-host reachability information. Furthermore, it separates the control plane from the data plane and serves as a unified control plane for both Layer 2 and Layer 3 forwarding, that is, IRB (Integrated Routing and Bridging), in a VXLAN overlay network. VXLAN and MP-BGP EVPN are important additions to SDN, allowing it to scale better and to interoperate with standards-track protocols.
Today we are going to build a mini-lab environment with GNS3 and NX-OSv 9000, and get some hands-on experience with VXLAN and MP-BGP EVPN:
- Set up the mini-lab environment
- VXLAN with MP-BGP EVPN – Bridging
- VXLAN with MP-BGP EVPN – Integrated Routing and Bridging
Set up the mini-lab environment
Download the NX-OSv 9000 appliance file for GNS3 and its images:
- ‘cisco-nxosv9k.gns3a’;
- ‘hda.qcow2’;
- ‘OVMF-20160813.fd’.
Import ‘cisco-nxosv9k.gns3a’ to GNS3:
Run NX-OSv 9000 the first time:
Abort Auto Provisioning and continue with normal setup ?(yes/no)[n]: yes
Do you want to enforce secure password standard (yes/no) [y]: yes
Enter the password for "admin": 2018ciscoUS
Confirm the password for "admin": 2018ciscoUS
Would you like to enter the basic configuration dialog (yes/no): no
User Access Verification
login: admin
Password: 2018ciscoUS
switch# conf
switch(config)# hostname spine
spine# copy running-config startup-config
spine# copy running-config myconfig.bak
spine# copy myconfig.bak startup-config
spine# reload
This command will reboot the system. (y/n)? [n] yes
loader > boot nxos.7.0.3.I7.2.bin
VXLAN with MP-BGP EVPN – Bridging
In this scenario, MP-BGP EVPN is deployed on the leaf nodes to distribute MAC-address-only routes, so that each VTEP in a given VNI learns the MAC addresses of end hosts attached to the other VTEPs in the same VNI. Distributing MAC addresses through MP-BGP EVPN significantly reduces unknown-unicast flooding in the VXLAN.
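The effect on flooding can be illustrated with a small Python sketch. It is not how NX-OS is implemented: the function name, the peer addresses and the MAC address are hypothetical, and flooding is modeled as one copy per peer VTEP, whereas a multicast underlay would send a single copy to a group.

```python
# Hypothetical model of a VTEP's forwarding choice for one VNI.
# mac_table maps a host MAC address to the remote VTEP that advertised it;
# with EVPN these entries would come from BGP MAC routes, not from flooding.
def select_destinations(mac_table, peer_vteps, dst_mac):
    """Return the VTEPs that must receive a frame addressed to dst_mac."""
    remote_vtep = mac_table.get(dst_mac)
    if remote_vtep is not None:
        return [remote_vtep]       # known unicast: a single tunnel
    return sorted(peer_vteps)      # unknown unicast: flood to every peer


# Illustrative values only (not taken from the lab).
peers = {"10.0.0.2", "10.0.0.3", "10.0.0.4"}
evpn_learned = {"00:50:79:66:68:01": "10.0.0.3"}

known = select_destinations(evpn_learned, peers, "00:50:79:66:68:01")
unknown = select_destinations({}, peers, "00:50:79:66:68:01")
```

With the MAC address already present in the table, the frame goes to one VTEP; with an empty table, the same frame is copied to all three peers.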
The EVPN NLRI is carried in BGP using the BGP multiprotocol extension with a new address family called L2VPN EVPN. Similar to the VPNv4 address family in BGP MPLS-based IP VPN, the L2VPN EVPN address family uses route distinguishers (RDs) to maintain uniqueness among identical routes in different VRF instances, and route targets (RTs) to define the policies that determine how routes are advertised and shared by different VRF instances.
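The roles of the RD and the RT can be sketched in Python as follows. This is a simplified model under stated assumptions, not a BGP implementation: the class, the function, and all RD, RT, MAC and next-hop values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnMacRoute:
    rd: str                   # route distinguisher, makes the route unique
    mac: str                  # advertised host MAC address
    next_hop: str             # advertising VTEP
    route_targets: frozenset  # extended communities attached on export


def import_routes(routes, import_rts):
    """Keep the routes that carry at least one RT the local instance imports."""
    return [r for r in routes if r.route_targets & import_rts]


# Illustrative values only (not taken from the lab).
routes = [
    EvpnMacRoute("10.0.0.1:100", "00:50:79:66:68:01", "10.0.0.1",
                 frozenset({"65000:100192"})),
    EvpnMacRoute("10.0.0.2:100", "00:50:79:66:68:01", "10.0.0.2",
                 frozenset({"65000:100192"})),
    EvpnMacRoute("10.0.0.2:200", "00:50:79:66:68:09", "10.0.0.2",
                 frozenset({"65000:100200"})),
]

# The RD keeps the first two routes distinct although the MAC is identical.
unique_keys = {(r.rd, r.mac) for r in routes}

# The RT decides which routes an instance accepts.
imported = import_routes(routes, frozenset({"65000:100192"}))
```

Here the instance importing `65000:100192` accepts the first two routes and ignores the third, while the RDs keep all three entries distinct in the BGP table.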
pc11> ip 192.168.1.1/16
pc11> save
pc12> ip 192.168.1.2/16
pc12> save
pc21> ip 192.168.2.1/16
pc21> save
pc22> ip 192.168.2.2/16
pc22> save
spine1# show ip route
spine2# show ip mroute
leaf1# show nve peers
leaf1# show nve vni
leaf1# show vxlan interface
leaf1# show bgp l2vpn evpn summary
leaf1# show system internal l2fwder mac
leaf1# show bgp l2vpn evpn
leaf1# show l2route evpn mac all
leaf1# show l2route evpn mac-ip all
VXLAN Bridging: 192.168.1.1 -> 192.168.2.2, VNI 100192
VXLAN Bridging: 192.168.2.2 -> 192.168.1.1, VNI 100192
VXLAN with MP-BGP EVPN – Integrated Routing and Bridging
MP-BGP EVPN is designed to distribute NLRI (network layer reachability information) for the network. It can advertise both the MAC and IP addresses of EVPN VXLAN end hosts, and can therefore support integrated routing and bridging. Each VTEP performs local data-plane learning to obtain MAC and IP address information from its locally attached hosts and then distributes this information through the MP-BGP EVPN control plane to other VTEPs. The underlay network provides IP reachability for all the VTEP addresses that are used to route the encapsulated VXLAN packets toward the egress VTEP; it doesn’t need to learn the EVPN routes, which simplifies underlay operation and increases its stability and scalability.
Two integrated routing and bridging models are defined: asymmetric IRB and symmetric IRB. Symmetric IRB is widely adopted for its scalability advantages and simplified Layer 2 and Layer 3 multitenancy support. With symmetric IRB, both the ingress and egress VTEPs perform Layer 2 and Layer 3 lookups.
For a tenant, its VRF instance in each VTEP is mapped to a unique Layer 3 VNI in the network. All inter-VXLAN routed traffic is encapsulated with the Layer 3 VNI in the VXLAN header. The receiving VTEP uses this VNI to determine the VRF context in which the inner IP packet needs to be forwarded. This VNI also provides the basis for enforcing Layer 3 segmentation in the data plane. Each VTEP has a unique router MAC address that other VTEPs can use as the inner destination MAC address for the routed VXLAN packet.
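To make the role of the VNI in the VXLAN header concrete, here is a minimal Python sketch that packs and parses the 8-byte header defined in RFC 7348 and then resolves the VRF on the receiving side. The function names and the lookup table are hypothetical; the pairing of VNI 900001 with the VRF vxlan-900001 is inferred from the names used in this lab.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


def build_vxlan_header(vni):
    """Pack the 8-byte VXLAN header: flags, reserved bits, 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits + 24 reserved bits.
    # Word 2: 24-bit VNI + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header):
    """Extract the VNI from an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


# Hypothetical Layer 3 VNI -> VRF table on the egress VTEP.
L3_VNI_TO_VRF = {900001: "vxlan-900001"}

header = build_vxlan_header(900001)        # ingress VTEP, routed traffic
vrf = L3_VNI_TO_VRF[parse_vni(header)]     # egress VTEP picks the VRF
```

The egress VTEP never needs to know which Layer 2 VNI the source host sits in; the Layer 3 VNI alone selects the routing context for the inner IP packet.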
An EVPN VXLAN tenant can have multiple Layer 2 networks, each with a corresponding VNI. These Layer 2 networks are bridge domains in the overlay network. The VNIs that are associated with them are often referred to as Layer 2 VNIs. If VXLAN routing is required, then each tenant also needs a Layer 3 VNI in symmetric IRB. Although a VTEP can have all or a subset of the Layer 2 VNIs in a VXLAN EVPN overlay, it must have the Layer 3 VNI for inter-VXLAN routing. All VTEPs in an EVPN must have the same Layer 3 VNI.
When the destination MAC address in the original packet header does not belong to the local VTEP, then VXLAN bridging needs to occur. In other words, the originating VTEP will perform a Layer 2 lookup and bridge the packet to the destination VTEP. If the destination MAC address matches the anycast gateway MAC address, then VXLAN routing needs to occur. In this case, the originating VTEP will perform a Layer 3 lookup and then encapsulate the packet with the Layer 3 VNI. The destination VTEP will then receive the traffic and will perform another routing lookup based on the inner IP header.
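The bridging-versus-routing decision on the originating VTEP can be sketched in Python as below. This is a simplification under stated assumptions: the anycast gateway MAC value and the host MAC addresses are hypothetical, the VNIs are the ones used in this lab, and the sketch assumes the routed destination sits behind a remote VTEP (a locally attached destination would be routed without VXLAN encapsulation).

```python
ANYCAST_GW_MAC = "00:00:22:22:33:33"  # hypothetical anycast gateway MAC
L3_VNI = 900001                       # Layer 3 VNI of the tenant VRF


def forward(dst_mac, ingress_l2_vni, local_macs):
    """Return (action, vni) for a frame received from a locally attached host."""
    if dst_mac == ANYCAST_GW_MAC:
        # Frame is addressed to the gateway: Layer 3 lookup, then
        # encapsulate with the Layer 3 VNI toward the egress VTEP.
        return ("route", L3_VNI)
    if dst_mac in local_macs:
        # Destination is attached to this VTEP: no VXLAN encapsulation.
        return ("switch-locally", None)
    # Destination is behind another VTEP in the same bridge domain:
    # Layer 2 lookup, encapsulate with the Layer 2 VNI.
    return ("bridge", ingress_l2_vni)


local = {"00:50:79:66:68:00"}  # hypothetical locally learned MAC

same_subnet = forward("00:50:79:66:68:01", 2001001, local)
other_subnet = forward(ANYCAST_GW_MAC, 2001001, local)
local_host = forward("00:50:79:66:68:00", 2001001, local)
```

In this model, traffic inside one subnet keeps its Layer 2 VNI (2001001), while traffic sent to the gateway MAC leaves the VTEP with the Layer 3 VNI (900001).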
In MP-BGP EVPN, any VTEP participating in a VNI can use the distributed anycast gateway feature for end hosts in its IP subnet by supporting the same virtual gateway IP address and the virtual gateway MAC address. With the anycast gateway function in EVPN, an end host in a VNI can always use its local VTEP for this VNI as its default gateway to send traffic outside its IP subnet. This capability enables optimal forwarding for northbound traffic from end hosts in the VXLAN overlay network. A distributed anycast gateway also offers the benefit of transparent host mobility in the VXLAN overlay network. Because the gateway IP address and virtual MAC address are identically provisioned on all VTEPs within a VNI, when an end host moves from one VTEP to another, it doesn’t need to send another ARP request to relearn the gateway MAC address.
pc1_1> ip 4.1.1.10/24 4.1.1.1
pc1_1> save
pc1_2> ip 4.1.1.11/24 4.1.1.1
pc1_2> save
pc2_1> ip 4.2.2.10/24 4.2.2.1
pc2_1> save
pc2_2> ip 4.2.2.11/24 4.2.2.1
pc2_2> save
leaf1# show nve peers
leaf1# show nve vni
leaf1# show vxlan interface
leaf1# show l2route evpn mac all
leaf1# show l2route evpn mac-ip all
leaf1# show ip route vrf vxlan-900001
VXLAN Bridging: 4.1.1.10 -> 4.1.1.11, VNI 2001001
VXLAN Bridging: 4.2.2.10 -> 4.2.2.11, VNI 2001002
VXLAN Routing: 4.1.1.10 -> 4.2.2.11, VNI 900001
leaf1# ping 4.1.1.1 vrf vxlan-900001
leaf1# ping 4.1.1.10 vrf vxlan-900001
leaf1# ping 4.2.2.1 vrf vxlan-900001
leaf1# ping 4.2.2.10 vrf vxlan-900001
leaf1# ping 4.1.1.11 vrf vxlan-900001 // would not work because of the anycast GW IP
leaf1# ping 4.2.2.11 vrf vxlan-900001 // would not work because of the anycast GW IP