VXLAN is deployed in numerous production environments and is supported by
quite a few networking equipment vendors as well as software vendors such as
VMware and Red Hat. It was not until August 2014, however, that the
specification moved from IETF draft status to RFC 7348.
RFC stands for Request for Comments. As a software developer for networking
equipment, I have to know the RFC inside and out. This RFC is the functional
specification for the feature that I develop. The test group will test the
feature to make sure it functions as the RFC describes.
Today, let’s take a look at what is in RFC 7348.
The title of RFC 7348 is “Virtual
eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized
Layer 2 Networks over Layer 3 Networks.”
RFC 7348 is a relatively short document compared to other RFCs that specify
communication protocols.
The table of contents has 10 sections. Six of them are standard sections found
in all RFCs. To break this RFC down to its core, we can look at the following
sections:
- VXLAN Problem Statement
- VXLAN Operations
- VXLAN Frame Format
- VXLAN Security Considerations
VXLAN Problem Statement
RFC 7348 identifies 3 problem areas in the networking infrastructure within
the data center that VXLAN is designed to resolve. Server virtualization in
the data center works best in a flat Layer 2 network. It was not until VMworld
2014 that VMware announced that vSphere 6 (still in beta as of this writing)
would support vMotion across vCenter instances and over long distances.
Limitations Imposed by Spanning Tree and VLAN Ranges
Spanning Tree is used to guard against loops in a Layer 2 network. A loop in a
Layer 2 network causes a broadcast storm and leaves the network inoperable.
Spanning Tree Protocol is designed to protect the Layer 2 network from this.
While it does protect the network, the price is that some links go unused and
multipathing is not an option in the network design. Both problems mean the
ROI (Return On Investment) on the network falls short of where it should be.
Another problem is that the VLAN ID is a 12-bit field, which limits a data
center built on a flat Layer 2 network to only 4094 VLANs.
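The 4094 figure follows from the size of the field. As a quick check, assuming the usual IEEE 802.1Q convention that VLAN IDs 0 and 4095 are reserved:

```python
# VLAN ID space for a 12-bit field.
# Assumption: IDs 0 and 4095 are reserved (IEEE 802.1Q convention).
VLAN_ID_BITS = 12
RESERVED_VLAN_IDS = {0, 4095}

total_ids = 2 ** VLAN_ID_BITS
usable_vlans = total_ids - len(RESERVED_VLAN_IDS)

print(total_ids)     # 4096
print(usable_vlans)  # 4094
```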
Multi-tenant Environment
Multi-tenancy in a virtualized data center is a definite requirement, and
tenant isolation is an absolute requirement.
In a Layer 2 network, VLANs are a popular way to achieve tenant isolation. The
limit of 4094 VLANs described in the previous section is a major stumbling
block in data center design. When each tenant requires more than one VLAN, as
in the example of a 3-tier web application, the VLANs available in a flat
Layer 2 network become an even scarcer, yet still required, resource.
A Layer 3 network is another possible way to isolate tenants, but it has its
limitations. Each tenant must have unique IP subnets. Layer 3 isolation also
prevents tenants from using Layer 2 or non-IP Layer 3 protocols for inter-VM
communication.
Inadequate Table Sizes at ToR Switch
Using a ToR (Top of Rack) switch to connect the servers in a rack is a common
data center design. With multiple virtual machines running on each virtualized
server, the number of MAC addresses that a ToR switch has to learn grows
accordingly. As the number of virtual machines a virtualized server can host
increases, the MAC address table of the ToR switch becomes inadequate.
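To get a feel for the scale, here is a back-of-the-envelope estimate. All of the numbers (servers per rack, VMs per server, vNICs per VM) are hypothetical and are not taken from the RFC, and the estimate counts only the VMs in the local rack. A real ToR switch may also learn the addresses of remote VMs that the local ones talk to.

```python
def local_mac_count(servers_per_rack: int, vms_per_server: int,
                    vnics_per_vm: int = 1) -> int:
    """MAC addresses a ToR switch sees from the VMs in its own rack."""
    return servers_per_rack * vms_per_server * vnics_per_vm


# Hypothetical rack: 40 servers, 2 vNICs per VM, rising VM density.
for vms in (10, 50, 100):
    print(vms, local_mac_count(servers_per_rack=40, vms_per_server=vms,
                               vnics_per_vm=2))
# 10 800
# 50 4000
# 100 8000
```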
VXLAN Operations
Section 4 of RFC 7348 describes
the operation of VXLAN. As indicated in
the title of this RFC, VXLAN is a framework for overlaying virtualized Layer 2
networks over Layer 3 networks. It is
designed specifically to overcome the problems that we face in the data center.
VXLAN is meant to extend a Layer 2 network over a Layer 3 network by means of
tunneling: the original Layer 2 frame is encapsulated in a UDP packet.
RFC 7348 outlines the following rules:
- Each overlay is termed a VXLAN segment.
- Only VMs within the same VXLAN segment can communicate with each other.
- Each VXLAN segment is identified by a 24-bit segment ID, the VXLAN Network
Identifier (VNI).
- The VNI identifies the scope of the inner MAC frame originated by the
individual VM.
- The VNI is carried in an outer header that encapsulates the inner MAC frame
originated by the individual VM.
- The terms VXLAN segment and VXLAN overlay network are used interchangeably
in the RFC.
- VXLAN tunnels are stateless and run between 2 end points.
- Each end point is called a VXLAN Tunnel End Point (VTEP).
- A VTEP can be implemented on a virtual switch, a physical switch, or a
physical server, in either hardware or software.
- Data plane learning is used.
- Multicast is used for carrying unknown destination, broadcast, and multicast
frames (BUM traffic).
- VTEPs MUST NOT fragment VXLAN packets.
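To put the 24-bit VNI next to the 12-bit VLAN ID, here is a quick calculation, together with a one-line version of the "same segment only" rule. The function name is made up for illustration, and the 4094 figure assumes VLAN IDs 0 and 4095 are reserved.

```python
VNI_BITS = 24
VLAN_ID_BITS = 12

vxlan_segments = 2 ** VNI_BITS        # possible VNI values
usable_vlans = 2 ** VLAN_ID_BITS - 2  # assumes IDs 0 and 4095 are reserved


def can_communicate(vni_a: int, vni_b: int) -> bool:
    """Hypothetical helper: VMs can talk only within the same VXLAN segment."""
    return vni_a == vni_b


print(f"{vxlan_segments:,}")        # 16,777,216
print(usable_vlans)                 # 4094
print(can_communicate(5000, 5000))  # True
print(can_communicate(5000, 5001))  # False
```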
The last 3 rules in the list above are worth looking into a little more.
Data Plane Learning
Data plane learning means there is no control plane for VXLAN. Until now, I
had not realized that I never truly understood what a control plane is. People
often say generically that SDN is the separation of the control plane and the
data plane. I have worked in networking for almost 20 years, mostly on Layer 2
networking features. I say mostly because I worked on UDP Relay Agent and DHCP
Snooping, for which I had to know a little bit of IP forwarding. The data plane
is the forwarding operation of networking equipment.
So what exactly is a control plane? For network engineers working on Layer 3
networks, this is very obvious. The control plane is most familiar as a Layer 3
networking concept. BGP is an example of a control plane protocol. Routers
exchange routing information via the control plane.
VXLAN uses data plane learning, just like source learning on Ethernet (Layer 2)
switches. The VTEP is responsible for learning each virtual machine’s MAC
address and associating it with a VXLAN segment (VXLAN Network Identifier). For
frames that arrive over a tunnel, it also records the outer source IP address,
so it knows which remote VTEP to send replies to. This learning process is very
important to the efficiency of VXLAN networks.
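A minimal sketch of such a learning table, assuming a simple in-memory dictionary keyed by (VNI, inner MAC). The class name, method names, and addresses are made up for illustration and do not come from the RFC or any product.

```python
from typing import Dict, Optional, Tuple


class VtepTable:
    """Hypothetical (VNI, inner MAC) -> remote VTEP IP learning table."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], str] = {}

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        # Source learning: remember which VTEP the inner source MAC sits behind.
        self._entries[(vni, inner_src_mac.lower())] = outer_src_ip

    def lookup(self, vni: int, inner_dst_mac: str) -> Optional[str]:
        # Returns None when the destination has not been learned yet.
        return self._entries.get((vni, inner_dst_mac.lower()))


table = VtepTable()
table.learn(vni=5000, inner_src_mac="00:50:56:AA:BB:01", outer_src_ip="10.0.0.2")

print(table.lookup(5000, "00:50:56:aa:bb:01"))  # 10.0.0.2
print(table.lookup(5001, "00:50:56:aa:bb:01"))  # None (different segment)
```

Keying on the VNI as well as the MAC address is what lets two tenants reuse the same MAC address in different segments.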
Multicast for BUM traffic
VXLAN is meant to extend Layer 2 segments to other data centers. In a Layer 2
network, unknown destination, broadcast, and multicast frames are flooded to
all the devices in the same broadcast domain. With VXLAN, RFC 7348 specifically
spells out that IP multicast is used for sending BUM traffic to the other VTEPs
of the same VXLAN segment.
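The forwarding decision this implies can be sketched as follows. The function names, the VNI-to-group mapping, and all addresses are hypothetical. How a VNI is mapped to a multicast group is left to configuration in this sketch.

```python
from typing import Dict, Tuple


def is_group_mac(mac: str) -> bool:
    """True for broadcast and multicast MACs (group bit of first octet set)."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def outer_destination(vni: int, dst_mac: str,
                      learned: Dict[Tuple[int, str], str],
                      vni_to_group: Dict[int, str]) -> str:
    """Pick the outer destination IP for a frame entering the tunnel."""
    dst_mac = dst_mac.lower()
    if not is_group_mac(dst_mac):
        remote_vtep = learned.get((vni, dst_mac))
        if remote_vtep is not None:
            return remote_vtep      # known unicast: send to one VTEP
    return vni_to_group[vni]        # BUM traffic: send to the segment's group


learned = {(5000, "00:50:56:aa:bb:01"): "10.0.0.2"}
groups = {5000: "239.1.1.100"}

print(outer_destination(5000, "00:50:56:aa:bb:01", learned, groups))  # 10.0.0.2
print(outer_destination(5000, "ff:ff:ff:ff:ff:ff", learned, groups))  # 239.1.1.100
print(outer_destination(5000, "00:50:56:aa:bb:99", learned, groups))  # 239.1.1.100
```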
While this works perfectly at the functional level, many Layer 3 network
engineers prefer to avoid IP multicast.
Cisco Nexus 1000V has a Unicast-only VXLAN mode and a MAC Distribution Mode. In
MAC Distribution Mode, there is a centralized controller. This is the same as
introducing a control plane for VXLAN.
VMware NSX has its NSX Controller. When a VM is provisioned, it registers
itself with the NSX Controller. ARP requests from the VMs are sent to the
controller, which is aware of the MAC address and VTEP/VNI associations. This
again introduces a control plane for VXLAN.
To work with other VTEPs, IP multicast still needs to be used.
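RFC 7348 notes that a multicast routing protocol such as PIM-SM is needed to
build efficient multicast forwarding trees, and that a bidirectional protocol
such as BIDIR-PIM would be more efficient because a VTEP can act as both the
source and the destination of multicast packets.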
VTEPs MUST NOT fragment VXLAN packets
Due to encapsulation, VXLAN adds an extra 50 bytes of overhead. The “MUST NOT”
is written in capital letters in the RFC. This requirement has huge
implications for the MTU of the underlay Layer 3 network. The MTU for Ethernet
v2 is 1500 bytes. VMware recommends setting the MTU to 1600 or using the jumbo
frame option end to end.
I know of an installation that ran into an MTU problem and ended up not
deploying VXLAN.
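The 50 bytes can be broken down header by header. The sketch below assumes an IPv4 underlay and no VLAN tags on either the inner or the outer frame. An IPv6 underlay or VLAN tags would add more.

```python
# Header sizes in bytes. Assumes an IPv4 underlay and no VLAN tags.
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14

encap_overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def required_underlay_mtu(vm_mtu: int = 1500) -> int:
    """Smallest underlay IP MTU that carries a full-size VM frame unfragmented."""
    inner_frame = INNER_ETHERNET + vm_mtu
    return OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + inner_frame


print(encap_overhead)                   # 50
print(required_underlay_mtu())          # 1550
print(required_underlay_mtu() <= 1600)  # True
```

Under these assumptions a 1600-byte underlay MTU leaves some headroom beyond the 1550 bytes strictly required.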
VXLAN Frame Format
Section 5 of RFC 7348 details the frame format of VXLAN. A developer or network
engineer who needs to troubleshoot VXLAN problems needs to know this frame
format very well.
Wireshark is able to decode VXLAN traffic. UDP port 4789 is assigned to VXLAN
traffic.
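For reference, the VXLAN header itself is 8 bytes: 8 bits of flags (the I flag, 0x08, marks the VNI as valid), 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. A minimal sketch of building and parsing it with Python's standard struct module follows. The function names are made up for illustration.

```python
import struct

VXLAN_UDP_PORT = 4789
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_header(data: bytes) -> int:
    """Return the VNI from the first 8 bytes of a VXLAN payload."""
    if len(data) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word1, word2 = struct.unpack("!II", data[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8


header = pack_vxlan_header(5000)
print(header.hex())                 # 0800000000138800
print(unpack_vxlan_header(header))  # 5000
```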
VXLAN Security Considerations
While Security Considerations is a standard section for RFCs, it is also worth
looking into.
Quoting directly from the RFC:
Traditionally, Layer 2 networks can only be attacked from 'within' by rogue end
points -- either
- by having inappropriate access to a LAN and snooping on traffic,
- by injecting spoofed packets to 'take over' another MAC address, or
- by flooding and causing denial of service.
VXLAN increases the attack surface for these kinds of
attacks.
Without going into detail, the Security Considerations section of this RFC
suggests the following ways to safeguard VXLAN networks.
- Continue to mitigate rogue end point attacks in the traditional way, by
limiting the management and administrative scope of who deploys and manages
VMs/gateways in a VXLAN environment.
- Use 802.1X for admission control of individual end points.
- Use 5-tuple-based ACLs.
- Use IPsec to authenticate and optionally encrypt VXLAN traffic.
- Use a designated VLAN for VXLAN traffic.
- Use secure methods on the management plane of the VTEP.
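As an illustration of the 5-tuple ACL idea, here is a minimal first-match sketch with a default deny. The rule format, the names, and the 10.0.0.0/24 VTEP subnet are all assumptions made for this example and are not taken from the RFC or from any switch's configuration syntax.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class AclRule:
    """Hypothetical 5-tuple rule; a port of None matches any port."""
    src_net: str
    dst_net: str
    protocol: str
    src_port: Optional[int]
    dst_port: Optional[int]
    permit: bool

    def matches(self, src_ip: str, dst_ip: str, protocol: str,
                src_port: int, dst_port: int) -> bool:
        return (ip_address(src_ip) in ip_network(self.src_net)
                and ip_address(dst_ip) in ip_network(self.dst_net)
                and protocol == self.protocol
                and self.src_port in (None, src_port)
                and self.dst_port in (None, dst_port))


def permitted(rules: List[AclRule], src_ip: str, dst_ip: str, protocol: str,
              src_port: int, dst_port: int) -> bool:
    """First matching rule wins; anything unmatched is denied."""
    for rule in rules:
        if rule.matches(src_ip, dst_ip, protocol, src_port, dst_port):
            return rule.permit
    return False


# Allow VXLAN (UDP 4789) only between VTEPs in an assumed 10.0.0.0/24 subnet.
rules = [AclRule("10.0.0.0/24", "10.0.0.0/24", "udp", None, 4789, True)]

print(permitted(rules, "10.0.0.1", "10.0.0.2", "udp", 49152, 4789))     # True
print(permitted(rules, "192.168.1.5", "10.0.0.2", "udp", 49152, 4789))  # False
```

The source port is left as a wildcard because VXLAN source ports vary from flow to flow.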
This summarizes RFC 7348.