Understanding and Implementing Flow Control on Dell Force10 Switches

Ethernet flow control lets a receiving node temporarily stop the transmission of data from the sending node. As defined by IEEE 802.3x, this is accomplished via the PAUSE frame.

Flow control is useful in cases where a node on the network is transmitting data faster than the receiving node can accept it; the goal is to properly handle input buffer congestion while preventing packet loss. When the receiving node cannot handle the rate at which data is being sent, it sends a PAUSE frame to the sending node on the link, which then halts the transmission of further data for a specified period of time.

An Ethernet frame is used to carry the ‘PAUSE’ command with the ‘Ethertype’ field always set to ‘0x8808’, and the ‘Opcode’ field always set to ‘0x0001’. When a node becomes overwhelmed with traffic from the other end of the link, it sends a PAUSE frame to the reserved 48-bit destination multicast address of ‘01-80-C2-00-00-01’. Because this address is reserved, the node does not need to discover and store the address of the node at the other end of the link. Without flow control enabled on the switch, the overloaded device will drop packets. The PAUSE frame has the structure shown below.

PAUSE Frame
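
The field layout described above can also be sketched in a few lines of Python. This is only an illustration: the function name build_pause_frame and the example source MAC are hypothetical, and the sketch assumes the usual 60-byte minimum frame (zero padded) with the 4-byte FCS left for the NIC to append.

```python
import struct

# Values taken from the text above.
PAUSE_DST_MAC = bytes.fromhex("0180C2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001

# Assumption: minimum Ethernet frame length without the 4-byte FCS.
MIN_FRAME_LEN = 60


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return the bytes of a PAUSE frame (without FCS). Hypothetical helper."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in two bytes")
    # Destination MAC, source MAC, Ethertype (network byte order).
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    # Opcode followed by the two-byte pause time.
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    # Zero-pad up to the minimum frame length.
    return (header + payload).ljust(MIN_FRAME_LEN, b"\x00")


# Example source MAC (made up for illustration).
frame = build_pause_frame(bytes.fromhex("001122334455"), 0xFFFF)
print(frame.hex())
```

Calling build_pause_frame with a pause time of 0 would produce the frame that tells the other end to resume.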

The PAUSE frame includes a two-byte unsigned integer (0 through 65535, or 0x0000 through 0xFFFF in hex) which tells the sending node how long to pause. A value of ‘0’ tells the end device to resume transmission. The pause time is measured in units of pause ‘quanta’, where each unit is equal to 512 bit times. Just to give an example, with Gigabit Ethernet, where one bit time is 1 ns, a pause time of 0xFFFF (65535) equates to 65535 × 512 ns, or about 33.55 msec. If you would like to understand the details of this calculation further, please see the following website link: http://wiki.networksecuritytoolkit.org – Ethernet Flow Control Pause Frame (IEEE 802.3x).
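
The pause time arithmetic is easy to check for yourself. Here is a minimal sketch; the function name pause_duration_seconds is my own and not part of any standard or tool.

```python
QUANTUM_BITS = 512  # one pause quantum is 512 bit times


def pause_duration_seconds(pause_quanta: int, link_bps: int) -> float:
    """Convert a PAUSE frame's pause time into seconds for a given link speed."""
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in two bytes")
    # Total bit times divided by bits per second gives seconds.
    return pause_quanta * QUANTUM_BITS / link_bps


# Maximum pause time on Gigabit Ethernet.
gig_e = pause_duration_seconds(0xFFFF, 1_000_000_000)
print(f"{gig_e * 1000:.2f} ms")  # prints "33.55 ms"
```

Because the quantum is defined in bit times, the same pause value lasts a shorter wall-clock time on a faster link.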

It’s important to note that if an additional PAUSE frame arrives before the pause time has expired from the prior PAUSE frame, its pause time parameter replaces the prior pause time; this is why a PAUSE frame with a value of ‘0’ for the pause time causes the data transmission to resume immediately.
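
To make the replace-not-add behavior concrete, here is a small sketch of the timer a sending node might keep. The PauseTimer class and its method names are hypothetical, and time is passed in explicitly as seconds so the example stays deterministic.

```python
class PauseTimer:
    """Hypothetical model of a sender's pause timer on one link."""

    QUANTUM_BITS = 512  # one pause quantum is 512 bit times

    def __init__(self, link_bps: int) -> None:
        self.link_bps = link_bps
        self.resume_at = 0.0  # time (seconds) at which sending may resume

    def on_pause_frame(self, now: float, pause_quanta: int) -> None:
        # The new pause time replaces whatever was left of the old one.
        self.resume_at = now + pause_quanta * self.QUANTUM_BITS / self.link_bps

    def may_transmit(self, now: float) -> bool:
        return now >= self.resume_at


timer = PauseTimer(link_bps=1_000_000_000)
timer.on_pause_frame(now=0.0, pause_quanta=0xFFFF)  # pause for about 33.55 ms
print(timer.may_transmit(now=0.010))                # False, still paused
timer.on_pause_frame(now=0.010, pause_quanta=0)     # pause time of 0 arrives
print(timer.may_transmit(now=0.010))                # True, resumes immediately
```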

The lab diagram below demonstrates how flow control, a mechanism that employs PAUSE frames to control packet loss, works under congestion conditions. The server has a 1 Gb NIC and is receiving traffic from two PCs, each with a 1 Gb NIC, at a faster rate than it can handle.

Lab Diagram - Flow Control

Now, if the server is congested and flow control is not configured on the switch, the packets that PC 1 and PC 2 send to the server will just be dropped. However, if I have flow control enabled and it is supported on all devices, the server will inform the switch via PAUSE frames to ‘pause transmission’ until notified to proceed. It is important to note, as shown above, that PAUSE frames are a direct-link mechanism: they do not propagate directly from link to link. Instead, the paused switch starts to build a queue, and once that queue reaches a certain threshold, the switch is forced to send a PAUSE frame to the PC to avoid dropping frames. By this mechanism, PAUSE frames are propagated indirectly.
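
The indirect propagation can be modeled with a toy queue. This is only a sketch of the idea, not how the S50N is implemented: the SwitchEgressQueue class, its threshold, and its capacity are all made-up values for illustration.

```python
from collections import deque


class SwitchEgressQueue:
    """Toy model of the switch queue feeding the server port (hypothetical)."""

    def __init__(self, pause_threshold: int, capacity: int) -> None:
        self.pause_threshold = pause_threshold  # assumed threshold, in frames
        self.capacity = capacity                # assumed buffer size, in frames
        self.queue = deque()
        self.paused_by_server = False
        self.dropped = 0

    def on_pause_from_server(self, pause_quanta: int) -> None:
        # A non-zero pause time stops the switch; zero lets it resume.
        self.paused_by_server = pause_quanta > 0

    def enqueue(self, frame) -> bool:
        """Queue a frame from a PC; return True if the switch should PAUSE the PC."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1  # buffer exhausted, the frame is lost
        else:
            self.queue.append(frame)
        return len(self.queue) >= self.pause_threshold

    def drain(self, max_frames: int) -> int:
        """Forward up to max_frames to the server unless the server paused us."""
        if self.paused_by_server:
            return 0
        sent = min(max_frames, len(self.queue))
        for _ in range(sent):
            self.queue.popleft()
        return sent


q = SwitchEgressQueue(pause_threshold=3, capacity=5)
q.on_pause_from_server(0xFFFF)           # server tells the switch to pause
print([q.enqueue(i) for i in range(3)])  # [False, False, True]
```

The third frame pushes the queue to the threshold, which is the point where the switch would send its own PAUSE frame back toward the PC.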

Since I am using a Dell Force10 S50N as my switch, below I show how to quickly configure flow control on the switch.

Configuring Flow Control on Dell Force10 S50N

Below I capture the PAUSE frame being sent from the server; you can also download the Wireshark PAUSE frame packet capture file (.pcap) from the download section or direct link here.

Wireshark PAUSE frame capture from server

Finally, an important note to mention here is that the IEEE 802.3x flow control defined in 1997 and discussed here causes the entire link to pause traffic under congestion. This is not an ideal result for networks carrying multiple types of traffic with different priorities. It is for this reason that Quality of Service (QoS) fails to operate properly with this flow control/PAUSE frame mechanism.

Fortunately, the follow-on priority-based flow control (PFC), the IEEE 802.1Qbb standard approved in 2011, provides a link-level flow control mechanism that can be controlled independently for each Class of Service (CoS), as defined by the IEEE P802.1p group. PFC will be the mechanism used as part of the data center bridging (DCB) protocol to ensure zero loss under congestion for the converged networks of the future. Switches employing DCB solutions will be, at a minimum, 10 GbE switches such as the Dell Force10 S4810 that allow for the bandwidth requirements of large converged data center solutions. I will discuss DCB and PFC in greater detail in a future blog.

This entry was posted in Dell Force10, Force10 Networks, Labs, Networking, Servers, Technology.

2 Responses to Understanding and Implementing Flow Control on Dell Force10 Switches

  1. Bryan L. says:

    The Force10 S55 switch supports only the RX control option. Is this an important design consideration? How often does a switch need to TX a pause frame? I would imagine it receives pause frames far more often than it transmits. Great blog!

  2. Humair says:

    Yes, typically the end node will more often send the pause frames. The switch running at line rate typically won’t be the choke point unless you’re oversubscribed at which point you want your switch to transmit pause frames. With many customers, I typically see the storage device or host server throttling the network due to either over-subscription of the storage / host server or some misconfiguration. I’ve also seen bad/buggy storage firmware cause issues. If the network is designed properly with this limitation in mind, you should be able to avoid issues.
