I also published this blog post about the CDO Mode feature introduced in NSX-V 6.3 on the VMware NSX Network Virtualization Blog on March 4, 2017. The full post is provided below and can also be read on the VMware NSX Network Virtualization Blog site.
VMware NSX Network Virtualization Blog
Title: NSX-V 6.3: Control Plane Resiliency with CDO Mode
Author: Humair Ahmed
Date Published: March 4, 2017
NSX-V 6.3, released in February, introduced many new features. In my last blog post, NSX-V 6.3: Cross-VC NSX Security Enhancements, I discussed several new Cross-VC NSX security features. In this post, I'll discuss another new feature, Controller Disconnected Operation (CDO) mode, which provides additional resiliency for the NSX control plane. Note: in NSX-V 6.3.1, CDO mode is a tech preview feature; the feature GA'ed in NSX-V 6.3.2.
The NSX Controllers already offer inherent resiliency for the control plane by design in several ways:
- complete separation of control plane and data plane (even if the entire controller cluster is down, the data plane keeps forwarding)
- a controller cluster of three nodes allows for the loss of a controller with no disruption to the NSX control plane
- vSphere HA provides additional resiliency by recovering the respective NSX Controller on another node if the host it's running on fails
For the reasons mentioned above, it is rare and unlikely that communication with the entire NSX Controller Cluster would be lost. In NSX-V 6.3, this control plane resiliency is enhanced even further via CDO mode.
CDO mode targets specific scenarios where control plane connectivity is lost; for example, a host losing connectivity to the controller cluster, or the NSX Controllers themselves being down. CDO mode enhances control plane resiliency for both single-site and multi-site environments. However, multi-site environments and typical multi-site solutions such as disaster recovery (DR) provide a good use case for CDO mode; this is explained in more detail further below. Below I dig into the details of how CDO mode works and how it provides additional resiliency for specific scenarios.
CDO mode is enabled from the NSX Manager at the transport zone level. It can be enabled on a local transport zone and/or on a universal transport zone. When enabled on a universal transport zone, it must be enabled from the Primary NSX Manager. The screenshot below shows CDO mode being enabled on a universal transport zone via the Primary NSX Manager.
In the initial release of the CDO mode feature, it can be enabled on multiple transport zones only if each of the transport zones is on a different VDS. If a VDS is shared by a universal transport zone and a local transport zone, CDO mode can still be enabled on the universal transport zone but not on the local transport zone; this allows for use of CDO mode on the universal transport zone, where it is likely preferred for Cross-VC NSX and multi-site use cases.
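To make this constraint concrete, below is a minimal Python sketch, purely conceptual rather than an NSX API call, that models the rule of one CDO-enabled transport zone per VDS; the transport zone and VDS names are hypothetical.

```python
# Conceptual sketch only: models the "one CDO-enabled transport zone per VDS"
# restriction described above. It does not call any NSX API.

# Hypothetical transport zone -> VDS mapping for illustration
transport_zones = {
    "universal-tz": {"vds": "vds-compute", "cdo_enabled": False},
    "local-tz":     {"vds": "vds-compute", "cdo_enabled": False},
    "edge-tz":      {"vds": "vds-edge",    "cdo_enabled": False},
}

def can_enable_cdo(tz_name: str) -> bool:
    """True if no other transport zone on the same VDS already has CDO enabled."""
    vds = transport_zones[tz_name]["vds"]
    return not any(
        tz["cdo_enabled"] and tz["vds"] == vds
        for name, tz in transport_zones.items()
        if name != tz_name
    )

# Enabling CDO on the universal transport zone succeeds...
if can_enable_cdo("universal-tz"):
    transport_zones["universal-tz"]["cdo_enabled"] = True

# ...but the local transport zone sharing the same VDS is then rejected.
print(can_enable_cdo("local-tz"))  # False -- vds-compute already has a CDO-enabled TZ
print(can_enable_cdo("edge-tz"))   # True  -- different VDS
```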
When CDO Mode is enabled, the next available VNI is designated for the CDO Logical Switch, which all hosts of the transport zone join. In the below example, I have not yet created any universal logical networks, so it selects the first available VNI from the Universal Segment ID Pool I configured when I set up my Cross-VC NSX environment. In this case, the CDO Logical Switch VNI is 900000 since my configured Universal Segment ID Pool is 900000 – 909999.
Looking at the logical switches in the GUI, it can be seen in the below screenshot that I have not yet created any local or universal logical switches and the CDO Logical Switch is not listed. Since the CDO Logical Switch is used only for control plane purposes, it is not visible under the Logical Switches tab.
If I create a new logical switch in the universal transport zone, it can be seen below that it skips VNI 900000 and selects VNI 900001, since VNI 900000 is already in use by the CDO Logical Switch.
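As a rough illustration of this allocation behavior, the Python sketch below models a segment ID pool in which the CDO Logical Switch takes the first free VNI and the next logical switch skips it; the pool range mirrors the 900000 – 909999 example above, but the allocator itself is purely illustrative, not NSX code.

```python
# Conceptual model of VNI allocation from a segment ID pool; the pool range
# mirrors the Universal Segment ID Pool used in this example environment.

class SegmentIdPool:
    def __init__(self, start: int, end: int):
        self.start, self.end = start, end
        self.allocated = set()

    def next_available(self) -> int:
        """Return the lowest VNI in the pool that has not been handed out yet."""
        for vni in range(self.start, self.end + 1):
            if vni not in self.allocated:
                self.allocated.add(vni)
                return vni
        raise RuntimeError("Segment ID pool exhausted")

pool = SegmentIdPool(900000, 909999)

# Enabling CDO mode takes the next available VNI for the CDO Logical Switch...
cdo_vni = pool.next_available()        # 900000

# ...so the first user-created logical switch skips it and gets the next VNI.
first_ls_vni = pool.next_available()   # 900001

print(cdo_vni, first_ls_vni)           # 900000 900001
```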
When CDO mode is enabled on a transport zone, all hosts in the transport zone join the CDO Logical Switch (next available VNI), and one controller in the cluster is designated with the responsibility of updating all hosts in the transport zone with the VTEP information of every other host in the transport zone. Since all hosts are members of the CDO Logical Switch, this effectively creates a Global VTEP List that is initially populated while control plane connectivity is up. If control plane connectivity is later lost, this Global VTEP List will be utilized.
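The Python sketch below is a loose model of that behavior: the designated controller assembles the VTEPs of all transport-zone members and pushes the list to every host while connectivity is up. The VTEP IPs are the ones used later in this example; everything else is illustrative only.

```python
# Loose conceptual model of the Global VTEP List built via the CDO Logical Switch.
# One controller pushes the VTEP of every transport-zone host to every other host,
# so each host knows all VTEPs even before sharing any workload VNI with them.

transport_zone_hosts = {
    "host-1": "192.168.125.51",   # VTEP IPs from this example environment
    "host-2": "192.168.135.51",
}

def build_global_vtep_list(hosts: dict) -> list:
    """Designated controller assembles the VTEPs of all transport-zone members."""
    return sorted(hosts.values())

def push_to_hosts(hosts: dict, global_vtep_list: list) -> dict:
    """Each host caches the full list while control plane connectivity is still up."""
    return {name: list(global_vtep_list) for name in hosts}

global_vtep_list = build_global_vtep_list(transport_zone_hosts)
host_caches = push_to_hosts(transport_zone_hosts, global_vtep_list)

print(host_caches["host-2"])   # ['192.168.125.51', '192.168.135.51']
```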
Even prior to the introduction of CDO Mode in NSX-V 6.3, due to the complete separation of the control and data planes, if control plane connectivity was lost as shown in the below figure, the data plane would continue to forward as expected. However, if control plane connectivity or the controllers were down and workloads on a logical switch moved to another host that was not already a member of that logical switch (the VTEP of the host not being a member of the VNI), there would be data plane disruption. CDO mode targets this specific scenario to provide overall better control plane resiliency.
I step through the behavior before and after CDO mode is enabled below. In this example, I use universal networks but use only one site for ease of demonstration/explanation.
In the below scenario, the NSX Controller Cluster is up and two VMs/workloads on Host 1 are on the same universal logical switch with VNI 900002. Here, prior to NSX-V 6.3 or with CDO disabled in NSX-V 6.3, everything works normally and communication flows as expected. In Figure 6 further below, looking at the VTEP table for VNI 900002 via Central CLI on the NSX Manager, it can be seen that the controllers have been informed that Host 1 (VTEP IP 192.168.125.51) is a member of the logical switch with VNI 900002. If other hosts had VMs/workloads on this same logical switch, the controllers would also have their VTEP entries and would ensure the VTEP table for VNI 900002 is distributed to all respective hosts that are members of the logical switch.
In this case, even if control plane connectivity or the NSX Controller Cluster were to go down as shown in Figure 7 below, communication between the VMs would continue to work and communication with any other VMs on the same universal logical switch on other hosts would also continue to work because the NSX controllers would have already distributed the correct VTEP table information to all respective hosts.
Even in the scenario below, where the two VMs communicating are on different hosts, communication would still continue to work if control plane connectivity or the NSX Controller Cluster were to go down. Again, this is because the two hosts already had VMs on the same logical switch and, as such, both hosts were already members of the logical switch/VNI.
Prior to shutting the NSX Controllers down for this example, I ran the below command from NSX Manager Central CLI to confirm both Host 1 (VTEP IP 192.168.125.51) and Host 2 (VTEP IP 192.168.135.51) were known to be members of VNI 900002.
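For reference, the same VTEP table can be pulled programmatically. The sketch below wraps what I understand to be the NSX Manager Central CLI API (POST /api/1.0/nsx/cli?action=execute); the endpoint, manager address, and credentials are assumptions to verify against the NSX-V API guide for your release.

```python
# Sketch: querying the controller's VTEP table for VNI 900002 through the NSX
# Manager Central CLI API. The endpoint, manager address, and credentials are
# assumptions -- verify against the NSX-V API documentation for your release.
import requests

NSX_MANAGER = "https://nsxmgr-01a.corp.local"   # hypothetical NSX Manager address
AUTH = ("admin", "changeme")                    # placeholder credentials

def central_cli(command: str) -> str:
    """Execute an NSX Central CLI command via the NSX Manager API and return raw text."""
    body = f"<nsxcli><command>{command}</command></nsxcli>"
    resp = requests.post(
        f"{NSX_MANAGER}/api/1.0/nsx/cli?action=execute",
        data=body,
        headers={"Content-Type": "application/xml", "Accept": "text/plain"},
        auth=AUTH,
        verify=False,   # lab environment with a self-signed certificate
    )
    resp.raise_for_status()
    return resp.text

# Controller-master view of the VTEP table for the universal logical switch
print(central_cli("show logical-switch controller master vni 900002 vtep"))
```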
In this scenario, even if the NSX Controllers are shut down, the VTEP information for the universal logical switch VNI 900002 has already been distributed to the respective ESXi hosts as shown in Figure 10 and Figure 11 below. This is why there is no disruption to data plane communication even if control plane connectivity is lost.
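The per-host copies of the VTEP table can be checked the same way; the snippet below reuses the central_cli() helper from the previous sketch, with the host IDs as placeholders for the actual vCenter host IDs in the environment.

```python
# Host-side view of the distributed VTEP table for VNI 900002, reusing the
# central_cli() helper from the previous sketch. "host-1" and "host-2" are
# placeholders for the real vCenter host IDs in this environment.
for host_id in ("host-1", "host-2"):
    print(central_cli(f"show logical-switch host {host_id} vni 900002 vtep"))
```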
In a scenario similar to the above, described in Figure 12 below, where two VMs on the same universal logical switch on Host 1 are communicating, if control plane connectivity or the NSX Controller Cluster were to go down and one of the VMs was then moved (either manually or automatically) to Host 2, communication would still continue.
The reason for this is that another VM already exists on the same universal logical switch on Host 2, and, as such, Host 2 is already a member of the universal logical switch VNI 900002. Prior to control plane loss, the host/VTEP membership information for the logical switch was already distributed to all other hosts that have membership. When the VM moves to Host 2 during the control plane connectivity loss period, a RARP is sent over the data plane to update any stale MAC table entries, and communication continues to work. Similarly, if a new VM is powered on on Host 2 during the control plane loss period, it would also be able to communicate with the VM on Host 1.
The specific scenarios that CDO Mode targets are the following:
- When a VM moves, either by manual vMotion/intervention or in an automated way (such as DRS), to another host that was not a member of the respective logical switch before control plane connectivity loss.
- When a new VM connected to a logical switch is powered on on a host that was not a member of the respective logical switch before control plane connectivity loss.
In both of the above scenarios, a new host has become a member of a specific logical switch/VNI; however, since control plane connectivity is lost or the controllers are unavailable, the NSX Controllers cannot be notified of the new member for the logical switch, and, without CDO mode, the new logical switch membership information cannot be distributed to the other hosts. Figure 13 below helps visualize the issue.
With the CDO mode feature introduced in NSX-V 6.3, both of the scenarios mentioned above are handled by using the Global VTEP List. As mentioned earlier, all hosts in the CDO-enabled transport zone automatically become members of the CDO Logical Switch (next available VNI). When a host determines control plane connectivity is lost for the logical switch in question, the Global VTEP List is leveraged and all BUM traffic is sent to all members of the transport zone. Figure 14 below helps visualize the result.
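A rough Python sketch of the host-side decision is below: under normal conditions BUM traffic is replicated only to the VTEPs the controllers reported for that VNI, but once control plane connectivity is lost the host falls back to the Global VTEP List learned over the CDO Logical Switch. This is a conceptual model only, assuming the two-host example environment above; it is not NSX code.

```python
# Conceptual model of how a host picks replication targets for BUM traffic when
# CDO mode is enabled. Not NSX code -- just the decision logic described above.

LOCAL_VTEP = "192.168.135.51"   # this host's VTEP (Host 2 in the example)

# Per-VNI VTEP table as last learned from the controllers; a VNI that only became
# active on this host after connectivity loss would be missing from it.
vni_vtep_table = {900002: ["192.168.125.51", "192.168.135.51"]}

# Global VTEP List learned over the CDO Logical Switch while the control plane was up.
global_vtep_list = ["192.168.125.51", "192.168.135.51"]

def bum_replication_targets(vni: int, control_plane_up: bool) -> list:
    """Return the VTEPs to which BUM traffic for this VNI should be replicated."""
    if control_plane_up and vni in vni_vtep_table:
        targets = vni_vtep_table[vni]   # normal case: controller-learned members only
    else:
        targets = global_vtep_list      # CDO fallback: all transport-zone members
    return [vtep for vtep in targets if vtep != LOCAL_VTEP]

print(bum_replication_targets(900002, control_plane_up=True))    # controller-learned list
print(bum_replication_targets(900003, control_plane_up=False))   # falls back to the global list
```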
In conclusion, CDO mode brings additional resiliency to the NSX control plane for specific scenarios and adds to the overall robustness of both single-site and multi-site solutions. For more information on NSX-V 6.3, check out the NSX-V 6.3 documentation.
Follow me on Twitter: @Humair_Ahmed