Enhanced Disaster Recovery with Cross-VC NSX and SRM

I also published this blog post about enhanced Disaster Recovery with Cross-VC NSX and SRM on the VMware NSX Network Virtualization Blog on April 21, 2016. The full blog post is provided below and can also be seen on the VMware NSX Network Virtualization Blog site.

VMware NSX Network Virtualization Blog
Title: Enhanced Disaster Recovery with Cross-VC NSX and SRM
Author: Humair Ahmed
Date Published: April 21, 2016

VMware NSX with SRM for Enhanced Disaster Recovery

Check out the new Disaster Recovery with NSX and SRM white paper, which explains how Cross-VC NSX combined with VMware’s SRM offers an Enhanced Disaster Recovery (DR) solution that solves many of the challenges of traditional DR solutions. This solution provides consistent logical networking and security across protected and recovery sites, as well as faster recovery in disaster scenarios. A summary and overview of the solution is provided below.

NSX can be used with a DR orchestration tool such as VMware’s Site Recovery Manager (SRM) for a robust DR solution. Further, integration between NSX and SRM provides an enhanced DR solution with additional automation. In this blog post, we’ll provide an overview of such an enhanced DR solution leveraging Cross-VC NSX and SRM. For a quick refresher on the Cross-VC NSX feature introduced in NSX 6.2, see the prior blog post, Cross-VC NSX for Multi-site Solutions. For additional information on the solution described in this post, check out the new Disaster Recovery with NSX and SRM white paper.

As discussed in the prior blog post, Cross-VC NSX for Multi-site Solutions, the Cross-VC NSX feature allows for the creation of universal objects that can span multiple vCenter domains, which can also be at different sites. Universal logical networks leveraging universal networking and security constructs such as the Universal Logical Switch (ULS), Universal Distributed Logical Router (UDLR), and Universal Distributed Firewall (UDFW) can now be created across multiple vCenter domains/sites.

SRM has tight integration with vSphere and NSX, offers an integrated storage replication option, and can manage and test recovery plans. It’s capable of orchestrating recovery for multiple failure scenarios including Partial Application Failure, Full Application Failure, and Site Failure. You can get more information on VMware SRM here.

Figure 1: SRM and vSphere Integration

SRM 6.1 integration with Cross-VC NSX resolves some of the difficult challenges faced by traditional disaster recovery solutions, such as remapping networks/IP addresses and synchronizing security policies. Because Cross-VC NSX logical networking and security span multiple vCenter domains/sites, there is no need to remap IP addresses/networks or manually sync security policies.

Remapping application IP addresses can lead to additional work updating other services that use the respective application IP address, such as DNS, security policies, and possibly other services/applications. As a consequence, the entire DR process can take additional time before the application is fully recovered. However, with Cross-VC NSX and the automatic network mapping integration provided with SRM, applications can quickly recover at the recovery site and maintain their respective IP addresses, as shown in Figure 2 below. Since networking and security are synced and consistent across sites, no additional work or time is required to update other network services or applications.

Figure 2: App IP Address Maintained Upon Application Recovery

VMware SRM integrates with Cross-VC NSX so that universal networks are automatically mapped and consistent across the protected and recovery sites, because NSX logical networking spans both sites. This automatic mapping is shown below in Figure 3. For details, see the Enhanced Integration with VMware NSX section of the What’s New in VMware Site Recovery Manager 6.1 document.
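To illustrate why the mapping can be automatic, here is a toy Python model, not SRM’s actual implementation or API. It assumes each network has a backing identifier and that a universal logical switch presents the same identifier in both vCenter domains, while ordinary site-local networks do not. The `Network` class, `auto_map_networks` function, names, and IDs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Network:
    name: str        # display name in a vCenter inventory
    backing_id: str  # hypothetical ID of the underlying logical switch
    universal: bool  # True if backed by a universal logical switch (ULS)

def auto_map_networks(protected, recovery):
    """Pair protected-site networks with recovery-site networks that share the
    same universal backing. Anything else is returned as unmapped, i.e. it
    would still need a manual network mapping."""
    recovery_by_id = {n.backing_id: n for n in recovery if n.universal}
    mapping, unmapped = {}, []
    for net in protected:
        match = recovery_by_id.get(net.backing_id) if net.universal else None
        if match is not None:
            mapping[net.name] = match.name
        else:
            unmapped.append(net.name)
    return mapping, unmapped

# Hypothetical inventories: one ULS spanning both sites, one site-local port group each.
protected = [Network("Web-ULS", "uls-5000", True), Network("Mgmt-Site1", "pg-10", False)]
recovery = [Network("Web-ULS", "uls-5000", True), Network("Mgmt-Site2", "pg-20", False)]
mapping, unmapped = auto_map_networks(protected, recovery)
print(mapping)   # {'Web-ULS': 'Web-ULS'}
print(unmapped)  # ['Mgmt-Site1']
```

The universal network maps to itself, while the site-local network falls through to the manual case that traditional DR designs have to handle for every network.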

Figure 3: SRM – Automatic Network Mapping with Cross-VC NSX Universal Logical Networks

In addition to networking consistency across sites, Cross-VC NSX also tackles security challenges with regard to disaster recovery. When an application is recovered at the recovery site by a DR orchestration tool like SRM, security policies need to be synced to the recovery site to ensure the correct policies are applied to the application. With Cross-VC NSX, in addition to the logical networking enabled by the Universal Logical Switch and Universal Distributed Logical Router, the Universal Distributed Firewall, as shown in Figure 2 above, enables universal security policies and micro-segmentation across vCenter domains/sites.

Figure 4 below shows a deployment leveraging Cross-VC NSX and SRM for DR. This deployment model demonstrates a feature called Local Egress, which allows for site-specific local egress and control of N/S egress traffic by filtering routing information based on a unique, site-specific ID called the Locale ID. The prior blog post, Cross-VC NSX for Multi-site Solutions, demonstrated a Cross-VC NSX deployment that leverages a routing metric to control N/S egress.

In the network diagram in Figure 4, logical networking and security span both the protected and recovery sites, providing consistent networking and security policies across sites. No manual remapping of IP addresses or syncing of security policies is required.

Figure 4: Disaster Recovery Leveraging Cross-VC NSX and SRM

VMware SRM requires multiple vCenters and has a 1:1 relationship with vCenter, which provides separation of fault domains: the protected site is managed by one vCenter while the recovery site is managed by another. Since Cross-VC NSX allows for logical networking and security across multiple vCenter domains/sites, SRM and Cross-VC NSX are complementary, providing an enhanced DR solution. Note that both SRM 6.0 and 6.1 are supported with Cross-VC NSX; SRM 6.1 is recommended because it provides automatic mapping between source and destination networks.

In Figure 4, SRM has placeholder VMs at the recovery site and is replicating VM data between different clusters within different vCenter domains. Upon full or partial application failure, SRM orchestrates the recovery of the respective application workloads to the recovery site. The Cross-VC NSX networking/deployment configuration, together with the SRM recovery plan and workflow, dictates the traffic flow for N/S egress. Also note that prefix lists are used to ensure the Web network is only advertised from the protected site.
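The effect of those prefix lists can be sketched with a toy Python model. It assumes a simple permit-only, exact-match prefix list with an implicit deny (real prefix lists also support ge/le ranges), and the Web subnet and site names are hypothetical since the post does not specify addressing. Both sites’ edges learn the Web network via the UDLR, but only the protected site’s list permits it.

```python
import ipaddress

WEB = "172.16.10.0/24"  # hypothetical Web tier subnet

def advertised(networks, permit):
    """Return the networks an edge would advertise northbound: only those
    exactly matching a permitted prefix; everything else hits the implicit deny."""
    allowed = {ipaddress.ip_network(p) for p in permit}
    return [n for n in networks if ipaddress.ip_network(n) in allowed]

# Hypothetical per-site prefix lists: Web is permitted only at the protected site.
prefix_lists = {"site1": [WEB], "site2": []}
for site, permit in prefix_lists.items():
    print(site, advertised([WEB], permit))
# site1 ['172.16.10.0/24']
# site2 []
```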

Partial application failure occurs when only part of an application fails, such as the web server within a 3-tier (Web, App, DB) application. Full application failure means the entire application (Web, App, and DB) has failed. As shown in the diagram in Figure 5, upon partial application failure, where the Web tier of the application recovers at the recovery site via SRM, N/S egress at the protected site continues to be used based on our recovery design and workflow. This is accomplished via the Local Egress feature, where the Locale ID at the recovery site is set to match the unique Locale ID of the protected site.

Local Egress uses the Locale ID to filter routes so that traffic leaves through a specific N/S egress point. By default, the Locale ID within each vCenter/NSX Manager domain is set to the UUID of the respective NSX Manager; however, this can be modified if desired. By default, all hosts in the respective vCenter/NSX Manager domain inherit this Locale ID value.
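The route filtering can be pictured with a toy Python model; it is a conceptual sketch, not how NSX is configured or programmed. It assumes one default route is learned at each site and tagged with that site’s Locale ID, and that a host installs only the routes whose Locale ID matches its own. The locale ID strings, edge names, and classes are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical locale IDs; by default each would be the UUID of the site's NSX Manager.
SITE1_LOCALE = "nsxmgr-site1-uuid"
SITE2_LOCALE = "nsxmgr-site2-uuid"

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str   # N/S edge the route points to
    locale_id: str  # locale ID of the site where the route was learned

@dataclass
class Host:
    name: str
    locale_id: str  # inherited from the vCenter/NSX Manager domain by default

# The same default route is learned at both sites, each tagged with its site's locale ID.
ROUTES = [
    Route("0.0.0.0/0", "site1-edge", SITE1_LOCALE),
    Route("0.0.0.0/0", "site2-edge", SITE2_LOCALE),
]

def egress_edges(locale_id, routes=ROUTES):
    """Return the N/S edges a host would use: it installs only the routes
    whose locale ID matches its own."""
    return sorted({r.next_hop for r in routes if r.locale_id == locale_id})

recovery_host = Host("site2-esxi-01", SITE2_LOCALE)
print(egress_edges(recovery_host.locale_id))  # ['site2-edge']

# Partial application failover (Figure 5): the recovery site is given the
# protected site's locale ID, so recovered Web VMs keep egressing via Site 1.
recovery_host.locale_id = SITE1_LOCALE
print(egress_edges(recovery_host.locale_id))  # ['site1-edge']
```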

Figure 5: Upon Partial Application Failover, Site 1 N/S Egress Still Used

In the case of an edge failure at the protected site, or upon full application failure (where the App and DB tiers of the application also recover at the recovery site) when requirements dictate a change in N/S egress, egress is switched to the recovery site. This is accomplished via the Local Egress feature, where the Locale ID at the recovery site is changed from the protected site’s Locale ID value to the recovery site’s own unique Locale ID value. At the same time, the runbook automation workflow also changes the prefix lists so that the Web tier network is advertised N/S from the recovery site instead of the protected site.
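The two runbook changes can be summarized as a toy Python state transition. This is a hypothetical model of the workflow described above, not an SRM or NSX API: it assumes the recovery site’s egress is determined solely by the Locale ID applied there, and that prefix lists are a simple per-site list of permitted prefixes. The locale IDs, subnet, and function names are illustrative only.

```python
from dataclasses import dataclass

SITE1_LOCALE = "nsxmgr-site1-uuid"  # hypothetical protected-site locale ID
SITE2_LOCALE = "nsxmgr-site2-uuid"  # hypothetical recovery-site locale ID
WEB = "172.16.10.0/24"              # hypothetical Web tier subnet

@dataclass
class DrState:
    recovery_locale_id: str  # locale ID applied at the recovery site
    prefix_lists: dict       # site -> prefixes permitted for N/S advertisement

def egress_site(state):
    """Recovery-site workloads egress via the site whose locale ID they carry."""
    return "site1" if state.recovery_locale_id == SITE1_LOCALE else "site2"

def web_advertised_from(state):
    return sorted(site for site, permit in state.prefix_lists.items() if WEB in permit)

def full_failover_runbook(state):
    """Hypothetical runbook for full application failover or protected-site edge failure."""
    state.recovery_locale_id = SITE2_LOCALE           # restore the recovery site's own locale ID
    state.prefix_lists = {"site1": [], "site2": [WEB]}  # advertise Web from the recovery site
    return state

# Starting point: the partial-failover state shown in Figure 5.
state = DrState(SITE1_LOCALE, {"site1": [WEB], "site2": []})
print(egress_site(state), web_advertised_from(state))  # site1 ['site1']
full_failover_runbook(state)
print(egress_site(state), web_advertised_from(state))  # site2 ['site2']
```

Changing both values in the same workflow step keeps inbound advertisement and outbound egress at the same site, which avoids asymmetric N/S traffic.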

Figure 6: Upon Full Application Failover or Edge Failure, Site 2 N/S Egress Used

For additional information, check out the new Disaster Recovery with NSX and SRM white paper.

Follow me on Twitter: @Humair_Ahmed