Virtual guest cluster and NIC teaming in the host result in an evicted cluster node #Broadcom #Emulex
Recently I was involved in the implementation of a private cloud based on Hyper-V Server 2012 and System Center 2012 SP1. We've built a two-node Hyper-V cluster (HP DL980 servers) dedicated to Fabric Management. Both nodes in the cluster have a total of four 10Gbit interfaces (Emulex). Two of them are combined in a NIC team used for host networks, and the other two are combined in a NIC team for virtual machine networks. The TeamingMode is configured as SwitchIndependent and the LoadBalancingAlgorithm is configured with the HyperVPort setting.
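A team with these settings can be created with the standard Windows Server 2012 NetLbfo cmdlets. The sketch below shows the idea; the team name, adapter names, and switch name are placeholders, not the names used in this environment:

```powershell
# Create a switch-independent team with Hyper-V port load balancing
# (team and adapter names are placeholders)
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "Emulex10Gb-1","Emulex10Gb-2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Bind the Hyper-V virtual switch for VM traffic to the team interface
New-VMSwitch -Name "VMSwitch" -NetAdapterName "VMTeam" -AllowManagementOS $false
```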
To clarify, the network looks like this:
[Network diagram]
On this Hyper-V cluster we installed a guest cluster that consists of two Windows Server 2012 virtual machines, each with two vNICs (one LAN adapter and one cluster adapter). During the installation and configuration of this guest cluster both virtual machines resided on one host. As soon as the installation and configuration of the guest cluster was finished, we moved one of the virtual machines (that is, one of the cluster members) to the other host. A couple of seconds after the move, that cluster member was evicted from the cluster.
Although the cluster member was evicted with a message that network communication was not possible, both virtual machines could successfully ping each other on all networks. DNS was also functioning correctly, and all host networks were available from each host. Moving the virtual machine back to the host on which the other cluster member resides fixed the problem, and the cluster returned online with all nodes online.
The cluster validation reports on the Hyper-V cluster and on the guest cluster did not point out any problems with the cluster configuration.
Changing the teaming mode and/or load balancing algorithm of the NIC team behind the Hyper-V virtual switch did not change (or solve) anything. However, when we deleted the NIC teams and connected the virtual switch to a single (stand-alone) adapter, the problem was gone. With this single-NIC configuration the guest cluster members could reside on different nodes without being evicted from the cluster. As soon as the NIC team was restored and the virtual switch was connected to the NIC team again, the cluster member on the other host was evicted again.
Microsoft Premier Support involved
After days and nights of troubleshooting, Microsoft Premier Support was contacted and a case was logged for this issue. Premier Support engineers went onsite and performed some serious debug sessions on the host and guest clusters. They also tried to reproduce the customer's situation but did not encounter any problems at all with guest clusters. However, they were using different NICs than the customer. So Premier Support asked us to create a new NIC team on two different physical adapters (no Emulex NICs). The server was equipped with two unused Broadcom 1Gbps NICs, and for this test we decided to build a team with these two Broadcom NICs.
After creating the team and configuring the virtual switch to use the team consisting of the Broadcom adapters, we moved the cluster member to another Hyper-V host and guess what… the cluster member was evicted from the cluster again. So it makes no difference whether the NIC team uses Emulex or Broadcom interfaces; in both situations the guest cluster will fail.
Premier Support told us that they were using INTEL NICs in their test scenario, and that this was the only difference from our setup. We decided to add two INTEL NICs to both Hyper-V hosts, combine them in a NIC team, and point the virtual switch to the 'INTEL based' NIC team. Fingers crossed… We moved one of the cluster members to another Hyper-V host and the guest cluster did not evict any node! From this point on we could conclude the following:
- When we build a guest cluster (two virtual machines) in a Hyper-V environment and the virtual machines are connected to a virtual switch that is connected to a NIC team with Broadcom or Emulex physical adapters, we cannot separate the cluster members. When the cluster members are separated onto two hosts, one of them will be evicted from the cluster.
- The problem does not exist when the virtual switch is connected to a single (stand-alone) NIC.
- The problem does not exist when the NIC team consists of INTEL adapters.
Right now this case is still under investigation, but there is a workaround for NIC teams with Broadcom or Emulex adapters. The workaround is to disable checksum offloading on the Hyper-V hosts for the physical NICs that are members of the NIC team:
# Check the current checksum offload settings for the adapter
Get-NetAdapterChecksumOffload -Name "NameOfAdapter"
# Disable checksum offloading on the adapter
Disable-NetAdapterChecksumOffload -Name "NameOfAdapter"
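Rather than running the cmdlet once per adapter, the workaround could be applied to every physical member of the team in one pass. The following is a sketch along those lines; the team name "VMTeam" is a placeholder:

```powershell
# Disable checksum offloading on each physical member of the team
# ("VMTeam" is a placeholder team name)
Get-NetLbfoTeamMember -Team "VMTeam" | ForEach-Object {
    Disable-NetAdapterChecksumOffload -Name $_.Name
}

# Verify the resulting settings on each member
Get-NetLbfoTeamMember -Team "VMTeam" | ForEach-Object {
    Get-NetAdapterChecksumOffload -Name $_.Name
}
```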
An updated driver or firmware for Broadcom and Emulex is expected to solve this issue.
This entry was posted by Peter Noorderijk on June 18, 2013 at 08:01.