November 2013 UR KB2887595 causing network problems

We recently published a number of blogs (blog1, blog2) about network connectivity problems with our HP ProLiant Gen8 blade servers with Emulex adapters. The comments on those blogs showed that we were not alone: several others were suffering from connectivity problems or even BSODs with adapters other than Emulex, and both Broadcom and Chelsio were reported as having issues. We can now add Intel network adapters to the list (see the afterthought at the end of this blog).

I just saw an interesting pointer by Didier van Hoye to a blog by Michael Rueefli, aka Dr. MIRU, from Switzerland, who suffered from BSODs after installing KB2887595 (November 2013 Update Rollup) on his Windows Server 2012 R2 Hyper-V clusters using Intel 82599 dual-port 10Gbps network adapters. Michael traced the BSOD back to MsLbfoProvider, the core driver for the inbox NIC teaming in Windows Server 2012 R2.

Previously I referred to KB2887595 as an update containing a stealth update (one that replaces Windows binaries without documentation). Evidently, this update has changed at least part of the LBFO network adapter teaming subsystem. Instead of fixing some of the network connectivity problems we had already encountered, it seems that in some cases it causes BSODs on Windows Server 2012 R2 hosts with teamed Intel network adapters.

As in our case with Emulex network adapters, the temporary workaround was to remove KB2887595 and reinstall the v2 version of KB2887595, then disable VMQ hardware offload. See Michael’s blog for instructions.
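For those who want to script the removal and VMQ steps, here is a minimal sketch in Python that only builds the command lines and prints them by default. It assumes the standard `wusa.exe` uninstall switches and the `Get-HotFix` and `Disable-NetAdapterVmq` PowerShell cmdlets; the adapter name `Team1` and the helper function names are hypothetical. Reinstalling the v2 package is left out because its file name is not given here. Michael’s instructions remain the reference.

```python
import subprocess

KB_ID = "2887595"  # the update rollup discussed in this post


def hotfix_check_command(kb_id=KB_ID):
    """Build a PowerShell call that lists the update if it is installed."""
    return ["powershell.exe", "-NoProfile", "-Command",
            "Get-HotFix -Id KB" + kb_id]


def uninstall_command(kb_id=KB_ID):
    """Build the wusa.exe call that removes the update without rebooting."""
    return ["wusa.exe", "/uninstall", "/kb:" + kb_id, "/norestart"]


def disable_vmq_command(adapter_name):
    """Build a PowerShell call that turns VMQ off on one adapter.

    adapter_name is whatever Get-NetAdapter reports on your host;
    single quotes are doubled so the name stays one PowerShell string.
    """
    safe_name = adapter_name.replace("'", "''")
    return ["powershell.exe", "-NoProfile", "-Command",
            "Disable-NetAdapterVmq -Name '" + safe_name + "'"]


def run(command, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    if dry_run:
        print(" ".join(command))
        return 0
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # "Team1" is a placeholder adapter name, not taken from this post.
    for cmd in (hotfix_check_command(),
                uninstall_command(),
                disable_vmq_command("Team1")):
        run(cmd, dry_run=True)
```

Keeping `dry_run=True` lets you review the exact commands before running anything on a cluster node. A reboot is still needed after the uninstall.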

The Intel 82599 controller is also used by HP and is known as HP 560FLB Flexible LOM for HP Gen8 blade servers.


Afterthought: if so many different network adapters suffer from serious networking problems, it could be a coincidence that all network vendors overlooked something in their testing process. On the other hand, we could well be facing a bug in the NIC teaming subsystem of Windows Server 2012 R2. We will probably not find out before the end of the year.

Despite the trouble you may face, we at Hyper-V.nu wish you a quiet Christmas and a stable 2014!


11 Comments

  1. December 23, 2013    

    Just building a new two-node 2012 R2 cluster on HP 360 G7, using a mix of Broadcom-based HP NC382i DP Quad NICs and Intel-based HP NC364T Quad NICs. 5 NICs will be used in a converged fabric (switch-independent, dynamic), all Broadcom, with one Intel on standby (connected to a FE switch). Sure hope I don’t face this issue.

  2. January 15, 2014    

    Possibly related, although I will look into the KB to see if it’s applied. Thanks.

    My writeup below:

    Have seen the strangest issue on 2 out of 3 new HP Gen 8s. The HP kit uses “HP FlexFabric NIC 554FLB”.

    Background:
    The client uses Citrix XenApp Provisioning servers (PVS) to deploy fresh images to XenApp servers on a regular basis. The PVS server and the XenApp servers are hosted on a Microsoft Hyper-V 2012 Cluster. Each XenApp server is connected to both a production network (with a synthetic NIC and static MAC address) and a PVS network (with an emulated NIC and static MAC address).
    Each Hyper-V host has a NIC team connected to a Logical Switch managed by SCVMM 2012 SP1. The Logical Switch supports multiple VM Network connections including a production network and a PVS network. The cluster hosts are a mixture of HP G7s and HP G8s.

    Issue:
    The new G8s (with the exception of one) are experiencing an issue where XenApp servers running on them are unable to PXE boot using their legacy adapters.
    The PXE boot process seems to be unable to receive the broadcast?
    When both the PVS server and the XenApp server are on the same host then the PXE response is received but the TFTP transaction doesn’t take place.

    Solution:
    Disable VMQ on hosts (on the Multiplexor driver, not the NICs).

    Alternative Workaround:
    Remove affected hosts as possible owners for XenApp servers that require PXE boot.

  3. January 16, 2014    

    Same issue here, BSOD when performing multiple live migrations. We’re able to recreate it in our Lab environment, currently working with MS support to get this fixed.

  4. January 17, 2014    

    Hans – Any update on this issue? We got hit with the exact same scenario. HyperV 2012 cluster (was completely stable) and then after Windows Updates applied on Dec 22nd we started to see BSOD’s and literally what looks like a cascade failure of the entire cluster. We’ve seen it in our Production 2012 (non-R2) based cluster, AND also in our 2012 R2 Test Cluster.

    Had a look at Dr. Miru’s article and he ends it on a cliffhanger too. Has anyone nailed down whether uninstalling KB2887595 is a complete cure (or at least stabilizes things)?

    We opened a case with Microsoft about it noting this article, and they also noted they’ve started to hear about it from other sources. Of course, they haven’t told us anything about a fix.

    Anyways, if you have any more data or can stir the pot, I think a bunch of us out here are watching for more info.

    Thanks!!!
    Steve

    • adminHans
      January 22, 2014    

      Hi Steve,

      Extremely slow progress on this. HP says it cannot reproduce. The MS engineer involved has requested Emulex adapters to create a repro himself. We described the exact steps to repro, so I’m guessing we’re dealing with HP/Emulex incompetence or delaying tactics. Very annoyed about this.

      Best regards, hans

  5. January 28, 2014    

    Maybe KB 2913659 helps?

    Best regards,
    Ulrich

  6. February 4, 2014    

    I have a VM which has a connection to an iSCSI volume. If VMQ is enabled, the network resets randomly after high-throughput copy jobs.

    I have the HP 554FLB adapters.

    BR/TL

  7. February 4, 2014    

    and I have the patch KB2913659 installed!

  8. JEE
    April 10, 2014    

    I’m having problems on HP 465C Gen 8 servers with Emulex. I’m not running VMs on these blades, but our software, which accesses a back-end database, seems to have a compatibility issue and runs extremely slowly when querying the database. I don’t have this issue on older Gen 7 blades that don’t have the Emulex interfaces.

