Hyper-v.nu
Powered by System Center
Jan 27th
My fellow blogger MVP Hans Vredevoort installed the most recent updates on an operational Windows Azure for Windows Server environment. After applying the updates the connectivity from the Service Management Portal to System Center VMM failed.
In the Service Management Portal the Service Provider Foundation was still registered, but the number of VM clouds had dropped to zero. Before the updates, the number of VM clouds in the Service Management Portal matched the number of clouds in SCVMM. Uninstalling the updates from the servers did not resolve the issue.
Applying the most recent updates in my lab caused the same issue. I created a temporary environment for troubleshooting. This test setup consists of five virtual machines.
The end to end solution for Service Providers can have different designs. This setup is the bare minimum required for troubleshooting the issue. I’m writing a complete guide for all the individual parts, the possible designs and related configuration, which will be posted on www.hyper-v.nu soon.
Installing and configuring the test setup without applying the updates resulted in a correctly functioning connection between the Service Management Portal and System Center VMM through System Center SPF.
The first thing to do is create some snapshots of the environment. The Windows Update feature detected about thirty available updates for each server. I decided to install the updates in batches of ten to speed up the process of finding the troublesome update. I started with the SPF server, since I had a feeling that this server was causing the error.
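The batching approach can be sketched in a few lines of Python. This is only an illustration of the procedure: the update names are hypothetical placeholders, and the check after each batch (reboot, IISReset.exe, look at the VM clouds) remains a manual step.

```python
def split_into_batches(updates, batch_size=10):
    """Return consecutive batches of at most batch_size updates."""
    return [updates[i:i + batch_size] for i in range(0, len(updates), batch_size)]


# Hypothetical update names; the post only mentions "about thirty" updates per server.
pending = [f"update-{n:02d}" for n in range(1, 31)]

for number, batch in enumerate(split_into_batches(pending), start=1):
    # After each batch: reboot the server, run IISReset.exe on the SMP server
    # and check whether the VM clouds still show up in the portal.
    print(f"batch {number}: {batch[0]} .. {batch[-1]} ({len(batch)} updates)")
```

If the issue appears after a particular batch, only the updates in that batch need to be examined further.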
After installing all the required updates on the SPF server, the number of VM clouds still matched the number of clouds in System Center VMM. I rebooted the SPF server and ran IISReset.exe on the SMP server just to be sure. It was still functioning correctly. So much for my hunch.
With the SPF server now fully patched I turned to the VMM server. I repeated the process of installing batches of ten updates, rebooting the VMM server and running IISReset.exe on the SMP server after every batch. When the VMM server was fully patched the Service Provider Foundation connectivity from the Service Management Portal showed no errors.
I really doubted that the SQL server or even the domain controller were involved in the issue, so the logical next server to patch was the server hosting the Service Management Portal itself. As you might guess by now, after applying all the critical updates on the SMP server the connectivity to the Service Provider Foundation was still functioning correctly.
Before moving my attention to the SQL server and the domain controller I enabled update support for Microsoft Update. A rescan of available updates on the VMM server displayed two optional updates.
Installing both updates did not cause the issue.
Jan 18th
Rather quietly, or better said unnoticed by many, the list below has been published in two places (here and here). The list includes KB articles describing a number of issues or support tips regarding System Center 2012 SP1.
To quote the first link: “Nothing really major, just a couple support tips and FYIs we saw during that beta that might save you some time if you happen to run across them. Enjoy!”
2709539 – Regional settings default to English when deploying a virtual machine using a template on System Center 2012 Virtual Machine Manager (http://support.microsoft.com/kb/2709539)
2800073 – System Center 2012 Virtual Machine Manager SP1 Maintenance Mode Causes Refresh Errors 13926, 2606 (http://support.microsoft.com/kb/2800073)
2798383 – System Center 2012 Virtual Machine Manager SP1 Does Not Recognize Newly Imported Highly Available Virtual Machine (http://support.microsoft.com/kb/2798383)
2798507 – Creating a VM from a template on an ESX host fails with error 2947 in System Center 2012 Virtual Machine Manager SP1 (http://support.microsoft.com/kb/2798507)
2798842 – System Center 2012 Virtual Machine Manager SP1 cannot increase the number of CPU Cores during a P2V conversion (http://support.microsoft.com/kb/2798842)
2798911 – System Center 2012 Virtual Machine Manager SP1 cannot create a VM in the cloud that has an underscore in the name (http://support.microsoft.com/kb/2798911)
2798926 – System Center 2012 Virtual Machine Manager SP1 fails to shut down a virtual machine (http://support.microsoft.com/kb/2798926)
2797597 – Suse Linux Enterprise Server 11 is missing from the Linux OS list in System Center 2012 Virtual Machine Manager (http://support.microsoft.com/kb/2797597)
2795033 – A Virtual Machine Manager Library share on Highly Available File Server displays an incorrect status (http://support.microsoft.com/kb/2795033)
2799257 – How to convert between VHD and VHDX formats in System Center 2012 Virtual Machine Manager (http://support.microsoft.com/kb/2799257)
2798401 – System Center 2012 Virtual Machine Manager SP1 Service Deployment fails with error 22011 (http://support.microsoft.com/kb/2798401)
2800610 – Static IP is missing from a virtual switch created with System Center 2012 Virtual Machine Manager SP1 (http://support.microsoft.com/kb/2800610)
Jan 15th
I encountered a pesky issue recently. Before I get into the details, first a quick overview of the setup: a Windows Server 2012 cluster consisting of two cluster nodes. The cluster nodes are brand new HP DL 360 G8p servers with 256 GB of memory and two six-core processors. Networking is based on 10Gb Emulex NICs for converged fabric, connected to an HP ProCurve 5406zl. The storage for the cluster has two members, an EqualLogic 4100E and an EqualLogic 4100XV. The iSCSI traffic is on a dedicated network with separate 1Gb NICs in the cluster nodes.
When I connected to a cluster node the response in the RDP session sometimes had a little delay. Typing in PowerShell for example felt like watching a movie with the audio out of sync from time to time. The first time I thought the lack of sleep was taking its toll. But after experiencing a couple of delays I concluded that I had some troubleshooting to do.
After bypassing the Remote Desktop Gateway that I connected through, I singled out one cluster node having the issue. I looked at the event log, but came up empty handed. My next thought made me look at the networking infrastructure. I checked that both servers had the correct and identical NIC firmware and drivers. I also verified that the switch had the latest firmware applied. I compared the complete converged fabric configuration on both servers. All parts checked out fine. I looked at the task manager and the processor utilization was close to idle.
The next thing to rule out was the NIC hardware. Since only one of the two servers was subject to the issue I decided to swap the 10Gb NICs between the servers. After this swap the issue seemed to have disappeared. I did not experience the issue on the other server.
I am unable to let go of an issue without a proper technical explanation, and since the NIC hardware swap seemed to make the issue disappear I ran a diagnostic test on both servers. All green checkmarks. Suddenly the delay appeared again on the same server where I experienced the issue before. We can now rule out the NIC hardware.
Jan 14th
This blog series consists of four parts
With the insights from the results of the tests, it is possible to look at multiple scenarios for the traffic classes live migration and virtual machine.
Live migration moves virtual machines from one host to another without noticeable downtime. This can be live migration within a cluster or moving virtual machines with “shared nothing” live migration. Live migration uses one TCP stream for control messages (low throughput) and one TCP stream for the transfer of virtual machine memory and state (high throughput). When live migration includes migrating the VHD, SMB will be used for that. SMB itself will use one or multiple TCP streams depending on your SMB multichannel settings.
Scenario 1 : Server with two quad port 1Gb NICs
If you have invested in new 1Gb hardware before Windows Server 2012 was available, upgrading your NICs to 10Gb hardware is not a requirement. The NIC Teaming functionality allows for teaming up to 32 physical NICs. It is possible to reuse the dedicated 1Gb NICs you used for your Windows Server 2008 R2 or your (obsolete!!) VMware environment and create a single team.
The disadvantage of VMQ with LBFO based on Address Hash is that all the settings for the individual physical NICs in the team must be identical, whereas NIC Teaming based on HyperVPorts allows for overlapping processor settings.
I have tested with additional live migration networks with the same metric in Switch Independent / HyperVPorts mode. Each live migration network will get its own port on the Hyper-V switch allowing for distribution of the individual live migration networks amongst the team members on a round-robin basis.
I created a single NIC team with eight 1Gb team members in Switch Independent / HyperVPorts mode. After configuring a Hyper-V switch on top of this NIC team, I created six live migration networks with the same metric.
I also adjusted the maximum number of simultaneous live migrations to ten on each cluster node. Running a live migration of ten virtual machines (ten high throughput TCP streams) resulted in only one team member being utilized.
Live migration will use only one available network for moving virtual machine memory and state, even if other live migration networks are configured with the same metric.
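The observed behaviour can be modelled with a small Python sketch. It assumes a plain round-robin mapping of Hyper-V switch ports to team members, as described above; the names (`NIC1`, `LM1` and so on) are hypothetical and the code is a simplified model, not the actual NIC Teaming implementation.

```python
from itertools import cycle


def assign_ports_to_members(switch_ports, team_members):
    """Map each Hyper-V switch port to one team member, round-robin."""
    members = cycle(team_members)
    return {port: next(members) for port in switch_ports}


team = [f"NIC{n}" for n in range(1, 9)]      # eight 1Gb team members
lm_ports = [f"LM{n}" for n in range(1, 7)]   # six live migration networks
mapping = assign_ports_to_members(lm_ports, team)

# Live migration picks a single network, so every memory-transfer stream
# leaves through the same switch port and therefore the same team member.
chosen_network = "LM1"
streams = [chosen_network] * 10              # ten simultaneous live migrations
used_members = {mapping[port] for port in streams}
print(f"{len(streams)} streams use {len(used_members)} team member(s): {used_members}")
```

In this model the six networks are spread over six team members, but because all ten streams share one network the aggregate bandwidth is still capped at a single 1Gb link.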
With two quad port NICs it is possible to create a different configuration for more live migration bandwidth without losing all VMQ overlapping. Create two NIC teams: one team with four 1Gb team members in Switch Independent / HyperVPorts and one team with four 1Gb team members in LACP / Address Hash (you might even use two team members from each quad port NIC in a single team for added redundancy).
The Switch Independent / HyperVPorts NIC team is configured with a Hyper-V switch for converged fabric. The LACP / Address Hash NIC team is dedicated to live migration. Since there is no Hyper-V switch on top of this NIC team, RSS is used for load balancing the individual streams.
Jan 11th
This blog series consists of four parts
Part 1 of this blog series explained the theory of NIC Teaming, Hyper-V switch and QoS. Theory is essential but we don’t run Hyper-V clusters in theory. We run them in production. Windows Server 2012 NIC Teaming and converged fabric allows for more bandwidth. Live migration and virtual machines are two traffic classes where more bandwidth can be useful. The following tests will look at the possible configurations to get the most bandwidth out of your Hyper-V environment on these traffic classes.
NIC Teaming to NIC Teaming
Now that we have the tools configured we can run our first test. It is a good idea to do this one step at a time so the differences in configuration will show exactly how this influences the results.
The first step is to create a NIC team on each server and connect them directly to each other. I have used the quad port 1Gb NIC on each server to create the NIC team.
Each NIC team is configured in LACP / Address Hash. Running IPerf with a single stream results in a bandwidth of 113 MBytes per second.
As stated before, the NIC team will force a single TCP stream over a single team member, so this is the expected result. Opening Performance Monitor during the test will verify this.
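To see why 113 MBytes per second matches a single 1Gb team member, the figure can be converted to Gbit/s. The sketch below assumes that IPerf reports MBytes as 1024 x 1024 bytes; that unit convention is an assumption on my part, not something measured in the test.

```python
MEBIBYTE = 1024 * 1024  # assumed size of one IPerf "MByte"


def mbytes_per_sec_to_gbits(mbytes_per_sec, bytes_per_mbyte=MEBIBYTE):
    """Convert a throughput in MBytes/s to Gbit/s (10^9 bits per second)."""
    return mbytes_per_sec * bytes_per_mbyte * 8 / 1e9


measured = mbytes_per_sec_to_gbits(113)
link_rate = 1.0  # one 1Gb team member
print(f"{measured:.3f} Gbit/s, {measured / link_rate:.0%} of one 1Gb link")
```

The result is roughly 0.95 Gbit/s, which is about what one 1Gb link can deliver once protocol overhead is taken into account.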
Adding more streams will balance the sessions over the team members. Increasing the number of TCP streams by one per test run, all four team members became active at ten parallel TCP streams.
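The effect of hashing on the distribution can be illustrated with a small Python model. The CRC32 hash, the IP addresses and the source port range below are stand-ins chosen for the example; this is not the hash that Windows NIC Teaming actually uses, so the exact stream counts will differ from the test results above.

```python
import zlib


def pick_team_member(src_ip, src_port, dst_ip, dst_port, member_count):
    """Pick a team member from the TCP 4-tuple (CRC32 is a stand-in hash)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % member_count


def members_in_use(stream_count, member_count=4):
    """Return the set of team members hit by stream_count parallel streams."""
    used = set()
    for n in range(stream_count):
        # Hypothetical addresses; each stream differs only in its source port.
        used.add(pick_team_member("10.0.0.1", 50000 + n, "10.0.0.2", 5001, member_count))
    return used


for streams in (1, 2, 4, 10):
    print(streams, "stream(s) ->", len(members_in_use(streams)), "team member(s) in use")
```

Two properties of the model match the test: a single stream always stays on one team member, and more streams are needed than there are team members before all of them are likely to carry traffic, because different streams can hash to the same member.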
Jan 10th
Today we see the birth of the official build of Windows Azure Services on Windows Server. It will allow you to use Windows Server 2012, System Center 2012 SP1 and Antares (Websites service) in your own environment, just like you might already have experienced with Windows Azure.
As we speak I am installing Windows Azure Services in a new Windows Server 2012 Hyper-V environment with System Center 2012 at one of the large hosters in the Netherlands offering IaaS services based on Hyper-V.
My fellow blogger Marc van Eijk already wrote a fantastic blog on his experiences while the product was still called KATAL, the beta name for Windows Azure Services.
Please take a look at his blog which was published:
http://www.hyper-v.nu/archives/marcve/2012/11/windows-azure-services-for-windows-server/
The first two services that are offered in this release are high scale, multi-tenant web site hosting and high density virtual machine provisioning and management. These services are lit up in the Service Management Portal and API. We will continue to bring more Windows Azure services on-premise in subsequent releases. You can find more information and download the bits at http://www.microsoft.com/hosting/en/us/services.aspx.
Jan 10th
I ran into a problem with a misbehaving Failover Cluster Manager (FCM) on a Windows Server 2012 cluster after allowing Cluster Aware Updating (CAU) to update the cluster. The following updates were gracefully installed, automatically placing the nodes in maintenance mode, installing the updates and moving on to the next node:
After the cluster had been updated it was no longer possible to view the roles (Hyper-V VM’s in this case) on the cluster. Instead the following screen popped up:
Because another unpatched cluster did not have this symptom, I decided to uninstall all updates that had been installed by CAU, and after a reboot all was well.
After I reported this problem, I got a quick note from the Microsoft cluster team that KB2750149 was the cause of this problem and that the current advice is to uninstall this update until further notice.
It is important to note that this is only a problem with the Failover Cluster Manager (CluAdmin.msc) snap-in. The cluster and all its roles continue to run fine, which can be verified by opening a PowerShell window on one of the affected cluster nodes and running Get-ClusterGroup. I was also able to run FCM successfully from another cluster or a management station with the Failover Clustering management tools.
Jan 9th
This blog series consists of four parts
The test lab consists of two servers (HP Proliant DL 360 G5, nothing fancy but it will give a good picture of the processor demand). Each server contains a dual port 10Gb NIC and a quad port 1Gb NIC. The NICs have RSS and VMQ support. The quad port 1Gb NICs in the servers are directly connected to each other. This will give the best picture since a switch configuration might interfere with the results.
Performance is influenced by a lot of factors. For example, copying a large file between the servers will not be very representative. Server 2012 supports SMB multichannel, whereby multiple TCP streams are used for a single file copy. This requires physical NICs with RSS support. SMB multichannel will work with NIC Teaming since RSS is exposed through the team on the default interface. The Hyper-V switch does not support RSS and does not expose it to upper level protocols, so SMB multichannel will not function for the vNICs. A file copy initiated from a vNIC uses a single TCP stream, and NIC Teaming is designed to assign a single TCP stream to a single team member. When the file is written to the destination, disk I/O can also impact the performance.
Luckily there are some good tools available for measuring bandwidth. During the tests JPerf will display detailed information on the bandwidth, Performance Monitor shows the load distribution and Task Manager gives insight into the processor load and distribution.
NTttcp, IPerf and JPerf
In my initial test I used NTttcp, which was rewritten by Microsoft in 2008. Microsoft is using an updated version of NTttcp that enables additional parameters, but this updated version is not publicly available. Therefore I resorted to IPerf. IPerf is a commonly used network testing tool that can create multiple TCP streams and measure the bandwidth of a network connection. IPerf can run as a server or as a client. The server listens on port 5001 and one or multiple clients can send a single TCP stream or multiple TCP streams to the server. IPerf was originally created for Linux, but there are compiled versions for Windows publicly available. I have used a graphical front end for IPerf called JPerf. JPerf gives some nice graphs but requires Java, so I wouldn’t recommend installing it on your production servers. If you want to run the same test in your production environment you can use the compiled version of IPerf (which will leave no footprint on the server) or create two virtual machines and install JPerf inside them.
Installation
If you want to use the command line version of IPerf (no footprint) copy the content of the compiled IPerf version to your server. For JPerf you will need to install Java first. JPerf does not require a separate IPerf file. You can just copy the content of JPerf to your server. Before you can run JPerf you will need to add the path to javaw.exe to the Path variable.
In the System Properties of your server open the Advanced tab and select Environment Variables. Search for the Path variable and (if you installed Java in the default folder) add ;C:\Program Files (x86)\Java\jre7\bin to the end of the Path variable.
Now you can open JPerf by running jperf.bat, located in the root of the folder you copied.
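The Path change itself is simple string handling, sketched here in Python. The function only builds the new value and does not touch the system; the existing Path value in the example is hypothetical, and the Java folder is the default location mentioned above.

```python
def append_to_path(path_value, new_dir, separator=";"):
    """Return path_value with new_dir appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(separator) if entry]
    # Windows paths are case-insensitive; ignore a trailing backslash as well.
    known = {entry.rstrip("\\").lower() for entry in entries}
    if new_dir.rstrip("\\").lower() not in known:
        entries.append(new_dir)
    return separator.join(entries)


java_bin = r"C:\Program Files (x86)\Java\jre7\bin"  # default folder from the post
current = r"C:\Windows\system32;C:\Windows"         # hypothetical existing value
updated = append_to_path(current, java_bin)
print(updated)
```

Checking for an existing entry first keeps the variable from growing when the step is repeated.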
To configure JPerf as receiver select Server as IPerf Mode and click Run IPerf. IPerf listens on port 5001 by default. This port should be allowed in the firewall. With the Num Connections value of 0 IPerf will keep listening on port 5001 after a successful run.
To configure JPerf as sender select Client as IPerf Mode. In the server address specify the IP address of the server where JPerf is in listening mode. During the test I concluded that JPerf will only function on interfaces with a default gateway configured.
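For the command line version, the same receiver and sender roles map to a handful of IPerf options. The sketch below only builds the argument lists, assuming the classic IPerf 2 options `-s`, `-c`, `-p`, `-P` and `-t`; the executable name `iperf` and the IP address are placeholders for your own environment.

```python
def iperf_server_command(port=5001):
    """Argument list for an IPerf receiver listening on the given port."""
    return ["iperf", "-s", "-p", str(port)]


def iperf_client_command(server_ip, streams=1, seconds=10, port=5001):
    """Argument list for an IPerf sender using a number of parallel TCP streams."""
    return ["iperf", "-c", server_ip, "-p", str(port),
            "-P", str(streams), "-t", str(seconds)]


# Hypothetical address of the server where IPerf is listening.
print(" ".join(iperf_server_command()))
print(" ".join(iperf_client_command("192.168.10.2", streams=4)))
```

Raising the `streams` value one step at a time gives the same kind of test run as increasing the number of parallel streams in JPerf.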
Jan 8th
This blog series consists of four parts
One of the basics of every Hyper-V configuration is networking. Setting aside the missing flexibility, the choices for a Hyper-V cluster design in Windows Server 2008 R2 were clear: a dedicated network interface for each type of traffic (management, cluster, live migration). With this configuration in production, NICs were underutilized most of the time, and when you needed the bandwidth it was capped at the maximum of a single interface. In the (rare) case of a NIC dying on you there was no failover. In Windows Server 2008 R2 there was no NIC Teaming support. For load balancing and failover the only option was resorting to the NIC Teaming software provided by the hardware vendor.
From experience I can say that a lot of customers were having trouble designing their networking in a Windows Server 2008 R2 cluster correctly. Problems with 3rd party NIC Teaming, live migration over VM networks, not enough physical adapters, you name it, we’ve seen the most “creative” configurations.
Most customers are stuck in the Windows Server 2008 R2 thinking pattern. This is understandable as Microsoft strongly recommended that each network in a Windows Server 2008 R2 Hyper-V cluster had its own dedicated physical NIC in each host.
In Windows Server 2012, NIC Teaming is delivered by Microsoft out of the box. The official term is NIC Teaming; it is also referred to as Load Balancing and Failover (LBFO). NIC Teaming is an integral part of the Windows Server 2012 operating system. With NIC Teaming you can team multiple NICs into a single interface. You can mix NICs from different vendors, as long as they are physical Ethernet adapters and meet the Windows Logo requirement. The NICs must operate at the same speed. Teaming NICs operating at different speeds is not supported. But flexibility comes with complexity and many choices.
With Hyper-V in Windows Server 2012 it is even possible to create a Hyper-V switch on top of a NIC team. The Hyper-V switch is a full-fledged software based layer 2 switch with features like QoS, port ACLs, 3rd party extensions, resource metering and so on. You can create virtual adapters and attach them to the Hyper-V switch. These developments provide us with the proper tools to create converged fabrics.
Usually the first thing tested after initial configuration is copying a large file between two hosts. With a Hyper-V Switch configured on a NIC team composed of two 10Gb adapters you might expect the file to copy with (2 x 10 Gbits / 8 =) 2.5 GBytes per second. When you copy the file you find that actual throughput is a lot lower (about 400 MB/s to 800 MB/s).
The first reaction : it doesn’t work!!
Let me clarify. It’s a little more complicated than just combining two 10Gb NICs and expecting a 2.5 GB/s file copy. It is possible to get these bandwidth results but you need to understand that there are a lot of factors of influence on the actual throughput.
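The gap between expectation and reality is easy to put into numbers. This short Python sketch uses only the figures mentioned above and treats 1 GB as 1000 MB for simplicity.

```python
def theoretical_gbytes_per_sec(nic_count, gbits_per_nic):
    """Raw ceiling of a team in GBytes per second (8 bits per byte)."""
    return nic_count * gbits_per_nic / 8


def share_of_ceiling(observed_mbytes_per_sec, ceiling_gbytes_per_sec):
    """Fraction of the theoretical ceiling that a measured copy speed reaches."""
    return observed_mbytes_per_sec / (ceiling_gbytes_per_sec * 1000)


ceiling = theoretical_gbytes_per_sec(2, 10)  # two 10Gb adapters
for observed in (400, 800):
    print(f"{observed} MB/s is {share_of_ceiling(observed, ceiling):.0%} of {ceiling} GB/s")
```

A typical file copy therefore reaches somewhere between 16% and 32% of the raw capacity of the team, which is why the first reaction is so common.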
Before we dive into testing, we first have to look at the choices provided by Windows Server 2012 and how the inner workings of these choices influence the actual bandwidth.
Transmission Control Protocol
TCP is one of the main protocols in the TCP/IP suite.
Transmission Control Protocol (TCP) is a transport protocol (layer 4). TCP provides reliable, ordered delivery of a stream of octets. TCP provides the mechanism to recover from missing or out-of-order packets. Reordering packets has a large impact on the throughput of the connection. Microsoft’s NIC Teaming (or any other serious NIC Teaming solution) will try to keep all packets associated with a single TCP stream on the same NIC to minimize out-of-order packets.
Hardware
There are some NIC hardware functionalities you should be aware of.
Receive side scaling
Receive side scaling (RSS) enables the efficient distribution of network receive processing across multiple processors.
It is possible to specify which processors are used for handling RSS requests. You can check if your current NIC hardware has RSS support by running the following PowerShell cmdlet: Get-SmbServerNetworkInterface
Virtual machine queue
Virtual machine queue (VMQ) creates a dedicated queue on the physical network adapter for each virtual network adapter that requests a queue. Packets that match a queue are placed in that queue. Other packets, along with all multicast and broadcast packets, are placed in the default queue for additional processing in the Hyper-V switch. You should enable VMQ for every virtual machine (and it is enabled by default). The new WS2012 feature, D-VMQ, will automatically assign the queues to the right VMs as needed based on their current activity.
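The queue selection can be pictured with a simplified Python model. It assumes that a packet matches a queue on its destination MAC address; the MAC addresses and queue names are hypothetical, and real VMQ filtering happens in the NIC hardware, not in software like this.

```python
def is_multicast_or_broadcast(mac):
    """True if the lowest bit of the first octet is set (group address)."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def select_queue(dest_mac, queue_by_mac, default_queue="default"):
    """Return the queue for a packet, falling back to the default queue."""
    mac = dest_mac.lower()
    if is_multicast_or_broadcast(mac):
        return default_queue
    return queue_by_mac.get(mac, default_queue)


# Hypothetical virtual network adapters that requested a queue.
queues = {"00:15:5d:00:00:01": "queue-vm1", "00:15:5d:00:00:02": "queue-vm2"}

for dest in ("00:15:5D:00:00:01", "00:15:5d:00:00:09", "ff:ff:ff:ff:ff:ff"):
    print(dest, "->", select_queue(dest, queues))
```

Unicast traffic for a known virtual network adapter lands in its own queue, while unknown destinations, multicast and broadcast all end up in the default queue for processing in the Hyper-V switch.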
Note: Hyper-threaded logical processors on the same core share the same execution engine. RSS and VMQ will ignore hyper-threading.
Receive Side Coalescing
Receive Side Coalescing (RSC) improves the scalability of the servers by reducing the overhead for processing a large amount of network I/O traffic by offloading some of the work to network adapters.
For advanced configuration of these NIC hardware features Microsoft has released a great document on performance tuning guidelines for Windows Server 2012.
Dec 27th
In the quiet days between Christmas and New Year, I had some time to research how DPM 2012 SP1 performs when protecting guests on a Windows Server 2012 Hyper-V cluster using CSV v2.0.
According to the SP1 release notes we can expect improved backup performance of Windows Server 2012 Hyper-V over CSV deployments with the following benefits:
Let me first point out that my setup is based on the following configuration:
Storage Server
Hyper-V Cluster Nodes
DPM 2012 SP1 Server
Network Configuration
Both on the iSCSI Target Server and Hyper-V cluster nodes, the two 10Gb network adapters have been teamed using a switch independent teaming mode with Hyper-V Port as the load balancing algorithm. A Hyper-V Extensible Switch is connected to the NIC Team and several virtual networks have been configured using the Converged Fabric method of Windows Server 2012. Each network has a minimum bandwidth Quality of Service configured on the virtual switch level. On the backend, the servers use a Virtual Connect Flex-10 interconnect.