Windows Azure Pack High Availability – Lessons Learned

Exactly one year ago I published a blog on configuring high availability for Windows Azure Services for Windows Server. A lot has happened since then. Windows Azure Pack was released shortly after that and we did numerous implementations of Windows Azure Pack for Service Providers and Enterprise Organizations. And boy, did we learn… I promised back then that if any changes to the procedure were required I’d update that blog. I decided to create a new blog altogether since there is a lot to discuss.

Windows Azure Pack is an application that consists of a web tier and a database tier. The web tier can be installed in a distributed configuration and allows for high availability through the use of load balancing. The database tier leverages clustering for high availability (with SQL AlwaysOn or SQL WFC). This has not changed compared to Windows Azure Services for Windows Server.

Authentication in Windows Azure Pack was subject to some serious changes. A lot has been blogged on updating the URLs for four Windows Azure Pack components:

  • Tenant Site
  • Tenant Authentication Site
  • Admin Site
  • Admin Authentication Site

Besides changes to the IIS configuration for these sites, they also have references in the database that need to be updated. We found out that there are more Windows Azure Pack components that need to be updated for high availability. If you only update these four components and you shut down the first configured admin server in the environment your services are still unavailable, despite the load balancing of these four components.

This blog is a guide for configuring load balancing and high availability for Windows Azure Pack. It will describe the steps to configure Windows Azure Pack after a default installation. It also describes the step to take on the data tier configured with SQL AlwaysOn after a default installation. If you need guidance to setup a SQL AlwaysOn Cluster you can use this blog.

1. Design

In this presentation at TechEd I described a distributed configuration scenario for Windows Azure Pack. In this design the Tenant roles and the Admin roles were divided.

WAP Design

The following components where installed on the roles.

The Tenant Roles

  • Tenant Site
  • Tenant Authentication Site
  • Tenant Public API

The Admin Role

  • Admin Site
  • Admin Authentication Site
  • Admin API
  • Tenant API
  • PowerShell API
  • Best Practice Analyzer

Although I will describes the steps to configure the Tenant Authentication Site and the Admin Authentication Site, please consider to drop the two authentication sites for production environments and use ADFS instead. The configuration steps in for ADFS are similar, so you can use the scripts from this blog for both scenarios.

2. High Availability

Windows Azure Pack stores its information in SQL databases, which can be made high available with clustering.  Load balancers can be used to enable load balancing and high availability for the web server tier.

After the initial configuration the following steps must be performed to prepare Windows Azure Pack for load balancing.

  1. Create DNS records
  2. Import trusted web server certificate
  3. Change the virtual directory bindings
  4. Define variables for PowerShell scripts
  5. Update the database with the new endpoints
  6. Update the federation endpoints for the authentication sites
  7. Update the resource provider endpoints
  8. Optional- Configure a webpage for the load balancer validation process

2.1 Create DNS records

For most humans (no, not you geeks) is it easier to remember a simple name than an IP Address. DNS is a naming system that translates names into IP addresses and the other way around. It allows for a simple naming convention for Window Azure Pack that tenants and admins can easily remember. The following records must be created in a DNS zone, that will form FQDNs and can then be used in the Windows Azure Pack configuration.

Name Record Type Value
manage A IP Address of Load Balancer Tenant Role
logon A IP Address of Load Balancer Tenant Role
api A IP Address of Load Balancer Tenant Role
admin A IP Address of Load Balancer Admin Role
adminlogon A IP Address of Load Balancer Admin Role

2.2 Import trusted web server certificate

After a default Windows Azure Pack installation, the websites are configured with self-signed certificates. To prevent users from being prompted with certificate warnings we need to create and import a web server certificate. To prevent users from being prompted with a security warning the certificate that is used must comply with the following rules.

  • The certificate must not be expired
  • The certificate issuer must be trusted by the client
  • The FQDN of the user request must match the Common Name in the certificate

Since multiple sites are running on the same server, we can comply with these rules by requesting

  • a web server certificate with subject alternate names, containing the FQDNs of all the URLs that you would like to provide with this certificate (can contain multiple domains)
  • a web server certificate with a wildcard, matching the domain of the FQDNs of all the URLs that you would like to provide with this certificate (can only contain one domain)

The certificate can be requested from

  • an internal Certificate Authority, if the clients are using machines that are member of the same domain infrastructure at the internal CA
  • a public CA, for all other scenarios

The certificate, containing the private key (.pfx), is then imported with the private key in the local machine personal certificate store on all Windows Azure Pack servers.

2.3 Change the virtual directory bindings

After a default Windows Azure Pack installation the websites are configured to use custom ports. To allow users to open the Admin Site and Tenant Site more easily it is recommended to change the default ports of the following components

  • Admin Site
  • Admin Authentication Site
  • Tenant Site
  • Tenant Authentication Site
  • Tenant Public API

A single server will host multiple virtual directories on the same SSL port (443). Previously this was not possible out of the box, due to the inability to work with host headers. Internet Information Server 8.5 provides Server Name Indication (SNI) that enables multiple virtual directories running on the same SSL port (443), differentiating through host headers.

Admin Site

Open IIS on the Windows Azure Pack Admin Role Server. Right click the virtual directory Mgmtvc-AdminSite and select Edit Bindings. Select the existing bindings and click edit. Change the port to 443. Specify the FQDN (see table in section 2.1) for the Admin Site. Enable the checkmark for Require Server Name Indication and select the Web Server Certificate (that was imported as described in section 2.2).

admin site

Submit the changes

Admin Authentication Site

Right click the virtual directory Mgmtvc-WindowsAuthSite and select Edit Bindings. Select the existing bindings and click edit. Change the port to 443. Specify the FQDN (see table in section 2.1) for the Admin Auth Site. Enable the checkmark for Require Server Name Indication and select the Web Server Certificate (that was imported as described in section 2.2).

admin auth site

Submit the changes.

Tenant Site

Open IIS on the Windows Azure Pack servers that hold the Tenant Roles. Right click the virtual directory Mgmtvc-TenantSite and select Edit Bindings. Select the existing bindings and click edit. Change the port to 443. Specify the FQDN for the Tenant Site. Enable the checkmark for Require Server Name Indication and select the Web Server Certificate (that was imported as described in section 2.2).

tenant site

Submit the changes

Tenant Authentication Site

Right click the virtual directory Mgmtvc-AuthSite and select Edit Bindings. Select the existing bindings and click edit. Change the port to 443. Specify the FQDN for the Auth Site. Enable the checkmark for Require Server Name Indication and select the Web Server Certificate (that was imported as described in section 2.2).

tenant auth site

Submit the changes

Tenant Public API

Right click the virtual directory MgmtSvc-TenantPublicAPI and select Edit Bindings. Select the existing bindings and click edit. Change the port to 443. Specify the FQDN for the Tenant Public API. Do not enable the checkmark for Require Server Name Indication and select the Web Server Certificate (that was imported as described in section 2.2).

Tenant Public API

Submit the changes.

FQDN references in the Windows Azure Pack databases

Windows Azure Pack also stores references of the websites in its databases. These values need to be updated. Most configuration blogs out on the internet (including the ones I did previously) will update the following Windows Azure Pack component references.

  • Admin Site
  • Admin Authentication Site
  • Tenant Site
  • Tenant Authentication Site

We found out that high availability of Windows Azure Pack depends on additional components that also need to be updated and added to the load balancers.

  • Admin API
  • Tenant API
  • Tenant Public API

When these components are not updated and the first configured Windows Azure Pack server that holds the Admin Roles is unavailable, the tenant is able to logon to the tenant site, but is presented with exclamation marks on each menu item and is unable to interact or even see his services.

This is caused by references that are hard coded to the FQDN of the first configured Windows Azure Pack server that holds the Admin Roles.

Additional resource providers also have references in the database. These resource providers need to be updated as well.

  • Market Place
  • Usage Service
  • Monitoring

The following steps will operate as a single script, but I have broken it down in individual sections to explain each part of the script. To run the cmdlets the Windows Azure Pack PowerShell API component is required, which can be installed through the Web Platform Installer. The Windows Azure Pack PowerShell API component must be updated to at least Update Rollup 1.

2.4 Define variables for PowerShell scripts

Open a PowerShell console on a Windows Azure Pack server that holds the Admin Roles and run the following cmdlets to create the variables that will be used throughout the next steps.

2.5 Update the database with the new endpoints

To updates FQDNs in the database, run the following cmdlets in the same PowerShell console session where the variables from the previous section where defined.

2.6 Update the federation endpoints for the authentication sites

The Admin Site and Tenant Site have a reference to the location of their corresponding authentication sites. These references must be updated with the new values. The authentication sites have a reference to the Admin Site and the Tenant Site. To updates these values run the following cmdlets in the same PowerShell console session where the variables from section 2.4 where defined.

2.7 Update the resource provider endpoints

Windows Azure Pack provides access to resources through resource provider endpoints. The Windows Azure Pack databases contain references to these resource provider endpoints. These references must also be updated with the new values. To get a list of all the resource providers, run the following cmdlets in the same PowerShell console session where the variables from section 2.4 where defined.

Each resource provider is configured with a two endpoints. An Admin endpoint and a Tenant endpoint. To get a list of these two values for all resource providers run the following cmdlets in the same PowerShell console session where the variables from section 2.4 where defined. (Thanks to Menno for the format).

 

To define the new endpoint value for a resource provider run the following cmdlet in the same PowerShell console session where the variables from section 2.4 where defined.

 

To updates the endpoint value for a resource provider run the following cmdlets in the same PowerShell console session where the variables from section 2.4 where defined.

 

Repeat the previous for the Resource provider Monitoring

 

And the resource provider UsageService

 

To get a list with the updated values run the following cmdlets in the same PowerShell console session where the variables from section 2.4 where defined.

 

2.8 Load Balancer

Now that all the services are configured with the new values we need to define these values in the load balancer. We can divide these services in two lists. A list of services that are directed at the web tier that hold the Windows Azure Pack admin roles.

FQDN Port Service
https://admin.hyper-v.nu 443 Admin Site
https://admin.hyper-v.nu 30018 Market Place
https://admin.hyper-v.nu 30020 Monitoring
https://admin.hyper-v.nu 30022 Usage Service
https://admin.hyper-v.nu 30004 Admin API
https://admin.hyper-v.nu 30005 Tenant API
https://adminlogon.hyper-v.nu 443 Admin Authentication Site

And a list of services that are directed at the web tier that hold the Windows Azure Pack tenant roles.

FQDN Port Service
https://manage.hyper-v.nu 443 Tenant Site
https://logon.hyper-v.nu 443 Tenant Authentication Site
https://api.hyper-v.nu 443 Tenant Public API

2.9 Load balancer validation process

The web tier in Windows Azure Pack is a stateless application. All the application data is stored in the database tier. A load balancer is used for load balancing the request to the Windows Azure Pack web tier. This can be requests from end users, administrators or even internal system requests to an API.

The hardware load balancer can use different methods for validating availability of a node in a configuration. A couple of examples.

  • Availability based on pings to a node.
  • Availability based on requesting an IIS web page
  • Availability based on application logic built in to an IIS web page.

Availability based on pings to a node.

In this configuration the load balancer sends pings to each node in a configuration. The load balancer marks a node as active as long as the pings reply. Requests are sent to that node. When the load balancer receives timeouts the node is marked as inactive and requests are no longer sent to this node.

This availability detection method is easy to implement, but does not detect issues with availability of the services running on the operating system.

Availability based on requesting an IIS web page

In this configuration the load balancer requests a page from IIS. If the page displays some value like “Server is up and running” the load balancer marks a node as active. Requests are sent to that node. When the load balancer receives a timeouts one the page, the node is marked as inactive and requests are no longer sent to this node.

This availability detection method requires additional steps to setup, but does detect issues with availability of the web services running on the operating system. To setup this configuration a web page is created with similar content to

Create a file named default.aspx and place it in the c:\inetpub\wwwroot\healthcheck folder. Configure the default website in IIS with an additional site binding on SSL port 443, with the same SSL certificate used for the other services in this tier. Configure the load balancer with the corresponding settings.

Healthcheck

Server Name Indication (SNI) is not configured. Most load balancers are not capable of using the SNI feature and is therefore probing to the website that is not configured with SNI. You can test the page by opening https://localhost/healthcheck/default.aspx on the Windows Azure Pack server.

Availability based on application logic built in to an IIS web page.

In this configuration the load balancer requests a page from IIS. Additional application logic is built in to the page that performs a query to the SQL database running on the data tier. The page will only display the value “Server is up and running” if all application logic is responds correctly. If the page displays the value “Server is up and running” the load balancer marks a node as active. Requests are sent to that node. When the load balancer receives a timeouts one the page or displays another value, the node is marked as inactive and requests are no longer sent to this node.

This availability detection method requires a more complex configuration with application logic, but does detect issues with availability of the web services and within the data tier.

3. SQL High Availability

When you have configured Windows Azure Pack and specified the SQL AlwaysOn DNS Listener for the SQL entry, the Windows Azure Pack databases are configured on Primary node of the SQL cluster. They must be added to the Availability group.

  • Logon to the primary SQL node
  • Windows Azure for Windows Server uses five databases
    • Microsoft.MgmtSvc.Config
    • Microsoft.MgmtSvc.PortalConfigStore
    • Microsoft.MgmtSvc.Store
    • Microsoft.MgmtSvc.Usage
    • Microsoft.MgmtSvc.WebAppGallery
  • Change the recovery model of all five databases to full and then make a backup of all five databases
  • Run the Add database to Availability Group wizard and add the five databases the Availability Group

The Availability Group should now contain all the Windows Azure Pack Databases.

SQL Availability Groups replicates databases but not user accounts. SQL Server provides an option called contained databases that will allow contained users to be created against instance level users. In case of failover, all contained databases and contained users are replicated. Windows Azure Pack does support the contained database feature but only for tenant databases (database as a Service) and not for the configuration databases of Windows Azure Pack itself. So besides adding the databases to the availability group we also need to replicate the SQL accounts used for Windows Azure Pack to the secondary node in the availability group.

Opening SQL Server management studio to compare Logins on both SQL Servers will show that the Windows Azure Pack configuration created a bunch of user accounts on the primary node that are not available on the secondary.

SQL Logins

The user accounts created by Windows Azure Pack are SQL Server accounts and not domain accounts. Even if we managed to create exactly the same accounts we would not know how to match the password that were created by the installer. It is possible to export the user accounts (domain and SQL accounts) to a plain text file with hashed password by following this knowledge base article.

Basically you create two Stored Procedures by running the script in the knowledge base article on your master database. Run the script on the primary node (which contains the local user accounts).

Open a new query and run EXEC sp_help_revlogin

The output from the Stored Procedure contains the domain and SQL accounts.

SQL Stored Procedure

Compare the output to the user accounts on the secondary SQL node and remove the lines that contain user accounts that are already available on the secondary node. Connect with the SQL Server Management Studio to secondary node and open a new query screen. Paste the output from the Stored Procedure we just adjusted and execute it. This will create the required local SQL accounts on that node with the correct passwords.

With this step complete we have configured our Windows Azure Pack environment for High Availability.

More Information

Configure SQL AlwaysOn Availability Groups in Windows Azure Pack

Windows Azure Pack high availability

Lessons Learned: Designing and Deploying the Windows Azure Pack in the Real-World

6 Comments

  1. Vytautas Kelmelis's Gravatar Vytautas Kelmelis
    June 6, 2014    

    I thing is mistakein script:
    # Admin Authentication Site
    $WinAuthSiteLB = “adminauth.hyper-v.nu”

    must be:
    # Admin Authentication Site
    $WinAuthSiteLB = “adminlogon.hyper-v.nu”

    • June 7, 2014    

      Hi Vytautas,
      Sharp observation there :) I adjusted the line to reflect the correct value. Thanks for your comment.
      Marc

      • Vytautas Kelmelis's Gravatar Vytautas Kelmelis
        June 13, 2014    

        It is Also hard to balance non standart ssl ports like 30005, 30018 and etc … Some HW LB not suporting high ports, or SNI on non 443 port. Prefered way will be named instances on standart 443 port. like usage.hyper-v.nu:443. and so on

  2. June 8, 2014    

    Hi Marc,

    this article is great! You helped me alot with my demo enviroment.

    Thanks Buddy!

  3. Steve Belton's Gravatar Steve Belton
    June 17, 2014    

    Marc, shouldn’t section 2.8 come before 2.4? Otherwise without configuring the Load Balancers you won’t be able to run Get-MgmtSvcToken on the auth site referenced by the new FQDN…

  1. Windows Azure Pack Tenant Public API | hyper-v.nu on June 11, 2014 at 20:12

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">

Our Sponsors





Powered by