What is vCenter High Availability?
vCenter High Availability is a new feature in vSphere 6.5. It works only with the Linux-based vCenter Server Appliance. When you are done, you end up with three nodes: an active node, a passive node and a witness node. In 6.5, the RTO is about 5 minutes, although it varies with load and hardware. File-level replication is done asynchronously through Linux rsync, while native PostgreSQL replication handles VCdb and VUMdb replication synchronously.
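To make the three roles concrete, here is a minimal sketch of what happens when any single node goes down. It is a simplified, hypothetical model written for this post, not VMware's implementation: the function name and the "degraded" label are illustrative, and it only covers single-node failures.

```python
# Hypothetical, simplified model of single-node failures in a
# three-node vCenter HA cluster (not VMware's actual logic).

ROLES = ("active", "passive", "witness")


def after_single_failure(failed: str) -> tuple[str, str]:
    """Return (node serving vCenter, cluster state) after `failed` goes down."""
    if failed not in ROLES:
        raise ValueError(f"unknown role: {failed!r}")
    if failed == "active":
        # The passive node promotes itself and starts the vCenter services;
        # this is the window covered by the roughly 5 minute RTO.
        return "passive", "degraded"
    # Losing the passive node or the witness leaves the active node serving,
    # but in this sketch no further failover is possible until it returns.
    return "active", "degraded"


for role in ROLES:
    print(role, "fails ->", after_single_failure(role))
```

The point of the witness in this model is that the cluster always keeps a node serving after one failure, while a second failure is outside what the sketch tries to describe.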
Requirements:
SSH must be enabled on the vCenter before you implement vCenter HA; otherwise the configuration will fail. The heartbeat IP addresses need to be on a different subnet from the management network. I used one switch carrying the Management Network, VM Network and VM Network 2 port groups. The documentation states that your vCenter should have 4 vCPUs and 16 GB of RAM (Small configuration). In this case, Tiny was used (2 vCPUs and 10 GB). This works, but it is not supported.
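The requirements above can be turned into a quick pre-flight checklist. This is a sketch under stated assumptions: `ApplianceSpec` and `check_prereqs` are hypothetical names, the /24 prefixes are assumed, and the 192.168.100.150 heartbeat address is made up for the example. It only checks the values you type in; it does not query vCenter.

```python
import ipaddress
from dataclasses import dataclass

# Sizes quoted in this post: (vCPUs, RAM in GB).
SIZES = {"tiny": (2, 10), "small": (4, 16)}
MIN_VCPUS, MIN_RAM_GB = SIZES["small"]  # documented minimum for vCenter HA


@dataclass
class ApplianceSpec:
    """Hypothetical description of the appliance you plan to protect."""
    ssh_enabled: bool
    vcpus: int
    ram_gb: int
    mgmt_ip: str       # management address with prefix, e.g. "10.1.1.150/24"
    heartbeat_ip: str  # heartbeat address with prefix


def check_prereqs(spec: ApplianceSpec) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not spec.ssh_enabled:
        problems.append("SSH is disabled; vCenter HA configuration will fail")
    if spec.vcpus < MIN_VCPUS or spec.ram_gb < MIN_RAM_GB:
        problems.append(
            f"{spec.vcpus} vCPUs / {spec.ram_gb} GB is below the supported "
            f"{MIN_VCPUS} vCPUs / {MIN_RAM_GB} GB (Small)"
        )
    mgmt_net = ipaddress.ip_interface(spec.mgmt_ip).network
    hb_net = ipaddress.ip_interface(spec.heartbeat_ip).network
    if mgmt_net.overlaps(hb_net):
        problems.append(
            f"heartbeat subnet {hb_net} overlaps management subnet {mgmt_net}"
        )
    return problems


# The Tiny lab appliance: it works, but the sizing is flagged as unsupported.
lab = ApplianceSpec(True, *SIZES["tiny"], "10.1.1.150/24", "192.168.100.150/24")
print(check_prereqs(lab))
```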
Here is the architecture illustrated:
Here are the configuration steps:
Step 1: Deploy a vCenter Server Appliance. Create a cluster and enable HA. If you want to enable DRS, you will need 3 hosts. In this demo, a two-node cluster was built with HA only. A Basic installation allows all 3 appliances to be placed on the same host, although you can change this during setup. Note the specs of my original vCenter appliance: this is NOT supported. You really need 4 vCPUs and 16 GB (Small configuration, not Tiny).
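Because a Basic installation can put all three appliances on one host, a single host failure can take out the active, passive and witness nodes together. The sketch below flags that situation from a placement map you fill in by hand; the function name and the host names `esx01` to `esx03` are hypothetical.

```python
def shared_hosts(placement: dict[str, str]) -> dict[str, list[str]]:
    """Map each host that runs more than one vCenter HA node to those nodes."""
    by_host: dict[str, list[str]] = {}
    for node, host in placement.items():
        by_host.setdefault(host, []).append(node)
    # Keep only hosts where a single failure would hit several nodes.
    return {host: sorted(nodes) for host, nodes in by_host.items() if len(nodes) > 1}


# This demo: all three VMs on one (hypothetically named) host.
demo = {"active": "esx01", "passive": "esx01", "witness": "esx01"}
print(shared_hosts(demo))
```

An empty result means each node sits on its own host, which is the layout the 3-host requirement is meant to allow.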
Step 2: Select your vCenter, click the Configure tab and select vCenter HA. Click Configure in the upper right corner. There are two choices, Basic and Advanced. Basic is more automated but will NOT work if the vCenter appliance is not in the inventory (that is, a VM inside a datacenter managed by that same vCenter Server).
Step 3: Select the heartbeat IP address for the active vCenter, and specify the settings for the passive (backup) node and the witness. The active vCenter had an IP address of 10.1.1.150, and the heartbeat network was the 10.1.1.2 network.
Step 4: Now specify the IP addresses to be used by the passive node and the witness.
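Before clicking through, it can help to sanity-check the three heartbeat addresses you are about to enter. This is a minimal sketch: `check_heartbeat_plan` is a hypothetical helper, the 192.168.100.x addresses are made up, and the "one subnet" rule is an assumption that matches a flat lab like this one rather than a stated VMware requirement.

```python
import ipaddress


def check_heartbeat_plan(active: str, passive: str, witness: str) -> list[str]:
    """Sanity-check the three heartbeat addresses (each given with a prefix)."""
    ifaces = {
        "active": ipaddress.ip_interface(active),
        "passive": ipaddress.ip_interface(passive),
        "witness": ipaddress.ip_interface(witness),
    }
    problems = []
    if len({i.ip for i in ifaces.values()}) != len(ifaces):
        problems.append("heartbeat IP addresses must be unique")
    # Assumption of this sketch: all three heartbeat addresses share one
    # subnet, so the nodes can reach each other without extra routing.
    if len({i.network for i in ifaces.values()}) != 1:
        problems.append("heartbeat IP addresses are not all in one subnet")
    return problems


print(check_heartbeat_plan("192.168.100.1/24", "192.168.100.2/24", "192.168.100.3/24"))
```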
Step 5: Review the information, click Next, then click Finish and wait. Monitor Recent Tasks; this deployment takes a while. This is what you should see when it is done.
Step 6: Verify the results. You should have all 3 VMs.
Step 7: Select the active vCenter, click Monitor and select vCenter HA. All three VMs should be Up. Then look at the Configure tab.
Step 8: Notice that the other two VMs (passive and witness) consume only about 1 GB of RAM. With a Basic deployment, the witness is created with only 1 vCPU and 1 GB, whereas the passive node is configured with the same number of vCPUs and the same amount of RAM as the active one.
Step 9: Test the failover. Select the vCenter appliance and click Initiate Failover in the upper right corner. All three VMs keep running, but the passive node takes over and starts all the services.
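If you want a number for your own failover rather than a rough estimate, you can poll until vCenter answers again and record the elapsed time. The helper below is a hypothetical sketch: `probe` is any callable you supply (for example, one that tries to load the vCenter web page and returns True on success), and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


def time_until(
    probe: Callable[[], bool],
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it returns True; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Start it right after clicking Initiate Failover; with the default 10-second interval the result is accurate to within one polling interval.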
Step 10: After about one minute, the Web Client disconnects. If you reload the page, you should see something like this (the exact page depends on the browser; I used Chrome).
Step 11: My failover took about 12 minutes because all three VMs were on the same host and the host had only 16 GB of RAM. The hypervisor activated ballooning and gave the RAM formerly used by the original active node to the one taking over. This is what I saw once it succeeded. Notice that .151, rather than .150 (the original node), is now the active one.
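The arithmetic behind the ballooning is easy to check with the figures from this post: a Tiny active node (10 GB), a passive node of the same size, a 1 GB witness, all on a 16 GB host. The sketch compares configured allocations, not measured usage; `overcommit_gb` is a hypothetical helper.

```python
def overcommit_gb(host_ram_gb: int, node_ram_gb: dict[str, int]) -> int:
    """Configured RAM of all nodes minus physical host RAM (positive = overcommitted)."""
    return sum(node_ram_gb.values()) - host_ram_gb


# Values from this post: Tiny active and passive nodes plus the witness.
lab = {"active": 10, "passive": 10, "witness": 1}
print(overcommit_gb(16, lab))  # 21 GB configured on a 16 GB host -> 5
```

While the passive node idles at about 1 GB, the overcommit does no harm; once it starts all the vCenter services it needs much more of its 10 GB, so the hypervisor has to reclaim memory. With the supported Small size (16 GB per node) the same host would be overcommitted by 17 GB, which is another reason to spread the nodes across hosts.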