Tuesday, May 30, 2017

Inducing Failures on a Virtual SAN cluster

The coolness of python scripts.

One of the many tools available for testing Virtual SAN is a buried Python script called vsan.DiskFaultInjection.pyc. Located in the /usr/lib/vmware/vsan/bin directory, this utility can generate permanent or transient disk errors. Furthermore, it can emulate unplugging a disk.

Using the -h option (for help), an administrator can see the options available for this command. Intended only for pre-production use, this script generates failures so the user can understand what happens in such cases.

Below is an example of what happens when a capacity disk is affected by such a permanent failure. In the case of a RAID-5 virtual machine, the virtual machine would continue to run. If enough servers and/or disks are available, the rebuilding of the data would take place immediately. The -p option is used for permanent errors and the -u option to unplug a disk.

Errors would then be visible throughout the UI; notice the capture below.

The -c (clear) option is used to remove the permanent error. If using the -u option, simply rescan the storage via esxcfg-rescan -A.
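The whole inject-and-clear cycle can be sketched as argv lists for the commands involved. This is a minimal sketch only: the -d flag for naming the target device is an assumption based on common usage, and the device name is purely illustrative, so check the script's -h output on your own host first.

```python
# Hedged sketch of the fault-injection workflow described above.
# Assumption: the script takes the target device via a -d flag;
# verify with "python vsan.DiskFaultInjection.pyc -h" on the host.
SCRIPT = "/usr/lib/vmware/vsan/bin/vsan.DiskFaultInjection.pyc"

def inject_cmd(flag, device):
    """Build the argv list for one fault-injection action.

    flag: "-p" (permanent error), "-u" (unplug), or "-c" (clear).
    device: the disk's device identifier, e.g. "naa.5000...".
    """
    return ["python", SCRIPT, flag, "-d", device]

# Example device name only -- substitute a real capacity disk ID.
device = "naa.50000f0056424c46"
permanent = inject_cmd("-p", device)   # inject a permanent error
clear = inject_cmd("-c", device)       # clear it when done testing
rescan = ["esxcfg-rescan", "-A"]       # rescan after a -u unplug
```

On the ESXi host these lists would be run as-is from the shell (or via subprocess); building them as data here simply makes the option pairing explicit.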

Sunday, May 28, 2017

Installing the vCenter Appliance on a One Node vSAN Cluster

Virtual SAN 6.6 introduces a graphical method to install a vCenter appliance on a freshly installed ESXi host in order to eventually install and configure a vSAN cluster. The required software versions are: ESXi build 5310538 and VC build 5318154.

As you start a fresh install, notice that the latest version of vSphere 6.5 introduces a new option that allows you to "Install on a new Virtual SAN cluster containing the target host". Proceed with a normal installation.

Select the vCenter option with the embedded PSC.

Select the ESXi host that will host the new vCenter appliance.

Name the appliance and provide the root password.

Here is where you see the big difference. Notice the option at the end. Select it.

Name your future datacenter and cluster.

Specify which drives will be used for the Virtual SAN datastore, indicating which will serve as cache and which as capacity.

The rest is pretty much the same: provide the network-related information and continue as usual.

Once the installation is done, the administrator can verify that the vCenter is in working order.

Once the vCenter appliance is running, log in, create the VMkernel port for Virtual SAN on that node and proceed as usual. Add the remaining servers and their vSAN IP addresses and you are done.
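The per-host networking step can be sketched as the esxcli invocations involved. A hedged sketch, assuming the interface is vmk1 and the classic "vsan network ipv4" sub-command syntax; both are examples and the exact syntax may differ slightly between vSphere releases.

```python
# Hedged sketch: tag a VMkernel interface for vSAN traffic on one host.
# Assumptions: vmk1 is the vSAN VMkernel port you created, and the
# "esxcli vsan network ipv4 add" syntax applies to your release.
def vsan_network_cmds(vmknic="vmk1"):
    """Return the argv lists to tag and then verify a vSAN vmknic."""
    return [
        # Tag the VMkernel interface for Virtual SAN traffic.
        ["esxcli", "vsan", "network", "ipv4", "add", "-i", vmknic],
        # Confirm the tagging took effect.
        ["esxcli", "vsan", "network", "list"],
    ]

cmds = vsan_network_cmds()
```

Each list corresponds to one command to run on the host's shell; repeat on every node you add to the cluster.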

Tuesday, May 9, 2017

When 100% cpu utilization is not really 100%

100% does not always mean 100%

Some people mistakenly look at tools inside a guest operating system (for example, the task manager) and, when faced with 100% CPU utilization, automatically believe that such a virtual machine needs more vCPUs. Not necessarily. You really need to look at what is taking place on the host and compare the results. Remember that the guest OS is not aware of what is actually happening on the host.

Notice that in this case this virtual machine running Windows displays 100% CPU utilization.

However, notice that the ESXi host does not have any of its logical CPUs at 100% and that the virtual machine is NOT using 100% of the actual logical CPU (core). Notice the %MLMTD and %RDY columns.

In this case, the cause is a CPU limit. Notice the capture below. This virtual machine has its limit set to 50% of the maximum number of CPU MHz, yet the guest OS is not aware of this.
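The arithmetic behind the mismatch can be sketched in a few lines: with a limit in place, the host grants at most the limit in MHz, no matter how busy the guest believes itself to be. The core speed and limit below are illustrative numbers, not values from the capture.

```python
# Hedged illustration of why the guest can report 100% while the host
# grants far less: a limit caps the VM at limit_mhz even though the
# guest believes it owns the whole core.
def host_granted_mhz(core_mhz, limit_mhz, guest_busy_pct):
    """Return the MHz the host actually grants the VM.

    The VM can never consume more than limit_mhz, regardless of how
    busy the guest OS reports itself to be.
    """
    demanded = core_mhz * guest_busy_pct / 100.0
    return min(demanded, limit_mhz)

# Example: a 2600 MHz core with the limit set to 50% of it.
granted = host_granted_mhz(core_mhz=2600, limit_mhz=1300, guest_busy_pct=100)
# The guest reports 100% busy, yet the host grants only 1300 MHz;
# the time the VM is held back by the limit shows up under %MLMTD.
```

Below the limit the VM gets exactly what it demands, which is why the problem only surfaces under load.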