Wednesday, March 26, 2014

How to build an embedded VSAN Cluster

Preface: Over the last few days, I decided to build an entire embedded VSAN environment and document the details and possible issues one may encounter. The biggest problem I faced was forgetting to set the port groups to accept promiscuous mode on the real ESXi server. A special thanks to my buddy J.K. for assisting me with that. I knew he was the right person to solve that issue for me.

Facts about VSAN:

32 ESXi hosts maximum, 100 VMs per host and 3,200 VMs per cluster.
Each host can have up to 5 disk groups, each with 1 SSD and up to 7 spinning disks (a maximum of 5 SSDs and 35 disks per host).
Fault Tolerance, Storage DRS, deduplication and metro clustering are NOT supported.

1. First, I took one of my physical servers and installed ESXi on it. This server (really a PC) simply had 2 logical CPUs, a few network cards (only one is needed here) and 12 GB of memory. It was a typical install; then I used the DCUI to change its IP to 10.1.1.1/24 and called it realesxi1. I used ESXi 5.5u1.
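For reference, the same management-network change can be made from the ESXi shell instead of the DCUI. A minimal sketch, assuming vmk0 is the management interface (the default):

# esxcli system hostname set --host=realesxi1
# esxcli network ip interface ipv4 set -i vmk0 -I 10.1.1.1 -N 255.255.255.0 -t static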

2. Then, I connected to my ESXi server using the vsphere client from my Windows 7 PC. I proceeded to create two new vswitches, called vSwitch1 and vSwitch2. These are internal vswitches (attached to no uplinks). These vswitches were eventually used for the vMotion network and the VSAN network of the embedded ESXi servers. It is CRITICAL to remember to change Promiscuous Mode to Accept to allow internal communication. These vswitches carried the port groups VM Network 2 and VM Network 3.
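The vswitch creation and the promiscuous mode change can also be done from the shell of the real ESXi host. A minimal sketch, assuming the security policy is applied at the vswitch level (repeat for vSwitch2):

# esxcfg-vswitch -a vSwitch1
# esxcli network vswitch standard policy security set --vswitch-name=vSwitch1 --allow-promiscuous=true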


3. Now, it was time to create the three fake ESXi servers. Using the vsphere client, I created three identical ESXi servers: (esxi01/10.1.1.101), (esxi02/10.1.1.102) and (esxi03/10.1.1.103). I gave each embedded ESXi server 2 vCPUs, 5 GB of RAM and four disks. 5 GB is the minimum amount of memory for Virtual SAN, and it is only enough for one disk group; you need more memory if you want more than one. The four disks were 8 GB for the boot disk, 2 x 5 GB for the SSD disks (I really only needed one per host) and 50 GB for the spinning disk to be used by VSAN.
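For illustration only, the relevant pieces of such a nested ESXi VM end up looking roughly like this in its .vmx file (the file names are hypothetical; the vsphere client generates all of this for you):

guestOS = "vmkernel5"
numvcpus = "2"
memsize = "5120"
scsi0:0.fileName = "esxi01.vmdk"
scsi0:1.fileName = "esxi01_1.vmdk"
scsi0:2.fileName = "esxi01_2.vmdk"
scsi0:3.fileName = "esxi01_3.vmdk"

Here scsi0:0 is the 8 GB boot disk, scsi0:1 and scsi0:2 are the 5 GB disks destined to become the fake SSDs, and scsi0:3 is the 50 GB spinning disk.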


4. After the installation of each of the embedded ESXi servers, I changed the disk type of each of the future SSD disks using esxcli. Here are the commands; I rebooted the hosts afterwards.

  • esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba1:C0:T1:L0 --option "enable_local enable_ssd"
  • esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T1:L0

Note: 

Another method to emulate an SSD is to modify the VM's .vmx file and add "scsiX:Y.virtualSSD = 1", where X:Y matches the virtual disk in question.
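Either way, after the reboot you can verify that the device is now seen as an SSD (same device name as in the commands above):

# esxcli storage core device list -d mpx.vmhba1:C0:T1:L0 | grep -i "Is SSD"
   Is SSD: true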

5. Now, it was time to create the vCenter server. I used the Linux-based vCenter appliance. I deployed it via OVF on another real ESXi server (10.1.1.2), powered it on and connected to its management interface on port 5480. I gave it an IP address of 10.1.1.150, so I used Chrome and connected to it with the following URL: https://10.1.1.150:5480. Its default user is root and its default password is vmware. The appliance defaults to DHCP, so if no DHCP server is available, it falls back to 0.0.0.0. If that is the case, open up the console of the vCenter appliance, log in as root and run the following script.

localhost:~ # cd /opt/vmware/share/vami

localhost:/opt/vmware/share/vami # ls *net*config*

vami_config_net      

localhost:/opt/vmware/share/vami # ./vami_config_net

 Main Menu

0)      Show Current Configuration (scroll with Shift-PgUp/PgDown)
1)      Exit this program
2)      Default Gateway
3)      Hostname
4)      DNS
5)      Proxy Server
6)      IP Address Allocation for eth0
Enter a menu number [0]:
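Once the script has been run, a quick check from the same console confirms the new address before moving on (eth0 is the appliance's default interface):

localhost:~ # ip addr show eth0
localhost:~ # ping -c 3 10.1.1.1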

6. Using port 5480, I decided to go with the equivalent of a simple install, using the embedded Postgres database. Nothing special here, standard stuff. I waited about 10 minutes and my trusty vCenter was ready for production.

7. To speed things up, I connected to the vCenter server with the vsphere client, logged in, created a datacenter and added the three ESXi servers. Although the capture shows a cluster, don't create it yet. The capture also shows a Linux VM; that one was created later as well and is not the vCenter server.


8. Using the vsphere client, it was time to create the vMotion network. Notice carefully the relationship between the fake ESXi servers and the real ESXi server underneath. The three ESXi servers used their second (fake) uplinks to connect to vSwitch1. The IPs used were 10.1.2.101, 10.1.2.102 and 10.1.2.103. Although the capture shows yet another vswitch, that one was created later with the web client; it will be the VSAN network, which can't be created with the vsphere client.


Note: The vMotion network can also be created via the CLI. Five commands are needed; examples below:

1. esxcfg-vswitch -a vSwitch10
2. esxcfg-vswitch -A vmotion vSwitch10
3. esxcfg-vswitch -L vmnic10 vSwitch10
4. esxcfg-vmknic -a -i 100.100.100.1 -n 255.255.255.0 vmotion
5. vim-cmd hostsvc/vmotion/vnic_set vmk10
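To double-check the result on each host, the following two commands list the vswitches/port groups and the vmkernel NICs with their IPs:

# esxcfg-vswitch -l
# esxcfg-vmknic -l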

9. Using the web client, proceed to create the new virtual switches and vmkernel interfaces for VSAN traffic. Do this for all three ESXi servers. The IPs used here were 10.1.3.101, 10.1.3.102 and 10.1.3.103. Make sure that you use the web client, since the vsphere client does not have the checkbox for VSAN traffic. By the time you are done, you can refer to the previous capture. To recap, each fake ESXi server now has three vswitches: the first one connects to the first vswitch on the real ESXi server (which is attached to vmnic0), and the other two connect to the internal vswitches underneath (the ones with no uplinks). A distributed switch (DVS) can be used as well.
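For reference, the VSAN vmkernel interface can also be created and tagged from the CLI, much like the vMotion example above. A sketch for the first host; the vswitch name vSwitch2, the port group name vsan, the uplink vmnic2 and the resulting vmk2 are all assumptions about how the third fake NIC shows up:

# esxcfg-vswitch -a vSwitch2
# esxcfg-vswitch -A vsan vSwitch2
# esxcfg-vswitch -L vmnic2 vSwitch2
# esxcfg-vmknic -a -i 10.1.3.101 -n 255.255.255.0 vsan
# esxcli vsan network ipv4 add -i vmk2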

10. Using the web client (port 9443), create a cluster with VSAN enabled and create the necessary disk groups. If you select manual mode, you need to create three disk groups (one per host), each using the fake SSD disk and the fake spinning disk. By the time you are done, you should have a datastore of roughly 150 GB. This size does not count the space of the SSD disks, which are used only for performance (caching). While creating the cluster, I also enabled DRS (fully automated) and HA (which uses the VSAN network for its heartbeats).
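As an aside, the cluster membership and manual disk-group claiming can also be driven from the ESXi shell of each host; a hedged sketch, reusing the cluster UUID and device names that show up later in step 12:

# esxcli vsan cluster join -u 52d13c54-4d78-af85-beb6-4923ac8532ec
# esxcli vsan storage add -s mpx.vmhba1:C0:T2:L0 -d mpx.vmhba1:C0:T3:L0
# esxcli vsan cluster get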





11. Now, it was time to create a virtual machine inside of the VSAN cluster. I decided to create a Linux VM with 1 GB of RAM and an 8 GB hard disk. Because of the default VSAN storage policies, the VM ended up consuming roughly 20 GB of datastore space. Here are some captures.
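That footprint makes sense if you walk through the default policy shown in step 12: with hostFailuresToTolerate=1, every object is mirrored across two hosts, so the 8 GB virtual disk accounts for up to 16 GB, the 1 GB swap object for roughly another 2 GB, and the VM home namespace plus the witness components make up the rest, which lands in the neighborhood of the 20 GB observed.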










Note: Notice that HA can now elect a master even without common datastores, by using the VSAN network.

12. Now, my favorite part. It's time to learn about the new VSAN-related commands.

# esxcli vsan
Usage: esxcli vsan {cmd} [cmd options]

Available Namespaces:
  datastore        Commands for VSAN datastore configuration
  network          Commands for VSAN host network configuration
  storage          Commands for VSAN physical storage configuration
  cluster          Commands for VSAN host cluster configuration
  maintenancemode  Commands for VSAN maintenance mode operation
  policy           Commands for VSAN storage policy configuration
  trace            Commands for VSAN trace configuration

# esxcli vsan datastore name get
   Name: vsanDatastore

# esxcli vsan network ipv4 add -i vmk2

# esxcli vsan network list
Interface
   VmkNic Name: vmk2
   IP Protocol: IPv4
   Interface UUID: aca23053-3cb8-151f-7a70-000c291ad309
   Agent Group Multicast Address: 224.2.3.4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group Multicast Port: 12345
   Multicast TTL: 5

# esxcli vsan network ipv4 remove -i vmk2

# esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2014-03-25T00:51:08Z
   Local Node UUID: 533058dc-3af3-85d9-7c45-000c291ad309
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 533058dc-3af3-85d9-7c45-000c291ad309
   Sub-Cluster Backup UUID:
   Sub-Cluster UUID: 52d13c54-4d78-af85-beb6-4923ac8532ec
   Sub-Cluster Membership Entry Revision: 0
   Sub-Cluster Member UUIDs: 533058dc-3af3-85d9-7c45-000c291ad309
   Sub-Cluster Membership UUID: 86d13053-5c4b-5c41-8a96-000c291ad309

# esxcli vsan storage list
mpx.vmhba1:C0:T2:L0
   Device: mpx.vmhba1:C0:T2:L0
   Display Name: mpx.vmhba1:C0:T2:L0
   Is SSD: true
   VSAN UUID: 52208001-b789-ed6e-c319-1043e82182d6
   VSAN Disk Group UUID: 52208001-b789-ed6e-c319-1043e82182d6
   VSAN Disk Group Name: mpx.vmhba1:C0:T2:L0
   Used by this host: true
   In CMMDS: true
   Checksum: 3279737555916297435
   Checksum OK: true

mpx.vmhba1:C0:T3:L0
   Device: mpx.vmhba1:C0:T3:L0
   Display Name: mpx.vmhba1:C0:T3:L0
   Is SSD: false
   VSAN UUID: 52712d14-388e-e085-50e9-6f9bf32beddb
   VSAN Disk Group UUID: 52208001-b789-ed6e-c319-1043e82182d6
   VSAN Disk Group Name: mpx.vmhba1:C0:T2:L0
   Used by this host: true
   In CMMDS: true
   Checksum: 13822921904604001925
   Checksum OK: true

# esxcli vsan policy getdefault
Policy Class  Policy Value
------------  --------------------------------------------------------
cluster       (("hostFailuresToTolerate" i1))
vdisk         (("hostFailuresToTolerate" i1))
vmnamespace   (("hostFailuresToTolerate" i1))
vmswap        (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))
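These defaults can also be changed per policy class from the CLI; a hedged sketch that would set a stripe width of 2 for new virtual disks (the escaped quoting follows the same s-expression format shown above):

# esxcli vsan policy setdefault -c vdisk -p "((\"hostFailuresToTolerate\" i1) (\"stripeWidth\" i2))"
# esxcli vsan policy getdefault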

# grep -i vsan /etc/vmware/esx.conf
/vsan/network/child[0000]/agentPort = "23451"
/vsan/network/child[0000]/agentGroup = "224.2.3.4"
/vsan/network/child[0000]/ifaceUuid = "e7db3253-bb87-2641-fa23-000c291ad309"
/vsan/network/child[0000]/masterPort = "12345"
/vsan/network/child[0000]/masterGroup = "224.1.2.3"
/vsan/network/child[0000]/vmknic = "vmk0"
/vsan/network/child[0000]/ttl = "5"
/vsan/datastoreName = "vsanDatastore"
/vsan/hostDecommissionState = "decom-state-none"
/vsan/hostDecommissionMode = "decom-mode-none"
/vsan/enabled = "true"
/vsan/subClusterUuid = "52d13c54-4d78-af85-beb6-4923ac8532ec"
/vsan/faultDomainName = "Self"
/vsan/autoClaimStorage = "true"
/firewall/services/vsanvp/allowedall = "true"
/firewall/services/vsanvp/enabled = "true"

# esxcfg-advcfg -l | grep -i vsan
/VSAN/ClomRepairDelay [Integer] : Minutes to wait for absent components to come back before starting repair (REQUIRES clomd RESTART!)
/VSAN/ClomMaxComponentSizeGB [Integer] : Max component size in GB. Should be no larger than 80% of the typical HDD size in the VSAN cluster. (REQUIRES clomd RESTART!)
/VSAN/DomLongOpTraces [Integer] : Trace ops that take more than the specified option value in second(s)
/VSAN/DomBriefIoTraces [Integer] : Enables a brief set of per-IO DOM traces for debugging
/VSAN/DomFullIoTraces [Integer] : Enables full set of per-IO DOM traces; if disabled, IO op traces go to the per-op trace table
/VSAN/TraceEnableDom [Integer] : DOM tracing enabled
/VSAN/TraceEnableDomIo [Integer] : DOMIO tracing enabled
/VSAN/TraceEnableLsom [Integer] : LSOM tracing enabled
/VSAN/TraceEnableCmmds [Integer] : CMMDS/CMMDSResolver tracing enabled
/VSAN/TraceEnableRdt [Integer] : RDT tracing enabled
/VSAN/TraceEnablePlog [Integer] : PLOG tracing enabled
/VSAN/TraceEnableSsdLog [Integer] : SSDLOG tracing enabled
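Any of these advanced options can be read or changed with the same tool; for example (the value 90 is just an illustration, and note the clomd restart requirement called out in the descriptions):

# esxcfg-advcfg -g /VSAN/ClomRepairDelay
# esxcfg-advcfg -s 90 /VSAN/ClomRepairDelay
# /etc/init.d/clomd restart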

13. I also took some captures of the state of my ESXi servers, both real and embedded.

Captures from realesxi1 (the real ESXi host hosting the three embedded ESXi servers)




From realesxi2 (the real ESXi host running the vCenter appliance)