Tuesday, 27 March 2018

Embedded Replication Server Disconnected In vSphere Replication 5.8

A vSphere Replication appliance comes with an embedded replication server that handles the replication traffic and VR queries, with the option of deploying additional add-on servers. In vSphere Replication 5.8 or older, there are scenarios where this embedded replication server is displayed as disconnected. While it is disconnected, the replication traffic cannot be handled, so the replications go into an RPO violation state.

In the hbrsrv.log on the vSphere replication appliance, located under /var/log/vmware, we see the below:

repl:/var/log/vmware # grep -i "link-local" hbrsrv*

hbrsrv-402.log:2018-03-23T11:25:24.914Z [7F70AC62E720 info 'HostCreds' opID=hs-init-1d08f1ab] Ignoring link-local address for host-50: "fe80::be30:5bff:fed9:7c52"

hbrsrv.log:2018-03-23T11:25:24.914Z [7F70AC62E720 info 'HostCreds' opID=hs-init-1d08f1ab] Ignoring link-local address for host-50: "fe80::be30:5bff:fed9:7c52"
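When several hosts are affected, the same MoID repeats across many log lines and rotated files. A minimal sketch for pulling out the unique host MoIDs is below; it assumes every affected entry follows the "Ignoring link-local address for host-NN" pattern shown above, and the function name list_linklocal_hosts is only illustrative.

```shell
# Print the unique host MoIDs named in "Ignoring link-local address" entries.
# Reads the files given as arguments, or stdin when none are given.
# Example on the appliance: list_linklocal_hosts /var/log/vmware/hbrsrv*.log
list_linklocal_hosts() {
    sed -n 's/.*Ignoring link-local address for \(host-[0-9]*\):.*/\1/p' "$@" |
        sort -u
}
```

Each MoID in the output is a host to look up in the MOB.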

This is seen when the VMs being replicated are on an ESX host that has an IPv6 link-local address enabled while the host itself uses IPv4 addressing.

The logs refer to the host by its MoID, so you can find the host name from the vCenter MOB page, https://<vcenter-ip>/mob

To navigate to the host MoID section:

Content > group-d1 (Datacenters) > (Your datacenter) under childEntity > group-xx under hostFolder > domain-xx (Under childEntity) > locate the host ID

Then, using this host name, disable IPv6 on the referenced ESX host:
> Select the ESXi 
> Select Configuration
> Select Networking
> Edit Settings for vmk0 (Management) port group
> IP Address, Un-check IPv6

Then reboot that ESX host. Repeat the steps for the remaining affected ESX hosts, and finally reboot the vSphere Replication appliance.

Now there should no longer be any link-local entries in hbrsrv.log, and the embedded server should be connected, allowing the RPO syncs to resume.

Hope this helps!

Friday, 16 March 2018

Unable To Make Changes On A Virtual Machine - The operation is not allowed in the current state of the datastore

Recently, while working on a case, we noticed that we were unable to make changes to any virtual machine on a particular NetApp NFS datastore. We were unable to add disks, extend existing VMDKs or create virtual machines on that datastore.

The error we received was:

The operation is not allowed in the current state of the datastore

When we logged in to the host directly through the UI client, we were able to perform all of the above changes. This indicated that the issue was with the vCenter server and not the datastore.

So looking into the vpxd.log for vCenter, this is what we saw:

2018-03-16T09:11:48.485Z info vpxd[7FB8F56ED700] [Originator@6876 sub=Default opID=VmConfigFormMediator-applyOnMultiEntity-93953-ngc:70007296-f3] [VpxLRO] -- ERROR task-252339 -- vm-44240 -- vim.VirtualMachine.reconfigure: vim.fault.InvalidDatastoreState:
--> Result:
--> (vim.fault.InvalidDatastoreState) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    datastoreName = "dc1_vmware01"
-->    msg = ""
--> }
--> Args:
--> Arg spec:
--> (vim.vm.ConfigSpec) {
-->    changeVersion = "2018-03-15T08:04:06.266218Z",
-->    name = <unset>,

To fix this, we had to change thin_prov_space_flag from 1 to 0 in the vCenter server database. In my case, vCenter was an appliance (the process remains more or less the same for a Windows-based vCenter as well).

The fix:

1. Always take a snapshot of the vCenter server before making any changes within it.

2. Stop the vCenter server service using:
# service-control --stop vmware-vpxd

3. Connect to the vCenter database using:
# /opt/vmware/vpostgres/current/bin/psql -d VCDB -U vc

The password for the vCenter DB can be found in the file below:
/etc/vmware-vpx/vcdb.properties

Run the query below to look up the affected datastore by name:
select * from vpx_datastore where name='<enter-your-datastore-name>';

The output would be something like:
   11 | dc1_vmware01        | ds:///vmfs/volumes/7a567de9-3e3c0969/                   | 5277655814144 | 1745005346816 | NFS  |      | <obj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="urn:vim25" versionId="6.5" xsi:type="DatastoreCapability"><directoryHierarchySupported>true</directoryHierarchySupported><rawDiskMappingsSupported>false</rawDiskMappingsSupported><perFileThinProvisioningSupported>true</perFileThi
orted><vmfsSparseSupported>true</vmfsSparseSupported><vsanSparseSupported>false</vsanSparseSupported></obj> |             2 |            0 |                        30 |                0 |                    1 | automatic
  |                        90 |                  1

That is too much information, so we can narrow it down to the two relevant columns using the query below:
select id,thin_prov_space_flag from vpx_datastore;

Now you can see:

VCDB=> select id,thin_prov_space_flag from vpx_datastore;
  id  | thin_prov_space_flag
 5177 |                    0
 6449 |                    0
 5178 |                    0
   12 |                    0
  795 |                    0
  149 |                    0
  793 |                    0
   11 |                    1
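With many datastores, the flagged rows are easy to miss. A minimal sketch that filters this two-column output down to the ids whose flag is 1 is below; it assumes psql's default aligned "id | flag" layout as shown above, and the function name flagged_datastore_ids is only illustrative.

```shell
# Print the id of every row whose thin_prov_space_flag column is exactly 1.
# Expects "id | thin_prov_space_flag" rows on stdin; the header and any
# other lines that do not match are skipped.
flagged_datastore_ids() {
    awk -F'|' 'NF == 2 && $2 ~ /^[ \t]*1[ \t]*$/ {
        gsub(/[ \t]/, "", $1)   # strip the column padding from the id
        print $1
    }'
}
```

The query output can be piped into it, for example by passing the select statement to psql with -c. Adding "where thin_prov_space_flag=1" to the query itself gives the same result directly in SQL.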

Now, we need to change the thin_prov_space_flag from 1 to 0 for id=11

So run this query:
update vpx_datastore set thin_prov_space_flag=0 where id=<enter-your-id>;

Quit the database view using \q

Start the vCenter server service using:
# service-control --start vmware-vpxd

Log back in to vCenter and you should now be able to make the necessary changes.

Hope this helps!

Tuesday, 13 March 2018

Creating A VMFS Volume From Command Line

An alternative to formatting a new VMFS volume from the GUI is to create it from an SSH session on the ESXi host.

The process is quite simple and the steps are listed below:

1. Make sure the device is presented to the ESX and visible. If not, perform a Rescan Storage and check if the device is visible.

2. You can get the device identifier from an SSH session on the ESX host by listing the contents of:
# cd /vmfs/devices/disks

In my case, the device I was interested in was mpx.vmhba1:C0:T3:L0

3. Next, we need to create a partition on this device. fdisk is deprecated on ESX, so we will use partedUtil instead.

We will create a partition (number 1) starting at sector 128. The partition type is 0xfb (251 in decimal), which identifies a VMFS partition. Along with this we will specify the ending sector.

To calculate the ending sector:
The disk has 512 bytes per sector. In my case the device is 12 GB.
So the number of bytes is 12 x 1024 x 1024 x 1024 = 12884901888.
Dividing this by 512 gives 25165824 sectors.

Sectors are numbered from 0, so the last valid sector is one less than the total. Do not use the full sector count, as partedUtil might complain about an out-of-bound sector value; use 25165823 instead.
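The arithmetic can be done in the shell rather than by hand. This is a minimal sketch that assumes the size is known in whole GiB and the sector size is 512 bytes; the variable names are only illustrative.

```shell
# Ending sector for a partition that runs to the end of the disk.
# Sectors are numbered from 0, so the last one is (sector count - 1).
size_gib=12        # device size in GiB (the example value from this post)
sector_size=512    # bytes per sector
total_bytes=$((size_gib * 1024 * 1024 * 1024))
total_sectors=$((total_bytes / sector_size))
end_sector=$((total_sectors - 1))
echo "$end_sector"
```

For the 12 GB device this prints 25165823, which is the value used as the ending sector.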

The command would then be:
# partedUtil set /vmfs/devices/disks/<device-name> "1 128 <ending-sector> 251 0"

Sample command:
# partedUtil set /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0 "1 128 25165823 251 0"

A successful output would be:
0 0 0 0
1 128 25165823 251 0

4. Next, you format a VMFS volume using the vmkfstools -C command. 

The command would be:
# vmkfstools -C <vmfs-version> -b <block-size> -S <name-of-datastore> /vmfs/devices/disks/<device-name>:<partition-number> 

So the command for me would be (for a VMFS5 volume with a 1 MB block size):
# vmkfstools -C vmfs5 -b 1m -S Test /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0:1

A successful output would be:
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vmfs5 file system on "mpx.vmhba1:C0:T3:L0:1" with blockSize 1048576 and volume label "Test".
Successfully created new volume: 5aa7d4e8-1e99a608-f609-000c292cd901

Now, back in the GUI, refresh the storage section and the new volume will be visible on the host.

Hope this helps!

Friday, 2 March 2018

SRM CentOS 7.4 IP Customization Fails

If you are using SRM 6.0.x or SRM 6.1.x and you test failover a CentOS 7.4 machine with IP customization, the Customize IP step of the recovery fails with the message:

The guest operating system '' is not supported

In the vmware-dr.log on the DR site SRM, you will notice the following:

2018-03-02T02:10:43.405Z [01032 error 'Recovery' ctxID=345cedf opID=72d8d85a] Plan 'CentOS74' failed: (vim.fault.UnsupportedGuest) {
-->    faultCause = (vmodl.MethodFault) null, 
-->    property = "guest.guestId", 
-->    unsupportedGuestOS = "", 
-->    msg = ""
--> }

This is because CentOS 7.4 is not among the supported guests in the imgcust binaries of the 6.0 release. For CentOS 7.4 customization to work, SRM needs to be on a 6.5 release. In my case, I upgraded vCenter to 6.5 Update 1 and SRM to 6.5.1, after which the test recovery completed without issues.

If there is no plan to upgrade your environment immediately, but you would still like the customization to complete, use this workaround.

Look at the redhat-release file on the CentOS machine:
# cat /etc/redhat-release

The contents are:
CentOS Linux release 7.4.1708 (Core)

Remove this line and add the following in its place:
Red Hat Enterprise Linux Server release 7.0 (Maipo)
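The replacement above can be sketched as a small shell function. Keeping a backup copy is my own addition, not part of the original workaround, and the function name set_rhel_release is only illustrative. The file path is passed as a parameter so the function can be tried on a copy before it is pointed at /etc/redhat-release.

```shell
# Back up a release file, then replace its contents with the RHEL 7.0 string.
# Usage (as root on the CentOS guest): set_rhel_release /etc/redhat-release
set_rhel_release() {
    release_file=$1
    # Keep the original contents in <file>.bak so the change can be reverted.
    cp -p "$release_file" "$release_file.bak" || return 1
    printf '%s\n' 'Red Hat Enterprise Linux Server release 7.0 (Maipo)' > "$release_file"
}
```

Copying the .bak file back over the original restores the CentOS release string.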

Since RHEL 7.0 is supported in imgcust for 6.0, the test recovery completes fine. Hope this helps!