Tuesday, 27 March 2018

Embedded Replication Server Disconnected In vSphere Replication 5.8

A vSphere replication server comes with an embedded replication service to manage all the traffic and vR queries in addition to an option of deploying add on servers. In 5.8 or older vSphere replication servers, there are scenarios where this embedded replication server is displayed as disconnected. Since this embedded service is disconnected, the replications will be in RPO violation state as the replication traffic is not manageable.

In the hbrsrv.log on the vSphere replication appliance, located under /var/log/vmware, we see the below:

repl:/var/log/vmware # grep -i "link-local" hbrsrv*

hbrsrv-402.log:2018-03-23T11:25:24.914Z [7F70AC62E720 info 'HostCreds' opID=hs-init-1d08f1ab] Ignoring link-local address for host-50: "fe80::be30:5bff:fed9:7c52"

hbrsrv.log:2018-03-23T11:25:24.914Z [7F70AC62E720 info 'HostCreds' opID=hs-init-1d08f1ab] Ignoring link-local address for host-50: "fe80::be30:5bff:fed9:7c52"

So, this is seen when the VMs being replicated are on an ESX host which has IPv6 link local address enabled and the host is using an IPv4 addressing. 

The logs, here speak in terms on host MoID, so you can find out the host name from the vCenter MOB page, https://<vcenter-ip/mob

To navigate to the host MoID section:

Content > group-d1 (Datacenters) > (Your datacenter) under childEntity > group-xx under hostFolder > domain-xx (Under childEntity) > locate the host ID

Then using this hostname, disable the IPv6 on the referenced ESX:
> Select the ESXi 
> Select Configuration
> Select Networking
> Edit Settings for vmk0 (Management) port group
> IP Address, Un-check IPv6

Then reboot that ESX host. Repeat the steps for the remaining ESX too and then finally reboot the vSphere Replication Appliance. 

Now, there should no longer be link-local logging in hbrsrv.log and the embedded server should be connected allowing the RPO syncs to resume.

Hope this helps!