Wednesday, 3 August 2016

VDP Automatic Backup Verification Fails With VMware Tools Heartbeat Timeout



So recently I was working on a colleague's case where the backup verification, both manual and automatic were failing with VMware tools timeout. Now, there were multiple restore points being verified here and only one of them was failing. The VDP admin guide for verification states the below:

"Set the heartbeat timeout interval to its optimal value, depending on the environment. Note that some VMs may take longer to send and receive the VMware Tools heartbeat than others"


On the verification job logs, the following was observed. " >> " Marks the explanation of the event:


2016-07-14T21:52:10.541-02:00 avvcbimage Info <19672>: vmAction powerOnVM()
>> The VM power On function is called

2016-07-14T21:52:10.549-02:00 avvcbimage Info <17787>: Modifying E1000 adapter: Network adapter 1 to not Connect at Power On
>> The network adapter is disconnected on power on to avoid IP conflict 

2016-07-14T21:52:17.665-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power On task completed, moref=
>> VM is powered ON 

2016-07-14T21:52:17.665-02:00 avvcbimage Info <19674>: vmAction waitforToolRunning()
>> Waiting for heartbeat verification

2016-07-14T21:57:22.177-02:00 avvcbimage Warning <19675>: wait for toolrunning is timed out or cancelled
>> Verification timed out

2016-07-14T21:57:22.178-02:00 avvcbimage Info <19685>: vmAction poweroffVM()
>> power off VM function is called

2016-07-14T21:57:25.225-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power Off task completed, moref=
>> power off completed

2016-07-14T21:57:25.225-02:00 avvcbimage Info <19686>: vmAction deleteVM()
>> delete VM function is called

2016-07-14T21:57:28.295-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' deletion task completed, moref=
>> VM is deleted successfully

2016-07-14T21:57:28.311-02:00 avvcbimage Error <19702>: ABV failed
>> Automatic backup verification failed. 

So, there is a parameter that we need to add in the avvcbimageAll.cmd file to increase the heartbeat wait timeout for the appliance.

The steps would be:

1. Open a SSH to the VDP appliance
2. Browse to the following directory
# usr/local/avamarclient/var/avvcbimageAll.cmd
3. Open this file in a vi editor and add the following parameter:
--validate_script_max_in_min=20
4. Save the file and restart the avagent daemon using the below command:
service avagent-vmware restart

Post this, the VDP appliance will wait for 20 minutes for VM tools heartbeat to be received before it could time out.
If the tools validation complete before 20 minutes, then it will proceed to the next steps. So setting the timeout to a large value will not have any performance impact on the VMs which are not facing the VM tools timeout during verification.

Simple, yes!