Saturday, 21 January 2017

VDP 6.1.3 Backups Hang At 92 Percent When Backing Up VMs On vSAN

Earlier, we had seen an issue where the VDP appliance crashes when we backup a virtual machine residing on a vSAN datastore. This was documented in this VMware KB article here.

On VDP 6.1.3 we use VDDK version 6.5, VixDiskLib (6.5) Release build-4241604. The earlier issue in VDDK 6.0 as mentioned in the above KB article was unable to mount a vSAN sparse delta disk on a proxy server or the VDP server (when using internal proxy) when it was residing on a non vSAN datastore.

In VDDK 6.5, the behavior was changed. The issue still exists but rather than crashing the VDP appliance, the VDDK library crashes causing the backups to fail. The workaround as per the same article was to:

1. If internal proxy was being used, then deploy an external proxy and place it on the vSAN datastore.
2. If space usage is not an issue and internal proxy is being used, move the entire VDP appliance to the vSAN datastore.

However, in some cases this did not work and the backups still used to fail, however the task and events in vSphere would show the progress at 92 percent and mccli activity show --active would not show any progress bytes.

I do not have a vSAN storage in my setup, the back-trace I am using is from couple of cases that I worked on and a community discussion thread which can be found here.

If you view the backup failure vimagew.log under /usr/local/avamarclient/var, you will notice the following back-trace:

2017-01-17T04:02:16.101+04:00 avvcbimage FATAL <5889>: Fatal signal 11 in pid 15065
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006b93b1
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba127
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba44b
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba53e
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 0000000000628810
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56c90850
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5050193f
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5051c277
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5051c75e
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5052b142
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5052b678
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56ecab73
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56ecb123
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004284bc
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004291f9
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004993c4
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 000000000070ac4c
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56c88806
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e558cd9bd
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  ERROR: <0001> uapp::handlefatal: aborting program pid=15065, sig=11
2017-01-17T04:02:16.103+04:00 avvcbimage FATAL <5890>: handlefatal: Aborting program with code 176, pid=15065, sig=11
2017-01-17T04:02:16.103+04:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Exit: Unmatched Init calls so far: 1.

To fix this, change the protocol of backup to nbd. The steps would be:

1. SSH into VDP and browse to the below location:
# cd /usr/local/avamarclient/var
2. Edit the file called as avvcbimageAll.cmd using a vi editor

3. Add the below parameter into this file:

4. Save the file and restart the avagent service using:
 # service avagent-vmware restart

Now your backups should complete successfully. I will update this thread when a fix is released for this bug.
Hope this helps.