Thursday, 23 February 2017

VDP FLR Does Not Expand Mount Points

The File Level Recovery (FLR) client is used when you want to restore specific files rather than an entire VM. For details on the FLR login modes and how to perform an FLR, you can refer to this link here

Now, you might come across an issue where you are able to mount the required backup, but you are unable to expand the mount point to view the drives and folders. All you see is the name of the VM image and an arrow that does not expand.

Occasionally, you might see an error stating:

"Failed to get disks: Unable to browse as proxies are unavailable"

You might also have persistent or intermittent issues logging in to the FLR client (with either basic or advanced login), with the below message.

There are a few explanations for this, but the root cause is an unresponsive proxy.

Case 1: When internal proxy is being used

By default, VDP has an internal proxy that handles the backup and restore tasks. If the internal proxy is down, FLR authentication fails persistently, and you will not even get the opportunity to mount the backup.
Log in to the https://vdp-ip:8543/vdp-configure page and verify that the internal proxy is in a healthy state.

Restart the proxy service using the below command:
# service avagent-vmware restart

If this does not provide relief, disable and re-enable the internal proxy from the vdp-configure page.

Case 2: When multiple external proxies are used

There might also be a case where multiple external proxies (a maximum of 8) are being used. Some of these might be responsive while the rest are not, and in this case the issue is intermittent. Let's say you have 4 proxies, 2 of which are responding and 2 of which are not. When a login request or a backup/restore request comes in, it can be handed to any one of these proxies. If the request goes to a proxy that is up and running, your FLR login and the expansion of the mount point will work. If the request is handed to an unresponsive proxy, the login or the mount point expansion fails.

To restart an external proxy, log in to the vdp-configure page, select the unresponsive proxy, then click the Gear Icon and click "Restart Proxy". Once all the proxies are confirmed to be ticked green, attempt the FLR again.
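To get a feel for why the failure looks intermittent, here is a rough sketch of the arithmetic. It assumes each request is equally likely to land on any proxy, which is a simplification for illustration and not something VDP documents; the function name is hypothetical.

```shell
#!/bin/bash
# Rough share (in percent) of requests that would hit an unresponsive proxy.
# ASSUMPTION: every request is equally likely to go to any proxy.
failure_percent() {
    local total="$1" unresponsive="$2"
    if [ "$total" -le 0 ]; then
        echo "total proxies must be greater than 0" >&2
        return 1
    fi
    # Integer arithmetic, so the result is rounded down.
    echo $(( 100 * unresponsive / total ))
}

# The example above: 4 proxies, 2 of them unresponsive.
failure_percent 4 2   # prints 50
```

Under that assumption, roughly every second FLR attempt would fail in the 4-proxy example, which matches the "works sometimes, fails sometimes" behaviour.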

Hope this helps.

Wednesday, 8 February 2017

VDP Backup Fails With "Failed To Remove Snapshot"

There might be scenarios where you run a backup for a virtual machine and it starts successfully, takes a snapshot successfully and completes the backup process, but at the very end it fails to remove the snapshot for the VM. This is seen persistently for one or more virtual machines.

At the very end of the backup job log you would see something like:

2017-02-06T11:53:39.552+04:00 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"Connection timed out"
Detail: connect failed in tcp_connect()"

2017-02-06T11:53:39.552+04:00 avvcbimage Error <17773>: Snapshot (snapshot-5656) removal for VM '[Suhas-Store-2] VM01/VM01.vmx' task failed to start

2017-02-06T11:53:39.552+04:00 avvcbimage Info <18649>: Removal of snapshot 'VDP-1486397576f70379edb62fb81285abbf68dfadc0bd0758ba83' is not complete, moref 'snapshot-5656'.

2017-02-06T11:53:39.552+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, Problem with the snapshot removal. (wrap-up stage)
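If you want to check several job logs for the same symptom, a small grep wrapper can help. This is a minimal sketch: the function name is made up, and the message codes are simply the ones visible in the excerpt above, so other VDP versions may log something different.

```shell
#!/bin/bash
# Print the avvcbimage lines that point at a failed snapshot removal.
# The log file to scan is passed as the first argument.
find_snapshot_failures() {
    local log_file="$1"
    # <17773> = removal task failed to start
    # <18649> = removal of the snapshot is not complete
    # (codes taken from the log excerpt above)
    grep -E 'avvcbimage (Error <17773>|Info <18649>)' "$log_file"
}
```

Call it as `find_snapshot_failures <backup-job-log>`; no output means neither message was found in that log.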

As you can see, there is a "Connection timed out" message once the snapshot removal call is handed down to the virtual machine. If you look in the vmware.log for this VDP snapshot ID, you will notice the following:

2017-02-06T16:13:02.636Z| vmx| I125: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VDP-1486397576f70379edb62fb81285abbf68dfadc0bd0758ba83': 55

2017-02-06T16:58:30.826Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-02-06T16:59:11.235Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-02-06T16:59:13.117Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
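To gauge how frequent these timeouts are, you can count them. A minimal sketch, with a hypothetical function name; pass it the vmware.log of the affected VM.

```shell
#!/bin/bash
# Count GuestRpcSendTimedOut entries in a vmware.log.
count_tools_timeouts() {
    local vmware_log="$1"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'GuestRpcSendTimedOut' "$vmware_log"
}
```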

We see a lot of timeouts coming from VMware Tools. At the same time, if you check the datastore where this VM resides, you will see that it is on NFS storage:

2017-02-06T11:13:40.355+04:00 avvcbimage Info <0000>: checking datastore type for special processing, checking for type VVOL, actual type = NFS

And if you check the backup mode, you will see that it is a hot-add backup:

2017-02-06T11:13:40.337+04:00 avvcbimage Info <9675>: Connected with hotadd transport to virtual disk [Suhas-Store-2] VM01/VM01.vmdk
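Both details can be pulled out of the job log in one go. The sketch below assumes the log lines look exactly like the two excerpts above; the function name is hypothetical.

```shell
#!/bin/bash
# Report the datastore type and transport mode recorded in an avvcbimage log.
backup_mode_summary() {
    local log_file="$1"
    local ds_type transport
    # "... actual type = NFS"  ->  NFS
    ds_type=$(sed -n 's/.*actual type = \([A-Za-z0-9]*\).*/\1/p' "$log_file" | head -n 1)
    # "Connected with hotadd transport ..."  ->  hotadd
    transport=$(sed -n 's/.*Connected with \([A-Za-z]*\) transport.*/\1/p' "$log_file" | head -n 1)
    # Fall back to "unknown" when a line is not present in the log.
    echo "datastore=${ds_type:-unknown} transport=${transport:-unknown}"
}
```

For the log shown here this would report `datastore=NFS transport=hotadd`, which is the combination this post is about.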

Now, when the VM resides on NFSv3, snapshot consolidation can time out because of NFS locking. This KB explains the cause of it. The workaround here is to disable the hot-add mode of backup and switch to NBD or NBDSSL.

1. SSH into the VDP appliance and browse to the below directory:
# cd /usr/local/avamarclient/var
2. Edit the avvcbimageAll.cmd file using the vi editor and add the below line:

3. Save the file and restart avagent using:
# service avagent-vmware restart
After this, the backup should complete successfully. Hope this helps.
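Before editing avvcbimageAll.cmd in step 2, it can be worth keeping a copy so the change is easy to roll back. This is a general precaution rather than part of the original procedure; the function name and the backup naming scheme are my own, and the default path is the one from step 1.

```shell
#!/bin/bash
# Make a timestamped copy of avvcbimageAll.cmd before changing it.
# An alternative file path can be passed as the first argument.
backup_cmd_file() {
    local cmd_file="${1:-/usr/local/avamarclient/var/avvcbimageAll.cmd}"
    local backup
    backup="${cmd_file}.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -f "$cmd_file" ]; then
        echo "no existing file at $cmd_file, nothing to back up" >&2
        return 1
    fi
    # -p keeps the original ownership, mode and timestamps where possible.
    cp -p "$cmd_file" "$backup" && echo "$backup"
}
```

The function prints the path of the copy it created, so you can copy that file back over avvcbimageAll.cmd and restart avagent if you need to undo the change.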