Wednesday, 29 June 2016

VDP Backup Fails: Error In Creating Quiesce Snapshot


So there are many causes of failure for unable to quiesce a VM while taking a snapshot and this is one of them I came across while working on an issue. 

A brief summary what was happening here:
There was a virtual machine which was being backed up by VDP and it used to fail when VDP was initiating a quiesce snapshot on it. 

The backup job logs were showing the below:

2016-06-10T14:00:47.451+04:00 avvcbimage Info <0000>: vm-2412
2016-06-10T14:00:47.469+04:00 avvcbimage Info <0000>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task still in progress, sleep for 3 sec
2016-06-10T14:00:50.524+04:00 avvcbimage Info <0000>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task still in progress, sleep for 3 sec
2016-06-10T14:00:53.576+04:00 avvcbimage Warning <19733>: vSphere Task failed (quiesce, snapshot error=45): 'An error occurred while saving the snapshot: Failed to quiesce the virtual machine.'.
2016-06-10T14:00:53.576+04:00 avvcbimage Error <17775>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task creation encountered a quiesce problem
2016-06-10T14:00:53.576+04:00 avvcbimage Warning <0000>: The VM could not be quiesced prior to snapshot creation and this backup will not be used as a base for subsequent CBT backups if successful.

So we tried a couple of basic troubleshooting steps such as uninstalling VMware tools and reinstalling it, moving the virtual machine to a different ESXi host and a different datastore. Everything produced the same result, Failed to quiesce the virtual machine. So if I quiesce snapshot the virtual machine from vCenter, it completes successfully. Only quiesce from VDP failed. 

The next step was to check the vss writer status on the virtual machine. So from an elevated command prompt, I ran the below command
vssadmin list writers

All the VSS writers were in a healthy state without any error. Then, I proceeded to check the Event Logs for VSS issues, and there was none. 

Next, over to the vmware.log for this virtual machine, displayed the below logging:

2016-06-13T18:01:55.531Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e', deviceState=0, lazy=0, logging
=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=B986C80, cbData=323CE050
2016-06-13T18:01:55.739Z| vcpu-2| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED
2016-06-13T18:01:57.770Z| vcpu-0| I120: Msg_Post: Warning
2016-06-13T18:01:57.770Z| vcpu-0| I120: [msg.snapshot.quiesce.vmerr] The guest OS has reported an error during quiescing.
2016-06-13T18:01:57.770Z| vcpu-0| I120+ The error code was: 5
2016-06-13T18:01:57.770Z| vcpu-0| I120+ The error message was: 'VssSyncStart' operation failed: IDispatch error #8451 (0x80042303)
2016-06-13T18:01:57.770Z| vcpu-0| I120: ----------------------------------------
2016-06-13T18:01:57.775Z| vcpu-0| I120: ToolsBackup: changing quiesce state: STARTED -> ERROR_WAIT
2016-06-13T18:01:59.797Z| vcpu-0| I120: ToolsBackup: changing quiesce state: ERROR_WAIT -> IDLE
2016-06-13T18:01:59.797Z| vcpu-0| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
2016-06-13T18:01:59.797Z| vcpu-0| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e': 0
2016-06-13T18:01:59.797Z| vcpu-0| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40).
2016-06-13T18:02:08.245Z| vmx| I120: SnapshotVMX_Consolidate: Starting
2016-06-13T18:02:08.245Z| vmx| I120: SnapshotVMXConsolidateOnlineCB: nextState = 0 uid 0


So from vmware.log I saw logging regarding guest was unable to quiesce the virtual machine and VSS sync driver failed to start which led to failure of snapshot creation. 

Something was going on the VSS end. The VSS writers were good, so I proceeded to check the VSS providers. 
vssadmin list providers

This listed two providers with their own UUID. However, in the registry, there were three UUID entries. See the below image:


If you see here, 7669 is not a provider UUID. So, I took a backup of registry, and deleted this entry from the registry post which quiesce snapshots were successful from VDP. 

Hope this helps to someone out there facing this issue.