Wednesday, 29 June 2016

VDP Backup Fails: Error In Creating Quiesce Snapshot


So there are many causes of failure for unable to quiesce a VM while taking a snapshot and this is one of them I came across while working on an issue. 

A brief summary what was happening here:
There was a virtual machine which was being backed up by VDP and it used to fail when VDP was initiating a quiesce snapshot on it. 

The backup job logs were showing the below:

2016-06-10T14:00:47.451+04:00 avvcbimage Info <0000>: vm-2412
2016-06-10T14:00:47.469+04:00 avvcbimage Info <0000>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task still in progress, sleep for 3 sec
2016-06-10T14:00:50.524+04:00 avvcbimage Info <0000>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task still in progress, sleep for 3 sec
2016-06-10T14:00:53.576+04:00 avvcbimage Warning <19733>: vSphere Task failed (quiesce, snapshot error=45): 'An error occurred while saving the snapshot: Failed to quiesce the virtual machine.'.
2016-06-10T14:00:53.576+04:00 avvcbimage Error <17775>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task creation encountered a quiesce problem
2016-06-10T14:00:53.576+04:00 avvcbimage Warning <0000>: The VM could not be quiesced prior to snapshot creation and this backup will not be used as a base for subsequent CBT backups if successful.

So we tried a couple of basic troubleshooting steps such as uninstalling VMware tools and reinstalling it, moving the virtual machine to a different ESXi host and a different datastore. Everything produced the same result, Failed to quiesce the virtual machine. So if I quiesce snapshot the virtual machine from vCenter, it completes successfully. Only quiesce from VDP failed. 

The next step was to check the vss writer status on the virtual machine. So from an elevated command prompt, I ran the below command
vssadmin list writers

All the VSS writers were in a healthy state without any error. Then, I proceeded to check the Event Logs for VSS issues, and there was none. 

Next, over to the vmware.log for this virtual machine, displayed the below logging:

2016-06-13T18:01:55.531Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e', deviceState=0, lazy=0, logging
=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=B986C80, cbData=323CE050
2016-06-13T18:01:55.739Z| vcpu-2| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED
2016-06-13T18:01:57.770Z| vcpu-0| I120: Msg_Post: Warning
2016-06-13T18:01:57.770Z| vcpu-0| I120: [msg.snapshot.quiesce.vmerr] The guest OS has reported an error during quiescing.
2016-06-13T18:01:57.770Z| vcpu-0| I120+ The error code was: 5
2016-06-13T18:01:57.770Z| vcpu-0| I120+ The error message was: 'VssSyncStart' operation failed: IDispatch error #8451 (0x80042303)
2016-06-13T18:01:57.770Z| vcpu-0| I120: ----------------------------------------
2016-06-13T18:01:57.775Z| vcpu-0| I120: ToolsBackup: changing quiesce state: STARTED -> ERROR_WAIT
2016-06-13T18:01:59.797Z| vcpu-0| I120: ToolsBackup: changing quiesce state: ERROR_WAIT -> IDLE
2016-06-13T18:01:59.797Z| vcpu-0| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
2016-06-13T18:01:59.797Z| vcpu-0| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e': 0
2016-06-13T18:01:59.797Z| vcpu-0| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40).
2016-06-13T18:02:08.245Z| vmx| I120: SnapshotVMX_Consolidate: Starting
2016-06-13T18:02:08.245Z| vmx| I120: SnapshotVMXConsolidateOnlineCB: nextState = 0 uid 0


So from vmware.log I saw logging regarding guest was unable to quiesce the virtual machine and VSS sync driver failed to start which led to failure of snapshot creation. 

Something was going on the VSS end. The VSS writers were good, so I proceeded to check the VSS providers. 
vssadmin list providers

This listed two providers with their own UUID. However, in the registry, there were three UUID entries. See the below image:


If you see here, 7669 is not a provider UUID. So, I took a backup of registry, and deleted this entry from the registry post which quiesce snapshots were successful from VDP. 

Hope this helps to someone out there facing this issue. 

Friday, 24 June 2016

Configure Syslog For VDP Appliance

So you want to forward VDP logs to a syslog server? Okay. Is it supported officially? No, VDP does not support forwarding its logs to a different machine or a log solution. Is a workaround available? Yes.

The VDP appliance is based on a SUSE box and we have the syslog-ng file which can be used to configure remote syslog for this machine.

In my lab, I have a VDP 6.1.2 appliance which is configured to a 6.0 U2 vCenter Server. The syslog collector solution I have here is VMware vRealize Log Insight. 

So how do we achieve this?

1. So William Lam has written a script which can be used to configure syslog for various VMware products. We will be using that script to configure syslog for VDP Server. Download the script from this link here.

2. So once this shell script is downloaded, copy this file into your VDP appliance root directory or /tmp directory. I used WinSCP to move this file to the appliance. 

3. Run the script using the below format:
# ./configurevCloudSuiteSyslog.sh vdp <Remote-Syslog-IP>

4. Once this command is executed browse to the following directory to view the syslog-ng configuration file.
# cd /etc/syslog-ng
#less syslog-ng.config 

5. The end of the file will have these lines appended:

# Configured using vCloud Suite Syslog Configuration Script by William Lam
source vdp {
       file("/space/avamar/var/log/av_boot.rb.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/space/avamar/var/log/dpnctl.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/space/avamar/var/log/dpnnetutil-av_boot.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/log/dpnctl.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/log/av_boot.rb.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/log/av_boot.rb.err.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/log/dpnnetutil-av_boot.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/avi/server_log/flush.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/avi/server_log/avinstaller.log.0" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/vdr/server_logs/vdr-server.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/vdr/server_logs/vdr-configure.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamar/var/flr/server_logs/flr-server.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/data01/cur/err.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamarclient/bin/logs/VmMgr.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamarclient/bin/logs/MountMgr.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamarclient/bin/logs/VmwareFlrWs.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
       file("/usr/local/avamarclient/bin/logs/VmwareFlr.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
};

# Remote Syslog Host
destination remote_syslog {
      udp("192.168.10.6" port (514));

};

log {
        source(vdp);
        destination(remote_syslog);
};

The port used here is UDP and the IP 192.168.10.6 is my Log Insight server IP, this will of course vary for you. 

6. Over to the Log Insight, Under the Hosts section, you can see my VDP section added:


7. When I click vdpnew (Hostname of my VDP appliance), I see the logging being available:



Well, that's pretty much it. Just a workaround, not officially supported though. 

Monday, 6 June 2016

Unable To Connect VDP To Web Client: Failed To Validate vCenter certificate

So out of nowhere today, when I tried to connect the VDP appliance to web client, I got the following error message:

"Failed to validate the vCenter certificate. Either install or verify the certificate by using the vSphere Data configuration utility"

A screenshot of the message:


The vdr-configure logs displayed the following: 
Location: /usr/local/avamar/var/vdr/server_logs

2016-06-06 20:42:26,067 INFO  [http-nio-8543-exec-6]-rest.AuthenticationService: Vcentre Certificate validation had failed
2016-06-06 20:42:27,454 INFO  [Thread-15]-vi.ViJavaServiceInstanceProviderImpl: vcenter-ignore-cert ? true
2016-06-06 20:42:27,514 INFO  [Thread-15]-vi.ViJavaServiceInstanceProviderImpl: visdkUrl = https://192.168.1.1:443/sdk
2016-06-06 20:42:31,127 ERROR [Thread-15]-vi.ViJavaServiceInstanceProviderImpl: Failed To Create ViJava ServiceInstance owing to Remote VCenter connection error

So I was not using any custom certificate here. All default VMware certificate on my 5.5 vCenter. 

The resolution:

1. Restart all VDP services using the below command:

dpnctl stop all
dpnctl start all

2. When I logged into https://VDP-IP:8543/vdp-configure page I noticed the following:


I have an external proxy here, and it looked like the proxy VM was down. Restarted the proxy VM, yet the services and proxies status was not updated, which is when I proceeded to Step (3).

3. Re-register your VDP appliance to vCenter. You can follow this link here to get this task done.

Once the re-register (Same VDP user or a different user) was done (Without any changes to vCenter port or IP) the appliance was able to connect successfully to the web client.

Interesting enough, I was unable to find the cause of this, although I suspect that the connectivity between vCenter and VDP was broken. If there is something more detailed, do comment below. Always looking for VDP deep-dives.

That's pretty much it!

Thursday, 2 June 2016

Unable To Add A Client To A Backup Job In VDP: Already Registered

So today I was working on a case where a vCenter virtual machine and a PSC virtual machine were added to a backup job. One fine day, the PSC machine started behaving weirdly and had to be discarded. So the end solution was to re-deploy the PSC machine completely. The name of the faulty PSC machine was XYZ. This was renamed to XYZ-Old and a new PSC was deployed with the same old name, XYZ. 

The vCenter was back up and running good now. So, we went to the web client and connected to the VDP plugin and discarded the old backup job. So now, the backup tab had no backup jobs at all. Then, we went ahead and created a new backup job and added the new PSC machine, XYZ to this and clicked Finish. The screen grayed out for few seconds and the task finished with an error stating:

Backup job was created successfully, but the system failed to add the following clients to the job. ERROR: Host Named (<host-name>) is already registered as........



Why this is caused:

It is saying that the PSC, XYZ is already registered because it still had the old machine entry in the protected client list in the VDP backup client domain. This stale entry was preventing us to add the new machine with the old name, even though the old PSC machine was renamed and the backup job corresponding to it was removed. 

The solution:

1. Open a SSH to the VDP appliance and run the below command:
mccli client show --recursive

This will show all the protected clients associated with this VDP appliance.
Here I noticed that the XYZ PSC machine entry was still present. I need to remove this stale entry.

2. Run the below command to retire that VM from the protected list 
mccli client retire --domain=/vCenter-IP/VirtualMachines --name="VM-Name"

You will receive an output saying virtual machine is retired successfully. 
Once this was done, we went ahead and added the XYZ PSC virtual machine to the backup job successfully. 

Sharing is caring! You would not find much of VDP articles, believe me!!

Thank you!

Unable to vMotion Due To Difference In vDS Vendor Information Between Source And Destination vDS

So I was trying to vMotion my virtual machines between two vDS and it was failing with the following error: 

"The destination distributed switch has a different version or vendor than the source distributed switch"

A quick search brings up the following KB article which is great, however, there are certain cases where people do not want to upgrade the DVS to change the vendor ID. 

In that situation, there are two workarounds:

1. Changing the entry from the MOB page of the vCenter:

  • Open a browser and go to the following IP address to open the Managed Object Browser page:

https://<vcenter-IP/mob
  • Click Content
  • Under Name: rootFolder select the value: group-d1 (Datacenters)
  • Under Name: childEntity select the value data-center
  • Under Name: networkFolder select group-n* (network)
  • Under Name: childEntity locate the DV Switch and prior to the name of the DVSwitch you will have the moid of the Switch in the format dvs-*number*
  • Go to the following location once the moID is obtained. 
                                         https://vCenter-IP/mob/?moid=dvs-141
    • Click PerformDvsProductSpecOperation_Task on the page
    • Enter the following, For Fill operation: Upgrade
    • Fill productSpec:

                    <productSpec>
                         <vendor>VMware, Inc.</vendor>
                   </productSpec>
    • Click Invoke Method
    Go back to the vCenter Server and verify the Vendor Name now on your vDS. 

    2. If you are using a SQL database, you can run the below query: 
    update vpx_dvs set product_vendor='VMware, Inc.' where id=XXX(replace XXX with your dvs id);
    dvs ID can be found from the steps above. 

    However, before performing any steps, have snapshots, backup of your vCenter and database.