Friday, 18 November 2016

VDP Backups Stuck In "Waiting-Client" State

I was recently working on a case where VDP 5.5.6 never started its backup jobs. When I selected the backup job and clicked Backup Now, it reported that the job had started successfully; however, no task was ever created for it.

When I logged in to the VDP appliance over SSH to check the progress of the backup, the state was "Waiting-Client". I ran the command below to view the backup status:
# mccli activity show --active

The output:

ID               Status            Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client   Domain
---------------- ----------------- ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- -------- --------------------------
9147944954795109 Waiting-Client    0          2016-11-18 07:12 IST 07h:40m:23s 2016-11-19 07:12 IST On-Demand Backup 0 bytes        0%        VM-1 /10.0.0.27/VirtualMachines
9147940920005609 Waiting-Client    0          2016-11-17 20:00 IST 18h:52m:51s 2016-11-18 20:00 IST Scheduled Backup 0 bytes        0%        VM-2 /10.0.0.27/VirtualMachines
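
When there are many activities, picking the stuck ones out by eye gets tedious. Below is a minimal sketch of a filter for the output above. The function name waiting_client_ids is my own, and it assumes the column layout shown here, with the ID first and the status second.

```shell
# List the IDs of activities whose Status column is "Waiting-Client".
# Reads the output of `mccli activity show --active` on stdin.
# waiting_client_ids is a hypothetical helper name, not part of VDP.
waiting_client_ids() {
    # Data rows start with a numeric ID; the second field is the status.
    # Header and separator rows do not start with digits, so they are skipped.
    awk '$1 ~ /^[0-9]+$/ && $2 == "Waiting-Client" { print $1 }'
}
```

It would be used as: mccli activity show --active | waiting_client_ids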

The backups were always stuck in this state and never progressed. If I look at vdr-server.log, I do see that the job has been issued to MCS:

2016-11-18 14:33:06,630 INFO  [http-nio-8543-exec-10]-rest.BackupService: Executing Backup Job "Job-A"

However, if I look at the MCS log, mcserver.log, I see that the job is not executed because MCS thinks the server is in a read-only state:

WARNING: Backup job skipped because server is read-only

Yet if I run status.dpn, it reports "Server Is In Full Access", so the server is not actually read-only. I then checked the dispatcher status using the command below:

# mcserver.sh --status

You will have to be in admin mode to run the mcserver.sh script. The output of this script was:

Backup dispatching: suspended
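
If you want to script this check, here is a minimal sketch that pulls the dispatcher state out of the status output. The function names are my own, and it assumes the "Backup dispatching:" line looks exactly like the one above.

```shell
# Print the dispatcher state found in `mcserver.sh --status` output (stdin).
# dispatch_state and dispatching_is_running are hypothetical helper names.
dispatch_state() {
    awk '/Backup dispatching:/ {
        sub(/^.*Backup dispatching:[ \t]*/, "")  # drop the label
        sub(/[ \t]+$/, "")                       # drop trailing whitespace
        print
        exit                                     # first match is enough
    }'
}

# Succeed (exit status 0) only when the state read from stdin is "running".
dispatching_is_running() {
    [ "$(dispatch_state)" = "running" ]
}
```

It would be used as: mcserver.sh --status | dispatching_is_running || echo "dispatcher is not running"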

This is a known issue on the 5.5.6 release of VDP. 

To fix this:

1. Cancel any existing backup jobs using the command:
# mccli activity cancel --id=<job-id>

The job ID is the first column (ID) in the output of the mccli activity show command above.

2. Change to the directory below:
# cd /usr/local/avamar/var/mc/server_data/prefs

3. Open the mcserver.xml file in the vi editor.

4. Locate the parameter "stripeUtilizationCapacityFactor" and edit the value to 2.0.
**Do not change anything else in this file at all. Just the value needs to be changed**

5. Save the file and restart the MCS using the below command:
# dpnctl stop mcs
# dpnctl start mcs

6. Run mcserver.sh again to check the dispatcher status:
# mcserver.sh --status | grep -i dispatching

This time the output should be:

Backup dispatching: running

7. Run the backup again; it should now start immediately.
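
Since steps 2 to 4 involve hand-editing a configuration file, it is worth keeping a copy first. Here is a minimal sketch of a helper that makes a backup and then shows where the parameter sits. The function name backup_and_locate and the .bak suffix are my own choices, not a VDP convention.

```shell
# Copy a config file to <file>.bak, then print the line number(s) and
# line(s) that mention the given parameter name.
# backup_and_locate is a hypothetical helper name.
backup_and_locate() {
    file=$1
    param=$2
    # Keep a copy (preserving mode and timestamps) before any manual edit.
    cp -p "$file" "$file.bak" || return 1
    # -n prints line numbers; -F treats the name as a fixed string.
    grep -nF -- "$param" "$file"
}
```

It would be used as: backup_and_locate /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml stripeUtilizationCapacityFactor

If the edit goes wrong, the .bak copy can be restored before restarting MCS.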

If this does not work, raise a support request with VMware. Hope this helps.