Thursday, 23 March 2017

Unable To Start Backup Scheduler In VDP 6.x

You might come across an issue where the backup scheduler does not start when you try to start it from the vdp-configure page or from the command line using dpnctl start sched. It fails with:

2017/03/22-18:58:53 dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1

And the dpnctl.log will have the following:

2017/03/22-18:58:53 - - - - - - - - - - - - - - - BEGIN
2017/03/22-18:58:53 1,22631,Server has reached the capacity health check limit.
2017/03/22-18:58:53 Attribute Value
2017/03/22-18:58:53 --------- -------------------------------------------------------------------------------
2017/03/22-18:58:53 error     Cannot enable scheduler until health check limit reached event is acknowledged.
2017/03/22-18:58:53
2017/03/22-18:58:53 - - - - - - - - - - - - - - - END
2017/03/22-18:58:53 dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1

If you run the below command, you can see there are quite a few unacknowledged events reporting that the server has reached the capacity health check limit.

# mccli event show --unack=true | grep "22631"

1340224 2017-03-22 13:58:53 CDT WARNING 22631 SYSTEM   PROCESS  /      Server has reached the capacity health check limit.
1340189 2017-03-22 13:58:01 CDT WARNING 22631 SYSTEM   PROCESS  /      Server has reached the capacity health check limit.

To resolve this, acknowledge these events using the below command:

# mccli event ack --include=22631

Post this, start the scheduler either from the GUI or from the command line using dpnctl start sched.
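If you hit this often, the whole check-acknowledge-start sequence can be scripted; a minimal sketch using only the commands above:

#!/bin/bash
# Acknowledge any pending capacity health check events (code 22631), then start the scheduler
if mccli event show --unack=true | grep -q "22631"
then
        mccli event ack --include=22631
fi
dpnctl start sched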

Hope this helps.

Wednesday, 22 March 2017

Cowsay For Linux SSH Login

Cowsay has been around for quite a while now, but I came across it only recently. I wanted a more interesting login banner for a couple of data protection VMs and other CentOS boxes. If you follow this blog, you will know my domain is happycow.local; the name "HappyCow" is quite fascinating, and it is also my GamerTag on GTA5 (Hehe!).

Cowsay came to the rescue here and got this up and running in a few steps. First, I had to get the cowsay package. You can download the package from here. SSH into your Linux box and copy the package over.

Extract the tar file:
# tar -zxvf cowsay_3.03+dfsg2.orig.tar.gz
Post this, change into the cowsay-3.03+dfsg2 directory and run the installation script:
# sh install.sh
Next, create the below file:
# vi ~/.ssh/rc
Paste whatever content you want displayed at SSH login. My content was:
#!/bin/bash
clear
echo -e "Welcome to VDP \n If it is broken, redeploy" | cowsay
echo -e "\nYour system has been up for $(uptime | cut -d ' ' -f 4,5,6,7)"

Make the rc file executable with chmod u+x ~/.ssh/rc, then restart the sshd service:
# service sshd restart
Log back into the terminal and you will see the "Zen-Cow" greeting you.
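For reference, here is the whole setup condensed into one rough sketch (assuming the tarball sits in the current directory and cowsay ends up on your PATH after install.sh runs):

#!/bin/bash
# One-shot setup: install cowsay and create the SSH login banner
tar -zxvf cowsay_3.03+dfsg2.orig.tar.gz
cd cowsay-3.03+dfsg2 && sh install.sh
cd ..
# Write the per-user SSH rc file that prints the banner at login
cat > ~/.ssh/rc << 'EOF'
#!/bin/bash
clear
echo -e "Welcome to VDP \n If it is broken, redeploy" | cowsay
echo -e "\nYour system has been up for $(uptime | cut -d ' ' -f 4,5,6,7)"
EOF
chmod u+x ~/.ssh/rc
service sshd restart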


Looks fun!

Thursday, 16 March 2017

Automating Backup Cancellation From Command Line

This script allows you to mass cancel active backup jobs from the command line of the vSphere Data Protection appliance.
#!/bin/bash
# This script cancels all active backup jobs from the command line
value=$(mccli activity show --active | cut -c1-16 | sed -e '1,3d')
if [ -z "$value" ]
then
        echo "No active job to cancel"
else
        for p in $value
        do
                mccli activity cancel --id=$p
        done
fi

If you would like to cancel only a subset of the backup jobs, say 13 out of 20 running jobs, add those job IDs to a file (one per line) and run a script that pulls its input from that file:
#!/bin/bash
# This script cancels jobs from IDs provided in the id.txt file
while read p; do
        mccli activity cancel --id=$p
done < id.txt

This approach can be adapted for other backup states, such as waiting-client: grep for the state, cut out the ID column, drop the header rows, and feed the job IDs to a loop, as in the sketch below.
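A minimal sketch for the waiting-client case (assuming the state column prints "Waiting-Client", as in the longer script further down):

#!/bin/bash
# Cancel every job currently sitting in the Waiting-Client state
mccli activity show | grep -i "Waiting-Client" | cut -d ' ' -f 1 | while read id
do
        mccli activity cancel --id=$id
done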

A much more interactive script to cancel "Active", "Waiting-Queued", and "Waiting-Client" jobs (a usage example follows the script):

#!/bin/bash
# This block is for the help parameters.
usage()
{
cat << EOF

Below are the available fields

OPTIONS:
   -h      Help
   -a      Active Job
   -w      Waiting Job
EOF
}

# This block saves the IDs of active/waiting-client/waiting-queued backups
value=$(mccli activity show --active | cut -d ' ' -f 1 | sed -e '1,3d')
value_client=$(mccli activity show | grep -i "Waiting-Client" | cut -d ' ' -f 1)
value_queued=$(mccli activity show | grep -i "Waiting-Queued" | cut -d ' ' -f 1)

# This block processes the command line flags
while getopts "haw" option
do
        case $option in
                a)
                        if [ -z "$value" ]
                        then
                                printf "No active jobs to cancel\n"
                        else
                                printf "Cancelling active jobs\n"
                                for i in $value
                                do
                                        mccli activity cancel --id=$i
                                done
                        fi
                        ;;
                w)
                        if [ -z "$value_client" ]
                        then
                                printf "No jobs in waiting client state\n"
                        else
                                printf "Cancelling waiting clients\n"
                                for i in $value_client
                                do
                                        mccli activity cancel --id=$i
                                done
                        fi
                        if [ -z "$value_queued" ]
                        then
                                printf "No jobs in waiting queued state\n"
                        else
                                printf "Cancelling queued clients\n"
                                for i in $value_queued
                                do
                                        mccli activity cancel --id=$i
                                done
                        fi
                        ;;
                h)
                        usage
                        ;;
                ?)
                        printf "type -h for list\n"
                        ;;
        esac
done
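Assuming you save the script as, say, cancel_jobs.sh (a name I am making up here), usage looks like:

# ./cancel_jobs.sh -a     (cancels active jobs)
# ./cancel_jobs.sh -w     (cancels waiting-client and waiting-queued jobs)
# ./cancel_jobs.sh -h     (prints the help text)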

Run chmod a+x on the script file to make it executable. Hope this helps!

Monday, 6 March 2017

Deploying vSphere Data Protection From govc CLI

In vSphere 6.5 GA there have been a lot of reported instances where OVA templates cannot be deployed. In this article, I will be talking specifically about vSphere Data Protection. As you know, vSphere 6.5 supports only VDP 6.1.3. If you try to deploy this via the Web Client, you will run into an error stating "Use a 6.5 version of web client". The workaround is to use OVF Tool to deploy the appliance on the vCenter. Personally, I find OVF Tool a bit challenging for first-time users. A simpler way is to use the govc CLI to deploy this template. William Lam has written in greater detail about what this tool is all about, and you can read it here.

Here, I am using a CentOS machine to stage the deployment. 

1. The first step is to download the appropriate govc binary. To see the list of releases, visit the GitHub link here. Once you have identified the required binary from that list, run the below command to download it onto your Linux box.
# curl -L https://github.com/vmware/govmomi/releases/download/v0.5.0/govc_linux_386.gz | gunzip -d > /usr/local/bin/govc

I am using govc_linux_386.gz as it is compatible with my CentOS install; if you are using a different distro or a Windows-based system, choose accordingly.

You should see output like the below while the download is in progress:

[root@centOS /]# curl -L https://github.com/vmware/govmomi/releases/download/v0.5.0/govc_linux_386.gz | gunzip -d > /usr/local/bin/govc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6988k  100 6988k    0     0   713k      0  0:00:09  0:00:09 --:--:-- 1319k

2. Once the download is done, provide execute permissions to this binary. Run this command:
# chmod +x /usr/local/bin/govc

Note: The binary can be downloaded to any directory you prefer.

3. Verify the download is successful and govc is working by checking for the version:
# govc version

You should see:
govc v0.5.0

4. We will have to set a few environment variables to define which host, datastore, and network this VDP virtual machine should be deployed on; a filled-in example follows the list.

export GOVC_INSECURE=1
export GOVC_URL=<Specify-ESXi-or-VC-FQDN>
export GOVC_USERNAME=<User-login-for-the-above>
export GOVC_PASSWORD=<Your-Password>
export GOVC_DATASTORE=<Datastore-Name>
export GOVC_NETWORK=<Network-Portgroup>
export GOVC_RESOURCE_POOL=<Resource-pool-if-you-have-one>
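Filled in for my lab, this looks something like the below (the values are examples only; substitute your own URL, credentials, datastore, port group, and resource pool path):

export GOVC_INSECURE=1
export GOVC_URL=vcenter-prod.happycow.local
export GOVC_USERNAME=administrator@vsphere.local
export GOVC_PASSWORD='VMware123!'
export GOVC_DATASTORE=datastore1
export GOVC_NETWORK="VM Network"
export GOVC_RESOURCE_POOL=/HappyCow-DC/host/Production/Resources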

5. Next, we will have to create a JSON specification file to provide the details of the VDP appliance. Run the below command to view the specification:
# govc import.spec /vSphereDataProtection-6.1.3.ova | python -m json.tool

You will notice the below:


6. Redirect this output to a file so that we can edit it and provide the necessary details. Run this command:
# govc import.spec /vSphereDataProtection-6.1.3.ova | python -m json.tool > vdp.json

7. Open the file in a vi editor and enter the networking details. Remove the line that says "Deployment": "small". If this is not done, the deployment will fail with:
" govc: ServerFaultCode: A specified parameter was not correct: cisp.deploymentOption " 

Once you have edited the file, it should look similar to this:


Save the file.
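If you prefer to make that edit from the shell, a quick sketch (assuming the spec is pretty-printed one key per line, as json.tool produces):

# sed -i '/"Deployment"/d' vdp.json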

8. Lastly, we will deploy the OVA template using the import.ova function. This deployment uses the JSON file we created, which carries the networking details, together with the environment variables that specify where the OVA should land. The command is:
# govc import.ova -options=vdp.json /vSphereDataProtection-6.1.3.ova

You should now see a progress bar:
[root@centOS ~]# govc import.ova -options=vdp.json /vSphereDataProtection-6.1.3.ova
[06-03-17 14:14:57] Warning: Line 139: Invalid value 'Isolated Network' for element 'Connection'.
[06-03-17 14:15:03] Uploading vSphereDataProtection-0.0TB-disk1.vmdk... (1%, 5.1MiB/s)

9. Post this, you can power on your VDP appliance and begin the configuration.
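The power-on can also be done with govc itself; a sketch, assuming the appliance kept the default name from the spec (adjust the name to whatever you set in vdp.json):

# govc vm.power -on vSphereDataProtection-6.1.3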

Hope this helps. 

Friday, 3 March 2017

VDP Backup Fails With The Error "Failed To Attach Disks"

Earlier, we had seen a compatibility issue with VDP 6.1.3 in an ESXi 5.1 environment, where backups used to fail with the message "Failed to attach disks". More about this can be read here.
However, this is a very generic message and can mean something different when VDP is running on a compatible version.

In this case, the VDP was 6.1.3 on a vSphere 6.0 environment, and the backups failed only when an external proxy was deployed. If the external proxy was discarded, the backups used the internal VDP proxy and completed successfully.

With an external proxy deployed, the logs are located on the proxy machine under /usr/local/avamarclient/var. The backup job log had the following entries:

2017-03-02T16:13:18.762Z avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_PrepareForAccess: Disable Storage VMotion failed. Error 18000 (Cannot connect to the host) (fault (null), type GVmomiFaultInvalidResponse, reason: (none given), translated to 18000) at 4259.

2017-03-02T15:46:19.092Z avvcbimage Info <16041>: VDDK:VixDiskLibVim: Error 18000 (listener error GVmomiFaultInvalidResponse).

2017-03-02T15:46:19.092Z avvcbimage Warning <16041>: VDDK:VixDiskLibVim: Login failure. Callback error 18000 at 2444.

2017-03-02T15:46:19.092Z avvcbimage Info <16041>: VDDK:VixDiskLibVim: Failed to find the VM. Error 18000 at 2516.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVim_FreeNfcTicket: Free NFC ticket.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLib: Error occurred when obtaining NFC ticket for [Datastore_A] Test_VM/Test_VM.vmdk. Error 18000 (Cannot connect to the host) (fault (null), type GVmomiFaultInvalidResponse, reason: (none given), translated to 18000) at 2173.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_OpenEx: Cannot open disk [Datastore_A] Test_VM/Test_VM.vmdk. Error 18000 (Cannot connect to the host) at 4964.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Open: Cannot open disk [Datastore_A] Test_VM/Test_VM.vmdk. Error 18000 (Cannot connect to the host) at 5002.

2017-03-02T15:46:19.093Z avvcbimage Error <0000>: [IMG0008] Failed to connect to virtual disk [Datastore_A] Test_VM/Test_VM.vmdk (18000) (18000) Cannot connect to the host

In the mcserver.log, the following was noted:

WARNING: com.avamar.mc.sdk10.McsFaultMsgException: E10055: Attempt to connect to virtual disk failed.
at com.avamar.mc.sdk10.util.McsBindingUtils.createMcsFaultMsg(McsBindingUtils.java:35)
at com.avamar.mc.sdk10.util.McsBindingUtils.createMcsFault(McsBindingUtils.java:59)
at com.avamar.mc.sdk10.util.McsBindingUtils.createMcsFault(McsBindingUtils.java:63)
at com.avamar.mc.sdk10.mo.JobMO.monitorJobs(JobMO.java:299)
at com.avamar.mc.sdk10.mo.GroupMO.backupGroup_Task(GroupMO.java:258)
at com.avamar.mc.sdk10.mo.GroupMO.execute(GroupMO.java:231)
at com.avamar.mc.sdk10.async.AsyncTaskSlip.run(AsyncTaskSlip.java:77)

The cause of this is an IPv6 AAAA record for the vCenter. VDP does not support dual-stack networking and needs either IPv4 settings or IPv6 settings, not both.

Resolution:
1. Log in to the external proxy machine using root credentials.
2. Run the below command to test DNS resolution:
# nslookup -q=any <vcenter-fqdn>

An ideal output should be as follows:

root@vdp-dest:~/#: nslookup -q=any vcenter-prod.happycow.local
Server:         10.109.10.140
Address:        10.109.10.140#53

Name:   vcenter-prod.happycow.local
Address: 10.109.10.142

But, if you see the below output, then you have an IPv6 AAAA record as well:

root@vdp-dest:~/#: nslookup -q=any vcenter-prod.happycow.local
Server:         10.109.10.140
Address:        10.109.10.140#53

Name:   vcenter-prod.happycow.local
Address: 10.109.10.142
vcenter-prod.happycow.local   has AAAA address ::9180:aca7:85e7:623d

3. Run the below command to set IPv4 precedence over IPv6:
# echo "precedence ::ffff:0:0/96  100" >> /etc/gai.conf
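If you want to make that step safe to run more than once, a small sketch that only appends the line when it is missing:

# grep -q "precedence ::ffff:0:0/96" /etc/gai.conf 2>/dev/null || echo "precedence ::ffff:0:0/96  100" >> /etc/gai.conf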

4. Restart the avagent service using the below command:
# service avagent-vmware restart

Post this, the backups should complete successfully. If the IPv6 entry is not displayed in the nslookup output and the backup still fails, then please raise a support request with VMware.

Thursday, 2 March 2017

VDP Backup Fails After A Storage vMotion

If you have a virtual machine added to a VDP backup job and a Storage vMotion of that VM is then performed, the next backup of this client might fail. The failure error you will see in the vSphere Client is:

VDP: An unexpected error occurred with the following error code: 10058



The backup job log records the following:

2017-03-02T03:18:42.561-05:-30 avvcbimage Info <18664>: Login(https://vcenter-prod.happycow.local:443/sdk) Datacenter: 'HappyCow-DC'
2017-03-02T03:18:42.561-05:-30 avvcbimage Info <19728>:      - connected to 'VirtualCenter' - version: 'VMware vCenter Server 6.0.0 build-3634793',  apiVersion:'6.0'
2017-03-02T03:18:42.604-05:-30 avvcbimage Warning <16004>: Soap fault detected, Find VM - NOT ok, Msg:''
2017-03-02T03:18:42.605-05:-30 avvcbimage Error <0000>: [IMG0014] Problem opening vCenter:'HappyCow-DC', path:'[datastore1 (1)] VM-A/VM-A.vmx'.
2017-03-02T03:18:42.605-05:-30 avvcbimage Info <9772>: Starting graceful (staged) termination, Failed to log into web service. (wrap-up stage)
2017-03-02T03:18:42.606-05:-30 avvcbimage Warning <40657>: Login failed
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <40654>: isExitOK()=208
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <17823>: Body- abortrecommended(t)
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <40658>: vmparams (vcenter-prod.happycow.local)
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <40654>: isExitOK()=208
2017-03-02T03:18:42.615-05:-30 avvcbimage Info <18664>: Login(https://vcenter-prod.happycow.local:443/sdk) Datacenter: 'HappyCow-DC'
2017-03-02T03:18:42.616-05:-30 avvcbimage Info <19728>:      - connected to 'VirtualCenter' - version: 'VMware vCenter Server 6.0.0 build-3634793',  apiVersion:'6.0'
2017-03-02T03:18:42.651-05:-30 avvcbimage Warning <16004>: Soap fault detected, Find VM - NOT ok, Msg:''
2017-03-02T03:18:42.651-05:-30 avvcbimage Error <0000>: [IMG0014] Problem opening vCenter:'HappyCow-DC', path:'[datastore1 (1)] VM-A/VM-A.vmx'.
2017-03-02T03:18:42.658-05:-30 avvcbimage Info <18664>: Login(https://vcenter-prod.happycow.local:443/sdk) Datacenter: 'HappyCow-DC'

A similar scenario is discussed in the release notes here. Refer to the section "Backup of a VM fails when ESX is moved from one datacenter to other within same vCenter inventory (207375)".

This is a known issue in VDP: the vmx_path value is not updated after a Storage Migrate operation.

In this case, the initial datastore of my virtual machine was datastore1 (1) and it was later moved to datastore2. However, after this migration the backup job still picked up the vmx_path as datastore1 (1), and since the VM is no longer available there, the backup failed.

A couple of solutions:

The cached value refreshes automatically within 24 hours, so if you can wait, the issue corrects itself without any manual changes.

Another workaround is to perform one more storage migration of this virtual machine to any datastore.

To force sync the cache, use the "mccli vmcache sync" command from the above release notes.

A sample command would be:
# mccli vmcache sync --domain=/vcenter-FQDN(or)IP/VirtualMachines --name=<VM-Name>
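For the VM from the log snippet above, the command would look something like this (substituting the vCenter FQDN and VM name from my lab; adjust for yours):

# mccli vmcache sync --domain=/vcenter-prod.happycow.local/VirtualMachines --name=VM-A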

Post this, run a backup to verify if things are in place. Hope this helps!